Data Analytics Tutorial for Beginners

Basic Concepts of Data Analytics : Part-2

Loading a simple delimited data file from the source : Click here
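The tutorial loads the file with pd.read_csv. The following is a minimal sketch of what that call does. It parses a tiny comma-delimited sample held in memory through io.StringIO instead of reading a file path. The rows are made-up toy values that reuse three of the tutorial's column names. They are not the contents of 100 Sales Records.csv.

```python
import io

import pandas as pd

# Hypothetical toy sample: column names borrowed from the tutorial,
# values invented purely for illustration
csv_text = """Region,Country,Units Sold
Asia,Japan,120
Europe,France,75
Africa,Ghana,300
"""

# read_csv accepts any file-like object, so StringIO stands in for a real file
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.shape)   # (rows, columns)
print(toy_df.dtypes)  # Units Sold is parsed as an integer column
```

With a real file, you pass the path string instead, as the tutorial does below.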

  1. How to get the first row value :
#Python counts from 0, and so does the default pandas row index


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("C:\\Chegg python\\100 Sales Records.csv")


print(df.loc[0])       

Output :

2. How to get the 100th row value :

#The default index starts at 0, so the 100th row is labeled 99, not 100

print(df.loc[99])

3. How to get the last row :

print(df.tail(n=1))

Output :

Sub-setting multiple rows

4. Select the first, 10th, and 100th rows through loc

print(df.loc[[0,9,99]])

Output :

5. How to get the 100th row through iloc :

print(df.iloc[99])

Output :

6. Using -1 with iloc to get the last row :

print(df.iloc[-1])

Output :

With iloc, we can pass in -1 to get the last row, something we couldn't do with loc: loc looks up index labels, and the default index has no row labeled -1.
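The difference between label lookup (loc) and position lookup (iloc) is easiest to see when labels and positions differ. The following is a minimal sketch using a hypothetical toy frame whose index is 100, 200, 300. It also contrasts tail(n=1), which returns a one-row DataFrame, with iloc[-1], which returns a Series.

```python
import pandas as pd

# Hypothetical toy frame with a non-default index, so labels and positions differ
toy = pd.DataFrame(
    {"Country": ["Chad", "Peru", "Laos"], "Units Sold": [10, 20, 30]},
    index=[100, 200, 300],
)

# loc looks up index labels
print(toy.loc[200, "Country"])        # Peru

# iloc looks up integer positions; -1 counts from the end
print(toy.iloc[-1]["Country"])        # Laos

# tail(n=1) also gives the last row, but as a one-row DataFrame, not a Series
print(type(toy.tail(n=1)).__name__)   # DataFrame
print(type(toy.iloc[-1]).__name__)    # Series

# there is no label -1 here, so loc raises KeyError instead of wrapping around
try:
    toy.loc[-1]
except KeyError:
    print("loc[-1] raised KeyError")
```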

7. Select the first, 10th, and 100th rows through iloc

print(df.iloc[[0,9,99]])

Output :

Sub-setting columns

  • The Python slicing syntax uses a colon, :
  • A colon on its own refers to everything along that axis (all rows or all columns).
  • So, to subset the column(s) using the loc or iloc syntax, we can write something like df.loc[:, [columns]], which keeps all rows and only the listed columns.
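The colon-plus-list pattern works the same way with names (loc) and positions (iloc). Slices behave slightly differently between the two, though. The following is a minimal sketch on a hypothetical toy frame: the column names are borrowed from the tutorial and the values are invented. It shows that a loc label slice includes its end label, while an iloc position slice excludes its end position.

```python
import pandas as pd

# Hypothetical toy frame; column names borrowed from the tutorial, values invented
toy = pd.DataFrame(
    {
        "Region": ["Asia", "Europe", "Africa"],
        "Country": ["Japan", "France", "Ghana"],
        "Item Type": ["Fruits", "Snacks", "Cereal"],
        "Sales Channel": ["Online", "Offline", "Online"],
    }
)

# ':' in the row slot means "all rows"; the list picks columns by name
by_name = toy.loc[:, ["Country", "Sales Channel"]]

# the same selection with iloc, using integer column positions
by_position = toy.iloc[:, [1, 3]]

# loc label slices include the end label; iloc position slices exclude the end
loc_slice = toy.loc[:, "Region":"Item Type"]   # Region, Country, Item Type
iloc_slice = toy.iloc[:, 0:2]                  # Region, Country

print(by_name.equals(by_position))  # True
print(list(loc_slice.columns))
print(list(iloc_slice.columns))
```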

8. Select column with loc

#note the position of the colon :
#it is used to select all rows

subset = df.loc[:,['Country',  'Sales Channel']]
print(subset.head())

Output :

9. Subset columns with iloc

#iloc allows us to use integer positions
# -1 selects the last column


subset = df.iloc[:,[2,4,-1]]
print(subset.head())

Output :

Sub-setting Columns by Range

10. Create a range of integers from 0 to 4 inclusive (range(5) stops before 5)

small_range = list(range(5))
print(small_range)

Output :

[0, 1, 2, 3, 4]
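The heading refers to sub-setting columns by range, but the example stops after building the list. The following is a minimal sketch of the next step using a hypothetical one-row toy frame whose column names are borrowed from the tutorial. The list from range() can be passed straight to iloc, and a plain slice gives the same result.

```python
import pandas as pd

# Hypothetical one-row toy frame; column names borrowed from the tutorial
toy = pd.DataFrame(
    [["Asia", "Japan", "Fruits", "Online", "H", "1/2/2015"]],
    columns=["Region", "Country", "Item Type", "Sales Channel",
             "Order Priority", "Order Date"],
)

small_range = list(range(5))  # [0, 1, 2, 3, 4]

# pass the list of positions to iloc to keep the first five columns
subset = toy.iloc[:, small_range]

# a slice does the same thing without building a list first
subset_slice = toy.iloc[:, :5]

print(list(subset.columns))
print(subset.equals(subset_slice))  # True
```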

Sub-setting Rows and Columns

11. Using loc

print(df.loc[42,'Country'])

Output :

The Gambia

12. Using iloc

#row at position 42 (the 43rd row) and column at position 1 (the 2nd column)

print(df.iloc[42,1])

Output :

The Gambia
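Examples 11 and 12 print the same value. This suggests that, in this file, Country sits at column position 1 and that label 42 is also position 42 under the default index. The following is a minimal sketch on a hypothetical toy frame, not the real file. It shows how Index.get_loc converts a column name into the position that iloc needs.

```python
import pandas as pd

# Hypothetical toy frame with the default RangeIndex (labels 0, 1, 2)
toy = pd.DataFrame(
    {"Region": ["Asia", "Europe", "Africa"],
     "Country": ["Japan", "France", "Ghana"]}
)

# get_loc returns the integer position of a column label
country_pos = toy.columns.get_loc("Country")
print(country_pos)  # 1

# with a default index, label 2 and position 2 point at the same row
by_label = toy.loc[2, "Country"]
by_position = toy.iloc[2, country_pos]
print(by_label, by_position)  # Ghana Ghana
```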

Sub-setting Multiple Rows and Columns

13. Get the first, 10th, and 100th rows from the 1st, 4th, and 6th columns

print(df.iloc[[0,9,99], [0,3,5]])

Output :

14. If we use the column names directly, it makes code a bit easier to read

#note now we have to use loc, instead of iloc

print(df.loc[[0,9,99], ['Region','Sales Channel','Order Date']])

Output :

Grouped Means

  • For each Order ID in our data, what was the average Unit Cost?
  • To answer this question, we first split our data into groups by Order ID.
  • Then we select the “Unit Cost” column and calculate the mean of each group.
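The split, select, and average steps above can be checked by hand on a small example. The following is a minimal sketch with hypothetical toy data in which two order IDs repeat, so the averaging is visible. If each Order ID appears only once in the real file, each group's mean is simply that row's Unit Cost.

```python
import pandas as pd

# Hypothetical toy data: two order IDs with made-up unit costs
toy = pd.DataFrame(
    {"Order ID": [101, 101, 202, 202, 202],
     "Unit Cost": [2.0, 4.0, 1.0, 2.0, 9.0]}
)

# split by Order ID, pick the Unit Cost column, then average each group
mean_cost = toy.groupby("Order ID")["Unit Cost"].mean()
print(mean_cost)
# Order ID 101 -> (2 + 4) / 2 = 3.0
# Order ID 202 -> (1 + 2 + 9) / 3 = 4.0
```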

#take the first ten rows, group them by Order ID, and print the mean Unit Cost of each group

print(df.head(n=10).groupby('Order ID')['Unit Cost'].mean())

Output :

Grouped Frequency Counts

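  • Use nunique to count unique values: on a Pandas Series it returns a single count, and after groupby it returns the number of unique values in each column for every group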
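The following is a minimal sketch of both uses of nunique, with hypothetical toy data. It shows a single count on one Series and per-column counts within each group after groupby.

```python
import pandas as pd

# Hypothetical toy data: made-up countries, channels, and unit counts
toy = pd.DataFrame(
    {"Country": ["Chad", "Chad", "Chad", "Peru"],
     "Sales Channel": ["Online", "Offline", "Online", "Online"],
     "Units Sold": [5, 5, 7, 3]}
)

# on a single Series, nunique returns one number
print(toy["Sales Channel"].nunique())  # 2

# after groupby, nunique counts distinct values per column within each group
per_country = toy.groupby("Country").nunique()
print(per_country)
```

For Chad, both Sales Channel and Units Sold have two distinct values. For Peru, each has one.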

print(df.head(n=10).groupby('Country').nunique())

Output :

Basic Plot

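#mean Total Profit per country, using only the first ten rows
country_mean_profit = df.head(n=10).groupby('Country')['Total Profit'].mean()
print(country_mean_profit)

#a bar chart suits one value per category; plt.show() displays it outside notebooks
country_mean_profit.plot(kind='bar')
plt.show()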

Output :

Visual Representation of data

  • Histogram
  • Frequency Polygon
  • Ogive
  • Pie-chart
  • Stem-and-leaf plot
  • Pareto chart
  • Scatter plot
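The first item on this list, the histogram, can be sketched in a few lines with the matplotlib import the tutorial already uses. This is a minimal example with hypothetical toy values standing in for a numeric column. It uses the off-screen Agg backend, an assumption made so the sketch also runs without a display. A histogram groups values into bins and draws one bar per bin.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values standing in for a numeric column such as Units Sold
units = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5])

# bins [1,2), [2,3), [3,4), [4,5]; the last bin includes its right edge
counts, edges, _ = plt.hist(units, bins=[1, 2, 3, 4, 5])
print(counts)  # number of values falling in each bin

# in an interactive session, call plt.show() here instead of closing
plt.close()
```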

For more on visual representation of data, visit here
