Basic Concept of Data Analytics : Part-2
Loading a simple delimited data file from the source : Click here
1. How to get the first row value :
#Python counts from 0
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Chegg python\\100 Sales Records.csv")
print(df.loc[0])
Output :

2. How to get the 100th row value :
#Python counts from 0, which is why we write 99 instead of 100
print(df.loc[99])

3. How to get the last row :
print(df.tail(n=1))
Output :

Sub-setting multiple rows
4. Select the First, 10th, and 100th rows
print(df.loc[[0,9,99]])
Output :

5. How to get the 100th row through iloc :
print(df.iloc[99])
Output :

6. Using -1 through iloc to get the last row :
print(df.iloc[-1])
Output :

With iloc, we can pass -1 to get the last row. This does not work with loc, which looks rows up by index label: the default index starts at 0, so there is no row labelled -1.
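The difference comes from what each accessor looks up: loc matches index labels, while iloc counts positions. Below is a minimal sketch on a small hypothetical frame (the name toy and its values are made up for illustration and are not from the sales file), assuming the default integer index that read_csv creates:

```python
import pandas as pd

# Hypothetical three-row frame with the default 0, 1, 2 index
toy = pd.DataFrame({"Country": ["A", "B", "C"], "Units": [10, 20, 30]})

# iloc is position-based, so -1 means "last row"
last_by_position = toy.iloc[-1]
print(last_by_position["Country"])

# loc is label-based; no row is labelled -1, so this raises KeyError
try:
    toy.loc[-1]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)

# With loc, the last row can still be reached through its label
last_by_label = toy.loc[toy.index[-1]]
print(last_by_label["Units"])
```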
7. Select the First, 10th, and 100th rows
print(df.iloc[[0,9,99]])
Output :

Sub-setting columns
- The Python slicing syntax uses a colon, :
- A colon on its own refers to everything along that axis.
- So, to keep all rows and subset only the column(s) with the loc or iloc syntax, we can write something like df.loc[:, [columns]]
8. Select column with loc
#note the position of the colon :
#it is used to select all rows
subset = df.loc[:,['Country', 'Sales Channel']]
print(subset.head())
Output :

9. Subset columns with iloc
#iloc allows us to use integer positions
# -1 will select last column
subset = df.iloc[:,[2,4,-1]]
print(subset.head())
Output :

Sub-setting Columns by Range
10. Create a range of integers from 0 to 4 inclusive (range(5) stops before 5)
small_range = list(range(5))
print(small_range)
Output :
[0, 1, 2, 3, 4]
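A list like this can be handed to iloc to pick columns by position. Below is a minimal sketch on a hypothetical six-column frame (the name toy and the column names c0 to c5 are made up for illustration); the same call should work on df from the sales file:

```python
import pandas as pd

# Hypothetical frame with six columns named c0..c5
toy = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=["c0", "c1", "c2", "c3", "c4", "c5"],
)

small_range = list(range(5))        # [0, 1, 2, 3, 4]; 5 itself is excluded
subset = toy.iloc[:, small_range]   # first five columns by position
print(list(subset.columns))

# Slice notation selects the same columns without building a list
same_subset = toy.iloc[:, :5]
print(subset.equals(same_subset))
```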
Sub-setting Rows and Columns
11. Using loc
print(df.loc[42,'Country'])
Output :
The Gambia
12. Using iloc
#row at position 42 (the 43rd row) and column at position 1 (the 2nd column)
print(df.iloc[42,1])
Output :
The Gambia
Sub-setting Rows and Columns
13. Get the First, 10th, and 100th rows from the 1st, 4th, and 6th columns
print(df.iloc[[0,9,99], [0,3,5]])
Output :

14. If we use the column names directly, it makes code a bit easier to read
#note now we have to use loc, instead of iloc
print(df.loc[[0,9,99], ['Region','Sales Channel','Order Date']])
Output :

Grouped Means
- For each Order ID in our data, what was the average Unit Cost?
- To answer this question,
- We split our data into parts by order ID
- Then we get the “Unit Cost” column and calculate the mean
#print the mean of the given column for each group, using only the first ten rows
print(df.head(n=10).groupby('Order ID')['Unit Cost'].mean())
Output :

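The split, select, and average steps are easier to see on a tiny frame where a group holds more than one row. Below is a minimal sketch with made-up values (the name toy and the numbers are illustrative, not taken from the sales file):

```python
import pandas as pd

# Hypothetical data: two countries with several rows each
toy = pd.DataFrame({
    "Country": ["X", "X", "Y", "Y", "Y"],
    "Unit Cost": [10.0, 20.0, 3.0, 6.0, 9.0],
})

# Split by Country, take the "Unit Cost" column, then average each group
mean_cost = toy.groupby("Country")["Unit Cost"].mean()
print(mean_cost)

# The same average computed by hand for one group
manual_x = toy.loc[toy["Country"] == "X", "Unit Cost"].mean()
print(manual_x)
```

If every Order ID appears only once in the file, each group holds a single row and the grouped mean simply equals that row's Unit Cost. Grouping by a column with repeated values, such as Country, gives a more informative average.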
Grouped Frequency Counts
- Use nunique to get the number of unique values; on a grouped DataFrame it is applied to every column within each group
print(df.head(n=10).groupby('Country').nunique())
Output :

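Note that nunique reports how many distinct values a group contains, while value_counts reports how often each value occurs. Below is a minimal sketch of the difference on made-up data (the name toy and its values are illustrative, not taken from the sales file):

```python
import pandas as pd

# Hypothetical data: sales channels recorded per country
toy = pd.DataFrame({
    "Country": ["X", "X", "X", "Y", "Y"],
    "Sales Channel": ["Online", "Offline", "Online", "Online", "Online"],
})

# nunique: number of distinct channels per country
distinct = toy.groupby("Country")["Sales Channel"].nunique()
print(distinct)

# value_counts: how often each channel occurs within each country
freq = toy.groupby("Country")["Sales Channel"].value_counts()
print(freq)
```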
Basic Plot
Country_total_profit = df.head(n=10).groupby('Country')['Total Profit'].mean()
print(Country_total_profit)
Country_total_profit.plot()
Output :

Visual Representation of data
- Histogram
- Frequency Polygon
- Ogive
- Pie-chart
- Stem & leaf plot
- Pareto chart
- Scatter plot
For Visual Representation of data – Visit here