Pandas
Introduction
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
DataFrame
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating-point values, categorical data, and more) in columns. It is similar to a spreadsheet, a SQL table, or the data.frame in R.
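As a rough illustration of the mixed-type idea, the sketch below builds a small DataFrame from a dictionary. The column names and values are invented for this example and do not come from the rest of the article.

```python
import pandas as pd

# Hypothetical data: each column holds a different type.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],                # strings
        "age": [31, 45, 27],                         # integers
        "height_m": [1.62, 1.80, 1.75],              # floating-point values
        "group": pd.Categorical(["a", "b", "a"]),    # categorical data
    }
)

# Each column keeps its own dtype.
print(people.dtypes)
```

Printing the dtypes is a quick way to confirm that the columns are stored with different types.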
Creating a DataFrame
First, we create an array of random integers with NumPy and build a DataFrame from it.
import numpy as np
import pandas as pd
arr = np.random.randint(0, 10, (5, 3))  # 5 x 3 array of random integers from 0 to 9
df = pd.DataFrame(arr)
Next, we set the row index and the column names of the DataFrame.
df.columns = ["C1", "C2", "C3"]
df.index = ["R1", "R2", "R3", "R4", "R5"]
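The labels can also be passed directly to the constructor instead of being assigned afterwards. The sketch below does that and, as an assumption that is not part of the original example, uses a seeded NumPy generator so the random values repeat between runs.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random values repeat between runs (the seed 0 is arbitrary).
rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(5, 3))  # integers from 0 to 9, shape 5 x 3

# Set the row and column labels while constructing the DataFrame.
df = pd.DataFrame(
    arr,
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)
print(df)
```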
The DataFrame now has rows labeled R1 to R5 and columns labeled C1 to C3. Because the values are random, they differ from run to run.
Reading and selecting data
.iloc
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
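The boolean-array form is easy to overlook, so here is a minimal sketch of it. The DataFrame uses fixed values so the result is reproducible: rows R1 to R3 and column C1 follow the outputs shown in this article, while the remaining four values are placeholders chosen for illustration.

```python
import pandas as pd

# Fixed values so the result is reproducible. Rows R1 to R3 and column C1 follow
# the outputs in this article; the other four values are placeholders.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# One boolean per row position: True keeps the row, False drops it.
mask = [True, False, True, False, True]
selected = df.iloc[mask]
print(selected)
```

The mask must have the same length as the axis it is applied to, here five rows.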
df.iloc[2]
>> C1 9
C2 8
C3 6
.iloc[2] returns the output above because it selects every value in the third row, R3; positions start at 0.
df.iloc[2,1]
>> 8
The output is 8 because the first argument selects row R3 (position 2) and the second selects column C2 (position 1). Column positions also start at 0.
df.iloc[0]
>> C1 0
C2 0
C3 4
It returns every value in the first row, R1.
df.iloc[:,0]
>> R1 0
R2 1
R3 9
R4 7
R5 8
It returns every value in the first column, C1.
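The examples above select a single row, column, or value. .iloc also accepts slices, lists of positions, and negative positions. The sketch below shows these forms on a DataFrame with fixed values; rows R1 to R3 and column C1 follow the outputs in this article, and the other four values are placeholders.

```python
import pandas as pd

# Rows R1 to R3 and column C1 follow the article's outputs; the rest are placeholders.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# Positional slice: the end position is excluded, so this returns R1 and R2.
first_two = df.iloc[0:2]

# Lists of positions: rows 0 and 2, columns 0 and 2.
corners = df.iloc[[0, 2], [0, 2]]

# Negative positions count from the end, so -1 is the last row (R5).
last_row = df.iloc[-1]

print(first_two)
print(corners)
print(last_row)
```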
.loc
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
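A common use of the boolean form is filtering rows by a condition on a column. The sketch below is one way to do that; the threshold of 5 is an arbitrary choice for illustration, and the DataFrame uses fixed values (rows R1 to R3 and column C1 follow the article's outputs, the other four values are placeholders).

```python
import pandas as pd

# Rows R1 to R3 and column C1 follow the article's outputs; the rest are placeholders.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# The comparison produces a boolean Series aligned with the row index.
is_large = df["C1"] > 5

# Keep only the rows where the condition is True.
large_rows = df.loc[is_large]

# A boolean row filter can be combined with a list of column labels.
subset = df.loc[is_large, ["C1", "C3"]]

print(large_rows)
print(subset)
```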
df.loc["R2"]
>> C1 1
C2 1
C3 2
.loc["R2"] selects the row by its index label and returns every value in it.
df.loc["R2","C1"]
>> 1
.loc["R2","C1"] returns the single value at row R2 and column C1.
df.loc["R1":"R3"]
It returns every row from R1 to R3. Unlike positional slices, label slices with .loc include both endpoints.
df.loc["R1":"R3" , "C1":"C2"]
It returns the values in rows R1 to R3 and columns C1 to C2, with both endpoints included.
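To make the relationship between the two indexers concrete, the sketch below selects the same block once by label and once by position. It uses a DataFrame with fixed values; rows R1 to R3 and column C1 follow the article's outputs, and the other four values are placeholders.

```python
import pandas as pd

# Rows R1 to R3 and column C1 follow the article's outputs; the rest are placeholders.
df = pd.DataFrame(
    [[0, 0, 4], [1, 1, 2], [9, 8, 6], [7, 3, 5], [8, 2, 1]],
    index=["R1", "R2", "R3", "R4", "R5"],
    columns=["C1", "C2", "C3"],
)

# Label slice: both endpoints are included (rows R1..R3, columns C1..C2).
by_label = df.loc["R1":"R3", "C1":"C2"]

# Position slice: the end is excluded, so 0:3 covers positions 0, 1, 2.
by_position = df.iloc[0:3, 0:2]

# Both selections return the same 3 x 2 block.
print(by_label.equals(by_position))
```

The end position in the .iloc call is one past the last row wanted, whereas the .loc call names the last label itself.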
Thank you!