select_dtypes, get_dummies, reset_index, rename, sort_values

Baris Gül
5 min read · Jun 12, 2022


In this new series, I want to cover some Python methods and functions that are useful for data analysis. Python is a great language and easy to use for analyzing data, and it has many packages. pandas is one of those packages, and it makes working with data much easier.

1. select_dtypes()

select_dtypes() is a DataFrame method in pandas. It returns the subset of columns whose dtypes match the ones you ask for.

It has two parameters, include and exclude, and at least one of them must be given.

If you want only the object-type columns, pass include='object'.

Conversely, if you want every column except the object-type ones, pass exclude='object'.
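The two calls described here can be sketched as follows. The DataFrame and its column names are made up for illustration and are not the data from the original screenshots. The name column is created with an explicit object dtype, because newer pandas versions may infer a dedicated string dtype for text columns.

```python
import pandas as pd

# Hypothetical example data; "name" is forced to object dtype on purpose.
df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob", "Cem"], dtype="object"),
    "age": [28, 35, 41],
    "height": [1.65, 1.80, 1.75],
    "member": [True, False, True],
})

# Keep only the object-typed columns.
only_objects = df.select_dtypes(include="object")

# Keep everything except the object-typed columns.
no_objects = df.select_dtypes(exclude="object")

print(only_objects.columns.tolist())
print(no_objects.columns.tolist())
```

Both parameters also accept a list of dtypes, for example include=["int64", "float64"], when more than one type should be selected.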

2. get_dummies()

get_dummies() is a top-level pandas function that is quite useful for preparing data. It converts categorical data into dummy (indicator) variables.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

It has 8 parameters:

  • data: The data to be transformed.
  • prefix: String to use as a prefix for the new column names. Pass a list with a length equal to the number of encoded columns when calling get_dummies on a DataFrame.
  • prefix_sep: Separator/delimiter to use between the prefix and the category value. The default is ‘_’.
  • dummy_na: Adds a column to indicate NaN values. The default is False, in which case NaNs are ignored.
  • columns: Column names in the DataFrame to encode. The default is None, in which case all columns with object or category dtype are converted.
  • sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). The default is False.
  • drop_first: Removes the first level to get k-1 dummies out of k categorical levels. The default is False.
  • dtype: Data type for the new columns. Only a single dtype is allowed. The default was np.uint8 in pandas versions before 2.0; from pandas 2.0 on it is bool.

In our get_dummies example we tried out several of these parameters:

df: As already mentioned, this is our data.

prefix: We wanted the new columns to be named with the prefix “col1”.

prefix_sep: We used “.” instead of “_” as the separator after the prefix.

dummy_na: It creates an extra column that flags NaN values, if there are any.

columns: These are the columns that we want to transform.

drop_first: It drops the first category level.
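A minimal sketch of such a call is shown below. The prefix “col1” and the separator “.” come from the description above; the DataFrame, its column names, and the choice of dummy_na=True and drop_first=True are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one categorical column that contains a NaN.
df = pd.DataFrame({
    "color": ["red", "green", np.nan, "red"],
    "size": [1, 2, 3, 4],
})

dummies = pd.get_dummies(
    df,
    prefix="col1",       # new columns start with "col1"
    prefix_sep=".",      # "." instead of the default "_"
    dummy_na=True,       # add an indicator column for NaN
    columns=["color"],   # encode only this column
    drop_first=True,     # drop the first level ("green")
)

print(dummies.columns.tolist())
print(dummies)
```

Because drop_first=True removes the first level in sorted order, there is no col1.green column here; a row with zeros in all remaining dummy columns stands for that level.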

3. reset_index()

reset_index() is a pandas method that resets the index of a DataFrame. It replaces the current index with the default one, a sequence of integers running from 0 to the length of the data minus one.

The five parameters covered here are:

level: int, string or a list. Removes only the given levels from the index. By default all levels are removed.

drop: Boolean value. If False (the default), the old index is added to the data as a column; if True, it is discarded.

inplace: Boolean value. If True, the changes are made in the original DataFrame itself.

col_level: If the columns have multiple levels, selects the level into which the labels are inserted. By default they go into the first level.

col_fill: Object. If the columns have multiple levels, determines how the other levels are named.

Our example DataFrame has a multi-level index (a MultiIndex), that is, more than one index level.

  • If the index has multiple levels, we can reset a subset of them with level.
  • If we are not dropping the index, by default it is placed in the top level of the columns. We can place it on another level with col_level.
  • When the index is inserted under another level, we can specify under which one with the parameter col_fill.
  • If we specify a nonexistent level for col_fill, it is created.
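The four cases in this list can be sketched as follows. The data is a small made-up DataFrame with a two-level index and two-level columns, in the spirit of the example in the pandas documentation; it is not the data from the original screenshots.

```python
import pandas as pd

# Hypothetical data: two index levels ("class", "name") and two column levels.
index = pd.MultiIndex.from_tuples(
    [("bird", "falcon"), ("bird", "parrot"),
     ("mammal", "lion"), ("mammal", "monkey")],
    names=["class", "name"],
)
columns = pd.MultiIndex.from_tuples([("speed", "max"), ("species", "type")])
df = pd.DataFrame(
    [(389.0, "fly"), (24.0, "fly"), (80.5, "run"), (float("nan"), "jump")],
    index=index,
    columns=columns,
)

# Reset every index level: both become columns, the index becomes 0..3.
flat = df.reset_index()

# Reset only a subset of the levels; "name" stays in the index.
partial = df.reset_index(level="class")

# Put the new column label on the second column level instead of the top one.
lower = df.reset_index(level="class", col_level=1)

# Choose the top-level label under which the new column is inserted.
filled = df.reset_index(level="class", col_level=1, col_fill="species")

# A col_fill value that does not exist yet is simply created.
created = df.reset_index(level="class", col_level=1, col_fill="genus")

print(created)
```

Passing drop=True in any of these calls would discard the selected index levels instead of turning them into columns.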

4. rename()

rename() lets you change the labels of rows or columns. Its main parameters are:

mapper: dict-like or function. The transformation to apply to the labels of the axis selected with axis.

index: dict-like or function. Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).

columns: dict-like or function. Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).

axis: {0 or ‘index’, 1 or ‘columns’}, default 0. Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

level: int or level name, default None. In case of a MultiIndex, only rename labels in the specified level.
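As a short illustration of these parameters, the sketch below renames columns with a dict, rows with a dict, and columns again with a function plus axis. The DataFrame and the new labels are invented for the example.

```python
import pandas as pd

# Hypothetical data with default integer row labels.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Rename columns with a dict: old label -> new label.
renamed_cols = df.rename(columns={"A": "a", "B": "c"})

# Rename rows with a dict.
renamed_rows = df.rename(index={0: "x", 1: "y", 2: "z"})

# Pass a function as mapper and pick the axis explicitly.
lowered = df.rename(str.lower, axis="columns")

print(renamed_cols.columns.tolist())
print(renamed_rows.index.tolist())
print(lowered.columns.tolist())
```

rename() returns a new DataFrame by default, so df itself keeps its original labels.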

5. sort_values()

The pandas sort_values() method sorts a DataFrame in ascending or descending order of the passed column. It differs from Python's built-in sorted() function, which cannot sort a DataFrame and does not let you select a particular column to sort by.

by: str or list of str. Name or list of names to sort by. If axis is 0 or ‘index’, by may contain index levels and/or column labels. If axis is 1 or ‘columns’, by may contain column levels and/or index labels. (Changed in version 0.23.0: specifying index or column level names is allowed.)

axis: {0 or ‘index’, 1 or ‘columns’}, default 0. Axis to be sorted.

ascending: bool or list of bool, default True. Sort ascending vs. descending. Specify a list for multiple sort orders. A list of bools must match the length of by.

inplace: bool, default False. If True, perform the operation in place.

kind: {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’. Choice of sorting algorithm. See also numpy.sort for more information. Of the options listed here, mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

na_position: {‘first’, ‘last’}, default ‘last’. ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

In our example we sorted the data as follows:

by: Which column should the sort depend on? Here we took weight as the basis, so we rank our data by weight.

ascending: Here it is False, which means the rows go from highest to lowest by the specified column (here: weight).

na_position: Where the NaN values of the specified column (here: weight) should appear.
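A minimal sketch of this kind of call is shown below. The weight column comes from the description above; the names, the values, and the choice of na_position="first" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing weight.
df = pd.DataFrame({
    "name": ["Ali", "Eva", "Can", "Ida"],
    "weight": [72.5, np.nan, 88.0, 60.2],
})

# Highest weight first; rows with a missing weight go to the top.
ranked = df.sort_values(by="weight", ascending=False, na_position="first")

print(ranked)
```

The original row labels are kept after sorting. If a fresh 0..n-1 index is wanted, reset_index(drop=True) can be chained onto the result.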

In this article we covered some pandas functions and their parameters that help a lot in data analysis, and we tried to explain them with some examples. In upcoming posts we will cover other topics. Until then, take care.
