Published in Eduonline 24

# Can we get SAS Proc Freq with Python?

In this article, we will discuss the SAS Proc Freq procedure and how we can achieve similar results using Python libraries.

# Introduction to Proc Freq

Frequency distributions of categorical variables are a staple of descriptive statistics. SAS provides the FREQ procedure (PROC FREQ) to produce them in a simple way. There are two common reasons you might need to know how to get a frequency distribution in Python:

• Python is one of the most widely used languages in data science.
• Your team is migrating code from SAS to Python.

## Applications/Resources Used:

• SAS OnDemand (formerly University Edition): for writing SAS code
• Kaggle: for writing Python code
• Pandas: for DataFrame operations

# Python

1. Importing the CSV
2. Frequencies sorted by labels
• The value_counts() method returns a pandas Series containing the count of each unique value.
• sort_index() sorts the result by its index, i.e. by the category labels.

3. Convert the Resulting Series to a DataFrame

• We use the pandas DataFrame constructor for this purpose.
• Dividing each count by the total and multiplying by 100 gives the percentage for each value. The sum() and cumsum() methods give the total and the cumulative sums, which yield the cumulative frequency and cumulative percent columns.
• The percentages are rounded to two decimal places to match the SAS output.

Note: The index can be dropped when exporting the DataFrame to Excel/CSV by passing index=False to to_excel() or to_csv().
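The three steps above can be sketched roughly as follows. This is a minimal sketch, not the author's original code: the column name city, its sample values and the helper name freq_table are hypothetical, and the data is built inline instead of being read with pd.read_csv() so the snippet runs on its own. The column labels mirror the ones SAS prints for a one-way table.

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from
# pd.read_csv("your_file.csv")
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"]})


def freq_table(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper mimicking a basic one-way PROC FREQ table."""
    # Step 2: count each unique value, then sort by the category labels
    counts = series.value_counts().sort_index()
    total = counts.sum()
    # Step 3: build the DataFrame, rounding percentages to two decimals
    return pd.DataFrame({
        "Frequency": counts,
        "Percent": (counts / total * 100).round(2),
        "Cumulative Frequency": counts.cumsum(),
        "Cumulative Percent": (counts.cumsum() / total * 100).round(2),
    })


table = freq_table(df["city"])
print(table)
```

Because the category labels live in the index here, calling table.reset_index() before to_csv(..., index=False) keeps them as a regular column while dropping the numeric index.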

# 2. Include the missing values

SAS: By default, missing values are excluded from the table; add the MISSING option to the TABLES statement to include them as a separate group.

Python: value_counts() also drops missing values by default; to keep them in the frequency table, pass the dropna parameter and set it to False.
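A small sketch of the difference, assuming a hypothetical column that contains two missing values (np.nan):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values (np.nan)
s = pd.Series(["A", "B", np.nan, "A", np.nan])

# Default: missing values are left out of the counts
default_counts = s.value_counts().sort_index()

# dropna=False keeps NaN as its own group (sort_index places it last)
with_missing = s.value_counts(dropna=False).sort_index()

print(default_counts)
print(with_missing)
```

If the percentage columns are built from with_missing, the missing rows count toward the total, which is also how the SAS MISSING option treats them.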

# 3. Sort the rows from most frequent to least frequent

SAS: By default, rows are ordered by the internal (unformatted) values of the variable; specify ORDER=FREQ in the PROC FREQ statement to sort them by descending frequency.

Python: Just drop the sort_index() call; value_counts() already sorts the counts in descending order by default.
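A quick comparison of the two orderings, using a hypothetical sample column:

```python
import pandas as pd

# Hypothetical sample column
s = pd.Series(["x", "y", "y", "z", "y", "z"])

# Sorted by label, as in the earlier steps
by_label = s.value_counts().sort_index()

# value_counts() on its own sorts by descending count, similar to ORDER=FREQ
by_freq = s.value_counts()

print(by_label)
print(by_freq)
```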

# 4. Suppress the percentage and cumulative columns

SAS: Specify the NOPERCENT and NOCUM options to suppress the percentages and the cumulative frequencies and percentages, respectively.

Python: Just leave out the percentage and cumulative columns when converting the Series to a DataFrame.
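A minimal sketch of a frequency-only table, assuming a hypothetical column named grade; Series.to_frame() turns the counts into a one-column DataFrame:

```python
import pandas as pd

# Hypothetical sample column
s = pd.Series(["b", "a", "b", "c"], name="grade")

# Counts sorted by label, kept as the only column (like NOPERCENT NOCUM)
freq_only = s.value_counts().sort_index().to_frame(name="Frequency")
print(freq_only)
```

If the full four-column table has already been built, DataFrame.drop(columns=[...]) can remove the unwanted columns instead.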

# 5. Creating a Frequency Cross Tabulation

SAS: Request the two-way table as var1*var2 in the TABLES statement, dropping the additional statistics to keep it simple.

Python: Use the pandas crosstab() function.

• Set margins=True to add row and column totals.
• By default, the label for these totals is “All”; we change it to “Total” with the margins_name parameter to match the SAS output.
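A short sketch of the cross tabulation, assuming a hypothetical DataFrame with two categorical columns, gender and smoker:

```python
import pandas as pd

# Hypothetical sample data with two categorical variables
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "smoker": ["Yes", "No", "No", "No", "Yes"],
})

# Equivalent in spirit to TABLES gender*smoker; margins adds the totals
ct = pd.crosstab(df["gender"], df["smoker"], margins=True, margins_name="Total")
print(ct)
```

Combinations that never occur (here, gender M with smoker Yes) show up as 0 rather than being left out.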

# 6. Frequency Procedure — Multiple Variables

SAS: List several variables in the TABLES statement to apply the FREQ procedure to each of them.

Python: We can write a function that loops through a list of variables and builds the frequency distribution for each one. The function takes two arguments:

A. a DataFrame
B. a list of column names
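One way such a function might look; this is a sketch, not the author's code. The name freq_tables, the sample columns region and product, and the choice to return a dictionary of DataFrames (keyed by column name) are all hypothetical:

```python
import pandas as pd


def freq_tables(data: pd.DataFrame, columns: list[str]) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: one PROC FREQ-style table per listed column."""
    tables = {}
    for col in columns:
        # Counts sorted by label, then percentages and cumulative values
        counts = data[col].value_counts().sort_index()
        total = counts.sum()
        tables[col] = pd.DataFrame({
            "Frequency": counts,
            "Percent": (counts / total * 100).round(2),
            "Cumulative Frequency": counts.cumsum(),
            "Cumulative Percent": (counts.cumsum() / total * 100).round(2),
        })
    return tables


# Hypothetical sample data
df = pd.DataFrame({
    "region": ["N", "S", "N", "E"],
    "product": ["p1", "p1", "p2", "p1"],
})

result = freq_tables(df, ["region", "product"])
for name, table in result.items():
    print(f"Table of {name}")
    print(table)
```

Returning the tables instead of only printing them keeps them available for export or further processing.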

That’s all for reproducing SAS PROC FREQ in Python. One thing we can notice is that a simple PROC FREQ with full details (percentages and cumulative statistics) is easier to get in SAS than in Python. As we go deeper, however, pandas does most of the work for us, and we only need to use its ready-made methods.

Let me know your thoughts in the comment section.

Web Dev @ psrajput.com | Writer @ eduonline24.in | Analyst @ Genpact