Python Equivalent of Common SAS Statements and Functions

SolClover
Aug 8 · 3 min read

A number of analytics professionals, especially in the banking and pharmaceutical industries, use SAS for their day-to-day work. However, there has been a trend of migrating away from SAS towards other languages such as Python or R, which are much more flexible in areas such as machine learning and data science in general.

While it may be scary at first, making the transition from SAS to Python does not have to be hard. I found that instead of going through a complete Python training, it is often easier to first learn the bits that enable you to perform your standard tasks. Once you have mastered those, you can build on them by learning all the extra bits that Python has to offer.

Below is a list of commonly used SAS statements and functions together with their equivalents in Python, which will help you take those first steps over to Python.

Note that we will use the Pandas library to perform the operations below in Python. Make sure you import Pandas before you proceed. You can do it by running the code below:
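The import is most likely the usual one-liner. The `pd` alias is a convention rather than a requirement, and it is the name the rest of this article assumes:

```python
# Import Pandas under its conventional alias
import pandas as pd

print(pd.__version__)  # confirms the library is installed and importable
```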

Create SAS dataset / Python Pandas DataFrame

In SAS, you would typically create a dataset in a data step. The equivalent in Python is to create a Pandas DataFrame. Here are a couple of ways to do it:

Method #1: Create Pandas DataFrame from lists of lists
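A minimal sketch of this method follows. The column names and values are made up for illustration:

```python
import pandas as pd

# Each inner list is one row (one observation, in SAS terms).
# Column names are supplied separately. All values are illustrative.
data = [["Tom", 25], ["Anna", 31], ["Raj", 47]]
df = pd.DataFrame(data, columns=["name", "age"])

print(df)
```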

Method #2: Create Pandas DataFrame from dictionary of lists
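With a dictionary, each key becomes a column name and each list holds that column's values. Again, the data here is invented for the example:

```python
import pandas as pd

# Keys are column names; every list must have the same length.
data = {
    "name": ["Tom", "Anna", "Raj"],
    "age": [25, 31, 47],
}
df = pd.DataFrame(data)

print(df)
```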

Import data from CSV

Equivalent in Python:
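A sketch using `pd.read_csv` is shown here. So that it runs anywhere, the example reads from an in-memory string instead of a file; the file name in the final comment is hypothetical:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (illustrative content).
csv_text = "name,age\nTom,25\nAnna,31\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With a real file you would pass its path instead, for example:
# df = pd.read_csv("my_data.csv")  # hypothetical file name
```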

Datastep - Create a copy of SAS dataset / Pandas DataFrame

Creating a new Pandas DataFrame (in this example named new_df) based on an existing Pandas DataFrame (old_df) is very simple. However, be aware of the difference between a plain assignment, which only gives you a second name for the same data (loosely, a view), and an independent copy:
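The difference can be seen in a small sketch. The name `same_df` and the sample values are assumptions added for illustration:

```python
import pandas as pd

old_df = pd.DataFrame({"x": [1, 2, 3]})

same_df = old_df        # plain assignment: a second name for the same object
new_df = old_df.copy()  # independent copy of the data

# Change the original after both assignments
old_df.loc[0, "x"] = 99

print(same_df.loc[0, "x"])  # reflects the change made to old_df
print(new_df.loc[0, "x"])   # still holds the original value
```

If you want the behaviour of a SAS data step that writes a separate dataset, `.copy()` is the closer match.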

Filtering - where condition

Now repeat the same using a Pandas DataFrame:
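One common approach is boolean indexing, sketched below with invented data. The conditions are only examples:

```python
import pandas as pd

old_df = pd.DataFrame({
    "name": ["Tom", "Anna", "Raj"],
    "age": [25, 31, 47],
})

# Single condition, similar in spirit to WHERE age > 30
new_df = old_df[old_df["age"] > 30]

# Multiple conditions: combine with & (and) or | (or),
# wrapping each condition in parentheses
new_df2 = old_df[(old_df["age"] > 30) & (old_df["name"] != "Raj")]

print(new_df)
print(new_df2)
```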

Rename variables (fields)

Renaming in Pandas DataFrames is just as easy:
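For instance, `rename` takes a mapping from old names to new names. The column names below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tom", "Anna"], "age": [25, 31]})

# Map old column names to new ones; columns not listed keep their names.
df = df.rename(columns={"name": "first_name", "age": "age_years"})

print(df.columns.tolist())
```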

Dropping or keeping variables (fields)

In Pandas, there are a few ways to achieve the same result:
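Three of the more common options are sketched here, using a made-up DataFrame with columns `a`, `b` and `c`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Drop: remove the listed columns, keep everything else
dropped = df.drop(columns=["c"])

# Keep: select only the columns you want
kept = df[["a", "b"]]

# Keep, using .loc (all rows, listed columns)
kept_loc = df.loc[:, ["a", "b"]]

print(dropped.columns.tolist())
print(kept.columns.tolist())
```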

Add new variables / fields

Here’s the same with a Pandas DataFrame:
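A minimal sketch: assigning to a column name that does not exist yet creates the column. The field names and values are assumptions for the example:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 5]})

# New field calculated from existing fields
df["total"] = df["price"] * df["qty"]

# New field with the same constant value on every row
df["source"] = "web"

print(df)
```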

IF statements

Basic if statement

Now with a Pandas DataFrame. Note that this operation uses the built-in pd.isnull() function to identify missing values, together with a lambda function that applies the if-statement logic to the DataFrame:
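The pattern might look like the following sketch. The column names and the labels "missing" and "present" are invented for illustration:

```python
import pandas as pd

# None in a numeric column is stored as NaN (a missing value)
df = pd.DataFrame({"score": [10.0, None, 30.0]})

# Apply the if/else logic to each value of one column
df["score_flag"] = df["score"].apply(
    lambda x: "missing" if pd.isnull(x) else "present"
)

print(df)
```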

Nested if statement

SAS example:

Now in Python. Note that you have to specify axis, since in this scenario we use the apply method on the entire DataFrame rather than on a single column (axis=1 passes each row to the function):
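A sketch of a nested condition applied row by row is given below. The columns, thresholds and labels are assumptions, and a named function is used in place of a lambda purely for readability:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [15, 40, 70, 30],
    "member": [True, False, True, True],
})

def classify(row):
    """Nested if/else evaluated on one row at a time."""
    if row["age"] < 18:
        return "minor"
    elif row["age"] < 65:
        # Inner condition, only reached for working-age rows
        if row["member"]:
            return "adult member"
        else:
            return "adult non-member"
    else:
        return "senior"

# axis=1 hands each row to classify(); the result is one value per row
df["group"] = df.apply(classify, axis=1)

print(df)
```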

Merge statement

SAS data merge:

SAS Proc SQL merge:

Now let’s look at merging Pandas DataFrames:
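One way to do this is `pd.merge`, sketched here with two invented tables that share an `id` key. The `how` argument selects the join type:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Tom", "Anna", "Raj"]})
right = pd.DataFrame({"id": [2, 3, 4], "balance": [100, 250, 75]})

# Inner join: only ids present in both tables
inner = pd.merge(left, right, on="id", how="inner")

# Left join: every row from left; balance is missing where there is no match
left_join = pd.merge(left, right, on="id", how="left")

print(inner)
print(left_join)
```

Unlike a SAS data step merge with a BY statement, the inputs do not need to be sorted by the key first.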

Concatenate (append)

Pandas DataFrame solution:
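A minimal sketch using `pd.concat`, which stacks DataFrames on top of each other. The data is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["Tom", "Anna"]})
df2 = pd.DataFrame({"id": [3], "name": ["Raj"]})

# Stack the rows of df2 below df1.
# ignore_index=True renumbers the rows 0..n-1 in the result.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```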

Summary

If there are other statements that you commonly use in SAS that I did not include above, please comment below and I will make sure to add them to the list. Also, feel free to check out the story on SAS Procedures in Python, which gives Python examples of the most commonly used SAS proc statements.

Cheers!
SolClover
