A number of analytics professionals, especially in the banking and pharmaceutical industries, use SAS for their day-to-day work. However, there has been a trend of migrating away from SAS towards other languages such as Python or R, which are a lot more flexible in areas such as machine learning and data science in general.
While it may be scary at first, making the transition from SAS to Python does not have to be hard. I found that instead of going through a complete Python training course, it is often easier to first learn the bits that enable you to perform your standard tasks. Once you have mastered those, you can build on them by learning all the extra bits that Python has to offer.
Below is a list of commonly used SAS statements and functions together with their equivalents in Python, which will help you make those first steps over to Python.
Note that we will use the Pandas library to perform the operations below in Python. Make sure you import Pandas before you proceed. You can do it by running the below code:
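The conventional import uses the widely adopted `pd` alias, which is also what function names such as pd.isnull() later in this article assume:

```python
# Import Pandas under its conventional short alias
import pandas as pd

# Optional sanity check: show which version is installed
print(pd.__version__)
```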
Create SAS dataset / Python Pandas DataFrame
When you want to input your data manually in SAS, the common solution is to use Input and Datalines:
An equivalent to that in Python would be to create a Pandas DataFrame. Here are a couple of ways to do it:
Method #1: Create Pandas DataFrame from lists of lists
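A minimal sketch of this method. The column names and values are made up for illustration, since the original example data is not shown here:

```python
import pandas as pd

# Each inner list is one row (one observation)
data = [
    ["Anna", 28, 52000.0],
    ["Ben", 35, 61000.0],
    ["Cara", 41, 75000.0],
]

# Column names are supplied separately, in the same order as the row values
df = pd.DataFrame(data, columns=["name", "age", "income"])
print(df)
```

This layout is the closest to SAS Datalines, where each line of input is one record.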
Method #2: Create Pandas DataFrame from dictionary of lists
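A minimal sketch of the dictionary approach, again with made-up column names and values:

```python
import pandas as pd

# Each key is a column name; each list holds that column's values.
# All lists must have the same length.
data = {
    "name": ["Anna", "Ben", "Cara"],
    "age": [28, 35, 41],
    "income": [52000.0, 61000.0, 75000.0],
}

df = pd.DataFrame(data)
print(df)
```

Here the data is organised by column rather than by row, which is often more convenient when the values already live in separate lists.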
Import data from CSV
It is more common to import data from CSV or other files than to type it all in yourself. Here is how you do it in SAS:
Equivalent in Python:
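A minimal sketch using pd.read_csv(). To keep the example runnable without a file on disk, it reads from an in-memory string; the file name `my_data.csv` in the comment and the column names are hypothetical:

```python
import io

import pandas as pd

# Stand-in for a CSV file so the example runs as-is.
# With a real file you would pass its path instead, e.g.
#   df = pd.read_csv("my_data.csv")
csv_text = "name,age,income\nAnna,28,52000\nBen,35,61000\n"

# The first line is treated as the header row by default
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

By default read_csv() expects a comma delimiter and infers column types from the data; the `sep` parameter covers other delimiters.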
Datastep - Create a copy of SAS dataset / Pandas DataFrame
The code below creates a new SAS dataset (in this example named new_ds) by simply taking a copy of the existing SAS dataset (old_ds):
Creating a new Pandas DataFrame (in this example named new_df) based on the existing Pandas DataFrame (old_df) is very simple. However, be aware of the difference between assigning a second name to the same DataFrame (loosely, a view) and creating an independent copy:
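A minimal sketch of the difference, using made-up data. The name `same_df` is introduced here only for illustration:

```python
import pandas as pd

old_df = pd.DataFrame({"name": ["Anna", "Ben"], "age": [28, 35]})

# Plain assignment does NOT copy: both names point to the same object
same_df = old_df

# .copy() creates an independent DataFrame (a deep copy by default)
new_df = old_df.copy()

new_df.loc[0, "age"] = 99   # changes new_df only
same_df.loc[1, "age"] = 50  # changes old_df as well

print(old_df)
```

If you want behaviour comparable to a SAS data step that writes a separate dataset, .copy() is the safer choice.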
Filtering - where condition
Filtering is essential in data manipulation. Here is how you do it in SAS:
Now repeat the same using a Pandas DataFrame:
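A minimal sketch of filtering with a boolean condition, assuming made-up columns `age` and `income`:

```python
import pandas as pd

old_df = pd.DataFrame({
    "name": ["Anna", "Ben", "Cara"],
    "age": [28, 35, 41],
    "income": [52000, 61000, 75000],
})

# Single condition: keep rows where age is above 30
new_df = old_df[old_df["age"] > 30]

# Multiple conditions: wrap each in parentheses and combine
# with & (and) or | (or) rather than the Python keywords
new_df2 = old_df[(old_df["age"] > 30) & (old_df["income"] < 70000)]

print(new_df)
print(new_df2)
```

The expression inside the brackets produces a True/False value for every row, and only the True rows are kept.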
Rename variables (fields)
Renaming fields is very simple in SAS:
Renaming columns in a Pandas DataFrame is just as easy:
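A minimal sketch using DataFrame.rename(), with hypothetical old and new column names:

```python
import pandas as pd

old_df = pd.DataFrame({"name": ["Anna", "Ben"], "age": [28, 35], "income": [52000, 61000]})

# Map old column names to new ones; columns not listed are left unchanged
new_df = old_df.rename(columns={"age": "age_years", "income": "annual_income"})

print(new_df.columns.tolist())
```

rename() returns a new DataFrame, so old_df keeps its original column names.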
Dropping or keeping variables (fields)
You don’t always want all the variables available in your dataset. Using only what you need is also beneficial from an efficiency and storage perspective. This is how you specify which variables to keep (or drop) in SAS:
In Pandas there are a few ways to achieve the same:
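A minimal sketch of two common options, using made-up column names:

```python
import pandas as pd

old_df = pd.DataFrame({"name": ["Anna", "Ben"], "age": [28, 35], "income": [52000, 61000]})

# Keep: select the columns you want by passing a list of names
keep_df = old_df[["name", "age"]]

# Drop: remove the columns you do not want
drop_df = old_df.drop(columns=["income"])

print(keep_df.columns.tolist())
print(drop_df.columns.tolist())
```

Both lines produce the same result here; which one reads better usually depends on whether the keep list or the drop list is shorter.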
Add new variables / fields
Sometimes we need to create new variables based on other variables within a dataset. SAS example:
Here’s the same with a Pandas DataFrame:
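A minimal sketch of deriving new columns from existing ones. The column names and calculations are made up for illustration:

```python
import pandas as pd

old_df = pd.DataFrame({"name": ["Anna", "Ben"], "age": [28, 35], "income": [60000, 72000]})

# Work on a copy so the original DataFrame is left untouched
new_df = old_df.copy()

# Assigning to a new column name creates the column;
# the arithmetic is applied to every row at once
new_df["monthly_income"] = new_df["income"] / 12
new_df["age_next_year"] = new_df["age"] + 1

print(new_df)
```

Unlike a SAS data step, there is no row-by-row loop to write: operations on a column apply to all rows.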
Basic if statement
Many variable operations are often performed using if statements. SAS example:
Now with a Pandas DataFrame. Note that for this operation we use the existing pd.isnull() function, which identifies missing values, together with a lambda function that applies our if-statement logic to the DataFrame:
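A minimal sketch of this pattern. The flag column `income_missing` and the data are hypothetical, chosen only to show pd.isnull() and a lambda working together:

```python
import pandas as pd

# None in a numeric column is stored as a missing value (NaN)
old_df = pd.DataFrame({"name": ["Anna", "Ben", "Cara"], "income": [52000.0, None, 75000.0]})

new_df = old_df.copy()

# For each value x in the income column:
# return 1 if it is missing, otherwise 0
new_df["income_missing"] = new_df["income"].apply(lambda x: 1 if pd.isnull(x) else 0)

print(new_df)
```

Because apply() is called on a single column here, the lambda receives one value at a time and no axis argument is needed.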
Nested if statement
It is a bit trickier when you have to deal with nested if statements.
Now in Python. Note that you have to specify the axis, since in this scenario we use the apply method on the entire DataFrame rather than on a single column:
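A minimal sketch of nested if logic applied row by row. The helper `age_income_band`, its thresholds, and the column names are hypothetical; a lambda would work as well, but a named function tends to be easier to read once the conditions are nested:

```python
import pandas as pd

old_df = pd.DataFrame({
    "name": ["Anna", "Ben", "Cara", "Dan"],
    "age": [28, 25, 41, 50],
    "income": [52000, 61000, 75000, 40000],
})

def age_income_band(row):
    """Classify one row using a nested if statement."""
    if row["age"] < 30:
        if row["income"] < 55000:
            return "young_low"
        else:
            return "young_high"
    else:
        if row["income"] < 55000:
            return "older_low"
        else:
            return "older_high"

new_df = old_df.copy()

# axis=1 passes each ROW to the function, so it can read several columns
new_df["band"] = new_df.apply(age_income_band, axis=1)

print(new_df)
```

Without axis=1, apply() would pass each column to the function instead of each row, and the lookups by column name would fail.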
Some people prefer to use Proc SQL instead of a SAS data merge. Here, I have provided examples of both.
SAS data merge:
SAS Proc SQL merge:
Now let’s look at merging with Pandas DataFrames:
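A minimal sketch using pd.merge(), assuming two made-up DataFrames that share a key column named `id`:

```python
import pandas as pd

left_df = pd.DataFrame({"id": [1, 2, 3], "name": ["Anna", "Ben", "Cara"]})
right_df = pd.DataFrame({"id": [2, 3, 4], "income": [61000, 75000, 48000]})

# Inner join: keep only ids present in both DataFrames
inner_df = pd.merge(left_df, right_df, on="id", how="inner")

# Left join: keep every row of left_df; unmatched rows get missing values
left_join_df = pd.merge(left_df, right_df, on="id", how="left")

print(inner_df)
print(left_join_df)
```

The `how` parameter also accepts "right" and "outer", covering the usual SQL join types. Unlike a SAS data merge, the inputs do not need to be sorted by the key first.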
What if instead of merging datasets you want to append one to the bottom of the other? SAS solution:
Pandas DataFrame solution:
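A minimal sketch using pd.concat(), with two made-up DataFrames that have the same columns:

```python
import pandas as pd

df_a = pd.DataFrame({"name": ["Anna", "Ben"], "age": [28, 35]})
df_b = pd.DataFrame({"name": ["Cara", "Dan"], "age": [41, 50]})

# Stack df_b underneath df_a; ignore_index=True renumbers the rows 0..n-1
combined_df = pd.concat([df_a, df_b], ignore_index=True)

print(combined_df)
```

Columns are matched by name, so if one DataFrame lacks a column the other has, the gaps are filled with missing values.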
I hope you find the above code useful.
If there are other statements that you commonly use in SAS that I did not include above, please comment below and I will make sure to add them to the list. Also, feel free to check out the story on SAS Procedures in Python, which gives Python examples of the most commonly used SAS proc statements.