SAS for Analytics, Artificial Intelligence, and Data Management

Apr 2, 2023

Author: Boyi Qian

What is SAS:

SAS, short for “Statistical Analysis System”, is a statistical software suite. It is a user-friendly and reliable system designed for anyone, with or without a computing background, who needs to process and analyze data. SAS can mine, alter, manage, and retrieve data from a variety of sources and run statistical analyses on it.

The SAS software suite has more than 200 components including:

- Base SAS — Basic procedures and data management

- SAS/STAT — Statistical analysis

- SAS/GRAPH — Graphics and presentation

- SAS/INSIGHT — Data mining

- SAS CI360 — Customer Intelligence

SAS can be a helpful and efficient tool for Artificial Intelligence solutions. Today, I will briefly walk through Base SAS: how to get started and how to run a basic analysis, using a movie-streaming scenario.

Install SAS:

  • You can download and install SAS from the official website.
  • Many universities and institutions help with downloading SAS or renewing licenses, for example:
      - CMU students
      - Purdue students

Once you install the software, the user interface should look like this:

Experiment and play around with SAS:

With the right questions and analytics, we can make data do amazing things.

Importing and exploring Data:

Here I import the file ratings.csv, which holds the users' movie ratings that my team member gathered from a Kafka stream, into a table named “ratings”.
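An import step along these lines would do it. This is a minimal sketch, not necessarily the exact code I ran: the file path is a placeholder, and I assume the CSV has a header row.

```sas
/* Hypothetical path: point datafile= at your own copy of ratings.csv */
proc import datafile="/home/user/ratings.csv"
    out=work.ratings   /* SAS table to create in the temporary WORK library */
    dbms=csv           /* the source file is comma-separated */
    replace;           /* overwrite work.ratings if it already exists */
    getnames=yes;      /* take variable names from the first row */
run;
```

After this runs, the table can be referred to as work.ratings (or simply ratings) in later steps.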

A “proc” (procedure) step is one of the two basic building blocks of a SAS program; the other is the “data” step.

Running the proc contents procedure shows the metadata of the stored “ratings” table, such as its variable names, types, and lengths.
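As a sketch, assuming the table was stored as work.ratings:

```sas
/* List the variables, their types and lengths, and the number of observations */
proc contents data=work.ratings;
run;
```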

Printing Data:

SAS code reads much like SQL. Here I print 10 observations with a rating higher than 4 and format the user's ID to make it more readable.
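A print step of this kind might look like the sketch below. The variable names userid and rating are assumptions about the columns in ratings.csv, and comma12. is just one possible format for a numeric ID.

```sas
/* obs=10 limits the output to the first 10 rows that pass the where filter */
proc print data=work.ratings (obs=10);
    where rating > 4;          /* keep only ratings higher than 4 */
    format userid comma12.;    /* assumes userid is numeric; adds thousands separators */
run;
```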

Creating a new data table:

Sometimes in an AI scenario, we may want to derive several data tables from one data set. SAS can do this in a few simple lines. For example, suppose I want to add a text attribute that holds a user's comment.
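A data step is the usual way to do this. In the sketch below, the output table name, the variable name comment, and the placeholder text are all made up for illustration.

```sas
data work.ratings_commented;      /* new table; work.ratings is left unchanged */
    set work.ratings;             /* read every observation from the source table */
    length comment $ 20;          /* declare a character variable of up to 20 bytes */
    comment = "no comment yet";   /* placeholder value for every row */
run;
```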

If-else conditional statements can also be written inside a SAS data step.
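For instance, the comment text could depend on the rating. This is only a sketch: the thresholds and labels are illustrative, and rating is assumed to be a numeric variable.

```sas
data work.ratings_labeled;
    set work.ratings;
    length comment $ 10;
    /* A missing numeric value sorts below every number, so handle it first */
    if missing(rating) then comment = "unrated";
    else if rating >= 4 then comment = "liked";
    else if rating = 3 then comment = "neutral";
    else comment = "disliked";
run;
```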

Pattern matching is also simple to use in SAS. For example, suppose we want to see some information about a series of movies. With the like operator in a where clause, “%” matches any sequence of characters after the prefix “toy+story”.
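A sketch of such a filter, assuming the movie identifier is stored in a character variable called movieid (a hypothetical name):

```sas
/* Print up to 10 rows whose movie ID starts with "toy+story" */
proc print data=work.ratings (obs=10);
    where movieid like 'toy+story%';   /* % = any run of characters, _ = exactly one */
run;
```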

Advanced operation:

Suppose we want to analyze more information about each user, for example the number of movies they watched and the average rating they gave.

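The “by” statement lets SAS process the data in groups defined by the attributes we care about. The data must already be sorted (or indexed) by those attributes, for example with proc sort.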

To create an accumulating column, we need to do two things: first, set its initial value to 0; second, use retain so the value carries over each time the program data vector is reinitialized for a new observation.
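Putting by-group processing and retain together, a per-user summary could be sketched as follows. The names userid, rating, n_movies, rating_sum, and avg_rating are assumptions for illustration, and the count assumes one row per movie a user rated.

```sas
/* by-group processing needs the input sorted by the grouping variable */
proc sort data=work.ratings out=work.ratings_sorted;
    by userid;
run;

data work.user_summary (keep=userid n_movies avg_rating);
    set work.ratings_sorted;
    by userid;
    retain n_movies rating_sum 0;       /* start at 0 and keep values across rows */
    if first.userid then do;            /* first row of a user: reset the accumulators */
        n_movies = 0;
        rating_sum = 0;
    end;
    if not missing(rating) then do;     /* skip rows without a rating */
        n_movies = n_movies + 1;
        rating_sum = rating_sum + rating;
    end;
    if last.userid then do;             /* last row of a user: write one summary row */
        if n_movies > 0 then avg_rating = rating_sum / n_movies;
        output;
    end;
run;
```

The same numbers could also be produced with a summary procedure such as proc means; the data step version is shown because it makes the role of retain visible.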

Exporting Data:

As with importing, a SAS data table can be exported to csv, xlsx, or txt files, or even to pdf and ppt.
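A CSV export might look like this sketch; the output path is a placeholder.

```sas
/* Hypothetical output path: change outfile= to a location you can write to */
proc export data=work.ratings
    outfile="/home/user/ratings_out.csv"
    dbms=csv
    replace;           /* overwrite the file if it already exists */
run;
```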

Conclusion:

Based on the experiment above, we can see that SAS is a language that is easy to understand and easy to write. Processing is efficient even for large data sets: the log file shows that most of the operations finished within half a second.

Advantages:

  1. Comprehensive statistical analysis: SAS provides a wide range of statistical analysis procedures and advanced modeling techniques that enable users to perform complex analyses on large datasets.
  2. Data management: SAS offers powerful data management capabilities, including data cleansing, transformation, and integration, which make it easy to manage and prepare data for analysis.
  3. Scalability: SAS can handle large datasets and can easily scale up to meet the demands of big data applications.

Disadvantages:

  1. Cost: SAS is proprietary software and can be expensive to license, especially for small businesses and individual users.
  2. Learning curve: SAS has a steep learning curve and requires significant training and experience to use effectively.
  3. Limited visualization options: SAS has limited visualization options compared to other data analysis tools such as Tableau and Power BI.

Code can be found on GitHub:

https://github.com/BoyiQian/SAS-intro
