Published in MLearning.ai

How to Read Data Values Separated by Blanks Using the SAS infile Statement

This is a basic piece of code before things get more complicated…

I took a public health course that introduced me, as a public health researcher, to SAS (Statistical Analysis System). It was the first time that I had to write code to get a result. I was doing OK with JMP as a point-and-click statistical program, but I still wanted to know what writing code to analyze data looked like.

Although I ended up with an “A”, I only barely stayed out of the “B” zone in that class. As fun as coding was, there were still many things that I had to learn before I could use SAS the way I used JMP. So, I have decided to write about what I would have loved that course to be. Certainly, I would have loved to learn SAS through examples.

In that course, I used SAS 9.4. I won’t talk about how to download it or SAS Studio. I will jump straight to the example at hand.

To start with, you have to know that SAS can read non-SAS datasets (txt, csv, xlsx, etc.) in a number of ways. Today, we will introduce the “infile” method, and use SAS to read data values that are separated by blanks and stored in a txt file.

After you open the SAS 9.4 program, the following window pops up.

The opening window in SAS 9.4

The code is written in the lower window named “Editor”. When you run it, SAS reports notes, warnings, and errors in the upper window named “Log”. The “Explorer” window on the far left gives access to the libraries (working directories) that you create yourself and to other useful libraries that SAS creates.

So, let’s start by clearing the log window with the code below.

dm log 'clear';

Next, let’s use code to create a personal library in which I will save all my files. I will use the “libname” statement for that. Remember that a statement like this one starts with a keyword (here, libname) and always ends with a semicolon (;).

In the code below, I am naming my library after my first name, “amr”, and I am specifying the path to the folder that I am using for that library. The folder is named “medium”, and it sits inside another folder named “sasdata”, which in turn is on the E: drive of my computer. The statement should look like the code below.

libname amr 'E:\sasdata\medium';

See that “running man” icon within the red circle? That is the “run” or “submit” button. We use that to run every statement that we have written. After you have run the above two statements, go to the libraries icon in the “Explorer” window, double-click it, and you will see an icon named “Amr” among the libraries. This is the library that we have created using the “libname” statement (see below).
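The same check can also be done from code rather than from the Explorer window. The line below is a small sketch that assumes the libname statement above ran without errors; it asks SAS to write the attributes of the “amr” library (including its physical path) to the Log window.

```sas
/* Write the attributes of the amr library to the log */
libname amr list;
```

If the Log shows the path E:\sasdata\medium, the library is pointing where we intended.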

If you open the “amr” library now, you will see nothing, because SAS only shows SAS datasets there, and the “medium” folder that we specified contains only a txt file named data1. Since that file is a txt (non-SAS) file, SAS will not see it until it is “read” in a way that turns it into a SAS dataset, and that is the role of the “infile” statement.

I have created the txt file (data1), and here it is below. I have put it in the “medium” folder, but SAS will not be able to see it until I read it with the infile statement in the manner shown below. As you see, the data consist of five variables (columns) and five observations (rows). In column order, the variable names are: Race, Gender, Age, Weight, and Height. The “Race” and “Gender” variables are categorical, whereas the “Age”, “Weight” and “Height” variables are numeric.

We will start our code with the data statement, which names the SAS dataset that we want to create. Because I want to save the dataset in the “Amr” library, I write “amr.data1”: the part before the dot tells SAS which library to save it in, and the part after the dot is the dataset name.

data amr.data1;

If I wanted to create it without saving it, I would have written “data data1;”. That way it will still be created, but it will be found in another library called the “work” library. The “work” library is a temporary one which contains the files that you created, but these files will be deleted after you close SAS.
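As a small illustration of the temporary library, the sketch below creates a one-row dataset with a one-level name and then prints it through its full two-level name. The dataset name “scratch” and the value 30 are placeholders chosen only for this example; they are not part of the data1 file.

```sas
/* One-level name: SAS stores this dataset as work.scratch */
data scratch;        /* hypothetical name, used only for illustration */
    Age = 30;        /* placeholder value so the dataset has one row */
run;

/* The same dataset, referenced with its two-level name */
proc print data=work.scratch;
run;
```

After you close and reopen SAS, work.scratch will be gone, whereas anything saved in the “amr” library stays in its folder.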

Now let’s follow our code.

The infile statement will tell SAS where to find the data.

infile 'E:\sasdata\medium\data1.txt';

The input statement will tell SAS the variable names associated with the data values. As you see in the txt image above, the data are without variable names. So the variable names for the columns will be determined by the input statement.

We must make sure that the order of the variable names in the input statement matches the order of the data values.

The presence of the dollar sign ($) after (Race) and (Gender) tells SAS that these two variables are character variables. The absence of the dollar sign after the rest of the variable names indicates that the (Age), (Weight) and (Height) variables are numeric.

input Race $ Gender $ Age Weight Height;

Then we finish with the run statement.

run;

So all in all, our final piece of code will look like this:

data amr.data1;
infile 'E:\sasdata\medium\data1.txt';
input Race $ Gender $ Age Weight Height;
run;


After running this code, you will see the unnamed data values in the text file converted into a SAS dataset, with the variable names assigned in the order given in the input statement. As you can see on the left, I have already opened the “Amr” library, and a SAS dataset named “data1” is there.
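If you do not have a text file at hand, the same input statement can be tried with data typed directly into the program using the datalines statement. This is only a sketch: the dataset name “demo” is hypothetical, and the three rows are made-up values for illustration, not the contents of data1.txt. Note that with this simple style of input, values must not contain blanks, and character values are stored with a length of 8 by default, so longer values would be cut off.

```sas
/* Hypothetical dataset kept in the temporary work library */
data work.demo;
    /* Same variable order as before: $ marks character variables */
    input Race $ Gender $ Age Weight Height;
    /* Made-up rows, values separated by blanks */
    datalines;
White M 34 180 70
Black F 28 140 64
Asian F 45 130 62
;
run;

proc print data=work.demo;
run;
```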

If you want to check the type of a variable, double-click its name, and you will see the type in the resulting SAS window.
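The variable types can also be listed with code instead of clicking. The sketch below assumes that amr.data1 was created as described above; proc contents prints a table of the variables with their type (Char or Num) and length.

```sas
/* List the variables in amr.data1 with their types and lengths */
proc contents data=amr.data1;
run;
```

In that table, Race and Gender should appear as Char, and Age, Weight and Height as Num.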

Now let’s see the output table using the print procedure (proc print).

title "Listing of dataset data1";
proc print data=amr.data1;
run;


The output will be like this:
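proc print also accepts a few options that are handy once a dataset grows. The following is a sketch, again assuming amr.data1 exists: it prints only the first three observations, hides the Obs column, and shows only three of the five variables. The title text is my own choice for this example.

```sas
title "First three rows of amr.data1";
proc print data=amr.data1 (obs=3) noobs;   /* obs=3 limits the rows, noobs hides the Obs column */
    var Race Gender Age;                   /* print only these columns, in this order */
run;
title;   /* clear the title so it does not carry over to later output */
```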

Please let me know if you found this content helpful.

Amr Ebied

Physician, Healthcare provider, Aspiring Researcher, https://www.linkedin.com/in/amr-ebied-8ba61710/ https://twitter.com/AmrEbied6
