Introduction to CRSP and Computstat via WRDS: Part 2

Using Compustat to find accounts data for a few companies

Specialist Library Support
Specialist Library Support
7 min readJun 18, 2020

--

Graphic of keyboard

Note: This is part of a series of posts about using CRSP and Compustat via WRDS. The order of these posts is important. We will create and edit text files, CSV files and Excel files in some activities which we will then use in later ones. We suggest you start at the beginning then work through parts 1 to 4.

⬅️ Part 1 | ⬆️ Back to beginning | Part 3 ➡️️

Contents of Part 2

  1. Look up three companies in Compustat: Apple, Dr Pepper Snapple and Microsoft
  2. Download revenue data and save to a CSV file
  3. Carefully open the CSV file to preserve the correct CUSIP codes
  4. Prepare a new list of company codes to use in CRSP

In the remainder of this post, we will be running queries on multiple companies at a time. WRDS is designed for efficient, bulk study and there are several ways we can specify which companies to include in a query. We will see all of these approaches in the rest of this post:

  • Select each company one-by-one
  • Upload a list of many companies
  • Search entire database, except conditions

1. Look up three companies in Compustat: Apple, Dr Pepper Snapple and Microsoft

In Part 1, we saw how to:

  • Navigate to the Compustat Daily Update-Fundamentals Annual page
  • Select a range of dates and look up one company

The process for searching for three companies is very similar: in the Code Lookup window, repeat the search process as many times as you need before finally clicking the ‘Add Codes to Query’ button. You can look up the old names of companies which have since merged or ceased trading.

^ Back to contents of Part 2

2. Download revenue data and save to a CSV file

In Part 1, we saw how to:

  • Search for variables and run a query
Company lookup for Apple, Microsoft and Dr Pepper Snapple Group
Set up a query with multiple companies which were looked up individually.
Screenshots to look up more than one company in the same query then download the data for all. Click all images to enlarge them.

Activity 1

  • Using the process above, use Compustat North America to find the total revenue from 2012 to 2017 for the following three companies:
  • Apple Inc’, CUSIP 037833100
  • Microsoft Corp’, CUSIP 594918104
  • Dr Pepper Snapple Group Inc’, CUSIP 26138E109
  • Download the data to a CSV file named compustat.csv

We will use this file in the next section.

^ Back to contents of Part 2

3. Carefully open the CSV file to preserve the correct CUSIP codes

After completing Activity 1, you now have a CSV file with Compustat data. It is a text file with a comma separating each value, with headings on the first line and regular data on the remaining lines. The headings include:

  • gvkey: the GVKEY code of the company (6 digits)
  • datadate: the date the data was produced (YYYYMMDD)
  • cusip: the CUSIP code of the company (9 characters)
  • conm: the name of the company
  • revt: the total revenue for that company on that date (millions of USD)

If you click to open the file in Excel in the usual way, Excel will predict how to format the data. Unfortunately, it will make some mistakes:

Use Excel to open CSV Compustat data badly
Opening a CSV file in Notepad++ (left) shows the original text data. Opening in Excel (right) can lead to mistakes (red).

Can you spot what is wrong from the images above? (See the red highlights.)

  • Any “leading zeros” are lost (so 03783310 becomes 3783310).
  • Any codes containing an ‘E’ might be read as scientific notation numbers (so 26138E109 becomes 2.61E+133 or 2.61 × 10¹¹³).

Using the Text Import Wizard

Text Import Wizard step 1
Text Import Wizard step 2
Text Import Wizard step 3
Use the Text Import Wizard to force Excel to format specific columns as text.

We can avoid these problems by using the ‘Text Import Wizard’ to choose the format of specific columns before the original meaning of the data is lost:

  1. Launch ‘Excel’ and create a new ‘Blank workbook’.
  2. In the ‘Data’ tab, in the ‘Get External Data’ group, click on the ‘From Text’ button.
  3. Navigate to the compustat.csv file and click the ‘Import’ button.
  4. A window will open with the title ‘Text Import Wizard-Step 1 of 3’.
    The ‘Original Data Type should be ‘Delimited’.
    Tick the box ‘My data has headers’ then click the ‘Next >’ button.
  5. The title is now ‘Text Import Wizard-Step 2 of 3’.
    For the ‘Delimiters’, tick ‘Comma’ and untick all others.
    You will see columns appear in the preview, click the ‘Next >’ button.
  6. The title is now ‘Text Import Wizard-Step 3 of 3’.
    The gvkey column is selected. Change the ‘Column data format’ from ‘General’ to ‘Text’.
  7. Select the cusip column and change the ‘Column data format’ from ‘General’ to ‘Text’.
  8. Click the ‘Finish’ button and then ‘OK’ to confirm.
  9. The data has successfully imported, there should be no mistakes in the GVKEY and CUSIP columns. Save the file as compustat.xlsx (as a regular Excel workbook).
Get External Data From Text
CSV data imported to Excel with text formatted GVKEY and CUSIP columns.
CSV data imported to Excel with text formatted GVKEY and CUSIP columns.

Activity 2

  • Open the compustat.csv file you created in Activity 1 carefully in Excel. Use the Text Import Wizard as described above.
  • Save the file as compustat.xlsx (as a regular Excel workbook).

We will use this new file in the next section.

^ Back to contents of Part 2

4. Prepare a new list of company codes to use in CRSP

After completing Activity 2, you now have an Excel file with Compustat data. If we want to repeat a query with many companies, it would be time-consuming to look them up individually each time. We might also make mistakes by performing such a repetitive task. A better approach is to use the Excel file we already have to prepare a list of the relevant company codes, then use this list in future queries.

WRDS will accept a list of company codes in the form of a plain text file, with one company per line, without duplicate entries. The type of company code accepted (CUSIP, GVKEY or other) depends on which database we are using. (We will go into details about which databases accept which code types later.)

How to prepare a list of unique codes

There are several ways to do this, we shall show you one. This way should work well if you have a large document.

Paste the codes to a new workbook (left) then remove the duplicates.

Steps to prepare a list of unique codes:

  1. Open the source file in Excel, in this case, compustat.xlsx.
  2. Open a new blank workbook as well.
  3. Choose the column of codes you wish to use. For us, this is the CUSIP codes in column H.
  4. We want to select all the codes but not the header row. Click in the first first cell with data (H2) then hold Shift + Control + Down arrow. (On a Mac, replace the Control key with the Command key.) You will see a green selection box around the data, all the cells have a grey highlight except the first one.
  5. Copy the selection and paste it into the new blank workbook.
  6. Next we will remove all duplicate codes. In the new workbook, click on the ‘Data’ tab, go to the ‘Data Tools’ group and click on ‘Remove Duplicates’.
  7. The ‘Remove Duplicates’ window appears. Make sure that the ‘My data has headers’ box remains unticked and click the ‘OK’ button. You will be informed how many values were removed (15) and how many unique values remain (3). Press ‘OK’.
  8. Save the workbook to the same folder as before but make sure to choose ‘Save as type’ as ‘Text (Tab delimited) (*.txt)’ and filename cusips.txt. Since there is only one column, we will not see any tabs or commas delimiting the columns. When you are asked to confirm saving in this format, choose ‘Yes’.
Save As text tab delimited
Save warning, click Yes
cusips in Notepad
Save the new list as ‘Text (Tab delimited) (*.txt)’ to get a plain text file.

Activity 3

  • Open the compustat.xlsx file you created in Activity 2 in Excel. Use it to prepare a unique list of CUSIP codes as described above.
  • It is good practice to also sort the list at this point. There is a ‘Sort’ button in the ‘Data’ of Excel.
  • Save the file as cusips.txt (as a plain text file).

You now have a list of companies codes. We will use this new file in Part 3.

^ Back to contents of Part 2

Unlisted

--

--