Introduction to CRSP and Computstat via WRDS: Part 2
Using Compustat to find accounts data for a few companies
--
Note: This is part of a series of posts about using CRSP and Compustat via WRDS. The order of these posts is important. We will create and edit text files, CSV files and Excel files in some activities which we will then use in later ones. We suggest you start at the beginning then work through parts 1 to 4.
⬅️ Part 1 | ⬆️ Back to beginning | Part 3 ➡️️
Contents of Part 2
- Look up three companies in Compustat: Apple, Dr Pepper Snapple and Microsoft
- Download revenue data and save to a CSV file
- Carefully open the CSV file to preserve the correct CUSIP codes
- Prepare a new list of company codes to use in CRSP
In the remainder of this post, we will be running queries on multiple companies at a time. WRDS is designed for efficient, bulk study and there are several ways we can specify which companies to include in a query. We will see all of these approaches in the rest of this post:
- Select each company one-by-one
- Upload a list of many companies
- Search entire database, except conditions
1. Look up three companies in Compustat: Apple, Dr Pepper Snapple and Microsoft
In Part 1, we saw how to:
- Navigate to the Compustat Daily Update-Fundamentals Annual page
- Select a range of dates and look up one company
The process for searching for three companies is very similar: in the Code Lookup window, repeat the search process as many times as you need before finally clicking the ‘Add Codes to Query’ button. You can look up the old names of companies which have since merged or ceased trading.
2. Download revenue data and save to a CSV file
In Part 1, we saw how to:
- Search for variables and run a query
Activity 1
- Using the process above, use Compustat North America to find the total revenue from 2012 to 2017 for the following three companies:
- ‘Apple Inc’, CUSIP 037833100
- ‘Microsoft Corp’, CUSIP 594918104
- ‘Dr Pepper Snapple Group Inc’, CUSIP 26138E109
- Download the data to a CSV file named
compustat.csv
We will use this file in the next section.
3. Carefully open the CSV file to preserve the correct CUSIP codes
After completing Activity 1, you now have a CSV file with Compustat data. It is a text file with a comma separating each value, with headings on the first line and regular data on the remaining lines. The headings include:
gvkey
: the GVKEY code of the company (6 digits)datadate
: the date the data was produced (YYYYMMDD)cusip
: the CUSIP code of the company (9 characters)conm
: the name of the companyrevt
: the total revenue for that company on that date (millions of USD)
If you click to open the file in Excel in the usual way, Excel will predict how to format the data. Unfortunately, it will make some mistakes:
Can you spot what is wrong from the images above? (See the red highlights.)
- Any “leading zeros” are lost (so 03783310 becomes 3783310).
- Any codes containing an ‘E’ might be read as scientific notation numbers (so 26138E109 becomes 2.61E+133 or 2.61 × 10¹¹³).
Using the Text Import Wizard
We can avoid these problems by using the ‘Text Import Wizard’ to choose the format of specific columns before the original meaning of the data is lost:
- Launch ‘Excel’ and create a new ‘Blank workbook’.
- In the ‘Data’ tab, in the ‘Get External Data’ group, click on the ‘From Text’ button.
- Navigate to the
compustat.csv
file and click the ‘Import’ button. - A window will open with the title ‘Text Import Wizard-Step 1 of 3’.
The ‘Original Data Type’ should be ‘Delimited’.
Tick the box ‘My data has headers’ then click the ‘Next >’ button. - The title is now ‘Text Import Wizard-Step 2 of 3’.
For the ‘Delimiters’, tick ‘Comma’ and untick all others.
You will see columns appear in the preview, click the ‘Next >’ button. - The title is now ‘Text Import Wizard-Step 3 of 3’.
Thegvkey
column is selected. Change the ‘Column data format’ from ‘General’ to ‘Text’. - Select the
cusip
column and change the ‘Column data format’ from ‘General’ to ‘Text’. - Click the ‘Finish’ button and then ‘OK’ to confirm.
- The data has successfully imported, there should be no mistakes in the GVKEY and CUSIP columns. Save the file as
compustat.xlsx
(as a regular Excel workbook).
Activity 2
- Open the
compustat.csv
file you created in Activity 1 carefully in Excel. Use the Text Import Wizard as described above. - Save the file as
compustat.xlsx
(as a regular Excel workbook).
We will use this new file in the next section.
4. Prepare a new list of company codes to use in CRSP
After completing Activity 2, you now have an Excel file with Compustat data. If we want to repeat a query with many companies, it would be time-consuming to look them up individually each time. We might also make mistakes by performing such a repetitive task. A better approach is to use the Excel file we already have to prepare a list of the relevant company codes, then use this list in future queries.
WRDS will accept a list of company codes in the form of a plain text file, with one company per line, without duplicate entries. The type of company code accepted (CUSIP, GVKEY or other) depends on which database we are using. (We will go into details about which databases accept which code types later.)
How to prepare a list of unique codes
There are several ways to do this, we shall show you one. This way should work well if you have a large document.
Steps to prepare a list of unique codes:
- Open the source file in Excel, in this case,
compustat.xlsx
. - Open a new blank workbook as well.
- Choose the column of codes you wish to use. For us, this is the CUSIP codes in column H.
- We want to select all the codes but not the header row. Click in the first first cell with data (H2) then hold Shift + Control + Down arrow. (On a Mac, replace the Control key with the Command key.) You will see a green selection box around the data, all the cells have a grey highlight except the first one.
- Copy the selection and paste it into the new blank workbook.
- Next we will remove all duplicate codes. In the new workbook, click on the ‘Data’ tab, go to the ‘Data Tools’ group and click on ‘Remove Duplicates’.
- The ‘Remove Duplicates’ window appears. Make sure that the ‘My data has headers’ box remains unticked and click the ‘OK’ button. You will be informed how many values were removed (15) and how many unique values remain (3). Press ‘OK’.
- Save the workbook to the same folder as before but make sure to choose ‘Save as type’ as ‘Text (Tab delimited) (*.txt)’ and filename
cusips.txt
. Since there is only one column, we will not see any tabs or commas delimiting the columns. When you are asked to confirm saving in this format, choose ‘Yes’.
Activity 3
- Open the
compustat.xlsx
file you created in Activity 2 in Excel. Use it to prepare a unique list of CUSIP codes as described above. - It is good practice to also sort the list at this point. There is a ‘Sort’ button in the ‘Data’ of Excel.
- Save the file as
cusips.txt
(as a plain text file).
You now have a list of companies codes. We will use this new file in Part 3.