Building the S&P 500 Constituents

The economic landscape is constantly changing. Companies like Google rise from start-ups to become members of the S&P 500. Other companies, like Kodak and Polaroid, once had large market capitalizations. Polaroid went bankrupt in 2001, followed by a Kodak bankruptcy in 2012.

The constituents of the S&P 500 change as well. When building models from collections of stocks, like the S&P 500, one concern is "survivor bias". Survivor bias arises when a company is omitted from the historical data because it later left the index (or went out of business). If you were actually investing at the time, you would not have known the fate of the company, as you do in hindsight. This means that when using historical data it is important to use the S&P 500 as it was at the time, not as it is today.

Wharton Research Data Services (WRDS) was used to obtain the CRSP/Compustat data for the S&P 500 constituents.

A number of stocks that were once in the S&P 500 no longer exist, and their tickers are inactive or have been reused by other companies. Compustat uses special symbols for these stocks. For example, in the table below the ticker for Dresser Industries is given as "DI." and the ticker for Chrysler is given as "C.3". Currently the ticker C belongs to Citigroup. Some stocks, like Energy Future Holdings Corp, which was part of the S&P 500 from 1964 until 2007, have no ticker associated with them but have a code (e.g., 0033A). These stocks can be looked up via the Compustat GVKEY value (the CUSIP does not always seem to work).

The Compustat database includes the constituents of a number of indices, most notably the S&P 500 Composite index.

Compustat S&P Indices
S&P 500 Comp-Ltd
S&P 500/Barra Growth Index
S&P 500/Barra Value Index
SP500 Energy .S
SP500 Materials .S
SP500 Industrials .S
SP500 Consumr Discretion .S
SP500 Consumer Staples .S
SP500 Health Care .S
SP500 Financials .S
SP500 Information Tech .S
SP500 Telecom Services .S
SP500 Utilities .S
SP500 Energy .G
SP500 Materials .G
SP500 Capital Goods .G
SP500 CMMRCL&PRFSSNL SVC.G
SP500 Transportation .G
SP500 Auto & Components .G
SP500 Cnsmr Durbl&Apprel .G

S&P 500 Index

The S&P 500 data starts in 1964. Each stock has a from date and a thru date. If the thru date is blank, the stock was still in the index as of the last Compustat update. An example is Abbott Laboratories. If a thru date is given, the stock left the index on that date. An example is Dresser Industries, which left the index on 09/30/1998.

The index for a given date is built by reading the index file from the start and building a list of the stocks that were in the index on that date. A stock is a candidate only if its from date is on or before the date the index is being constructed for. Candidates with a blank thru date are in the index. Any candidate with a thru date after the construction date is added. Any stock with a thru date before the construction date is not added.

If this algorithm is followed there will always be 500 stocks in the index for any date boundary.
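The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R script referenced on this page: the function names `parse_date` and `constituents_on` are hypothetical, the rows are a handful copied from the sample constituent data on this page, and a stock whose thru date equals the target date is treated as still being a member (the text does not say which way that boundary falls).

```python
from datetime import date, datetime

# (GVKEY, from, thru, company name, ticker); a blank thru means "still in the index".
# Rows are copied from the sample constituent data on this page.
ROWS = [
    ("1078", "03/31/1964", "", "ABBOTT LABORATORIES", "ABT"),
    ("2159", "03/31/1964", "06/30/1998", "BENEFICIAL CORP", "BNL.1"),
    ("4073", "03/31/1964", "09/30/1998", "DRESSER INDUSTRIES INC", "DI."),
    ("3022", "03/31/1964", "11/12/1998", "CHRYSLER CORP", "C.3"),
]


def parse_date(text):
    """Parse an MM/DD/YYYY string; a blank string yields None."""
    text = text.strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def constituents_on(rows, as_of):
    """Return the tickers of the stocks that were index members on `as_of`."""
    members = []
    for gvkey, start, end, name, ticker in rows:
        start_date = parse_date(start)
        end_date = parse_date(end)
        if start_date > as_of:
            continue  # had not joined the index yet
        # Assumption: a thru date equal to as_of still counts as membership.
        if end_date is None or end_date >= as_of:
            members.append(ticker)
    return members


print(constituents_on(ROWS, date(1998, 7, 1)))
print(constituents_on(ROWS, date(1999, 1, 1)))
```

With the full constituent file in place of `ROWS`, the same function would give the point-in-time membership for any date, which is what avoids survivor bias.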

Example of Compustat S&P 500 Constituent Data
GVKEY | GVKEYX | from       | thru       | TIC   | co_conm                     | co_tic
1078  | 3      | 03/31/1964 |            | I0003 | ABBOTT LABORATORIES         | ABT
1300  | 3      | 03/31/1964 |            | I0003 | HONEYWELL INTERNATIONAL INC | HON
1356  | 3      | 03/31/1964 |            | I0003 | ALCOA INC                   | AA
1408  | 3      | 03/31/1964 |            | I0003 | BEAM INC                    | BEAM
2159  | 3      | 03/31/1964 | 06/30/1998 | I0003 | BENEFICIAL CORP             | BNL.1
5968  | 3      | 03/31/1964 | 06/30/1998 | I0003 | RYERSON HOLDING CORP        | RYI
4073  | 3      | 03/31/1964 | 09/30/1998 | I0003 | DRESSER INDUSTRIES INC      | DI.
5087  | 3      | 03/31/1964 | 10/06/1998 | I0003 | SPX CORP                    | SPW
3022  | 3      | 03/31/1964 | 11/12/1998 | I0003 | CHRYSLER CORP               | C.3
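Since tickers can be retired or reused, it can help to key the constituent records by GVKEY when loading them. The following Python sketch shows one way to do that. It assumes the data has been exported as a comma-separated file whose column names match the table header above; that file layout, the `SAMPLE_CSV` text and the function name `load_constituents` are assumptions made for illustration, and this is not the R script referenced on this page.

```python
import csv
import io

# Hypothetical CSV export; the column names follow the table header above
# and the three rows are copied from that table.
SAMPLE_CSV = """GVKEY,GVKEYX,from,thru,TIC,co_conm,co_tic
1078,3,03/31/1964,,I0003,ABBOTT LABORATORIES,ABT
4073,3,03/31/1964,09/30/1998,I0003,DRESSER INDUSTRIES INC,DI.
3022,3,03/31/1964,11/12/1998,I0003,CHRYSLER CORP,C.3
"""


def load_constituents(text):
    """Group the CSV rows by GVKEY.

    Each GVKEY maps to a list of rows, so a file that lists the same
    company more than once would keep every entry.
    """
    by_gvkey = {}
    for row in csv.DictReader(io.StringIO(text)):
        by_gvkey.setdefault(row["GVKEY"], []).append(row)
    return by_gvkey


by_gvkey = load_constituents(SAMPLE_CSV)

# Chrysler is found through its GVKEY even though its ticker is stored as "C.3".
chrysler = by_gvkey["3022"][0]
print(chrysler["co_conm"], chrysler["co_tic"], chrysler["thru"])
```

A blank thru field comes back as an empty string, so it can be tested directly when deciding whether a stock is still in the index.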

R Code

An R script to process the S&P 500 constituents: s_and_p.r

Ian Kaplan
November 2013
Revised:
