Reading 13F SEC filings with Python

Last Updated on January 30, 2018

Update: this project has since been updated; see this article for the most recent changes.

The SEC Form 13F is a filing with the Securities and Exchange Commission (SEC) also known as the Information Required of Institutional Investment Managers Form. It is a quarterly filing required of institutional investment managers with over $100 million in qualifying assets. -Investopedia

In this article I will show how to collect and parse 13F filing data from the SEC.

First, use EDGAR to search for the company of interest.

EDGAR, the Electronic Data Gathering, Analysis, and Retrieval system, performs automated collection, validation, indexing, acceptance, and forwarding of submissions by companies and others who are required by law to file forms with the U.S. Securities and Exchange Commission (the "SEC"). -Wikipedia

Click on the Central Index Key (CIK) of the company you are searching for, and then click on Documents.

You'll want to grab the HTML version of the Information Table. I have saved them in a folder, with each file named for its filing date (YYYY-MM-DD format).

For this example, I have manually collected the files for a few years of data filed by a hedge fund. Here are the files I'll be working with:

import os

files = os.listdir("13f/")
print(*sorted(files), sep="\n")
2014-02-14.html
2014-05-15.html
2014-08-14.html
2014-11-14.html
2015-02-17.html
2015-05-14.html
2015-08-14.html
2015-11-12.html
2016-02-16.html
2016-05-16.html
2016-08-12.html
2016-11-14.html
2017-02-14.html
2017-05-15.html
2017-08-10.html
2017-10-30.html
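Because the filenames use ISO YYYY-MM-DD dates, a lexicographic sort is also a chronological sort, and the dates parse directly with the standard library. A quick sketch using two of the filenames above:

```python
from datetime import datetime

# Filenames follow the YYYY-MM-DD.html convention used above
filenames = sorted(["2017-10-30.html", "2014-02-14.html"])

# Strip the extension and parse each date
dates = [datetime.strptime(f[:-len(".html")], "%Y-%m-%d").date() for f in filenames]
print(dates[0])  # 2014-02-14
```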

Here's a quick script we can use to parse information from each filing document:

from bs4 import BeautifulSoup
import pandas as pd

def scrape_13f(file):
    html = open("13f/" + file).read()
    soup = BeautifulSoup(html, 'lxml')
    # Skip the header rows of the information table
    rows = soup.find_all('tr')[11:]
    positions = []
    for row in rows:
        dic = {}
        position = row.find_all('td')
        dic["NAME_OF_ISSUER"] = position[0].text
        dic["TITLE_OF_CLASS"] = position[1].text
        dic["CUSIP"] = position[2].text
        # Values are reported in thousands of dollars
        dic["VALUE"] = int(position[3].text.replace(',', '')) * 1000
        dic["SHARES"] = int(position[4].text.replace(',', ''))
        # Note: str.strip(".html") strips characters, not a suffix, so slice instead
        dic["DATE"] = file[:-len(".html")]
        positions.append(dic)

    return pd.DataFrame(positions)

Using this function we can get a quick snapshot of this hedge fund's total reported value, filing by filing, over the last four years:

import matplotlib.pyplot as plt

fund_growth = [sum(scrape_13f(file).VALUE) for file in sorted(files)]
dates = [f[:-len(".html")] for f in sorted(files)]
plt.figure(figsize=(10, 5))
plt.title('Total Fund Size')
plt.xlabel('Filing Date')
plt.ylabel('USD')
plt.bar(dates, fund_growth)
plt.xticks(rotation='vertical')

(Bar chart: Total Fund Size by filing date)

Fund Positions with Bubble Chart

Next, it would be great to get a snapshot of the stocks owned by this fund in a given year. Let's use a D3 bubble chart. The names for each stock are quite long, so first let's convert them to stock ticker values. Here's a quick script I hacked together using a Fidelity lookup service:

import time
import requests

cusip_nums = set()
for file in files:
    cusip_nums = cusip_nums | set(scrape_13f(file).CUSIP)

ticker_dic = {c: "" for c in cusip_nums}
for c in ticker_dic:
    url = ("http://quotes.fidelity.com/mmnet/SymLookup.phtml?"
           "reqforlookup=REQUESTFORLOOKUP&productid=mmnet&isLoggedIn=mmnet"
           "&rows=50&for=stock&by=cusip&criteria=" + c + "&submit=Search")
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'lxml')
    ticker_elem = soup.find('tr', attrs={"bgcolor": "#666666"})
    try:
        ticker_dic[c] = ticker_elem.next_sibling.next_sibling.find('a').text
    except AttributeError:
        # No result for this CUSIP; leave the ticker blank
        pass

    time.sleep(1)  # be polite to the lookup service
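One way to sanity-check scraped CUSIPs before sending them to a lookup service is the CUSIP check digit: the ninth character is computed from the first eight with a modulus-10 "double-add-double" scheme. Here's a small validator (not part of the original workflow, just a sketch of the standard algorithm):

```python
def cusip_check_digit(cusip: str) -> int:
    """Compute the check digit implied by the first 8 characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(cusip[:8]):
        if ch.isdigit():
            v = int(ch)
        elif ch.isalpha():
            v = ord(ch.upper()) - ord('A') + 10  # A=10 ... Z=35
        else:
            v = {'*': 36, '@': 37, '#': 38}[ch]
        if i % 2 == 1:  # double every second character
            v *= 2
        total += v // 10 + v % 10  # sum the digits of v
    return (10 - total % 10) % 10

def is_valid_cusip(cusip: str) -> bool:
    return len(cusip) == 9 and cusip_check_digit(cusip) == int(cusip[8])

print(is_valid_cusip("88579Y101"))  # True (3M CO)
```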

I couldn't get all the CUSIP numbers, but I was able to get most of them. Some of the CUSIP numbers have changed for certain stocks and couldn't be looked up with this service; for now I won't fill these in. With the resulting dictionary (hardcoded below as ticker_dict), we can make a quick edit to our scrape_13f function to populate ticker data for each holding:

ticker_dict = {'00206R102': 'T', '00507V109': 'ATVI', '00724F101': 'ADBE', ... }

def scrape_13f(file):
    html = open("13f/" + file).read()
    soup = BeautifulSoup(html, 'lxml')
    # Skip the header rows of the information table
    rows = soup.find_all('tr')[11:]
    positions = []
    for row in rows:
        dic = {}
        position = row.find_all('td')
        dic["NAME_OF_ISSUER"] = position[0].text
        dic["TITLE_OF_CLASS"] = position[1].text
        dic["CUSIP"] = position[2].text
        # Values are reported in thousands of dollars
        dic["VALUE"] = int(position[3].text.replace(',', '')) * 1000
        dic["SHARES"] = int(position[4].text.replace(',', ''))
        dic["DATE"] = file[:-len(".html")]
        # Fall back to an empty string for CUSIPs we couldn't resolve
        dic["TICKER"] = ticker_dict.get(dic["CUSIP"], "")
        positions.append(dic)

    return pd.DataFrame(positions)

Let's check this:

df = scrape_13f(files[2])
print(df[["CUSIP", "NAME_OF_ISSUER", "TICKER"]].head())
       CUSIP         NAME_OF_ISSUER TICKER
0  88579Y101                  3M CO    MMM
1  G1151C101  ACCENTURE PLC IRELAND    ACN
2  02209S103       ALTRIA GROUP INC     MO
3  03076C106    AMERIPRISE FINL INC    AMP
4  035710409    ANNALY CAP MGMT INC    NLY
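Since every filing comes back as a DataFrame in the same shape, pandas can also line positions up across quarters. A toy sketch with made-up numbers (not actual holdings from these filings):

```python
import pandas as pd

# Made-up holdings in the same shape scrape_13f returns
df = pd.DataFrame([
    {"TICKER": "MMM", "VALUE": 1_000_000, "DATE": "2017-08-10"},
    {"TICKER": "MO",  "VALUE": 800_000,   "DATE": "2017-08-10"},
    {"TICKER": "MMM", "VALUE": 1_200_000, "DATE": "2017-10-30"},
    {"TICKER": "MO",  "VALUE": 750_000,   "DATE": "2017-10-30"},
])

# One row per filing date, one column per ticker
by_date = df.pivot(index="DATE", columns="TICKER", values="VALUE")

# Change in position value between the two filings
change = by_date.diff().iloc[-1]
```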

Let's take a look at the most recent filing (filed 2017-10-30).

latest = sorted(files)[-1]
df_latest = scrape_13f(latest)

top_40 = df_latest.sort_values(by="VALUE", ascending=False)[["TICKER", "VALUE"]][:40]
tickers = top_40.TICKER
values = top_40.VALUE
x = range(len(values))

fig = plt.figure(figsize=(15, 5))
ax = fig.add_subplot(111)
ax.bar(x, values)

plt.xticks(x, tickers, rotation=90)
plt.title('Top 40 Stock Holdings by Value')
plt.xlabel('Stock Ticker')
plt.ylabel('USD')
plt.show()

(Bar chart: Top 40 Stock Holdings by Value)


© 2021 Brian Caffey