What is a stock split and how to adjust it with Python?

Petr Tmej
Jul 15, 2022
5 min read

WHAT IS STOCK SPLIT?

It is normal in financial markets that companies split their stocks (or merges – reverse stock splits). A stock split is simply dividing one stock into more without decreasing the company’s market capitalization. It usually happens when stocks are growing for a long time, and prices make particular stock not affordable for smaller investors. Usually, stock splits are positive in the longer term. A reverse stock split is when a company wants to decrease the number of shares.

EXAMPLE OF STOCK SPLITTING ON APPLE stock

A simple example of a stock split was Apple [AAPL] stock split Aug 31, 2020, of 4-for-1, which means if you owned 10 AAPL stocks before Aug 31, 2020, after you had 40 stocks. When capitalization is not decreased, the nominal price for one stock has to reduce. Dealing with splits is very important when backtesting strategies. A 75% drop in price was not the drop, but the split – this can confuse your strategy. You also know the adjusted close prices, which are adjusted mostly for stock splits. But it is not enough; a lot of time, your algorithm also uses open, high, low, and volume. Also, when adjusting stocks and using dividends in your backtest, you have to adjust dividends.

Adjusting is very easy. Let’s look at an AAPL example. We download adjusted data from Alpha Vantage (they consist of stock splits and dividends) and look at historical AAPL splits.

import pandas as pd
from alpha_vantage.timeseries import TimeSeries
ts = TimeSeries(key='YourAPIkey', #claim on alphavantage.co
                output_format='pandas',
                indexing_type='date')

daily = ts.get_daily_adjusted('AAPL', 'full')[0].sort_index()
daily.columns = [x[3:] for x in daily.columns]

daily[daily['split coefficient'] != 1]

open high low close adjusted close volume dividend amount split coefficient date 2000-06-21 50.50 56.94 50.31 55.63 0.8583 4375000.0 0.0 2.0 2005-02-28 44.68 45.14 43.96 44.86 1.3843 11635900.0 0.0 2.0 2014-06-09 92.70 93.88 91.75 93.70 21.1767 75414997.0 0.0 7.0 2020-08-31 127.58 131.00 126.00 129.04 129.0400 223505733.0 0.0 4.0

In the table, we can see 4 stock splits for AAPL data from the year 2000. In this case, OHLCV is not adjusted, and adjusted close has different values than close. The split coefficient is also visible and usable. The mathematical calculation is straightforward, look at prices around Aug 31, 2020.

daily[daily.index>='2020-08-28'][:3]

open high low close adjusted close volume dividend amount split coefficient date 2020-08-28 504.05 505.77 498.31 499.23 124.8075 46907479.0 0.0 1.0 2020-08-31 127.58 131.00 126.00 129.04 129.0400 223505733.0 0.0 4.0 2020-09-01 132.76 134.80 130.53 134.18 134.1800 152470142.0 0.0 1.0

We want to adjust the prices and also the volume. When we have a 4-for-1 split, each stock is multiplicated by 4, so the volume of previous days should be just multiplied by 4. Aug 28, the close price was 499.23 for 1 stock. When adjusting, we have to divide the price by the split coefficient, 499.23/4 = 124.8075 (adjusted close). It’s very simple and fast in pandas. Notice that when there are more splits in history, we have to account for them. We divide by 4 until we come to another split in history, then we have to account for it. So the prices before Jun 9, 2014, should be divided by 7 and 4, so by 28.

Similarly, when we have a dividend, which is, in this case, the dollar amount of stock price. When we adjusted stock prices by 4-for-1, the price was divided by 4, so to not change the dividend percentage, we adjust dividends the same as other prices.

def calculate_adjusted(df, dividends=False):
    # we will go from today to the past
    new = df.sort_index(ascending=False)

    split_coef = new['split coefficient'].shift(1
        ).fillna(1).cumprod()

    for col in ['open', 'high', 'low', 'close']:
    	  new['adj_' + col] = new[col] / split_coef
    new['adj_volume'] = split_coef * new['volume']

    if dividends:
        new['adj_dividends'] = new['dividend amount'] / split_coef

    return new.sort_index(ascending=True)

We sort values descending (according to DatetimeIndex), and the split coefficient is multiplicated through the history. We shift the value for 1 day into the history (+1 because we have descending dates) and make cumulative products to consider other splits. Then we divide the prices by that value but multiplicate the volume.

Let’s download minute data and adjust them. Minute data do not have dividends, so we will not adjust dividends here (function in default doesn’t adjust dividends). A simple query to download minute data in csv format (also for historical minute data, see the Alpha Vantage documentation). You can notice that there is slice=year{}month{} in the query, which we format as year1month1 to get the most recent data. Alpha Vantage gives maximum one month data per one query and up to 2 years of history. So the previous month would be year1month2, this month in the previous year would be year2month1, and so on. Note that depending on the date you are downloading the data, the event could be older than one month. You can check if the prices are adjusted correctly by downloading adjusted prices by setting adjusted=true.

download_file = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&function=TIME_SERIES_INTRADAY_EXTENDED&symbol={}&interval=1min&slice=year{}month{}&adjusted=false&apikey=YourAPIkey' # change to your API key

df = pd.read_csv(download_file.format('AAPL', 1, 1))
# note that time of making this query was after the split, Sep 2020
df = df.set_index('time')
df = df.sort_index()
df[df.index < '2020-08-31 04:03:00'].tail()

We can see the split in the table when the price decreased from 501.98 USD to 128 USD, which was pre-market open on Aug 31. But our minute data don’t have a split coefficient, so we merge it with the coefficient (which is on daily data, so it is at time 00:00:00).

df = df.merge(daily[['split coefficient']], left_index=True,
              right_index=True, how='outer')
# fill split NaN values with 1 (because of cumulative product)
df['split coefficient'] = df['split coefficient'].fillna(1)
df[df.index < '2020-08-31 04:03:00'].tail()

open high low close volume split coefficient 2020-08-28 19:59:00 501.73 501.73 501.65 501.65 1683.0 1.0 2020-08-28 20:00:00 501.80 502.00 501.71 501.98 8239.0 1.0 2020-08-31 00:00:00 NaN NaN NaN NaN NaN 4.0 2020-08-31 04:01:00 128.00 137.29 125.60 126.00 61664.0 1.0 2020-08-31 04:02:00 126.92 127.00 126.00 126.50 16918.0 1.0

Apply our function and have a look at the result.

dfa = calculate_adjusted(df)
dfa[dfa.index < '2020-08-31 04:03:00'].tail()

open high low close volume split coefficient adj_open adj_high adj_low adj_close adj_volume 2020-08-28 19:59:00 501.73 501.73 501.65 501.65 1683.0 1.0 125.4325 125.4325 125.4125 125.4125 6732.0 2020-08-28 20:00:00 501.80 502.00 501.71 501.98 8239.0 1.0 125.4500 125.5000 125.4275 125.4950 32956.0 2020-08-31 00:00:00 NaN NaN NaN NaN NaN 4.0 NaN NaN NaN NaN NaN 2020-08-31 04:01:00 128.00 137.29 125.60 126.00 61664.0 1.0 128.0000 137.2900 125.6000 126.0000 61664.0 2020-08-31 04:02:00 126.92 127.00 126.00 126.50 16918.0 1.0 126.9200 127.0000 126.0000 126.5000 16918.0

Adjusted prices are prepared, and you can fully use them in your backtests. We used only built-in pandas functions, so it calculates reasonably fast for a more extended history of one-minute data or even one-second data. Do your price adjusting is very important if you store the data in the original form; most brokers also provide original data without splitting dates and coefficients.