In this project, I will be using deep learning to predict future stock market returns and use those predictions to construct a portfolio of the top-performing stocks that automatically rebalances on a weekly basis.
Data
All the historical stock market data will be collected using the Yahoo Finance API (yfinance). However, since there is no market screener API to automatically get key market analytics, I download a CSV file from the Nasdaq website containing the largest publicly traded companies and, from those, select the largest 30 to use in the model. Moreover, since I am interested in asset performance, I will convert prices to weekly returns. A weekly interval is preferred to a daily interval, as it better captures the general trend of the market and can be more reliable when making predictions.
import pandas as pd

# Read the CSV file and keep only the largest 30 companies
df = pd.read_csv('nasdaq_screener_1613878479181.csv')
stocks = df[~(df['IPO Year'] > 2020)].dropna().sort_values(by='Market Cap')['Symbol'].iloc[-30:].tolist()
stocks.sort()
import yfinance as yf

# Each element of the list is a dataframe with the returns of one ticker,
# computed from its Adjusted Close price.
securities_by_date = []
for i in stocks:
    df = yf.download(i, start='2008-01-01', end=None, interval='1wk')[['Adj Close']].dropna().pct_change(periods=5).dropna().rename(columns={'Adj Close': i})
    securities_by_date.append(df)
Model
It is very important to scale features before training a neural network, so we must normalize the data sets. To achieve that, I split the data into training (70%) and testing (30%) sets, then subtract the mean from both and divide by the standard deviation. The mean and standard deviation must come only from the training set, so that the model has no access to information from the test set.
class Momentum_Model:
    def __init__(self, df, window, future):
        self.df, self.window, self.future = df, window, future

        # Split the data into train and test sets
        train_df = self.df[0:int(len(self.df) * 0.7)]
        test_df = self.df[int(len(self.df) * 0.7):]

        # Get mean and std for the columns of the dataframe
        train_mean = train_df.mean()
        train_std = train_df.std()

        # Normalize both sets using the mean and std of the train set
        train_df = (train_df - train_mean) / train_std
        test_df = (test_df - train_mean) / train_std

        # Convert the normalized data to numpy arrays
        train_df, test_df = train_df.to_numpy(), test_df.to_numpy()

        self.train_df, self.test_df, self.train_mean, self.train_std = train_df, test_df, train_mean, train_std
After normalizing the data, the next step is to reshape it to fit our model. The LSTM input layer expects three dimensions (samples, timesteps, features). There is only one feature (the weekly stock return) and the number of samples is the length of the data set. Moreover, for each sample we use the 5 previous observations to predict the 6th, which means that X_train and X_test will be three-dimensional np.arrays with shapes (len(X_train), 5, 1) and (len(X_test), 5, 1), while y_train and y_test will have shapes (len(y_train), 1) and (len(y_test), 1).
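As a sketch of this reshaping step (the helper name make_windows is my own, not from the project code), the windows can be built with plain NumPy:

```python
import numpy as np

def make_windows(series, window=5):
    # Slice a 1-D series of normalized returns into overlapping windows:
    # each sample holds `window` consecutive observations (X) and the
    # observation that immediately follows them (y).
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    X = np.array(X).reshape(-1, window, 1)   # (samples, timesteps, features)
    y = np.array(y).reshape(-1, 1)           # (samples, 1)
    return X, y

returns = np.arange(20, dtype=float)         # toy stand-in for normalized weekly returns
X, y = make_windows(returns, window=5)
print(X.shape, y.shape)                      # (15, 5, 1) (15, 1)
```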
I will be using a stacked LSTM model comprised of one input layer, three LSTM layers and an output layer (a dense layer with a single output). The first LSTM layer has 50 units and the second and third have 30 units each. For both the first and the second layers, return_sequences=True, because we want the full sequence of hidden states from each layer to be passed on to the next LSTM layer. I also apply a 20% dropout rate to regularize the model and reduce overfitting.
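A minimal sketch of this architecture in Keras (the exact placement of dropout is my assumption; here it is applied inside each LSTM layer):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(window=5, n_features=1):
    # Stacked LSTM: 50 -> 30 -> 30 units. The first two layers return
    # full sequences so the following LSTM layer receives one hidden
    # state per timestep; the final dense layer emits a single value,
    # the predicted (normalized) return for the next week.
    model = tf.keras.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.LSTM(50, return_sequences=True, dropout=0.2),
        layers.LSTM(30, return_sequences=True, dropout=0.2),
        layers.LSTM(30, dropout=0.2),
        layers.Dense(1),
    ])
    return model

model = build_model()
print(model.output_shape)   # (None, 1)
```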
Moreover, I will be using the Adam algorithm as the optimizer, with a learning rate of 0.001 and a decay of 1e-6. Once training is complete and we have predicted the values of the test set, we must rescale the predictions by multiplying by the standard deviation and adding the mean of the train set. We do this because the trading strategy picks stocks based on relative performance; if we do not rescale, we cannot compare predictions across stocks, since each stock has a different mean and standard deviation.
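Undoing the normalization is a one-line inverse of the z-score transform; a small sketch with NumPy (the function name and toy statistics are mine):

```python
import numpy as np

def rescale(preds, train_mean, train_std):
    # The model predicts in standardized units, so invert the
    # normalization: multiply by the train std and add the train mean
    # to recover predictions on the original return scale.
    return preds * train_std + train_mean

train_mean, train_std = 0.004, 0.05          # toy stats from a hypothetical train set
normalized_preds = np.array([[1.0], [-0.5]])
print(rescale(normalized_preds, train_mean, train_std))   # [[0.054], [-0.021]]
```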
For reference, below you can see what the model predicts for Apple.
Trading Strategy and Portfolio Rebalancing
We already saved the model's predictions for each stock. The next step is to combine all the predictions into one dataset and keep only the top 10. After that, at each timestep (week), we implement a simple trading rule: buy the top 10 stocks at today's open market price and sell them at the open market price one week from today. This creates a portfolio of 10 equally weighted stocks that rebalances on a weekly basis. Finally, we assume zero transaction costs when buying or selling securities.
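The rebalancing rule above can be sketched with pandas (the function name, the toy data, and the three tickers are illustrative; the top-10 equal-weight, zero-cost assumptions are as described):

```python
import pandas as pd

def backtest_top_k(predicted, realized, k=10):
    # Each week, hold the k stocks with the highest predicted return,
    # equally weighted, and earn their realized open-to-open return.
    # Transaction costs are assumed to be zero.
    weekly = []
    for week in predicted.index:
        top = predicted.loc[week].nlargest(k).index       # pick this week's top k
        weekly.append(realized.loc[week, top].mean())     # equal-weight realized return
    return pd.Series(weekly, index=predicted.index)

# Toy example with 3 stocks, picking the top 2 each week
idx = pd.RangeIndex(2, name='week')
predicted = pd.DataFrame({'AAPL': [0.02, -0.01], 'MSFT': [0.01, 0.03],
                          'INTC': [-0.02, 0.02]}, index=idx)
realized = pd.DataFrame({'AAPL': [0.015, 0.00], 'MSFT': [0.005, 0.025],
                         'INTC': [0.00, 0.01]}, index=idx)
print(backtest_top_k(predicted, realized, k=2))
```

Week 0 picks AAPL and MSFT (realized mean 0.01); week 1 picks MSFT and INTC (realized mean 0.0175).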