\documentclass[11pt]{article}
\usepackage[a4paper,left=1in,right=1in,top=1in,bottom=1in]{geometry}
\usepackage{helvet}
\usepackage{longtable}
\usepackage{setspace}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{enumitem}
\graphicspath{ {./Images/} }
\usepackage{subcaption}
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=cyan,
}
\setlength{\parindent}{4em}
\setlength{\parskip}{1em}
\usepackage{xcolor}
\definecolor{light-gray}{gray}{0.95}
\newcommand{\code}[1]{\colorbox{light-gray}{\texttt{#1}}}
\urlstyle{same}
\renewcommand{\familydefault}{\sfdefault}
\begin{document}
\begin{onehalfspace}
\title{%
Stock Price Forecasting via Imaging Techniques and Machine Learning}
\date{{\today}}
\author{MERX Mathilde \\Supervisor: LEE Jinu}
\maketitle
\pagebreak
\begin{abstract}
Time series forecasting has been the subject of much research for decades. Recently, the impressive achievements of computer vision have drawn a lot of attention, and the idea of imaging time series in order to forecast their future values has started to emerge. In this dissertation, I tried to create a trading algorithm using computer vision. To that end, stock prices were imaged using different techniques. Then, a classification algorithm was trained to detect -- based on an image -- whether a company's stock price would increase or decrease the next trading day. This algorithm was tested on the S\&P500 index, between 2016 and 2018, against the Buy\&Hold strategy.
\end{abstract}
\pagebreak
\tableofcontents
\pagebreak
\section{Introduction}
Prediction has interested mankind for millennia: as early as 3,000BC, Indians were trying to develop meteorology \cite{meteo}. Indeed, knowledge of future temperatures and precipitation was needed for the crops. As society became more complex and more specialized, new occupations have required new types of forecasting. For instance, merchants need to predict their sales, electrical engineers need to predict electricity demand, and so on.
It can be noted that all these objects of prediction (temperature, sales, electricity demand) can be translated into numbers which depend on time. That is what time series are: data (figures) indexed by time. As we have just noticed, time series forecasting is an area of great interest in many different fields. In our case, we will focus on time series forecasting in a financial context, and more precisely on \textbf{forecasting the next-day stock value of each S\&P500 company}.
There are many different tools for time series forecasting: mathematical and statistical analysis, among which ARIMA, regressions, curve fitting, Bayesian analysis, etc. \cite{campanharo}. Unfortunately, these methods have their limits. One of the biggest is that they often fail to predict market surges or falls. A possible explanation is that most econometric models are built under the assumptions that the data are stationary and follow a Gaussian distribution \cite{arima}.
Another (expanding) tool used for time series forecasting is Machine Learning. In this case, people mostly use Long Short-Term Memory (LSTM) layers or Convolutional Neural Networks (CNN) \cite{conv_lstm}. In both cases (statistical analysis and Machine Learning), the data commonly used for forecasting are the raw data: the numbers. However, it is interesting to note that a time series can also be described by images. For instance, the plot (or graph) of the time series is an image. In our case, we have decided to focus on imaging techniques for time series.
We want to focus on this graphical dimension because highly efficient computer vision algorithms are available nowadays. For instance, on the MNIST dataset (a classification problem of hand-written digits), an accuracy of over 99.70\% has been reached. It means that when shown a hand-written digit (between 0 and 9), the algorithm will make a correct inference over 99.70\% of the time!
Since there are very powerful computer vision algorithms, we have decided to try to exploit their strengths. As stated earlier, there is a very famous way of translating a time series into an image: the plot. Unfortunately, even though this representation is very telling for us humans, it has a major drawback: it carries scarcely any information compared with its size \cite{jastrzebska}. For each column of pixels in the image, only one pixel is activated. For a 32$\times$32 pixel image, it means only a little more than 3\% of the pixels carry information!
Consequently, if we are to forecast time series using an image representation, we need to find another type of image. An idea is to highlight the time dependency of the data. For instance, if at time $t_1$ a stock price is $x_{t_1} = X$ and at time $t_2$ the same stock price is $x_{t_2} = X$ as well, then what is the probability that $x_{t_1 + 1} = x_{t_2 + 1}$? This is the general idea of the Markov Transition Field \cite{wang}. Other imaging techniques will be discussed later on.
Using images of time series should improve the forecasting for two reasons: first, imaging time series is a way to transform them, and maybe reveal some features (for instance time dependencies). Second, image classification algorithms are known as among the best Machine Learning algorithms nowadays. Thus we want to try and adapt our problem so that we can use these algorithms.
Even though time series forecasting is considered a regression problem, we can translate it to a classification problem. For instance, for stock price forecasting, a reliable algorithm inferring whether the stock price will increase or decrease tomorrow can be all that is needed. And this is just a regular binary classification algorithm.
The purpose of this Dissertation is:
\begin{enumerate}
\item To describe relevant imaging techniques to translate time series into images
\item To describe a relevant classification Machine Learning algorithm inferring whether the time series will increase (or decrease) the next day
\item To evaluate the results on out-of-sample data
\end{enumerate}
In the next part, we will analyze the \nameref{sect_littrev} available on the matter of imaging time series to improve time series forecasting. Then, we will explain the \nameref{methodo} used to carry out the research. Afterwards, we will present our \nameref{results}, and carry out a \nameref{discuss} about them. Finally, there will be the \nameref{concl} of this Dissertation.
\pagebreak
\section{Literature Review}
\label{sect_littrev}
The subject of this Dissertation is stock price forecasting. The idea is to image time series (the stock price), then use a binary classification algorithm to infer whether the stock price will increase or decrease the next day. The main steps are:
\begin{enumerate}
\item Data preprocessing -- choosing the time series to use and making them stationary (see \nameref{sec:LR_preprocess})
\item Translating the time series into images (see \nameref{sec:LR_image})
\item Choosing a classification algorithm (see \nameref{sec:LR_classif})
\item Evaluating the performances of the model (see \nameref{sec:LR_eval})
\end{enumerate}
In the remainder of this section, $x_t$ represents the value at time $t$ of the time series (usually stock prices) we want to image. $M$ is the matrix of the corresponding image, and its coefficients $M_{i,j}$ are the pixel values. $M_{i,j}$ is the value of the $i$-th row, $j$-th column element of matrix $M$. $M$ has size $(n \times n), n \in \mathbb{N}$.
\subsection{Defining the values of our time series}
\label{sec:LR_preprocess}
Stock price forecasting is usually carried out using the last price before the market closes each day: the closing price. The time series thus obtained are then made stationary, and those are the data used in forecasting algorithms.
Barra et al. \cite{barra}, who were focusing on improving time series imaging techniques, had a very interesting and novel idea. Instead of using just one time series (for instance the closing price: $x_t,...,x_T$), they suggested defining four different time series. Then they made one image for each time series, and eventually aggregated these four images in one image (each corner being the image of one time series).
Their definition of these four time series was the following: the values were the stock prices of a \emph{single} company (so they only used the values of the time series $x_t,...,x_T$: they did not introduce any other company's stock price, or macroeconomic data). The difference between the time series resided in the time interval between consecutive values. The first time series was the value of the company stock collected every hour ($x_t, x_{t+1}, ...$), in the second one the values were collected every 4 hours ($x_t, x_{t + 4},...$), every 8 hours for the third ($x_t, x_{t + 8}, ...$), and every day for the fourth ($x_t, x_{t + 24}, ...$). Barra et al. claimed this is a way to highlight different periods and trends in the company stock price.
Regarding stationarity, Kwiatkowski et al. \cite{stationary} advise using a transformation of the time series so as to make it stationary. The idea is to obtain a time series whose joint probability distribution does not change when shifted in time. This results in the mean and the variance (among others) not depending on time. The different transformations they suggest testing are the first or second difference (using $x_t - x_{t-1}$ or $x_t - x_{t-2}$ instead of $x_t$), the logarithm ($\log(x_t)$), or a combination of both ($\log(x_t) - \log(x_{t-1})$).
The latter is particularly useful when imaging time series: this way, the values dealt with are not the stock price itself, but (the logarithm of) its percentage change. With this method, companies whose stock price is especially expensive are not treated differently from companies whose stock is cheap, whereas this would be the case with the first or second difference.
When wondering what data to use, there are two suggestions to remember from these papers:
\begin{itemize}
\item Instead of using just one time series, try to find several time series to describe the stock price
\item Instead of using the raw stock price, try to make the time series stationary
\end{itemize}
\subsection{Imaging time series techniques}
\label{sec:LR_image}
Once the preprocessing of the time series has been decided on, the next step is imaging the time series. What techniques can be used? In 2019, Jastrzebska \cite{jastrzebska} highlighted why it is not enough to simply plot the function $f(t) = x_t$, where $t$ is the time and $x_t$ the time series. Indeed, for each column of pixels only one would carry information, whereas it is preferable to \emph{maximize} the level of information per pixel. Hence, the idea is to perform some computation revealing specific aspects of the time series, put the results in a matrix, and plot it as an image.
\subsubsection{Recurrence plot and wavelet transform}
The recurrence plot (RP) is the first imaging technique which will be discussed. Historically, an RP was an image described by the matrix:
\begin{equation}
M_{i,j} = \begin{cases}
1 & \text{if } x_i = x_j \\
0 & \text{else}
\end{cases}
\end{equation}
As we can see, this only encodes a black and white image. Then, a more sophisticated variant emerged: $M_{i,j} = \Theta(\epsilon - |x_i - x_j|)$, where $\Theta$ is the Heaviside function ($\Theta(x) = 1$ if $x \geq 0$, else $\Theta(x) = 0$) and $\epsilon$ is a constant to determine. This variant gives a greyscale image. Below are different RPs for different types of time series: a Gaussian (or white) noise in Figure \ref{fig:white_noise}, a random walk with white noise in Figure \ref{fig:random_walk}, and a periodic signal in Figure \ref{fig:periodic}. One can clearly see patterns in the RP of the periodic signal, whereas that of the white noise is much more erratic.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{plot_noise.PNG}
\caption{Regular plot}
\label{fig:}
\end{subfigure}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{rp_noise.PNG}
\caption{Recurrence plot}
\end{subfigure}
\caption{Plot and recurrence plot of white noise (100 samples)}
\label{fig:white_noise}
\end{figure}
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{plot_walk.PNG}
\caption{Regular plot}
\end{subfigure}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{rp_walk.PNG}
\caption{Recurrence plot}
\end{subfigure}
\caption{Plot and recurrence plot of a random walk with white noise (100 samples)}
\label{fig:random_walk}
\end{figure}
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{plot_periodic.PNG}
\caption{Regular plot}
\end{subfigure}
\begin{subfigure}{0.4\textwidth}
\includegraphics[width=6cm]{rp_periodic.PNG}
\caption{Recurrence plot}
\end{subfigure}
\caption{Plot and recurrence plot of a periodic signal (100 samples)}
\label{fig:periodic}
\end{figure}
The use of RPs is quite old. In 2004, Thiel, Romano and Kurths \cite{thiel} were already suggesting using RPs to represent time series. They only used the old black and white version because it gave lighter images (bit-wise) than the greyscale version. They wanted to represent time series as images, and then be able to reconstruct the time series from the image. This idea of bijectivity between the time series and the image is fundamental for effective classification algorithms, as we will see in subsection \ref{gaf}.
As time went on and memory became less of a problem, researchers made more use of the Heaviside version of the RP: Senin and Malinchik in 2013 \cite{senin}; Silva, Souza and Batista in 2013 \cite{silva} and again (Souza, Silva and Batista) in 2014 \cite{souza}; Zheng et al. in 2014 \cite{zheng}; Said and Erradi in 2019 \cite{said}; and eventually Li, Kang and Li in 2020 \cite{li}. All of them were using RPs to image time series, for different reasons (usually classifying time series, but also feature extraction or data reconstruction).
One great quality of the RP is underlined in each of these articles: it is a powerful feature highlighter. Silva, Souza and Batista \cite{silva} compare RPs to frequency analysis when dealing with sound. However, when using RPs for classification purposes, Zheng et al. \cite{zheng} explain that they are more efficient, but less effective, than Dynamic Time Warping (an algorithm used to compare two time series). So as to increase RP effectiveness in classification, Souza, Silva and Batista \cite{souza} used Gabor wavelets. This is a type of wavelet transform which enables feature highlighting in an image. Using these wavelets addresses the concern of several researchers, who noticed that RPs do not always highlight time series patterns enough: Li, Kang and Li \cite{li}, or Senin and Malinchik \cite{senin}.
Finally, Silva, Souza and Batista \cite{silva} focus on the $\epsilon$ term of the RP defined with the Heaviside function. They explain it should be used as a threshold so as to get rid of the noise: they suggest the definition $\epsilon = 0.1 \times \max\limits_t x_t$.
\subsubsection{Markov Transition Field}
This is the second imaging technique which will be discussed.
In 2011, Campanharo et al. \cite{campanharo} wanted to encode time series into networks. To that end, they needed a transition matrix: the Markov Transition Field (MTF). The general idea of the MTF is to answer the question: if $x_i = x_j$ for some $i,j$, what is the chance that $x_{i+1} = x_{j+1}$?
The mathematical definition of the MTF is the following: let $n$ be the size of the MTF, and $p$ the number of samples from our time series we will be using ($x_t,...,x_{t+p-1}$). Then, we create the sequence $(q_i)$, where $(q_i)_{i \in [1;n-1]}$ are the $n-1$ $n$-quantiles, $q_0 = \min\limits_j x_j$ and $q_n = \max\limits_j x_j$. To each sample of our time series $x_j$, $j \in [t;t+p-1]$, we assign the value $X_j = \max \{ i \in [|0;n|] \, / \, q_i \leq x_j\}$: $X_j$ is the index of the quantile bin of $x_j$. Eventually, the coefficients of the MTF are $M_{i,j} = P(X_{k+1} = j \mid X_k = i)$, estimated from the observed transitions. In other words, this is the conditional probability that when a sample is in bin $i$, the next sample will be in bin $j$. Hence we directly have $\sum\limits_{j=1}^{n} M_{i,j} = 1$.
The MTF was only supposed to be a transition matrix (its name is telling in that regard), but it has been used on its own. Indeed, in 2015, Wang and Oates published a series of articles (among which \cite{wang} and \cite{wang_encod}) using the MTF for time series classification. They explain that this encoding technique helps represent the temporal evolution of the time series. Around the main diagonal of the MTF, we can see the small changes of the time series, whereas in the top-right and bottom-left corners we can see the large changes. Hence, the MTF representation helps highlight the volatility of the stock.
Said and Erradi \cite{said} then used this idea in 2019 to forecast a time series: the supply/demand gap in crowdsourcing. The use of MTF in their algorithm \emph{deep-gap} gave them better results than with the LSTM architecture (RMSE of 11\% against 16\%), or even ARIMA (RMSE of 11\% against 13\%).
The main downside of MTF is that it requires a large $p$ compared with $n$: Campanharo et al. \cite{campanharo} advised using a lot of data samples compared to the matrix size (for instance $n = 20$ and $p = 320$). For trading data, it means over a year of samples. So as to use relatively similar $n$ and $p$, Wang and Oates \cite{wang_encod} suggested using MTF with another imaging technique: \nameref{gaf}.
\subsubsection{Gramian Angular Field}
\label{gaf}
In 2015, Wang and Oates (\cite{wang_encod} and \cite{wang}) tried classifying time series using an uncommon imaging technique: the Gramian Angular Field (GAF). The idea is to use a Gramian matrix (an inner product matrix) whose inner product is based on polar coordinates. Then, this matrix is translated into an image.
Here is the definition of the GAF: first, we need to rescale our time series ($x_t, ..., x_{t+n-1}$) to $[-1, 1]$. Let $m_+ = \max\limits_{i \in [t; t+n-1]} x_i$, and $m_- = \min\limits_{i \in [t; t+n-1]} x_i$. We create the sequence $X_i = \frac{2 x_i - m_+ - m_-}{m_+ - m_-}$, so that $X_i \in [-1;1], \forall i$. $(X_i)$ is one component of the polar coordinates. Since the other component is not relevant here, it will not be discussed further. $(X_i)$ is used to create the sequence $\phi_i = \arccos(X_i)$. Since $X_i \in [-1;1]$ this is possible, and $\phi_i \in [0;\pi], \forall i \in [t;t+n-1]$. Eventually, for $M$ the matrix of the GAF, $M_{i,j} = \cos(\phi_i + \phi_j)$.
This representation has many advantages. First, it is bijective: the main diagonal is enough to recover the sequence $(X_i)$, and knowing only two actual values of the stock between $t$ and $t+n-1$ is then enough to recover all the values $x_t, ..., x_{t+n-1}$. Wang and Oates stress the importance of having ``bijective'' imaging techniques: algorithms obtain much better results than with images from which the time series cannot be recovered. Second, the GAF encodes some features of the time series, such as correlation: the coefficients $M_{i,j}$ with $|i-j| = k$ capture the relative correlation by superposition of directions with respect to the time interval $k$.
On its own, training classification algorithms with the GAF gave rather positive results, depending on the time series used. Trained on electrocardiogram data for instance, Wang and Oates \cite{wang_encod} obtained an 11\% error rate, compared with 21\% for the MTF. However, what brought even better results was using the GAF and the MTF together, stacking them like the color channels of an image. In that case, Wang and Oates \cite{wang} obtained a 9\% error rate, still on the electrocardiogram dataset.
This idea of different images superposed as channels of the image has been re-used several times: for instance, Said and Erradi tried in 2019 \cite{said} to superpose GAF and RP images.
\subsection{Classification techniques}
\label{sec:LR_classif}
Once time series have been encoded into images, several researchers focused on the type of neural network best suited for classification.
Barra et al. \cite{barra} wanted to forecast the S\&P500 index. Their purpose was to have an algorithm predicting whether the index would increase or decrease the next day. Since they did not have a lot of data, they stated that one very deep network would be too prone to overfitting. Hence, they wanted limited-size models (for instance a simplified version of VGG-16, and not ResNet34).
Once this architecture was chosen, they did not want the results to depend on the initialization: they wanted to ensure proper convergence of their algorithm. To address this problem, they decided to train \emph{20 models}, each having the same architecture but a different initialization. Then, if more than $n$\% of the models agreed on ``increase'' or ``decrease'', that was the overall prediction; otherwise the overall prediction was ``unsure'' -- $n$ being a hyperparameter, $50 \leq n \leq 70$.
Berat Sezer and Ozbayoglu \cite{berat} tried using three classes directly (``increase'', ``decrease'' and ``stagnate''), but their algorithm tended to forecast stagnation too often (it had a recall of 55\% -- see \ref{sec:LR_eval}). Barra et al. \cite{barra} had more promising results, which will be detailed in \ref{sec:LR_eval}.
The simplified version of VGG-16 used by Barra et al. consists of 5 CNN layers and a fully connected one. Indeed, CNNs are a type of network commonly used for image classification. In 2010, Le et al. \cite{le} suggested an improvement of the CNN: the Tiled CNN. In a CNN, the image is abstracted to a feature map in which all the weights are tied together. The idea of the Tiled CNN is to only tie weights which are $k$ steps from each other ($k$ being a hyperparameter). When $k = 1$, it becomes a regular CNN. Barra et al. used a regular CNN, whereas Wang and Oates \cite{wang} used a Tiled CNN.
Tiled CNNs leave more for the model to learn by itself: instead of hard-coding the translational invariances, the network finds them on its own. Even though Wang and Oates initially used Tiled CNNs, the code they suggest today is that of a simplified ResNet, with regular CNN layers. They only use a dozen CNN layers (hence ResNet12), with the same intention as Barra et al.: avoiding overfitting.
\subsection{Evaluating the model}
\label{sec:LR_eval}
When training a model, and later evaluating it, a relevant metric must be chosen.
In \cite{ouannes}, Ouannès explains that the most basic evaluation for classification is the accuracy: $\frac{n_{right}}{n_{tot}}$, where $n_{right}$ is the number of predictions the algorithm got right, and $n_{tot}$ is the total number of predictions made by the algorithm. Unfortunately, this metric is not always relevant: if one class is much more represented than the other(s), the accuracy would be very high if the algorithm always predicted this very class, which is not desired. Hence, we need other metrics, for instance the precision $P$ and the recall $R$. For a binary classification (0 or 1), let:
\begin{itemize}
\item $T_0$ be the number of 0 correctly predicted
\item $T_1$ be the number of 1 correctly predicted
\item $F_0$ be the number of 0 the algorithm mistakenly predicted as 1
\item $F_1$ be the number of 1 the algorithm mistakenly predicted as 0
\end{itemize}
Then, $R = \frac{T_0}{T_0 + F_0}$: that is the percentage of actual 0s that are rightly predicted by the algorithm (recall). $P = \frac{T_0}{T_0 + F_1}$: that is the percentage of times the algorithm is right when it predicts 0 (precision). This is a first (numeric) way to evaluate an algorithm.
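As an illustration, here is a minimal Python sketch computing these metrics for a binary classification, taking class 0 as the class of interest as above; the arrays \code{y\_true} and \code{y\_pred} are hypothetical labels and predictions.

\begin{verbatim}
import numpy as np

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])   # hypothetical ground-truth labels
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1])   # hypothetical model predictions

T0 = np.sum((y_true == 0) & (y_pred == 0))    # 0s correctly predicted
F0 = np.sum((y_true == 0) & (y_pred == 1))    # 0s mistakenly predicted as 1
F1 = np.sum((y_true == 1) & (y_pred == 0))    # 1s mistakenly predicted as 0

accuracy  = np.mean(y_true == y_pred)
recall    = T0 / (T0 + F0)   # share of actual 0s that are found
precision = T0 / (T0 + F1)   # share of predicted 0s that are correct
\end{verbatim}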
Barra et al. \cite{barra} suggested a more financial evaluation: comparing the predictions of their algorithm with the Buy\&Hold strategy. The idea is the following: many great investors (among them Warren Buffett) emphasize that the best strategy for long-term trading is to constitute a diversified portfolio, and then hold its assets for as long as possible. For instance, despite the 2008 crisis and the coronavirus outbreak, the S\&P500 index has more than doubled between 2007 and today. Hence, for their testing data, Barra et al. compared these two strategies:
\begin{itemize}
\item Buying a share of the S\&P500 at the beginning of the testing period, holding on to it, and looking at its value at the end of the period
\item When the algorithm predicts ``increase'', buying a long action (buying a share of the index, then selling it before the market closes). When the algorithm predicts ``decrease'', buying a short action (selling a share of the index, then buying it back before the market closes). If the algorithm predicts ``unsure'', doing nothing.
\end{itemize}
What they noticed is that their strategy (relying on their algorithm) was on average better than the Buy\&Hold strategy. Furthermore, when they did worse than Buy\&Hold it was only slightly, whereas when they did better it was by a much bigger margin.
The \nameref{methodo} will explain how the articles reviewed above are used to build a new stock price forecasting algorithm, using 1) time series imaging techniques and 2) classification models.
\pagebreak
\section{Methodology}
\label{methodo}
In this dissertation, we aim at forecasting time series using image classification algorithms. Our time series will be S\&P500 company stock prices. The steps of this methodology are:
\begin{enumerate}
\item Downloading the time series data
\item Imaging the time series (the imaging techniques are the subject of this dissertation)
\item Training a classification algorithm (more precisely a binary classification: whether the stock price will increase or decrease)
\item Evaluating this model (are we satisfied with the forecasting we obtain?)
\end{enumerate}
\subsection{Data collection}
The purpose is to have a lot of data. Indeed, Machine Learning algorithms require massive amounts of data for the training. So as to constitute a large dataset, the S\&P500 is very interesting: one year of stock price history represents around 125,000 samples!
The data used are 200 of the S\&P500's 505\footnote{In the S\&P500, there are 500 companies, which issued a total of 505 common stocks.} common stock prices, from January 1st 2010 until December 31st 2014 for the training part, and from January 1st 2015 until December 31st 2018 for the testing part.
\subsubsection{Company names}
So as to obtain the names of the 505 common stocks from the S\&P500, one can use the list from Wikipedia \cite{500names}. It is necessary to do some web scraping (using the Python library \code{BeautifulSoup}) to obtain the common stock symbols\footnote{The CSS class we are interested in is ``\textit{external text}''.}.
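Below is a minimal scraping sketch, given only as an illustration of the approach described above; the page layout may have changed since, and the filtering of unwanted ``external text'' links (such as the SEC ``reports'' links) may need adjustment.

\begin{verbatim}
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
soup = BeautifulSoup(requests.get(URL).text, "html.parser")

# Ticker symbols are rendered as links with the CSS classes "external text";
# some of those links are not tickers (e.g. "reports"), so we filter them out.
symbols = [a.text.strip()
           for a in soup.select("a.external.text")
           if a.text.strip() and a.text.strip() != "reports"]
\end{verbatim}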
The 505 symbols can be found in the Appendix \ref{505}, as well as the 200 common stocks used in this dissertation.
\subsubsection{Downloading the stock prices}
The stock prices can be downloaded from Yahoo Finance, using the function \code{DataReader} from the Python library \code{pandas\_datareader.data}.
We have now obtained four different time series per common stock:
\begin{itemize}
\item The maximum reached every trading day
\item The minimum reached every trading day
\item The price at the opening of the trading day
\item The price at the closing of the trading day
\end{itemize}
These time series begin on January 1st, 2010 and end on December 31st 2018.
We want to obtain stationary data \cite{stationary}, which those time series are not. Furthermore, we are interested in the evolution of the stock price, and not just in its value: an increase of 10\% is meaningful whatever the price, whereas a stock gaining \$10 does not mean the same thing whether the stock costs \$10 or \$300. Hence, the following transformation is made: for each time series $(x_t)$, we will use the first difference of the logarithm: $X_t = \log\left(\frac{x_t}{x_{t-1}}\right)$. Hence, from now on, when we talk about a time series, we mean the first difference of the logarithm of the said time series.
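As an illustration of this preprocessing, here is a minimal Python sketch downloading one example ticker and computing the log-returns; the Yahoo Finance endpoint used by \code{pandas\_datareader} may no longer be available, so this is only a sketch of the procedure.

\begin{verbatim}
import numpy as np
import pandas_datareader.data as web

# Example ticker (Snap-on, the running example of this dissertation);
# the 200 selected S&P500 symbols are downloaded in the same way.
df = web.DataReader("SNA", "yahoo", start="2010-01-01", end="2018-12-31")

# Keep the four daily series used later: open, close, high (max), low (min).
prices = df[["Open", "Close", "High", "Low"]]

# First difference of the logarithm: X_t = log(x_t / x_{t-1}).
log_returns = np.log(prices / prices.shift(1)).dropna()
\end{verbatim}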
\subsubsection{Data description}
Here are some statistics on the data used:
\begin{itemize}
\item From 2010 until 2014 I have gathered data on 200 companies, for about 1,250 trading days, with 4 data points per day (opening, closing, minimum and maximum prices). It represents a total of roughly 1,000,000 samples.
\item From 2015 until 2018, I have gathered data on 75 companies (plus the S\&P500 index itself), for about 1,000 trading days, with 4 data points per day. It represents a total of roughly 300,000 samples.
\item On average, from one day to the next between 2010 and 2014, a stock price will increase by 0.64\%.
\begin{itemize}
\item We can notice this average is \emph{not} what happens every day. If that were the case, and someone were to invest on day 0, then on trading day 1,250 (which represents about 5 years) they would have multiplied their investment by almost 3,000 ($1.0064^{1250} \approx 2{,}900$)! Hence, this shows that the S\&P500 increase is not linear, and if a powerful trading strategy were found, the index could be beaten.
\end{itemize}
\item The standard deviation of the daily evolution between 2010 and 2014 is 0.018. If the data were assumed to follow a Gaussian law, it would mean that 68\% of the time, the daily evolution of the stock price lies between $-1.1\%$ and $+2.4\%$ (the 95\% interval is roughly $[-2.9\%; +4.2\%]$).
\begin{itemize}
\item Those intervals are quite wide -- especially when compared with the mean. It means stock prices can be rather volatile, which is precisely what was noticed when finding the average daily increase of 0.64\% of the stocks.
\end{itemize}
\end{itemize}
\subsection{Imaging the time series}
In this subsection, we will detail the imaging techniques. There are three of them, and the three (greyscale) images obtained will be used as the three channels of the final image. Sixteen trading days of data will be used to constitute one image.
For each technique, we will make four small images (16x16 pixels): one for the time series of the maxima, one for the time series of the minima, one for the time series of the opening prices, and one for the time series of the closing prices. Then, we will concatenate these four images into a big one (32x32 pixels). This image will be one (out of three) channels of the final image. Here are the next steps:
\begin{enumerate}
\item Detailing how to obtain a recurrence plot, and then making the wavelet transform of the recurrence plot
\item Detailing the Markov Transition Field
\item Detailing the Gramian Angular Field
\item Stacking the three images as channels to constitute the final image
\end{enumerate}
When a time series $X_t$ is mentioned, it can, without loss of generality, be the time series of the maxima, minima, opening or closing prices.
\subsubsection{Recurrence plot and wavelet transform}
This section contains four parts:
\begin{enumerate}
\item How to make a recurrence plot
\item How to make a wavelet transform
\item The concatenation of the wavelet transforms
\item The concatenation of the wavelet transforms of each recurrence plot
\end{enumerate}
For $t \in [0; T-15]$, we take the time series $X_t, ..., X_{t+15}$. It represents 16 days of data of a common stock price.
\noindent \textbf{First, let's explain how to make a recurrence plot of this time series.}
Let $M_t = \max\limits_{k \in [t;t+15]} X_k$, and $\epsilon = 0.1 \, M_t$. We compute the recurrence plot $R_{i,j} = \Theta (\epsilon - |X_i - X_j|)$, with $\Theta$ the Heaviside function. We rescale the recurrence plot so that it uses all shades of grey from white to black. That is, for $M = \max\limits_{i,j} R_{i,j}$ and $m = \min\limits_{i,j} R_{i,j}$, we do $R_{i,j} \leftarrow \frac{R_{i,j} - m }{M - m }$.
In our case, for each $t \in [0; T-15]$, we have four recurrence plots (one for the opening, closing, maximum, and minimum prices). In Figure \ref{fig:rec_plot} are some examples of recurrence plots, using the stock price of the company Snap-on, from January 4th, 2010 until January 27th, 2010 (which represents 16 trading days).
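As a sketch of this computation (using \code{numpy}, and assuming \code{X} already contains the 16 log-returns of one of the four series):

\begin{verbatim}
import numpy as np

def recurrence_plot(X):
    # X: 1-D array of 16 values (one of the four daily series)
    eps = 0.1 * X.max()
    # Heaviside of (eps - |X_i - X_j|); the second argument is the value at 0
    R = np.heaviside(eps - np.abs(X[:, None] - X[None, :]), 1.0)
    # Rescale to use the full greyscale range
    m, M = R.min(), R.max()
    return (R - m) / (M - m) if M > m else R
\end{verbatim}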
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{rp_open.PNG}
\caption{Opening prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{rp_close.PNG}
\caption{Closing prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{rp_max.PNG}
\caption{Maximum prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{rp_min.PNG}
\caption{Minimum prices}
\end{subfigure}
\caption{Recurrence plots of four time series representing the S\&P500 index between December, 11th and 31st, 2015}
\label{fig:rec_plot}
\end{figure}
\noindent \textbf{Second, let's explain how to make one wavelet transform (for each of our four recurrence plots).}
Now we need to compute the wavelet transform of these recurrence plots. It will be a way to highlight the texture of our recurrence plots. A wavelet transform reveals texture in four preferred directions: the vertical, the horizontal and both diagonals. Hence, we obtain four different images when computing the wavelet transform. Figure \ref{fig:wave_transf} shows an example of the four textures given by the wavelet transform of the recurrence plot of the opening prices of Snap-on.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_lH.PNG}
\caption{Vertical texture}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_HL.PNG}
\caption{Horizontal texture}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_ll.PNG}
\caption{Main diagonal texture}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_HH.PNG}
\caption{Second diagonal texture}
\end{subfigure}
\caption{Four different textures given by the wavelet transform on the recurrence plot of the opening value of S\&P500 index between December, 11th and 31st, 2015}
\label{fig:wave_transf}
\end{figure}
In this case, the \textit{biorthogonal wavelet} has been used. Indeed, this kind of wavelet:
\begin{itemize}
\item Can be used with discrete data
\item Captures the texture equally well in all directions (horizontal, vertical and both diagonals)
\item The images obtained with this kind of wavelet are four times smaller (here 8x8 pixels) than the initial image (here 16x16 pixels), which serves our purpose
\end{itemize}
\noindent \textbf{Third, let's concatenate the wavelet transforms }
For each recurrence plot, we obtain four wavelet transform images (revealing horizontal, vertical and diagonal textures). Furthermore, one wavelet transform image is four times smaller than the recurrence plot it describes. Hence, we can concatenate all four wavelet images and obtain a 16x16 pixel wavelet image representing one recurrence plot. Figure \ref{fig:wave_concat} shows the concatenated wavelet transforms of each of the four recurrence plots from Figure \ref{fig:rec_plot}.
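A sketch of these two steps, assuming the \code{PyWavelets} library and the \code{recurrence\_plot} function sketched earlier; the exact biorthogonal wavelet name (here \code{bior1.3}) and the use of periodization are assumptions made so that the four sub-images are exactly 8x8.

\begin{verbatim}
import numpy as np
import pywt

def wavelet_image(rp):
    # rp: 16x16 recurrence plot; one level of 2-D biorthogonal wavelet
    # transform gives four 8x8 sub-images (approximation + 3 detail textures)
    cA, (cH, cV, cD) = pywt.dwt2(rp, "bior1.3", mode="periodization")
    # Concatenate the four 8x8 sub-images into one 16x16 image
    return np.block([[cA, cH], [cV, cD]])
\end{verbatim}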
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_open.PNG}
\caption{Opening prices}
\label{fig:open_concat}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_close.PNG}
\caption{Closing prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_high.PNG}
\caption{Maximum prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{wt_low.PNG}
\caption{Minimum prices}
\end{subfigure}
\caption{Concatenation of the four textures given by the wavelet transform of each of the four recurrence plots from Figure \ref{fig:rec_plot}}
\label{fig:wave_concat}
\end{figure}
\noindent \textbf{Fourth, let's concatenate the wavelet transforms of all four time series }
We have now obtained four 16x16 pixel images: one for each time series (opening, closing, minimum and maximum prices). Each image is the concatenation of four wavelet transform images (which were 8x8 pixels). We will now concatenate the four 16x16 pixel images of the wavelet transform so as to obtain one 32x32 pixel image (Figure \ref{fig:wt}). This image contains the wavelet transforms (four textures) of the recurrence plots (from the four time series). In this case, the data used are those of the company Snap-on, between January 4th and 27th, 2010.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\includegraphics[width=7cm]{wt_CONCAT.PNG}
\caption{Wavelet transform of the recurrence plots of the S\&P500 index between December, 11th and 31st, 2015}
\label{fig:wt}
\end{figure}
This image is the first channel of the final picture. In \nameref{sec:mtf}, we will detail how to obtain the second channel.
\subsubsection{Markov Transition Field}
\label{sec:mtf}
\textit{\textbf{Reminder}: $X_k, k \in [t - 240, t+15]$ are the values of our time series -- either the opening, closing, maximum or minimum price of the trading day.}
This section details how to obtain the second channel image, using the Markov Transition Field (MTF). This section contains three parts:
\begin{enumerate}
\item How to make the MTF
\item The meaning of each coefficient
\begin{itemize}
\item Why use 240 past samples?
\end{itemize}
\item The concatenation of four MTF
\end{enumerate}
\noindent \textbf{First, let's detail how to make a MTF.}
The idea is to assign each $X_k$ to a subset of the time series, each subset containing an equal number of samples. There will be a total of 16 subsets: they are the 16-quantile bins. The smallest 1/16th of the $X_k$ will be assigned to bin 1, the next smallest 1/16th to bin 2, etc., until the biggest 1/16th of the $X_k$ is assigned to bin 16.
Hence, each sample $X_k, k \in [t - 240, t+15]$, is assigned its 16-quantile bin $b_k \in [|1;16|]$. Then, with $\#A$ denoting the cardinality of a set $A$, we compute the matrix $M_{i,j} = \# \{k \, / \, b_k = i \text{ and } b_{k+1} = j \}$. Eventually, we divide each row of the matrix by the sum of its elements, so that $M_{i,j}$ estimates the probability of moving from bin $i$ to bin $j$.
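A minimal sketch of this construction (using \code{numpy}; \code{X} is assumed to hold the 256 samples $X_{t-240},...,X_{t+15}$ of one series):

\begin{verbatim}
import numpy as np

def markov_transition_field(X, n_bins=16):
    # Assign each sample to its quantile bin (0 .. n_bins - 1)
    edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(X, edges)
    # Count the transitions from bin i (day k) to bin j (day k+1)
    M = np.zeros((n_bins, n_bins))
    for i, j in zip(bins[:-1], bins[1:]):
        M[i, j] += 1
    # Normalize each row so that it sums to 1 (empirical transition probabilities)
    row_sums = M.sum(axis=1, keepdims=True)
    return M / np.where(row_sums == 0, 1, row_sums)
\end{verbatim}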
\noindent \textbf{Second, let's explain the meaning of the MTF coefficients.}
One coefficient $M_{i,j}$ represents the probability that an element belongs to the $j$-th quantile bin of the time series, knowing that the element of the day before belonged to the $i$-th quantile bin. Figure \ref{fig:mtf4} represents the MTF of the opening, closing, maximum and minimum stock prices of Snap-on, for every trading day between January 4th and 27th, 2010.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{mtf_open.PNG}
\caption{Opening prices}
\label{fig:open_concat}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{mtf_close.PNG}
\caption{Closing prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{mtf_high.PNG}
\caption{Maximum prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{mtf_low.PNG}
\caption{Minimum prices}
\end{subfigure}
\caption{MTF of four different time series: the values of the S\&P500 index between January 12th and December 31st, 2015}
\label{fig:mtf4}
\end{figure}
\noindent \textit{Why use 240 past samples?}
As said earlier, the desired size of the images is 16x16 pixels (which will then be concatenated into 32x32 pixel images). Hence, the bins used will be 16-quantile bins (so as to have 16 bins, each column of the image representing one bin). If (as for the other imaging techniques) only the samples $X_k,..., X_{k+15}$ were used, then each bin would only contain one sample, which means only one pixel per column would be used -- the others being blank pixels. And this is the precise reason why regular plots have not been used! The purpose of this Dissertation is to have images containing as much information as possible.
Using 256 samples per image makes it possible to have 16 samples per bin, hence each pixel can take 16 different values. Of course, one MTF image requires many more trading days than the RP ones, for instance. Yet, since those data are not difficult to find, and since the computational running time is barely increased, the choice of 256 samples per image seems the best alternative. This is the reason why the samples used for one image run from $X_{k-240}$ to $X_{k+15}$, and not just from $X_k$ to $X_{k+15}$ as for the other imaging techniques.
\noindent \textbf{Third, let's concatenate the four MTF.}
Like before, we have four different images representing one company's stock prices during 16 trading days: one for the opening prices, one for the closing prices, one for the maximum prices, and one for the minimum prices. We concatenate them so as to form one bigger image: see Figure \ref{fig:mtf_concat}. This image is 32x32 pixels; it is the second channel of our final image. In \nameref{sec:gaf}, we will detail how to obtain the third channel.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\includegraphics[width=7cm]{mtf_concat.PNG}
\caption{MTF of S\&P500 index between January 12th, and December 31st, 2015}
\label{fig:mtf_concat}
\end{figure}
\subsubsection{Gramian Angular Field}
\label{sec:gaf}
\textit{\textbf{Reminder}: $X_k, k \in [t, t+15]$ are the values of our time series -- either the opening, closing, maximum or minimum price of the trading day.}
This section details how to obtain the third and last channel image, using the Gramian Angular Field (GAF). This section contains three parts:
\begin{enumerate}
\item Rescaling the data
\item Making the GAF
\item The concatenation of four GAF
\end{enumerate}
\noindent \textbf{First, let's rescale our data.}
Let $M_t = \max\limits_{k \in [t;t+15]} X_k$, and $m_t = \min\limits_{k \in [t;t+15]} X_k$. Using the function $f_t(x) = \frac{2 x - (M_t + m_t)}{M_t - m_t}$, we rescale all $X_k$ between -1 and 1. Sometimes, values may be rounded slightly above 1 or below -1, so clipping is important here (with an \code{if} statement: if the rescaled value is above 1 or below -1, we assign it the value 1 or -1). That is important because we will then compute the arccosine of each value, and the arccosine is not defined below -1 or above 1.
\noindent \textbf{Second, let's make the GAF.}
Then, we compute the arccosine of each value: $\phi_k = \arccos(f_t(X_k))$. Now, we can create the GAF matrix: $G_{i,j} = \cos(\phi_i + \phi_j)$. Figure \ref{fig:gaf4} shows the GAF of Snap-on, with time series between January 4th and January 27th, 2010.
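A minimal sketch of this computation (\code{numpy}; \code{X} is assumed to hold the 16 samples of one of the four series):

\begin{verbatim}
import numpy as np

def gramian_angular_field(X):
    # Rescale to [-1, 1] and clip rounding errors, since arccos is
    # only defined on [-1, 1]
    m, M = X.min(), X.max()
    Xs = np.clip((2 * X - M - m) / (M - m), -1.0, 1.0)
    # Polar angle, then Gramian matrix G_{i,j} = cos(phi_i + phi_j)
    phi = np.arccos(Xs)
    return np.cos(phi[:, None] + phi[None, :])
\end{verbatim}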
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{gaf_open.PNG}
\caption{Opening prices}
\label{fig:open_concat}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{gaf_close.PNG}
\caption{Closing prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{gaf_high.PNG}
\caption{Maximum prices}
\end{subfigure}
\begin{subfigure}{0.24\textwidth}
\includegraphics[width=0.95\linewidth]{gaf_low.PNG}
\caption{Minimum prices}
\end{subfigure}
\caption{GAF of four different time series: the values of S\&P500 index between December, 11th and 31st, 2015}
\label{fig:gaf4}
\end{figure}
\noindent \textbf{Third, let's concatenate the four GAF.}
Like before, we have four images representing one company's stock prices during 16 trading days: one for the opening prices, one for the closing prices, one for the maximum prices, and one for the minimum prices. We concatenate them so as to form one bigger image: see Figure \ref{fig:gaf_concat}. This image is 32x32 pixels; it is the third and last channel of our final image.
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\includegraphics[width=7cm]{gaf_concat.PNG}
\caption{GAF of S\&P500 index between December, 11th and 31st, 2015}
\label{fig:gaf_concat}
\end{figure}
\subsubsection{Aggregating all three images}
We now have three images: the wavelet transform of the recurrence plots, the Markov Transition Field, and the Gramian Angular Field. They are all in shades of grey. The last step is to aggregate these three images into one, each being considered a channel (like in an RGB image). In Figure \ref{fig:3channels} you can see the three images in greyscale, and in Figure \ref{fig:final_image} you can see the stacking of all three channels.
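In code, the stacking itself is a one-liner; the variable names below are illustrative placeholders for the three 32x32 greyscale arrays built in the previous subsections.

\begin{verbatim}
import numpy as np

# Illustrative placeholders for the three 32x32 channel images built above
wt_img  = np.zeros((32, 32))   # wavelet transform of the recurrence plots
mtf_img = np.zeros((32, 32))   # Markov Transition Field
gaf_img = np.zeros((32, 32))   # Gramian Angular Field

# Stack them as the three channels of one 32x32x3 image (like RGB)
final_image = np.stack([wt_img, mtf_img, gaf_img], axis=-1)
\end{verbatim}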
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\begin{subfigure}{0.32\textwidth}
\includegraphics[width=1\linewidth]{wt_CONCAT.PNG}
\caption{Wavelet transform of the recurrence plot}
\label{fig:open_concat}
\end{subfigure}
\begin{subfigure}{0.32\textwidth}
\includegraphics[width=1\linewidth]{mtf_concat.PNG}
\caption{Markov Transition Field}
\end{subfigure}
\begin{subfigure}{0.32\textwidth}
\includegraphics[width=1\linewidth]{gaf_concat.PNG}
\caption{Gramian Angular Field}
\end{subfigure}
\caption{The three greyscale channels (wavelet transform of the recurrence plots, MTF, and GAF) of the S\&P500 index between December 11th and 31st, 2015}
\label{fig:3channels}
\end{figure}
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\includegraphics[width=7cm]{total_image.PNG}
\caption{RP, MTF and GAF as the three channels using the value of S\&P500 index between December, 11th and 31st, 2015}
\label{fig:final_image}
\end{figure}
We have now built \textit{one} image: that of the company Snap-on between January 4th and 27th, 2010. The same must be done with all the companies on which data has been gathered, between January 1st 2010 and December 31st 2018.
\subsection{Classification algorithm}
\subsubsection{Labels}
With the dataset built, we need to assign labels to our images. The data used are the 16 (or 256 for the MTF) closing, maximum and minimum prices up to the day before, and the 16 (or 256) opening prices up to the present day. We want to know whether, at the close of today's trading day, the stock price will be higher or lower than at its opening. Hence, this is a classification problem. I first tried labelling the images this way: assigning the label 1 if the stock price is to increase, and 0 otherwise.
However, I noticed it did not give conclusive results. I then tried assigning the label 0 when the evolution of the opening price from the day before to the current day was lower than or equal to the evolution of the closing price from two days before to the day before. In mathematical terms: the label 0 is assigned when $\frac{O_t}{O_{t-1}} \leq \frac{C_{t-1}}{C_{t-2}}$ (where $O_t$ and $C_t$ are the opening and closing prices on day $t$), and the label 1 otherwise. I do not know why it works better this way, but since it does, I chose to use these labels.
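A minimal sketch of this labelling rule, assuming a \code{pandas} price frame with the Yahoo Finance column names used in the data collection step:

\begin{verbatim}
import pandas as pd

def make_labels(prices: pd.DataFrame) -> pd.Series:
    # prices has the daily columns "Open" and "Close" (Yahoo Finance names).
    # Label 0 when O_t / O_{t-1} <= C_{t-1} / C_{t-2}, label 1 otherwise.
    open_evol  = prices["Open"] / prices["Open"].shift(1)          # O_t / O_{t-1}
    close_evol = (prices["Close"] / prices["Close"].shift(1)).shift(1)  # C_{t-1} / C_{t-2}
    # The first two rows are undefined and should be dropped before training
    return (open_evol > close_evol).astype(int)
\end{verbatim}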
\subsubsection{Architecture}
When they first wrote their article on imaging time series to classify them \cite{wang}, Wang and Oates (2015) suggested using Tiled CNNs. This is a type of CNN introduced in 2010 by Le et al. \cite{le} which does not hard-code translational invariances, but lets the neural network learn them on its own. However, the code that can be found today for Wang and Oates' article does not use a Tiled CNN, but a ResNet. This more recent neural network has shown conclusive results on image classification problems. A ResNet is a neural network in which not all layers are connected sequentially: some connections skip layers.
Hence, the architecture used is a ResNet with 12 CNN layers (we do not want too big a network, to avoid overfitting). For that, we use the \href{https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline/blob/master/ResNet.py}{code} from Wang and Oates \cite{wang}. The structure of a ResNet12 can be seen in Figure \ref{fig:resnet12}. This network begins with two CNN and max pooling layers, then there are 5 ResNet blocks (with skip connections jumping over layers), and it finishes with one average pooling layer and a fully connected layer.
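To make the structure concrete, here is a minimal Keras sketch in the spirit of this architecture. It is only an illustration under simplifying assumptions (filter counts, block layout and hyperparameters are placeholders), not the exact network of Wang and Oates linked above.

\begin{verbatim}
from tensorflow.keras import layers, models

def residual_block(x, filters):
    # Two convolutions whose output is added back to the block input
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:          # match channel counts if needed
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

inputs = layers.Input(shape=(32, 32, 3))       # the 3-channel images built above
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)
for f in (16, 32, 32, 64, 64):                 # 5 residual blocks (placeholder sizes)
    x = residual_block(x, f)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
\end{verbatim}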
\begin{figure}[h!]
\centering
\captionsetup{justification=centering}
\includegraphics[width=10cm]{resnet12.PNG}
\caption{Structure of ResNet12 - image from Choi, Ryu and Kim \cite{choi}}
\label{fig:resnet12}
\end{figure}
Eventually, we have to train our model. To that end, we split the dataset into 3 groups: the training, validation and testing sets. The training and validation datasets are the stock prices from January 1st, 2010 until December 31st, 2014 (with the training data being 70\% of these data). The testing dataset is the stock prices from January 1st, 2015 until December 31st, 2018. The evaluation will be explained in the next section.
\subsubsection{Evaluation}
\label{sec:meth_eval}
Five models with the architecture described above are created. Their only difference is the initialization of their CNN layers: see Table \ref{table:initialization} (a short sketch of how such initializations can be specified follows the table). The initialization of a layer is the probability distribution used to set the initial value of each weight (before fitting). The initialization can change the final outcome of the algorithm a lot, which is why it is important to try different initializations.
\begin{table}[h!]
\begin{center}
\begin{tabular}{ | c | c |}
\hline
\textbf{Name} & \textbf{Initialization} \\
\hline \hline
Model 1 & Random normal \\ \hline
Model 2 & Uniform \\ \hline
Model 3 & Glorot normal \\ \hline
Model 4 & Orthogonal \\ \hline
Model 5 & LeCun Uniform \\
\hline
\end{tabular}
\end{center}
\caption{Initialization method for each model}
\label{table:initialization}
\end{table}
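Assuming Keras (as in the earlier architecture sketch), these five initializations correspond to the following built-in kernel initializers; this mapping is an illustration, not necessarily the exact configuration used.

\begin{verbatim}
from tensorflow.keras import layers

initializers = {
    "Model 1": "random_normal",
    "Model 2": "random_uniform",
    "Model 3": "glorot_normal",
    "Model 4": "orthogonal",
    "Model 5": "lecun_uniform",
}

# Example: a convolutional layer built with one of these initializations
conv = layers.Conv2D(16, 3, padding="same",
                     kernel_initializer=initializers["Model 3"])
\end{verbatim}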
First, each of these five models will be evaluated individually. The idea is the following: on January 1st, 2016\footnote{Since the MTF image needs about a year of data, and strictly no overlapping of the samples was desired to avoid overfitting, the testing effectively begins on January 1st, 2016, even though the data used cover all of 2015.}, the algorithm is given \$1,000. Then, every day, it predicts whether the S\&P500 index should increase or decrease the next day. If it is sure enough, it invests a part of its money in the index (either with a long or short action). Otherwise, it does nothing (an idle action). On December 31st, 2018, we see how much money it has left. Here are the mathematical details (a code sketch of this daily loop follows the list):
\begin{enumerate}
\item Initially, the algorithm is given: $m_0 = \$1,000$ (with $m_t$ the money the algorithm holds on day $t$)
\item Every day, the algorithm is given the imaged S\&P500 index values it will use to make the prediction. Let $p_t = [1 - d_t;d_t]$ be the answer of the algorithm. $d_t$ is the inferred probability of an increase, and $1-d_t$ is the inferred probability of a decrease.
\item If $d_t > 0.6$ or $1 - d_t > 0.6$, the model is considered sure enough of itself: it invests. Let $I_{0,t+1}$ be the index at the next day's opening, and $I_{1,t}$ the index at the present day's closing.
\item If $d_t > 0.6$ (the model predicts an increase of the index), the algorithm invests a part of its current money: $m_t * d_t$, buying a long action. Next day, it holds $m_{t+1} = m_t * (1-d_t) + m_t * d_t * \frac{I_{0,t+1}}{I_{1,t}}$.
\item If $1 - d_t > 0.6$ (the model predicts a decrease of the index), the algorithm invests a part of its current money, $m_t * (1- d_t)$, buying a short action. The next day, it holds $m_{t+1} = m_t * d_t + m_t * (1 - d_t) * \frac{I_{1,t}}{I_{0,t+1}}$.
\item If $0.4 \leq d_t \leq 0.6$, the model is not considered sure enough and does nothing: $m_{t+1} = m_t$ (that is an idle action).
\item Then, we train the algorithm with all S\&P500 company stock prices (so not just the index) of the past 16 days.
\item We repeat from step 2. until December 31st 2018, and see how much money the algorithm has gained eventually.
\end{enumerate}
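Here is a minimal sketch of this evaluation loop; the arrays \code{d}, \code{open\_idx} and \code{close\_idx} are assumed to hold, respectively, the daily predicted probabilities of an increase and the index opening and closing values over the testing period.

\begin{verbatim}
def simulate(d, open_idx, close_idx, m0=1000.0):
    # d[t]: predicted probability of an increase on day t
    # open_idx[t + 1]: index at the next day's opening (I_{0,t+1})
    # close_idx[t]:    index at the present day's closing (I_{1,t})
    m = m0
    for t in range(len(d) - 1):
        if d[t] > 0.6:        # long action
            m = m * (1 - d[t]) + m * d[t] * open_idx[t + 1] / close_idx[t]
        elif d[t] < 0.4:      # short action (1 - d[t] > 0.6)
            m = m * d[t] + m * (1 - d[t]) * close_idx[t] / open_idx[t + 1]
        # otherwise: idle action, m is unchanged
    return m
\end{verbatim}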
It will then be time to compare the result of our investment:
\begin{enumerate}
\item With the inflation. We obviously do not want to lose any money, not even inflation adjusted.
\item With the Buy\&Hold strategy: would using \$1,000 to buy a share of the index, then holding on to it give a better result?
\end{enumerate}
\pagebreak
\section{Results}
\label{results}
The main idea of this dissertation was to try to forecast future (S\&P500) stock prices based only on their previous values. If this were successful, it would be a rejection of the efficient market hypothesis, even in its weak form.
Instead of using the raw time series data, the goal was to harness the strengths of computer vision algorithms by imaging the time series. Five different Machine Learning algorithms were trained, all having the same architecture (ResNet12) but different initializations. These algorithms were trained using S\&P500 companies' stock prices from 2010 until 2014. They were then tested on the S\&P500 index with data between 2015 and 2018.
On January 1st, 2016, the algorithms were each given \$1,000 to trade with. On December 31st, 2018, the money they had gained was compared:
\begin{itemize}
\item With the money they had at the beginning, inflation adjusted
\begin{itemize}
\item The inflation between 2016 and 2018 was 6.2\%. Hence, when beginning with \$1,000 on January 1st, 2016, we hope to end up with over \$1,062 on December 31st, 2018.
\end{itemize}
\item With the money they would have had had they just invested the \$1,000 in the index and then not touched it for 3 years
\begin{itemize}
\item The S\&P500 index was at 2,012.66 at the beginning of the period, and reached 2,506.85 at the end of it. This represents an increase of 24.6\%: a \$1,000 investment at the beginning of 2016 would be worth \$1,246 at the end of 2018.
\end{itemize}
\end{itemize}
For each initialization method, the results obtained will be detailed and compared with inflation and with the Buy\&Hold strategy.
Furthermore, it can be interesting to know, on a day-to-day basis, which investment is more profitable: Buy\&Hold, or those made by the five models. To evaluate that, I will use the Mean Percentage Error (MPE), which represents the average relative difference between the prediction and the real value. The MPE is defined as follows: let $p_0,...,p_T$ be the predicted values, and $t_0,...,t_T$ the target values. Then $MPE = \frac{1}{T+1} \sum\limits_{i=0}^{T} \frac{p_i - t_i}{t_i}$. In our case, $(p_i)$ represents the value of the investment made by the algorithm at time $i$, and $(t_i)$ represents the value of the index at the same moment. The higher the MPE, the more the investment algorithm beats (on average) the Buy\&Hold strategy.
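As a minimal illustration of this definition, the following Python/NumPy sketch (hypothetical names, not the exact evaluation code used here) computes the MPE from the two series of values:
\begin{verbatim}
import numpy as np

def mean_percentage_error(predicted, target):
    """Average relative difference between the algorithm's investment
    values p_0, ..., p_T and the index values t_0, ..., t_T."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((predicted - target) / target)

# Example: an investment permanently 10% below the index gives -0.10
# mean_percentage_error([900, 1080], [1000, 1200])  # -> -0.1
\end{verbatim}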
Before giving the return on investment and the MPE of each model, here is a quick review of the abilities of the classification algorithms. The average accuracy (the percentage of correct inferences) is 68.4\%. The precision (the percentage of times the algorithm is right when it predicts that the opening price increases more rapidly than the closing price) is 59.3\%. The recall (the percentage of days on which the opening price does increase more rapidly than the closing price that are correctly identified by the algorithm) is 72.5\%. Since the labels of these algorithms are rather special ("does the opening price increase more rapidly than the closing price?", and not "is the closing price higher than the opening price?"), interpreting the accuracy, precision and recall is rather difficult.
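As a reminder of how these metrics are computed, here is a minimal Python/NumPy sketch (hypothetical names; the actual evaluation code may differ), where the positive class is "the opening price increases more rapidly than the closing price":
\begin{verbatim}
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (0/1 arrays)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed positives
    accuracy = np.mean(y_pred == y_true)        # fraction of correct inferences
    precision = tp / (tp + fp)  # right when it predicts the positive class
    recall = tp / (tp + fn)     # fraction of actual positives it catches
    return accuracy, precision, recall
\end{verbatim}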
The different models are reviewed in the following subsections.
\subsection{Model 1: Random normal initialization}
On January 1st, 2016, the S\&P500 index was valued at 2,012.66. So as to compare my results with the index more easily, I have divided the index values by 2.01266. This way, the initial value of the index (1,000) is equal to the money initially invested (\$1,000), which makes the results easier to understand and interpret. In Figure \ref{fig:plot_model1} one can compare the evolution of the index with the investment made by Model 1 (with a random normal initialization).
\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{model1_sp.PNG}
\caption{Comparison between the investment made by Model 1 and a Buy\&Hold strategy from 2016 until 2018 - each grey dotted line corresponds to the beginning of a year}
\label{fig:plot_model1}
\end{figure}
On December 31st, 2018, the algorithm had turned its initial investment into \$1,125.38, which represents a 12.5\% increase. This is about twice the inflation (+6.2\%), but about half the index increase (+24.6\%). No money has been lost, not even inflation adjusted; however, the Buy\&Hold strategy would have been more profitable.
For Model 1, $MPE_1 = -10\%$: it means that, on average, the investment realized by Model 1 is worth 10\% less than the Buy\&Hold strategy at the same moment. We want the MPE to be as high as possible.
This first result is not very conclusive: the money obtained is considerably lower than with the Buy\&Hold strategy, not only at the end of the investment period but also on an average day. Still, the model has gained money (even when taking inflation into account).
\subsection{Model 2: Uniform initialization}
In Figure \ref{fig:plot_model2}, one can compare the evolution of the index with the investment made by Model 2 (with a uniform initialization).
\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{model2_sp.PNG}
\caption{Comparison between the investment made by Model 2 and a Buy\&Hold strategy from 2016 until 2018 - each grey dotted line corresponds to the beginning of a year}
\label{fig:plot_model2}
\end{figure}
On December 31st, 2018, the algorithm had obtained \$1,194.89, a 19.4\% return on its investment strategy. This is closer to the Buy\&Hold strategy (24.6\% return) than Model 1 (12.5\% return).
However, one can notice that from year 1 (the second dotted line in Figure \ref{fig:plot_model2}), there is very often a large gap between the algorithm's investment and the index. That is underlined by the MPE of Model 2: $MPE_2 = -8.86\%$, a figure which reflects the large gap of the two final years.
This second result is reassuring: the algorithm is much closer to the Buy\&Hold strategy at the end of the investing period. However, the large (negative) MPE shows that the success of the algorithm depends on the beginning and ending dates of the investment. We need models with more consistent results.
\subsection{Model 3: He normal initialization}
In Figure \ref{fig:plot_model3}, one can compare the evolution of the index with the investment made by Model 3 (with He normal initialization).
\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{model3_sp.PNG}
\caption{Comparison between the investment made by Model 3 and a Buy\&Hold strategy from 2016 until 2018 - each grey dotted line corresponds to the beginning of a year}
\label{fig:plot_model3}
\end{figure}
On December 31st, 2018, the algorithm had obtained \$1,169.94, which represents a 16.9\% return on investment. Even though this figure indicates a slightly less effective algorithm than Model 2 (19.4\%), looking at Figure \ref{fig:plot_model3} one can notice that the black and grey curves seem closer than in Figure \ref{fig:plot_model2}.
This is highlighted by the MPE of Model 3: $MPE_3 = -5.3\%$, which means that on average Model 3 is only around 5\% less effective than the Buy\&Hold strategy. A pattern seems to emerge from this plot: when the S\&P500 index decreases strongly, this model seems to manage to flatten the curve -- and even makes a small profit sometimes! The Buy\&Hold strategy seems to beat the model during long periods of increase: there, the model does not take enough advantage of the situation. This will be addressed in the \nameref{discuss}.
\subsection{Model 4: Orthogonal initialization}
In Figure \ref{fig:plot_model4}, one can compare the evolution of the index with the investment made by Model 4 (with an orthogonal initialization).
\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{model4_sp.PNG}
\caption{Comparison between the investment made by Model 4 and a Buy\&Hold strategy from 2016 until 2018 - each grey dotted line corresponds to the beginning of a year}
\label{fig:plot_model4}
\end{figure}
On December 31st, 2018, the algorithm had obtained \$1,263.174, which represents a 23.3\% return on investment. For the first time, one of the models obtains a result very similar to the Buy\&Hold strategy (24.6\%). It is interesting to note that on the trading day before December 31st, 2018, the Buy\&Hold strategy represented an 18.3\% return, whereas Model 4 enabled a 25.6\% return. Hence, around the end of this investment period, both strategies can be considered roughly equivalent.
When looking at Figure \ref{fig:plot_model4}, the black line seems even closer to the grey line than before. This is confirmed by the MPE: $MPE_4 = -4.2\%$. Once again, the pattern described in the previous subsection can be noticed: for instance in 2017 (between the two middle dotted grey lines), the S\&P500 index regularly increases whereas Model 4's investment stagnates. The same effect can be seen between trading days 600 and 700. However, between trading days 500 and 600, the index decreases a lot whereas Model 4's investment increases strongly. This happened again between trading days 730 and 780.
\subsection{Model 5: LeCun uniform initialization}
In Figure \ref{fig:plot_model5}, one can compare the evolution of the index with the investment made by Model 5 (with LeCun uniform initialization).
\begin{figure}[h!]
\centering
\includegraphics[width=10cm]{model5_sp.PNG}
\caption{Comparison between the investment made by Model 5 and a Buy\&Hold strategy from 2016 until 2018 - each grey dotted line corresponds to the beginning of a year}
\label{fig:plot_model5}
\end{figure}
On December 31st, 2018, the algorithm had obtained \$1,158.08, which represents a 15.8\% return on investment. This may seem poor compared with the Buy\&Hold strategy (24.6\%); however, on the trading day before December 31st, 2018, the Buy\&Hold strategy represented an 18.3\% return, whereas Model 5 enabled a 17.6\% return. Hence, this last model is not so far behind the Buy\&Hold strategy.
Furthermore, this model reaches the best MPE of all: $MPE_5 = 0.2\%$. It means that, on average, this model gives (admittedly very slightly) better results than the Buy\&Hold strategy. Indeed, during the first year and a half of the investment period, Model 5's investment remains continuously above the Buy\&Hold investment, until it is eventually overtaken.
\subsection{Overview of the different results}
No model has lost any money, not even inflation adjusted. The worst model had a return of 12.5\%, which is still twice as much as the inflation between 2016 and 2018. The best model reached a return (23.6\%) almost as high as the Buy\&Hold strategy would have (24.6\%).
However, it has been noticed that the comparison with the Buy\&Hold strategy depends a lot on the final day chosen. Hence, another metric has been introduced: the MPE. Once again, the worst model was worse than the Buy\&Hold strategy (-10\% on average). However, the best model was very similar to the Buy\&Hold strategy, and even very slightly better (+0.2\% on average)!
In the \nameref{discuss}, these results will be dissected, as well as compared with the research papers detailed in the \nameref{sect_littrev}.
\pagebreak
\section{Discussion}
\label{discuss}
\subsection{Achievements}
Beating the Buy\&Hold strategy is often a researcher's goal when trying to prove that their forecasting method is efficient. In this dissertation, five different models have been trained to forecast whether the S\&P500 index would increase or decrease the next trading day. Two of them in particular may be considered relevant forecasting techniques: Model 4, whose return on investment was 23.6\% (compared with 24.6\% for the Buy\&Hold strategy), and Model 5, whose Mean Percentage Error is 0.2\% (so, on average, equivalent to the Buy\&Hold strategy).
These figures show the Buy\&Hold strategy has not been \emph{properly} beaten; however, a similar return on investment has been reached. The subsection \nameref{sec:poss_imp} details ideas of modifications to (hopefully) improve the results obtained. Then a \nameref{sec:compare} is made. But first of all, an \nameref{sec:interpret} is carried out.
\subsection{Interpretation of the results}
\label{sec:interpret}
The investments made by all five models show a similar pattern: at first the model seems to beat the index, but sooner or later the index gets back on top of the investment. Two different hypotheses can explain this:
\begin{enumerate}
\item At the beginning of the investment period, the models have been trained on 5 years of data (2010-2014) and 200 different companies. During the testing phase, the models are only trained on 75 companies (day after day). This daily adjustment may not be enough, causing the algorithm's "knowledge" to degrade.
AND/OR
\item The models do better in periods of decrease. Here are a few moments when this is particularly obvious: in Figure \ref{fig:plot_model1}, at the beginning and at the end of year 3, the S\&P500 index declines, whereas the investment made by Model 1 is clearly successful. But during the middle of year 3, the model's investment remains flat -- unlike the index, which increases substantially (the same phenomena can be observed at the same moments in Figures \ref{fig:plot_model2} and \ref{fig:plot_model4}). In Figure \ref{fig:plot_model5}, between trading days 170 and 220, the index dips sharply whereas the model's investment clearly flourishes. However, when the index goes back up between trading days 220 and 300, the model's investment stagnates.
\end{enumerate}
So as to check whether these hypotheses are correct or not, different tests can be carried out:
\begin{enumerate}
\item During the testing phase, more companies could be used to train the model daily, so as to ensure a proper update of its knowledge. Furthermore, the learning technique used here is a mix of fixed and rolling forecasting, and the fixed learning period (training and validation data) represents barely 60\% of the whole dataset. Using more training data could change the forecasting ability of the models.
\item The testing period contains a mix of increasing and decreasing moments. Testing on periods with various economic situations (crisis or growth) could help evaluate whether the models' tendency to do better in periods of decrease is a real characteristic or just a coincidence.
\end{enumerate}
\subsection{Possible improvements}
\label{sec:poss_imp}
If the models really do better in crises (or at least in periods of decrease), then using them only at such times could be interesting. Many previous crises (1929, the oil shocks, 2001, 2008) hit the US first, then Europe. Hence, when a crisis begins on the other side of the Atlantic, the models presented in this dissertation could provide a winning trading strategy for some months.
If the models really do better in crises, then comparing them to the Buy\&Hold strategy may not be the best evaluation technique. Indeed, Buy\&Hold is a long-term investment strategy, and crises rarely last more than several months, or a few years at worst.
A lot of data covering many years are available for the S\&P500 index. In the context of this project, it was not possible to use all of them because it would have required very powerful GPUs -- or a very long running time. If more hardware were allocated to this project, using more data would be possible, and more complex deep learning algorithms could then be used without risking overfitting.
Another problem caused by the lack of a GPU is that the testing period was rather short for a long-term investment. Running my algorithms as they are was already taking a lot of time, so multiplying the testing period by 2 or 3 was not really possible. A powerful GPU would enable proper testing over a decade, for instance.
Only raw stock prices were analyzed in this dissertation, as an attempt to reject the weak form of the efficient market hypothesis. Should one want to focus on the semi-strong form of the hypothesis, including economic data could be a beginning. For instance, instead of applying three imaging techniques to the same series, one imaging technique could be applied to three radically different time series -- and the resulting images stacked as channels.
Finally, three imaging techniques have been used here. One could try variants of them (for instance, two versions of the GAF exist, or the wavelet transform could be used on the MTF) or even totally different techniques.
Here is a summary of the possible improvements:
\begin{itemize}
\item Using the models only when an economic crisis has been detected
\item Using a short-term investment strategy to evaluate the models OR testing them during a long-term period
\item Using more data
\item Using more complex algorithms (ResNet101 for instance, or even the new GPT-3!)
\item Using other data -- like economic data
\item Trying other imaging techniques
\end{itemize}
\subsection{Comparison with similar research}
\label{sec:compare}
In 2018, Berat Sezer and Ozbayoglu \cite{berat} tried using deep learning algorithms to classify images of stock prices. The two main differences between their work and this dissertation are that: 1) they used technical indicators to compute the images (and not just the stock prices), and 2) their classification algorithm had 3 classes (advising the trader to take a short, a long, or an idle action). They also tested their idea on various ETFs against the Buy\&Hold strategy, between 2007 and 2017. During this period, the Buy\&Hold annualized return was 4.65\%, and with their algorithm they reached 13.01\%. In this dissertation, the Buy\&Hold strategy had an annualized return of 7.6\%, whereas the suggested model had an annualized return of 7.3\%. Hence, Berat Sezer and Ozbayoglu's strategy was about twice as good as that of this dissertation in absolute value, or about three times as good when compared with the Buy\&Hold strategy.
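Assuming these annualized figures are obtained by geometric averaging of the cumulative return over the three-year test period of this dissertation, they follow from:
\[
r_{\text{annual}} = (1 + r_{\text{total}})^{1/3} - 1, \qquad \text{e.g. } (1 + 0.246)^{1/3} - 1 \approx 7.6\%.
\]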
In 2020, Barra et al. \cite{barra} also tried using deep learning algorithms to classify images of stock prices. The two main differences between their work and this dissertation are that: 1) they did not use the opening, closing, minimum and maximum stock prices to form their images, but closing prices with 1-hour, 4-hour, 8-hour and 24-hour lags, and 2) their images were only computed with the GAF, and not with three channels as in this dissertation. They tested their idea on the S\&P500 index against the Buy\&Hold strategy, between 2010 and 2015. During this period, the Buy\&Hold annualized return was 13.7\%, and with their algorithm they reached 21\% (or about 1.5 times more).
These two research papers, in which work very similar to this dissertation was carried out, show that a better return on investment than that of the Buy\&Hold strategy can be reached. The fact that results almost equivalent to Buy\&Hold were obtained here shows the idea behind this dissertation can lead to relevant trading techniques -- and, if improved, might reach a proper return on investment.
\pagebreak
\section{Conclusion}
\label{concl}
Time series forecasting has been a matter of interest for millennia, in many different fields, and trading is one of them. Many forecasting techniques have been developed, and over the past decades computers have been a common denominator of these techniques -- if not the only one.
Since computer vision has become increasingly powerful, a few researchers have tried to translate time series into images and then make inferences on them -- instead of on the raw figures. The idea is to harness the effectiveness of computer vision classification algorithms -- and this was the aim of this dissertation.
The most important part of this dissertation was finding relevant imaging techniques to visually highlight the characteristics of financial time series. Three techniques seemed particularly well regarded (the recurrence plot, the Gramian Angular Field, and the Markov Transition Field), so all three were used and stacked together as the channels of an RGB image.
A differentiating point of this dissertation was using the opening, closing, maximum and minimum prices of each trading day, whereas usually only the closing price is used. Indeed, these four values seemed to carry more information about a day than a single figure.
The classification algorithm used (ResNet12) was not as complex as those used more recently (ResNet101, for instance). Using a more powerful computer, with a GPU, seems the first thing to do to improve this dissertation: it would enable using much more data, and thus upgrading the classification algorithm without risking overfitting.
So as to overcome these difficulties, five different models were trained, all having the same architecture but different initialization methods (as suggested by Barra et al. \cite{barra}). This has given quite noticeable differences, with two algorithms giving returns similar to the Buy\&Hold strategy. However, it must be noted that Barra et al., as well as Berat Sezer and Ozbayoglu \cite{berat}, managed to obtain considerably better results than Buy\&Hold with comparable methods.
There is room for many improvements in this dissertation (see \nameref{sec:poss_imp}). However, the results achieved may be a sign that imaging time series for forecasting is a trading technique worth further investigation.
\pagebreak
\section{Bibliography}
\begin{thebibliography}{9}
\bibitem{meteo}
Wikipedia. Timeline of Meteorology. Retrieved on 02/08/2020 from \url{https://en.wikipedia.org/wiki/Timeline_of_meteorology}
\bibitem{campanharo}
Campanharo et al. 2011. \textit{Duality between Time Series and Networks}
\bibitem{arima}
Petrica, Stancu, Tindeche. 2016. \textit{Limitation of ARIMA models in financial and monetary economics}
\bibitem{conv_lstm}
Wan et al. 2019. \textit{Multivariate Temporal Convolutional Network: A Deep Neural Networks Approach for Multivariate Time Series Forecasting}
\bibitem{jastrzebska}
Agnieszka Jastrzebska. 2019. \textit{Time series classification through visual pattern recognition}
\bibitem{wang}
Wang and Oates. 2015. \textit{Imaging Time-Series to Improve Classification and Imputation}
\bibitem{barra}
Barra et al. 2020. \textit{Deep Learning and Time Series-to-Image Encoding for Financial Forecasting}
\bibitem{stationary}
Kwiatkowski et al. 1992. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? \textit{Journal of Econometrics}, 54(1-3), 159–178
\bibitem{thiel}
Thiel, Romano and Kurths. 2004. \textit{How much information is contained in a recurrence plot?}
\bibitem{senin}
Senin and Malinchik. 2013. \textit{SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model}
\bibitem{silva}
Silva, Souza and Batista. 2013. \textit{Time Series Classification Using Compression Distance of Recurrence Plots}
\bibitem{souza}
Souza, Silva and Batista. 2014. \textit{Extracting Texture Features for Time Series Classification}
\bibitem{zheng}
Zheng et al. 2014. \textit{Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks}
\bibitem{li}
Li, Kang and Li. 2020. \textit{Forecasting with time series imaging}
\bibitem{said}
Said and Erradi. 2019. \textit{Deep-Gap: A deep learning framework for forecasting crowdsourcing supply/demand gap based on imaging time series and residual learning}
\bibitem{wang_encod}
Wang and Oates. 2015. \textit{Encoding Temporal Markov Dynamics in Graph for Visualizing and Mining Time Series}
\bibitem{le}
Le et al. 2010. \textit{Tiled convolutional neural networks}
\bibitem{ouannes}
Ouannès. Precision and Recall. Retrieved on 08/08/2020 from \url{https://pouannes.github.io/blog/precision-recall/}
\bibitem{berat}
Berat Sezer and Ozbayoglu. 2018. \textit{Algorithmic Financial Trading with Deep Convolutional Neural Networks: Time Series to Image Conversion Approach}
\bibitem{500names}
Wikipedia. List of S\&P500 companies. Retrieved on 09/08/2020 from \url{https://en.wikipedia.org/wiki/List_of_S%26P_500_companies}
\bibitem{fastai_head}
Fastai. Vision.learner. Retrieved on 02/08/2020 from \url{https://docs.fast.ai/vision.learner.html#create_head}
\bibitem{choi}
Choi, Ryu and Kim. 2018. \textit{Short-Term Load Forecasting based on ResNet and LSTM}
\end{thebibliography}
\pagebreak
\section{Appendix}
\subsection{List of the S\&P500 companies used in this dissertation}
\label{505}
\subsubsection{For the training and validation}
\begin{longtable}[h!]{|c|c|c|}
\hline
\textbf{Symbol} & \textbf{Security} & \textbf{Less than 5 years of data} \\
& & (entered the S\&P after January 2010) \\
\hline \hline
A & Agilent Technologies Inc & \\ \hline
ABC & AmerisourceBergen Corp & \\ \hline
ADI & Analog Devices, Inc. & \\ \hline
AEP & American Electric Power & \\ \hline
AIZ & Assurant & \\ \hline
AMZN & Amazon.com Inc. & \\ \hline
AVY & Avery Dennison Corp & \\ \hline
BAC & Bank of America Corp & \\ \hline
BIIB & Biogen Inc. & \\ \hline
BMY & Bristol-Myers Squibb & \\ \hline
CAG & Conagra Brands & \\ \hline
CCL & Carnival Corp. & \\ \hline
CFG & Citizens Financial Group & x \\ \hline
CHD & Church \& Dwight & x \\ \hline