My attempt to the 'Predict Future Sales' Kaggle competition : https://www.kaggle.com/competitions/competitive-data-science-predict-future-sales/data
First version of my data pre-processing. I've identified the most evident issues with this dataset such as sparsity & location variations.
Before going into the model training I have to be sure that I'm totally aware of how RNN works and how to implement them correctly. https://towardsdatascience.com/temporal-loops-intro-to-recurrent-neural-networks-for-time-series-forecasting-in-python-b0398963dc1f
Second version of my data pre-processing. One-hot encodagind of temporal features, daily time lag added.
Data seems to come from mutlimedia stores, meaning that product are DVDs, Blu-rays, video-games, computer, pc, etc. -> This has huge implications. In fact, GLOBAL sales may be following monthly/yearly trends but, when looking at the product level, sales are highly sparsed in time. For example, most sales for a video game are made right after the game comes out.
First attempt predicting future sales using Amazon DeepAR algorithm RMSE = 4.48
New attempt at implementing DeepAR using GluonTS library, it looks like that the implementation went well. Nevertheless, the results are not the expected one. The isssue may come from the random sampling, pushing to model to learn from 0 sales product from the testset.
I choose to focus on sales forecasting by product category (currently regardless of the shop)
It went pretty well, but lot of changes/upgrades are coming.
Here are some very interesting prediction for some relevant product categories:
