This is our group's submission for the KANTAR-NUS DACC Data Science Hackathon 2021, in which our group also achieved first place.
Data Science Techniques:
- Exploratory Data Analysis
- K-Means Clustering
- Feature Extraction
- Feature Engineering
- Market Basket Analysis
- Recommender System
- Natural Language Processing (NLP)
The main objective of the hackathon is to encourage the population to minimise their spending on packaged foods while maintaining an optimal level of calorie consumption based on their previous habits. In other words, the aim is to promote healthier purchasing and consumption habits.
To achieve this, our group’s sub-objectives are as follows:
- To identify different consumer segments based on their personal details and overall lifestyle, such as spending habits and media consumption
- To identify suitable media channels through which we can maximise our campaign’s media outreach
- To incentivise and encourage different consumer segments to continually purchase healthier substitutes
- To design a mobile application that uses a recommendation system to help the different consumer segments purchase healthier substitutes
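The consumer segmentation sub-objective relies on K-Means Clustering. The following is a minimal sketch of plain Lloyd's algorithm in pure Python, assuming two hypothetical features per consumer (weekly spend on packaged food and daily hours of media consumption). The feature names and numbers are illustrative only; the team's actual features and implementation are not described here.

```python
import math
import random

# Hypothetical toy features per consumer:
# (weekly spend on packaged food, daily hours of media consumption).
# Illustrative values only, not taken from the hackathon dataset.
consumers = [
    (12.0, 1.0), (15.0, 1.5), (14.0, 0.5),
    (60.0, 4.0), (65.0, 5.0), (58.0, 4.5),
]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(consumers, k=2)
```

In practice, features on different scales (here, dollars versus hours) would normally be standardised first so that one feature does not dominate the distance.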
More importantly, our overall plan focuses on a long-term implementation period because we believe that adopting a healthier lifestyle takes time. With this approach, we aim to ensure that the effects of our measures are not temporary and that they genuinely change consumers' mindsets and lifestyles.
As illustrated in our architecture flowchart, items in our dataset are passed into the Bidirectional Encoder Representations from Transformers (BERT) model to find semantically similar items. By converting the items into word embeddings, we reduce the problem to calculating the cosine similarity between two embeddings to decide whether the corresponding items are semantically similar.
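The cosine similarity step can be sketched in pure Python as follows. The 3-dimensional vectors are hypothetical stand-ins chosen for illustration; real BERT embeddings (for example from bert-base) have 768 dimensions, and how the team extracted them is not described here.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional embeddings, for illustration only.
embeddings = {
    "spaghetti": [0.9, 0.8, 0.1],
    "rice": [0.85, 0.75, 0.2],
    "honey": [0.1, 0.2, 0.9],
}


def most_similar(query, embeddings):
    """Return (name, score) for the other item whose embedding is most similar to the query's."""
    q = embeddings[query]
    candidates = (
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    )
    return max(candidates, key=lambda pair: pair[1])


best_name, best_score = most_similar("spaghetti", embeddings)  # "rice" with these toy vectors
```

Cosine similarity compares only the direction of the vectors, not their length, which is why it is a common choice for comparing embeddings.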
To illustrate how this works, we plotted the word embeddings in 3D space.
Video: Words in 3D Space (Words.in.3D.Space.mp4)
The main motivation for this method is to provide reasonable substitutes for a given food automatically, instead of identifying them manually. From this animation, we can see that the Spaghetti-Rice pair is much closer together in 3D space than the Honey-Sugar pair. Using this property, our architecture provides sensible and healthier options by filtering candidates through this NLP layer and by rewarding or penalising each food based on its nutritional content.
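One way to combine the NLP filter with a nutrition-based reward or penalty is sketched below. This is a hypothetical scoring scheme, not the team's actual formula: the `health_score` values, the `min_similarity` threshold and the `weight` parameter are all made-up assumptions for illustration.

```python
# Hypothetical inputs: similarity of each candidate to the query item (from the NLP layer)
# and a made-up health score per item (higher is healthier).
similarity_to_query = {"brown rice": 0.93, "white rice": 0.95, "chocolate bar": 0.12}
health_score = {"spaghetti": 0.5, "brown rice": 0.8, "white rice": 0.4, "chocolate bar": 0.1}


def rank_substitutes(query, similarity_to_query, health_score, min_similarity=0.8, weight=1.0):
    """Keep semantically similar candidates, then reward healthier ones and penalise less healthy ones."""
    base = health_score[query]
    scored = []
    for item, sim in similarity_to_query.items():
        if sim < min_similarity:
            continue  # NLP filter: drop items that are not plausible substitutes
        # Positive adjustment if the candidate is healthier than the query, negative otherwise.
        adjustment = weight * (health_score[item] - base)
        scored.append((item, sim + adjustment))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_substitutes("spaghetti", similarity_to_query, health_score)
# Brown rice ranks above white rice; the chocolate bar is filtered out by the similarity threshold.
```

Filtering before scoring keeps the nutrition term from promoting a healthy but unrelated item as a "substitute".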
- Lua Jun An
- Keith Tay Xiang Rui
- Tu Zhehao
- Timothy Wong Hoey Pheen
- Ahmad As-Shodiqqul Amin
- Dione Lim Yee Sze
- Kellie Chin Shu Wen
- Ni Hui Ling
