Project submission by Michael Janschek for Data Stream Processing class in University
Contents:
- Project description
- Requirements
- Getting started
The project description is available as the notebooks tweetAnalysis.ipynb, calcPredictions.ipynb and evalPredictions.ipynb.
Also, there are rendered versions in the doc directory.
The following programs and libraries must be installed and working to run this project and the respective part of the application.
- JDK 8
- Eclipse with Gradle Plugin
- Personally, I used "Buildship Gradle Integration" which can be installed via the Eclipse Marketplace
- Spark 2.2.0
- JDK 8
- Spark 2.2.0
- Python 3.5
- python libraries:
- pandas
- numpy
- datetime
Same requirements as for the Python script, plus:
- matplotlib (for plots)
- Jupyter
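
Before running the Python parts, it can help to confirm that the listed libraries are importable. The following is a minimal sketch, not part of the project: the helper name `missing_modules` is hypothetical, and checking for `jupyter_core` as a stand-in for a Jupyter installation is an assumption. It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib.util

# Libraries listed in the requirements above; datetime ships with Python.
SCRIPT_MODULES = ["pandas", "numpy", "datetime"]
# Extras for the notebooks; "jupyter_core" is an assumed proxy for Jupyter.
NOTEBOOK_MODULES = ["matplotlib", "jupyter_core"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("script", SCRIPT_MODULES), ("notebooks", NOTEBOOK_MODULES)):
        missing = missing_modules(names)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(label + ": " + status)
```

Any module reported as missing can usually be installed with pip or your distribution's package manager.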
There are demo videos accessible in demo/videos.
- Unpack %PROJECT_DIR%/release_0.1.zip into a directory; this is now your %WORKING_DIR%
- Execute %WORKING_DIR%/startTweetStream.sh
- Open Eclipse
- Import this project as a Gradle project
- Use the Gradle plugin to build this project
- All built files should now be in %PROJECT_DIR%/target
- Edit %PROJECT_DIR%/src/main/resources/application.properties
- Set WORKINGDIR to your desired working directory
- Set boolean options as desired
- You CAN change hashtags and filters, BUT this is not recommended because the code uses constant hashtags
- You can run this project inside Eclipse as a Java Application.
- You can also build this project and submit the built file to Spark via the script startTweetStream.sh.
- Edit %PROJECT_DIR%/target/tweetAnalysis.py
- Set workDir to your desired working directory and make sure the CSV files from %PROJECT_DIR%/data are there
- Run the Python script in this directory
- python3 tweetAnalysis.py
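
If the script cannot find its input, a quick check of the working directory may help. This is a small sketch under assumptions: `find_csv_files` is a hypothetical helper that is not part of tweetAnalysis.py, and the `"."` placeholder stands for whatever value you set for workDir.

```python
import glob
import os


def find_csv_files(work_dir):
    """Return the sorted names of the CSV files located directly in work_dir."""
    pattern = os.path.join(work_dir, "*.csv")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    work_dir = "."  # placeholder: replace with the value you set for workDir
    csv_files = find_csv_files(work_dir)
    if csv_files:
        print("Found CSV files: " + ", ".join(csv_files))
    else:
        print("No CSV files in " + os.path.abspath(work_dir))
```

If no files are listed, copy the CSV files from %PROJECT_DIR%/data into the working directory before running the script.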
- Open the terminal and navigate to %PROJECT_DIR%/target
- Run the command
- jupyter notebook
- By default, your web browser will open the Jupyter GUI
- Open the notebooks:
- tweetAnalysis.ipynb
- calcPredictions.ipynb
- evalPredictions.ipynb
- If desired, run the notebooks step by step