This project is a language recommender system that suggests languages for users to learn based on the languages they already know. It is built with Python and uses a Variational Autoencoder (VAE) model implemented with PyTorch.
The project provides three main functionalities:
- Training: A script to train the VAE model on a dataset of user language skills.
- Recommendation: A command-line interface (CLI) and a FastAPI endpoint to get language recommendations.
- Hyperparameter Tuning: A script to automatically tune the hyperparameters of the model using Optuna.
To install the project and its dependencies, run the following command:
pip install -e .

To train the model with default hyperparameters, run the train.py script:
python scripts/train.py

This will train the VAE model and save the trained model to the models/ directory.
To train the model with a specific set of hyperparameters (e.g., from hyperparameter tuning), provide a path to a hyperparameter configuration file:
python scripts/train.py --hyperparameters best_hyperparameters.json

To get language recommendations from the command line, use the recommend.py script:
python scripts/recommend.py English Japanese

To run the FastAPI application, use the following command:
uvicorn nlmt_v2.api.main:app --reload --app-dir src

This will start a local server. You can then send a POST request to http://127.0.0.1:8000/recommend:
curl -X POST -H "Content-Type: application/json" -d '{
"known_languages": ["English", "Japanese"],
"top_k": 5
}' http://127.0.0.1:8000/recommend

To automatically tune the hyperparameters of the model, you can use the tune_hyperparameters.py script:
python scripts/tune_hyperparameters.py --n-trials 100

This will run 100 trials and save the best hyperparameters to a best_hyperparameters.json file. You can then use this file to train the model as described in the Training section.
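
As a rough illustration of how a tuned hyperparameter file could be consumed, the sketch below overlays values from a JSON file on a set of defaults. It is not the project's actual loader: the key names (latent_dim, learning_rate, epochs), their default values, and the load_hyperparameters helper are all hypothetical, and the project itself manages hyperparameters with Pydantic models rather than plain dictionaries.

```python
import json
from pathlib import Path

# Hypothetical defaults for illustration only; the real names and values
# are defined by the project's own configuration models.
DEFAULT_HYPERPARAMETERS = {"latent_dim": 16, "learning_rate": 1e-3, "epochs": 50}


def load_hyperparameters(path=None):
    """Return the defaults, overridden by values from a JSON file if given."""
    params = dict(DEFAULT_HYPERPARAMETERS)
    if path is not None:
        with Path(path).open(encoding="utf-8") as handle:
            overrides = json.load(handle)
        # Reject keys the defaults do not know about, so typos fail loudly.
        unknown = set(overrides) - set(params)
        if unknown:
            raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
        params.update(overrides)
    return params
```

Under these assumptions, calling load_hyperparameters("best_hyperparameters.json") would return the defaults with any tuned values applied, while calling it with no argument would return the defaults unchanged.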
- Project Structure: The project follows a standard Python project structure, with source code in the src/ directory and scripts in the scripts/ directory.
- Configuration: Hyperparameters are managed using Pydantic models in the src/nlmt_v2/config directory.
- Model Versioning: Trained models are saved with a timestamp to allow for versioning.
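
The timestamp-based versioning mentioned above can be sketched roughly as follows. This is only an assumption about how such a scheme might look: the file name pattern, the "vae" prefix, the .pt extension, and both helper functions are hypothetical and may differ from what the training script actually writes.

```python
from datetime import datetime, timezone
from pathlib import Path


def timestamped_model_path(models_dir="models", prefix="vae", now=None):
    """Build a path such as models/vae-20240102-030405.pt (hypothetical pattern)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(models_dir) / f"{prefix}-{stamp}.pt"


def latest_model(models_dir="models", prefix="vae"):
    """Return the newest matching model file, or None if there is none."""
    # With a zero-padded year-first timestamp, lexicographic order
    # matches chronological order, so a plain sort is enough.
    candidates = sorted(Path(models_dir).glob(f"{prefix}-*.pt"))
    return candidates[-1] if candidates else None
```

A year-first, zero-padded timestamp keeps file names sortable without parsing them, which is why latest_model can rely on a simple sort.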