This repository contains the codebase for Project 2 of the subject COMP90051 – Statistical Machine Learning, Semester 1, 2025, at the University of Melbourne. This project is part of a group submission for Group 28.
Team Members:
- Ankita Holey
- Richardo Husni
- Sarathi Thirumalai Soundararajan
The objective of this project is to build a classification model to distinguish between machine-generated and human-written text, leveraging domain adaptation and imbalance handling techniques. The project was conducted as part of a Kaggle competition outlined in the Project Specification PDF, which includes additional details on data format, performance evaluation, and submission criteria.
If you haven't already cloned the repository, run:
git clone https://github.com/sthirumalais/COMP90051-A2.git
cd COMP90051-A2

Ensure you have Conda (Anaconda or Miniconda) installed, then create the environment:
conda env create -f environment.yml

This creates a Conda environment named p2 with all required dependencies.
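
To confirm that the environment was created, the output of `conda env list --json` can be inspected. The following is a minimal sketch, assuming the environment is named p2 as stated above; the helper names `env_exists` and `list_envs_json` are illustrative and are not part of this repository.

```python
import json
import subprocess

ENV_NAME = "p2"  # environment name stated in this README


def env_exists(env_list_json: str, name: str = ENV_NAME) -> bool:
    """Return True if an environment directory called `name` is listed.

    `env_list_json` is the text printed by `conda env list --json`,
    which contains an "envs" list of environment paths.
    """
    paths = json.loads(env_list_json).get("envs", [])
    for path in paths:
        # Normalise Windows separators, then take the last path component.
        last_part = path.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
        if last_part == name:
            return True
    return False


def list_envs_json() -> str:
    """Run `conda env list --json` and return its raw output (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `env_exists(list_envs_json())` from any Python session should then report whether the environment is present.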
Activate the newly created environment:
conda activate p2

Register the Jupyter kernel:
python -m ipykernel install \
    --user \
    --name COMP90051-P2 \
    --display-name "COMP90051-P2"

Verify that the kernel COMP90051-P2 is available:
jupyter lab
# or
jupyter notebook

Select the COMP90051-P2 kernel when prompted.
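
The kernel registration can also be checked without opening a browser, by parsing the output of `jupyter kernelspec list --json`. This is a hedged sketch rather than part of the project: the function names are hypothetical, and the comparison is case-insensitive on the assumption that Jupyter may list kernel names in lowercase.

```python
import json
import subprocess

KERNEL_NAME = "COMP90051-P2"  # value passed to --name during registration


def kernel_is_registered(kernelspec_json: str, name: str = KERNEL_NAME) -> bool:
    """Return True if `name` is among the kernels in the JSON listing.

    `kernelspec_json` is the text printed by `jupyter kernelspec list --json`,
    which maps kernel names to their specs under the "kernelspecs" key.
    """
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    # Compare case-insensitively (assumption: names may be stored lowercased).
    return name.lower() in {key.lower() for key in specs}


def list_kernelspecs_json() -> str:
    """Run `jupyter kernelspec list --json` and return its raw output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the p2 environment active, `kernel_is_registered(list_kernelspecs_json())` should return `True` once the registration step has succeeded.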
To ensure reproducibility and correctness, run the notebooks in the following order:
- Pre-Processing.ipynb
- Feature-Engineering.ipynb
- Then run any of the model notebooks as needed.

- Ensure your kernel is set to COMP90051-P2 in each notebook.
- If you encounter issues with dependencies, try updating Conda and re-creating the environment.
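
The run order above could also be scripted with `jupyter nbconvert`, which can execute a notebook from the command line. The sketch below is one possible approach under stated assumptions: it covers only the two notebooks named above (the model notebooks are left out because their file names are not given here), it assumes the notebooks sit in the current directory, and `build_command` and `run_all` are hypothetical helpers.

```python
import subprocess

# Order taken from the list above; model notebooks are intentionally omitted.
NOTEBOOKS_IN_ORDER = ["Pre-Processing.ipynb", "Feature-Engineering.ipynb"]
KERNEL_NAME = "COMP90051-P2"  # kernel registered during setup


def build_command(notebook: str, kernel: str = KERNEL_NAME) -> list:
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",  # overwrite the notebook with its executed version
        f"--ExecutePreprocessor.kernel_name={kernel}",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS_IN_ORDER) -> None:
    """Execute each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)
```

Because `check=True` raises on a non-zero exit code, a failure in Pre-Processing.ipynb prevents Feature-Engineering.ipynb from running on stale inputs.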
Happy coding!