You can set up the environment using either venv or Conda.
Using venv:
- Clone the repository and navigate to the project directory.
- Create a virtual environment and activate it:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required packages:
    pip install pandas numpy matplotlib seaborn torch torchvision torchaudio scikit-learn transformers tqdm pyyaml flask
Using Conda:
- Clone the repository and navigate to the project directory.
- Create a Conda environment and activate it:
    conda create --name paper_classifier python=3.8
    conda activate paper_classifier
- Install the required packages using pip:
    pip install -r requirements.txt
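After installing with either option, a quick check that every package can be imported may save a failed run later. The sketch below is not part of the repository; it assumes the package list from the venv instructions and maps each pip name to its import name (for example, `scikit-learn` is imported as `sklearn` and `pyyaml` as `yaml`).

```python
import importlib.util

# pip package name -> module name used in `import` statements
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "scikit-learn": "sklearn",
    "transformers": "transformers",
    "tqdm": "tqdm",
    "pyyaml": "yaml",
    "flask": "flask",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]
```

Calling `missing_packages()` inside the activated environment returns an empty list when everything is installed; any names it returns can be passed to `pip install`.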
Ensure you have the following files in your project directory:
- `config.yaml`: Configuration file
- `cc_data.parquet`: Training data
- `cc_test.parquet`: Test data
- `requirements.txt`: List of required packages
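The file check can also be automated. This is a minimal sketch, not a script shipped with the project; it assumes the four files sit directly in the project directory, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the list above
REQUIRED_FILES = [
    "config.yaml",
    "cc_data.parquet",
    "cc_test.parquet",
    "requirements.txt",
]


def missing_files(project_dir=".", required=REQUIRED_FILES):
    """Return the required file names that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]
```

Running `missing_files()` from the project directory returns an empty list when all four files are in place.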
A pre-trained model is available for immediate use. You can download it from the following link:
After downloading, place the model file in the appropriate directory as specified in your config.yaml file.
- Run all the cells in the notebook.
- The trained model will be saved to the path specified in `config.yaml` (`bert_classifier.pth` by default).
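To confirm where the model file is expected, whether you trained it yourself or downloaded the pre-trained one, the path can be read from the configuration. The sketch below is an illustration only: the key name `model_path` is a hypothetical placeholder (check `config.yaml` for the real key), and the fallback to `bert_classifier.pth` assumes the default is applied when the key is absent.

```python
from pathlib import Path

DEFAULT_MODEL_FILE = "bert_classifier.pth"  # default name stated in this README


def load_config(path="config.yaml"):
    """Parse the YAML configuration into a dict (empty dict if the file is empty)."""
    import yaml  # PyYAML, installed during setup

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def resolve_model_path(config, key="model_path"):
    """Return the configured model path, or the default file name.

    `key` is a hypothetical name used for illustration.
    """
    return Path(config.get(key) or DEFAULT_MODEL_FILE)
```

For example, `resolve_model_path(load_config()).is_file()` tells you whether the model is already where the configuration points.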
- Start the Flask server:
    python app.py
- Open a web browser and go to http://127.0.0.1:5000 (you can Ctrl+click this link in most consoles).
- Upload a `.parquet` file containing scientific paper data.
- The server will process the file and return a `predictions.parquet` file with the classification results, which is saved directly to your downloads folder.
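If the page does not load in the browser, it can help to check whether the server is listening at all. The following sketch uses only the standard library and assumes the default address shown above; `server_is_up` is a hypothetical helper, not something `app.py` provides.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5000", timeout=2.0):
    """Return True if an HTTP server answers at url, even with an error status."""
    # Bypass any system proxy settings, since the server runs on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening.
        return False
```

A result of `False` usually means `python app.py` is not running or is bound to a different port.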