This Python project classifies patient data into four classes using two methods: Semantic Routing and ModernBert. It also provides tools for visualizing sentence embedding vectors with t-SNE and UMAP. Pass the -i flag to make the plots interactive.
Install dependencies using:
pip install -r requirements.txt

- main.py: The main script to classify data or visualize sentence vectors
- Examples.xlsx: Contains a few example patient records for the classification models
- args.py: Handles command-line arguments
- modules/SemanticRoute.py: Implements the Semantic Routing method
- modules/ModernBert.py: Implements the ModernBert method
- utils/t_SNE.py: Script for generating t-SNE visualizations
- utils/UMAP.py: Script for generating UMAP visualizations
- utils/bert_finetuning.py: Script for fine-tuning the ModernBert model
You can classify patient data into one of four classes using Semantic Routing or ModernBert. Run main.py with one of the following arguments.

To classify with Semantic Routing:
python main.py --semantic
- By default, the dataset is Classification_v1.json and the embedding model is sentence-transformers/all-MiniLM-L6-v2.
- To use a custom dataset or embedding model, add the arguments --dataset <path_to_dataset> or --embedding <embedding_model>.

To classify with ModernBert:
python main.py --bert
- Before using ModernBert, download the model weights from this link and place them in the Modeernbert_model folder.
- Both methods launch an interactive Gradio application where you can enter text and receive classification results.
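
The general idea behind routing by embedding similarity can be sketched as follows. This is a minimal illustration, not the code in modules/SemanticRoute.py: the class names, the toy 3-dimensional vectors, and the choice to score each class by its best-matching example are all assumptions made for the example. In practice the vectors would come from an embedding model such as sentence-transformers/all-MiniLM-L6-v2.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query: Vector, routes: Dict[str, List[Vector]]) -> Tuple[str, float]:
    """Return the label whose example vectors contain the closest match."""
    best_label, best_score = "", -1.0
    for label, examples in routes.items():
        # Assumption: a class is scored by its single most similar example.
        score = max((cosine(query, ex) for ex in examples), default=-1.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical classes with toy vectors standing in for sentence embeddings.
routes = {
    "class_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "class_b": [[0.0, 1.0, 0.0]],
    "class_c": [[0.0, 0.0, 1.0]],
    "class_d": [[0.0, 0.7, 0.7]],
}

label, score = route([0.8, 0.2, 0.0], routes)
print(label, round(score, 3))
```

A real router would typically also apply a similarity threshold so that inputs far from every class can be rejected instead of forced into the nearest one.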
To generate a t-SNE visualization of the sentence vectors:
python main.py --tsne -i
- To visualize a different dataset, add the argument --dataset <path_to_dataset>.
- -i or --interactive: Optional flag to enable interactive plotting.
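
For readers unfamiliar with the projection step, the sketch below shows how a set of sentence vectors can be reduced to two dimensions with scikit-learn's TSNE. It is an illustration under stated assumptions and not the contents of utils/t_SNE.py: random vectors stand in for real embeddings, the four integer labels are hypothetical, and the perplexity value is arbitrary.

```python
import numpy as np
from sklearn.manifold import TSNE

# Assumption: random vectors stand in for real sentence embeddings.
# 384 is the output size of sentence-transformers/all-MiniLM-L6-v2.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 384))

# Four hypothetical classes with 10 samples each, used only for coloring.
labels = np.repeat(np.arange(4), 10)

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10.0, init="pca", random_state=0)
points = tsne.fit_transform(embeddings)

print(points.shape)  # (40, 2)
```

The resulting 2-D points can then be drawn as a scatter plot colored by label, for example with matplotlib for a static image or a plotting library with hover support for an interactive view.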
Example t-SNE plot: see Visual_plots/tsne_plot.png.
To generate a UMAP visualization of the sentence vectors:
python main.py --umap -i
- To visualize a different dataset, add the argument --dataset <path_to_dataset>.
- -i or --interactive: Optional flag to enable interactive plotting.
Example UMAP plot: see Visual_plots/umap_plot.png.
Example commands:
python main.py --semantic
python main.py --tsne --dataset Classification_v1.json -i
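
The flags used in the commands above could be declared with argparse roughly as follows. This is a sketch and not the actual contents of args.py: the flag names and the two defaults come from this README, while the help texts and the decision to make the four modes mutually exclusive are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Classify patient data or visualize sentence vectors"
    )

    # Assumption: exactly one mode is selected per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--semantic", action="store_true", help="classify with Semantic Routing")
    mode.add_argument("--bert", action="store_true", help="classify with ModernBert")
    mode.add_argument("--tsne", action="store_true", help="plot sentence vectors with t-SNE")
    mode.add_argument("--umap", action="store_true", help="plot sentence vectors with UMAP")

    # Defaults as documented in this README.
    parser.add_argument("--dataset", default="Classification_v1.json")
    parser.add_argument("--embedding", default="sentence-transformers/all-MiniLM-L6-v2")
    parser.add_argument("-i", "--interactive", action="store_true", help="enable interactive plotting")
    return parser


# Equivalent to: python main.py --tsne --dataset Classification_v1.json -i
args = build_parser().parse_args(["--tsne", "--dataset", "Classification_v1.json", "-i"])
print(args.tsne, args.dataset, args.interactive)
```

Passing an explicit list to parse_args, as done here, is a convenient way to exercise the parser without touching the real command line.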
├── main.py # Main entry point
├── args.py # Argument parser
├── Examples.xlsx # Example dataset
├── modules/ # Contains Semantic Routing and ModernBert implementations
│ ├── SemanticRoute.py
│ └── ModernBert.py
├── utils/ # Utility scripts for visualizations and finetuning
│ ├── t_SNE.py
│ ├── UMAP.py
│ └── bert_finetuning.py
├── Datasets/ # Contains classification datasets
│ ├── Classification_v1.json
│ └── Classification_bert.json
├── Modeernbert_model/ # Contains ModernBert model and weights (download from the link before use)
└── Visual_plots/ # Contains t-SNE and UMAP plots (tsne_plot.png, umap_plot.png)
- Put custom datasets inside the Datasets folder to use them.
- Currently only Hugging Face embedding models are supported. Support could be extended to OpenAI embedding models or other third-party providers.
- For ModernBert, download the model weights from this link and put them in the Modeernbert_model folder before running the classification.

