This project uses DistilBERT, a lightweight version of the popular BERT model, to classify sentences as either active or passive. The model is fine-tuned on a small dataset of sentence examples and can predict the grammatical voice of any given sentence.
- Text Classification: Classifies sentences into two categories: Active or Passive.
- Transformer-based Model: Utilizes DistilBERT for high-performance NLP.
- Minimal Data: Fine-tuned with a small dataset, leveraging the power of transfer learning.
- Fast Inference: Thanks to the lightweight DistilBERT architecture.
Google Colab Link: Active/Passive Sentence Classifier
Hugging Face: ActiveVoice_PassiveVoice_Classifier
You can install the required libraries using pip:
pip install transformers tensorflow datasets numpy pandas matplotlib scikit-learn

The model is fine-tuned on a custom dataset with sentences labeled as active or passive. Below are the steps to fine-tune the DistilBERT model:
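The training data is a small set of sentences paired with voice labels. The actual dataset is not shown in this README; a minimal sketch of what the inputs might look like (these example sentences and the 0/1 label scheme are illustrative assumptions):

```python
# Illustrative toy dataset -- the real training data is not included here.
texts = [
    "The chef cooked the meal.",         # active
    "The meal was cooked by the chef.",  # passive
    "She wrote the report.",             # active
    "The report was written by her.",    # passive
]

# Integer labels suit sparse categorical cross-entropy: 0 = active, 1 = passive.
label2id = {"active": 0, "passive": 1}
labels = [0, 1, 0, 1]
```

With transfer learning, even a few hundred such pairs can be enough, since DistilBERT already encodes general English syntax.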
The input sentences are tokenized using the DistilBERT tokenizer, which converts them into a format that can be processed by the model. The tokenizer handles padding and truncation to ensure uniform input length.
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
inputs = tokenizer(texts, return_tensors="tf", padding=True, truncation=True, max_length=128)

The DistilBERT model is loaded and prepared for fine-tuning. It is instantiated with a classification head for sequence classification.
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

The model is compiled with the Adam optimizer and the sparse categorical cross-entropy loss function, since this is a classification task.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

Note that the loss is configured with from_logits=True because the classification head outputs raw logits, not probabilities. Once the model is fine-tuned and saved, you can use it to classify new sentences as active or passive.
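Classifying a new sentence is a tokenize, predict, argmax pipeline. A minimal sketch, assuming the fine-tuned model and tokenizer from above are already loaded (the `predict_voice` helper and the 0 = active / 1 = passive mapping are illustrative, not part of the saved model):

```python
import numpy as np

# Assumed class order from training: index 0 = active, index 1 = passive.
ID2LABEL = {0: "active", 1: "passive"}

def logits_to_label(logits):
    """Return the name of the higher-scoring class from a 1-D pair of logits."""
    return ID2LABEL[int(np.argmax(logits))]

def predict_voice(model, tokenizer, sentence):
    """Tokenize one sentence and classify it as 'active' or 'passive'."""
    inputs = tokenizer(sentence, return_tensors="tf",
                       padding=True, truncation=True, max_length=128)
    logits = model(inputs).logits.numpy()[0]  # raw scores, shape (2,)
    return logits_to_label(logits)
```

For example, `predict_voice(model, tokenizer, "The ball was thrown by John.")` would be expected to return "passive" once the model is trained.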
Prayas Jadhav