GitHub - megha-66/DistilBERT-Model: DistilBERT Model for sequence classification. Into Transformers these days!

Problem Statement :-

In this work, we fine-tune a DistilBERT model on a drilling logs dataset from ONGC. The goal is to automate event extraction. The input variable is the "Service hour type", which is a text attribute. The output, i.e. the variable to be predicted, is the "Cde" (code) value, which is alphanumeric.

Steps followed :-

Converting the given dataset from Excel format to a Python dataframe object.
Dropping all null values.
Preprocessing the text of the input column, i.e. the "Service hour type".
Discarding the very rare instances of input, because keeping them in the training set would lead to a class imbalance issue. (The threshold for the number of instances is set to 10.)
Label encoding the output/target feature.
Splitting the dataset into train and test sets in a 7:3 ratio, using stratified splitting.
Tokenizing the train & test texts to obtain encodings.
Forming PyTorch datasets.
Training the model for 30 epochs.
Evaluating the training loss & accuracy.
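
The data preparation steps above (dropping nulls, discarding rare instances, label encoding and the stratified 7:3 split) can be sketched as follows. This is only a minimal, dependency-free illustration and not the code from the notebook. The function name `prepare`, the lowercasing used as "preprocessing", and the reading of the threshold as "keep codes with at least 10 examples" are all assumptions made for the sketch.

```python
import random
from collections import Counter, defaultdict

MIN_INSTANCES = 10  # threshold from the steps above (assumed to mean "at least 10")


def prepare(rows, test_ratio=0.3, seed=0):
    """Hypothetical helper: rows is a list of (text, code) pairs.

    Returns (train, test, label_to_id), where train and test are lists of
    (text, label_id) pairs.
    """
    # Drop rows with a missing text or code, then normalise the text.
    # Stripping and lowercasing is an assumed stand-in for the real preprocessing.
    clean = [(t.strip().lower(), c) for t, c in rows if t and c]

    # Discard codes that occur fewer than MIN_INSTANCES times.
    counts = Counter(c for _, c in clean)
    kept = [(t, c) for t, c in clean if counts[c] >= MIN_INSTANCES]

    # Label-encode the remaining codes as consecutive integers.
    label_to_id = {c: i for i, c in enumerate(sorted({c for _, c in kept}))}

    # Stratified split: shuffle and split each class separately so that
    # both sets keep roughly the same class proportions.
    by_class = defaultdict(list)
    for t, c in kept:
        by_class[c].append((t, label_to_id[c]))

    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(by_class):
        items = by_class[c]
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test, label_to_id
```

Calling `prepare(rows)` on a list of `(text, code)` pairs returns the two splits and the code-to-integer mapping. The texts in each split would then be passed to the tokenizer, and the integer labels used as classification targets.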

