This project provides NLP as a service: a code base with both a front-end GUI (Streamlit) and a back-end server (FastAPI) for using Transformer models on various downstream NLP tasks.
The downstream NLP tasks covered:
- News Classification
- Entity Recognition
- Sentiment Analysis
- Summarization
Users can select different models from a drop-down menu to run inference.
They can also call the FastAPI back-end server directly for command-line inference.
- Python Code Base: Built using FastAPI and Streamlit, so the complete code base is in Python.
- Expandable: The back end is designed so that it can be extended with more Transformer-based models, which then become available in the front-end app automatically.
- Micro-Services: The back end uses a microservices architecture, with a Dockerfile for each service and Nginx as a reverse proxy in front of each independently running service.
  - This makes it easy to update, maintain, start, and stop individual NLP services.
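The reverse-proxy idea can be sketched in Python as a prefix-to-service lookup. This is a minimal sketch, not the project's Nginx configuration: the path prefixes come from the documentation URLs listed in this README, while the upstream names are hypothetical Docker Compose service names. As with Nginx prefix `location` blocks, the longest matching prefix wins; unlike plain Nginx prefix matching, this sketch also requires the prefix to end on a path-segment boundary.

```python
from typing import Optional

# Prefixes taken from the documentation URLs in this README; the upstream
# names on the right are hypothetical Docker Compose service names.
ROUTES = {
    "/api/v1/classification": "classification",
    "/api/v1/sentiment": "sentiment",
    "/api/v1/ner": "ner",
    "/api/v1/summary": "summary",
}


def route(path: str) -> Optional[str]:
    """Return the upstream service for a request path, or None if unrouted."""
    best: Optional[str] = None
    for prefix in ROUTES:
        # Match the prefix exactly or at a "/" boundary, e.g. ".../ner/docs".
        if path == prefix or path.startswith(prefix + "/"):
            # Longest matching prefix wins, as with Nginx prefix locations.
            if best is None or len(prefix) > len(best):
                best = prefix
    return ROUTES[best] if best is not None else None
```

Because each prefix maps to an independent upstream, one NLP service can be stopped or redeployed while the other routes keep working; requests to the stopped service's prefix fail at the proxy (typically with a 502 from Nginx).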
- Clone the repo.
- Run Docker Compose to spin up the FastAPI-based back-end service.
- Run the Streamlit app with the `streamlit run` command.
- Download the models.
  - Download the models from here.
  - Save them in the specific model folders inside the `src_fastapi` folder.
- Running the back-end service.
  - Go to the `src_fastapi` folder.
  - Run the `docker-compose` command:

  ```bash
  cd src_fastapi
  sudo docker-compose up -d
  ```
- Running the front-end app.
  - Go to the `src_streamlit` folder.
  - Build the Docker image from the Dockerfile.
  - Then run the Docker image to spin up a container:

  ```bash
  cd src_streamlit
  sudo docker build -t streamlit_app .
  sudo docker run -d -p 8501:8501 --network=src_fastapi_default --name streamlit_app streamlit_app
  ```

  - Alternatively, run the app directly with the `streamlit run` command:

  ```bash
  cd src_streamlit
  streamlit run NLPfily.py
  ```
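When the front end runs in its own container, `localhost` inside that container refers to the container itself, not to the host, which is why the `docker run` command above joins the `src_fastapi_default` network that Docker Compose creates for the back end. A minimal sketch of how the Streamlit app could pick its back-end address, assuming a hypothetical `BACKEND_URL` environment variable (the actual app may hard-code its URL):

```python
import os

# Default for `streamlit run` on the host, where the Nginx proxy is
# published on localhost:8080 (the port used by the docs URLs in this README).
DEFAULT_BACKEND = "http://localhost:8080"


def backend_url() -> str:
    """Base URL for back-end calls, overridable via a hypothetical BACKEND_URL.

    Inside the streamlit_app container, set BACKEND_URL to the proxy's Compose
    service name and container port instead of localhost.
    """
    return os.environ.get("BACKEND_URL", DEFAULT_BACKEND).rstrip("/")
```

For example, adding `-e BACKEND_URL=http://<proxy-service>:<port>` to the `docker run` command would point the containerized app at the proxy by its Compose service name; the actual service name and container port depend on the project's `docker-compose.yml`.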
- Access to the FastAPI documentation: since this is a microservice-based design, every NLP task has its own separate documentation:
- News Classification: http://localhost:8080/api/v1/classification/docs
- Sentiment Analysis: http://localhost:8080/api/v1/sentiment/docs
- NER: http://localhost:8080/api/v1/ner/docs
- Summarization: http://localhost:8080/api/v1/summary/docs
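For command-line inference, any HTTP client can call the services through the proxy on port 8080. The sketch below uses only the Python standard library. `docs_url` reproduces the documentation URLs listed above; the `/predict` route and the `{"text": ..., "model": ...}` request body in `build_request` are hypothetical placeholders, so check each service's `/docs` page for the real route and schema before using it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"
# Service path segments, as they appear in the documentation URLs above.
SERVICES = ("classification", "sentiment", "ner", "summary")


def docs_url(service: str) -> str:
    """Swagger UI page of one NLP service behind the proxy."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"{BASE_URL}/{service}/docs"


def build_request(service: str, text: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one inference call.

    The "/predict" route and the body fields are hypothetical placeholders.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    body = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{service}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would then be `json.loads(urllib.request.urlopen(build_request("sentiment", "Great product!", "distilbert"), timeout=30).read())`, once the route and body match the service's actual schema.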
- Front End: The front-end code is in the `src_streamlit` folder, along with the `Dockerfile` and `requirements.txt`.
- Back End: The back-end code is in the `src_fastapi` folder.
  - This folder contains a directory for each task: `classification`, `ner`, `summary`, etc.
  - Each NLP task has been implemented as a microservice with its own FastAPI server, requirements, and Dockerfile, so that the tasks can be maintained and managed independently.
  - Each NLP task has its own folder, and within it each trained model has its own folder. For example:
    - `sentiment`
      - `app`
        - `api`
          - `distilbert`
            - `model.bin`
            - `network.py`
            - tokenizer files
          - `roberta`
            - `model.bin`
            - `network.py`
            - tokenizer files
  - For each new model under a service, a new folder has to be added.
  - Each model folder needs the following files:
    - The model bin file (`model.bin`).
    - The tokenizer files.
    - `network.py`: defines the model class, if a customized model is used.
  - `config.json`: This file contains the details of the models in the back end and the datasets they were trained on.
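The folder checklist and `config.json` can be checked programmatically before starting a service. This is a hedged sketch: `model.bin` and `network.py` come from the list above, but the tokenizer file names are assumptions (they depend on the tokenizer that was saved), and reading display names from `config.json` is one plausible way a front end could fill its model drop-down, not necessarily how this app does it. The `config.json` layout follows the `classification` example in this README.

```python
import json
from pathlib import Path
from typing import List

# "model.bin" and "network.py" come from this README; these tokenizer file
# names are assumptions and depend on the tokenizer that was saved.
TOKENIZER_CANDIDATES = ("tokenizer.json", "vocab.txt", "vocab.json")


def missing_files(model_dir: Path, custom_class: bool = False) -> List[str]:
    """List what a model folder is still missing, per the checklist."""
    missing = []
    if not (model_dir / "model.bin").is_file():
        missing.append("model.bin")
    if not any((model_dir / name).is_file() for name in TOKENIZER_CANDIDATES):
        missing.append("tokenizer files")
    # network.py is only required when a customized model class is used.
    if custom_class and not (model_dir / "network.py").is_file():
        missing.append("network.py")
    return missing


def model_names(config_path: Path, task: str) -> List[str]:
    """Display names for one task's models, in the order listed in config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [entry["name"] for entry in config[task].values()]
```

Since `json.loads` preserves key order, the names come back in the order of the `model-1`, `model-2`, ... entries.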
- Fine-tune a Transformer model for the specific task. You can leverage the transformers-tutorials.
- Save the model files and tokenizer files, and also create a `network.py` script if you use a customized training network.
- Create a directory inside the NLP task's `app/api/` folder, using the model name as the directory name, and save all the files in this directory.
- Update `config.json` with the model details and dataset details.
- Update `<service>pro.py` with the correct imports and the condition under which the model is selected. For example, for a new BERT model in the classification task, do the following:
Create a new directory in
classification/app/api/. Directory namebert. -
Update
config.jsonwith following:"classification": { "model-1": { "name": "DistilBERT", "info": "This model is trained on News Aggregator Dataset from UC Irvin Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)" }, "model-2": { "name": "BERT", "info": "Model Info" } }
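  - Update `classificationpro.py` with the following snippets:
    - Imports:

    ```python
    from transformers import BertTokenizerFast  # tokenizer class used below
    from classification.bert import BertClass  # only if a customized class is used
    ```

    - In the section where the model is selected:

    ```python
    if model == "bert":
        self.model = BertClass()
        # self.path should point at the folder holding the tokenizer files
        self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
    ```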
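As more models are added, the `if model == ...` chain in `<service>pro.py` grows with each one. One alternative, sketched here under stated assumptions and not taken from this project, is a registry that maps each model's directory name to a loader function. The loader signature and the `load_bert` placeholder are hypothetical; a real loader would construct `BertClass()` and call `BertTokenizerFast.from_pretrained(path)` as in the snippet above.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical loader signature: model folder path -> (model, tokenizer).
Loader = Callable[[str], Tuple[Any, Any]]
LOADERS: Dict[str, Loader] = {}


def register(name: str) -> Callable[[Loader], Loader]:
    """Decorator that records a loader under the model's directory name."""
    def wrap(fn: Loader) -> Loader:
        LOADERS[name] = fn
        return fn
    return wrap


@register("bert")
def load_bert(path: str) -> Tuple[Any, Any]:
    # Placeholders so the sketch runs without transformers installed; a real
    # loader would return BertClass() and BertTokenizerFast.from_pretrained(path).
    return f"BertClass@{path}", f"BertTokenizerFast@{path}"


def load(model: str, path: str) -> Tuple[Any, Any]:
    """Look the model up in the registry instead of branching on its name."""
    try:
        loader = LOADERS[model]
    except KeyError:
        raise ValueError(f"no loader registered for model {model!r}") from None
    return loader(path)
```

With this design, adding a model means adding one decorated loader function rather than editing a shared conditional, and an unknown model name fails with a clear error.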
Images from https://pixabay.com (free for commercial use, no attribution required).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

