Punctuation API

Installation

git lfs install
git clone https://github.com/eleldar/Punctuation.git
cd Punctuation
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
cd models
git clone https://huggingface.co/eleldar/rubert-base-cased-sentence
git clone https://huggingface.co/eleldar/repunct-model_ft repunct-model_ft/weights/

Usage

(venv)$ python main.py

Then open http://127.0.0.1:8000/docs in a browser.
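The `/docs` path suggests that the service publishes an OpenAPI schema, as FastAPI applications do by default at `/openapi.json`. This README does not state that, so treat it as an assumption. Under that assumption, a minimal sketch for listing the available endpoints from a script could look like this (the helper names `list_endpoints` and `fetch_schema` are illustrative, not part of the repository):

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


def fetch_schema(base_url="http://127.0.0.1:8000"):
    """Download the schema; assumes it is served at /openapi.json."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Usage, with the server from `python main.py` running:
#     for method, path in list_endpoints(fetch_schema()):
#         print(method, path)
```

Keeping the parsing separate from the download makes the listing logic usable on a saved schema file as well.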

How it works

Raw text must be tokenized before it is passed to the model. The library handles this with BaseDataset.parse_tokens.
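To make the preprocessing idea concrete, here is a minimal sketch that turns punctuated text into (token, label) pairs, where each label names the mark that follows the token. It is an illustration only and not the implementation of `BaseDataset.parse_tokens`; the function name `text_to_pairs` and the label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) are assumptions.

```python
import re

# Hypothetical label set; "O" means no punctuation follows the token.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def text_to_pairs(text):
    """Pair each lowercased word with the label of the mark that follows it."""
    pairs = []
    # \w+ matches a word (Unicode-aware, so Cyrillic works);
    # [,.?] matches a single supported punctuation mark.
    for piece in re.findall(r"\w+|[,.?]", text):
        if piece in PUNCT_TO_LABEL:
            if pairs:  # attach the mark to the preceding word
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT_TO_LABEL[piece])
        else:
            pairs.append((piece.lower(), "O"))
    return pairs


print(text_to_pairs("Привет, как дела?"))
```

A BERT-based pipeline would normally split these words further into subword tokens before feeding them to the model; that step is omitted here.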

The model architecture is straightforward:

BERT layer - DeepPavlov/rubert-base-cased-sentence language model
Bi-LSTM layer - to reduce dimensionality
Linear layer - final layer that predicts which symbol should follow each token
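Since the final layer predicts one symbol per token, the model output can be read as a sequence of labels aligned with the input tokens. The sketch below shows how such labels could be turned back into punctuated text. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) and the function `restore_punctuation` are hypothetical; the repository's actual label set and post-processing may differ.

```python
# Hypothetical mapping from predicted label to the symbol placed after a token.
LABEL_TO_PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def restore_punctuation(tokens, labels):
    """Append the predicted symbol to each token and join the result with spaces."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    pieces = [token + LABEL_TO_PUNCT[label] for token, label in zip(tokens, labels)]
    return " ".join(pieces)


print(restore_punctuation(["привет", "как", "дела"], ["COMMA", "O", "QUESTION"]))
```

The sketch leaves casing untouched: it only inserts symbols, which matches the per-token prediction described above.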

Links

Article on habr.ru

This repository contains code (which was edited for production purposes) from xashru/punctuation-restoration.
