GitHub - radu-ion/ToxicTagger

Installation

Install all requirements with:

pip3 install -r requirements.txt

Download the nltk English tokenizer:

python3 -c "import nltk; nltk.download('punkt')"

Download the English fastText word embeddings file from https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.vec.gz and save the file under folder data/ of this repository. I will automate this, eventually.

Running

Execute a test run as follows:

python3 offensive.py -r models/<latest model> test_file.txt

All output is printed to the sys.stdout file.

About

A simple LSTM-based neural network, written in PyTorch 1.10, that detects contiguous spans of "toxic text". By "toxic text" we understand insulting/threating/identity-based attacking/obscene phrases. Training data comes from the ``Semeval-2021 Task 5: Toxic Spans Detection'' competition. The link to the relevant paper is here: https://aclanthology.org/2021.semeval-1.6.pdf.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
models		models
runs		runs
.gitignore		.gitignore
README.md		README.md
dataset.py		dataset.py
offensive.py		offensive.py
requirements.txt		requirements.txt
test_file.txt		test_file.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Installation

Running

About

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Installation

Running

About

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages