- Cleanup: delete the following files to ensure that they are recomputed:
  - `html_corpus.txt.gz`: contains the extracted training XPaths
  - `html_vocabulary.cvs.gz`: the vocabulary-to-id mapping
  - `html_corpus.bin.gz`: the binary version (translated using the vocabulary) of the HTML XPath corpus
- Generate a file with the XPath representations using `generate-html-corpus-texts.py`. The corresponding XPaths are stored in `html_corpus.txt.gz`.
- Use `triinput.py` to generate (a) the HTML vocabulary file and (b) the HTML corpus file, as well as the corresponding training corpora.
- Use `trilearn.py` for training the embeddings.
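
The cleanup step can be scripted. The following is a minimal sketch, not part of the repository: the helper name `remove_generated_files` is hypothetical, and it assumes that the three generated files live in the directory you pass in (the current working directory by default). Files that do not exist are skipped.

```python
from pathlib import Path

# File names as listed in the cleanup step of this README.
GENERATED_FILES = (
    "html_corpus.txt.gz",
    "html_vocabulary.cvs.gz",
    "html_corpus.bin.gz",
)


def remove_generated_files(directory="."):
    """Delete the generated files found in `directory`.

    Hypothetical helper (not part of the repository). Returns the
    names of the files that were actually removed.
    """
    removed = []
    for name in GENERATED_FILES:
        path = Path(directory) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    for name in remove_generated_files():
        print(f"removed {name}")
```

Running the script before the generation steps should force all three files to be rebuilt; any other files in the directory are left untouched.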
fhgr/biLSTM