Explaining Drift in Text Data with Document Embeddings

This repository provides a software pipeline for explaining drift between two sets of documents using document embeddings.

Initial experiments indicate that BERT document embeddings outperform Doc2Vec document embeddings.
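This README does not say how drift is quantified. The following is an illustration only. It assumes the document embeddings for both sets (e.g. BERT or Doc2Vec vectors) are already available as NumPy arrays of shape (n_documents, embedding_dim). The sketch uses one simple, generic measure: the cosine distance between the two sets' mean embeddings. It also ranks the documents of the second set by their distance from the first set's mean, as a naive pointer to where drift comes from. This is a swapped-in technique and not necessarily what this pipeline does. The function names `centroid_cosine_drift` and `most_drifted_documents` are hypothetical.

```python
import numpy as np


def centroid_cosine_drift(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine distance between the mean embeddings of two document sets.

    Hypothetical helper (not part of this repository). Returns a value in
    [0, 2]: about 0 for identical mean directions, larger for stronger drift.
    """
    if emb_a.ndim != 2 or emb_b.ndim != 2 or emb_a.shape[1] != emb_b.shape[1]:
        raise ValueError("expected two 2-D arrays with the same embedding dimension")
    mean_a = emb_a.mean(axis=0)
    mean_b = emb_b.mean(axis=0)
    norm_product = np.linalg.norm(mean_a) * np.linalg.norm(mean_b)
    if norm_product == 0.0:
        raise ValueError("mean embedding has zero norm; cosine distance is undefined")
    return float(1.0 - (mean_a @ mean_b) / norm_product)


def most_drifted_documents(emb_a: np.ndarray, emb_b: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k documents in emb_b farthest (cosine) from emb_a's mean.

    Hypothetical helper; assumes no all-zero document vectors in emb_b.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    centroid = emb_a.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    row_norms = np.linalg.norm(emb_b, axis=1, keepdims=True)
    if np.any(row_norms == 0.0):
        raise ValueError("emb_b contains all-zero document vectors")
    distances = 1.0 - (emb_b / row_norms) @ centroid
    # Sort by descending distance and keep the top k document indices.
    return np.argsort(-distances)[:k]


# Usage with tiny 2-D "embeddings" standing in for real document vectors.
old_docs = np.array([[1.0, 0.0], [0.9, 0.1]])
new_docs = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
print(centroid_cosine_drift(old_docs, new_docs))
print(most_drifted_documents(old_docs, new_docs, k=2))  # [2 0]
```

A single scalar like this only signals that drift happened. Explaining it needs more, such as the per-document ranking above or model-based explanations.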

Documentation

Developer Information

  • Goal: reusable, complete, and documented code (useful for developers, reviewers, and everyone else)
  • If you add new classes, please provide minimal code examples, put them into the doc directory, and add a link to them in the Documentation section above.
  • Directories
    • doc: Documentation (e.g. how to read data)
    • experiments: Jupyter notebooks (e.g. combining class instances into a process that generates explanations)
    • transformation: Classes for data transformation (e.g. create embeddings, reduce dimensions)
    • access: Classes for data access (e.g. read or split embeddings)
    • explanations: Classes for the explanation process (e.g. handling ML models, generating explanations)
    • scripts: Small sets of commands (e.g. to synchronize repositories)
  • Naming conventions: follow PEP 8 - Style Guide for Python Code
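The guidelines above ask for a minimal code example with each new class. The following shows what such an example might look like for a class in the transformation directory ("reduce dimensions"). It reduces embedding dimensions with PCA, computed via NumPy's SVD. The class name `PcaReducer` and its fit/transform interface are assumptions for illustration and are not part of this repository. The naming follows PEP 8: CapWords for classes and snake_case for methods and attributes.

```python
import numpy as np


class PcaReducer:
    """Hypothetical transformation class: reduce embedding dimensions with PCA."""

    def __init__(self, n_components: int) -> None:
        if n_components < 1:
            raise ValueError("n_components must be at least 1")
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None

    def fit(self, embeddings: np.ndarray) -> "PcaReducer":
        # Center the data; the top right-singular vectors are the principal axes,
        # ordered by decreasing explained variance.
        self.mean_ = embeddings.mean(axis=0)
        _, _, vt = np.linalg.svd(embeddings - self.mean_, full_matrices=False)
        if self.n_components > vt.shape[0]:
            raise ValueError("n_components exceeds min(n_documents, embedding_dim)")
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, embeddings: np.ndarray) -> np.ndarray:
        if self.components_ is None:
            raise RuntimeError("call fit() before transform()")
        # Project the centered embeddings onto the principal axes.
        return (embeddings - self.mean_) @ self.components_.T


# Minimal usage example: 50 random vectors of BERT-base size (768) reduced to 2-D.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(50, 768))
reduced = PcaReducer(n_components=2).fit(embeddings).transform(embeddings)
print(reduced.shape)  # (50, 2)
```

Separating `fit` from `transform` lets you fit the reducer on one document set and project a second set into the same space. Whether the repository's own classes use this interface is not stated here.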

Acknowledgments

This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the project EML4U under grant no. 01IS19080B.