Ideas

Priority 1

  • Use a morphological analyzer for preprocessing before building the bag-of-words (BoW) representation.
  • Use n-grams at the character, word, and syntactic (POS n-gram) levels.
  • Use TF-IDF or TF-IGF
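A minimal sketch of the n-gram and TF-IDF items in plain Python with no external libraries. It assumes texts are already tokenized (or POS-tagged, for POS n-grams). The function names are hypothetical, the IDF is the unsmoothed log(N / df), and TF-IGF is not covered.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Overlapping word n-grams; pass POS tags instead of words for POS n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: list[list]) -> list[dict]:
    """Weight each document's n-gram counts by TF-IDF.

    TF = count / document length; IDF = log(N / df), without smoothing.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat"]
weights = tf_idf([char_ngrams(d, 3) for d in docs])
print(weights[0]["cat"])  # (1/9) * ln 2, about 0.077
```

N-grams that occur in every document (here "the" and "sat") get weight 0, which is why TF-IDF favors distinctive features. For real experiments, scikit-learn's `TfidfVectorizer` (e.g. with `analyzer="char"`) adds smoothing and normalization.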

Priority 2

  • Clustering at the word level and the phrase level.
  • Comparing topic distributions (LDA, BERTopic).
  • Trying syntactic language models (probabilistic context-free grammars).
  • Comparing sentiment.
  • Clustering minima and maxima using vector semantics.
  • Experimenting with attention.
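One way to compare topic distributions is the Jensen-Shannon divergence. A minimal sketch, assuming each document (or author) has already been mapped to a topic-probability vector by LDA or BERTopic (not shown). The topic vectors below are made up for illustration.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits; terms where p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1].

    Uses the mixture m = (p + q) / 2, so m_i > 0 wherever p_i or q_i > 0.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


author_a = [0.7, 0.2, 0.1]  # hypothetical topic mix over three topics
author_b = [0.1, 0.2, 0.7]
print(js_divergence(author_a, author_b))
```

Unlike raw KL divergence, this stays finite when one distribution assigns zero probability to a topic the other uses. Note that SciPy's `jensenshannon` returns the square root of this quantity (a distance), not the divergence itself.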

Priority 3

  • Analyzing which writing styles capture people's attention or influence them.
  • Explainable AI
    • Local Explanations: Techniques like LIME approximate the behavior of complex models with simpler, interpretable models to highlight factors influencing a specific prediction.
    • Example-Based Explanations: These explanations show similar examples from the data to give context and help explain the model's behavior.
    • Feature Importance: Methods like SHAP use game theory to assign importance values to input features, showing which ones most influenced a decision.
    • Counterfactuals: These explanations describe the minimum changes to the input that would lead to a different outcome, helping users understand how to alter a decision.
  • Trying language diffusion models in stylometry (to check whether LLaDA models destroy syntactic structure)
  • Trying anomaly detection
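SHAP approximates Shapley values from cooperative game theory. For a handful of features they can be computed exactly, which makes the idea concrete. A minimal sketch, assuming a model that maps a feature vector (e.g. hypothetical stylometric features such as average sentence length) to a score, and a baseline input that stands in for "feature absent". This is not the `shap` library's API.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[list[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley value of each feature of `x`.

    A feature missing from a coalition is replaced by its baseline value.
    Needs O(2^n) model calls, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Coalition features take their real value; the rest take the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution
        phis.append(phi)
    return phis


def score(z: list[float]) -> float:
    """Hypothetical linear "style score" over three features."""
    return 2 * z[0] + 3 * z[1] - z[2]


print(shapley_values(score, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0]))
```

For a linear model each value reduces to w_i (x_i - b_i), so this prints values close to [2, 3, -1]. In general the values always sum to f(x) - f(baseline).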

Questions

  • Words are similar along certain embedding dimensions. Can I restrict clustering to those dimensions?
    • How can I identify those dimensions?
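A minimal sketch of the first question, assuming words (or documents) are already embedded as dense vectors: project onto a chosen subset of dimensions, then run plain k-means there. Ranking dimensions by variance is only one hypothetical heuristic for the follow-up question; a learned projection such as PCA is another common choice. All names and the toy vectors are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def top_variance_dims(vectors: list[Vector], k: int) -> list[int]:
    """Indices of the k dimensions with the highest variance (a heuristic)."""
    n, dim = len(vectors), len(vectors[0])
    variances = []
    for d in range(dim):
        mean = sum(v[d] for v in vectors) / n
        variances.append(sum((v[d] - mean) ** 2 for v in vectors) / n)
    return sorted(range(dim), key=lambda d: variances[d], reverse=True)[:k]


def kmeans(points: list[Vector], k: int, iters: int = 20) -> list[int]:
    """Plain Euclidean k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_on_dims(vectors: list[Vector], dims: list[int], k: int) -> list[int]:
    """Keep only the chosen dimensions of each vector, then cluster."""
    projected = [[v[d] for d in dims] for v in vectors]
    return kmeans(projected, k)


embeddings = [[0.0, 1.0, 0.0], [0.2, 1.1, 0.0], [5.0, 1.0, 0.0], [5.2, 1.1, 0.0]]
dims = top_variance_dims(embeddings, k=1)
print(dims, cluster_on_dims(embeddings, dims, k=2))  # [0] [0, 0, 1, 1]
```

Seeding from the first k points keeps the sketch deterministic; in practice k-means++ seeding and several restarts are more robust.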

Readings

  • "Vector Semantics and Embeddings" chapter of SLP (Jurafsky & Martin, Speech and Language Processing)
  • Stylometry analysis
  • Appendix G of SLP