Ideas

Priority 1

Priority 2

Clustering at the word level and at the phrase level.
Comparing topic distributions (LDA, BERTopic)
Trying a syntactic language model (probabilistic context-free grammars)
Comparing the sentiment
Cluster minima and maxima using vector semantics
Experimenting with attention
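
One possible way to make the topic comparison concrete is to measure the distance between two topic distributions with the Jensen-Shannon divergence. The sketch below assumes each author or text has already been reduced to a vector of topic weights (for example from LDA or BERTopic); the vectors `author_a` and `author_b` are made-up illustration values, not results.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Returns 0.0 for identical distributions and 1.0 for fully disjoint ones.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalise so that raw topic weights also work as input.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic weights over four topics for two authors.
author_a = [0.70, 0.20, 0.05, 0.05]
author_b = [0.10, 0.15, 0.50, 0.25]

distance = js_divergence(author_a, author_b)
print(f"JS divergence: {distance:.3f}")
```

The measure is symmetric and bounded, which makes pairwise comparisons across many authors easier to read than raw KL divergence.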

Priority 3

Analyzing the different styles that capture people's attention or influence them.
Explainable AI
- Local Explanations: Techniques like LIME approximate the behavior of complex models with simpler, interpretable models to highlight factors influencing a specific prediction.
- Example-Based Explanations: These explanations show similar examples from the data to provide context and help understand the model's actions.
- Feature Importance: Methods like SHAP use game theory to assign importance values to input features, showing which ones most influenced a decision.
- Counterfactuals: These explanations describe the minimum changes to the input that would lead to a different outcome, helping users understand how to alter a decision.
- Trying language diffusion models in stylometry (if LLaDA models destroy the syntactic structure)
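
To illustrate the feature-importance item in the Explainable AI list, here is a minimal sketch of exact Shapley values, the game-theoretic quantity that SHAP approximates. It does not use the `shap` library. The linear scorer, its weights, the three feature values, and the all-zero baseline are hypothetical stand-ins for a real stylometric model, and exhaustive enumeration like this is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x.

    Features outside the coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                without_i = baseline.copy()
                without_i[idx] = x[idx]
                with_i = without_i.copy()
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical linear scorer over three stylometric features.
weights = np.array([2.0, -1.0, 0.5])

def predict(v):
    return float(weights @ v)

x = np.array([1.0, 3.0, 2.0])   # made-up feature values for one text
baseline = np.zeros(3)          # assumed reference input

phi = shapley_values(predict, x, baseline)
print(phi)
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes them usable as a per-feature attribution.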
Try Anomaly Detection

Questions

Words are similar in certain dimensions. Can I focus on certain dimensions while clustering?
- How to identify those dimensions?
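
One possible starting point for this question, sketched under strong assumptions: rank the embedding dimensions by variance across the words, keep the top few, and cluster only on those. Variance is just one heuristic for "informative" dimensions and is not something these notes prescribe. The data below is synthetic, and the small k-means is written in plain NumPy so the sketch stays self-contained.

```python
import numpy as np

def top_variance_dims(X, k):
    """Indices of the k dimensions with the highest variance across rows."""
    variances = X.var(axis=0)
    return np.sort(np.argsort(variances)[-k:])

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

# Synthetic "embeddings": 40 words, 10 dimensions, small noise everywhere.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.05, size=(40, 10))
# The second group of 20 words differs from the first only in dimensions 0 and 1.
X[20:, :2] += 3.0

dims = top_variance_dims(X, k=2)
labels = kmeans(X[:, dims], n_clusters=2)
print("selected dimensions:", dims)
print("cluster labels:", labels)
```

With real embeddings the informative dimensions are rarely axis-aligned, so a projection such as PCA, or a supervised probe when labels exist, may be a better way to find them than raw per-dimension variance.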