Tom-Owl/OverlookedRLF

Overview

This work has been accepted to Findings of EMNLP 2024. We call attention to the Repetitive Lengthening Form (RLF), an expressive and prevalent but overlooked informal writing style. We show that RLF sentences can serve as signatures of document sentiment and have potential value for online content analysis.

Figure 1: An overview of our framework for RLF. (1) We introduce Lengthening in Section 3. (2) We propose ExpInstruct and describe the prompt details in Section 4.1. (3) Experimental details are in Section 5.

Our contributions are as follows:

  • We introduce Lengthening, a multi-domain dataset featuring RLFs, with 850k samples drawn from 4 public datasets for sentiment analysis (SA) tasks.
  • We propose ExpInstruct, a cost-effective approach that improves the performance and explainability of open-source LLMs on RLF to the level of zero-shot GPT-4.
  • We quantify the explainability of PLMs and LLMs for RLF with a unified approach. Human evaluation demonstrates the reliability of this method.

Data Generation Process

Data Source

Code for Lengthening Generation

python ./code/data_utils.py

We provide sample data at ./data/sample_data.csv
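To give a sense of what an RLF looks like, here is a minimal sketch that flags tokens containing a repeated character run. It assumes an RLF is any non-space character repeated three or more times in a row (for example "sooooo" or "!!!"); this rule, the `RLF_PATTERN` regex, and the function names are illustrative assumptions and may not match the logic in `./code/data_utils.py`.

```python
import re
from typing import List

# Assumption: a run of 3+ identical non-space characters counts as an RLF.
# The repository's own detection rule may differ.
RLF_PATTERN = re.compile(r"(\S)\1{2,}")


def has_rlf(sentence: str) -> bool:
    """Return True if the sentence contains a run of 3+ identical characters."""
    return RLF_PATTERN.search(sentence) is not None


def rlf_tokens(sentence: str) -> List[str]:
    """Return the whitespace-separated tokens that contain such a run."""
    return [tok for tok in sentence.split() if RLF_PATTERN.search(tok)]


example = "This movie is sooooo good!!!"
found = rlf_tokens(example)
```

Note that under this assumed rule a plain ellipsis ("...") also matches, so a real pipeline would likely need extra filtering.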

ExpInstruct

Prompt Design for Explanation

We prompt GPT-4 with chain-of-thought (CoT) prompting to generate Word Importance Scores (WIS), which reflect its word-level understanding of the input sentence.

Instruction Template

ExpInstruct consists of two tasks with the same prompt template as shown in the figure below:

Figure 2: Prompt design and template for ExpInstruct. (a) Prompt with CoT for word-level explainability. (b) Simple prompt for SA. (c) Prompt template for instruction tuning.

Code for ExpInstruct with LLaMA2

python ./code/ExpInstruct_LLaMA2.py

A Unified Approach to Evaluate Explainability

For LLMs, we use a prompt-based method to generate WIS. For PLMs, we use a saliency-based method, specifically occlusion. Because raw WIS values sit on different scales for different models, we apply min-max scaling followed by $L_1$ normalization to the WIS so that they can be compared directly, as shown in Figure 3.

Figure 3: Comparing normalized WIS for an RLF sentence from zero-shot GPT-4 and fine-tuned RoBERTa.
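The two steps described in this section, occlusion-based scoring and min-max followed by L1 normalization, can be sketched as follows. This is an illustrative sketch, not the code in `./code/Unified_Exp_eval.ipynb`: `occlusion_wis`, `normalize_wis`, and the `toy_score` stand-in for a classifier are hypothetical names, and occlusion is assumed here to mean deleting one token at a time and measuring the drop in the model score.

```python
from typing import Callable, List


def occlusion_wis(tokens: List[str],
                  score_fn: Callable[[List[str]], float]) -> List[float]:
    """Raw WIS: drop in the model score when each token is removed in turn."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]


def normalize_wis(scores: List[float]) -> List[float]:
    """Min-max scale to [0, 1], then L1-normalize so the values sum to 1."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All words equally important: fall back to a uniform distribution.
        return [1.0 / len(scores)] * len(scores)
    scaled = [(s - lo) / (hi - lo) for s in scores]
    total = sum(scaled)
    return [s / total for s in scaled]


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a classifier's positive-class score."""
    weights = {"sooooo": 0.3, "good": 0.4}
    return 0.2 + sum(weights.get(t, 0.0) for t in tokens)


tokens = ["this", "is", "sooooo", "good"]
raw = occlusion_wis(tokens, toy_score)
wis = normalize_wis(raw)
```

After normalization the scores are non-negative and sum to 1, so WIS from a prompt-based LLM and from an occlusion-scored PLM can be compared on the same scale. In practice `score_fn` would wrap a real model's prediction for the target class.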

Code for the Unified Approach to Evaluate Explainability

./code/Unified_Exp_eval.ipynb

Human Evaluation

We conduct a human evaluation to assess potential errors in our methodology. Annotators complete two tasks:

  • Task 1 - Annotation for Sentiment Label: Given a sentence (with or without RLF), annotators assign a binary sentiment label (1: Positive, 0: Negative).
  • Task 2 - Annotation for Explanation Reliability: Annotators assign a reliability score to each result (1: Agree, 0: Disagree).

We built a custom annotation page with Streamlit.

Figure 4: Our customized user interface for human evaluation. Annotators are asked to do two tasks: annotation for sentiment labels and explanation reliability.

Code for customized user interface for human evaluation

cd code
streamlit run human_eval_page.py

Cite our work

@inproceedings{wang-dragut-2024-overlooked,
    title = "The Overlooked Repetitive Lengthening Form in Sentiment Analysis",
    author = "Wang, Lei  and
      Dragut, Eduard",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.952/",
    doi = "10.18653/v1/2024.findings-emnlp.952",
    pages = "16225--16238"
}
