Master's Thesis: Code Repository

Climate Change Representation in IPCC Reports and Wikipedia: A Comparative Analysis Through Natural Language Processing

Public link to the paper: available via Zenodo. The thesis should also be published on ECRIN later this year.

This code repository contains 3 directories:

corpus
python_scripts
results_csv

Corpus

The corpus directory contains all corpus versions used in the analyses.

Note: IPCC documents are not included in this repository due to copyright restrictions.

The corpus directory contains 4 subfolders:

A_raw_corpus

This folder contains the raw texts as initially obtained during data acquisition (Master's Thesis, Section 3.2.1).

B_cleaned_corpus

This folder contains the raw texts after cleaning (Section 3.2.1).

C_preprocessed_docbin_obj

This folder contains the preprocessed texts, saved to disk as spaCy Doc objects serialised with spaCy's DocBin class. These files are used for most of the analyses (Section 3.2.2).

D_semantic_chunking

This folder contains cleaned texts that were semantically chunked for Sentiment Analysis and Emotion Detection (Section 3.3.5); the same chunks were also used in the Named Entity Recognition analysis (Section 3.3.6).
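The DocBin files in C_preprocessed_docbin_obj can be read back with spaCy's DocBin API. The following is a minimal sketch, not code taken from 2_preprocessing.py: the helper names save_docs and load_docs, the file name sample.spacy and the use of a blank English pipeline are all assumptions. A blank pipeline should be enough for loading, because a DocBin stores its own strings and only needs a Vocab to rebuild the Doc objects.

```python
from pathlib import Path
from typing import List

import spacy
from spacy.tokens import Doc, DocBin


def save_docs(docs: List[Doc], path: Path) -> None:
    """Serialise a list of Doc objects into a single DocBin file (hypothetical helper)."""
    doc_bin = DocBin()  # default attributes include tokens, lemmas, POS, dependencies, entities
    for doc in docs:
        doc_bin.add(doc)
    doc_bin.to_disk(path)


def load_docs(path: Path, lang: str = "en") -> List[Doc]:
    """Rebuild Doc objects from a DocBin file (hypothetical helper).

    Assumption: a blank pipeline is sufficient, since the DocBin carries
    its own string table and only a Vocab is required to restore the Docs.
    """
    nlp = spacy.blank(lang)
    doc_bin = DocBin().from_disk(path)
    return list(doc_bin.get_docs(nlp.vocab))
```

Loading one of the corpus files would then look like `load_docs(Path("corpus/C_preprocessed_docbin_obj/<file>"))`, where the file name is left as a placeholder.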

python_scripts

The python_scripts directory contains the complete set of Python scripts employed in the analyses.

0_data_visualisation.py

This script was used for figure generation and does not correspond to any particular section.

1_convert_AR6_to_text.py

Used to extract the raw text from the AR6 WG3 Summary for Policymakers (SPM) PDF (Section 3.2.1).

1_wikipedia_extract_to_text.py

Used to extract articles from Wikipedia and save them as raw text (Section 3.2.1).

2_preprocessing.py

Used to preprocess the corpus using spaCy (Section 3.2.2).

3_lexicometry.py

Used to generate lexicometric, stylistic, and readability analysis results (Section 3.3.1 for methodology; Section 4.1 for results).

4_modality.py

Used to compute frequencies for the modality analysis (Sections 3.3.2 and 4.2).

5_semantic_similarity.py

Used to compare semantic similarity between selected text pairs (Sections 3.3.3 and 4.3).

6_topic_modeling.py

Used to perform topic modelling on the corpus (Sections 3.3.4 and 4.4).

7_sentiment_emotion.py

Used for sentiment analysis and emotion detection (Sections 3.3.5 and 4.5).

8_NER.py

Used for named entity recognition and analysis (Sections 3.3.6 and 4.6).

Results

The results_csv directory contains all results generated by the Python scripts, saved as CSV files. These results are identical to those presented in the tables and figures of the thesis. Each file shares the same name as its corresponding Python script in the python_scripts directory.
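Because each CSV in results_csv is named after the script that produced it, results can be matched to scripts by file stem. The following is a minimal standard-library sketch, assuming exactly one `.csv` per script with the same stem; the function name pair_scripts_with_results is hypothetical. A script that only produces figures, such as 0_data_visualisation.py, may have no CSV and would then map to None.

```python
from pathlib import Path
from typing import Dict, Optional


def pair_scripts_with_results(
    scripts_dir: Path, results_dir: Path
) -> Dict[str, Optional[Path]]:
    """Map each script file name to the CSV sharing its stem, or None if absent.

    Assumption: one CSV per script, named <script stem>.csv.
    """
    pairs: Dict[str, Optional[Path]] = {}
    for script in sorted(scripts_dir.glob("*.py")):
        csv_path = results_dir / f"{script.stem}.csv"
        pairs[script.name] = csv_path if csv_path.exists() else None
    return pairs
```

For example, `pair_scripts_with_results(Path("python_scripts"), Path("results_csv"))` run from the repository root would list every script alongside its results file, if one exists under that assumed name.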

The repository root also contains readme.md and requirements.txt.