Fake News Detection

See the paper for more details.

Getting Started

Prerequisites

Make sure to follow the FakeNewsNet instructions to obtain the dataset first.

Pretrained word2vec model from Google (optional; if you do not want to use it, set the corresponding flag to False)
Python 3
gensim
scikit-learn
nltk
matplotlib
numpy
pandas
PyTorch

Installing

pip install -r requirements.txt

Run

Replace the first parameter with the location of your fakenewsnet_dataset directory:

data = NewsContent('../fakenewsnet_dataset', dataset, ['fake', 'real'])
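The README does not spell out the directory layout NewsContent expects. As a rough sketch only, assuming the common FakeNewsNet crawler layout `<root>/<dataset>/<label>/<news_id>/news content.json` (an assumption, not confirmed here), a generator in the spirit of `get_list_news_files()` could walk the dataset like this. The name `iter_news_files` is hypothetical.

```python
import os
from typing import Iterator, List, Tuple


def iter_news_files(root: str, dataset: str, labels: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, path) for every news content file found.

    Hypothetical helper; ASSUMES the layout
    <root>/<dataset>/<label>/<news_id>/news content.json.
    """
    for label in labels:
        label_dir = os.path.join(root, dataset, label)
        if not os.path.isdir(label_dir):
            continue  # label folder missing (e.g. not crawled yet)
        for news_id in sorted(os.listdir(label_dir)):
            path = os.path.join(label_dir, news_id, "news content.json")
            # Some crawled articles have no content file; skip them.
            if os.path.isfile(path):
                yield label, path
```

For example, `iter_news_files('../fakenewsnet_dataset', 'politifact', ['fake', 'real'])` would yield one `(label, path)` pair per crawled article, silently skipping articles whose content could not be downloaded.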

Train the machine learning classifiers:

python main_gcv.py

Evaluate the models:

python main_eval.py

Train the LSTM+ATT model:

python nn_main.py
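This README does not describe the attention layer used in the LSTM+ATT model. Purely as an illustration, here is a minimal sketch of one common variant, dot-product attention pooling over LSTM hidden states: each hidden state is scored against a query vector, the scores are softmax-normalized, and the weighted sum becomes the sequence representation. The function names and the single-query design are assumptions; in a PyTorch model the query would be a learned parameter and the math would run on tensors.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], query: Vector) -> Vector:
    """Collapse a sequence of hidden states into a single context vector.

    score_t = query . h_t
    alpha   = softmax(scores)
    context = sum_t alpha_t * h_t
    """
    scores = [sum(q * h for q, h in zip(query, h_t)) for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h_t[i] for w, h_t in zip(weights, hidden_states)) for i in range(dim)]
```

With a zero query every time step gets the same weight, so the result is simply the mean of the hidden states; a query aligned with one state pushes almost all of the weight onto it.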

Project Structure

classfiers.py: Machine learning classifiers

AdaBoost
K-nearest-neighbor
Support vector machine
Random Forest
XGBoost
Logistic regression

utils.py:

NewsContent Class
- get_features() generator function that yields the preprocessed news title, body, or both.
- save_in_sentence_form() generates a JSON file of all news content with title, body, and label key-value pairs.
- get_list_news_files() generator function that yields each news file path.
stem_tokens(tokens, stemmer) stems tokens for preprocessing
preprocess(line, token_pattern=token_pattern, exclude_num=True, exclude_stopword=True, stem=True) tokenizes and preprocesses a line of text
remove_emoji(text) removes emojis for preprocessing
get_ngram(n, sentence) returns the n-grams of a sentence
tsne_similar_word_plot(model, word) utility function for visualization: given a model and a word, plots a t-SNE projection of similar words.
division(x, y, val = 0.0) divides two numbers
plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)) generates a plot of the training and testing learning curves
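To make the preprocessing helpers above more concrete, here is a stdlib-only sketch of a similar pipeline: lowercase, tokenize with a regex, drop numbers and stopwords, optionally stem, then build n-grams. Several details are assumptions rather than taken from utils.py: the token pattern (scikit-learn's default), the tiny stopword list, joining n-grams with `_`, and `division()` returning `val` when `y` is zero. The stemmer is passed as a callable; with NLTK, `PorterStemmer().stem` would fit.

```python
import re
from typing import Callable, List, Optional

# ASSUMED token pattern: runs of 2+ word characters (scikit-learn's default).
TOKEN_PATTERN = r"(?u)\b\w\w+\b"
# Tiny illustrative stopword list; a real pipeline would use a fuller one (e.g. NLTK's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def preprocess_sketch(line: str,
                      token_pattern: str = TOKEN_PATTERN,
                      exclude_num: bool = True,
                      exclude_stopword: bool = True,
                      stemmer: Optional[Callable[[str], str]] = None) -> List[str]:
    """Lowercase, tokenize, and filter one line of text (hypothetical helper)."""
    tokens = re.findall(token_pattern, line.lower())
    if exclude_num:
        # Drop tokens made only of digits.
        tokens = [t for t in tokens if not t.isdigit()]
    if exclude_stopword:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stemmer is not None:
        tokens = [stemmer(t) for t in tokens]
    return tokens


def get_ngram(n: int, tokens: List[str]) -> List[str]:
    """Return the n-grams of a token list, joined with '_' (empty if too short)."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def division(x: float, y: float, val: float = 0.0) -> float:
    """Divide x by y, returning val instead of raising when y == 0 (assumed behavior)."""
    return x / y if y != 0 else val
```

For example, `preprocess_sketch("The 2 cats sat on the mat in 2019")` keeps only `cats`, `sat`, and `mat`, and `get_ngram(2, ...)` on that result gives `cats_sat` and `sat_mat`.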

CountFeature.py:

CountFeatureGenerator Class
- process_and_save() takes title and body pair data and writes count features to a CSV file.
- read() reads the count feature CSV file.
get_article_part_count gets the n-grams of the title or body
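For illustration, here is a sketch of the kind of n-gram count features such a generator might compute for one title/body pair: total and unique n-gram counts for each part, and how many title n-grams also appear in the body. The feature names and the exact feature set are assumptions, not read from CountFeature.py; inputs are already-preprocessed token lists.

```python
from typing import Dict, List


def division(x: float, y: float, val: float = 0.0) -> float:
    # Safe division, assumed to behave like utils.division().
    return x / y if y != 0 else val


def count_features(title_tokens: List[str], body_tokens: List[str], n: int = 1) -> Dict[str, float]:
    """Hypothetical n-gram count features for a single title/body pair."""

    def ngrams(tokens: List[str]) -> List[str]:
        return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    title_ngrams = ngrams(title_tokens)
    body_ngrams = ngrams(body_tokens)
    title_set = set(title_ngrams)
    body_set = set(body_ngrams)
    # How many title n-grams (with repetition) also occur somewhere in the body.
    in_body = sum(1 for g in title_ngrams if g in body_set)
    return {
        "count_title": len(title_ngrams),
        "count_unique_title": len(title_set),
        "ratio_unique_title": division(len(title_set), len(title_ngrams)),
        "count_body": len(body_ngrams),
        "count_unique_body": len(body_set),
        "ratio_unique_body": division(len(body_set), len(body_ngrams)),
        "count_title_in_body": in_body,
        "ratio_title_in_body": division(in_body, len(title_ngrams)),
    }
```

Computing the same dictionary for n = 1, 2, 3 and concatenating the values would give one numeric row per article, which is the shape a CSV of count features needs.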

SentimentFeature.py:

SentimentFeatureGenerator Class
- compute_sentiment() computes the polarity score of each sentence in the title or body and averages them.
- process_and_save() takes title and body pair data and writes polarity score features of the title and body to a CSV file.
- read() reads the title or body sentiment feature from the CSV file.
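The averaging step in compute_sentiment() can be sketched as follows: split the text into sentences, score each one, and return the mean. The sentence scorer is passed in as a function; with NLTK, one option would be `SentimentIntensityAnalyzer().polarity_scores(s)["compound"]` from `nltk.sentiment.vader`, but whether this repository uses VADER is not stated here. The naive regex splitter and the toy lexicon scorer below are hypothetical stand-ins that keep the sketch dependency-free.

```python
import re
from statistics import mean
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive split after '.', '!' or '?' followed by whitespace;
    # nltk.sent_tokenize would be more robust.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def compute_sentiment_sketch(text: str, score_sentence: Callable[[str], float]) -> float:
    """Average a per-sentence polarity score over every sentence in text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0  # empty title/body: treat as neutral
    return mean(score_sentence(s) for s in sentences)


# Hypothetical toy scorer, standing in for a real sentiment analyzer.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}


def toy_score(sentence: str) -> float:
    words = re.findall(r"[a-z]+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

For example, `compute_sentiment_sketch("Good news. Great day! Bad end.", toy_score)` scores the three sentences 1, 1, and -1, so the title or body would receive an average polarity of about 0.33.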

Word2VecFeature.py:

Word2VecFeatureGenerator Class
- cosine_sim() computes cosine similarity
- get_title_body_cos_sim() gets the cosine similarity between an article's title and its body content
- get_nn_vecs() gets the word2vec vectors for the neural network
- process_and_save() takes title and body pair data and writes the word2vec features of the title and body to a CSV file.
- read() reads the word2vec feature CSV file for the machine learning classifiers
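A common way to compare a title with its body using word2vec is to average the vectors of the in-vocabulary words in each part and take the cosine similarity of the two averages. The sketch below does this with plain lists and a dictionary of embeddings; whether Word2VecFeature.py averages or sums the vectors is not stated here (both give the same cosine). A gensim `KeyedVectors` object also supports `word in kv` and `kv[word]`, so it could likely be passed in place of the toy dictionary. Function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def average_vector(tokens: List[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Mean of the vectors of in-vocabulary tokens, or None if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_sim(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; use 0 as a neutral value
    return dot / (norm_a * norm_b)


def title_body_cos_sim(title_tokens: List[str], body_tokens: List[str],
                       embeddings: Dict[str, Vector]) -> float:
    """Cosine similarity between the averaged title and body word vectors."""
    title_vec = average_vector(title_tokens, embeddings)
    body_vec = average_vector(body_tokens, embeddings)
    if title_vec is None or body_vec is None:
        return 0.0
    return cosine_sim(title_vec, body_vec)
```

A value near 1 means the title and body use semantically similar vocabulary; a low value can flag clickbait-style titles that do not match their article.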

SvdFeature.py:

SvdFeature Class
- process_tfidf() gets the tf-idf matrix
- process_and_save() uses SVD (or NMF) to reduce the tf-idf matrix and writes the result to a CSV file
- read() reads the SVD feature CSV file and makes predictions
- get_tfidf_scores() gets the vocabulary and the corresponding scores from the tf-idf matrix
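The tf-idf step can be sketched without dependencies. This version follows scikit-learn's documented `TfidfVectorizer` defaults (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, L2-normalized rows), but it is only an illustration, not the code in SvdFeature.py. The `tfidf_scores` helper, a hypothetical stand-in for get_tfidf_scores(), ranks terms by their summed weight across documents. The SVD reduction itself is not shown; with scikit-learn it could be done with `TruncatedSVD(n_components=k).fit_transform(X)`, where the value of `k` used by this project is not given here.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, rows) for pre-tokenized documents."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in vocab]
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count * idf[index[term]]
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row] if norm else row)
    return vocab, rows


def tfidf_scores(vocab: List[str], rows: List[List[float]]) -> List[Tuple[str, float]]:
    """Sum each term's weight over all documents, highest first."""
    totals = [sum(col) for col in zip(*rows)]
    return sorted(zip(vocab, totals), key=lambda pair: pair[1], reverse=True)
```

Because of the idf factor, a term that appears in every document (such as `news` in a news corpus) gets a lower weight within each document than rarer terms, even though its summed score across the corpus can still be high.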

Parameters.py: holds the best parameters for the various classifier models.

Contribute

If you have any questions, please submit an issue!

Authors

Jun Lin
Glenna Tremblay-Taylor

License

This project is licensed under the MIT License - see the LICENSE.md file for details
