lukef533/GoodReads-WebScraping-Project

GoodReads-WebScraping-Project

Objective

This project scrapes and analyzes data from Goodreads.com with a dual focus: exploring the "Best Books" list for a specific year and analyzing the works of a specific author in detail. It is structured into two main tasks:

Best Books Analysis: Analysis of Goodreads' "Best Books" list for a given year. This includes scraping data such as book titles, publication dates, authors, genres, and ratings, among other fields.

Author-Level Analysis: Detailed exploration of books by a designated author, collecting similar data points and examining how the author's works change over time.

Methodology

Data Collection

Task 1: Best Books:

Scraped the "Best Books of [Year]" list from Goodreads. For example, for the year 2023 the URL is https://www.goodreads.com/list/best_of_year/2023. The data points collected are Title, Publication Date, Author, Genre, Average Rating, Number of Ratings, Number of Pages, Rank, Language, Current Readers, and Want-to-Read counts.
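As a rough illustration of the collection step, the sketch below builds the list URL for a given year and parses a rating string into numbers. The function names, the regular expression, and the assumed text format ("4.21 avg rating", a separator, then "1,234,567 ratings") are illustrative assumptions, not the project's actual code. The real page markup should be checked before relying on it.

```python
import re

# URL pattern taken from the example above; the year is the only variable part.
BASE_URL = "https://www.goodreads.com/list/best_of_year/{year}"


def best_books_url(year: int) -> str:
    """Return the "Best Books" list URL for the given year."""
    return BASE_URL.format(year=year)


# Assumed text format: "<average> avg rating <separator> <count> ratings".
# The separator is matched as any run of non-space characters.
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s+avg rating\s+\S+\s+([\d,]+)\s+ratings?")


def parse_rating_text(text: str):
    """Return (average_rating, ratings_count), or None if the text does not match."""
    match = RATING_RE.search(text)
    if match is None:
        return None
    average = float(match.group(1))
    count = int(match.group(2).replace(",", ""))  # drop thousands separators
    return average, count
```

Returning None for unmatched text keeps missing fields visible in the dataset rather than silently turning them into zeros.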

Task 2: Author Analysis:

Scraped all books by a specific author from their Goodreads profile. For example, Stephen King’s profile for initials A–E. The data points are similar to those in Task 1, with additional analysis of language distribution and of the relationship between the author's age at publication and metrics such as page count and ratings.
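The author's age at publication can be derived from a birth date and each book's publication date. The helper below is a minimal sketch with a hypothetical name. The birth date in the usage note is an input to verify independently, since it does not come from the scraped data.

```python
from datetime import date


def age_at_publication(birth: date, published: date) -> int:
    """Whole years elapsed between the author's birth and the publication date."""
    # Subtract one year if the birthday had not yet occurred in the publication year.
    had_birthday = (published.month, published.day) >= (birth.month, birth.day)
    return published.year - birth.year - (0 if had_birthday else 1)
```

For example, with an assumed birth date of `date(1947, 9, 21)`, a book published on `date(1974, 4, 5)` gives an age of 26, because the birthday falls later in that year.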

Analysis

Genre Ratings:

Evaluated how average ratings vary across genres.
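One simple way to compare genres is to group books by genre and average their ratings. The sketch below assumes each book is a dictionary with hypothetical `genre` and `avg_rating` keys. The sample rows are made up for illustration and are not project results.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_genre(books):
    """Return (genre, mean rating) pairs, highest mean first."""
    ratings = defaultdict(list)
    for book in books:
        ratings[book["genre"]].append(book["avg_rating"])
    pairs = [(genre, round(mean(values), 2)) for genre, values in ratings.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Illustrative rows only.
sample_books = [
    {"genre": "Fantasy", "avg_rating": 4.5},
    {"genre": "Fantasy", "avg_rating": 4.0},
    {"genre": "Romance", "avg_rating": 3.75},
]
```

A book tagged with several genres would need one row per genre before grouping. How the project handles that case is not stated here.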

Popularity and Ratings:

Investigated the relationship between the number of ratings a book receives and its average rating.
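A common way to quantify this relationship is the Pearson correlation coefficient. The sketch below applies it to the base-10 logarithm of the ratings count, because ratings counts tend to span several orders of magnitude. The log transform and the function names are assumptions made for illustration, not necessarily what the project used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def popularity_vs_rating(ratings_counts, avg_ratings):
    """Correlate log10(ratings count) with average rating; counts must be > 0."""
    log_counts = [math.log10(count) for count in ratings_counts]
    return pearson(log_counts, avg_ratings)
```

A value near +1 would mean more-rated books tend to score higher, a value near -1 the opposite, and a value near 0 no linear relationship.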

Author's Trends:

Analyzed the change in page count and book ratings as the author aged.
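A trend of this kind can be summarized by fitting a straight line to a metric, such as page count, against the author's age at publication. The least-squares fit below is a minimal sketch with a hypothetical function name, offered as one reasonable approach rather than the project's confirmed method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With ages as `xs` and page counts as `ys`, a positive slope indicates that books tended to get longer as the author aged. The same function works for ratings against age.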

Interest and Ratings:

Explored correlations between reader interest (currently reading and want-to-read counts) and book ratings.
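Reader-interest counts are often heavily skewed, so a rank-based measure such as Spearman's correlation can be a useful check alongside a linear one. The sketch below ranks both variables, giving tied values their average rank, and then correlates the ranks. Choosing Spearman here is an illustrative assumption, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in rx) * sum((b - mean_y) ** 2 for b in ry))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / spread
```

Passing want-to-read counts as `xs` and average ratings as `ys` gives a value between -1 and 1 that depends only on the ordering of the books, not on the size of the counts.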

Results

Key Insights:

Identified the genres with the highest average ratings. Examined whether more popular books (those with a higher ratings count) receive better or worse ratings. Analyzed the publishing trends of [Author Name], including how their writing evolved over time. Found correlations between page count, ratings, and reader interest.

Visualizations

Scatterplots and line graphs illustrate the relationships between metrics such as ratings vs. popularity, page count over time, and the author's age against book characteristics. Tables summarize average ratings and ranks by genre, and books by language.

Conclusion

This project provides a comprehensive analysis of book trends on Goodreads, offering insights into reader preferences and authorial evolution over time. The findings can aid publishers, writers, and marketers in understanding the dynamics of book popularity and author development.
