This repository contains an exploratory data analysis (EDA) of sugarcane production across countries worldwide. The project uses Pandas and NumPy for data manipulation and Seaborn and Matplotlib for visualization.
Data Cleaning:
- Formats data columns to standardize numerical values.
- Renames columns for clarity and consistency.
- Handles missing values by dropping rows with null values.
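The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the sample rows, the original column names, the renamed column names, and the assumption that "." is used as a thousands separator are all hypothetical stand-ins for whatever `Sugarcane.csv` really contains.

```python
import pandas as pd

# Hypothetical sample standing in for Sugarcane.csv; the real column names
# and number formatting may differ.
raw = pd.DataFrame({
    "Country": ["Brazil", "India", "Atlantis"],
    "Continent": ["South America", "Asia", None],
    "Production (Tons)": ["768.678.382", "348.448.000", None],
    "Acreage (Hectare)": ["10.226.205", "4.950.000", "1.000"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, drop rows with nulls, and convert numeric strings."""
    out = df.rename(columns={
        "Production (Tons)": "production_tons",   # assumed target names
        "Acreage (Hectare)": "acreage_hectare",
    })
    out.columns = [c.lower() for c in out.columns]
    # Drop any row that has at least one missing value.
    out = out.dropna().reset_index(drop=True)
    for col in ["production_tons", "acreage_hectare"]:
        # Assumes "." is a thousands separator, not a decimal point.
        out[col] = out[col].str.replace(".", "", regex=False).astype("int64")
    return out


cleaned = clean(raw)
print(cleaned)
```

Dropping rows before converting types keeps the conversion simple, since no missing values remain to be cast to integers.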
Analysis:
- Visualizes the percentage share of sugarcane production by continent.
- Explores the relationship between acreage and production of sugarcane.
- Identifies outliers and examines the distributions of numerical columns.
- Determines the top sugarcane-producing countries based on various metrics.
- Calculates correlations between numerical columns and displays them as a heatmap.
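As a rough sketch of the computations behind these analysis steps, the snippet below works on a small made-up table. The column names and all values are assumptions chosen for illustration, and the 1.5 × IQR rule is one common way to flag outliers, not necessarily the one used in this project.

```python
import pandas as pd

# Hypothetical cleaned data; values are illustrative, not taken from the dataset.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["Asia", "Asia", "Africa", "Europe"],
    "production_tons": [600, 200, 150, 50],
    "acreage_hectare": [60, 25, 20, 5],
})

# Percentage share of total production by continent.
share = (
    df.groupby("continent")["production_tons"].sum()
    / df["production_tons"].sum() * 100
).round(1)

# Top producers by one metric (here: total production).
top2 = df.nlargest(2, "production_tons")["country"].tolist()

# Pairwise Pearson correlation between numeric columns (the input to a heatmap).
corr = df[["production_tons", "acreage_hectare"]].corr()

# Flag outliers with the 1.5 * IQR rule, the default whisker rule of box plots.
q1, q3 = df["production_tons"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["production_tons"] < q1 - 1.5 * iqr) | (
    df["production_tons"] > q3 + 1.5 * iqr
)
outliers = df.loc[is_outlier, "country"].tolist()

print(share, top2, corr, outliers, sep="\n\n")
```

The `corr` table is what a call such as `seaborn.heatmap(corr, annot=True)` would render as a heatmap.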
Visualization:
- Utilizes pie charts, bar plots, line plots, violin plots, and box plots to represent data insights.
- Incorporates Seaborn styles for aesthetic visualizations.
- Provides detailed labels and titles for clarity.
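A minimal sketch of this plotting style is shown below, assuming a cleaned table with hypothetical `continent` and `production_tons` columns and made-up values. It combines a Matplotlib pie chart with a Seaborn box plot under a Seaborn theme; the project's own figures may be laid out differently.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Apply a Seaborn style to all subsequent Matplotlib figures.
sns.set_theme(style="whitegrid")

# Hypothetical cleaned data; values are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Africa", "Africa", "Europe"],
    "production_tons": [600, 200, 150, 90, 50],
})

fig, (ax_pie, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: share of total production per continent.
by_continent = df.groupby("continent")["production_tons"].sum()
ax_pie.pie(by_continent, labels=by_continent.index, autopct="%1.1f%%")
ax_pie.set_title("Share of production by continent")

# Box plot: spread of production within each continent.
sns.boxplot(data=df, x="continent", y="production_tons", ax=ax_box)
ax_box.set_title("Production distribution by continent")
ax_box.set_xlabel("Continent")
ax_box.set_ylabel("Production (tons)")

fig.tight_layout()
```

To view or keep the figure, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or call `fig.savefig(...)` with a path of your choice.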
Documentation:
- Includes detailed code comments for each step of the analysis.
- Utilizes Markdown format for structured documentation.
Project Structure:
- Data Cleaning: Cleansing and formatting the dataset.
- Analysis: Exploration and visualization of sugarcane production data.
- Visualization: Visual representation of data insights.
- Documentation: Detailed explanation of each code segment.
- Data Source: https://github.com/SPARTANX21/EDA-Sugarcane-Production-in-World/blob/main/Sugarcane.csv
- Libraries: Pandas, NumPy, Seaborn, Matplotlib
Usage:
- Clone this repository to your local machine.
- Make sure Python and the required libraries (Pandas, NumPy, Seaborn, Matplotlib) are installed.
- Run the Python script to execute the analysis.
- Explore the resulting visualizations and insights.
Contributing:
- Contributions that improve code efficiency, add new analyses, or enhance visualizations are welcome.
- Fork the repository, make your changes, and submit a pull request for review.
Author:
- Pranay Shah