Skip to content

AnmolPatel20/InstaParseX-Unstructured_to_Structured

Repository files navigation

📊 InstaParseX

✨ Text to Insights

A pure Python project that parses unstructured Instagram profile text and converts it into structured JSON data.

📌 Project Overview

Social media data is often available in raw, unstructured text formats that are not directly suitable for analysis.
InstaParseX demonstrates how core Python only can be used to process such data without relying on external libraries.

The project focuses on:

  • Reading raw Instagram profile text from files
  • Parsing and cleaning profile information
  • Converting unstructured text into structured JSON
  • Validating logic using both initial and final datasets

📁 Repository Structure

📄 initialdata.txt

  • Raw, unprocessed Instagram profile data in text format
  • Used to test the initial parsing logic

📄 finaldata.txt

  • Refined version of the raw text data
  • Used for final parsing and accurate extraction

📄 data.json

  • Structured output generated after parsing
  • Contains:
    • Username
    • Followers
    • Following
    • Posts
    • Category

📓 testing_with_initial_data-Copy1.ipynb

  • Notebook for testing parsing logic on initial data

📓 testing_with_final_data.ipynb

  • Notebook containing finalized parsing logic and results

⚙️ Key Features

  • 🧩 Parses unstructured Instagram profile text
  • 🐍 Uses only core Python string operations
  • 🔢 Converts follower counts (K / M) into numeric values
  • 📦 Stores output in JSON format
  • ✅ Tested on multiple datasets

🛠️ Tech Stack

  • 🐍 Python (Core)
  • ✂️ String Manipulation
  • 📂 File Handling
  • 📋 Lists and Dictionaries
  • 🗂️ JSON (Built-in)

❌ No external libraries
❌ No regex
❌ No frameworks

📊 Use Cases

  • Beginner-friendly data processing project
  • Portfolio project for Python fundamentals
  • Practice for parsing unstructured data
  • Foundation for future data analytics or NLP projects

🎓 Learning Outcomes

  • Strong understanding of text parsing fundamentals
  • Experience handling unstructured data
  • Improved Python logic and problem-solving skills
  • Real-world data processing practice

🔮 Future Enhancements

  • 📈 Add data visualizations
  • 📊 Calculate engagement metrics
  • 🌐 Extend parsing to other platforms
  • 🤖 Automate data input

🏷️ Project Tagline

InstaParseX – Unstructured to Structured

👤 Author

-Anmol Patel

About

InstaParseX is a Python-based project that parses raw Instagram profile text to extract usernames, followers, following, posts, and categories for structured analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors