Fast, production-friendly parser for extracting Google Play package names and Apple App Store IDs from raw HTML or live URLs.
## Table of Contents

- Features
- Technology Stack
- Technical Notes
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Contacts
- Support the Project
## Features

- Dual ecosystem parsing in one pass
  - Extracts Android package names from Google Play links.
  - Extracts numeric App Store IDs from Apple URLs and iTunes-style links.
- Multi-source input workflow
  - Add multiple source blocks and parse everything in one run.
  - Supports both raw HTML paste mode and URL fetch mode.
- Deterministic deduplication
  - Removes duplicates while preserving first-seen order, which keeps exports clean without losing practical context.
- Localization-ready UI
  - Built-in language support in the app and separate JS locale assets in `locales/`.
- Operator-centric UX
  - Quick source switching, expandable source cards, and immediate result metrics.
- Export-first output
  - One-click `.txt` downloads for Google Play and App Store IDs (`gp.txt`, `as.txt`).
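The order-preserving deduplication described above can be done with plain dictionary semantics. This is a minimal sketch of the technique, not necessarily the app's exact code:

```python
def dedupe_preserve_order(ids):
    # dict keys are insertion-ordered (Python 3.7+), so the first
    # occurrence of every ID survives and later duplicates are dropped
    return list(dict.fromkeys(ids))

print(dedupe_preserve_order(["com.a", "com.b", "com.a"]))  # ['com.a', 'com.b']
```

Compared to a `set`, this keeps results stable across runs, which matters for diff-friendly `.txt` exports.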
## Technology Stack

- Language: Python
- Framework/UI: Streamlit
- HTTP Client: `requests`
- Parsing strategy: `re` (Python regular expressions)
- Frontend styling: custom CSS (`assets/style.css`)
- Localization assets: Python dictionary (`translations.py`) + JS locale modules (`locales/*.js`)
Project structure:

```
.
├── app.py              # Streamlit entrypoint, UI, input orchestration, result rendering
├── parser_logic.py     # Regex extraction and HTML title detection helpers
├── translations.py     # App localization dictionary used by the Streamlit UI
├── locales/            # Standalone JS locale modules (one file per language)
├── assets/
│   └── style.css       # UI styling layer
├── requirements.txt    # Python dependencies
├── LICENSE             # GPL-3.0 license text
└── README.md           # Project documentation
```
## Technical Notes

- Regex over heavy HTML parsing
  - The target data follows stable URL-like patterns, so regex is faster, simpler, and easier to maintain for this use case.
- Stateful multi-source UX with `st.session_state`
  - Keeps user inputs persistent across UI reruns and enables an efficient "add/remove source" flow.
- Separated extraction logic
  - `parser_logic.py` isolates parser behavior from UI code, making it easier to test and evolve safely.
- Order-preserving deduplication
  - Uses dictionary-order semantics to preserve the first useful occurrence for analyst workflows.
- Fail-soft network behavior
  - URL fetch errors are surfaced in the UI without crashing the parsing session.
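The regex-first approach can be sketched in a few lines. The patterns below are illustrative assumptions, not the repository's actual `parser_logic.py` implementation (only the function names come from the smoke check later in this README):

```python
import re

# Illustrative patterns only -- the real parser_logic.py may differ
GP_PATTERN = re.compile(r"play\.google\.com/store/apps/details\?id=([A-Za-z0-9_.]+)")
AS_PATTERN = re.compile(r"apps\.apple\.com/\S*?/id(\d+)")

def extract_google_play_ids(text: str) -> list[str]:
    # Order-preserving dedup via dict keys
    return list(dict.fromkeys(GP_PATTERN.findall(text)))

def extract_app_store_ids(text: str) -> list[str]:
    return list(dict.fromkeys(AS_PATTERN.findall(text)))
```

Because store URLs follow stable shapes, two small patterns cover both raw HTML dumps and pasted link lists without a full DOM parser.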
## Getting Started

Make sure your environment has:

- Python 3.9+ (3.10+ recommended)
- `pip` (latest stable preferred)
- Optional but recommended: `venv` or another virtual environment manager, `git`
```bash
# 1) Clone the repository
git clone https://github.com/ostinua/app-discovery-parser-appid-extractor.git
cd app-discovery-parser-appid-extractor

# 2) Create and activate a virtual environment
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows (PowerShell)
# .venv\Scripts\Activate.ps1

# 3) Install dependencies
pip install -r requirements.txt
```

## Testing

There is currently no full automated test suite committed in this repository, but you can still run reliable sanity checks before shipping changes:
```bash
# Dependency integrity check
pip install -r requirements.txt

# Basic parser smoke check
python - <<'PY'
from parser_logic import extract_google_play_ids, extract_app_store_ids
sample = '''
https://play.google.com/store/apps/details?id=com.example.app
https://apps.apple.com/us/app/sample/id123456789
'''
print(extract_google_play_ids(sample))
print(extract_app_store_ids(sample))
PY

# Streamlit app boot check
streamlit run app.py
```

Recommended contributor tooling:

- `pytest` for unit tests
- `ruff`/`flake8` for linting
- `black` for formatting
## Deployment

For production-ish usage, you can deploy this app in a few common ways:

- Streamlit Community Cloud
  - Push to GitHub, configure the app entrypoint as `app.py`, and install dependencies from `requirements.txt`.
- Container-based deployment
  - Package the app in Docker and expose the Streamlit port (`8501` by default).
- VM/bare-metal
  - Run behind a reverse proxy (Nginx/Caddy), pin the Python environment, and manage the process with `systemd` or `supervisor`.
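For the container route, a minimal Dockerfile sketch might look like the following (an untested illustration; adjust the base image and flags to your environment):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address", "0.0.0.0", "--server.port", "8501"]
```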
Minimal local production command:

```bash
streamlit run app.py --server.address 0.0.0.0 --server.port 8501
```

## Usage

```bash
# Start the app
streamlit run app.py
# Then open the local URL shown in the terminal (usually http://localhost:8501)
```

Typical workflow in the UI:
1) Choose language in the sidebar.
2) Add one or more sources (HTML paste or URL mode).
3) Click "Parse All Data".
4) Review Google Play and App Store panels.
5) Download results as gp.txt / as.txt.
Pro tip for data completeness: when scraping store listing pages manually, scroll to the bottom first so lazy-loaded blocks are present in page source.
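For URL fetch mode, the fail-soft network behavior mentioned in Technical Notes can be sketched like this (an illustration, not the app's actual fetch code; the function name and error format are assumptions):

```python
import requests

def fetch_source(url: str, timeout: float = 10.0):
    """Fetch page HTML fail-soft: return (html, error) instead of raising."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        return resp.text, None
    except requests.RequestException as exc:
        # Surface the problem to the UI as a string; never crash the session
        return None, f"{type(exc).__name__}: {exc}"
```

Returning an `(html, error)` pair lets the UI render a per-source warning while the rest of the batch still parses.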
## Configuration

This project is intentionally low-config and does not require a `.env` file for baseline execution.

Current runtime knobs are primarily Streamlit flags:

- `--server.address`
- `--server.port`
- `--server.headless`

Example:

```bash
streamlit run app.py --server.address 0.0.0.0 --server.port 8501 --server.headless true
```

If you introduce environment variables in future updates, document them here with default values and security notes.
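Should such variables be added later, one common pattern is reading them with safe defaults. The variable names and defaults below are purely hypothetical:

```python
import os

# Hypothetical knobs -- names and defaults are illustrative only
FETCH_TIMEOUT = float(os.environ.get("ADP_FETCH_TIMEOUT", "10"))
REQUEST_RETRIES = int(os.environ.get("ADP_REQUEST_RETRIES", "0"))
```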
## License

This project is distributed under the GPL-3.0 license. See `LICENSE` for the full legal text.
## Contacts

Maintainer and project author channels:
## Support the Project

If you find this tool useful, consider leaving a ⭐ on GitHub or supporting the author directly: