Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating findings to the estimated total number of Python repositories on GitHub (~18M repositories)
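
The extrapolation step can be sketched as simple proportion scaling. This is a minimal illustration, not the project's actual code: the function name `extrapolate`, its parameters, and the 95% normal-approximation interval are assumptions added here to show how sample size affects the estimate.

```python
import math

# Approximate total quoted in this README; treat it as an estimate, not an exact count.
TOTAL_PYTHON_REPOS = 18_000_000


def extrapolate(repos_with_package: int, sample_size: int,
                total_repos: int = TOTAL_PYTHON_REPOS) -> tuple[float, float, float]:
    """Return (estimate, low, high) for the number of repositories using a package.

    Hypothetical helper: scales the sample proportion to the full population
    and attaches a 95% normal-approximation confidence interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= repos_with_package <= sample_size:
        raise ValueError("repos_with_package must be between 0 and sample_size")

    p = repos_with_package / sample_size
    # Standard error of a sample proportion, scaled by z = 1.96 for ~95% coverage.
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)
    return p * total_repos, low * total_repos, high * total_repos
```

Because the margin shrinks with the square root of the sample size, each additional batch of sampled repositories narrows the interval around the estimate.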
The estimates become more accurate over time: a GitHub Actions workflow samples additional repositories every 6 hours.
Note: Python standard library modules are no longer counted, but some standard library entries collected earlier have not yet been removed from the data.
Traditional package metrics like download counts are increasingly unreliable indicators of real-world usage. Every CI/CD run, dependency resolver cache miss, and mirror sync inflates these numbers without representing actual adoption. A package with millions of downloads might be used by only a handful of projects, while genuinely popular tools can be undercounted due to efficient caching.
This project provides a more meaningful metric: actual usage in real-world code. By analyzing import statements across a random sample of GitHub's ~18 million Python repositories, we capture how developers genuinely use packages in their projects. This approach offers package maintainers something download counts cannot: contextual understanding of their impact. Knowing that your package appears in 2% of Python repositories, or has 70% of the adoption rate of an industry leader, provides actionable insight into market penetration and growth opportunities.
For the open source community, these metrics democratize impact measurement. Smaller, specialized packages can demonstrate their value within their niche, funding conversations become data-driven, and developers can make more informed decisions about dependencies based on actual adoption patterns rather than inflated download statistics.
- Track Real Adoption: See how many projects actually import your package
- Benchmark Performance: Compare your package's usage against similar tools
- Identify Ecosystems: Discover which packages are commonly used alongside yours
- Measure Growth: Monitor adoption trends over time as we continuously sample
- Support Funding Applications: Provide concrete usage data for grant proposals
- Add Usage Badges: Display your package's real usage statistics in your README with our badges
| Script | Purpose |
|---|---|
| find_repos.py | Queries GitHub API for random Python repositories |
| analyze_imports.py | Extracts import statements from repository files |
| count_libs.py | Aggregates and calculates package usage statistics |
| update_readme.py | Refreshes this README with latest data |
| total_python_repos.ipynb | Estimates total Python repository count on GitHub |
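
To illustrate the import-extraction step, here is a minimal sketch using the standard `ast` module. It is an assumption about how such extraction could work, not the contents of `analyze_imports.py`; the function name `top_level_imports` is hypothetical.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string.

    Files that fail to parse (for example Python 2 code) yield an empty set.
    """
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return set()

    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os.path" counts as a use of "os".
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import (from . import x), which
            # points at the repository's own code, so it is skipped.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names
```

Parsing with `ast` rather than matching text avoids counting imports that only appear inside strings or comments. Note that local modules imported with absolute syntax (a project's own `utils.py`, for instance) are indistinguishable from installed packages at this level.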
| File | Description | Format |
|---|---|---|
| repos.jsonl | Details of processed repositories | JSONL |
| imports.jsonl | Raw import statements extracted from repos | JSONL |
| library_counts.csv | Aggregated package usage statistics | CSV |
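
The aggregation from JSONL to CSV might look like the sketch below. The record schema is an assumption made for illustration: each line is taken to be a JSON object with an `imports` list, and the output columns `library` and `count` are hypothetical. The real files may be structured differently.

```python
import csv
import io
import json
from collections import Counter


def count_libraries(jsonl_lines) -> Counter:
    """Count how many records (repositories) import each library.

    Assumed schema per line: {"repo": "...", "imports": ["numpy", ...]}.
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # A set ensures a library counts once per repository,
        # however many files import it.
        counts.update(set(record.get("imports", [])))
    return counts


def to_csv(counts: Counter) -> str:
    """Serialize counts as CSV text, most-used library first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["library", "count"])
    for library, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([library, count])
    return buffer.getvalue()
```

In practice the lines would come from an open file handle, for example `count_libraries(open("imports.jsonl", encoding="utf-8"))`.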
Our GitHub Actions workflow orchestrates the entire process:
Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README
| Rank | Library | Count |
|---|---|---|
| 1 | numpy | 51059 |
| 2 | matplotlib | 16587 |
| 3 | torch | 16488 |
| 4 | pandas | 15227 |
| 5 | cv2 | 11096 |
| 6 | django | 10429 |
| 7 | sklearn | 8743 |
| 8 | utils | 8115 |
| 9 | requests | 7972 |
| 10 | tensorflow | 7965 |
Last updated: 2025-12-24 12:59:06 UTC
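
As a small worked example of benchmarking against a leading package, the counts in the table above can be turned into relative adoption ratios. The helper `relative_adoption` is hypothetical, and the sketch assumes both counts were measured over the same sample.

```python
def relative_adoption(counts: dict[str, int], package: str, reference: str) -> float:
    """Return the count of `package` as a fraction of the count of `reference`."""
    if counts[reference] == 0:
        raise ValueError("reference package has a zero count")
    return counts[package] / counts[reference]


# Counts copied from the table above (snapshot of 2025-12-24).
top_counts = {"numpy": 51059, "matplotlib": 16587, "torch": 16488}

ratio = relative_adoption(top_counts, "matplotlib", "numpy")
print(f"matplotlib is at {ratio:.1%} of numpy's count")
```

A ratio like this is independent of the total repository estimate, so it stays meaningful even if the ~18M figure is revised.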