.github/copilot-instructions.md
+17 -17 (17 additions & 17 deletions)
@@ -1,30 +1,30 @@
 # JDK Metadata DB Scraper - AI Coding Guide
 
 ## Project Overview
-A parallel Java application that scrapes JDK metadata from 35+ vendors (Temurin, Zulu, Liberica, Corretto, etc.) via vendor APIs and GitHub releases. Outputs structured JSON metadata files with checksums for each JDK distribution.
+A parallel Java application that scrapes JDK metadata from 35+ distros (Temurin, Zulu, Liberica, Corretto, etc.) via distro APIs and GitHub releases. Outputs structured JSON metadata files with checksums for each JDK distribution.
 
 ## Architecture
 
 ### Core Execution Flow
 1. **Main** (`Main.java`) - CLI entry via Picocli, manages ExecutorService for parallel scraping
 2. **ScraperFactory** - Uses Java ServiceLoader to discover scrapers via `META-INF/services/dev.jbang.jdkdb.scraper.Scraper$Discovery`
 3. **ProgressReporter** - Dedicated thread receives events from all scrapers via `BlockingQueue<ProgressEvent>`
-4. **Scrapers** - Each vendor scraper implements `Callable<ScraperResult>` for concurrent execution
+4. **Scrapers** - Each distro scraper implements `Callable<ScraperResult>` for concurrent execution
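
For readers unfamiliar with the pattern, the execution flow listed in the hunk above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ExecutionFlowSketch`, `DemoScraper`, the `POISON` sentinel, the fields of `ProgressEvent` and `ScraperResult`, and the pool size of 4 are all assumptions made for the example. ServiceLoader discovery is left out because it needs a `META-INF/services` file, so the scrapers are constructed directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class ExecutionFlowSketch {
    // Hypothetical stand-ins: the real ProgressEvent and ScraperResult are not shown in the diff.
    record ProgressEvent(String distro, String message) {}
    record ScraperResult(String distro, int itemsProcessed) {}

    // Sentinel telling the reporter thread to stop (an assumption of this sketch).
    static final ProgressEvent POISON = new ProgressEvent("", "");

    // Each scraper is a Callable so an ExecutorService can run many of them concurrently.
    static class DemoScraper implements Callable<ScraperResult> {
        private final String distro;
        private final BlockingQueue<ProgressEvent> events;

        DemoScraper(String distro, BlockingQueue<ProgressEvent> events) {
            this.distro = distro;
            this.events = events;
        }

        @Override
        public ScraperResult call() throws InterruptedException {
            events.put(new ProgressEvent(distro, "started"));
            // A real scraper would query the distro's API or GitHub releases here.
            events.put(new ProgressEvent(distro, "finished"));
            return new ScraperResult(distro, 0);
        }
    }

    static List<ScraperResult> runAll(List<String> distros)
            throws InterruptedException, ExecutionException {
        BlockingQueue<ProgressEvent> events = new LinkedBlockingQueue<>();

        // Dedicated reporter thread: drains events from all scrapers until the sentinel arrives.
        Thread reporter = new Thread(() -> {
            try {
                for (ProgressEvent e = events.take(); e != POISON; e = events.take()) {
                    System.out.println(e.distro() + ": " + e.message());
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }, "progress-reporter");
        reporter.start();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<ScraperResult>> tasks = new ArrayList<>();
            for (String distro : distros) {
                tasks.add(new DemoScraper(distro, events));
            }
            // invokeAll blocks until every task is done and returns futures in task order.
            List<ScraperResult> results = new ArrayList<>();
            for (Future<ScraperResult> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
            events.put(POISON);   // stop the reporter once all scrapers have finished
            reporter.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(List.of("temurin", "zulu")));
    }
}
```

Routing all progress through one queue and one consumer thread keeps console output from interleaving, which is presumably why the guide describes the reporter as a dedicated thread.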
README.md
+55 -55 (55 additions & 55 deletions)
@@ -1,16 +1,16 @@
 # jdkdb-scraper - JDK Metadata DB Scraper
 
-A Java-based application for scraping JDK metadata from various vendors. This project replaces the original bash scripts with a robust, parallel Java implementation.
+A Java-based application for scraping JDK metadata from various distros. This project replaces the original bash scripts with a robust, parallel Java implementation.
 
 This project is based on [Joschi's Java Metadata project](https://github.com/joschi/java-metadata) and incorporates ideas from the [Foojay's Disco API project](https://github.com/foojayio/discoapi).
 
 ## Features
 
-- **Parallel Execution**: Run multiple vendor scrapers concurrently for improved performance
-- **Selective Scraping**: Run all scrapers or select specific vendors
+- **Parallel Execution**: Run multiple distro scrapers concurrently for improved performance
+- **Selective Scraping**: Run all scrapers or select specific distros
 - **Central Reporting**: Thread-safe progress reporting with real-time status updates
-- **Extensible Architecture**: Easy to add new vendor scrapers
-- **Generic Base Classes**: Reduces code duplication for similar vendors (e.g., Semeru versions, Trava versions)
+- **Extensible Architecture**: Easy to add new distro scrapers
+- **Generic Base Classes**: Reduces code duplication for similar distros (e.g., Semeru versions, Trava versions)
 - **Comprehensive Logging**: SLF4J/Logback integration with both console and file output
 - **Multi-command CLI**: Separate commands for updating metadata, generating indexes, downloading checksums, and cleaning up old releases
 - **Archive Extraction**: Automatically extracts release information from JDK archives
-- **`update`** - Scrape JDK metadata from various vendors and update metadata files
-- **`index`** - Generate aggregated all.json files for vendor directories
+- **`update`** - Scrape JDK metadata from various distros and update metadata files
+- **`index`** - Generate aggregated all.json files for distro directories
 - **`download`** - Download and compute checksums for metadata files with missing checksums
 - **`clean`** - Clean up metadata by removing incomplete files and pruning old EA releases
 
@@ -114,7 +114,7 @@ The application checks for tokens in this order: environment variable first, the
 
 ### Typical usage
 
-- You can simply run `update` in the root of the data repository (where the `docs/` folder is located) and let it do its work. It will scrape all the vendor sites, obtain the latest metadata, download the jdk distributions, calculate checksums and update all the indices. Nothing else to be done. But this can take some time.
+- You can simply run `update` in the root of the data repository (where the `metadata/` folder is located) and let it do its work. It will scrape all the distro sites, obtain the latest metadata, download the jdk distributions, calculate checksums and update all the indices. Nothing else to be done. But this can take some time.
 - You can split the work into two steps:
 
 1. You run `update --no-download` which will do the scraping and will make sure that we have all the latest distributions cataloged. It will write all the metadata but with _missing_ checksums (and release info).
@@ -130,12 +130,12 @@ And finally the `clean` command can be used to get rid of any invalid or orphane
 
 ```bash
 Usage: jdkdb-scraper [-hV] [COMMAND]
-Scrapes JDK metadata from various vendors and generates index files
+Scrapes JDK metadata from various distros and generates index files
 -h, --help Show this help message and exit.
 -V, --version Print version information and exit.
 Commands:
-update Scrape JDK metadata from various vendors and update metadata files
-index Generate all.json files for vendor directories by aggregating
+update Scrape JDK metadata from various distros and update metadata files
+index Generate all.json files for distro directories by aggregating
 individual metadata files
 download Download and compute checksums for metadata files that have missing