Merged
22 changes: 20 additions & 2 deletions docs/api/methods.md
@@ -1593,7 +1593,7 @@ Returns `null` on success. The scraper continues after the response is sent.

Return the latest known metadata scraper status.

This method behaves like `media` does for indexing status: clients can query the current scrape snapshot after opening a UI, then continue listening for `media.scraping` notifications. If no scrape has run since startup, the result is idle with `scraping: false` and `done: false`.
This method behaves like `media` does for indexing status: clients can query the current scrape snapshot after opening a UI, then continue listening for `media.scraping` notifications. If no scrape has run since startup, the result is idle with `scraping: false`, `done: false`, and `state: "idle"`. Existing flat counter fields remain for compatibility; new UIs should prefer `currentSystem` for per-system progress and `totalSteps`/`currentStep`/`currentStepDisplay` for whole-run progress.
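
A client that has to work against both older and newer servers could prefer `currentSystem` and fall back to the flat counters, as the paragraph above suggests. The following is a minimal sketch, not the project's client code: `ScrapeStatus`, `SystemProgress`, and `pickProgress` are hypothetical names, and the flat JSON keys (`systemId`, `processed`, `total`, `matched`, `skipped`) are assumed to match the keys documented for `currentSystem`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SystemProgress mirrors the documented `currentSystem` object.
type SystemProgress struct {
	SystemID   string `json:"systemId"`
	SystemName string `json:"systemName"`
	Processed  int    `json:"processed"`
	Total      int    `json:"total"`
	Matched    int    `json:"matched"`
	Skipped    int    `json:"skipped"`
}

// ScrapeStatus is a hypothetical client-side view of the status result.
// The flat field names are an assumption, not confirmed by the docs above.
type ScrapeStatus struct {
	SystemID      string          `json:"systemId"`
	Processed     int             `json:"processed"`
	Total         int             `json:"total"`
	Matched       int             `json:"matched"`
	Skipped       int             `json:"skipped"`
	CurrentSystem *SystemProgress `json:"currentSystem"`
}

// pickProgress prefers the nested object and falls back to the flat
// counters when the server did not send `currentSystem`.
func pickProgress(s ScrapeStatus) SystemProgress {
	if s.CurrentSystem != nil {
		return *s.CurrentSystem
	}
	return SystemProgress{
		SystemID:   s.SystemID,
		SystemName: s.SystemID, // no display name available in flat fields
		Processed:  s.Processed,
		Total:      s.Total,
		Matched:    s.Matched,
		Skipped:    s.Skipped,
	}
}

func main() {
	// A payload without `currentSystem`, as an older server might send.
	raw := []byte(`{"systemId":"snes","processed":42,"total":100,"matched":38,"skipped":4}`)
	var s ScrapeStatus
	if err := json.Unmarshal(raw, &s); err != nil {
		panic(err)
	}
	p := pickProgress(s)
	fmt.Printf("%s: %d/%d\n", p.SystemName, p.Processed, p.Total)
}
```

Using a pointer for `CurrentSystem` lets the client distinguish an absent object from one whose counters are all zero.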

#### Parameters

@@ -1613,6 +1613,12 @@ None.
| scraping | boolean | Yes | Whether a scrape is currently running. |
| done | boolean | Yes | Whether the latest scrape reached a terminal state. |
| paused | boolean | Yes | Whether the active scrape is paused, either while media is running or until it is resumed. |
| state | string | No | Explicit lifecycle state: `idle`, `running`, `paused`, `completed`, `cancelled`, or `failed`. |
| error | string | No | Fatal scrape error on failed terminal updates. |
| totalSteps | integer | No | Total systems in the scrape run, when known. |
| currentStep | integer | No | 1-based current system step, when known. |
| currentStepDisplay | string | No | Display name for the current system step, falling back to system ID. |
| currentSystem | object | No | Per-system progress object with `systemId`, `systemName`, `processed`, `total`, `matched`, and `skipped`. |

#### Example

@@ -1642,7 +1648,19 @@ None.
"totalScraped": 1200,
"scraping": true,
"done": false,
"paused": false
"paused": false,
"state": "running",
"totalSteps": 2,
"currentStep": 1,
"currentStepDisplay": "Super Nintendo Entertainment System",
"currentSystem": {
"systemId": "snes",
"systemName": "Super Nintendo Entertainment System",
"processed": 42,
"total": 100,
"matched": 38,
"skipped": 4
}
}
}
```
36 changes: 33 additions & 3 deletions docs/api/notifications.md
@@ -217,7 +217,7 @@ Sent during media database generation to indicate indexing progress and completi

Sent while a metadata scraper run is active and when it completes.

The first notification for a scraper run identifies the scraper and sets `scraping` to true. Progress notifications include the current system, counters, pause state, and completion state. A final notification has `scraping` set to false and `done` set to true.
The first notification for a scraper run identifies the scraper and sets `scraping` to true. Progress notifications include the current system, per-system counters, whole-run system-step progress, pause state, and completion state. A final notification has `scraping` set to false and `done` set to true. Existing flat counter fields remain for compatibility; new UIs should prefer `currentSystem` for per-system progress and `totalSteps`/`currentStep`/`currentStepDisplay` for whole-run progress.

#### Parameters

@@ -233,6 +233,12 @@ The first notification for a scraper run identifies the scraper and sets `scrapi
| scraping | boolean | Yes | True while scraping is active. |
| done | boolean | Yes | True on the terminal update for the scraper run. |
| paused | boolean | Yes | True when the active scrape is paused. |
| state | string | No | Explicit lifecycle state: `idle`, `running`, `paused`, `completed`, `cancelled`, or `failed`. |
| error | string | No | Fatal scrape error on failed terminal updates. |
| totalSteps | integer | No | Total systems in the scrape run, when known. |
| currentStep | integer | No | 1-based current system step, when known. |
| currentStepDisplay | string | No | Display name for the current system step, falling back to system ID. |
| currentSystem | object | No | Per-system progress object with `systemId`, `systemName`, `processed`, `total`, `matched`, and `skipped`. |
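
Because `state` is optional, a client may want a fallback that derives a lifecycle state from the required booleans. The sketch below is one possible approach under stated assumptions: `Status` and `effectiveState` are hypothetical names, and the fallback cannot distinguish `cancelled` from `completed`, since only `state` carries that information.

```go
package main

import "fmt"

// Status holds the subset of documented fields needed to pick a state.
type Status struct {
	State    string
	Error    string
	Scraping bool
	Done     bool
	Paused   bool
}

// effectiveState returns the explicit state when present, otherwise a
// best-effort guess from the boolean fields.
func effectiveState(s Status) string {
	switch {
	case s.State != "":
		return s.State
	case s.Error != "":
		return "failed"
	case s.Done:
		// Assumption: without `state`, a terminal update is treated as completed.
		return "completed"
	case s.Scraping && s.Paused:
		return "paused"
	case s.Scraping:
		return "running"
	default:
		return "idle"
	}
}

func main() {
	fmt.Println(effectiveState(Status{Scraping: true, Paused: true}))
	fmt.Println(effectiveState(Status{State: "cancelled", Done: true}))
}
```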

#### Examples

@@ -252,7 +258,19 @@ The first notification for a scraper run identifies the scraper and sets `scrapi
"totalScraped": 1200,
"scraping": true,
"done": false,
"paused": false
"paused": false,
"state": "running",
"totalSteps": 12,
"currentStep": 3,
"currentStepDisplay": "Super Nintendo Entertainment System",
"currentSystem": {
"systemId": "SNES",
"systemName": "Super Nintendo Entertainment System",
"processed": 42,
"total": 100,
"matched": 38,
"skipped": 4
}
}
}
```
@@ -273,7 +291,19 @@ The first notification for a scraper run identifies the scraper and sets `scrapi
"totalScraped": 1250,
"scraping": false,
"done": true,
"paused": false
"paused": false,
"state": "completed",
"totalSteps": 12,
"currentStep": 12,
"currentStepDisplay": "Super Nintendo Entertainment System",
"currentSystem": {
"systemId": "SNES",
"systemName": "Super Nintendo Entertainment System",
"processed": 100,
"total": 100,
"matched": 92,
"skipped": 8
}
}
}
```
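
The optional step fields in the examples above could be turned into a UI label along these lines. This is an illustrative sketch only: `scrapeUpdate`, `systemProgress`, and `progressLabel` are hypothetical names, and the label format is an arbitrary choice rather than anything the API prescribes.

```go
package main

import "fmt"

// systemProgress holds the per-system counters used for the label.
type systemProgress struct {
	Processed int
	Total     int
}

// scrapeUpdate models the optional fields as pointers so that "absent"
// and "zero" stay distinguishable on the client.
type scrapeUpdate struct {
	TotalSteps         *int
	CurrentStep        *int
	CurrentStepDisplay *string
	CurrentSystem      *systemProgress
}

// progressLabel builds a label from whichever optional fields are present.
func progressLabel(u scrapeUpdate) string {
	label := "Scraping"
	if u.CurrentStep != nil && u.TotalSteps != nil {
		label = fmt.Sprintf("Step %d/%d", *u.CurrentStep, *u.TotalSteps)
	}
	if u.CurrentStepDisplay != nil {
		label += ": " + *u.CurrentStepDisplay
	}
	if u.CurrentSystem != nil && u.CurrentSystem.Total > 0 {
		label += fmt.Sprintf(" (%d/%d)", u.CurrentSystem.Processed, u.CurrentSystem.Total)
	}
	return label
}

func main() {
	step, steps := 3, 12
	name := "Super Nintendo Entertainment System"
	fmt.Println(progressLabel(scrapeUpdate{
		TotalSteps:         &steps,
		CurrentStep:        &step,
		CurrentStepDisplay: &name,
		CurrentSystem:      &systemProgress{Processed: 42, Total: 100},
	}))
}
```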
16 changes: 14 additions & 2 deletions docs/scraper.md
@@ -197,11 +197,23 @@ Progress is queryable with `media.scrape.status` and broadcast as `media.scrapin
"totalScraped": 1000,
"scraping": true,
"done": false,
"paused": false
"paused": false,
"state": "running",
"totalSteps": 2,
"currentStep": 1,
"currentStepDisplay": "Super Nintendo Entertainment System",
"currentSystem": {
"systemId": "snes",
"systemName": "Super Nintendo Entertainment System",
"processed": 42,
"total": 100,
"matched": 38,
"skipped": 4
}
}
```

`totalScraped` is derived from scraper sentinel tags in the database, not from the current run's `matched` count.
`totalScraped` is derived from scraper sentinel tags in the database, not from the current run's `matched` count. Existing flat fields stay for compatibility; new UIs should use `currentSystem` for current-system progress and `totalSteps`/`currentStep`/`currentStepDisplay` for whole-run system-step progress.
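
For a single whole-run progress bar, the step fields and the current system's counters can be combined. The sketch below assumes every system counts equally toward the run, which the API does not promise; `runFraction` is a hypothetical helper, not part of Zaparoo.

```go
package main

import "fmt"

// runFraction estimates whole-run progress in [0, 1]. It reports false
// when the optional step fields are absent or not positive.
func runFraction(totalSteps, currentStep *int, processed, total int) (float64, bool) {
	if totalSteps == nil || currentStep == nil || *totalSteps <= 0 || *currentStep <= 0 {
		return 0, false
	}
	// Fraction of the current system that is finished.
	within := 0.0
	if total > 0 && processed > 0 {
		within = float64(processed) / float64(total)
		if within > 1 {
			within = 1
		}
	}
	// currentStep is 1-based, so currentStep-1 systems are already complete.
	f := (float64(*currentStep-1) + within) / float64(*totalSteps)
	if f > 1 {
		f = 1
	}
	return f, true
}

func main() {
	steps, step := 2, 1
	if f, ok := runFraction(&steps, &step, 42, 100); ok {
		fmt.Printf("%.0f%%\n", f*100)
	}
}
```

When the step fields are missing, a UI can fall back to an indeterminate indicator instead of showing a misleading percentage.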

Only one scraper can run at a time, and scraping is mutually exclusive with media indexing.

146 changes: 119 additions & 27 deletions pkg/api/methods/media_scrape.go
@@ -28,6 +28,7 @@ import (
"github.com/ZaparooProject/zaparoo-core/v2/pkg/api/models/requests"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/api/notifications"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/api/validation"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/assets"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/database"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/database/scraper"
"github.com/ZaparooProject/zaparoo-core/v2/pkg/helpers/syncutil"
@@ -38,7 +39,15 @@ import (
// scrapingStatus tracks the lifecycle of an active media.scrape operation.
// It mirrors the indexingStatus pattern in media.go for consistent state
// management and safe concurrent access.
const scrapeTotalScrapedRefreshInterval = 5 * time.Second
const (
scrapeTotalScrapedRefreshInterval = 5 * time.Second
scrapeStateIdle = "idle"
scrapeStateRunning = "running"
scrapeStatePaused = "paused"
scrapeStateCompleted = "completed"
scrapeStateCancelled = "cancelled"
scrapeStateFailed = "failed"
)

type scrapedCountCache struct {
lastRefresh time.Time
@@ -67,6 +76,7 @@ func (s *scrapingStatus) startIfNotRunning(scraperID string) bool {
s.countCache = scrapedCountCache{}
s.latest = models.ScrapingStatusResponse{
ScraperID: scraperID,
State: scrapeStateRunning,
Scraping: true,
}
return true
@@ -122,6 +132,7 @@ func (s *scrapingStatus) cancel() bool {
s.latest.Scraping = false
s.latest.Done = true
s.latest.Paused = false
s.latest.State = scrapeStateCancelled
// Do NOT clear running/scraperID here. The goroutine's deferred
// clearIfOwner call is the single writer for those fields, preventing
// a new scrape from starting only to have its state cleared by the
@@ -243,10 +254,92 @@ func queryScrapedMediaCount(ctx context.Context, db *database.Database, scraperI
return count, true
}

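// systemProgressDisplay returns the display name for a system ID, falling
// back to the ID itself when no metadata name is available.
func systemProgressDisplay(systemID string) string {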
if systemID == "" {
return ""
}
md, err := assets.GetSystemMetadata(systemID)
if err != nil || md.Name == "" {
return systemID
}
return md.Name
}

// ptrIfPositive returns a pointer to v, or nil when v is zero or negative.
func ptrIfPositive(v int) *int {
if v <= 0 {
return nil
}
return &v
}

// ptrIfNotEmpty returns a pointer to v, or nil when v is empty.
func ptrIfNotEmpty(v string) *string {
if v == "" {
return nil
}
return &v
}

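// scrapeState maps an update to a lifecycle state. Precedence: failed (fatal
// error), cancelled (done with a cancelled context), completed, paused, running.
func scrapeState(scrapeCtx context.Context, update *scraper.ScrapeUpdate, paused bool) string {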
switch {
case update.FatalErr != nil:
return scrapeStateFailed
case update.Done && scrapeCtx != nil && scrapeCtx.Err() != nil:
return scrapeStateCancelled
case update.Done:
return scrapeStateCompleted
case paused:
return scrapeStatePaused
default:
return scrapeStateRunning
}
}

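// scrapingStatusFromUpdate builds the status response for one scraper update,
// filling both the flat compatibility counters and the structured fields.
func scrapingStatusFromUpdate(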
scrapeCtx context.Context,
scraperID string,
update *scraper.ScrapeUpdate,
paused bool,
) models.ScrapingStatusResponse {
display := systemProgressDisplay(update.SystemID)
status := models.ScrapingStatusResponse{
ScraperID: scraperID,
SystemID: update.SystemID,
Processed: update.Processed,
Total: update.Total,
Matched: update.Matched,
Skipped: update.Skipped,
Scraping: !update.Done,
Done: update.Done,
Paused: paused && !update.Done,
State: scrapeState(scrapeCtx, update, paused && !update.Done),
TotalSteps: ptrIfPositive(update.TotalSteps),
CurrentStep: ptrIfPositive(update.CurrentStep),
CurrentStepDisplay: ptrIfNotEmpty(display),
}
if update.FatalErr != nil {
status.Error = update.FatalErr.Error()
}
if update.SystemID != "" {
status.CurrentSystem = &models.ScrapeSystemProgressResponse{
SystemID: update.SystemID,
SystemName: display,
Processed: update.Processed,
Total: update.Total,
Matched: update.Matched,
Skipped: update.Skipped,
}
}
return status
}

func PublishScrapePauseStatus(ns chan<- models.Notification, paused bool) {
status := scrapingStatusInstance.getLatest()
status.Scraping = true
status.Paused = paused
status.State = scrapeStateRunning
if paused {
status.State = scrapeStatePaused
}
publishScrapingStatus(ns, &status)
}

@@ -300,8 +393,13 @@ func HandleMediaScrape(env requests.RequestEnv) (any, error) { //nolint:gocritic
ns := env.State.Notifications
db := env.Database

initialState := scrapeStateRunning
if paused {
initialState = scrapeStatePaused
}
initialStatus := models.ScrapingStatusResponse{
ScraperID: params.ScraperID,
State: initialState,
Scraping: true,
Paused: paused,
}
@@ -320,17 +418,8 @@ func HandleMediaScrape(env requests.RequestEnv) (any, error) { //nolint:gocritic
if update.Done {
receivedDone = true
}
status := models.ScrapingStatusResponse{
ScraperID: scraperID,
SystemID: update.SystemID,
Processed: update.Processed,
Total: update.Total,
Matched: update.Matched,
Skipped: update.Skipped,
Scraping: !update.Done,
Done: update.Done,
Paused: env.ScrapePauser != nil && env.ScrapePauser.IsPaused() && !update.Done,
}
paused := env.ScrapePauser != nil && env.ScrapePauser.IsPaused()
status := scrapingStatusFromUpdate(scrapeCtx, scraperID, &update, paused)
if update.Done {
populateScrapedMediaCountExact(env.State.GetContext(), db, &status)
} else {
@@ -344,21 +433,14 @@ func HandleMediaScrape(env requests.RequestEnv) (any, error) { //nolint:gocritic
// Otherwise the channel already delivered the final counters and sending
// another zeroed-out Done would overwrite them for consumers.
if !receivedDone {
status := scrapingStatusInstance.getLatest()
status.ScraperID = scraperID
status.Scraping = false
status.Done = true
status.Paused = false
terminalStatus := models.ScrapingStatusResponse{
ScraperID: status.ScraperID,
SystemID: status.SystemID,
Processed: status.Processed,
Total: status.Total,
Matched: status.Matched,
Skipped: status.Skipped,
Scraping: status.Scraping,
Done: status.Done,
Paused: status.Paused,
terminalStatus := scrapingStatusInstance.getLatest()
terminalStatus.ScraperID = scraperID
terminalStatus.Scraping = false
terminalStatus.Done = true
terminalStatus.Paused = false
terminalStatus.State = scrapeStateCompleted
if scrapeCtx.Err() != nil {
terminalStatus.State = scrapeStateCancelled
}
populateScrapedMediaCountExact(env.State.GetContext(), db, &terminalStatus)
publishScrapingStatus(ns, &terminalStatus)
@@ -374,8 +456,18 @@ func HandleMediaScrape(env requests.RequestEnv) (any, error) { //nolint:gocritic
//nolint:gocritic // API handler signature; large env param cannot be passed by pointer
func HandleMediaScrapeStatus(env requests.RequestEnv) (any, error) {
status := scrapingStatusInstance.getLatest()
if status.State == "" {
status.State = scrapeStateIdle
if status.Scraping {
status.State = scrapeStateRunning
}
}
if status.Scraping && env.ScrapePauser != nil {
status.Paused = env.ScrapePauser.IsPaused()
status.State = scrapeStateRunning
if status.Paused {
status.State = scrapeStatePaused
}
}
if env.Database == nil || env.Database.MediaDB == nil {
return status, nil