fix(haidan): Fix seeding quantity and size extraction with deduplication #741
madrays wants to merge 1 commit into pt-plugins:master
- Fix seeding quantity: Count deduplicated table rows instead of reading the incorrect 588
- Fix seeding size: Add deduplication when accumulating torrent sizes
- Use torrent ID (details.php?id=XXX) as unique identifier for deduplication
- Resolve issue where haidan site returns 100% duplicated data in HTML fragment
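The deduplication described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the row markup (a `details.php?id=XXX` link plus a `<td class="size">` cell), the `summarizeSeeding` function, and the `parseSize` helper are all assumptions made for the example. The actual change parses the fragment with createDocument and Sizzle and uses the project's parseSizeString; a regex-based stand-in is used here so the sketch runs without a DOM.

```typescript
// Hypothetical result shape for the example.
interface SeedingSummary {
  count: number;
  totalBytes: number;
}

const UNIT_EXPONENT: Record<string, number> = { B: 0, KB: 1, MB: 2, GB: 3, TB: 4, PB: 5 };

// Stand-in for the project's parseSizeString; 1024-based units are assumed.
function parseSize(text: string): number {
  const match = /([\d.]+)\s*([KMGTP]?B)/i.exec(text);
  if (!match) return 0;
  const exponent = UNIT_EXPONENT[(match[2] ?? "B").toUpperCase()] ?? 0;
  return parseFloat(match[1] ?? "0") * Math.pow(1024, exponent);
}

function summarizeSeeding(html: string): SeedingSummary {
  // Map from torrent ID to size in bytes; a repeated ID is ignored.
  const sizeById = new Map<string, number>();
  // Split on "<tr" so that each chunk holds one table row.
  for (const row of html.split(/<tr[\s>]/i).slice(1)) {
    const id = /details\.php\?id=(\d+)/.exec(row)?.[1];
    // Skip rows without a torrent link and rows whose ID was already seen.
    if (id === undefined || sizeById.has(id)) continue;
    const sizeCell = /<td class="size">([^<]*)<\/td>/.exec(row)?.[1] ?? "";
    sizeById.set(id, parseSize(sizeCell));
  }
  let totalBytes = 0;
  sizeById.forEach((bytes) => {
    totalBytes += bytes;
  });
  return { count: sizeById.size, totalBytes };
}

// Every torrent appears twice, mimicking the fully duplicated fragment.
const fragment = `
<table>
<tr><td><a href="details.php?id=101">A</a></td><td class="size">1.5 GB</td></tr>
<tr><td><a href="details.php?id=101">A</a></td><td class="size">1.5 GB</td></tr>
<tr><td><a href="details.php?id=202">B</a></td><td class="size">512 MB</td></tr>
<tr><td><a href="details.php?id=202">B</a></td><td class="size">512 MB</td></tr>
</table>`;
const summary = summarizeSeeding(fragment);
console.log(summary.count, summary.totalBytes);
```

Keying a `Map` by torrent ID lets one pass produce both figures: the map's size is the deduplicated seeding count, and the sum of its values is the deduplicated seeding size.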
Reviewer's Guide
This PR overhauls the Haidan site scraper by moving seeding count and size calculations into a new AJAX-driven process pipeline, parsing the returned HTML fragment with createDocument and Sizzle, deduplicating entries by torrent ID, and accumulating sizes with parseSizeString, plus minor regex and URL fixes.
Sequence diagram for AJAX-driven seeding info extraction and deduplication
sequenceDiagram
participant Client
participant HaidanSite
participant "createDocument/Sizzle"
participant "Deduplication Logic"
participant "Size Accumulation"
Client->>HaidanSite: GET /getusertorrentlistajax.php (type=seeding)
HaidanSite-->>Client: HTML fragment (table rows)
Client->>"createDocument/Sizzle": Parse HTML fragment
"createDocument/Sizzle"-->>"Deduplication Logic": Extract torrent IDs from details.php?id=XXX
"Deduplication Logic"-->>Client: Deduplicated seeding count
"createDocument/Sizzle"-->>"Size Accumulation": Extract and sum sizes for unique torrents
"Size Accumulation"-->>Client: Deduplicated seeding size
ER diagram for deduplicated seeding data extraction
erDiagram
USER ||--o{ TORRENT : "seeds"
TORRENT {
id int PK
size float
}
USER {
id int PK
seeding_count int
seeding_size float
}
USER ||--o{ SEEDING : "deduplicated by torrent id"
SEEDING {
user_id int FK
torrent_id int FK
}
Class diagram for updated seeding and seedingSize extraction logic
classDiagram
class SiteMetadata {
+process: Array<ProcessStep>
}
class ProcessStep {
+requestConfig
+fields
+selectors
}
class SeedingSelector {
+filters: [deduplicate by torrent ID]
}
class SeedingSizeSelector {
+filters: [deduplicate by torrent ID, accumulate size]
}
SiteMetadata --> ProcessStep
ProcessStep --> SeedingSelector
ProcessStep --> SeedingSizeSelector
SeedingSelector <|-- SeedingSizeSelector
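For sites built on the NexusPHP (NPHP) architecture, seeding and seedingSize are computed by dedicated functions. Judging from the PR alone, its implementation is essentially the same as the one in the schema, except that the schema has no deduplication: PT-depiler/src/packages/site/schemas/NexusPHP.ts, Lines 622 to 680 in 7b4c4c4. Separately, I think whether deduplication is needed at all is worth discussing.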
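The Haidan site has never returned the seeding count and seeding size correctly, and the size came out abnormally large, at the YB level, which made the aggregated timeline data completely wrong. So I tried handling this site as a special case. After doing that, I found that its data is 100% duplicated (probably because both v4 and v6 are present), so I also tried deduplication. If you have a better approach, that would of course be even better~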
fix(haidan): Fix seeding count and size retrieval logic, add deduplication