fix(haidan): Fix seeding quantity and size extraction with deduplication by madrays · Pull Request #741 · pt-plugins/PT-depiler

madrays · 2025-11-01T14:29:00Z

fix(haidan): Fix seeding count and size extraction logic, add deduplication

- Fix seeding count: count deduplicated table rows instead of reading the incorrect 588
- Fix seeding size: deduplicate repeated torrents when accumulating torrent sizes
- Use the torrent ID (details.php?id=XXX) as the unique identifier for deduplication
- Resolve the issue where the HTML fragment returned by the Haidan site contains 100% duplicated data

sourcery-ai

We've reviewed this pull request using the Sourcery rules engine

sourcery-ai · 2025-11-01T14:30:11Z

Reviewer's Guide

This PR overhauls the Haidan site scraper by moving seeding count and size calculations into a new AJAX‐driven process pipeline, parsing the returned HTML fragment with createDocument and Sizzle, deduplicating entries by torrent ID, and accumulating sizes with parseSizeString, plus minor regex and URL fixes.

Sequence diagram for AJAX-driven seeding info extraction and deduplication

sequenceDiagram
    participant Client
    participant HaidanSite
    participant "createDocument/Sizzle"
    participant "Deduplication Logic"
    participant "Size Accumulation"
    Client->>HaidanSite: GET /getusertorrentlistajax.php (type=seeding)
    HaidanSite-->>Client: HTML fragment (table rows)
    Client->>"createDocument/Sizzle": Parse HTML fragment
    "createDocument/Sizzle"-->>"Deduplication Logic": Extract torrent IDs from details.php?id=XXX
    "Deduplication Logic"-->>Client: Deduplicated seeding count
    "createDocument/Sizzle"-->>"Size Accumulation": Extract and sum sizes for unique torrents
    "Size Accumulation"-->>Client: Deduplicated seeding size
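
The counting half of the flow above can be sketched as follows. This is a minimal illustration, not the PR's code: `countUniqueTorrents` and the sample fragment are hypothetical, and a regular expression stands in for the createDocument/Sizzle parsing so the snippet has no dependencies.

```typescript
// Hypothetical helper: count unique torrents in an HTML fragment of table rows.
// A regex replaces createDocument/Sizzle here purely to keep the sketch standalone.
function countUniqueTorrents(fragment: string): number {
  const ids = new Set<string>();
  // Match every details.php?id=<digits> link and remember the numeric ID.
  const linkPattern = /details\.php\?id=(\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(fragment)) !== null) {
    ids.add(match[1]);
  }
  return ids.size;
}

// Made-up sample data: torrent 101 appears twice, torrent 202 once.
const sampleFragment = `
<table>
  <tr><td><a href="details.php?id=101">A</a></td><td>1.5 GB</td></tr>
  <tr><td><a href="details.php?id=101">A</a></td><td>1.5 GB</td></tr>
  <tr><td><a href="details.php?id=202">B</a></td><td>700 MB</td></tr>
</table>`;

const uniqueCount = countUniqueTorrents(sampleFragment);
console.log(uniqueCount);
```

Three rows go in, but only two distinct IDs are counted, which is the behavior the PR describes for the seeding count.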

ER diagram for deduplicated seeding data extraction

erDiagram
    USER ||--o{ TORRENT : "seeds"
    TORRENT {
        id int PK
        size float
    }
    USER {
        id int PK
        seeding_count int
        seeding_size float
    }
    USER ||--o{ SEEDING : "deduplicated by torrent id"
    SEEDING {
        user_id int FK
        torrent_id int FK
    }

Class diagram for updated seeding and seedingSize extraction logic

classDiagram
    class SiteMetadata {
        +process: Array<ProcessStep>
    }
    class ProcessStep {
        +requestConfig
        +fields
        +selectors
    }
    class SeedingSelector {
        +filters: [deduplicate by torrent ID]
    }
    class SeedingSizeSelector {
        +filters: [deduplicate by torrent ID, accumulate size]
    }
    SiteMetadata --> ProcessStep
    ProcessStep --> SeedingSelector
    ProcessStep --> SeedingSizeSelector
    SeedingSelector <|-- SeedingSizeSelector

File-Level Changes

Change	Details	Files
Imported HTML and filesize utilities and fixed regex for numeric parsing	Added imports for parseSizeString, sizePattern, Sizzle, and createDocument; adjusted whitespace/comma regex to /[\s,]/g	`src/packages/site/definitions/haidan.ts`
Updated site base URL	Replaced placeholder URL with https://www.haidan.video/	`src/packages/site/definitions/haidan.ts`
Replaced static selector-based seeding extraction with AJAX pipeline	Removed direct seeding selectors and moved seeding/size fields into process steps; added multi-step process: index.php → userdetails.php → getusertorrentlistajax.php → mybonus.php	`src/packages/site/definitions/haidan.ts`
Implemented deduplicated seeding count logic	Parsed HTML fragment into Document and selected all rows; extracted torrent ID from details.php link or fallback to row HTML; used a Set to dedupe IDs and returned its size	`src/packages/site/definitions/haidan.ts`
Implemented deduplicated seeding size accumulation	Auto-detected size column index using sizePattern; parsed each row’s size cell with parseSizeString; skipped duplicate IDs via Set and summed total size	`src/packages/site/definitions/haidan.ts`
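
The size accumulation step described above might look roughly like this. It is a sketch under assumptions: `toBytes` is a hypothetical stand-in for the project's parseSizeString (1024-based units are assumed here), and `SeedingRow` is an illustrative shape for an already-parsed table row rather than anything defined in the PR.

```typescript
// Hypothetical stand-in for parseSizeString: converts text like "1.5 GB" to bytes.
// Assumes 1024-based units; the real helper may behave differently.
const UNIT_EXPONENTS: Record<string, number> = { B: 0, KB: 1, MB: 2, GB: 3, TB: 4, PB: 5 };

function toBytes(text: string): number {
  const m = /([\d.]+)\s*([KMGTP]?B)/i.exec(text.replace(/,/g, ""));
  if (!m) return 0; // unparseable cells contribute nothing
  return parseFloat(m[1]) * Math.pow(1024, UNIT_EXPONENTS[m[2].toUpperCase()]);
}

// Illustrative shape of one parsed row: the torrent ID and the text of its size cell.
interface SeedingRow {
  torrentId: string;
  sizeText: string;
}

// Sum sizes, skipping any row whose torrent ID has already been seen.
function sumUniqueSizes(rows: SeedingRow[]): number {
  const seen = new Set<string>();
  let total = 0;
  for (const row of rows) {
    if (seen.has(row.torrentId)) continue;
    seen.add(row.torrentId);
    total += toBytes(row.sizeText);
  }
  return total;
}

// Made-up rows: torrent 101 is listed twice and should only be counted once.
const totalBytes = sumUniqueSizes([
  { torrentId: "101", sizeText: "1 GB" },
  { torrentId: "101", sizeText: "1 GB" },
  { torrentId: "202", sizeText: "512 MB" },
]);
console.log(totalBytes);
```

Tracking seen IDs in the same loop that sums the sizes keeps the count and the size consistent with each other, since both are derived from the same set of unique torrents.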


Rhilip · 2025-11-04T02:19:27Z

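For sites built on the NPHP (NexusPHP) architecture, seeding and seedingSize are computed by a dedicated function in the schema. Judging from the PR alone, its implementation is essentially the same as the schema's, except that the schema has no deduplication.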

PT-depiler/src/packages/site/schemas/NexusPHP.ts

Lines 622 to 680 in 7b4c4c4

    
  /**
   * Because NexusPHP serves this data via AJAX, forcing responseType: 'document'
   * makes parsing fail (data = undefined), since the response is not valid HTML.
   * So responseType is left unset here and the response is returned as a plain string,
   * which getUserSeedingStatus then turns into a Document.
   *
   * @param userId
   * @param type
   * @protected
   */
  protected async requestUserSeedingPage(userId: number, type: string = "seeding"): Promise<string | null> {
    const { data } = await this.request<string>({
      url: "/getusertorrentlistajax.php",
      params: { userid: userId, type },
    });
    return data || null;
  }

  protected async parseUserInfoForSeedingStatus(flushUserInfo: Partial<IUserInfo>): Promise<Partial<IUserInfo>> {
    const userId = flushUserInfo.id as number;
    const userSeedingRequestString = await this.requestUserSeedingPage(userId);

    let seedStatus = { seeding: 0, seedingSize: 0 };
    if (userSeedingRequestString && userSeedingRequestString?.includes("<table")) {
      const userSeedingDocument = createDocument(userSeedingRequestString);
      const divSeeding = Sizzle("div > div:contains(' | ')", userSeedingDocument);
      if (divSeeding.length > 0 && divSeeding[0].textContent) {
        const seedingText = divSeeding[0].textContent.split("|");
        seedStatus.seeding = definedFilters.parseNumber(seedingText[0]);
        seedStatus.seedingSize = definedFilters.parseSize(seedingText[1]);
      } else {
        const trAnothers = Sizzle("table:last tr:not(:eq(0))", userSeedingDocument);
        if (trAnothers.length > 0) {
          seedStatus.seeding = trAnothers.length;

          // Auto-detect which td.rowfollow:eq(?) holds the size
          let sizeIndex = 2;
          const tdAnothers = Sizzle("> td", trAnothers[0]);
          for (let i = 0; i < tdAnothers.length; i++) {
            if (sizePattern.test((tdAnothers[i] as HTMLElement).innerText)) {
              sizeIndex = i;
              break;
            }
          }

          trAnothers.forEach((trAnother) => {
            const sizeSelector = Sizzle(`td:eq(${sizeIndex})`, trAnother)[0] as HTMLElement;
            seedStatus.seedingSize += parseSizeString(sizeSelector.innerText.trim());
          });
        }
      }
    }

    flushUserInfo = mergeWith(flushUserInfo, seedStatus, (objValue, srcValue) => {
      return typeof srcValue === "undefined" ? objValue : srcValue;
    });

    return flushUserInfo;
  }

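Also, I think whether deduplication is needed at all is open to discussion.
Judging from the NPHP code, /getusertorrentlistajax.php?userid=xxxx&type=seeding maps to SQL along the lines of SELECT xxx FROM peers WHERE xxxx, so if a user really is seeding the same torrent from multiple locations, the response will necessarily contain duplicates. In that case, deduplicating would actually make the statistics wrong.
As for duplicate seeding entries caused by a client restart, the site's scheduled cleanup job removes them, and the numbers return to normal on the next refresh.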
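
To make the concern concrete, here is a small sketch of how a per-peer listing and a per-torrent count can diverge. The `PeerRow` shape and the sample data are hypothetical; the only thing taken from the discussion is the premise that the listing is backed by the peers table, so one torrent can appear once per seeding client.

```typescript
// Hypothetical shape of one row in a peers-backed listing: one row per active peer,
// so the same torrent can legitimately appear once per client.
interface PeerRow {
  torrentId: number;
  client: string; // illustrative label for where the peer runs
}

// Made-up data: torrent 101 is seeded from two locations, torrent 202 from one.
const peerRows: PeerRow[] = [
  { torrentId: 101, client: "home-nas" },
  { torrentId: 101, client: "seedbox" },
  { torrentId: 202, client: "home-nas" },
];

// Per-peer count: what a plain row count reports.
const perPeerCount = peerRows.length;

// Per-torrent count: what deduplicating by torrent ID reports.
const perTorrentCount = new Set(peerRows.map((row) => row.torrentId)).size;

console.log({ perPeerCount, perTorrentCount });
```

With this data the row count is 3 while the deduplicated count is 2, so which number is "correct" depends on whether the statistic is meant to describe peers or torrents.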

madrays · 2025-11-04T02:59:16Z

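The Haidan site has never been able to fetch the seeding count and seeding size correctly, and the size comes out abnormally at the YB scale (which completely breaks the timeline summary data), so I tried special-casing it. Once that was done, I found that this site's data is 100% duplicated (probably because both v4 and v6 are in use at the same time), so I then tried deduplication. If you have a better approach, that would of course be even better.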

madrays wants to merge 1 commit into pt-plugins:master from madrays:fix/haidan-seeding-data
