This repository was archived by the owner on Jul 30, 2024. It is now read-only.

Multi-level URL deduplication problem #38

@CWF-999

After rendering, a website yields tens of thousands of URLs or more, and these URLs are hierarchical. If a URL at one level is judged to be a duplicate, the URLs at the next level beneath it are ignored outright. Can a Bloom filter solve this problem? Where would I need to make changes? Could anyone suggest a good approach?
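To make the question concrete, here is a minimal sketch of per-URL deduplication with a Bloom filter, in which child URLs are checked individually even when their parent has already been seen. It is not taken from this repository's code: the `BloomFilter` class, the `filter_new_urls` helper, the bit-array size, the hash count, and the example URLs are all hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        # size_bits and num_hashes are illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filter_new_urls(seen, parent_url, child_urls):
    """Return the child URLs not yet seen, whether or not parent_url is a duplicate."""
    seen.add(parent_url)
    fresh = []
    for url in child_urls:
        # Each child is tested on its own; the parent's status does not matter.
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


seen = BloomFilter()
seen.add("https://example.com/a")  # the parent was already crawled
fresh = filter_new_urls(
    seen,
    "https://example.com/a",
    ["https://example.com/a/1", "https://example.com/a/2", "https://example.com/a/1"],
)
```

A Bloom filter keeps memory bounded for tens of thousands of URLs, at the cost of occasionally treating a new URL as already seen. Whether this matches the crawler's actual deduplication step is an assumption.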
