[Suspected BUG] pretend.py in version 0.1.2 has a problem that causes scraping to fail #29

@DeSireFire

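I ran into a problem while deploying to a new server. After comparing versions, I tracked down the cause:
GerapyPyppeteer/gerapy_pyppeteer/pretend.py
Version 0.0.13 works correctly. Its code is:
SET_WEBDRIVER = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''
In version 0.1.2, the SET_WEBDRIVER variable on line 73 is the problem.
When requesting a site protected by 某数 (name obscured by the author), the browser is detected and the request returns 400.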
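A possible workaround, sketched under assumptions: swap the 0.1.2 script for the 0.0.13 one quoted above before injecting the scripts. The helper names `swap_webdriver_script` and `inject_scripts` and the `RecordingPage` stand-in are hypothetical, not part of gerapy_pyppeteer or pyppeteer. The only real call used is `page.evaluateOnNewDocument`, the same one as in the test code. Whether `SET_WEBDRIVER` is actually an element of `SCRIPTS` in 0.1.2 is an assumption, so the helper appends the replacement when it finds nothing to swap.

```python
import asyncio

# The navigator.webdriver override quoted in this issue from version 0.0.13.
SET_WEBDRIVER_0_0_13 = (
    "() => {Object.defineProperty(navigator, 'webdriver', "
    "{get: () => undefined})}"
)


def swap_webdriver_script(scripts, current, replacement=SET_WEBDRIVER_0_0_13):
    """Return a copy of `scripts` with `current` replaced by `replacement`.

    If `current` does not occur in `scripts`, the replacement is appended.
    """
    swapped = [replacement if script == current else script for script in scripts]
    if replacement not in swapped:
        swapped.append(replacement)
    return swapped


async def inject_scripts(page, scripts):
    """Register each script so it runs before every new document loads."""
    for script in scripts:
        await page.evaluateOnNewDocument(script)


class RecordingPage:
    """Hypothetical stand-in for a pyppeteer Page; it only records scripts."""

    def __init__(self):
        self.injected = []

    async def evaluateOnNewDocument(self, script):
        self.injected.append(script)


if __name__ == "__main__":
    # Dry run without a browser: "OLD" plays the role of the 0.1.2 script.
    page = RecordingPage()
    scripts = swap_webdriver_script(["() => {}", "OLD"], "OLD")
    asyncio.run(inject_scripts(page, scripts))
    print(page.injected)
```

With a real browser, this would presumably be used as `scripts = swap_webdriver_script(PRETEND_SCRIPTS, SET_WEBDRIVER)` after importing `SET_WEBDRIVER` from `gerapy_pyppeteer.pretend`, followed by `await inject_scripts(page, scripts)` in place of the `for` loop in the test code.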

Test code:

import json
import os
import asyncio
import time

from pyppeteer import launch, connection
from pyppeteer import chromium_downloader
from gerapy_pyppeteer.pretend import SCRIPTS as PRETEND_SCRIPTS
from pyppeteer.network_manager import Response



async def main():
    browser = await launch({'headless': False, 'timeout': 10000, 'args': ['--no-sandbox', ]},)
    page = await browser.newPage()
    for script in PRETEND_SCRIPTS:
        await page.evaluateOnNewDocument(script)

    print(len(await browser.pages()))
    await page.goto('http://www.某个网址.com.cn/old_house/old_house.html')  # remember to change this URL

    await page.waitForNavigation()


    await page.waitFor(10 * 1000)

    print(await page.evaluate("document.cookie"))
    print('Finished waiting for the URL')

    # await page.waitFor(10 * 1000)
    print(await page.content())

    await browser.close()



asyncio.get_event_loop().run_until_complete(main())

The result is a blank page.
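To narrow down what differs between the two versions, it may help to print what the page itself reports for `navigator.webdriver` under each set of scripts. The following is only a diagnostic sketch: `PROBES`, `probe_fingerprint`, and the `CannedPage` stand-in are hypothetical names, and which properties the protected site really checks is not known from this issue. The only pyppeteer call it relies on is `page.evaluate`.

```python
import asyncio

# JavaScript functions evaluated inside the page. Values are converted to
# strings or booleans so that `undefined` is easy to tell apart in Python.
PROBES = {
    "webdriver_value": "() => String(navigator.webdriver)",
    "webdriver_in_navigator": "() => 'webdriver' in navigator",
    "user_agent": "() => navigator.userAgent",
}


async def probe_fingerprint(page, probes=PROBES):
    """Evaluate each probe in `page` and return a dict of name -> result."""
    results = {}
    for name, expression in probes.items():
        results[name] = await page.evaluate(expression)
    return results


class CannedPage:
    """Hypothetical stand-in for a pyppeteer Page with fixed answers."""

    def __init__(self, answers):
        self.answers = answers

    async def evaluate(self, expression):
        return self.answers[expression]


if __name__ == "__main__":
    # Dry run without a browser, using made-up answers for each probe.
    answers = {
        PROBES["webdriver_value"]: "undefined",
        PROBES["webdriver_in_navigator"]: True,
        PROBES["user_agent"]: "example-agent",
    }
    print(asyncio.run(probe_fingerprint(CannedPage(answers))))
```

In the test code above, `print(await probe_fingerprint(page))` could be placed right after `page.goto(...)`, once with the 0.0.13 scripts and once with the 0.1.2 scripts, to compare the two outputs.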
