Conversation
Allow connecting to remote browser services (Browserless, BrowserBase, Playwright Server) instead of always launching local browsers. (#1822) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ghtCrawler" This reverts commit 90fbb0c.
|
ready for review, user now only need to define 2 functions and wrapper handle others. |
packages/core/src/request.ts
Outdated
| // Lazy singleton — evaluated on first use so a user-configured logger is picked up | ||
| // rather than the default that exists at module load time. | ||
| let _log: CrawleeLogger | undefined; | ||
| const getLog = () => { | ||
| _log ??= Configuration.getGlobalConfig().getLogger().child({ prefix: 'Request' }); | ||
| return _log; | ||
| }; |
There was a problem hiding this comment.
nit / refactor: can getLog(ger) be a static method on Request, just to make the design a bit tidier?
| /** | ||
| * Internal logging method used by some Crawlee internals. | ||
| */ | ||
| internal(level: number, message: string, data?: Record<string, unknown>, exception?: Error): void; |
There was a problem hiding this comment.
If this is exposed in the public interface, we should assume people will use this (especially if there is no other method that does this).
How about naming this logger.logWithLevel or sth similar and add a friendly comment to encourage the (power)users instead?
There was a problem hiding this comment.
You mean rename just in interface and keep the internal implementation in Abstract Class?
There was a problem hiding this comment.
I suppose you'll need to rename both the interface member and the implementing function, but yes, it's just about the name / JSDoc for me. I believe our more advanced users could find usecases for this method too, no need to hide it from them.
There was a problem hiding this comment.
I am digging deep into it and I am not sure how you mean it. Internal function is implemented in apify/log, you want this change to bubble up to apify/log package?
There was a problem hiding this comment.
I'm sorry if I'm confusing.
I'd prefer to have a more descriptive name for the internal method, as it's public. This change would have to rename both CrawleeLogger.internal and BaseCrawleeLogger.internal.
As Honza mentions in the other comment
We don't really need to keep @apify/log directly assignable to CrawlingContext.log
so the @apify/log's interface doesn't really bother me now (ok it does - for the same reason as here - but we don't need to fix it now :)).
There was a problem hiding this comment.
I'm sorry if I'm confusing.
I believe I am the source of my confusion :]
But I have this example, this is one of a few usages of .internal -> basic-crawler.ts
this.log.internal(LogLevel[(options.level as 'DEBUG') ?? 'DEBUG'], message, data);
this.log is either @apify/log or custom logger provided by user. So i have to ensure that it exists on both. I cannot rename it in CrawleeLogger and BaseCrawleeLogger because I have to match @apify/log
Or I am missing something? 🤔
There was a problem hiding this comment.
As Honza mentions in the other comment
Feel free to tweak stuff to a reasonable degree; we can afford BC breaks. We don't really need to keep
@apify/logdirectly assignable toCrawlingContext.log. It's no problem if that's going to require some additional wrapper or even adding a new method to@apify/log.
I.e. right now, you don't have to match the @apify/log and CrawleeLogger APIs (I believe that was the original reason for these changes; we want to free ourselves from @apify/log and create our own Crawlee thing).
this.log is either @apify/log or custom logger provided by user.
imo after merging this PR, this.log should implement CrawleeLogger interface (and that's all we'll know about it). Whether it's a custom user-supplied logger or a CrawleeLogger wrapper for @apify/log should be inconsequential.
Let's keep the .internal() method for now if it's easier. I don't want to muddle your PR with minor things like this :) It's still good that we discussed this, because maybe I'm understanding the motivations behind this PR wrong.
Can we please get a third pair of eyes @janbuchar? 🙏
|
|
||
| if (BasicCrawler.useStateCrawlerIds.size > 1) { | ||
| defaultLog.warningOnce( | ||
| this.log.warningOnce( |
There was a problem hiding this comment.
This is a semantic change - originally, we would call warningOnce on the global instance (so it would log the warning only once).
Now we're calling warningOnce on each crawler's instance separately. These do not share the state, so this will log the message n times (n == crawler count).
There was a problem hiding this comment.
Does it matter really? Feels all right to me.
There was a problem hiding this comment.
It logs the same multiline message multiple times... not the end of the world, but imo unnecessary. I noticed this just because I dealt with this when implementing this not so long ago :)
If using the global log instance proves too complicated, I'm fine with this; if it's a one-line change, I'd prefer clean logs.
There was a problem hiding this comment.
yop, also claude suggested this to me but I was not believe him at the moment. I can take a look, probably I can store the state and pass it to child.
There was a problem hiding this comment.
fixed by static logger instance for basic-crawler
|
feedback from martin Important (Should Fix)
Suggestions (Nice to Have)
|
|
@l2ysho is it fine if I postpone my review until the conflicts are resolved and Jindra's feedback incorporated? |
Definitelly |
|
@janbuchar I think you can start, I addressed most critical issues from feedback. In the meantime I have still 10 minor/nice-to-have issues in stack to check |
Introduces a
CrawleeLoggerinterface andBaseCrawleeLoggerabstract class so users can plug in any logger (Winston, Pino, Bunyan, etc.) instead of being locked to@apify/log.CrawleeLoggerinterface — the contract any logger must satisfyBaseCrawleeLoggerabstract class — implement_log()+_createChild(), get 13 methods for freeConfiguration.loggerProvider— the injection pointHow it works internally
Every module that needs logging now calls
config.getLogger()instead of importing@apify/logdirectly. Two patterns:config.getLogger().child({ prefix })config(BasicCrawler, AutoscaledPool, etc.)Configuration.getGlobalConfig().getLogger()Only two files still import
@apify/logdirectly:log.ts(defines interface) andconfiguration.ts(default fallback).