refactor!: Extract service management from Configuration into ServiceLocator class#3325
refactor!: Extract service management from Configuration into ServiceLocator class#3325
Conversation
255385d to
65e0dba
Compare
fa7114d to
ac8ad86
Compare
l2ysho
left a comment
There was a problem hiding this comment.
I left some nit (or rather question) related to cyclic dependency, otherwise it looks good. 🚀
|
This is what claude things about it. As usual, take it with a grain of salt, but some of those things do sound valid to me. I am sure it will make more sense to you than me 🙃 I am particularly surprised by the removal of |
Right, I tried to knock out the easy ones first.
Yeah, let's discuss that some other time, other place.
returned
Warning it is.
Most of those only execute once during the entire crawler lifetime, hardly a hot path. A local variable would just introduce noise.
Fair, removed the argument.
Added a comment. ...other half coming soon. |
- split from #3325 after review (#3325 (comment))
13b834a to
9e05c31
Compare
|
@B4nan I resolved the comments, feel free to check again |
barjin
left a comment
There was a problem hiding this comment.
The changes are mostly fine by me, below there are some questions. Please forgive me if some questions are random (and feel free to skip those), it took me some time to get my head wrapped around this.
|
|
||
| test('should work', async () => { | ||
| const queue = new RequestQueue({ id: 'some-id', client: storageClient }); | ||
| const queue = new RequestQueue({ id: 'some-id', client: storageClient }, serviceLocator.getConfiguration()); |
There was a problem hiding this comment.
I see this has been discussed in the Claude comment above (and RequestQueue.open() being more popular is a valid argument by me), but what exactly is the reason for this change? Why cannot RequestQueue pull the Configuration instance from the serviceLocator on its own?
There was a problem hiding this comment.
Apparently it's not a problem anymore - I made it so in a previous attempt to resolve an import cycle. Thanks!
| } | ||
|
|
||
| setStorageManager(constructor: Constructor<IStorage>, storageManager: StorageManager): void { | ||
| this.storageManagers.set(constructor, storageManager); |
There was a problem hiding this comment.
Why we get to overwrite individual storage managers, but overriding the other services fails with an error?
There was a problem hiding this comment.
Good point, updated
|
|
||
| await this.client.delete(); | ||
| const manager = StorageManager.getManager(this.constructor as Constructor<IStorage>, this.config); | ||
| const manager = StorageManager.getManager(this.constructor as Constructor<IStorage>); |
There was a problem hiding this comment.
Why doesn't this use serviceLocator.getStorageManager?
crawlee/packages/core/src/service_locator.ts
Lines 213 to 215 in 0383a31
There was a problem hiding this comment.
I know this is confusing, but StorageManager still has the functionality it had before, only the "cache" got moved to ServiceLocator. This will be fully resolved in #3075.
| * | ||
| * Delegates to the global ServiceLocator, making it the single source of truth for service management. | ||
| */ | ||
| static getGlobalConfig(): Configuration { |
There was a problem hiding this comment.
this is no longer necessarily the "global" config, right?
const globalConfig = new Configuration();
const localConfig = new Configuration();
serviceLocator.setConfiguration(globalConfig);
const crawler = new CheerioCrawler({
requestHandler: async () => {
const config = Configuration.getGlobalConfig();
/// !!! this will fail
assert(config === globalConfig, "Configuration.getGlobalConfig is not returning the global config");
},
configuration: localConfig
});Isn't this an (unexpected) BC for users? Is this intentional?
There was a problem hiding this comment.
Yes. I added a note about this to the upgrading guide. The situation is the same on the Python side, but I agree that the name is not really... fitting anymore.
sad to hear it, let me know if I can help |
Review action items (comment)
bindMethodsToServiceLocatoris fragile — The super call concern is incorrect — AsyncLocalStorage context propagates through the entire sync/async call tree, so prototype methods reached via super execute within the context established by the instance-level wrapper. Identity loss is theoretical — binding happens before any references are captured. We did fix Symbol-keyed methods (now iterates getOwnPropertySymbols too).EventManager.init()— fixed by acceptingpersistStateIntervalMillisas a constructor param, addedLocalEventManager.fromConfig()factory (aligned with crawlee-python)serviceLocatoris a plain object, not aServiceLocatorinstance — I don't perceive this as a problemConfiguration.set()removal — restored the method, removed from upgrading guide's "removed" listStorageManager.getManagercache keying ignores config — fixed by removing the config parameter, first step of a process to be finished in Rework the storage client system #3075warningis appropriate for side-effect warnings,debugfor default creation messagesclearStorageManagerCacheuses fragile string matching — since this is going to be redone in Rework the storage client system #3075, we are leaving it unresolvedserviceLocator.getConfiguration()in local variables — dismissed,AsyncLocalStorage.getStore()is trivialnew LocalEventManager(crawlerConfig)passes unused arg — fixedRequestQueueconstructor requiresconfigas mandatory — kept the change, documented in upgrading guideConfigurationshows oldBasicCrawlerconstructor signature — fixedEventManager/LocalEventManagerconstructor options andfromConfigfactory documented in upgrading guide