Commit 52cece9

Merge pull request #49 from scrapinghub/kumo1719
Split frontier logic into Frontiers & Frontier
2 parents d4f79ef + d1c88bb commit 52cece9

5 files changed: +498 -47 lines changed

README.rst

Lines changed: 79 additions & 19 deletions
@@ -95,7 +95,7 @@ Project instance also has the following fields:
 
 - activity - access to project activity records
 - collections - work with project collections (see ``Collections`` section)
-- frontier - using project frontier (see ``Frontier`` section)
+- frontiers - using project frontier (see ``Frontiers`` section)
 - settings - interface to project settings
 - spiders - access to spiders collection (see ``Spiders`` section)
 
@@ -437,44 +437,104 @@ Usual workflow with `Collections`_ would be::
 
 Collections are available on project level only.
 
-Frontier
---------
+Frontiers
+---------
 
 Typical workflow with `Frontier`_::
 
-    >>> frontier = project.frontier
+    >>> frontiers = project.frontiers
 
-Add a request to the frontier::
+Get all frontiers from a project to iterate through it::
+
+    >>> frontiers.iter()
+    <list_iterator at 0x103c93630>
+
+List all frontiers::
+
+    >>> frontiers.list()
+    ['test', 'test1', 'test2']
+
+Get a frontier by name::
+
+    >>> frontier = frontiers.get('test')
+    >>> frontier
+    <scrapinghub.client.Frontier at 0x1048ae4a8>
+
+Get an iterator to iterate through a frontier slots::
+
+    >>> frontier.iter()
+    <list_iterator at 0x1030736d8>
+
+List all slots::
+
+    >>> frontier.list()
+    ['example.com', 'example.com2']
+
+Get a frontier slot by name::
+
+    >>> slot = frontier.get('example.com')
+    >>> slot
+    <scrapinghub.client.FrontierSlot at 0x1049d8978>
+
+Add a request to the slot::
+
+    >>> slot.queue.add([{'fp': '/some/path.html'}])
+    >>> slot.flush()
+    >>> slot.newcount
+    1
+
+``newcount`` is defined per slot, but also available per frontier and globally::
 
-    >>> frontier.add('test', 'example.com', [{'fp': '/some/path.html'}])
-    >>> frontier.flush()
     >>> frontier.newcount
     1
+    >>> frontiers.newcount
+    3
+
+Add a fingerprint only to the slot::
+
+    >>> slot.fingerprints.add(['fp1', 'fp2'])
+    >>> slot.flush()
+
+There are convenient shortcuts: ``f`` for ``fingerprints`` and ``q`` for ``queue``.
 
 Add requests with additional parameters::
 
-    >>> frontier.add('test', 'example.com', [{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
-    >>> frontier.flush()
-    >>> frontier.newcount
-    2
+    >>> slot.q.add([{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
+    >>> slot.flush()
 
-To delete the slot ``example.com`` from the frontier::
+To retrieve all requests for a given slot::
 
-    >>> frontier.delete_slot('test', 'example.com')
+    >>> reqs = slot.q.iter()
 
-To retrieve requests for a given slot::
+To retrieve all fingerprints for a given slot::
 
-    >>> reqs = frontier.read('test', 'example.com')
+    >>> fps = slot.f.iter()
+
+To list all the requests use ``list()`` method (similar for ``fingerprints``)::
+
+    >>> fps = slot.q.list()
 
 To delete a batch of requests::
 
-    >>> frontier.delete('test', 'example.com', '00013967d8af7b0001')
+    >>> slot.q.delete('00013967d8af7b0001')
 
-To retrieve fingerprints for a given slot::
+To delete the whole slot from the frontier::
 
-    >>> fps = [req['requests'] for req in frontier.read('test', 'example.com')]
+    >>> slot.delete()
+
+Flush data of the given frontier::
+
+    >>> frontier.flush()
+
+Flush data of all frontiers of a project::
+
+    >>> frontiers.flush()
+
+Close batch writers of all frontiers of a project::
+
+    >>> frontiers.close()
 
-Frontier is available on project level only.
+Frontiers are available on project level only.
 
 Tags
 ----
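
Note on the change: the README diff describes three nested levels (all frontiers of a project, a single frontier, a single slot), each exposing ``flush()`` and ``newcount``. The block below is a minimal in-memory sketch of that hierarchy, written only to illustrate how a per-slot counter can roll up to the frontier and project levels. It is not the scrapinghub client: ``SlotSketch``, ``FrontierSketch``, ``FrontiersSketch`` and the ``_Aggregate`` helper are hypothetical names, and the assumed behaviour (requests are buffered until ``flush()``, ``newcount`` counts fingerprints not previously stored in the slot, ``get()`` creates missing children on demand) is a reading of the README text rather than a description of the real implementation.

```python
class SlotSketch:
    """Hypothetical stand-in for a frontier slot; buffers requests until flush()."""

    def __init__(self):
        self._pending = []   # requests added but not yet flushed
        self._stored = {}    # fingerprint -> request, populated on flush
        self.newcount = 0    # number of new fingerprints written by this slot

    def add(self, requests):
        self._pending.extend(requests)

    def flush(self):
        # Assumption: only fingerprints not seen before count as "new".
        for req in self._pending:
            if req['fp'] not in self._stored:
                self._stored[req['fp']] = req
                self.newcount += 1
        self._pending.clear()


class _Aggregate:
    """Hypothetical helper: a named collection whose counters roll up."""

    child_type = None

    def __init__(self):
        self._children = {}

    def get(self, name):
        # Assumption: children are created on first access.
        if name not in self._children:
            self._children[name] = self.child_type()
        return self._children[name]

    def list(self):
        return sorted(self._children)

    def flush(self):
        for child in self._children.values():
            child.flush()

    @property
    def newcount(self):
        return sum(child.newcount for child in self._children.values())


class FrontierSketch(_Aggregate):
    child_type = SlotSketch       # a frontier holds slots


class FrontiersSketch(_Aggregate):
    child_type = FrontierSketch   # the project level holds frontiers


frontiers = FrontiersSketch()
slot = frontiers.get('test').get('example.com')
slot.add([{'fp': '/some/path.html'}])
slot.flush()

# Two more requests in a different frontier, flushed from the top level.
frontiers.get('test1').get('example.org').add([{'fp': '/'}, {'fp': 'page1.html'}])
frontiers.flush()

print(slot.newcount, frontiers.get('test').newcount, frontiers.newcount)  # prints: 1 1 3
```

With these inputs the sketch yields the same counts as the README example: 1 for the slot, 1 for its frontier, and 3 across all frontiers.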
