Commit 001f8d8

Merge remote-tracking branch 'origin/sc1467-1' into sc1467-1
2 parents 2d0a95d + 52cece9 commit 001f8d8

6 files changed, +556 -61 lines changed

README.rst

Lines changed: 105 additions & 19 deletions
@@ -95,7 +95,7 @@ Project instance also has the following fields:
 
 - activity - access to project activity records
 - collections - work with project collections (see ``Collections`` section)
-- frontier - using project frontier (see ``Frontier`` section)
+- frontiers - access to project frontiers (see ``Frontiers`` section)
 - settings - interface to project settings
 - spiders - access to spiders collection (see ``Spiders`` section)
 
@@ -385,6 +385,32 @@ To retrieve all samples for a job::
     [1482233732452, 0, 0, 0, 0, 0]
 
 
+Activity
+--------
+
+To retrieve all activity events from a project::
+
+    >>> project.activity.iter()
+    <generator object jldecode at 0x1049ee990>
+
+    >>> project.activity.list()
+    [{'event': 'job:completed', 'job': '123/2/3', 'user': 'jobrunner'},
+     {'event': 'job:cancelled', 'job': '123/2/3', 'user': 'john'}]
+
+To post a new activity event::
+
+    >>> event = {'event': 'job:completed', 'job': '123/2/4', 'user': 'john'}
+    >>> project.activity.add(event)
+
+Or post multiple events at once::
+
+    >>> events = [
+        {'event': 'job:completed', 'job': '123/2/5', 'user': 'john'},
+        {'event': 'job:cancelled', 'job': '123/2/6', 'user': 'john'},
+    ]
+    >>> project.activity.add(events)
+
+
 Collections
 -----------
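The Activity hunk above shows that ``project.activity.iter()`` returns a generator rather than a list. As a minimal sketch of why that matters, the helper below tallies events by type while consuming its input one event at a time. ``count_events_by_type`` is a hypothetical name that is not part of the library, and the sample data is copied from the README text rather than fetched from a real project.

```python
from collections import Counter


def count_events_by_type(events):
    """Tally activity events by their 'event' field.

    `events` is any iterable of dicts. The assumption here is that the
    generator returned by project.activity.iter() yields dicts shaped like
    the ones shown in the README; that is not verified in this sketch.
    """
    return Counter(event['event'] for event in events)


# Sample data copied from the README hunk; no network access is needed.
sample = [
    {'event': 'job:completed', 'job': '123/2/3', 'user': 'jobrunner'},
    {'event': 'job:cancelled', 'job': '123/2/3', 'user': 'john'},
]

# iter(sample) stands in for a lazily produced stream of events.
counts = count_events_by_type(iter(sample))
print(counts['job:completed'], counts['job:cancelled'])
```

Because the helper only iterates once, it would work equally well with the result of ``list()``; the generator form simply avoids holding every event in memory.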

@@ -411,44 +437,104 @@ Usual workflow with `Collections`_ would be::
 
 Collections are available on project level only.
 
-Frontier
---------
+Frontiers
+---------
 
 Typical workflow with `Frontier`_::
 
-    >>> frontier = project.frontier
+    >>> frontiers = project.frontiers
 
-Add a request to the frontier::
+Get an iterator over all frontiers of a project::
+
+    >>> frontiers.iter()
+    <list_iterator at 0x103c93630>
+
+List all frontiers::
+
+    >>> frontiers.list()
+    ['test', 'test1', 'test2']
+
+Get a frontier by name::
+
+    >>> frontier = frontiers.get('test')
+    >>> frontier
+    <scrapinghub.client.Frontier at 0x1048ae4a8>
+
+Get an iterator over the slots of a frontier::
+
+    >>> frontier.iter()
+    <list_iterator at 0x1030736d8>
+
+List all slots::
+
+    >>> frontier.list()
+    ['example.com', 'example.com2']
+
+Get a frontier slot by name::
+
+    >>> slot = frontier.get('example.com')
+    >>> slot
+    <scrapinghub.client.FrontierSlot at 0x1049d8978>
+
+Add a request to the slot::
+
+    >>> slot.queue.add([{'fp': '/some/path.html'}])
+    >>> slot.flush()
+    >>> slot.newcount
+    1
+
+``newcount`` is defined per slot, but also available per frontier and globally::
 
-    >>> frontier.add('test', 'example.com', [{'fp': '/some/path.html'}])
-    >>> frontier.flush()
     >>> frontier.newcount
     1
+    >>> frontiers.newcount
+    3
+
+Add only fingerprints to the slot::
+
+    >>> slot.fingerprints.add(['fp1', 'fp2'])
+    >>> slot.flush()
+
+There are convenient shortcuts: ``f`` for ``fingerprints`` and ``q`` for ``queue``.
 
 Add requests with additional parameters::
 
-    >>> frontier.add('test', 'example.com', [{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
-    >>> frontier.flush()
-    >>> frontier.newcount
-    2
+    >>> slot.q.add([{'fp': '/'}, {'fp': 'page1.html', 'p': 1, 'qdata': {'depth': 1}}])
+    >>> slot.flush()
 
-To delete the slot ``example.com`` from the frontier::
+To retrieve all requests for a given slot::
 
-    >>> frontier.delete_slot('test', 'example.com')
+    >>> reqs = slot.q.iter()
 
-To retrieve requests for a given slot::
+To retrieve all fingerprints for a given slot::
 
-    >>> reqs = frontier.read('test', 'example.com')
+    >>> fps = slot.f.iter()
+
+To list all the requests, use the ``list()`` method (``fingerprints`` works the same way)::
+
+    >>> reqs = slot.q.list()
 
 To delete a batch of requests::
 
-    >>> frontier.delete('test', 'example.com', '00013967d8af7b0001')
+    >>> slot.q.delete('00013967d8af7b0001')
 
-To retrieve fingerprints for a given slot::
+To delete the whole slot from the frontier::
 
-    >>> fps = [req['requests'] for req in frontier.read('test', 'example.com')]
+    >>> slot.delete()
+
+Flush the data of a given frontier::
+
+    >>> frontier.flush()
+
+Flush the data of all frontiers of a project::
+
+    >>> frontiers.flush()
+
+Close the batch writers of all frontiers of a project::
+
+    >>> frontiers.close()
 
-Frontier is available on project level only.
+Frontiers are available on project level only.
 
 Tags
 ----
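The Frontiers hunk above replaces flat calls such as ``frontier.add('test', 'example.com', ...)`` with a three-level hierarchy: frontiers, then a frontier, then a slot. As a rough sketch of how the new accessors compose, the helper below maps every frontier name to its slot names using only the calls shown in the README (``frontiers.list()``, ``frontiers.get(name)``, ``frontier.list()``). ``slots_by_frontier``, ``FakeFrontier`` and ``FakeFrontiers`` are hypothetical names introduced for illustration; the fake classes are in-memory stand-ins, not the library's implementation.

```python
def slots_by_frontier(frontiers):
    """Map each frontier name to the list of its slot names.

    Uses only frontiers.list(), frontiers.get(name) and frontier.list(),
    as shown in the README hunk. Passing a real project.frontiers object
    is an untested assumption.
    """
    return {name: list(frontiers.get(name).list()) for name in frontiers.list()}


class FakeFrontier:
    """Hypothetical stand-in exposing only list()."""

    def __init__(self, slots):
        self._slots = slots

    def list(self):
        return list(self._slots)


class FakeFrontiers:
    """Hypothetical stand-in exposing only list() and get()."""

    def __init__(self, data):
        self._data = data

    def list(self):
        return list(self._data)

    def get(self, name):
        return FakeFrontier(self._data[name])


# Frontier and slot names are taken from the README examples.
fake = FakeFrontiers({'test': ['example.com', 'example.com2'], 'test1': []})
print(slots_by_frontier(fake))
```

Keeping the helper duck-typed means it depends on nothing beyond the three accessors, so it can be exercised offline with the stand-ins before being pointed at a real project.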
