ZeroScraper API

API endpoints
Using Browser
Using Command-line tools

API endpoints

authentication

POST /login?username=:username&password=:password: Log in. The response includes an access token.

 {
   "message": "Login successfully.",
   "access_token": "xxx"
 }

POST /logout: Log out.
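
As a minimal sketch, the login request could be assembled with Python's standard library as shown here. The base URL `http://localhost:5000` is a placeholder and both helper names are hypothetical. How the returned `access_token` is passed to later requests is not specified in this document, so the sketch only extracts it.

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder base URL; replace with the actual server address.
BASE_URL = "http://localhost:5000"


def build_login_url(username: str, password: str, base_url: str = BASE_URL) -> str:
    """Return the URL for POST /login with URL-encoded credentials."""
    query = urlencode({"username": username, "password": password})
    return f"{base_url}/login?{query}"


def extract_access_token(response_body: str) -> str:
    """Read access_token from the JSON body returned by POST /login."""
    return json.loads(response_body)["access_token"]
```

URL-encoding matters here because credentials travel in the query string, so characters such as `&` or spaces in a password would otherwise corrupt the request.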

system monitoring

GET /health: Check if scheduled scraping activity is executing as expected:

 {
   "discover": "okay",  // or "not okay"
   "update": "okay" // or "not okay"
 }
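
A monitoring script might reduce the `/health` response to the list of jobs that need attention. This is a sketch only: `failing_jobs` is a hypothetical helper, and it assumes the response has already been parsed into a dict whose values are either "okay" or "not okay", as in the example above.

```python
def failing_jobs(health: dict) -> list:
    """Return the names of scheduled jobs whose status is not "okay"."""
    return sorted(name for name, status in health.items() if status != "okay")
```

An empty list means every reported job is healthy.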

GET /variables: Get all variables

 {
   "message": "returning all variables", 
   "result": [
     {
       "key": "discover:pid", 
       "value": "4396"
     }, 
     {variable info},
     ...
   ]
 }

GET /variables?key=:key: Get the variable with the given key.

 {
   "message": "returning variables with key :key", 
   "result": {variable info}
 }
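
Because `result` is a list for `/variables` but a single object when a key is given, client code may want to normalize both shapes. The sketch below uses a hypothetical helper and assumes every `{variable info}` object has the `key` and `value` fields shown above.

```python
def variables_to_dict(response: dict) -> dict:
    """Map each variable's key to its value.

    Accepts both response shapes: a list under "result" (all variables)
    or a single object under "result" (lookup by key).
    """
    result = response["result"]
    items = result if isinstance(result, list) else [result]
    return {item["key"]: item["value"] for item in items}
```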

stats

GET /stats: Get stats of all days and sites

 {
   "message": "Returning all stats", 
   "result": [
     {
       "date": "2020-03-01", 
       "site_id": 1, 
       "new_article_count": 134, 
       "updated_article_count": 0
     },
     {stats},
     {stats},
     ...
   ]
 }

GET /stats?date=:date: Get stats of all sites on a day, e.g. GET /stats?date=2020-04-05

 {
   "message": "Return stats of all sites on :date",
   "result": [
     {stats},
     {stats}, 
     ...
   ]
 }

GET /stats?site_id=:id: Get stats of a site, limited to the last 30 days.

 {
   "message": "Returning stats of site :id from the last 30 days.", 
   "result": [
     {stats},
     {stats},
     ...
   ]
 }
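
Each `{stats}` entry is one row per date and site, so totals have to be computed on the client. A minimal sketch, assuming every row carries the `site_id` and `new_article_count` fields shown in the first example; the function name is hypothetical.

```python
from collections import defaultdict


def new_articles_per_site(response: dict) -> dict:
    """Sum new_article_count over all returned days, grouped by site_id."""
    totals = defaultdict(int)
    for row in response["result"]:
        totals[row["site_id"]] += row["new_article_count"]
    return dict(totals)
```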

articles

GET /articles: Get the 10 most recent articles.

 {
   "message": "Returning 10 most recent articles.", 
   "result": [
     {
       "article_id": 3939173, 
       "article_type": "Article", 
       "first_snapshot_at": 1588107403, 
       "last_snapshot_at": 1588107403, 
       "next_snapshot_at": 1588193803, 
       "redirect_to": null, 
       "site_id": 105, 
       "snapshot_count": 1, 
       "url": "xxx", 
       "url_hash": "43445058"
     }, 
     {article info},
     {article info},
     ...
   ]
 }

GET /articles?url=:url: Get articles by URL. Only exact matches are returned; the query matches against both the requested URL and the redirected URL.

 {
   "message": "Returning articles that matches url :url",
   "result": [
      {article info},
      {article info},
      ...
   ]
 }

GET /articles/:id: Get article info with article_id:

 {
   "message": "Returning article with id :id", 
   "result": {article info}
 }
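
Two details of the article endpoints are easy to get wrong on the client side: the `url` parameter must itself be URL-encoded, and the `*_snapshot_at` fields are integers. The sketch below assumes those integers are Unix timestamps in seconds, which the example values suggest but this document does not state; both helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def articles_by_url_path(url: str) -> str:
    """Return the request path for GET /articles?url=:url, fully encoded."""
    return "/articles?" + urlencode({"url": url})


def to_utc(timestamp: int) -> datetime:
    """Convert a snapshot timestamp (assumed Unix seconds) to a UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

Under that assumption, the example article's `next_snapshot_at` falls exactly one day after its `last_snapshot_at`.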

sites

GET /sites: Get all sites.

 {
   "message": "Returning all sites",
   "result": [
     {
       "airtable_id": "xxx",
       "config": "{...}",
       "is_active": 1,
       "last_crawl_at": 1588123660,
       "name": "yyy",
       "site_id": 100,
       "site_info": "{...}",
       "type": "zzz",
       "url": "ooo"
     },
     {site info},
     {site info},
     ...
   ]
 }

GET /sites/active: Get all active sites.

 {
   "message": "Returning active sites",
   "result": [
     {site info},
     {site info},
     ...
   ]
 }

GET /sites/:id/article_count: Get article count in a site.

 {
   "message": "Returning article count of site :id",
   "result": {
     "site": {site :id info},
     "article_count": 100
   }
 }

GET /sites/:id/latest_article: Get most recently added article of a site:

 {
   "message": "Returning latest article from site :id",
   "result": {
     "latest_article": {article info},
     "site": {site :id info}
   }
 }
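
In the site example above, `config` and `site_info` appear as strings (`"{...}"`) and not as nested objects. Assuming those strings hold JSON, which this document does not confirm, a client could decode them as sketched below; `decode_site` is a hypothetical helper.

```python
import json


def decode_site(site: dict) -> dict:
    """Return a copy of a site record with string-encoded JSON fields parsed.

    Assumption: "config" and "site_info" are JSON documents stored as strings.
    Fields that are missing or already decoded are left unchanged.
    """
    decoded = dict(site)
    for field in ("config", "site_info"):
        value = decoded.get(field)
        if isinstance(value, str):
            decoded[field] = json.loads(value)
    return decoded
```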

publications

GET /publications?q=:search_string: Get publications where title or text contains the search string.

 {
   "message": "Return publications that matches :search_string",
   "result": [
     {publication info},
     {publication info},
     ...
   ]
 }

playground

GET /playground/random: Get a random publication title.
```
 {
   "publication_id": "xxx",
   "text": "yyy"
 }
```

POST /playground/add_record: Add a record.

 {
   "message": "Add new record successfully.",
   "record_id": 123
 }

Using Browser

Using Command-line tools

To get site stats:

$ python ns-api.py stats

 Optional Arguments:
         --site-id: view stats of a particular site. 
         --date: view stats of a particular date. e.g. 2020-04-03
         -o / --output: filename to save the json output.

To get variables:

$ python ns-api.py variables

 Optional Arguments:
         --key: variable key
         -o / --output: filename to save the json output.
