f4berack/scrapy-first-experimental-project

Introduction

A small tutorial project that includes a basic Scrapy spider.
The spider crawls the miamammausalinux.org news section and extracts article titles and URLs, stopping when it reaches a user-defined maximum number of pages.


Features

  • Configurable maximum number of pages to scrape
  • Results saved in JSON format

Installation

TO DO
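
In the meantime, a typical setup for a Scrapy project like this one would look as follows. These steps are an assumption, not documented by the author; the only hard requirement is a working Scrapy installation.

git clone https://github.com/f4berack/scrapy-first-experimental-project.git
cd scrapy-first-experimental-project
pip install scrapy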


Usage

Command

scrapy crawl miamammausalinux -a max_pages=1 -O output.json
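
Here -a passes max_pages as a spider argument, and -O writes the scraped items to output.json, overwriting the file if it already exists (the lowercase -o variant appends instead).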

Parameters

  • max_pages (optional): Maximum number of pages to crawl. If omitted, the spider will crawl until no more pages are available; the sketch below shows how this can be handled.
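
A minimal sketch of how such a spider can implement this stopping rule follows. It is an illustration, not the project's actual code: the start URL and the CSS selectors are assumptions about the site's markup, and only the max_pages handling reflects the behavior described above.

import scrapy

class MiamammausalinuxSpider(scrapy.Spider):
    name = "miamammausalinux"
    # Hypothetical entry point; the real news-section URL may differ.
    start_urls = ["https://www.miamammausalinux.org/"]

    def __init__(self, max_pages=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # -a max_pages=N arrives as a string; None means "no limit".
        self.max_pages = int(max_pages) if max_pages is not None else None
        self.pages_crawled = 0

    def parse(self, response):
        self.pages_crawled += 1
        # Placeholder selectors; the real ones depend on the site's HTML.
        for article in response.css("article h2 a"):
            yield {
                "title": article.css("::text").get(),
                "link": article.attrib.get("href"),
            }
        # Follow pagination only while under the user-defined limit.
        if self.max_pages is None or self.pages_crawled < self.max_pages:
            next_page = response.css("a.next::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)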

Output file example

[
  {"title": "Title1", "link": "link1"},
  {"title": "Title2", "link": "link2"},
  ...
]
