---
layout: default
title: An open source web scraping framework for Python
---
<h2>Welcome to Scrapy</h2>

<h3>What is Scrapy?</h3>

<p>Scrapy is a fast, high-level screen scraping and web crawling
framework, used to crawl websites and extract structured data from their
pages. It can be used for a wide range of purposes, from data mining to
monitoring and automated testing.</p>

<h3>Features</h3>

<dl>

  <dt>Simple</dt>
  <dd>Scrapy was designed with simplicity in mind, by providing the features
  you need without getting in your way</dd>

  <dt>Productive</dt>
  <dd>Just write the rules to extract the data from web pages and let Scrapy
  crawl the entire web site for you</dd>

  <dt>Fast</dt>
  <dd>Scrapy is used in production crawlers to completely scrape more than
  500 retailer sites daily, all on one server</dd>

  <dt>Extensible</dt>
  <dd>Scrapy was designed with extensibility in mind, so it provides
  several mechanisms to plug in new code without having to touch the framework
  core</dd>

  <dt>Portable, open-source, 100% Python</dt>
  <dd>Scrapy is completely written in Python and runs on Linux, Windows, Mac and BSD</dd>

  <dt>Batteries included</dt>
  <dd>Scrapy comes with lots of functionality built in. Check <a
    href="http://doc.scrapy.org/en/latest/intro/overview.html#what-else">this
    section</a> of the documentation for a list of them.</dd>

  <dt>Well-documented &amp; well-tested</dt>
  <dd>Scrapy is <a href="/doc/">extensively documented</a> and has a comprehensive test suite
  with <a href="http://static.scrapy.org/coverage-report/">very good code
    coverage</a></dd>

  <dt><a href="/community">Healthy community</a></dt>
  <dd>
  4,000 stars, 1,000 forks, 400 watchers on GitHub (<a href="https://github.com/scrapy/scrapy">link</a>)<br>
  1,300 followers on Twitter (<a href="https://twitter.com/ScrapyProject">link</a>)<br>
  1,900 questions on Stack Overflow (<a href="http://stackoverflow.com/tags/scrapy/info">link</a>)<br>
  1,800 members, 150 messages per month on the mailing list (<a href="https://groups.google.com/forum/?fromgroups#!aboutgroup/scrapy-users">link</a>)<br>
  40-50 users always connected to the IRC channel (<a href="http://webchat.freenode.net/?channels=scrapy">link</a>)
  </dd>

  <dt><a href="/support">Commercial support</a></dt>
  <dd>A few companies provide Scrapy consulting and support</dd>

</dl>

  <p>Still not sure if Scrapy is what you're looking for? Check out <a
    href="http://doc.scrapy.org/en/latest/intro/overview.html">Scrapy at a
    glance</a>.</p>

  <h3>Companies using Scrapy</h3>

  <p>Scrapy is used in large production environments to crawl
  thousands of sites daily. Here is a list of <a href="/companies/">companies
    using Scrapy</a>.</p>

  <h3>Where to start?</h3>

  <p>Start by reading <a
    href="http://doc.scrapy.org/en/latest/intro/overview.html">Scrapy at a glance</a>,
  then <a href="/download/">download Scrapy</a> and follow the <a
    href="http://doc.scrapy.org/en/latest/intro/tutorial.html">Tutorial</a>.</p>