## Overview

Synapse is an efficient, pluggable, open-source web crawling/scraping framework in Go
for both local and distributed workloads.

There are two integration paths, depending on the level of control you need:

1. High-Level API: Built for standard crawling workloads.
   Extend it with built-in plugins and get moving immediately without fiddling
   with the underlying mechanics. **[TODO]**

2. Low-Level API: For architecting custom scrapers/crawlers with (sub)component-level control.
   Extend it with your own implementations. **[WIP]**

## Status

The distributed architecture is **[WIP]**; we are essentially tinkering with a distributed state machine.
The framework is currently in an experimental phase and not production-ready.
Expect breaking changes as the architecture evolves.

## Documentation

Efforts are currently prioritized toward solid core abstractions over polished public documentation.
For developers diving into the internals, implementation-specific details and examples
are available in each component's directory (e.g., [Fetcher](./fetcher), [Spooler](./spooler)).

## Development

Contributions are welcome!

- Start by checking the [contribution guidelines](./CONTRIBUTING.md).
- For any questions, ask on [discussions](https://github.com/vyrelabs/synapse/discussions).

## Why this naming?

In neurobiology, a synapse is the junction where signals pass between neurons.
Likewise, this framework serves as the interface between the web and application-specific logic,
decoupling data acquisition from downstream processing.

## Ethical Considerations

This framework is not intended for malicious or unethical web scraping/crawling.
Please ensure you [comply with each website's `robots.txt` directives](./frontier/robots)
and terms of service (TOS) before crawling or scraping.