Commit 4daafc4

refactor: update 'readme.md' and 'contributing.md'

Update TODO(s) and WIP(s)

Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com>

1 parent: f25024a
2 files changed: 42 additions & 16 deletions

CONTRIBUTING.md (14 additions, 6 deletions)
````diff
@@ -20,17 +20,23 @@ The project welcomes contributions from the community!
 4. [Install Taskfile](https://taskfile.dev/docs/installation#official-package-managers) (if not already installed)

-5. Run all tests to verify your setup with `task test:all` or run individual component tests with `task test PKG=fetcher` (replace `fetcher` with the desired component name, see [project's Taskfile](./Taskfile.yml)).
+5. Run all tests to verify your setup with `task test:all` or run individual component tests with `task test PKG=fetcher`
+   (replace `fetcher` with the desired component name, see [project's Taskfile](./Taskfile.yml)).

 ## Considerations for designing abstractions and APIs

-1. Try to stay compatible with standard library interfaces when applicable..
-2. Prefer interface segregation for low couping, when applicable with bare-minimum, necessary methods, when possible.
-3. Try to expose safe minimal public API surface with sensible configurable options and defaults. In case the unsafe operations are exposed for standard library compatibility, document them clearly about the potential risks and usage guidelines.
+1. Try to stay compatible with standard library interfaces, when applicable.
+2. Prefer interface segregation for low coupling, exposing only the bare-minimum, necessary methods, when possible.
+3. Try to expose a safe, minimal public API surface with sensible configurable options and defaults.
+   In case unsafe operations are exposed for standard library compatibility,
+   document them clearly, noting the potential risks and usage guidelines.
+
+Overall, shrink the API surface, so there's less for end-users and implementors to mess up.

 ## Contributing

-Before writing code for a new feature, please first discuss the change you wish to make via github discussions to ensure that it aligns with the project's goals and to avoid any duplication of effort.
+Before writing code for a new feature, please first discuss the change you wish to make via GitHub Discussions to ensure
+that it aligns with the project's goals and to avoid any duplication of effort.

 To contribute, please follow these guidelines:

@@ -57,7 +63,9 @@ git checkout -b your-bug-or-feature-branch
 5. Run `task lint` to ensure your code adheres to the project's coding standards

-6. Make your changes and commit them with a clear, [descriptive conventional commit message](https://www.conventionalcommits.org/en/v1.0.0/). Ensure that you've ["signed off" your commits](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits). You can do this by adding the `-s` flag to your git commit command:
+6. Make your changes and commit them with a clear, [descriptive conventional commit message](https://www.conventionalcommits.org/en/v1.0.0/).
+   Ensure that you've ["signed off" your commits](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
+   You can do this by adding the `-s` flag to your git commit command:

 ```
 git commit -s
````

README.md (28 additions, 10 deletions)
```diff
@@ -2,28 +2,46 @@
 ## Overview

-Synapse is a modular and configurable framework to build unified web crawling and data processing pipelines in Go for any workload from single machine to distributed clusters (planned for the future) and provides core components including fetcher, frontier, pipeline, queue and storage backends, allowing the developers focus on domain-specific crawling logic without re-implementing infrastructure from scratch, rather configure and extend the existing components.
+Synapse is a highly efficient, pluggable, open-source crawling/scraping framework
+for both local and distributed workloads.
+
+There are two integration paths, based on the level of control:
+
+1. High-Level API: Built for standard crawling workloads.
+   Extend with built-in plugins and get moving immediately without fiddling
+   with the underlying mechanics. **[TODO]**
+
+2. Low-Level API: For architecting custom scrapers/crawlers with (sub)component-level control.
+   Extend with your own implementations. **[WIP]**

 ## Status

-This framework is in active development and not production-ready yet. Breaking changes may occur in future releases. So, the public documentation and examples will be provided once the core components stabilize.
+The distributed architecture is **[WIP]**; essentially tinkering with a distributed state machine.
+Currently in an experimental phase. Expect breaking changes as the architecture evolves.

 ## Documentation

-For developers, component-specific implementation details are available in their respective directories with examples ([Fetcher](./fetcher), [Spooler](./spooler))
+Efforts are currently prioritized toward solid core abstractions over polished public documentation.
+Implementation-specific details are available within each component's directory for developers
+diving into the internals.

-## Contributing
+## Development

-Contributions are welcome! Please refer to the [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines on how to contribute to this project.
+Contributions are welcome!

-## History and Purpose
+- Start by checking the [contribution guidelines](./CONTRIBUTING.md).

-This framework is primarily developed as an engineering challenge to develop highly pluggable distributed web crawler infrastructure in Go.
+- For any questions, ask on [Discussions](https://github.com/vyrelabs/synapse/discussions).

-There is a long-term goal to leverage crawling infrastructure (on top of this framework) in a separate [OSS search engine project](https://github.com/ritvikos/idx), that project remains entirely independent.
+## Why this naming?

-All architectural decisions, design choices, and implementation details in Synapse are made solely based on its merit as a standalone Go web crawling framework, with no influence from or coupling to any future projects.
+In neurobiology, a synapse is the junction for signal transmission between neurons.
+This framework serves as the interface between the web and application-specific logic,
+decoupling data acquisition from downstream processing.

 ## Ethical Considerations

-It's not intended for any malicious or unethical web scraping/crawling activities. Please ensure you [comply with the website's `robots.txt` directives](./frontier/robots) and terms of service (TOS) before crawling/scraping.
+It's not intended for any malicious or unethical web scraping/crawling activities.
+Please ensure you [comply with the website's `robots.txt` directives](./frontier/robots)
+and terms of service (TOS) before crawling/scraping.
```
