Scrape the top N websites

Sparked by my rant online: https://bsky.app/profile/veneman.dev/post/3m7gjtmgrtc2z

- [ ] What scraping method? https://www.projectwallace.com/blog/ways-to-scrape-css
- [ ] What do we store? HTML + CSS? All raw data? CSS only?
- [ ] Fetch from source or via HTTP Archive?
- [ ] Resumability is key: store progress while scraping, so a later run can pick up where the last one stopped without losing anything
- [ ] How to aggregate data?
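
To make the resumability point concrete, here is a rough sketch of one way it could work: append one JSON line per finished URL to a progress file, and skip anything already listed there on the next run. Nothing here is decided yet. The file format, the `load_done` and `scrape_all` names, and the `fetch` callback (a stand-in for whichever scraping method gets picked) are all assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done(progress_file: Path) -> set[str]:
    """Return the URLs that already have a complete record in the progress file."""
    done: set[str] = set()
    if not progress_file.exists():
        return done
    with progress_file.open(encoding="utf-8") as f:
        for line in f:
            try:
                done.add(json.loads(line)["url"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # Blank or truncated line (e.g. a crash mid-write): ignore it,
                # so that URL simply gets scraped again.
                continue
    return done


def _ends_with_newline(path: Path) -> bool:
    """True if the file is missing, empty, or ends with a newline."""
    if not path.exists() or path.stat().st_size == 0:
        return True
    with path.open("rb") as f:
        f.seek(-1, 2)  # last byte of the file
        return f.read(1) == b"\n"


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical: returns the CSS for a URL
    progress_file: Path,
) -> int:
    """Scrape every URL not yet recorded; return how many were processed."""
    done = load_done(progress_file)
    needs_newline = not _ends_with_newline(progress_file)
    processed = 0
    with progress_file.open("a", encoding="utf-8") as f:
        if needs_newline:
            # Close off a half-written last line before appending new records.
            f.write("\n")
        for url in urls:
            if url in done:
                continue
            try:
                record = {"url": url, "ok": True, "css": fetch(url)}
            except Exception as exc:
                # Assumption: failures are recorded too, so they are not retried.
                record = {"url": url, "ok": False, "error": str(exc)}
            f.write(json.dumps(record) + "\n")
            f.flush()  # get the record out of Python's buffer right away
            done.add(url)
            processed += 1
    return processed
```

Calling `scrape_all(urls, fetch, Path("progress.jsonl"))` a second time with the same list does no work, and adding URLs to the list only fetches the new ones. Whether failed URLs should count as done or be retried on the next run is a choice this sketch makes arbitrarily.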