Currently, scrapers/profiles.go also does the parsing, which does not match our design.
Here is what I am proposing:
- Update scrapers/profiles.go to save .html files, similar to scrapers/coursebook.go
  - Add /professors to outDir
  - Save profiles as {first}-{last}.html
- Create a parser/profiles.go
  - Copy all of the parsing logic into it, modified to use goquery instead of chromedp
- Update flags in main.go
- Bonus
  - Add resume support to the scraper
  - Add a unit test for the parser
- Side effects
  - parser.go uses utils.GetAllFilesWithExtension, which would create an issue if the proposed /professors directory is added, so we might consider scraping coursebook into outDir/coursebook/... instead.
Sample dir structure:
outDir (ie data)
├───coursebook
│   ├───24f
│   │   └───cp_acct
│   │           acct2301.001.24f.html
│   │           acct2301.002.24f.html
│   │           ...
│   └───...
└───professors
        first-last.html
        ...
I haven't worked with the profiles scraper very much, but there does not seem to be any technical reason why this should not be possible.
If this is added as a task I don't mind working on it but if someone is interested feel free.