Fix video url scraping by makamys · Pull Request #285 · taspinar/twitterscraper

makamys · 2020-04-21T16:47:39Z

The HTML element that the video url was scraped from no longer exists, so video_div.find('a') returned None, which caused tweets containing videos to fail to scrape.
I changed it to use a regex to extract the video id from the markup, and to construct the video url from that id.
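
For readers following along, here is a minimal sketch of the regex approach described above, not the exact code in this PR. The thumbnail pattern (tweet_video_thumb/<VIDEO ID>.jpg) comes from the discussion in this thread; the VIDEO_URL_TEMPLATE value and the function name extract_video_url are assumptions made for illustration. Returning None instead of calling .group(1) on a failed match avoids the same kind of crash that find('a') caused.

```python
import re
from typing import Optional

# Thumbnail pattern for short videos, as discussed in this thread.
THUMB_RE = re.compile(r"https://pbs\.twimg\.com/tweet_video_thumb/([\w-]+)\.jpg")

# ASSUMPTION: this template is illustrative and is not stated in the thread.
VIDEO_URL_TEMPLATE = "https://video.twimg.com/tweet_video/{video_id}.mp4"


def extract_video_url(video_div_html: str) -> Optional[str]:
    """Return a video url built from the thumbnail id, or None if no id is found."""
    match = THUMB_RE.search(video_div_html)
    if match is None:
        # No short-video thumbnail in the markup; let the caller decide what to do.
        return None
    return VIDEO_URL_TEMPLATE.format(video_id=match.group(1))
```

A caller would pass str(video_div) and check for None before using the result.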

someguy-2020 · 2020-04-23T22:52:07Z

I had to change line 83 to:
video_id = re.search(r"https://pbs.twimg.com/ext_tw_video_thumb/(.*)\.jpg", str(video_div)).group(1)
[tweet_video_thumb --> ext_tw_video_thumb]
to get the proper video image URL. Unfortunately, this doesn't give the proper video_url. Any idea how the video_url can be derived from the video image URL?

makamys · 2020-04-24T18:17:40Z

Oh dang, it looks like it wasn't as simple as I was hoping. It turns out short videos have the thumbnail image in a format like tweet_video_thumb/<VIDEO ID>.jpg, and for those, my code works.

But longer videos are in the format of ext_tw_video_thumb/<TWEET ID>/pu/img/<THUMBNAIL ID>.jpg like you posted. Those videos are streamed via HLS, and the web app makes an API call (https://api.twitter.com/1.1/videos/tweet/config/<TWEET ID>.json) to find the m3u8 that contains the segments (which is in the form of https://video.twimg.com/ext_tw_video/<TWEET_ID>/pu/pl/<VIDEO ID>.m3u8).
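
A rough sketch of how the long-video thumbnail format above could be parsed, assuming the tweet id is numeric and the thumbnail id uses only letters, digits, underscores, and hyphens. The names LongVideoThumb, parse_long_video_thumb, and config_api_url are hypothetical. The sketch only builds the API url quoted above; it makes no request and does not parse the response, since the response shape is not described here.

```python
import re
from typing import NamedTuple, Optional


class LongVideoThumb(NamedTuple):
    tweet_id: str
    thumbnail_id: str


# ext_tw_video_thumb/<TWEET ID>/pu/img/<THUMBNAIL ID>.jpg
# ASSUMPTION: tweet ids are digits; thumbnail ids match [\w-]+.
EXT_THUMB_RE = re.compile(
    r"https://pbs\.twimg\.com/ext_tw_video_thumb/(\d+)/pu/img/([\w-]+)\.jpg"
)


def parse_long_video_thumb(html: str) -> Optional[LongVideoThumb]:
    """Extract the tweet id and thumbnail id from a long-video thumbnail url."""
    match = EXT_THUMB_RE.search(html)
    if match is None:
        return None
    return LongVideoThumb(tweet_id=match.group(1), thumbnail_id=match.group(2))


def config_api_url(tweet_id: str) -> str:
    """Build the config endpoint url mentioned in this thread (no request is made)."""
    return f"https://api.twitter.com/1.1/videos/tweet/config/{tweet_id}.json"
```

Note that the thumbnail id is returned only for completeness; as described in this thread, it cannot be substituted for the video id in the m3u8 url.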

Using <THUMBNAIL ID> as the <VIDEO ID> doesn't work though, and there's no reference to the <VIDEO ID> in the HTML served. So there may not be a way to get the video url without making an API call.

By the way, youtube-dl uses the API with a guest token to get the video url (see twitter.py, relevant discussion here).

As a workaround, the video url could be set to the tweet's url so at least tweets with videos don't get skipped. My use case for twitterscraper didn't include scraping tweets with long videos though, so I won't be fixing this myself, but hopefully these notes will be useful to someone else.
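
The workaround could look roughly like the following. This is a sketch only: the function name video_url_or_tweet_url and its parameters are hypothetical, and how it would be wired into the scraper is left open.

```python
from typing import Optional


def video_url_or_tweet_url(video_url: Optional[str], tweet_url: str) -> str:
    """Prefer the scraped video url; fall back to the tweet's own url.

    Falling back keeps tweets with long (HLS) videos in the results
    instead of dropping them when no direct video url can be found.
    """
    return video_url if video_url else tweet_url
```

With this fallback, a tweet whose video url cannot be determined still gets a usable link that points at the tweet itself.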

Fix video url scraping

2125976

makamys wants to merge 1 commit into taspinar:master from makamys:master
