modified: twitterscraper/tweet.py#100
hengruo wants to merge 4 commits into taspinar:master from hengruo:master
| @generate_ordering('timestamp', 'id', 'text', 'user', 'replies', 'reply_to_id', 'retweets', 'likes') | ||
| class Tweet: | ||
| def __init__(self, user, fullname, id, url, timestamp, text, replies, retweets, likes, html): | ||
| def __init__(self, user, fullname, id, url, timestamp, text, reply_to_id, replies, retweets, likes, html): |
Placing a new argument at this position breaks backward compatibility for callers that pass arguments positionally. I suggest you move it to the end of the argument list.
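A minimal sketch of what I mean. The default value `'0'` is my assumption and not part of this PR, and the `generate_ordering` decorator is left out to keep the snippet self-contained. With `reply_to_id` appended as a defaulted argument, existing positional calls keep working:

```python
class Tweet:
    # reply_to_id is appended last, with an assumed default of '0',
    # so the ten original positional arguments keep their positions.
    def __init__(self, user, fullname, id, url, timestamp, text,
                 replies, retweets, likes, html, reply_to_id='0'):
        self.user = user
        self.fullname = fullname
        self.id = id
        self.url = url
        self.timestamp = timestamp
        self.text = text
        self.replies = replies
        self.retweets = retweets
        self.likes = likes
        self.html = html
        self.reply_to_id = reply_to_id


# An old-style call with the original ten positional arguments still works.
old_style = Tweet('user', 'Full Name', '1', '/user/status/1', None,
                  'hello', '0', '0', '0', '<p>hello</p>')

# New code can pass the extra field by keyword.
new_style = Tweet('user', 'Full Name', '2', '/user/status/2', None,
                  'a reply', '0', '0', '0', '<p>a reply</p>', reply_to_id='1')
```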
The newly implemented 'reply_to_user' is not passed to the Tweet class and hence will not appear in the output.
| 'span', 'ProfileTweet-actionCount')['data-tweet-stat-count'] or '0', | ||
| html=str(tweet.find('p', 'tweet-text')) or "", | ||
| ) | ||
| reply_to_id = tweet.findChildren()[0]['data-conversation-id'] or '0', |
This can also be achieved with
reply_to_id = tweet.find('div', 'tweet')['data-conversation-id'] or '0'
| self.retweets = retweets | ||
| self.likes = likes | ||
| self.html = html | ||
| self.reply_to_id = reply_to_id |
self.reply_to_id = 0 if id == reply_to_id else reply_to_id
sets it to zero if it is equal to the tweet id, i.e. if the tweet is not a reply to anyone. Giving reply_to_id a value even when the tweet is not a reply is misleading: people would have to compare id and reply_to_id before they can be sure it is a reply.
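The same rule as a standalone helper, in case that is easier to test. The function name is hypothetical, and comparing the values as strings is my assumption, added so that an int id and a str id with the same digits still count as equal:

```python
def normalize_reply_to_id(tweet_id, reply_to_id):
    """Return 0 when the tweet is not a reply, otherwise the given reply_to_id."""
    # Compare as strings so 10 and '10' are treated as the same id.
    if str(tweet_id) == str(reply_to_id):
        return 0
    return reply_to_id
```

With this, a caller only has to check whether `reply_to_id` is truthy to know that the tweet is a reply.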
I know it is possible to retrieve the contents of a tweet if you know the username and id with "https://twitter.com//status/". One way to find the username belonging to the original tweet is with the following command: This retrieves a JSON list containing, among other things, the username, screen_name and id_str of everyone who has participated in the conversation. If it is not possible to retrieve a tweet by id only, I suggest you also include the username of the original tweet.
Your suggestions are very useful! I store tweets in my database, so I didn't consider the case where we need to fetch a tweet online by id alone. I'll fix it.
| html = response.text | ||
| else: | ||
| json_resp = response.json() | ||
| json_resp = ujson.loads(response.text) |
What is the difference between json.loads() and ujson.loads()? If there is no clear reason for using ujson instead of json, I prefer json.
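If the motivation is parsing speed (that is my guess, not something stated in this PR), one option is to keep ujson optional and fall back to the standard library. A sketch, where `json_impl` and `parse_json` are hypothetical names:

```python
try:
    # ujson is a third-party package; use it only when it is installed.
    import ujson as json_impl
except ImportError:
    # Fall back to the standard library so no new hard dependency is added.
    import json as json_impl


def parse_json(text):
    """Parse a JSON document with whichever implementation was imported."""
    return json_impl.loads(text)
```

This way the package still works for users who do not have ujson installed.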
| limit_per_pool = roundup(limit, poolsize) | ||
| else: | ||
| limit_per_pool = None | ||
| limit_per_pool = limit |
This change will result in twitterscraper scraping approximately P*limit tweets (where P is the poolsize) instead of the given limit. Please remove this change.
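To make the arithmetic concrete, here is a small sketch. The `roundup` below is an assumed ceiling-division implementation and `approx_total` is a hypothetical name; neither is copied from the repository. The numbers are only an illustration:

```python
def roundup(numerator, denominator):
    # Assumed behaviour: ceiling division, e.g. roundup(101, 20) gives 6.
    return numerator // denominator + (numerator % denominator > 0)


def approx_total(limit_per_pool, poolsize):
    # Each of the poolsize workers scrapes up to limit_per_pool tweets.
    return limit_per_pool * poolsize


limit, poolsize = 100, 20

# Original code: the limit is divided over the pools.
total_before = approx_total(roundup(limit, poolsize), poolsize)

# This PR: every pool gets the full limit.
total_after = approx_total(limit, poolsize)
```

Under these assumptions `total_before` stays at the requested 100, while `total_after` grows to 2000.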
| html=str(tweet.find('p', 'tweet-text')) or "", | ||
| ) | ||
| reply_to_id = tweet.find('div', 'tweet')['data-conversation-id'] or '0', | ||
| reply_to_user = tweet.find('div', 'tweet')['data-mentions'] or "", |
This is already implemented in PR #98. Maybe it is best to remove it here.
I added a new field to Tweet: reply_to_id.
If tweet A is a reply to tweet B, then reply_to_id = B.id;
If tweet A doesn't reply to any tweet, then reply_to_id = A.id.
This field lets us construct the reply tree of tweets.
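A rough sketch of how such a reply tree could be built from scraped tweets. The function name and the (id, reply_to_id) pair format are assumptions made for illustration, not part of twitterscraper. A tweet counts as a non-reply when reply_to_id equals its own id, or when it is 0 or '0':

```python
from collections import defaultdict


def build_reply_tree(tweets):
    """Map each parent tweet id to the list of ids that reply to it.

    tweets is an iterable of (id, reply_to_id) pairs.
    """
    children = defaultdict(list)
    for tweet_id, reply_to_id in tweets:
        # Skip tweets that are not replies to anything.
        if reply_to_id in (tweet_id, 0, '0'):
            continue
        children[reply_to_id].append(tweet_id)
    return dict(children)


# Hypothetical sample: tweet 1 is the root, 2 and 3 reply to 1, 4 replies to 2.
sample = [('1', '1'), ('2', '1'), ('3', '1'), ('4', '2')]
tree = build_reply_tree(sample)
```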