I’ve been writing scripts to process Twitter streaming data via the Twitter API. One of those scripts looks for patterns in metadata and associations between accounts, as streaming data arrives. The script processes retweets, and I decided to add functionality to also process quote Tweets.
Retweets “echo” the original by embedding a copy of the Tweet in a field called retweeted_status.
According to Twitter’s own API documentation, a quote Tweet should work in a similar way. (A quote Tweet is like wrapping your tweet around somebody else’s.) A Tweet object containing the quoted Tweet should be available in the quoted_status field.
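A minimal sketch of telling the three cases apart, assuming the status has already been parsed into a plain Python dictionary. The function name classify_status and the sample payloads are hypothetical; only the field names retweeted_status and quoted_status come from the API. Retweets are checked first on the assumption that a retweet of a quote Tweet can carry both fields.

```python
def classify_status(status):
    """Label a parsed Tweet payload as 'retweet', 'quote', or 'original'."""
    # Assumption: a retweet of a quote Tweet may carry both fields,
    # so the retweet check comes first.
    if status.get("retweeted_status") is not None:
        return "retweet"
    if status.get("quoted_status") is not None:
        return "quote"
    return "original"


# Hypothetical, heavily trimmed payloads for illustration only.
plain = {"text": "hello"}
retweet = {"text": "RT @a: hello", "retweeted_status": {"text": "hello"}}
quote = {"text": "look at this", "quoted_status": {"text": "hello"}}

for payload in (plain, retweet, quote):
    print(classify_status(payload))
```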
I wrote some code to fetch and process quoted_status in a similar way to how I was already processing retweeted_status, but it didn’t work. I “asked” Google for answers, but didn’t really find anything, so I decided to dig into what the API was actually returning in the quoted_status field.
It turns out it’s not a Tweet object. Here’s what a quoted_status field actually looks like:
{u'contributors': None, u'truncated': False, u'text': u'', u'is_quote_status': False, u'in_reply_to_status_id': None, u'id': 0, u'favorite_count': 0, u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'user_mentions': [], u'symbols': [], u'hashtags': [], u'urls': []}, u'in_reply_to_screen_name': None, u'id_str': u'', u'retweet_count': 0, u'in_reply_to_user_id': None, u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'default_profile_image': False, u'id': 0, u'verified': True, u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/', u'profile_sidebar_fill_color': u'FFFFFF', u'profile_text_color': u'FFFFFF', u'followers_count': 0, u'profile_sidebar_border_color': u'FFFFFF', u'id_str': u'0', u'profile_background_color': u'FFFFFF', u'listed_count': 0, u'profile_background_image_url_https': u'https://abs.twimg.com/images/', u'utc_offset': -18000, u'statuses_count': 0, u'description': u"", u'friends_count': 0, u'location': None, u'profile_link_color': u'FFFFFF', u'profile_image_url': u'http://pbs.twimg.com/profile_images/', u'following': None, u'geo_enabled': True, u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/', u'profile_background_image_url': u'http://abs.twimg.com/images/', u'name': u'', u'lang': u'en', u'profile_background_tile': False, u'favourites_count': 0, u'screen_name': u'', u'notifications': None, u'url': None, u'created_at': u'Fri Nov 27 23:14:06 +0000 2009', u'contributors_enabled': False, u'time_zone': u'', u'protected': False, u'default_profile': True, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'lang': u'en', u'created_at': u'Thu Jun 22 00:33:13 +0000 2017', u'filter_level': u'low', u'in_reply_to_status_id_str': None, u'place': None}
So, it’s a data structure that contains some of the information you might find in a Tweet object, but it’s not an actual Tweet object. Kinda makes sense if you think about it. A quote Tweet can quote other quote Tweets, which can quote other quote Tweets. (Some folks created rather long quote Tweet chains when the feature was first introduced.) So, if the API returned a fully-hydrated Tweet object for a quoted Tweet, that object could contain another Tweet object in its own quoted_status field, and so on, and so on.
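Since the embedded quote arrived as a dictionary while the embedded retweet arrived as an object, one way to avoid two parallel code paths is a small accessor that accepts either form. This is only a sketch and not part of tweepy: get_field and original_screen_name are hypothetical helper names, and the sample data is made up.

```python
from types import SimpleNamespace


def get_field(container, name):
    """Read `name` from a dict key or an object attribute; None if absent."""
    if container is None:
        return None
    if isinstance(container, dict):
        return container.get(name)
    return getattr(container, name, None)


def original_screen_name(embedded):
    """Return the screen_name of an embedded Tweet's author, or None."""
    return get_field(get_field(embedded, "user"), "screen_name")


# Made-up stand-ins: an object-style retweet and a dict-style quote.
retweet_like = SimpleNamespace(user=SimpleNamespace(screen_name="alice"))
quote_like = {"user": {"screen_name": "bob"}}

print(original_screen_name(retweet_like))
print(original_screen_name(quote_like))
```

The helper returns None whenever the user or screen_name is missing, which replaces the chain of nested existence checks with a single test on the result.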
Here’s a small piece of Python code that looks for retweets and quote Tweets in a stream and, when it finds one, retrieves the screen_name of the user who published the original Tweet. It illustrates the differences between handling retweets and quote Tweets.
    # Uses the tweepy 3.x streaming interface (StreamListener).
    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    from tweepy import API

    consumer_key = "add your own key here"
    consumer_secret = "add your own secret here"
    access_token = "add your own token here"
    access_token_secret = "add your own secret here"


    class StdOutListener(StreamListener):
        def on_status(self, status):
            screen_name = status.user.screen_name

            # retweeted_status is a Tweet object, so use attribute access.
            if hasattr(status, 'retweeted_status'):
                retweet = status.retweeted_status
                if hasattr(retweet, 'user'):
                    if retweet.user is not None:
                        if hasattr(retweet.user, "screen_name"):
                            if retweet.user.screen_name is not None:
                                retweet_screen_name = retweet.user.screen_name
                                print(screen_name + " retweeted " + retweet_screen_name)

            # quoted_status is a plain dictionary, so use key lookups.
            if hasattr(status, 'quoted_status'):
                quote_tweet = status.quoted_status
                if 'user' in quote_tweet:
                    if quote_tweet['user'] is not None:
                        if "screen_name" in quote_tweet['user']:
                            if quote_tweet['user']['screen_name'] is not None:
                                quote_tweet_screen_name = quote_tweet['user']['screen_name']
                                print(screen_name + " quote tweeted " + quote_tweet_screen_name)
            return True

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        auth_api = API(auth)
        print("Signing in as: " + auth_api.me().name)
        print("Preparing stream")
        stream = Stream(auth, l, timeout=30.0)
        searches = ['donald', 'trump', ]
        # Restart the stream whenever it returns (for example, after a timeout).
        while True:
            if 'searches' in locals():
                print("Filtering on:" + str(searches))
                stream.filter(track=searches)
            else:
                print("Getting 1% sample")
                stream.sample()

Tagged: OSINT, Python, Th3 Cyb3r, Twitter
Article Link: https://labsblog.f-secure.com/2017/06/23/processing-quote-tweets-with-twitter-api/