How To Get Twitter Follower Data Using Python And Tweepy

In January 2018, I wrote a couple of blog posts outlining some analysis I’d performed on followers of popular Finnish Twitter profiles. A few people asked that I share the tools used to perform that research. Today, I’ll share a tool similar to the one I used to conduct that research, and at the same time, illustrate how to obtain data about a Twitter account’s followers.

This tool uses Tweepy to connect to the Twitter API. In order to enumerate a target account’s followers, I like to start with Tweepy’s followers_ids() function, which returns a list of Twitter ids of accounts that are following the target account. This call completes in a single query, and gives us a list of Twitter ids that can be saved for later use (since both screen_name and name can be changed, but an account’s id never changes). Once I’ve obtained a list of Twitter ids, I can use Tweepy’s lookup_users(user_ids=batch) to obtain Twitter User objects for each of those ids. As far as I know, this isn’t exactly the documented way of obtaining this data, but it suits my needs. /shrug
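
Stripped of the saving, batching and progress-printing that the full script below takes care of, the core of that approach is just a couple of Tweepy calls. Here’s a minimal sketch: the credential strings are placeholders for your own application’s keys, and "target_screen_name" stands in for whichever account you want to inspect.

from tweepy import OAuthHandler
from tweepy import API

# Placeholder credentials - substitute your own keys and tokens
auth = OAuthHandler("consumer_key", "consumer_secret")
auth.set_access_token("access_token", "access_token_secret")
api = API(auth)

# Step 1: grab the ids of the accounts following the target (up to 5000 in one call)
follower_ids = api.followers_ids("target_screen_name")

# Step 2: "hydrate" those ids into full Twitter User objects, 100 at a time
users = []
for i in range(0, len(follower_ids), 100):
    users += api.lookup_users(user_ids=follower_ids[i:i+100])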

Once a full set of Twitter User objects has been obtained, we can perform analysis on it. In the following tool, I chose to look at the account age and friends_count of each account returned, print a summary, and save a summarized form of each account’s details as json, for potential further processing. Here’s the full code:

from tweepy import OAuthHandler
from tweepy import API
from collections import Counter
from datetime import datetime, date, time, timedelta
import sys
import json
import os
import io
import re
import time

# Helper functions to load and save intermediate steps

def save_json(variable, filename):
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write(unicode(json.dumps(variable, indent=4, ensure_ascii=False)))

def load_json(filename):
    ret = None
    if os.path.exists(filename):
        try:
            with io.open(filename, "r", encoding="utf-8") as f:
                ret = json.load(f)
        except:
            pass
    return ret

def try_load_or_process(filename, processor_fn, function_arg):
    load_fn = None
    save_fn = None
    if filename.endswith("json"):
        load_fn = load_json
        save_fn = save_json
    else:
        # load_bin/save_bin would handle non-JSON files; they aren't needed here,
        # since every filename this script generates ends in .json
        load_fn = load_bin
        save_fn = save_bin
    if os.path.exists(filename):
        print("Loading " + filename)
        return load_fn(filename)
    else:
        ret = processor_fn(function_arg)
        print("Saving " + filename)
        save_fn(ret, filename)
        return ret

# Some helper functions to convert between different time formats and perform date calculations

def twitter_time_to_object(time_string):
    # Twitter's created_at looks like "Wed Feb 27 09:00:00 +0000 2018";
    # strip out the UTC offset so the remainder can be parsed with strptime
    twitter_format = "%a %b %d %H:%M:%S %Y"
    match_expression = r"^(.+)\s(\+[0-9][0-9][0-9][0-9])\s([0-9][0-9][0-9][0-9])$"
    match = re.search(match_expression, time_string)
    if match is not None:
        first_bit = match.group(1)
        second_bit = match.group(2)
        last_bit = match.group(3)
        new_string = first_bit + " " + last_bit
        date_object = datetime.strptime(new_string, twitter_format)
        return date_object

def time_object_to_unix(time_object):
    return int(time_object.strftime("%s"))

def twitter_time_to_unix(time_string):
    return time_object_to_unix(twitter_time_to_object(time_string))

def seconds_since_twitter_time(time_string):
    input_time_unix = int(twitter_time_to_unix(time_string))
    current_time_unix = int(get_utc_unix_time())
    return current_time_unix - input_time_unix

def get_utc_unix_time():
    dts = datetime.utcnow()
    return time.mktime(dts.timetuple())

# Get a list of follower ids for the target account.
# A single call returns the ids of the target's most recent followers (up to 5000).

def get_follower_ids(target):
    return auth_api.followers_ids(target)

# The Twitter API allows us to batch query 100 accounts at a time,
# so we create batches of 100 follower ids and gather Twitter User objects for each batch

def get_user_objects(follower_ids):
    batch_len = 100
    num_batches = len(follower_ids) / batch_len
    batches = (follower_ids[i:i+batch_len] for i in range(0, len(follower_ids), batch_len))
    all_data = []
    for batch_count, batch in enumerate(batches):
        sys.stdout.write("\r")
        sys.stdout.flush()
        sys.stdout.write("Fetching batch: " + str(batch_count) + "/" + str(num_batches))
        sys.stdout.flush()
        users_list = auth_api.lookup_users(user_ids=batch)
        users_json = (map(lambda t: t._json, users_list))
        all_data += users_json
    return all_data

# Create one-week-long age ranges and find accounts that fall within each range's boundaries

def make_ranges(user_data, num_ranges=20):
    range_max = 604800 * num_ranges  # 604800 seconds = one week
    range_step = range_max/num_ranges

    # We create ranges and labels first and then iterate these when going through the whole list
    # of user data, to speed things up
    ranges = {}
    labels = {}
    for x in range(num_ranges):
        start_range = x * range_step
        end_range = x * range_step + range_step
        label = "%02d" % x + " - " + "%02d" % (x+1) + " weeks"
        labels[label] = []
        ranges[label] = {}
        ranges[label]["start"] = start_range
        ranges[label]["end"] = end_range
    for user in user_data:
        if "created_at" in user:
            account_age = seconds_since_twitter_time(user["created_at"])
            for label, timestamps in ranges.iteritems():
                if account_age > timestamps["start"] and account_age < timestamps["end"]:
                    entry = {}
                    id_str = user["id_str"]
                    entry[id_str] = {}
                    fields = ["screen_name", "name", "created_at", "friends_count", "followers_count", "favourites_count", "statuses_count"]
                    for f in fields:
                        if f in user:
                            entry[id_str][f] = user[f]
                    labels[label].append(entry)
    return labels

if __name__ == "__main__":
    account_list = []
    if (len(sys.argv) > 1):
        account_list = sys.argv[1:]

    if len(account_list) < 1:
        print("No parameters supplied. Exiting.")
        sys.exit(0)

    # Add your own Twitter API credentials here
    consumer_key = ""
    consumer_secret = ""
    access_token = ""
    access_token_secret = ""

    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth_api = API(auth)

    for target in account_list:
        print("Processing target: " + target)

        # Get a list of Twitter ids for followers of the target account and save it
        filename = target + "_follower_ids.json"
        follower_ids = try_load_or_process(filename, get_follower_ids, target)

        # Fetch Twitter User objects for each Twitter id found and save the data
        filename = target + "_followers.json"
        user_objects = try_load_or_process(filename, get_user_objects, follower_ids)
        total_objects = len(user_objects)

        # Record a few details about each account that falls between the specified age ranges
        ranges = make_ranges(user_objects)
        filename = target + "_ranges.json"
        save_json(ranges, filename)

        # Print a few summaries
        print
        print("\t\tFollower age ranges")
        print("\t\t===================")
        total = 0
        following_counter = Counter()
        for label, entries in sorted(ranges.iteritems()):
            print("\t\t" + str(len(entries)) + " accounts were created within " + label)
            total += len(entries)
            for entry in entries:
                for id_str, values in entry.iteritems():
                    if "friends_count" in values:
                        following_counter[values["friends_count"]] += 1
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print
        print("\t\tMost common friends counts")
        print("\t\t==========================")
        total = 0
        for num, count in following_counter.most_common(20):
            total += count
            print("\t\t" + str(count) + " accounts are following " + str(num) + " accounts")
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print
        print

Let’s run this tool against a few accounts and see what results we get. First up: @realDonaldTrump

Age ranges of new accounts following @realDonaldTrump

As we can see, over 80% of @realDonaldTrump’s last 5000 followers are very new accounts (less than 20 weeks old), with a majority of those being under a week old. Here are the top friends_count values of those accounts:

Most common friends_count values seen amongst the new accounts following @realDonaldTrump

No obvious pattern is present in this data.

Next up, an account I looked at in a previous blog post – @niinisto (the president of Finland).

Age ranges of new accounts following @niinisto

Many of @niinisto’s last 5000 followers are also new Twitter accounts, although not in as large a proportion as in the @realDonaldTrump case. In both of the above cases this is to be expected, since both accounts are recommended to new users of Twitter. Let’s look at the friends_count values for the above set.

Most common friends_count values seen amongst the new accounts following @niinisto

In some cases, clicking through the creation of a new Twitter account (next, next, next, finish) will create an account that follows 21 Twitter profiles. This can explain the high proportion of accounts in this list with a friends_count value of 21. However, we might expect to see the same (or an even stronger) pattern for the @realDonaldTrump account, and we don’t. I’m not sure why this is the case, but it could be that Twitter has some automation in place to auto-delete programmatically created accounts. If you look at the output of my script, you’ll see that between fetching the list of Twitter ids for the last 5000 followers of @realDonaldTrump and fetching the full Twitter User objects for those ids, 3 accounts “went missing” (and hence the tool only collected data for 4997 accounts).
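
If you want to see exactly which accounts went missing, the two json files the tool saves are enough to work it out. Here’s a minimal sketch, assuming the tool was run with realDonaldTrump as its argument (so the filenames below follow the script’s naming scheme):

import json

# Filenames assume a previous run of the tool against realDonaldTrump
with open("realDonaldTrump_follower_ids.json") as f:
    follower_ids = json.load(f)
with open("realDonaldTrump_followers.json") as f:
    user_objects = json.load(f)

# ids we asked for, minus the ids that actually came back as User objects
returned_ids = set(u["id"] for u in user_objects)
missing = [i for i in follower_ids if i not in returned_ids]
print(str(len(missing)) + " ids returned no User object: " + str(missing))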

Finally, just for good measure, I ran the tool against my own account (@r0zetta).

Age ranges of new accounts following @r0zetta

Here you see a distribution that’s probably common for non-celebrity Twitter accounts: not many of my followers are new accounts. What’s more, there’s absolutely no pattern in the friends_count values of these accounts:

Most common friends_count values seen amongst the new accounts following @r0zetta

Of course, there are plenty of other interesting analyses that can be performed on the data collected by this tool. Once the script has been run, all of the data is saved on disk as json files, so you can process it to your heart’s content without having to run additional queries against Twitter’s servers (a small example of this follows below). As usual, have fun extending this tool to your own needs, and if you’re interested in reading some of my other guides or analyses, here’s a full list of those articles.
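
For instance, the sketch below loads one of the _followers.json files written by the tool (the filename assumes a run against my own account, @r0zetta) and counts how many of those followers have never tweeted; any other field saved in the User objects can be sliced the same way:

import json
from collections import Counter

# Filename assumes the tool was previously run against r0zetta
with open("r0zetta_followers.json") as f:
    followers = json.load(f)

# Example follow-up analysis: how many of these followers have never tweeted,
# and what friends_count values do those silent accounts have?
never_tweeted = [u for u in followers if u.get("statuses_count", 0) == 0]
print(str(len(never_tweeted)) + " of " + str(len(followers)) + " followers have never tweeted")

friends_counter = Counter(u["friends_count"] for u in never_tweeted)
print(friends_counter.most_common(10))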

Article Link: https://labsblog.f-secure.com/2018/02/27/how-to-get-twitter-follower-data-using-python-and-tweepy/