The State of Android Health Data (Part 2) – Google Fit

Who doesn’t like seeing a red panda doing pull-ups?

This is the second installment in the Android Health Data series. If you missed the first one, you can catch it here.

I actually started writing this post early last year, but some of Google's announcements at Google I/O 2021, along with other new & shiny things, diverted my attention. Such is life in digital forensics…

Google Fit has, for the most part, stayed the same since its introduction at Google I/O 2014. Interestingly enough, Android Wear was introduced at the same conference, which makes one wonder whether Google had envisioned Android Wear and Google Fit playing a more dominant role, similar to the roles Apple Watch and Fitness/Health play within iOS. The idea behind Google Fit was that it would be a central repository where apps could store health data, and from which apps could access activity data collected by other apps and by sensors designed to collect such data. The data would persist across app upgrades and sensor changes. Google Fit, like Android Wear, had some launch partners: Nike+ Running, Withings Health Mate, Runkeeper, Runtastic, and Noom Coach.

In 2018 Google Fit got a facelift and some feature upgrades when Google teamed up with the American Heart Association and the World Health Organization. The new version tracked Move Minutes and Heart Points. The former did not have a formal definition, but the latter was described as getting credit for moderate activities, such as a brisk walk, and intense activities, such as running. The interesting part was that Google Fit would track and record this data automatically if it detected what it thought was an activity, whether on a phone or a paired Wear OS device.

In 2021 Google Fit's future was (and is) uncertain, but in a good way. Wear (formerly Wear OS) looks to be getting a few new features as a result of Google's completed acquisition of Fitbit, so I anticipate Google Fit will be seeing some welcome changes in the future as well. And, with the Tizen and Wear pairing, Google looks to be positioning Wear (and presumably Google Fit) to take on the bigger players in the health data market, like Apple, Garmin, and Whoop. It will be quite the uphill battle, though.

The Warm Up

Google Fit's interface is much like other fitness apps. There are three main parts of the app. The first is "Home," which is the main dashboard where a user lands when they open the app. In Home, a user can see their metrics: Steps, Calories burned, Sleep, Heart rate, Weight, Blood Pressure, Workouts, and Heart Points. See Figures 1 – 3.

Figure 1. Google Fit dashboard (Part 1).
Figure 2. Google Fit dashboard (Part 2).
Figure 3. Google Fit dashboard (Part 3).

The dashboard may vary from what is seen here.  Users can customize it, and the contents will vary based on exercise/sleep/other activity.  The metrics in Google Fit are, for the most part, self-explanatory. The Heart Points metric is interesting; points are earned for every minute of activity, such as brisk walking, running, swimming, or some other equivalent activity. Google does a decent job of explaining how Google Fit measures the activity needed to get heart points, which you can check out here.

Depending on the hardware being used, some metrics may be fully populated, some may be partially populated, and some may have no data at all; missing or sporadic data is often just a consequence of that hardware. For example, during testing one of the devices I used was a Mobvoi TicWatch 2020 (November patch). When it comes to Google Fit, the watch only captures heart rate data during a workout. The same can be said about another device I used, a Mobvoi TicWatch E3. Otherwise, I would need to manually initiate a heart rate measurement on the watch itself or use the camera on the paired Pixel 3 to capture heart rate. It is extremely important to understand the capabilities and limitations of the paired hardware used to capture metrics. And remember, that hardware could be a simple pedometer, a smart watch, a blood pressure cuff, a smart scale (for weight), a chest strap (for respiration), some other "smart" device, or the phone itself.

The second part of the app is "Journal." While it may document many things, in my testing the Journal documented workouts and sleep events. See Figure 4.

Figure 4. The Journal.

Pressing on any entry in the Journal provides a user with more details about the specific entry. Figures 5, 6, and 7 show an entry for one of my runs.

Figure 5. My run (Part 1).
Figure 6. My run (Part 2).
Figure 7. My run (Part 3).

The interesting thing I noticed during my testing is that Google Fit logged entries in the Journal even if I had not explicitly started a workout. For example, the watch or phone would log "Walks" in the Journal if I walked approximately 50 yards or farther without stopping for a substantial amount of time. When I finish my runs, I usually walk an additional mile (~1.6 km), and that gets logged, too, without any action needed from me. From an investigative standpoint, this "feature" could come in handy.

The third part of the app is “Profile.” Here, a user can set their step and heart point goals, a bedtime schedule, and provide the app information about themselves. See Figure 8.

Figure 8. My profile.

As with the Garmin article, I do want to address the settings of the app. There are three main parts of the settings: Units of Measurement, Data & Personalization, and Tracking Preferences. See Figures 9 and 10.

Figure 9. Units & Google Fit data and personalization.
Figure 10. Tracking preferences.

The Units of Measurement portion is straightforward: a user can set how things are measured. Tracking Preferences is also straightforward, and the descriptions are accurate. I will note that neither tracking preference was on by default; if a user wants to use the phone to track their activity, they will need to turn these on. The location setting is interesting, as will be seen when discussing the forensics.

In Figure 10 just below Tracking Preferences, there is “Settings for other devices.” This setting lists all of the devices on which the user account is signed into Google Fit. So, if an examiner or investigator thinks they are missing a wearable device, this would be a good place to look to see what device(s) are associated with the account.

Data personalization is a biggie for a couple of reasons. See Figure 11.

Figure 11. Data written to my Google account (Google Fit data permissions).

First, most of the data collected by Google Fit is sent to the Google account that is signed in on the app, so things like sleep, steps, workouts, and others are available via Google Takeout if they were recorded by Google Fit. This is extremely important to remember when discussing the forensics later.

Second, if a user is utilizing a bridging app (i.e., an app that imports data from another fitness app into Google Fit) or another app to collect certain data, that app (or apps) is listed here as well. For example, Sleep As Android is the app I used to track sleep during testing; it can be installed on the watch, which is a feature I used. In the public Android 12 image, I used Health Sync to import Garmin data and FitToFit to import Fitbit data. Knowing there are additional apps present might give an examiner an additional source of information. See Figure 12.

Figure 12. Connected apps.

The Workouts

A couple of notes. First, Google Fit is not a default app on Android. A user has to download it from the Play Store, or an OEM has to pre-install it on a device. Second, this post will not address the accuracy of the associated wearable or the data that may reside on the wearable. If you are interested in the latter, I wrote an article about Wear OS last year that can be found here. And finally, Google Fit data resides in two different locations on the device. One of the locations is obvious, and the other is not. Let's start with the not-so-obvious.

Earlier, it was mentioned that certain collected data is available via Google Takeout, which means Google has the data. But where, exactly, is it getting the data from? As with a lot of things that get sync’d to Google, the database resides in the Google Mobile Services directory path, USERDATA/data/com.google.android.gms/databases. The database name itself is useful as an examiner can quickly determine the account associated with the data within the database. The file name format is fitness.db.%GOOGLE_ACCOUNT_HERE%. So, as an example, the name of the database for my test account was fitness.db.thisisdfir_gmail.com. The data in this database directly corresponds to the health data that is sync’d to Google. When I pulled the takeout data for the test account, I found the data present in the database was also present in the takeout data and in a similar format.
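Because the account is embedded in the file name, a short script can list every fitness.db.* file in an extraction and recover the account each one belongs to. The helpers below (account_from_filename and find_fitness_databases) are hypothetical, not part of any tool. They assume, based only on the single example above, that the "@" in the address is written as an underscore, and they treat the last underscore as the "@" because earlier underscores could belong to the user name, so the result is a best guess to be confirmed against other artifacts.

```python
from pathlib import Path
from typing import Optional

# Hypothetical helper. Assumption (from one example only): the file name is
# "fitness.db." + account, with the "@" written as "_",
# e.g. fitness.db.thisisdfir_gmail.com -> thisisdfir@gmail.com
PREFIX = "fitness.db."
SIDECAR_SUFFIXES = ("-wal", "-shm", "-journal")


def account_from_filename(filename: str) -> Optional[str]:
    """Return a best-guess account for a fitness.db.* file name, else None."""
    if not filename.startswith(PREFIX) or filename.endswith(SIDECAR_SUFFIXES):
        return None
    suffix = filename[len(PREFIX):]
    # Treat only the LAST underscore as the "@"; earlier ones may be part of
    # the user name, so the mapping is a guess, not a certainty.
    user, sep, domain = suffix.rpartition("_")
    return f"{user}@{domain}" if sep else suffix


def find_fitness_databases(gms_databases_dir: str) -> dict[str, str]:
    """Map each fitness.db.* file in a GMS databases folder to its account."""
    found = {}
    for path in sorted(Path(gms_databases_dir).glob(PREFIX + "*")):
        account = account_from_filename(path.name)
        if account is not None:
            found[str(path)] = account
    return found
```

Pointing find_fitness_databases at the extracted USERDATA/data/com.google.android.gms/databases folder returns one entry per account database, skipping the SQLite -wal, -shm, and -journal sidecar files.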

The main table of interest is Sessions. See Figure 13.

Figure 13. Sessions table.

The table contains information on activity tracked by the app along with activity descriptors. Google Fit acts as a repository; it is able to collect health data on its own, but it also stores health data from other apps that are connected to it. So, in addition to Google Fit (com.google.android.apps.fitness), examiners may find entries in the app_package column for third-party health apps that contributed health data to Google Fit. Figure 14 shows an example of this.

Figure 14. A third-party app, Sleep As Android, contributed data to Google Fit.

A quick SQL query can clean up the data.

SELECT
  datetime(Sessions.start_time/1000, 'unixepoch') AS "Activity Start Time (UTC)",
  datetime(Sessions.end_time/1000, 'unixepoch') AS "Activity Stop Time (UTC)",
  Sessions.app_package AS "Contributing App",
  CASE
    WHEN Sessions.activity=7 THEN 'Walking'
    WHEN Sessions.activity=8 THEN 'Running'
    WHEN Sessions.activity=72 THEN 'Sleeping'
    ELSE Sessions.activity
  END AS "Activity Type",
  Sessions.name AS "Activity Name",
  Sessions.description AS "Activity Description"
FROM
  Sessions
ORDER BY "Activity Start Time (UTC)" ASC
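Examiners who prefer to script this can run the same query from Python's built-in sqlite3 module. The sketch below is a minimal example, not ALEAPP's implementation; read_sessions is a hypothetical name. It opens the database through a file: URI with mode=ro so SQLite refuses writes, although working on a copy of the evidence is still the safer practice (a WAL-mode database may not open read-only without its sidecar files).

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# The Sessions query from above, with single quotes for SQLite string literals.
SESSIONS_QUERY = """
SELECT
    datetime(start_time / 1000, 'unixepoch') AS start_utc,
    datetime(end_time / 1000, 'unixepoch') AS end_utc,
    app_package,
    CASE activity
        WHEN 7 THEN 'Walking'
        WHEN 8 THEN 'Running'
        WHEN 72 THEN 'Sleeping'
        ELSE activity
    END AS activity_type,
    name,
    description
FROM Sessions
ORDER BY start_time ASC
"""


def read_sessions(db_path: str) -> list[tuple]:
    """Run the Sessions query against a database opened read-only."""
    # as_uri() percent-encodes characters such as ":" in the path;
    # mode=ro tells SQLite to open the file without write access.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(SESSIONS_QUERY).fetchall()
```

Sorting on the raw millisecond column rather than the formatted string gives the same order here, but keeps the sort numeric if the output format is later changed.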

Google Fit is able to track a long list of activities.  The full list of the codes that may be seen in the activity column can be found here.

This query has been implemented in ALEAPP.

The other table that is of interest is ChangeLog. The data in this table shows the data being sync’d with Google, along with the time it was sync’d, when the data was collected, and the app/method that contributed the data. The contributed data is stored in protobuf, so that will need to be decoded. For a reason I will discuss later, I recommend using Google’s protoc in lieu of anything else so that the raw data here, and elsewhere in Google Fit, is retrieved as-is. The interesting thing about this table is that it mirrors part of what is returned in a Google Takeout request. There is related data in other tables, so a quick SQL query can pull things together.

SELECT
  datetime(ChangeLog.timestamp/1000, 'unixepoch') AS "Timestamp",
  DataSources.identifiers AS "Data Source",
  DataTypes.name AS "Data Type"
FROM
  ChangeLog
  JOIN DataSources ON DataSources._id=ChangeLog.data_source_id
  JOIN DataTypes ON DataTypes._id=DataSources.data_type_id
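protoc can decode these payloads without the app's .proto definitions (for example, with protoc --decode_raw), reporting only field numbers and raw values. To show what that output corresponds to at the byte level, the sketch below walks the protobuf wire format the same way: each key is a varint split into a field number and a wire type, and values are left raw, so fixed64 values stay as eight bytes instead of being guessed at as integers or doubles. It is an illustration of the encoding, not a replacement for protoc; walk_fields and as_double are hypothetical names, and the deprecated group wire types (3 and 4) are rejected rather than handled.

```python
import struct


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def walk_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) for each top-level field.

    Nothing is interpreted: varints stay ints, fixed64/fixed32 stay as their
    original bytes, and length-delimited values stay bytes (they may be text,
    packed numbers, or a nested message; the wire format does not say which).
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed64: 8 bytes, little-endian
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed32: 4 bytes, little-endian
            value, pos = buf[pos:pos + 4], pos + 4
        else:                 # 3/4 are deprecated groups; 6/7 are invalid
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError(f"field {field_number} runs past the end of the buffer")
        fields.append((field_number, wire_type, value))
    return fields


def as_double(raw: bytes) -> float:
    """Interpret an 8-byte fixed64 payload as an IEEE 754 double."""
    return struct.unpack("<d", raw)[0]
```

A length-delimited value that looks like a nested message can be passed back into walk_fields; if it is really a string, that call will usually fail or produce nonsense, which is exactly the ambiguity raw decoding leaves to the examiner.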

That’s it for this table and database. While having this data is great, it is not as granular as the data that is kept by the Google Fit app. Speaking of which, the Google Fit app resides in USERDATA/data/com.google.android.apps.fitness, and is the other location where an examiner can find health data. User data is kept in the ~/files/accounts/%USER_NUMBER%/ folder. For my test device the folder path was ~/files/accounts/1/. The folder contains a mix of SQLite databases and protobuf files. This blog could be a novel, but I am going to highlight a few files that pertain to activity and location data.

The first file of interest is metric_database.db, and there is a single table of interest: metric_aggregations. This table contains data related to steps taken, distance traveled, calories burned (both at rest and during workouts), move minutes, and heart points. Depending on the app contributing the data, an examiner may find only daily totals for these metrics, or they may find metric totals for half-hour, hourly, daily, and weekly segments. Being able to narrow the time of physical activity down to a thirty-minute period could be extremely beneficial to an examiner or investigator.

For purposes of this blog post, I tried to stay as native to the app as possible by using a Wear device, so I was able to generate data for each of the time-period segments. See Figure 15.

Figure 15. metric_database.db (cleaned up after a SQL query).

The values in the metric column are not documented anywhere that I could find, but, based on testing, I have been able to determine the following:

1 = Steps
2 = Distance (meters)
4 = Calories Burned (rest + workouts)
11 = Move Minutes
12 = Heart Points

Another query will clean the table up.

SELECT
  datetime(metric_aggregations.start_time_ms/1000, 'unixepoch') AS "Period Start",
  datetime(metric_aggregations.end_time_ms/1000, 'unixepoch') AS "Period End",
  CASE
    WHEN metric_aggregations.metric=1 THEN 'Steps'
    WHEN metric_aggregations.metric=2 THEN 'Distance (meters)'
    WHEN metric_aggregations.metric=4 THEN 'Calories Burned (resting + workouts)'
    WHEN metric_aggregations.metric=11 THEN 'Move Minutes'
    WHEN metric_aggregations.metric=12 THEN 'Heart Points'
    ELSE metric_aggregations.metric
  END AS "Metric",
  metric_aggregations.value AS "Metric Value",
  datetime(metric_aggregations.last_update_time_ms/1000, 'unixepoch') AS "Metric Value Last Updated"
FROM
  metric_aggregations
ORDER BY "Period Start" ASC
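Since the same table mixes half-hour, hourly, daily, and weekly totals, it helps to label each row by the length of its period before reading the values. The sketch below uses the start_time_ms and end_time_ms columns from the query above; segment_granularity and half_hour_rows are hypothetical helpers. It assumes end times are exclusive (end minus start equals the exact period length) and that periods are fixed-length; a day or week spanning a daylight saving change in local time would fall into "other."

```python
# Exact period lengths in milliseconds. Assumption: end times are exclusive
# and periods are fixed-length (DST-affected days/weeks will not match).
SEGMENT_LENGTHS_MS = {
    30 * 60 * 1000: "half hour",
    60 * 60 * 1000: "hour",
    24 * 60 * 60 * 1000: "day",
    7 * 24 * 60 * 60 * 1000: "week",
}


def segment_granularity(start_time_ms: int, end_time_ms: int) -> str:
    """Label a metric_aggregations row by the length of its period."""
    duration = end_time_ms - start_time_ms
    return SEGMENT_LENGTHS_MS.get(duration, f"other ({duration} ms)")


def half_hour_rows(rows):
    """Keep (start_time_ms, end_time_ms, metric, value) rows covering 30 minutes."""
    return [row for row in rows if segment_granularity(row[0], row[1]) == "half hour"]
```

Filtering to the half-hour rows is what allows activity to be narrowed to a thirty-minute window; rows labeled "other" deserve a closer look rather than being discarded.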

During testing I did notice that values were populated for burned calories prior to my setting up the test devices. Examiners should evaluate values in this column, if pertinent, to make sure that the values are not a result of errant readings and that they make sense. Additionally, there were values I was not able to positively identify; specifically, 15, 18, and 21. If I am able to identify what these values represent later, I will update this blog accordingly.

The next two databases contain similar data. The first is session_database.db, and the table of interest is session_entries. The table contains the start times, end times, activity types, and metadata about sessions found in the Journal (Figure 4). See Figure 16.

Figure 16. sessions_database.db.

The column activity_type contains values that describe the activity.  These codes are the same ones as discussed earlier. For testing I only observed walking, running, and sleeping, which are 7, 8, and 72, respectively. Metadata about the sessions are stored in BLOBs as protobuf in the metadata column. Figure 17 shows decoded metadata from a running workout recorded using the TicWatch E3. I will use this particular workout (or “session”) for the next several figures.

Figure 17. Decoded protobuf.

The top and bottom parts of Figure 17 show the same data, but the bottom half is a bit more concise. The area in the red box shows how the activity was recorded. There are multiple sections to this entry, each separated by a colon. The first part is a unique identifier followed by an indication of how the activity was tracked (watch-activemode), the type of activity (running), and a Unix Epoch timestamp for the start of the session. In instances where I only used the phone to track my activity, watch-activemode was replaced by just activemode. The identifier and timestamp remained.

Just below in the blue box is a description of the session (Lunch Run). In the purple box is the duration of the session in milliseconds, and the app that contributed the data to Google Fit is in the green box, which, in this case, was itself. Just below the green box are the start and stop times of the session (not highlighted).

One thing I did notice during the testing is that the protobuf field tags were consistent from session to session.
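The colon-separated identifier in the red box of Figure 17 (unique identifier, tracking method, activity type, start timestamp) can be split programmatically. This is a minimal sketch with hypothetical names (SessionId, parse_session_id, epoch_to_utc). It splits from the right so that any colons inside the leading identifier survive, and because the post does not state whether the timestamp is in seconds or milliseconds, epoch_to_utc guesses from the magnitude of the value; that guess is an assumption to verify against the start time stored elsewhere in the record.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SessionId(NamedTuple):
    unique_id: str
    tracked_by: str   # e.g. "watch-activemode" or "activemode"
    activity: str     # e.g. "running"
    start_epoch: int  # raw value as stored


def parse_session_id(session_id: str) -> SessionId:
    """Split "<id>:<tracking method>:<activity>:<epoch>" into its parts."""
    # rsplit keeps any colons inside the leading identifier intact.
    parts = session_id.rsplit(":", 3)
    if len(parts) != 4 or not parts[3].isdigit():
        raise ValueError(f"unexpected session id format: {session_id!r}")
    unique_id, tracked_by, activity, epoch = parts
    return SessionId(unique_id, tracked_by, activity, int(epoch))


def epoch_to_utc(epoch: int) -> datetime:
    """Convert a Unix epoch value, guessing seconds vs milliseconds by size."""
    # Assumption: values above 1e11 are milliseconds (1e11 seconds is ~year 5138).
    seconds = epoch / 1000 if epoch > 100_000_000_000 else epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Grouping parsed identifiers by tracked_by is a quick way to separate watch-recorded sessions (watch-activemode) from phone-only ones (activemode).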

The file journal_database.db contains much of the same information with a little extra. It also has a single table: journal_entries. See Figure 18.

Figure 18. journal_database.db.

As can be seen, the columns from sessions_database.db are present, but the column id is also present, which contains identifiers for sessions, such as the one in the red box in Figure 17. There are two additional formats in this column, though: entries that contain two Unix Epoch timestamps separated by a hyphen (e.g., the red box in Figure 18), and entries that contain the string "header_" followed by a Unix Epoch timestamp (e.g., the blue box in Figure 18). The former indicate sessions that were automatically detected and recorded by the app…all without any intervention by me. Their two timestamps are the start and end of the session. The latter indicate the start of a new day (the "header" of the new day, if you will).
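Because the three id formats described above carry different meanings, a simple classifier can flag auto-detected sessions and day headers before the rows are reviewed. This is a sketch with a hypothetical function name; it assumes the timestamps are plain digit strings, as the examples in Figure 18 suggest, and treats anything that matches neither pattern as a regular session identifier.

```python
import re

AUTO_DETECTED = re.compile(r"^(\d+)-(\d+)$")  # "<start>-<end>"
DAY_HEADER = re.compile(r"^header_(\d+)$")     # "header_<timestamp>"


def classify_journal_id(entry_id: str) -> tuple[str, tuple[int, ...]]:
    """Label a journal_entries id and return any timestamps it embeds."""
    match = DAY_HEADER.match(entry_id)
    if match:
        return "day header", (int(match.group(1)),)
    match = AUTO_DETECTED.match(entry_id)
    if match:
        return "auto-detected session", (int(match.group(1)), int(match.group(2)))
    # Anything else is treated as a regular session identifier.
    return "session", ()
```

Auto-detected sessions are the ones a user may not know exist, so they are worth pulling out into their own list.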

The column journal_entry contains data about the entry, stored in protobuf. As before, I decoded it using protoc, and received the results seen in Figure 19.

Figure 19. More decoded protobuf.

Figure 19 represents the same session seen in Figure 18, but, as can be seen, there is slightly more data. The data in the green box is the same data seen in Figure 18. Below it, in the blue box, is the number of Heart Points I received for the run. The hex data in the red box is a double value. It represents the total distance of my run in meters. The converted value is 4985.70703125 meters, which is 3.097974722153871 miles. Figure 20 shows the run telemetry in the Google Fit interface and it looks like Google may be rounding up a bit.

Figure 20. Run telemetry in the Google Fit UI.

Just below the hex values is the value “24” seen in the purple box in Figure 19. That represents the “Move Minutes” for this particular session. I suspect value “23” just below it (not highlighted) represents the amount of active time for the session (23:51 from Figure 20), but I have not been able to definitively confirm it.
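The red-box conversion in Figure 19 can be reproduced with Python's struct module: the 64-bit pattern is packed back into eight bytes and reinterpreted as an IEEE 754 double. This sketch assumes the value is written the way protoc --decode_raw prints fixed64 fields, as a 0x-prefixed 64-bit integer; fixed64_hex_to_double is a hypothetical name. The hex string used below is simply the encoding of 4985.70703125, computed for illustration rather than copied from the figure.

```python
import struct

METERS_PER_MILE = 1609.344  # international mile


def fixed64_hex_to_double(hex_value: str) -> float:
    """Reinterpret a 64-bit value printed in hex (e.g. by protoc) as a double."""
    # int() accepts an optional "0x" prefix when the base is 16.
    bits = int(hex_value, 16)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


meters = fixed64_hex_to_double("0x40b379b500000000")
print(meters, meters / METERS_PER_MILE)
```

Reading the same bits as an integer or as a 32-bit float gives a meaningless number, which is why tool output for these fields should always be checked against the raw value.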

Locations

Across these two sources of workout data, readers may have noticed a lack of location data. Location data is stored on the device, but it is in other files, the first being SqliteKeyValueCache:SessionLocation.db. See Figure 21.

Figure 21. Location data. Maybe.

The database, as with the others, has a single table, cache_table. The column of interest is response_data, and it, too, contains protobuf BLOB data. The timestamps associated with the BLOB entries in the columns write_ms and access_ms do not reflect the actual time of the captured location data, so they cannot be relied upon. After saving the protobuf data out, I decoded it with protoc. See Figures 22 and 23.

Figure 22. Top of the protobuf BLOB.
Figure 23. Bottom of the protobuf BLOB.

Figures 22 and 23 are the location data for the running workout. Each part of the protobuf has the same field tags. In each part, field tags one (1) and two (2) contain the latitude (red box) and longitude (blue box) values, respectively, both stored in hexadecimal as double values. The green box (field tag 5) contains the Unix epoch timestamp for the lat/long values. I have not been able to determine what field tags 3 and 4 represent, but if I do, I will update the blog post accordingly.

While I only highlighted the first (Figure 22) and last (Figure 23) set of values, the BLOB contained a substantial number of lat/long values. If all of the lat/long values were converted, an examiner could map out the entire route of the workout, along with the timestamps for each mapped point.

Full disclosure: I stared at this protobuf data for quite a while and could not determine how the hex values translated into location data. I had been using tools that made assumptions about the hex values, and I went down several dead ends. I ended up getting some inspiration from one of Alexis Brignoni's instructional Python videos (thank you, sir!) and discovered the values were doubles, as were other metric values that had been giving me some trouble. Always remember to go back and verify your tool output, as it could be wrong.
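Once the lat/long pairs are known to be doubles, mapping a route is mostly bookkeeping. The sketch below turns (latitude, longitude, timestamp) triples, taken from field tags 1, 2, and 5 of each repeated entry, into a GeoJSON LineString that most mapping tools can open. It assumes the hex values are written as protoc --decode_raw prints fixed64 fields (a 0x-prefixed 64-bit integer) and keeps the field 5 timestamps raw because their unit is not confirmed here; route_to_geojson and hex_to_double are hypothetical names.

```python
import json
import struct


def hex_to_double(hex_value: str) -> float:
    """Reinterpret a 64-bit value printed in hex (e.g. by protoc) as a double."""
    return struct.unpack("<d", struct.pack("<Q", int(hex_value, 16)))[0]


def route_to_geojson(points) -> str:
    """Build a GeoJSON LineString from (lat_hex, lon_hex, timestamp) tuples.

    Each tuple holds field tags 1, 2 and 5 from one entry of the BLOB.
    """
    coordinates, timestamps = [], []
    for lat_hex, lon_hex, timestamp in points:
        latitude, longitude = hex_to_double(lat_hex), hex_to_double(lon_hex)
        coordinates.append([longitude, latitude])  # GeoJSON order is [lon, lat]
        timestamps.append(timestamp)                # kept raw; unit unconfirmed
    route = {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coordinates},
        "properties": {"timestamps": timestamps},
    }
    return json.dumps({"type": "FeatureCollection", "features": [route]})
```

GeoJSON puts longitude first, the opposite of the lat/long order in the BLOB; swapping them by mistake typically lands the route in the wrong hemisphere, which is an easy sanity check.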

A few notes about this location data. Location data is captured in one of two instances: during an active, user-initiated workout that requires it (e.g., an outdoor run, outdoor walk, biking, etc.) or when Google Fit detects what it *thinks* is a workout. The second instance is tricky because, based on my testing, there is a caveat with it. I was not able to get the TicWatch E3 to capture location data during a detected (non-user-initiated) workout. For example, I walked to my mailbox (about half a mile from my home) while wearing the TicWatch E3 and left the phone at the house. A workout was detected and recorded (without my knowledge), but there was no location data associated with it. This may be a symptom of the watch and not of Google Fit, but to test that I would need more watches, and that's just not happening.

The phone, however, did capture location data for detected workouts when I enabled the “Use your location” setting. See Figure 24.

Figure 24. A non-user-initiated, auto-detected walk…with location data.

In Figure 24, I had taken the TicWatch E3 off, slipped the phone into my back pocket, and walked around for a while. I did not, however, walk for two (2) hours. It seems that Google Fit kept capturing data after I returned home and was walking around the house. Beyond the testing, I had instances when Google Fit logged a walk (with location data) when I walked from my parked car to the front door of a store (approximately 50 yards), and while I was merely walking around in my home. I did not know this data was logged until I looked in the Google Fit Journal. The point is that a user may have activity and location data logged without their knowledge, and examiners should look at this database (and the others described below), as it could be extremely helpful. This is, of course, only when Google Fit is allowed to use the phone's location. I repeated similar tests when Google Fit was not permitted to use my location; the activities were logged, but no location data was associated with any of the sessions.

Also note in Figure 24 that the plotted locations do not appear to be completely accurate; I was walking on the roads seen on the map.

The second file where location data is stored is SqliteKeyValueCache:MinimapLocations.db. This database is structured just like the previous one; it has a single table, cache_table. See Figure 25.

Figure 25. More cache_table entries.

As with the previous database, the timestamps in the columns write_ms and access_ms are not accurate when it comes to the location data contained in the table. The column request_data contains the identifier from the journal_database.db (Figure 18 – table journal_entries, column id). The location data is stored in protobuf BLOBs in the column response_data. See Figures 26 and 27.

Figure 26. Top of the protobuf BLOB.
Figure 27. Bottom of the protobuf BLOB.

The BLOBs contain many of the same lat/long and timestamp values as the BLOBs from SqliteKeyValueCache:SessionLocation.db, but the data is less granular (i.e., there are fewer lat/long values). The field tags (1, 2, and 5) are the same, too. Even with fewer values, an examiner could still get a good idea of the location of the user at a given point in time.

The third place an examiner may find location data is in the database location_database.db. Notice I said *may find*, because I have only had one (1) instance in which this database held any data, and that was from my public Android 12 image. I suspect this database may be used as temporary storage, but without further testing I cannot say for certain.

A fourth location an examiner will find location data is the file PreviousLocationPoint.pb. As the file extension suggests, this file is protobuf and does contain one set of lat/long values that are, again, double values stored in hexadecimal. I have not been able to determine what the lat/long values in this file represent, but will update the blog post if I do.

Snoozin’

Google Fit is not able to track sleep natively. A user can set a sleep schedule, but that is about it; a user cannot even manually add a sleep session to the Journal. So, a third-party app is needed to track sleep. For testing I used Sleep As Android (v. 20220118) and loaded it onto the TicWatch E3. Loading Sleep As Android onto the watch allowed me to capture heart rate and sleep stage information. Heart rate is fairly simple, but I will discuss sleep stages shortly. For now, see Figures 28 and 29 for how sleep sessions look to a user.

Figure 28. Google Fit sleep information.
Figure 29. Google Fit sleep information (part 2). Note the sleep stages in the middle of the screen.

Google Fit stores sleep data in the file SqliteKeyValueCache:SleepSegmentsCache.db. As with the other similarly-named database files, this also has a single table, cache_table, that keeps its data in protobuf BLOBs in the response_data column. See Figure 30.

Figure 30. Sleep data.

Also as before, the timestamps in the write_ms and access_ms columns should not be used. Figures 31 and 32 show the beginning and end of the decoded protobuf data for the sleep session seen in Figures 28 and 29.

Figure 31. Top of the decoded protobuf.
Figure 32. Bottom of the decoded protobuf.

The contents of the protobuf document when a user is in particular sleep stages. Google Fit tracks the following stages of sleep:

Awake = 1
Light = 4
Deep = 5
REM = 6

There are other values for sleep, too, which can be found in the developer documentation. The ability to track sleep stages, obviously, is dependent on the app used and the associated tracker (if one was used). An examiner can see when a user enters (red boxes in Figures 31 and 32) and exits (blue boxes in Figures 31 and 32) the different sleep stages (green boxes in Figures 31 and 32) throughout a sleep session. As always, an examiner will need to assess the accuracy of this data.
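With each stage's enter and exit times in hand, totaling the time spent in each stage is a quick way to compare the decoded data with what Google Fit shows the user in Figure 29. The sketch assumes the segments have already been pulled out of the decoded protobuf as (start, end, stage) tuples with millisecond timestamps; the unit is an assumption, and minutes_per_stage is a hypothetical name. Stage codes outside the four listed above are reported by number rather than guessed at.

```python
from collections import defaultdict

# Stage codes listed above; any other code is reported by number.
SLEEP_STAGES = {1: "Awake", 4: "Light", 5: "Deep", 6: "REM"}


def minutes_per_stage(segments) -> dict[str, float]:
    """Total minutes per sleep stage from (start_ms, end_ms, stage) tuples."""
    totals_ms = defaultdict(int)
    for start_ms, end_ms, stage in segments:
        if end_ms < start_ms:
            raise ValueError(f"segment ends before it starts: {start_ms}-{end_ms}")
        totals_ms[SLEEP_STAGES.get(stage, f"stage {stage}")] += end_ms - start_ms
    # Convert only at the end so per-segment rounding does not accumulate.
    return {label: ms / 60_000 for label, ms in totals_ms.items()}
```

If the totals disagree noticeably with the app's display, the timestamp unit assumption is the first thing to revisit.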

As previously mentioned, a user can set a sleep schedule in Google Fit, and examiners can find the schedule in the file SleepSchedule.pb. See Figure 33.

Figure 33. My test sleep schedule, 22:30 – 06:30.

Heart Rate

With workout, location, and sleep data being scattered across multiple databases, it should be no surprise that heart rate data is stored in yet another database. The file SqliteKeyValueCache:SmoothedMetricSampleSummariesCache.db contains heart rate data for the various activities recorded by Google Fit. As with the other similarly named databases, this one has a single table of interest: cache_table. See Figure 34.

Figure 34. Heart rate data in BLOBs.

As with the other databases, the timestamps in the write_ms and access_ms columns cannot be relied upon. Also as before, the column response_data contains the protobuf with the heart rate data. See Figure 35.

Figure 35. Heart rate data from a sleep session.

Figure 35 shows heart rate data captured during a sleep session. The protobuf is broken up into sections, with each section and respective field tags containing the same type of data. In the red box there are two timestamps which represent the start (1) and end (2) time being measured (a segment). The blue box represents the lowest heart rate measurement during the segment, and the yellow box represents the highest heart rate measurement of the segment. Both of the heart rate values are stored as doubles. The green box represents the app that contributed the heart rate data. The data in the purple box is interesting. I initially thought it would be the average heart rate during the segment, but it was not. It appears to be a heart rate value at a particular point in the segment. I say “particular” because the timestamp is usually the beginning of the segment, but there are occasionally values that correspond to the end of the segment.
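Because each segment stores its own lowest and highest readings, the extremes for a whole session, and the segment in which each occurred, can be found without any averaging. The sketch below assumes the segments have already been decoded into (start, end, lowest, highest) tuples with millisecond timestamps (an assumption) and the heart rate doubles converted to floats; heart_rate_extremes is a hypothetical name, and it deliberately ignores the purple-box value, whose meaning is not settled.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> str:
    """Format a millisecond Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


def heart_rate_extremes(segments):
    """Find the lowest and highest readings across decoded segments.

    segments: (start_ms, end_ms, lowest_bpm, highest_bpm) tuples.
    """
    if not segments:
        raise ValueError("no segments")
    low_seg = min(segments, key=lambda s: s[2])
    high_seg = max(segments, key=lambda s: s[3])
    return {
        "lowest_bpm": low_seg[2],
        "lowest_segment": (ms_to_utc(low_seg[0]), ms_to_utc(low_seg[1])),
        "highest_bpm": high_seg[3],
        "highest_segment": (ms_to_utc(high_seg[0]), ms_to_utc(high_seg[1])),
    }
```

An extreme that sits far outside the neighboring segments is a candidate for a sensor error rather than a real event, and should be weighed accordingly.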

Figure 36 shows another entry from SqliteKeyValueCache:SmoothedMetricSampleSummariesCache.db that contains heart rate data from a workout session (running).

Figure 36. Heart rate data from a workout (running) session.

The Cool Down

Google Fit is a microcosm of how data is stored in Google; it is scattered all over the place. For examiners, the trick is knowing where to look, and the data they find will depend on the hardware used to track the activity. Examiners should also remember that Google Fit will log a user's activity on the sneak if it's allowed to, which could be extremely beneficial. Even if examiners find little or no activity data, legal process to Google could reveal more activity data than is present on the phone, which is never a bad thing.

I do have another part of this series planned, but I am waiting on the hardware to arrive. Hopefully, it won't be another year before the next installment in this series. Stay tuned.

Article Link: The State of Android Health Data (Part 2) – Google Fit – The Binary Hick