GDPR makes it easier to get your data, but that doesn’t mean you’ll understand it

If the numerous tech scandals of recent years have taught us
anything, it’s that tech companies hold a truly terrifying
amount of data about us all. Along with feeling invasive, this
data can be outright dangerous when it falls into the wrong
hands.

Europe’s response to that risk, put in place as part of the
General Data Protection Regulation (GDPR), is the “Right of
Access.” The right says that, when requested, any company
should be prepared to provide you with your personal data. They
should provide it in a way that’s easy for you to read, in a
timely manner, and with enough background information for you
to understand how they got it and how they use it. The thinking
is that once you know what data a company holds about you, you
can make informed decisions about whether you want to provide
it, and hold the company accountable when it gathers data
without your consent.

Apple, Amazon, Facebook, and Google’s data downloaded and
examined

The problem is that companies can often be really stingy about
actually providing this data. After all, if your service is
essentially “forcing consent” (as Google was recently fined €50
million for doing), then you might not want your users to
easily see how much personal data you’re collecting.

I decided to test the “Right of Access” offered by four of the
biggest tech companies operating in the EU: Apple, Amazon,
Facebook, and Google. What I found suggested that while you can
certainly get the raw data, actually understanding it is
another matter, which makes it harder to make informed
decisions about your data.

According to the UK data protection regulator, the ICO,
companies must provide all personal data (defined as any data
that relates to an identified or identifiable individual) on
request. The
information must be provided to the individual in a “concise,
transparent, intelligible and easily accessible form, using
clear and plain language” in a “commonly used electronic
format.” It sounds simple enough, but how did each of the four
tech giants do?

It was easy to download my data in the first place. Both
Google’s and Apple’s data download services let you pick and
choose what data you want to download. Facebook’s doesn’t, but
all three services are easy to find on their respective
websites, and the data arrives quickly. Amazon, meanwhile,
doesn’t present the download as an easy-to-find option on its
site. Getting a single link with all of your Amazon data relies
on you digging through the site’s “Contact Us” page to find the
option hidden at the end of the list. Once
I requested it, it took the full 30 days to receive a link to
download my data (the limit imposed by the regulation).

When it actually came time to look at the data I’d received,
however, things got messy. Some files were ambiguously labeled,
while others were stored in formats that tested the limits of
what constitutes “commonly used.” Actually working out what
data I was looking at wasn’t nearly as simple as it should be.

Google’s location-tracking data was particularly hard to
understand. The company has been repeatedly criticized for
tracking Android users, even when they’ve turned off the main
location-tracking option in the operating system. Consumer
groups across seven European countries have lodged complaints
with their data security watchdogs about it, and downloading
your data
using GDPR should be a way of checking that a service isn’t
using tricks like these to gather any more data than it should
be. It should be a means of holding companies like Google to
account.



But when you actually look at the data, this information is
very difficult to view and understand. All of my location data
from Google was contained within a single 61MB JSON file, and
opening it with Chrome revealed a bewildering array of fields
labeled “timestampMs,” “latitudeE7,” “longitudeE7,” and
estimations about whether I was sitting still or in some kind
of transport (I assume).

I don’t doubt that this is all the location history information
that Google has associated with my account, but without
context, this data is meaningless. It’s a series of numbers
that I’d have to make a serious effort to even begin to
understand and import into another piece of software to
properly parse. If the purpose of GDPR is to allow people to
have more control and understanding of what data is collected
from them, then this part of Google’s download has little to
offer. JSON files are great if you want to ingest the data into
another system, but they’re less helpful if you want to
evaluate how much data Google has on you and make informed data
privacy decisions.
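
To give a sense of the effort involved, here is a minimal Python
sketch of how those fields might be turned into something
readable. It rests on assumptions rather than anything Google
documents in the download: that the records sit under a top-level
"locations" key, that "timestampMs" is milliseconds since the Unix
epoch, and that the "E7" fields are degrees multiplied by ten
million. The function names and the sample record are made up for
illustration.

```python
import json
from datetime import datetime, timezone


def parse_location(record):
    """Convert one raw record into readable values."""
    return {
        # Assumption: timestampMs is milliseconds since the Unix epoch.
        "time": datetime.fromtimestamp(
            int(record["timestampMs"]) / 1000, tz=timezone.utc
        ),
        # Assumption: E7 fields are degrees multiplied by 10**7.
        "latitude": record["latitudeE7"] / 1e7,
        "longitude": record["longitudeE7"] / 1e7,
    }


def load_locations(path):
    """Read the export file and return a list of readable records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: records live under a top-level "locations" key.
    return [parse_location(r) for r in data.get("locations", [])]


# A made-up sample record (not real data) in the assumed shape.
sample = {
    "timestampMs": "1546300800000",
    "latitudeE7": 515074000,
    "longitudeE7": -1278000,
}
point = parse_location(sample)
print(point["time"].isoformat(), point["latitude"], point["longitude"])
```

Even if those guesses are right, this only gets you a list of
times and coordinates. You would still need to plot them on a map
to learn anything from them.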

When it came to other files, it wasn’t even clear what data I
was looking at in the first place. A 4GB HTML file called “My
Activity” located within the “Ads” folder is presumably showing
me something relating to the ad-tracking data that Google has
gathered on me, but there are no annotations or metadata here
to explain it.

These are, by far, the most confusing files out of the entire
data download, and they’re also the most important. They
contain the kinds of personal information that potential
advertisers would kill for, and Google should make more of an
effort to explain what they are. It already provides an Index
HTML file to give you an overview of your data, so why not
include information in there about the contents of each file?
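
In the absence of that, the best a curious user can do is build a
rough inventory themselves. The Python sketch below walks a
download folder and lists every file by size, largest first. It
says nothing about what each file means, which is exactly the gap
an annotated index would fill. The folder and file names in the
demo are invented for illustration.

```python
import tempfile
from pathlib import Path


def inventory(root):
    """Return (relative path, extension, size in bytes), largest first."""
    root = Path(root)
    entries = [
        (p.relative_to(root).as_posix(), p.suffix.lower(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


# Demo on a throwaway folder with invented file names and sizes.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Ads").mkdir()
    (Path(tmp) / "Ads" / "MyActivity.html").write_text("x" * 500)
    (Path(tmp) / "index.html").write_text("x" * 100)
    listing = inventory(tmp)

for rel_path, ext, size in listing:
    print(f"{size:>8} bytes  {ext}  {rel_path}")
```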

Apple fared better than Google in the way it presented its
data, although there were still problems. First impressions
were very positive. The majority of the data Apple
provided was in file types that were easy to read and
understand like CSV, TXT, and JPG, with only a couple of JSON
files to confuse things.

But once you get into these files, there’s still a lot of
information that’s difficult to understand. A file titled
“Apple ID Account Information” appeared to contain 11 nearly
identical records about my Apple account, all created on
exactly the same date in 2014, with no explanation as to what
they were. Another CSV file with the ambiguous title of “Apps
and Service Analytics” appears to contain an entire list of
every single one of my App Store searches, but it has so many
empty cells that I only noticed it had data in it when I saw
its 6.7MB file size.
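
One way to check whether a sparse spreadsheet like this holds
anything at all is to count the non-empty cells in each column.
The Python sketch below does that for any CSV with a header row.
The column names and rows in the sample are invented for
illustration and are not taken from Apple’s file.

```python
import csv
import io


def column_fill_counts(rows):
    """Count non-empty cells per column for rows from csv.DictReader."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; ignore them.
                continue
            counts.setdefault(column, 0)
            if isinstance(value, str) and value.strip():
                counts[column] += 1
    return counts


# Invented sample data with mostly empty cells.
sample = io.StringIO(
    "search_term,device,timestamp\n"
    "weather app,,\n"
    ",,\n"
    "podcasts,iPhone,\n"
)
counts = column_fill_counts(csv.DictReader(sample))
print(counts)
```

For a real file, you would pass `csv.DictReader(open(path,
newline="", encoding="utf-8"))` instead of the in-memory sample.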

The creepiness of being able to listen to all my Alexa requests
notwithstanding, Amazon did far better with how it presented
its data, although this may just have been because of how
comparatively little it holds about me. For the most part,
files and folders were clearly labeled, although the company
still has some work to do on labeling the contents of its
spreadsheets better.

Ironically enough, Facebook actually had the most
comprehensible data of the four services. For starters, every
single file Facebook gives you is an HTML file. Each is sorted
into its own clearly labeled folder, and an index file gives
you an overview of what each document contains. The files
themselves are clearly laid out and formatted, and browsing
them feels almost like browsing a page on Facebook itself,
albeit one that’s stored entirely locally on your computer.



It’s still terrifying to see the amount of data Facebook has
stored on you (and that’s not even getting into the instances
of people having found records of all their old calls and SMS
messages), but at least you’re well-informed about what exactly
this information is, rather than having to guess based on the
contents of each file.

At the end of my experiment, I’m left with roughly 74GB of
data across the four services I contacted. I had 1.1GB from
Facebook, 392MB from Amazon, and 254MB from Apple. Although
Google had a massive 72.5GB of data for me to download, this
overwhelmingly consisted of my Google Drive and Google Photos
backups, which came in at 44.3 and 25.7GB, respectively. The
rest of my Google data came in at just 2.5GB.

After attempting to sift through and understand it all, it’s
clear that these companies, and the GDPR rules that
govern them, have a long way to go if they want to give us real
control over our data. Being able to download it is one thing,
but making it useful means working harder to ensure that what’s
downloaded is easier for the average person to understand.

At a minimum, that means providing a better index to tell you
what data is contained in what file, but it also means
organizing the contents of those files in a way that allows
them to make better sense by themselves.
