More thoughts on Google’s tracking abilities

It all comes down to the cookie.

The Wall Street Journal recently began a series of articles called What They Know, detailing the different pieces of data that online marketing companies have about people as they traverse the web. None of this is really new, especially not to me, since I work in that industry. But I was surprised at some of the data that was present in the cookies right in plaintext:

Now, I don’t know if the above image of a cookie was presented as it was because the reporters didn’t realize that all that was needed to “decode” that cookie was three runs through PHP’s urldecode(), which would convert those %25255Es from percent-encoded hex codes to plain old ASCII: %25255E0 -> %255E0 -> %5E0 -> ^0 (%5E is the caret). Maybe they didn’t know, or maybe they knew but left it all computery so it looked “scarier” to readers… that green text on black background is usually reserved for movies like The Matrix. Anyway, like I said, what surprised me wasn’t that so much data was being collected, but that the data was right there in the body of the cookie, readable by anyone. Even a simple base64_encode would have hidden the contents of the cookie from the casual snooper.
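To make the decoding concrete, here is a minimal sketch in Python, using urllib.parse.unquote as a stand-in for PHP’s urldecode(). The helper name fully_unquote and the max_rounds cap are my own inventions for illustration. The second half shows why base64 only stops the casual snooper: it is an encoding, not encryption, and reversing it is one function call.

```python
import base64
from urllib.parse import unquote


def fully_unquote(value, max_rounds=10):
    """Percent-decode repeatedly until the string stops changing.

    Returns every intermediate form, starting with the input itself.
    max_rounds is an arbitrary safety cap (an assumption, not a standard).
    """
    steps = [value]
    for _ in range(max_rounds):
        decoded = unquote(steps[-1])
        if decoded == steps[-1]:
            break  # nothing left to decode
        steps.append(decoded)
    return steps


# The triple-encoded caret from the cookie value discussed above.
steps = fully_unquote("%25255E0")
print(" -> ".join(steps))

# Base64 hides the text from a quick glance, but anyone can reverse it.
hidden = base64.b64encode("^0".encode("ascii")).decode("ascii")
revealed = base64.b64decode(hidden).decode("ascii")
print(hidden, "->", revealed)
```

Each pass strips one layer of percent-encoding, which is why a value that was encoded three times needs three passes.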

For a while I’ve been thinking about Google’s vast troves of data that go far, far beyond what the average marketer knows about the average web user. Let’s assume you’re… me. You use Gmail, Google, and YouTube on a pretty frequent basis. Google has single sign-on — as it should — so to use any of these services you can (and in many cases, have to) be logged in with your Google Account. This is logical and convenient for the user, but it unlocks huge amounts of information about you to Google. By having you sign in to any of their services, Google’s ability to track you online transcends cookies.

Cookies are small bits of data set by the server in your browser to allow information to persist between requests and sessions. Since they’re stored in the browser, cookies set in one browser can’t be used in another. This means that if you start Firefox and click around the internet for a while, you’ll accumulate some cookies. If you then exit Firefox, start Safari, and click around to those same sites, you’ll get completely different cookies than those you got in Firefox. From a “tracking” perspective, the person using Firefox and the person using Safari are different people (even though they both happen to be you) [1]. Also, because cookies are tied to browsers, cookies set on one computer are bound to that browser on that computer; i.e., cookies in Firefox on computer A have no bearing on what happens in Firefox (or any other browser) on computer B.
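Here is a toy model of that in Python. It is only a sketch: the Tracker class, the visitor_id cookie name, and the example page names are all made up for illustration, and each browser’s cookie store is modeled as a plain dict. The point is just that a tracker relying on cookies alone sees one person in two browsers as two unrelated visitors.

```python
import uuid


class Tracker:
    """Toy ad server that recognizes visitors only by a cookie ID."""

    def __init__(self):
        self.visits = {}  # cookie ID -> list of pages seen

    def handle_request(self, cookie_jar, page):
        # Read the ID from this browser's cookie jar, or mint a new one.
        visitor_id = cookie_jar.get("visitor_id")
        if visitor_id is None:
            visitor_id = uuid.uuid4().hex
            cookie_jar["visitor_id"] = visitor_id  # the "Set-Cookie" step
        self.visits.setdefault(visitor_id, []).append(page)
        return visitor_id


tracker = Tracker()
firefox_jar = {}  # each browser keeps its own, separate cookie store
safari_jar = {}

ff_first = tracker.handle_request(firefox_jar, "news.example/a")
ff_second = tracker.handle_request(firefox_jar, "blog.example/b")
safari_id = tracker.handle_request(safari_jar, "news.example/a")

print(ff_first == ff_second)   # same browser, same cookie
print(ff_first == safari_id)   # different browser, different cookie
print(len(tracker.visits))     # how many "people" the tracker thinks it saw
```

Firefox is recognized on its second request, but Safari gets a fresh ID, so the tracker ends up with two separate histories for one human.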

Single sign-on knocks down these implicit privacy walls. Assume, again, that you’re me, and you have a Linux laptop at work. At home you have a Linux desktop, a Mac mini hooked up to the TV in the living room, and a Windows laptop. You also have an iPhone. Single sign-on enables Google to track what you’re doing across all of these devices. It’s really quite simple: on each machine you use, if you want to read your email (Gmail) you log in with your Google Account. At that point, Google knows that it’s you using the browser. The value inside the cookie they set in your particular browser may differ, but they know that you’re you.

They know what you’re searching for in Google, where you go (by IP address or, if you allow it, by GPS on most modern smartphones; Google’s Latitude service lets you relay your GPS coordinates to your friends), what kind of email you receive, and who you correspond with. And let’s not forget that Google has plastered the internet with ads. Over 90% of their revenue comes from advertising, and they bought DoubleClick a few years ago, so any time you go to a site with Google ads on it (which is pretty much all of them), they know it. They own YouTube, so they know every video you’ve watched on YouTube, which ones you’ve “Liked” and which ones you’ve “Favorited.” And, as I mentioned in my previous crazy-guy post, Google is amassing a huge facial-recognition database, so they’ll know everything about you: interests, income, travel habits, friends, what you look like, likes & dislikes.

They can probably make a pretty good guess as to where your home is and where your office is just by seeing that between 9:00 AM and 6:00 PM you commonly access the internet from IP 1.2.3.4 and the rest of the time you usually come from IP 2.3.4.5, and simple IP-geo databases can tell them where those IPs are (admittedly, with widely varying accuracy).
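The log-in linking described above can be sketched in a few lines of Python. To be clear, this is a toy under my own assumptions, not a description of how Google actually does it: the LinkingTracker class, the cookie IDs, and the page strings are all hypothetical. It only shows the principle that one sign-in per browser is enough to merge otherwise unrelated cookie histories.

```python
from collections import defaultdict


class LinkingTracker:
    """Toy tracker that ties per-browser cookie IDs to a signed-in account."""

    def __init__(self):
        self.pages_by_cookie = defaultdict(list)  # cookie ID -> pages seen
        self.account_by_cookie = {}               # cookie ID -> account name

    def visit(self, cookie_id, page):
        self.pages_by_cookie[cookie_id].append(page)

    def login(self, cookie_id, account):
        # One sign-in is enough to attach this browser's cookie to the account.
        self.account_by_cookie[cookie_id] = account

    def profile(self, account):
        """Merge the history of every cookie ever linked to this account."""
        pages = []
        for cookie_id, owner in self.account_by_cookie.items():
            if owner == account:
                pages.extend(self.pages_by_cookie[cookie_id])
        return pages


tracker = LinkingTracker()

# Three browsers on three machines, each with its own cookie ID.
tracker.visit("work-laptop-firefox", "search: linux kernel bug")
tracker.visit("home-mac-safari", "video: cat compilation")
tracker.visit("library-pc-chrome", "news: local elections")

# The same person signs in on two of them.
tracker.login("work-laptop-firefox", "me@example.com")
tracker.login("home-mac-safari", "me@example.com")

merged = tracker.profile("me@example.com")
print(merged)
```

Note that the merged profile includes pages visited before the sign-in happened: once a cookie is linked to an account, everything already recorded against that cookie comes along with it. The browser that never signed in stays anonymous.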

The trove of information they have on the average person is actually frightening. The only thing keeping them from completely exploiting this data (assuming they aren’t, for argument’s sake) is their “Don’t Be Evil” philosophy and the shitstorm of bad press (and, one would assume, legal action) that would ensue if they were to do so. I’m not really convinced they aren’t already using all of this data, probably to make ultra-targeted advertising decisions, which seems relatively benign on its face. But the real risk comes when this all falls into someone else’s hands. Google could get hax0red; it’s already happened. Google could get subpoenaed; I’m sure it’s happened hundreds of times already. A new batch of idiots in the Senate could just redefine terrorism and require that all of Google’s data be handed over daily.

This isn’t strictly a problem with Google, but there aren’t many companies I can think of that have massive ad platforms that also provide services you’re willing to log in to, and the logging in is what allows them to track you across browsers, across computers, across devices, and ultimately in real life.

Oh well. Whatever. I’m a big hypocrite because I can’t imagine not using Gmail or any of Google’s services that I use daily. Sucks to be me, I guess. Even if you “trust” Google, you may not trust what Google becomes 10 years from now, but by then they’ll already know all about you.

[1] This isn’t completely accurate, because even without cookies there are pieces of data that will be the same regardless of your browser, for example your IP address, which in general is a pretty good proxy for uniqueness, but I’m just thinking about cookies for now.
