Dataclysm: Our Data, Ourselves


I just finished reading Dataclysm: Who We Are When We Think No One’s Looking. It was written by Christian Rudder, one of the founders of the dating site OkCupid, so you figure he knows a thing or two about the subject.

If you’re unfamiliar with the book, Rudder draws on data pulled from OkCupid, Twitter, Facebook, and other sources to explore various aspects of ourselves. He examines how perceived attractiveness affects one’s dating prospects, racial biases when seeking romantic partners, the mechanics of online outrages, and the various ways people describe themselves, held up against what their actions betray as their true selves.

People are complex. One doesn’t need to read a book to know that. Lived human experience proves it every day. But there are a few implications from this book that I’d like to talk about, as I find them particularly fascinating.

The first is our tendency to trust impersonal devices with data that we’d blush at relating to a fellow human being. On paper, we have a well-documented history of not trusting technology. We fret that robots will replace us in our jobs, that technology will make our quality of life worse, that it may turn on us, or that it will prove far more fallible than we’re told. But at the same time, we will answer deeply personal questions on dating sites and post our darkest thoughts on blogs and social media. Via services owned by billion-dollar companies, we talk, we fight, we fall in love, we break up, we discuss controversy, we harass, we threaten, we laugh, we cry, all the while not thinking too much about the medium we’re using–and who owns and controls it.

There’s no use scaremongering over the issue. That’s not what interests me. Someone else owns almost everything you use to live your digital life. You may own your computer, and possibly your smartphone, but some company owns your connection and logs everything you do with it. Every service you use–Google, Facebook, Twitter, Bing, OkCupid, you name it–is tracking and recording and logging and analyzing. Of course, they are unlikely to use this information against you. They want to make money, not pointlessly persecute random people on the Internet. But what they bank on is that trust: that if you are placed in front of a screen that asks, “What’s on your mind?” you will gladly tell it.

This is not a new phenomenon, for what it’s worth. The ELIZA program, first cooked up in the ’60s, may be the first computer program that users felt a human connection with. Although conceived as a parody of how psychotherapists conduct sessions, it resonated with the people who used it. Users sometimes told it their best-kept secrets and their worst fears. They discussed embarrassing anxieties and other issues they would’ve been extremely reluctant to share with another person–even a real therapist. But the computer–the computer didn’t judge. It offered simple prompts and canned reactions. Users thought they were teasing out a conversation from the machine, but in fact ELIZA did nothing more than mangle its input back into questions and statements. Why are we so quick to trust the machine?
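
To give a sense of how little machinery that kind of "mangling" takes, here is a minimal sketch in Python. It is not ELIZA's actual script. The patterns, the canned replies, and the function names `reflect` and `respond` are all invented for illustration, on the assumption that the core trick is matching a phrase, swapping the pronouns, and handing it back as a question.

```python
import re

# Swap first- and second-person words so a fragment can be echoed back.
# This word list is an illustrative assumption, not ELIZA's real table.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# (pattern, reply template) pairs, tried in order. Invented for this sketch.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]


def reflect(fragment: str) -> str:
    """Flip pronouns word by word, e.g. 'my job' becomes 'your job'."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(text: str) -> str:
    """Return a canned reply built from the user's own words."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # Nothing matched: fall back to a generic prompt.
    return "Please go on."


if __name__ == "__main__":
    print(respond("I feel nobody listens to me."))
    print(respond("I am worried about my job"))
```

There is no understanding anywhere in those few lines, only string substitution, which is the point: the sense of being listened to comes from the person typing.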

I am not, in fact, suggesting that we shouldn’t, only that it’s fascinating that we will admit to a computer prompt what we might not admit to our closest friends and family members. Google knows when you think you might have a sexually transmitted disease before any of your sexual partners do. With a bit of work, one can tell by analyzing your public social media records whether you’re gay, or atheist, or cheating on your partner. Some of this is due to what Dataclysm calls “breadcrumbs”–implicit information left behind by actions that you don’t intend to offer meaning to others, but which do nonetheless. As an example, your list of Facebook friends, by itself, can reveal a startling amount of information about you, even if little else in your profile is publicly available. We may not think that the act of adding someone as a friend helps tell a computer system what our politics, religious belief, and sexual orientation are, but it does. Every data point helps the systems that crunch all this information learn more and more about you.

It is certainly troubling to think about, in some ways. The NSA and other intelligence agencies collect much of this information, and a lot more. Even cell phone metadata–frequently declared to be “harmless” by the government agencies that collect it–reveals a great deal about you. It can be determined without much difficulty what sorts of people you associate with, what your politics might be, what your movements consist of, what your economic status is, what race you are, whether you are romantically involved with anyone, whether you are employed and what industry you work in, when you might be having financial or legal problems, when you might be ill and what you are suffering from, and on and on and on. All of this can be gathered by taking what phone numbers your phone number has connected with, then looking up the owners and purposes of each number.
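
The lookup described above is simple enough to sketch in a few lines of Python. Everything here is made up for illustration: the phone numbers, the `DIRECTORY` of owners, the `CALL_LOG`, and the `profile` function are hypothetical, and real analysis would be far more elaborate. The sketch only shows that who-called-whom, with no call content at all, is enough to start drawing conclusions.

```python
from collections import Counter

# Hypothetical reverse directory: phone number -> what kind of place owns it.
DIRECTORY = {
    "555-0101": "oncology clinic",
    "555-0102": "bankruptcy attorney",
    "555-0103": "pizza delivery",
}

# Hypothetical metadata: (caller, callee) pairs only, no audio, no text.
CALL_LOG = [
    ("555-0100", "555-0101"),
    ("555-0100", "555-0101"),
    ("555-0100", "555-0102"),
    ("555-0100", "555-0101"),
    ("555-0100", "555-0103"),
    ("555-0199", "555-0103"),
]


def profile(number, call_log, directory):
    """Count how often `number` called each category of owner."""
    categories = Counter()
    for caller, callee in call_log:
        if caller == number:
            # Numbers missing from the directory are tallied as "unknown".
            categories[directory.get(callee, "unknown")] += 1
    return categories


if __name__ == "__main__":
    for category, calls in profile("555-0100", CALL_LOG, DIRECTORY).most_common():
        print(f"{category}: {calls}")
```

Three calls to an oncology clinic and one to a bankruptcy attorney say a great deal about someone's month, and nobody had to listen to a word of it.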

The answer is not to forbid collecting data. That ship has sailed. Governments, certainly, should be constrained from collecting (or at least accessing) data as they please. But thus far there has been little accountability in terms of how companies manage data. Even massive breaches in which millions of credit card numbers and bank account details are compromised have failed to result in any meaningful policy action. It’s as if we are content to live with this as a new status quo–that this age in which data ebbs and flows between dozens or hundreds of computers and companies and networks every day simply prohibits any kind of effective management or protection.

Asking people not to share aspects of their lives online is clearly a non-starter, too. Our Twitters and Instagrams and Facebooks have become extensions of ourselves. They have replaced handwritten letters, phone calls, and to some extent, face-to-face encounters. We can keep up with friends and family members who live hundreds or thousands of miles away. All of this convenience and communication has a cost that, I think, we have yet to reckon with. Dataclysm touched only on a very small amount of that data, but was able to find stark racial biases in OkCupid’s usage patterns, and significant cultural differences between various kinds of users, all by aggregating and analyzing data that users have submitted freely.

I am simultaneously fascinated and concerned by what we might learn in the years to come, about ourselves and each other–the things we communicate that we never even meant to.

Photo by r2hox