Guest Post by David Meyerson, @dbmeyerson, a Software Engineer at Microsoft and co-teacher of computer science in Boston Public Schools.
Personal data from as many as 87 million Facebook users, most of them in the U.S., was used without those users’ consent to help political consulting firm Cambridge Analytica campaign for President Donald Trump. Cambridge Analytica obtained the user data in 2015, according to contemporaneous reporting from The Guardian. But only more recent reporting from The New York Times has revealed to the public the massive scope of the data shared, how it was collected, and details of how it was used to support Trump’s presidential campaign. Those revelations have caused public outcry aimed at Facebook, led to Mark Zuckerberg testifying before the Senate, and encouraged the public to take a critical look at what happens to our data on social media, how much damage access to that data can cause, and what power we have to control it.
Cambridge Analytica used user data and machine learning to build what they called psychographic models. The term psychographic is modeled on “demographic.” The idea is that just as users belong to demographics, with whom they share a general trait like age or gender, they also belong to even smaller groups with whom they share psychological traits like conscientiousness, extraversion or neuroticism. Hidden camera footage published by Britain’s “Channel 4 News” shows Cambridge Analytica salesmen boasting that they can predict with great specificity the “hopes and fears” of particular users, and therefore what kinds of political messages would most likely persuade each user. With some or all of those millions of users’ data, and without those users’ consent, Cambridge Analytica trained an algorithm that, they claim, targeted voters with ads based on their “hopes and fears.” Political campaigns, including Ted Cruz’s, and major conservative donors like Robert Mercer have invested millions of dollars in Cambridge Analytica’s services. Even if psychographic micro-targeting isn’t effective yet, there are at least some who find it very promising.
This phenomenon has sparked fear and outrage among privacy advocates, liberals, and others opposed to Trump. In its defense, Cambridge Analytica has claimed that micro-targeting is not new. Advertisers, political and otherwise, have been tracking users and showing them micro-targeted ads for years now, with little public controversy. As Cambridge Analytica tweeted, “Obama’s 2008 campaign was famously data-driven, pioneered microtargeting in 2012.”
Obama's 2008 campaign was famously data-driven, pioneered microtargeting in 2012, talking to people specifically based on the issues they care about. 6/8
— Cambridge Analytica (@CamAnalytica) March 17, 2018
It’s true that micro-targeting has been around for several years, and that Obama’s 2008 campaign pioneered micro-targeting in politics. Cambridge Analytica’s response, however, glosses over an important distinction. Other advertisers obtain at least a modicum of consent for the user data they collect: maybe the user explicitly consents to give their data to the advertiser by checking a box somewhere, or maybe they implicitly “consent” by googling a certain retailer. Cambridge Analytica, by contrast, obtained their trove of personal data without any consent from tens of millions of Facebook users. Those who are upset have a right to be. Their personal data were the raw materials out of which the Trump campaign built a multi-million-dollar political micro-targeting tool. And, out of 87 million unwitting data donors, none were asked to consent to that use.
How Cambridge Analytica Made Off with the Data
From Christopher Wylie, a former Cambridge Analytica developer, we know that in 2015, Cambridge Analytica obtained their trove of user data when a relatively small number of users took a personality quiz through a Facebook app called “This is Your Digital Life,” and gave the quiz app permission to use their data. At that time, Facebook’s developer API, by design, allowed the quiz app to access not just the personal data of those users who took the quiz, but also the personal data of all of those users’ friends, who had never used the quiz app. Once the friends were included, the relatively small number of users who consented via the quiz blossomed into the 87 million since confirmed by Facebook. Most of those 87 million did nothing but have one or more quiz-takers among their friends.
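To make the friend-of-a-quiz-taker effect concrete, here is a minimal sketch in Python. It is a toy illustration only, not Facebook's actual API: the friendship graph, the user names, and the `exposed_profiles` function are all hypothetical, and the sketch simply assumes an app that can read the profile of anyone who is friends with a consenting user.

```python
from typing import Dict, Set

# Hypothetical friendship graph (toy data): each user maps to a set of friends.
FRIENDS: Dict[str, Set[str]] = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "erin"},
    "carol": {"alice"},
    "dave": {"alice", "frank"},
    "erin": {"bob"},
    "frank": {"dave"},
}


def exposed_profiles(quiz_takers: Set[str], friends: Dict[str, Set[str]]) -> Set[str]:
    """Return everyone whose data the app can reach under the assumed policy:
    the quiz takers themselves plus all of their direct friends."""
    exposed = set(quiz_takers)
    for user in quiz_takers:
        # Friends are swept in even though they never opened the app.
        exposed |= friends.get(user, set())
    return exposed


# Only one person takes the quiz and grants permission.
quiz_takers = {"alice"}
exposed = exposed_profiles(quiz_takers, FRIENDS)
non_consenting = exposed - quiz_takers

print(sorted(exposed))         # ['alice', 'bob', 'carol', 'dave']
print(sorted(non_consenting))  # ['bob', 'carol', 'dave']
```

In this toy graph, one consenting quiz taker exposes three people who never agreed to anything. Scale the same one-hop expansion up to real friend lists with hundreds of entries each, and a modest pool of quiz takers can plausibly reach tens of millions of profiles.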
When this story broke in March of 2018, a Facebook executive defended the company, somewhat bewilderingly, by urging the public to see that the incident wasn’t a “data breach,” but rather that Facebook had allowed Cambridge Analytica access and Cambridge Analytica had merely misused the data that it was allowed to access. It was as if Facebook were stressing that there had been no break-in, only that the door had been left open by design and a stranger had walked in and made copies of all your personal documents.
Plaintiffs have filed a few different lawsuits against Facebook in recent weeks. Facebook investors have sued for value lost as the company’s stock tanked following The New York Times story, and users have filed at least one class action suit that claims the company treated their privacy with “absolute disregard,” according to The Verge.
Whether or not Facebook faces consequences in court for this episode, it has been a wake-up call for citizens of the internet. Senators in Washington, D.C. asked Zuckerberg whether the government should regulate Facebook to ensure data isn’t misused the same way again. The U.S. may implement tighter data-collection restrictions in the coming years. The European Union already has, with its General Data Protection Regulation (GDPR), which takes effect at the end of May. GDPR restricts what data companies can collect from users, and guarantees every user the right to request all of their personal data or have all of it deleted from a given data-collector’s systems.
Data privacy is a civil right. Sir Timothy Berners-Lee, inventor of the World Wide Web, has long advocated that users must own their data and be able to find and reclaim it from anyone who holds a copy. A full realization of Berners-Lee’s vision might be a world in which users can, at any time, easily and conveniently see not just every piece of data a company like Facebook has about them (the raw data about where they live, what they like, and who they follow) but also the products that raw data goes into (like machine learning models and advertising campaigns). In that world, users would exercise their right to know what their data is being used for, and be able to make informed decisions about how they would like to share or protect it. If users or the Facebooks of the world enforce the right to user data and informed consent, the next Cambridge Analytica scandal doesn’t need to happen.