Last week, I took time off from work to attend the Future of Privacy Forum and the Stanford Center for Internet & Society’s “Big Data and Privacy: Making Ends Meet” workshop in Washington, D.C. on September 10th. I have a two-decade tradition, upon arriving in Washington, of going for a Forrest Gump-like jog for as long as my schedule and joints will allow. Late Monday afternoon before the conference, I continued that tradition. Leaving from the Washington Hilton, I managed a 10+ mile jog past the White House, the Washington Monument, and the Jefferson Memorial, across the Potomac, and then back to the Hilton. What struck me on my run was the grid-like structure built around the Washington Monument (pictured above). This monument provides a good bookend for describing the stark contrast between the opening panels of privacy experts and the lunchtime keynote delivered by Rayid Ghani, former Chief Scientist for Obama for America 2012.
Held at the Microsoft Innovation and Policy Center, the workshop featured moderated panels of privacy experts sharing their perspectives on the privacy implications of Big Data (hashtag #BigPrivacy). Many of the panelists submitted papers in advance of the workshop. As part of my LL.M. in IP and Tech Law studies at Washington University School of Law, I co-authored and submitted a paper with Professor Neil Richards of Washington University called The Three Paradoxes of Big Data. Stanford Law Review Online published our paper with some others on September 3rd as part of a Symposium Issue. All of the papers submitted in advance of the workshop are published here on the Future of Privacy Forum website.
The first panel, moderated by Jules Polonetsky, Co-Chair of the Future of Privacy Forum, was called Framing Big Data and Privacy. Professor Richards joked early on:
“There is mystical thinking that infuses the thinking around big data. Replace the word big data with magic – if we had magic and wizards we would have to regulate them.”
Professor Richards summarized some of the arguments we made in our paper, saying, “As a society, if we are going to make important, life-changing decisions based on algorithmic processes, then we need to regulate and have transparency.” Professor Richards also described the need, in this rapidly changing environment, for the development of data ethics and the protection of ‘intellectual privacy’ rights for fundamental intellectual activity such as reading.
Professor Deirdre Mulligan, Assistant Professor in the School of Information at UC Berkeley and Co-Director of the Berkeley Center for Law and Technology, framed the debate more widely than privacy, saying, “we have to get past man versus machine discussions and look at the ‘socio-techno’ systems and realize that people wrote those algorithms.” “Ethical issues with big data,” said Professor Mulligan, “go far beyond privacy.” She continued, saying that “if the policy conversation is not broadened in several ways then decisional autonomy is at stake.”
Eric Jones, Policy Director and Assistant Attorney General to Lisa Madigan, Attorney General of the State of Illinois, provided a legislative perspective on Big Data. He described his previous role working for Senator Rockefeller on the Senate Commerce Committee and his new role as Policy Director with Attorney General Madigan. Jones shared that he was focused on providing general oversight on consumer protection issues for Senator Rockefeller until they started to see that “technology was having huge impact across the gamut.” Commenting on the federal perspective, Jones said, “Congress as a whole leans toward not getting in the way of innovation. They don’t want to do something that will prohibit innovation. Because of that you see Congress focusing on specific harms and fixing those.” Now working for Attorney General Madigan, Jones said that he is focusing on opening some investigations into data brokers, not because there are necessarily problems, but “to ask the right questions.”
Natasha Singer of The New York Times moderated a lively second panel called the Social Ramifications of Big Data. Professor Evan Selinger, Associate Professor at Rochester Institute of Technology, shared his experiences teaching a privacy law class for science and engineering students. Professor Selinger said that at the beginning of his class, students did not have privacy concerns. As he started to teach them hands-on applications of big data, however, he said “you could see a change happen.” The students were able to see firsthand how “seemingly innocuous information could become harmful.” Professor Selinger continued, “We are so used to thinking about big data in a big organizational way that we are not yet fully able to think about what is going to happen to individuals.”
Karen Levy, Ph.D. candidate in the Department of Sociology at Princeton University, shared a fascinating perspective gained from studying long-haul truck drivers monitored by GPS. Levy chose to study truck drivers for her Ph.D. as part of a larger inquiry into the impact of big data on relationships. Levy said, “We should think about top down institutional collection but we should also think about smaller data practices in our relationships.” Levy is interested in looking at social domains, such as the family, in which data is being applied to relationships. She observed that there is a proliferation of tools to track teenagers that “were not around when I was a teenager.” Levy’s studies are showing that monitoring products used across friendship, employment and family relationships have the “potential to change trust relationships, control relationships and change accountability.”
True to their titles, these opening panels provided big-picture privacy and social perspectives on Big Data. Against this backdrop, Rayid Ghani, former Chief Scientist for Obama for America 2012, took the podium and stole the show.
Introduced by Chris Wolf, Co-Chair of the Future of Privacy Forum and partner at Hogan Lovells LLP, Ghani is now at the Computation Institute and the Harris School of Public Policy at the University of Chicago and a co-founder of Edgeflip. Ghani opened by saying it was an “interesting morning hearing from people I typically don’t talk to.” Ghani shared that he normally hears from other computer scientists, for whom privacy is never discussed except as a trade-off. Put another way, privacy is a constraint for data scientists. Ghani said, “I don’t know much about privacy and I don’t care that much about privacy because I don’t care much about it.” Qualifying his statement, Ghani shared that, like many of us, he gives up information because “he is lazy and he wants to connect.” Ghani then commented on the overused term “big data” itself, saying “no one in the computational world talks about big data.” He dismissed the term as one that vendors had come up with to sell more. He also observed that the morning sessions had uttered the term more than he had ever witnessed.
Ghani then proceeded to deliver an insightful talk on data science and the role it played in the 2012 Obama campaign. Ghani first observed that nothing fundamentally has changed in the past ten years in data analysis. That said, Ghani shared four ways in which access to more data is changing data science predictions:
- Better Predictions: “Most people use data to make predictions … When you have more data, the implications are that you can make finer grained predictions.”
- Earlier Predictions: “We can make these predictions much earlier than we used to.”
- More Accurate Predictions: “The goal is better than random, not 100% accuracy … Things that they are showing about Acxiom about big data is not data about you, it is inferences about you … People not in the big data world think about this as very deterministic. There is no this or that, it is a continuum. More accurate means we can really do something…”
- Reduce Risk Of Taking Certain Action: “Work the Obama campaign did was decisive in winning elections. We won for a lot of different reasons. What better analytics and data meant is that we reduced the risk of losing. What we did was increase the probability of winning [by predicting on] election day an 88% likely to win instead of 64% likely to win.”
Ghani then described the important role of experimentation in driving actions and decisions and that privacy reactions themselves can become part of the experiments. Ghani said:
When on the Obama campaign we struggled every day about what we could do and not do; what would be perceived as privacy violation even though it would not be a privacy violation. For example, recommendations on which friends you should recommend to get out to vote … We then started to send users email.
Ghani walked through how they added additional features to the emails they sent, such as referencing names in subject lines and adding profile pictures from authorized Facebook friend lists. Netting out his discussion on experimentation, Ghani said, “for every additional personalization the response rate doubled.”
Ghani said there were big arguments internally about how many emails to send, so they would continue to run experiments. By sending emails, “you are asking people what they want, not a survey, but what do they actually respond to. Instead of hypothesizing you can do it. When people start unsubscribing, then you can adjust.”
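The experiment-driven approach Ghani describes can be sketched as a simple loop: send each variant to a slice of the list, measure who responds, and keep what works. The sketch below is purely illustrative; the variant names, response model, and numbers are my own inventions, not the campaign's actual data or tooling.

```python
import random

# Hypothetical email variants, ordered by increasing personalization.
VARIANTS = [
    {"name": "generic", "personalized_features": 0},
    {"name": "name_in_subject", "personalized_features": 1},
    {"name": "name_plus_photo", "personalized_features": 2},
]

def simulate_response(variant, base_rate=0.01):
    """Toy model of Ghani's observation that each additional
    personalization roughly doubled the response rate."""
    return base_rate * (2 ** variant["personalized_features"])

def run_experiment(recipients=10000):
    """Send each variant to an equal slice of recipients and
    record the observed response rate per variant."""
    results = {}
    for variant in VARIANTS:
        rate = simulate_response(variant)
        responders = sum(1 for _ in range(recipients) if random.random() < rate)
        results[variant["name"]] = responders / recipients
    return results

if __name__ == "__main__":
    random.seed(42)
    for name, rate in run_experiment().items():
        print(f"{name}: {rate:.2%}")
```

The point of the sketch is the design choice Ghani emphasized: instead of hypothesizing about what recipients want, you measure what they actually respond to, and unsubscribes act as the feedback signal for dialing volume back.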
In the most fascinating part of his talk, Ghani outlined how the Obama campaign gathered voting data and plotted the electorate on a grid. The campaign did not use private data; it used public voting records. Ghani shared later, in response to a question from the audience, that they used voting records because they are the best predictor of how a voter will vote. Using voter records, the campaign could predict three things about every voter in a swing state:
- How likely you are to support Obama.
- How likely you are to be persuaded to support Obama.
- How likely you are to vote.
With these simple predictions, the Obama campaign could then plot every voter in a swing state onto a grid. Ghani described the four quadrants of the grid and the corresponding action the campaign would take in each:
- People who are not likely to vote and not supporting Obama: “Too expensive to focus on.”
- People who are unlikely to vote but have a high likelihood to vote for Obama: “Focus on getting those people to vote.”
- People who are not supportive of Obama but have a high likelihood of voting: “Focus on small sliver of percentage that are persuadable.” Ghani shared that these people are hard to identify because undecided voters typically will not tell you how they will vote. So the campaign ran experiments to find a subset of persuadable people and used them to develop models ranking persuadability. The campaign would have volunteers “talk to people about Obama’s policies and then poll again and figure out what kind of people increase their support as result of this persuasion to then apply to everyone else in the country and how likely they are to be persuaded.” The campaign would then rank voters from most to least persuadable and have volunteers work top to bottom on the list.
- People who are likely to vote and vote for Obama: “Use this segment to really expand reach.” Referring to the earlier personalized emails, Ghani described that the focus for this group was to give them as many tools as possible to call the right people in the right bucket.
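The quadrant logic above can be sketched in a few lines of code. Everything here is illustrative: the threshold, field names, and example scores are hypothetical, not the campaign's actual models or cutoffs; the sketch only shows how three predicted probabilities per voter map to one of four actions.

```python
def assign_action(p_support, p_turnout, p_persuadable, threshold=0.5):
    """Map a voter's three predicted probabilities to a campaign action,
    following the four quadrants Ghani described. The 0.5 threshold is
    an assumption for illustration."""
    likely_votes = p_turnout >= threshold
    likely_supports = p_support >= threshold
    if not likely_votes and not likely_supports:
        return "ignore"            # too expensive to focus on
    if not likely_votes and likely_supports:
        return "get_out_the_vote"  # focus on getting these people to vote
    if likely_votes and not likely_supports:
        # persuadable sliver only; rank this group and work top-down
        return "persuade" if p_persuadable >= threshold else "ignore"
    return "mobilize_as_volunteer"  # use supporters to expand reach

# Hypothetical voters: (p_support, p_turnout, p_persuadable)
voters = [
    (0.9, 0.2, 0.1),  # supportive but unlikely to vote
    (0.2, 0.9, 0.8),  # likely voter, persuadable
    (0.1, 0.1, 0.1),  # neither supportive nor likely to vote
    (0.9, 0.9, 0.0),  # supportive likely voter
]
for v in voters:
    print(assign_action(*v))
```

Running this prints `get_out_the_vote`, `persuade`, `ignore`, and `mobilize_as_volunteer` for the four example voters, mirroring the four quadrants in order.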
I don’t want to know what you know about me. I want to know what you predict about me. You can infer a lot, more or less, from data; what is important is what you predict about me.
The contrast between Ghani and the first two privacy panels highlighted the need for many more workshops between policy makers and technologists. Ghani, by his own opening admission, did not seem to hear the privacy panelists, and I wonder if the privacy attendees had the technical aptitude to understand Ghani. Many more workshops like the one hosted by the Future of Privacy Forum and the Stanford Center for Internet and Society last week are in order. At a time when the potential of data science applications is not only revolutionary but in many cases required, when the technology is changing rapidly and its use is already being experimented with and applied across all facets of life, we need grounded discussion about the way forward and the corresponding policies to guide us.
And now I come back to the Washington Monument surrounded by a protective grid (pictured above). The grid is there to allow repairs to the iconic monument after a 5.8 magnitude earthquake damaged the structure on August 23, 2011. I find the picture of this structure a fitting backdrop for the earthquake going on in Washington right now around the sharing and use of data. This most recent data earthquake originated with the tragic events of September 11th, 2001. Laws were passed to protect and defend the United States from the asymmetric threat of terrorism. This same body of law is now likely to be needed for new threats in cyberspace. Perhaps we should leave the grid around the monument to protect and defend it from the next earthquake much like we keep the laws to protect and defend us from these new threats. Perhaps we should keep the grid around the monument more importantly to remind us that these laws are still in effect.
As part of my continued LL.M. in Intellectual Property Law studies at Washington University School of Law in St. Louis, I co-authored an article with Professor Neil Richards called The Three Paradoxes of Big Data, which was recently published in Stanford Law Review Online. In the fall of 2012, I took a seminar called Civil Liberties in Cyberspace taught by Professor Richards. The seminar consisted of reading a number of books, writing reviews along the way, and then submitting a final paper reviewing two of the books. There were so many good books to choose from that I chose to review three:
- Jonathan L. Zittrain’s The Future Of The Internet – And How To Stop It (2008)
- Lawrence Lessig’s Code: And Other Laws of Cyberspace, Version 2.0 (2006), and
- Helen Nissenbaum’s Privacy In Context: Technology, Policy, and the Integrity of Social Life (2010).
Below is an edited version of my final paper for my blog: Future Code Context.
The Future Of The Internet depends on Code: And Other Laws of Cyberspace protecting Privacy In Context. Such is my mash-up of the reading assignments for my Civil Liberties in Cyberspace seminar. More than a convenient combination, I believe these three (not two) books begin to provide the recipe for protecting civil liberties online in the 21st century.
We take for granted the hard-fought set of civil liberties each of us possesses each day we awake. From the Magna Carta, to the printing press, to the Renaissance, to the Reformation, to the French and American Revolutions, to two World Wars and one Cold War, mankind at large has seen a steady march of progress in fits and starts. There have been devastating travesties, horrors and setbacks along this journey, but there has been astonishing progress as well. The Iron Curtain has fallen, and the few dark corners of the world that remain cannot escape the self-evident power of freely expressed and expanding civil liberties. Yet we take for granted that these hard-fought liberties will continue in cyberspace.
Such is the backdrop for the opening of Lawrence Lessig’s book Code: And Other Laws of Cyberspace, Version 2.0. Lessig’s first chapter, entitled “code is law,” starts by comparing the developing regulatory norms of cyberspace to the post-communist regime changes of the early 1990s. The exhaustion of communism led at first to a period of unmanaged freedom where security evaporated. New norms needed to be established to enable effective governments to balance unhealthy freedoms against the needs of running a modern society. These norms are still developing as former Soviet bloc countries rediscover the free market and democratic norms stifled by a century lost to communism.
Lessig describes “the change from a cyberspace of anarchy to a cyberspace of control.” The ‘first generation’ of cyberspace had a state of nature feeling with open domain name registrations, open protocols and benevolent researchers as regulators. The second generation focused on commerce with the .com boom followed by web 2.0. The third generation, according to Lessig, risks a dire outcome where “left to itself, cyberspace will become a perfect tool of control.”
Lessig’s point in starting with the post-Soviet period of uncertainty is to emphasize the balance that is required not only in the real world but also in cyberspace. Lessig writes:
Liberty in cyberspace will not come from the absence of the state. Liberty there, as anywhere, will come from a state of a certain kind. We build a world where freedom can flourish not by removing from society any self-conscious control, but by setting it in a place where a particular kind of self-conscious control survives.
Without referencing him in depth, Lessig paraphrases Jean-Jacques Rousseau. Rousseau believed man possessed natural or innate morality but required civil society to become fully realized. Rousseau believed that an ideal state for man existed somewhere between brute animal and the decadent evils enabled by society. In The Social Contract, Rousseau writes:
The problem is to find a form of association which will defend and protect with the whole common force the person and goods of each associate, and in which each, while uniting himself with all, may still obey himself alone, and remain as free as before. This is the fundamental problem of which the Social Contract provides the solution.
Man can achieve justice, virtue and a higher standard of living with a ‘social contract’, but these societal virtues also increase the risk of the deteriorating moral influences of pride, vanity, jealousy and fear.
I add Rousseau to my critique of Lessig’s book because it helps me frame the critical point that Lessig is trying to convey. Code risks tilting the very balance of modern day social contracts. Increasingly in today’s computerized world, code is law and law is code. Lessig writes:
We can build, or architect, or code cyberspace to protect values that we believe are fundamental. Or we can build, or architect, or code cyberspace to allow those values to disappear.
Therefore, Lessig correctly expresses concern that regulators now have an omnipresent regulatory tool, one that should give everyone pause.
Lessig describes both the substantive and structural values at stake in this rapid change. Lessig also speculates with a tone of alarm that the United States is ripe to respond with undue or irrational passion to increasing cyberspace challenges. He rightly emphasizes “that there are choices to be made about how this network evolves.” We seem one pretext away from Big Brother’s telescreens.
I don’t know about the outcome, but the framing again paraphrases the balance Rousseau described in the Social Contract. Describing a civil state, Rousseau wrote:
The passage from the state of nature to the civil state produces a very remarkable change in man, by substituting justice for instinct in his conduct, and giving his actions the morality they had formerly lacked. Then only, when the voice of duty takes the place of physical impulses and right of appetite, does man, who so far had considered only himself, find that he is forced to act on different principles, and to consult reason before listening to his inclinations.
The key to the civil state is that the “voice of duty” is not corrupted. Lessig seemed resigned to defeat back in 2005–6 when he completed Code 2.0, writing:
There is much to be proud of in our history and traditions. But the government we now have is a failure. Nothing important should be trusted to its control, even though everything important is.
I think a couple of presidential elections and a great recession have sobered us up a fair amount since 2006, when Lessig wrote the above. We are certainly still in need of much improvement, especially in the legislative process. Perhaps this explains Lessig’s departure from cyberspace for the halls of Congress in his most recent book, Republic, Lost: How Money Corrupts Congress — and a Plan to Stop It.
Lessig’s emphasis on choice only gets us part of the way. It is here that Jonathan L. Zittrain’s The Future of The Internet contributes valuable ideas to the discussion. Zittrain opens his book describing how the Internet overtook proprietary networks such as AOL, CompuServe and Prodigy. Zittrain describes how generative platforms like the PC and the Internet overtook their proprietary, non-generative alternatives by allowing a much wider base of innovation. Zittrain believes that this historic dominance of generative platforms, with their inherent benefit of open participation, is now coming to a close. “The future is not one of generative PCs attached to a generative network,” writes Zittrain. “It is instead one of sterile appliances tethered to a network of control.”
Zittrain recounts how the PC and internet were designed to enable unexpected outcomes. Zittrain writes:
The essence – and genius – of separating software creation from hardware construction is that the decoupling enables a computer to be acquired for one purpose and then used to perform new and different tasks without requiring the equivalent of a visit to the mechanic’s shop.
With this design, PCs enabled the support of an endless variety of programs and programmers which quickly overwhelmed bundled word processors and other appliance alternatives.
After PCs, Zittrain describes how the network can also be “more or less generative.” It is here that Zittrain frames the generative-versus-appliance tradeoff. An appliance managed by a single vendor (i.e., the iPhone) “can work more smoothly because there is only one cook over the stew, and it can be optimized to a particular perceived purpose,” whereas a more generative device like a PC “makes innovation easier and produces a broader range of applications.”
The internet tilted generative from its earliest founding. The internet’s designers believed that features should reside at the endpoints rather than in the middle. Network participants were presumed to have the network’s best interests in mind. Zittrain writes,
Generative systems are built on the notion that they are never fully complete, that they have many uses yet to be conceived of, and that the public can be trusted to invent and share good uses.
Security and quality of service took a back seat to flexibility. If something was broken, the “procrastination principle” said that problems could be solved later.
This worked for a time, until a Cornell University graduate student unwittingly unleashed a worm and demonstrated the vulnerability of a generative internet in 1988. This first exposure during the ‘state of nature’ phase of the internet was forgiven. Administrators took corrective action. The student hacker went on to Harvard and then a dot-com startup to make a fortune. As time progressed, however, the generativity of the internet combined with the generativity of PCs caused an explosion of security incidents. “The idea of a Net-wide set of ethics,” writes Zittrain, “has evaporated as the network has become so ubiquitous.” “A massive number of always-on powerful PCs,” continues Zittrain, “with high-bandwidth connections to the Internet and run by unskilled users is a phenomenon new to the twenty-first century.” The innocent state-of-nature internet is over.
While acknowledging the security challenges, Zittrain extolls the virtues of a generative platform. To Zittrain, generative platforms are to technology innovation like the muse is to the writer. Generativity fills in the gaps. Zittrain writes:
Generative systems allow users at large to try their hands at implementing and distributing new uses, and to fill a crucial gap that is created when innovation is undertaken only in a profit-making model, much less one in which large firms dominate.
Generative platforms not only allow innovation, they allow expression. Entirely new, unexpected business models and technologies result from generative platforms. Generativity fosters human being instead of human doing.
Yet how to bring security without destroying all of the benefits of generativity? “The paradox of generativity,” writes Zittrain, “is that with an openness to unanticipated change, we can end up in bad – and non-generative – waters.” The incomplete design and open innovation of generative platforms “is both the cause of their success and the instrument of their forthcoming failure.” Zittrain’s concern is that overreaction to security threats will cause a shift away from generative platforms to more restrictive appliance models by users, commercial providers and governments. Zittrain frets that the proverbial baby of generativity will be thrown out with the bathwater of cyber threats.
Equally if not more impactful than the loss of innovation and expression, Zittrain explains how a shift from generativity to appliances will increase the regulability of the Internet. Zittrain writes:
A shift to tethered appliances also entails a sea change in the regulability of the Internet. With tethered appliances, the dangers of excess come not from rogue third-party code, but from the much more predictable interventions by regulators into the devices themselves, and in turn into the ways that people can use the appliances.
Zittrain goes on to explain and defend why the current trajectory of the computer and network towards tethered appliances “is on balance a bad one.” Zittrain picks up where Lessig left off on trusted systems to explain how generativity counteracted regulability: users cracked the constraints, or the market rejected them. Now, however, the rapid adoption of essentially uncrackable tethered appliances (i.e., iPhones) by end users is leading toward a different regime: perfect enforcement.
Zittrain’s concern is well placed. The mere possibility of perfect enforcement, let alone its actual implementation, risks a profound revision of Rousseau’s social contract. Law itself has been generative for centuries. The realities of time, distance, cost and human involvement in enforcement are a kind of generativity. Laws could be disobeyed in protest or in outright disregard, so that they in turn could be improved. Many civil liberties are a result of this generativity. Whether by mob action, as we witnessed recently in the Middle East, or by individual action, civil liberties themselves emerge from a legal platform which is generative. A perfect enforcement capability with tethered appliances risks negatively impacting both technological AND legal generativity. A transit system requiring payment by mobile phone could prevent the modern-day Rosa Parks from even making it to the seat she refused to give up. A regime in control of the network can increasingly target dissidents to more ruthlessly hold onto power, as we see in Iran and Syria. Therefore, regulators must stop and ask questions or risk unbalancing centuries of implicit generativity in society itself.
It is here that Helen Nissenbaum’s book Privacy In Context is helpful. While written in a fairly incomprehensible fashion, Nissenbaum’s book correctly casts the debate around context. After cataloging technology and other approaches to privacy, Nissenbaum spends the balance of her book on what she calls contextual integrity.
Nissenbaum defines contexts as “structured social settings characterized by canonical activities, roles, relationships, power structures, norms (or rules), and internal values (goals, ends, purposes).” Nissenbaum writes, “We can think of contextual integrity as a metric, preserved when informational norms within a context are respected and violated when they are contravened.” Nissenbaum continues:
The central thesis of the framework of contextual integrity is that what bothers people, what we see as dangerous, threatening, disturbing, and annoying, what makes us indignant, resistant, unsettled, and outraged in our experience of contemporary systems and practices of information gathering, aggregation, analysis, and dissemination is not that they diminish our control and pierce our secrecy, but that they transgress context-relative informational norms.
I think of contextual integrity as a penumbra of individual rights and norms that adjusts in real time in relation to people, places and things. Talking to my doctor on a cell phone in a public mall is still a private conversation. I expect contextual integrity.
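Nissenbaum’s “metric” framing lends itself to a small sketch: represent each context as a set of informational norms, and check a proposed information flow against them. The contexts, roles, and norm entries below are an invented toy model for illustration, not Nissenbaum’s actual formalism:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    """A proposed transmission of information (toy model)."""
    sender: str      # role of the sending party, e.g. "patient"
    recipient: str   # role of the receiving party, e.g. "doctor"
    info_type: str   # kind of information, e.g. "medical_history"
    context: str     # social context, e.g. "healthcare"

# Toy norms: which (sender, recipient, info_type) flows are
# appropriate within each context. Purely illustrative entries.
NORMS = {
    "healthcare": {("patient", "doctor", "medical_history"),
                   ("doctor", "specialist", "medical_history")},
    "retail": {("customer", "merchant", "payment_details")},
}

def respects_contextual_integrity(flow):
    """Contextual integrity is preserved when a flow matches the
    informational norms of its context, violated when it does not."""
    allowed = NORMS.get(flow.context, set())
    return (flow.sender, flow.recipient, flow.info_type) in allowed

# The mall example: the physical venue changes, but the context
# (healthcare) and its norms do not.
call = Flow("patient", "doctor", "medical_history", "healthcare")
leak = Flow("doctor", "merchant", "medical_history", "retail")
print(respects_contextual_integrity(call))  # norm respected
print(respects_contextual_integrity(leak))  # norm contravened
```

The design point the sketch captures is that appropriateness is keyed to the context and the roles within it, not to whether the information is secret or who physically overhears it.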
Nissenbaum’s contextual integrity framework offers several benefits. It provides an extensible framework to apply to today’s problem areas around technology adoption as well as the unanticipated problems of tomorrow. In a rapidly changing world of ever-increasing technology adoption, future-proofing a regulatory framework is critical. Contextual integrity also provides a balanced approach requiring an appropriate flow of information, not just restricted access. In an age of genuine cyber threats, an appropriate flow of information is table stakes as well. Finally, contextual integrity automatically adjusts to the global world we live in today: the contextual integrity I can expect in one country or region will naturally differ from another.
Bringing it all together again: The Future of the Internet hinges on Code, and Other Laws of Cyberspace protecting Privacy in Context. I am afraid, however, that protecting civil liberties online is much like the climate change debate, only in earlier stages. Should we prevent climate change? What do you mean climate change? Don’t you mean global warming? Man can’t influence such a large body as the earth? The sun is causing it. On and on. We still debate if there is a problem with the climate in the first place. We play a similar game of delay and denial with rights in cyberspace right now.
What I feel is missing from all of these books is a deeper focus on the immediate change that a commingling of cyberspace and civil liberties present for current and future generations. “Cyberspace” broadly defined to include internet, cloud, mobile, big data and every other digital signal moving and computing around us may more profoundly impact our way of life than any other invention(s) in human history combined. For example, we have made machines in the past decade that are starting to pervasively connect with nearly every minute of our daily lives. They are called smartphones.
Think for a moment how many times you touch your phone in a given day. When you wake up, when you go to bed, when you go to the bathroom, when you shop, when you drive (hopefully not), when you are with your kids. Increasingly you are your phone and your phone is you. You are not alone. According to an infamous Mobile Marketing Association of Asia statistic, more humans on earth have mobile phones (4.8B) than own a toothbrush (4.2B).
While the numbers are astonishing, what is more astonishing is the speed of adoption. Call it Moore’s Law or call it Kurzweil’s Law of Accelerating Returns: “the pace of change of our human-created technology is accelerating and its powers are expanding at an exponential pace.” It is not just mobile phones. Tablets, software, cameras and every manner of digital device are experiencing exponential growth. Even the methodologies used in programming today are designed to enable acceleration. Where programs used to be built with “Waterfall” methodologies, they are today increasingly designed using agile methodologies with rapid iteration. While certainly cautioning on what is at stake, these books mostly miss this exponential physical and logical acceleration in digital capabilities. Given the stakes and the pace of change, these books and others like them need to articulate a greater sense of urgency.
Yet these books are still important works to consider in the debate over protecting civil liberties online. Toward the end of his book, Lessig describes how a democracy should code, writing:
There is a magic in a process where reasons count – not where experts rule or where only smart people have the vote, but where power is set in the face of reason. The magic is in a process where citizens give reasons and understand that power is constrained by these reasons.
There is indeed a magic in a process where reason counts. Such is the magical process of an evolving social contract where the “voice of duty” in each of us keeps the future of the internet and contextual integrity top of mind as we code.