By Shaan Ray
Posted March 3, 2018
Technology has eroded our privacy protections. Much of what individuals and organizations do now ends up in the public domain. Third parties monitor, store, and use personal and organizational data, patterns, preferences, and activities. Many emerging business models rely on the collection, organization, and resale of our personal data.
Technology has also made it easier to link data back to an individual, even if that individual opts out of social networking platforms. For example, breakthroughs in facial recognition technology have found broad application in commerce and security, especially in China and Russia.
Blockchain technology could potentially limit the impact of this erosion of privacy, while still releasing personal information when it is useful. For example, a user could store personal information on a blockchain and release parts of it temporarily to receive services. Bitcoin and other blockchain-based digital currencies have demonstrated that trusted and transparent computing is possible using a peer-to-peer decentralized network and a public ledger.
This essay explores the impact that data generation by individuals and organizations is having on privacy. It also briefly considers how blockchain-enabled systems could help put users back in charge of their data.
Part 1: The Individual
Privacy is important to individuals because their personal information is valuable to organizations, marketers and other individuals. A common saying among internet users is: if you aren’t paying for the product, you are the product. This means that for companies that offer free services, such as social networking platforms, personal information is valuable.
In the US, privacy is important for the exercise of various constitutional rights, including the rights to free speech, free association, free press, protection from unreasonable searches and seizures, and protection from self-incrimination. A lack of privacy from state actors imperils these rights. Knowing that their statements will be attributed to them in the public domain, an individual may not speak up about just causes, especially if their opinion or the subject is controversial or unpopular. Since many important opinions are unpopular or controversial, self-censorship should be of serious concern to society. Knowing you are being watched may also prevent you from meeting with like-minded people. Whistleblowers and others who expose organizational wrongdoing are less likely to do so if they cannot remain anonymous. As technology has advanced, courts have outlined limits on state behavior.
If information is power, protecting that information can help a person negotiate with organizations. For example, health insurers can use negative health information to charge particular individuals higher premiums, or to deny coverage. An individual therefore has an incentive to prevent such information from entering the public domain. Public release of health information can also be awkward or embarrassing to a person’s social or professional life.
Financial information is another realm in which personal information empowers its possessor. An individual may not want a financial institution to be aware of a poor credit record from several years ago, and certainly would not want nefarious actors to use the individual’s personal information to commit identity theft or financial fraud.
While people have to disclose personal financial information frequently, for example to pay taxes, or to obtain a mortgage, if such information enters the public domain, it is of interest to various actors. Corporations seeking customers, charities seeking donors, and political candidates seeking donations are all interested in knowing an individual’s personal financial information. Though these actors are largely benign, people have varying preferences for whom they want to share personal financial information with, and what they want it to be used for.
For example, people may not want their financial information used by companies to determine the maximum amount they would be willing and able to pay for something which is generally available for a lower price. Yet Orbitz was reported to have steered Mac users toward pricier hotels than it showed to PC users. This shows that even a little personal information, such as whether a person uses a PC or a Mac, can be useful to marketers in gauging the financial capacity of a person, in a way that most people would find uncomfortable.
Studies have shown that people act differently when they know they are being watched. So, at a more fundamental level, privacy enables a person to act in an inspired, intimate, or silly manner, without fear. Just as people intuitively close or lock their front doors, they desire privacy in their interactions with technology. As discussed below, there are many good reasons for organizations to collect personal information (ranging from convenience in using a product or service in the future, to ease of future interaction, to researching consumer preferences to create better products or services). Savvy organizations make efforts not to violate individuals’ expectations of personal privacy, because personal privacy is inherently valuable: it allows people to be themselves and act in an uninhibited manner.
Anonymity and Pseudonymity
There is a paradox at the heart of the internet: people expect it to be an open and transparent forum to exchange ideas, information, goods, and services, and yet they also expect to surf the internet anonymously. As a prominent New Yorker cartoon caption put it: on the internet, nobody knows you’re a dog. Expectations of privacy on the internet vary by interest group and by country. For example, due to government regulation, South Koreans have lower expectations of internet privacy than Americans. In early 2015, China’s government passed new regulations requiring internet users to use their real names when blogging or using social media in the future. The situations in which anonymity and pseudonymity are desirable on the internet are subject to debate. For example, while it would be reasonable for a college student to expect to be anonymous on the internet, the same cannot be said for someone using the internet to plot a violent crime. People have different opinions on whether professors should be allowed to write under a pseudonym.
Though interesting, the debate on when anonymity or pseudonymity should be respected risks becoming moot as new technology poses new and unprecedented challenges to personal privacy.
Social Media and the On-Demand Economy Require Disclosure of Personal Data
The rise of Facebook, Twitter, and other social networking platforms has changed expectations surrounding internet privacy. People who use these platforms volunteer personal information and agree to share it with either limited audiences or with everyone. However, even people who do not want to share their personal information (for example, those without Facebook or Twitter accounts) can feel pressure to do so for career or marketing purposes. Employers often perform a Google search on candidate employees. While no results are better than bad results, today’s job candidates generally use the internet to market themselves, either by creating a LinkedIn account, or showcasing accomplishments through a personal website, blog, or article. The same is true for small businesses: most either have an online presence, or lose business because they do not have one.
Another major technology trend in the past decade has been the emergence of the on-demand economy (briefly called the sharing economy), in which providers and users of goods or services connect through web or phone applications or sites to transact. This has the effect of cutting out established middle players (a process called disintermediation). Just as ride-sharing services like Uber and Lyft have rendered traditional taxi services and city taxi medallion systems nearly obsolete, Wikipedia has replaced large privately edited encyclopedias like Microsoft Encarta. Travelers today check not just hotel prices in other cities, but also Airbnb prices to rent short-term rooms or homes directly from other people. As Airbnb and other companies emblematic of the on-demand economy have emerged, they have addressed customer security needs by requiring hosts and guests to provide personal information. Additionally, after an Airbnb stay, the host and guest are invited to review each other, and these reviews often contain personal information and are freely available to anybody browsing the Airbnb website.
Though companies such as eBay have used identity verification and user reviews to fight fraud and encourage quality in online marketplaces for decades, the new generation of sharing economy apps equate personal information with credibility: the more the better. Since participants in on-demand services interact more closely with one another than buyers and sellers on eBay, this makes sense. Getting in a stranger’s car, sleeping in a stranger’s house, and allowing a stranger to walk your dog require a high degree of trust in the application, site, or platform. Platforms that collect more information engender more trust. For example, while Airbnb guests can build trust through a history of positive host reviews, guests with no prior history may upload government ID, or even link to their Facebook account, to signal credibility. So, while the sharing economy’s disintermediation allows people to trust each other, it also often expects them to disclose personal information to the platform or even into the public domain.
The personal data individuals provide to social media networks and on-demand economy services can provide a base for organizations or people looking to profile individuals based on attributes such as location, age, gender, consumer preferences, social preferences, and other similar attributes. Personal information empowers its possessor, so individuals should carefully consider the level of personal privacy they are comfortable with in different situations.
Part 2: Organizations
Protecting Organizational Strategy and Proprietary Information
Organizations need to protect sensitive information, such as corporate strategy insights, or proprietary information (such as intellectual property), which could be used by competitors or nefarious actors to undermine the organizations. This is not just the case for organizations that manufacture cutting-edge technology in their industries (such as those in the defense, aerospace, or automobile industries), but for all kinds of organizations. In 2016, hackers targeted the Democratic National Committee and released its internal e-mail communications, undermining the credibility of several Democratic Party candidates. Also in 2016, hackers posing as Bangladesh central bank officials sent instructions to the New York Federal Reserve in an effort to steal $951 million from the Bangladesh central bank’s account at the New York Federal Reserve. They succeeded in siphoning $101 million, most of which international authorities have still not recovered. These headline-grabbing hacks are emblematic of the challenges organizations face in protecting their organizational strategy and proprietary information from competitors and bad actors alike.
Collecting Customer Information for Marketing and Research Purposes
Organizations collect customer information for marketing purposes. Grocery stores sift through location information collected from people’s phones to determine how much time they spend in certain aisles. Frequent flyer and other loyalty programs provide perks to people in exchange for their personal information, which can then be used for enhanced promotion of products and services. Both for-profit and not-for-profit organizations build customer or donor profiles that help them make sales and raise funds.
Customer information can also be used to help understand consumer preferences and provide better products and services in the future. Additionally, an organization’s collection of customer information often makes it more convenient for a customer to transact with that organization in the future.
In collecting such information, organizations must comply with applicable federal and state law. In the United States, health care companies, financial services companies, and other companies are limited in different ways in the types of customer information they are permitted to collect. Sophisticated organizations are familiar with the legal frameworks governing their industries. As data collection and analysis are increasingly performed by non-human actors, legal and regulatory compliance is growing significantly more difficult.
Protecting Customer Information
For all consumer-facing companies, and perhaps especially for technology companies such as Google, Apple, and Amazon, collecting personal information from consumers helps to provide more personalized service to each consumer. It also creates a responsibility for the companies to safeguard that information. For example, syncing data across a person’s Mac, iPhone, and iPad allows for convenient access to information, but also creates an expectation that Apple will take steps to protect this information from outsiders.
Protecting Client Information
For organizations that service other organizations (such as accounting or legal services organizations), protecting client information is similarly crucial to maintain client relationships. Hackers have previously targeted professional service companies in order to get their clients’ data, in attempts to circumvent the clients’ sophisticated data protection measures. Understanding this risk, major professional services firms have undertaken significant efforts to protect client information.
Current Organizational Measures to Protect Sensitive Information
Organizations have adopted many measures to protect sensitive information: for example, hiring internal and external information security professionals, educating employees on how to protect information, and vetting employees to protect against insider threats. Some organizations that have been targeted or are likely to be targeted have also started using honeypots (fake, company-monitored data rooms purporting to hold sensitive company information, which bait and deceive hackers and help companies understand hackers’ motives). These measures are important and necessary, and need to evolve to meet new challenges.
Part 3: Big Data
Organizations collect data that encompasses all kinds of information, such as about a company’s products and services, internal processes, market conditions and competitors, supply chains, trends in consumer preferences, individual consumer preferences, and specific interactions between consumers and products, services, and online portals. The amount of data collected by organizations has been expanding significantly.
Concerns about the expanding quantity of stored data have existed at least for the past century. In April 2014, research organization IDC released a report entitled ‘The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things’, which predicted that “from 2013 to 2020, the digital universe will grow by a factor of 10 — from 4.4 trillion gigabytes to 44 trillion”. The present-day expansion of data is powered by increases in computing and data storage capabilities, an increase in sensors, and an increase in connectivity.
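To put the IDC forecast in yearly terms, the short calculation below converts the tenfold increase into an average annual growth rate. It is only a back-of-the-envelope sketch: it assumes smooth compound growth between 2013 and 2020, which the report itself does not claim, and the function name is made up for illustration.

```python
import math


def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` after `years`.

    Assumes smooth compound growth, which is a simplification.
    """
    return (end / start) ** (1 / years)


# Figures quoted from the IDC report: 4.4 to 44 trillion gigabytes, 2013 to 2020.
factor = annual_growth_factor(4.4, 44.0, 2020 - 2013)

# Time needed to double at that constant rate.
doubling_time_years = math.log(2) / math.log(factor)

print(f"growth per year: {(factor - 1) * 100:.0f}%")
print(f"doubling time:   {doubling_time_years:.1f} years")
```

Under that assumption, the forecast works out to roughly 39 percent growth per year, or a doubling of stored data about every two years.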
Increases in Computing Power and Data Storage Capacity
Over time, we find it easier to store more information because both computing power and storage capacity have increased dramatically in the past decades. Intel co-founder Gordon Moore observed in 1975 that the number of transistors on a chip doubled every two years. This observation, known as Moore’s law, accurately described exponential gains in computing power well into the twenty-first century. Similarly, and concurrently, the last few decades have seen exponential growth in data storage capacity, and especially in our ability to store vast quantities of data on ever smaller drives.
Thanks to these increases, it has become possible, cheap, and even convenient to store larger amounts of data. Consequently, organizations have tended to err on the side of storing more data, because the risks of losing data that subsequently turns out to be useful outweigh the benefits of minor cuts in storage costs by saving less data.
The Rise of Cloud Computing
Cloud computing means the on-demand use of remote (rather than local) computing power, data storage, applications, or networking. Thanks to cloud computing, organizations can rent computing power or storage from cloud providers, and no longer need to store large amounts of data on their own servers, maintain vast computing power, or maintain traditional databases with vendors (though traditional databases can still be useful for large organizations). Since cloud providers must provide reliable, scalable, on-demand solutions, large corporations like Amazon, Microsoft, and Google are the dominant cloud providers.
Cloud computing services for business come in three flavors: Software as a Service (SaaS), in which the cloud provider allows an organization to use its software online (for example, accessing TurboTax online); Platform as a Service (PaaS), in which an organization develops applications for its members to access using the cloud provider’s resources (for example, Microsoft Azure); and Infrastructure as a Service (IaaS), in which the cloud provider provides basic infrastructure services for the customer to offer a service on (for example, Netflix using Amazon Web Services infrastructure to allow video streaming). Individual users may be more familiar with cloud applications such as iDrive, Google Docs, or Dropbox. Cloud computing is enabled not only by the advances in computing power and data storage capacity discussed earlier, but also by significant increases in internet access, speed, and reliability. Though cloud computing was adopted partly in response to the difficulties of storing ever-growing amounts of data, its widespread adoption has contributed to the expansion in the amount of data stored.
Increased Sensor Use, Sensor Sophistication, and Connectivity
While the internet has traditionally consisted of human users interacting with each other and with databases, it is increasingly used to connect physical devices to one another. This qualitative and quantitative growth in Machine to Machine (M2M) interactions is expected to result in an omnipresent Internet of Things: a network of smart devices that integrate with one another and perform increasingly complex functions with minimal human intervention.
The Internet of Things does not require radically new infrastructure. Instead, it comes about by embedding existing devices and infrastructure with sensors. For example, when a family air conditioning system uses sensors to determine when people are home, and uses this information to decide what temperature to keep each room at, it becomes ‘smart’. Similarly, when urban energy and water infrastructure is embedded with sensors, smart systems can emerge to monitor use, predict future use, and allocate resources more efficiently. Advanced sensors can improve almost any technology, from a smartphone, to a heart monitor, to a massive farm or factory.
Smart devices require different kinds of sensors depending on their uses. Computers have had cameras and microphones for decades. Now, devices can also be equipped with sensors that measure temperature, motion, distance of other objects, pressure, chemicals, and various other things. Sensors are improving in quality every year, and are also becoming available at lower prices.
While sensors alone can improve a device greatly, communication among devices is necessary for many smart systems. For example, though it would be useful to have your car drive itself with the use of various sensors embedded in the car, it would be even more useful to have all the cars on the road communicating with each other. This would ease traffic flow, and ideally make car travel safer. Similarly, when the Internet of Things is adapted for large scale production or warehousing, it will require excellent communication between devices. Advances in communication technology of all kinds, but especially wireless communication technology, have unlocked potential for Internet of Things technologies.
Smart Devices Often Collect Personal Data
Before smart devices, you would create little to no data by going on a morning run. Today, if you install a health app on your smartphone and go on the same run, data on the duration of the run, the route you took, the number of steps you took, and your heart rate at various points throughout the run can all be collected and stored in your phone. Similarly, thirty years ago, an air conditioner installed in your window would not have created much data. Today, a smart home temperature system collects significant amounts of data daily, on temperature and humidity. Once collected, this data could be used not only to keep a home at a desired temperature and save energy, but also to deduce when you are usually at home, or when you are most likely to adjust your temperature settings. The examples of your morning run and your home air conditioning system show that smart devices create more data. They also show that personal data collected for one purpose can be used for many other purposes.
The digital universe is expanding at an exponential rate: in an April 2013 article on IBM’s Consumer Products Industry blog, Ralph Jacobson estimated that the world created 2.5 billion gigabytes of data every day. Just as advances in computing power and storage capacity have contributed to an increase in the amount of data collected by organizations, so has the emergence of improved sensor and communications technologies. Sensors measure things, and these measurements are communicated to other devices or servers, collected, and stored as data points.
Artificial Intelligence emerges from data, as I discussed in a prior post.
Most Artificial Intelligence research is funded by Google, Amazon, and Baidu, because data is the food of Artificial Intelligence, and these firms generate massive amounts of data every day. Microsoft acquired LinkedIn not for its platform, but for its data.
Thus far, companies have used Artificial Intelligence to revolutionize finance and manufacturing. Protecting our financial information can sometimes seem like a lost cause, since the government, landlords, banks, credit card companies, and credit rating agencies all collect it (and as the Equifax hack showed, store it poorly).
People often consider their healthcare information to be even more sensitive than their financial information. Artificial Intelligence has been used on healthcare data, but since this data is often dispersed and fragmented due to privacy concerns, the impact of Artificial Intelligence in healthcare has been somewhat limited. Eventually, however, companies will compile increasingly comprehensive databases with our private health information.
If technology has caused this erosion in privacy, perhaps technology can help solve it.
Part 4: Blockchains
Blockchains with special protocols allowing varying degrees of anonymity, confidentiality, and privacy can enable the protection of healthcare, financial, and other personal data while still allowing this data to be used in Artificial Intelligence applications. For example, a user could have a blockchain with personal health information and only release particular elements of this information (such as vision prescriptions) briefly to product or service providers (such as contact lens manufacturers) for particular purposes.
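One simple way to picture this kind of selective release is with salted hash commitments: the user publishes only a fingerprint of each field, and later reveals a single field together with its salt so that a provider can check it against the public record. The sketch below is an illustration under stated assumptions, not a description of any particular blockchain: the record contents, the `ledger` dictionary standing in for on-chain storage, and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(value: str, salt: bytes) -> str:
    """Salted SHA-256 fingerprint of one field; reveals nothing readable."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Hypothetical health record, kept privately by the user.
record = {
    "vision_prescription": "-2.25 / -1.75",
    "blood_type": "O+",
    "allergies": "penicillin",
}

# One random salt per field, also kept private until disclosure.
salts = {field: secrets.token_bytes(16) for field in record}

# Only the commitments are published (a dict stands in for the chain here).
ledger = {field: commit(value, salts[field]) for field, value in record.items()}


def disclose(field: str):
    """User side: release exactly one field and the salt needed to check it."""
    return field, record[field], salts[field]


def verify(public_ledger: dict, field: str, value: str, salt: bytes) -> bool:
    """Provider side: confirm the disclosed value matches the public commitment."""
    return hmac.compare_digest(public_ledger[field], commit(value, salt))


field, value, salt = disclose("vision_prescription")
print(verify(ledger, field, value, salt))
```

The contact lens manufacturer in this example learns the vision prescription and can confirm it is the one the user committed to, while the other fields stay hidden behind their hashes. The random salts matter: without them, short or predictable values could be guessed by hashing candidates.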
Though a blockchain is a public ledger, stored in multiple locations, it can enable anonymity and trust, as demonstrated by multiple blockchain-based applications and high-privacy cryptocurrencies already in existence.
Currently, the personal data that third parties collect is usually stored on centralized databases with a single point of failure. Leaks of this data often go unnoticed and unreported. Once our data is in the hands of an untrusted party, we have no control over how it is used.
The Blockchain Solution
As Bitcoin has shown, cryptography and well-thought-through economic incentives can create a secure way of storing and managing information, including personal information.
Private data on the blockchain is protected by cryptography. I have discussed elsewhere how hashing is at the heart of blockchain technology. Below are some ways in which blockchain technology can be used to protect personal data, even while making parts of that data available for processing by algorithms.
A new kind of encryption called ‘Homomorphic Encryption’ allows for computations to be done on encrypted data without first having to decrypt the data. This means the privacy and security of the data can be preserved while computations are performed on it. Only users with the appropriate decryption keys can access the private details of the data or transaction.
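The idea of computing on encrypted data can be made concrete with a toy version of the Paillier cryptosystem, a well-known scheme that is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. This is a minimal sketch with deliberately tiny primes, chosen here only for readability; it is not secure, it supports addition only (fully homomorphic schemes are considerably more involved), and the essay does not say which scheme it has in mind.

```python
import math
import secrets


def keygen(p: int = 61, q: int = 53):
    """Toy key generation. Real keys use primes hundreds of digits long."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n using generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_total = add_encrypted(n, encrypt(n, 120), encrypt(n, 75))
print(decrypt(n, private_key, c_total))  # the sum, as long as it is below n
```

Here a service that holds only `n` and the two ciphertexts can produce `c_total` without ever seeing 120 or 75; only the holder of the private key can read the result.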
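Related cryptographic techniques, such as Zero Knowledge Proofs (ZKPs) and their compact variant zk-SNARKs, let one party prove that a statement about some data is true without revealing the data itself; some zk-SNARK constructions build on homomorphic properties. A popular cryptocurrency called Zcash uses zk-SNARKs to validate shielded transactions without exposing their details, and lets users share viewing keys with authorized parties who need to see that data.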
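To give a feel for what proving something without revealing it looks like, here is a toy run of the Schnorr identification protocol, a classic interactive proof in which a prover convinces a verifier that she knows a secret number `x` without disclosing it. It is a much simpler relative of zk-SNARKs, not the construction Zcash uses, and the group parameters below are tiny values picked purely for illustration.

```python
import secrets

# Toy group: G generates a subgroup of prime order Q modulo the prime P.
# These numbers are far too small for real use.
P, Q, G = 23, 11, 2


def keypair():
    """Secret x and public value y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover step 1: pick a one-time nonce k and send t = G^k mod P."""
    k = secrets.randbelow(Q)
    return k, pow(G, k, P)


def challenge() -> int:
    """Verifier step 2: send a random challenge."""
    return secrets.randbelow(Q)


def respond(x: int, k: int, c: int) -> int:
    """Prover step 3: the response mixes the nonce and the secret."""
    return (k + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier step 4: check G^s == t * y^c (mod P) using public values only."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keypair()
k, t = commit()
c = challenge()
s = respond(x, k, c)
print(verify(y, t, c, s))
```

The verifier sees only `y`, `t`, `c`, and `s`. Because the nonce `k` is random and used once, the response `s` on its own gives away nothing about `x`, yet someone who does not know `x` cannot reliably answer an unpredictable challenge.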
The blockchain could provide models for non-blockchain solutions, or be part of a hybrid solution to protect privacy.
State channels are blockchain interactions which could occur on the blockchain, but are instead conducted off the blockchain. State channels work through three processes.
Locking: the transaction is locked using a smart contract on the chain.
Interaction: interactions happen off the chain or on a sidechain.
Publishing: after the interactions are complete and the state channel is closed, the smart contract is unlocked and a reference to the transaction is published on the blockchain.
State channels could allow service providers to keep user data private and secure. Transactions could take place off the blockchain with a reference hash of the transaction (revealing no confidential details about the transaction) being saved on the blockchain.
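The three steps can be sketched as a small simulation. Everything in it is a simplification made for illustration: the `Ledger` class is a stand-in for a public blockchain, the `StateChannel` class and its method names are hypothetical, and real state channels also rely on digital signatures and dispute handling, which are left out here.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of obj."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Ledger:
    """Stand-in for a public blockchain: an append-only list of entries."""

    def __init__(self):
        self.entries = []

    def publish(self, entry: dict) -> None:
        self.entries.append(entry)


class StateChannel:
    def __init__(self, ledger: Ledger, balances: dict):
        self.ledger = ledger
        self.balances = dict(balances)
        self.history = []  # off-chain record, never published in full
        self.is_open = True
        # Locking: only the parties and the locked total go on-chain.
        ledger.publish({
            "type": "lock",
            "parties": sorted(balances),
            "locked": sum(balances.values()),
        })

    def transfer(self, payer: str, payee: str, amount: int) -> None:
        # Interaction: happens entirely off the chain.
        if not self.is_open:
            raise RuntimeError("channel is closed")
        if amount <= 0 or self.balances[payer] < amount:
            raise ValueError("invalid amount")
        self.balances[payer] -= amount
        self.balances[payee] += amount
        self.history.append({"from": payer, "to": payee, "amount": amount})

    def close(self) -> str:
        # Publishing: only a reference hash of the interactions goes on-chain.
        self.is_open = False
        reference = digest({"history": self.history, "final": self.balances})
        self.ledger.publish({"type": "close", "reference": reference})
        return reference


ledger = Ledger()
channel = StateChannel(ledger, {"alice": 10, "bob": 5})
channel.transfer("alice", "bob", 3)
channel.transfer("bob", "alice", 1)
reference = channel.close()
print(ledger.entries)
```

However many transfers take place, the public ledger ends up with just two entries: the lock and a closing hash. Either party can later prove what happened by presenting the off-chain history, which anyone can re-hash and compare with the published reference.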
Our private information is currently centralized on the internet and in company databases, hence controlled by a few players. Exponential rates of growth in the creation and collection of data can be expected to continue to erode our privacy. Blockchain-based or blockchain-inspired solutions could help reduce this erosion of privacy, while allowing us to benefit from faster transactions, better service, and more capable Artificial Intelligence algorithms.