Data Minimization and Anonymization: Essential Tools for Reducing Privacy and Security Risk and Enhancing Trust

Innovative companies are increasingly recognizing that when it comes to data, more is not always better, and can sometimes be harmful. By understanding which information actually adds value, companies can use tools such as data minimization and anonymization to simplify compliance with security and privacy laws and to increase trust among consumers. Data minimization refers to efforts to collect or retain less information. Anonymization, by contrast, refers to the manipulation of data a company already possesses, with the intent of rendering re-identification unlikely.

This post will examine a couple of case studies in data innovation, specifically companies that have developed new ways of using data minimization and anonymization to address regulatory requirements, as well as potential security and privacy concerns of their consumer base.

A.  Case Study I:  Healthcare Provider

Healthcare companies are generally subject to a number of regulations, among them the Health Insurance Portability and Accountability Act (HIPAA). This highly complex law includes multiple rules and clauses regulating data collection and storage, which can increase expenses and risk for companies required to comply with it. Some healthcare companies are decreasing their exposure to HIPAA by finding innovative ways to minimize the collection of “protected health information” (PHI).

     i.  HIPAA Scope Overview

HIPAA promulgates privacy and security standards through its Privacy Rule (and Security Rule for ePHI) to protect PHI, or individually identifiable health information. Such information includes “an individual’s past, present or future physical or mental health or condition; the provision of health care to the individual; or the past, present, or future payment for the provision of health care to the individual; and that identifies the individual or for which there is a reasonable basis to believe… can be used to identify the individual”.[i] The broad range of information that falls within this definition includes name, birth date, birthplace, address or phone number, biometric identifiers, social security number, driver’s license number, passport number, medical or health record numbers, email address, and URLs or IP addresses.[ii]

Although the Privacy and Security Rules delineate standards for how an organization should handle this information, this case study focuses on minimizing the PHI collected, which may assist with regulatory compliance by reducing the amount of data to which the rules apply.

     ii.  Innovation Through Anonymization

A San Francisco Bay Area startup has managed to limit its exposure to HIPAA compliance requirements, and simultaneously increase patient trust, by minimizing the data collected about users of the service. This startup is one of a growing number of alternative online healthcare providers that offer services such as appointments with therapists or medical advice from physicians.[iii] On most such platforms, users seeking online advice or healthcare are typically required to register with identifying information. This platform is designed without such a requirement, which provides an added draw for users who value their anonymity.[iv]

The platform promotes anonymity and data minimization through a number of design decisions. First, it allows users to create a profile using a user-generated nickname and does not ask for or require a full name or contact information. The information provided to the healthcare provider is shared in a private online “room” where information relevant to the visit is entered by the user. Second, the provider has direct contact with the user/patient using the online room to correspond and answer questions, but past medical histories and medical images are generally not shared. Finally, the company avoids collecting payment-related identifying data by handling payment via credit card. A user may therefore pay via prepaid credit cards, which are not linked to a specific individual.  (See Case Study II for further developments in anonymous payments.)

This approach prevents the collection of most PHI, with the exception of technical identifiers such as URLs and IP addresses. However, the site or application may simply choose not to log these potential identifiers in order to truly minimize its exposure to HIPAA.
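As a rough illustration of choosing not to log technical identifiers, the sketch below strips IPv4 addresses from log messages before they are written. It is not the startup's actual implementation; the names RedactIPFilter and redact, the placeholder text, and the decision to handle only IPv4 are all assumptions made for this example.

```python
import logging
import re

# Matches dotted-quad IPv4 addresses. IPv6 is deliberately left out of this sketch.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str) -> str:
    """Replace every IPv4 address in the text with a placeholder."""
    return IPV4_PATTERN.sub("[ip removed]", text)


class RedactIPFilter(logging.Filter):
    """Hypothetical filter that scrubs IP addresses from a record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge the message and its arguments first, then scrub the result.
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True  # keep the record, minus the address


# Attaching the filter to a handler means nothing that handler writes contains an address.
handler = logging.StreamHandler()
handler.addFilter(RedactIPFilter())
```

A filter like this only covers application logs. Web servers, load balancers, and analytics tools typically keep their own records, so each would need a similar setting for the identifiers to be left uncollected in practice.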

Although the approach in this case study is likely not a complete way to avoid being subject to HIPAA, it opens a conversation about how a company may make innovative decisions in data minimization and anonymization to reduce the impact of HIPAA and increase user trust.

B.  Case Study II:  Privacy and Security in Online Purchases

Although the online economy grows every year[v] and serves as the default or preferred place to shop and pay bills for a large part of the population, many consumers still distrust ecommerce. Recent developments in credit cards validate some of these concerns. In-person purchases are now better protected by the computerized EMV chip, which replaces the magnetic stripe, the primary vulnerability of earlier credit cards. As a result, attackers are increasingly focusing on online transactions to perpetrate fraud, forgery, and identity theft, since the EMV chip offers no protection online. Since the 2015 switch by the US to EMV chips, new account fraud has increased by 113% and now accounts for 20% of all losses.[vi]

     i. Increases in Data Breaches and Identity Theft Impact Consumer Finances and Confidence

In 2015 the United States experienced a total of 781 data breaches, according to the Identity Theft Resource Center Breach Report; 338 of them involved Social Security numbers, with significantly more records affected than in previous years.[vii] In the same year, hacking incidents accounted for 37.9% of breaches, the highest share in nine years.[viii] In 2014, there were roughly 17.6 million US cases of identity theft,[ix] a rate that held relatively consistent in 2015.[x]

A majority of identity theft and fraud victims reported direct financial loss, with 14% reporting losses in excess of $1,000.[xi] Studies have shown that consumers who lose trust in their financial institutions are often less likely to utilize account monitoring options, which can result in their data being utilized fraudulently for up to 75% longer than consumers who remain trustful of these institutions. This very real and impactful problem is clearly connected to consumer trust, and it is an issue that can be addressed through innovation.

     ii. Innovation in Online Payments Privacy

Currently, consumers who shop or pay bills online usually create an account with identifying data including their personal debit or credit card information, which is often linked to their bank account. If this information is obtained by attackers, either the credit card company (as is often the case with breached credit cards) or the consumer (generally the case with debit cards) is likely to lose the money spent when the card information is used illicitly on other sites.

One company has developed a way to keep this information, including credit or debit card information, private from online retailers and other third parties. A consumer creates an account on the platform (including entering actual financial information) and installs the associated browser extension. When the person visits a website that requires payment information, the payment alternative icon will appear in the form and may be clicked to create a new “card” and underlying payment information solely for the purpose of this site. After that single-site card is charged, funds are withdrawn from a selected funding source identified in the user’s account. Their account also provides them with the ability to cancel or pause cards, control payments, or set limits for charges.
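The single-site card idea can be pictured with a small model. The sketch below is illustrative only and does not describe the company's actual system; SiteCard, issue_card, the per-card limit in cents, and the random token standing in for a card number are all hypothetical names and choices.

```python
import secrets
from dataclasses import dataclass


@dataclass
class SiteCard:
    """Hypothetical single-site card: usable at one merchant, up to a user-set limit."""
    merchant: str          # the only site allowed to charge this card
    limit_cents: int       # spending cap chosen by the user
    token: str             # random stand-in number, not the user's real card number
    spent_cents: int = 0
    active: bool = True    # set to False when the user pauses or cancels the card

    def authorize(self, merchant: str, amount_cents: int) -> bool:
        """Approve a charge only from the bound merchant and within the limit."""
        if not self.active or merchant != self.merchant:
            return False
        if self.spent_cents + amount_cents > self.limit_cents:
            return False
        self.spent_cents += amount_cents
        return True


def issue_card(merchant: str, limit_cents: int) -> SiteCard:
    """Create a new card tied to a single merchant."""
    return SiteCard(merchant=merchant, limit_cents=limit_cents, token=secrets.token_hex(8))


card = issue_card("shop.example", 5000)
card.authorize("shop.example", 3000)   # accepted: right merchant, under the limit
card.authorize("other.example", 100)   # rejected: card is bound to shop.example
```

In this model the retailer only ever sees the token, so a breach at that retailer exposes a number that is useless anywhere else and that the user can deactivate without touching the underlying funding source.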

This innovative approach to providing consumers with the online equivalent of burner credit cards, as well as fairly granular control over their use, offers a variety of advantages. To begin with, should a breach of any of the sites occur, a person would only have to log into their account and cancel their site-specific card and replace it with a new one. The card cannot be used for other purposes, negating the considerable loss often associated with stolen credit and debit card numbers. And if (or when) that data breach does occur, the consumer doesn’t have to wait for a replacement card to arrive in the mail. Considering that most breaches are not detected for at least six months after they occur,[xii] this approach limits potential damage by the attacker.

While the company offering this service does collect a significant amount of information about users, many consumers are likely to have greater trust in this organization than in any credit card processor for several reasons. First, it advertises increased privacy and security controls and restricts the data collected to that necessary for providing the service. Second, its end goal is providing trusted services, not selling additional services as many third parties do. The approach offered by this solution helps to increase consumer trust, which the continually increasing number of breaches has damaged.

C.  Potential Areas for Innovation Application and Research

Just as technology is ever-changing, companies are constantly looking for new ways to innovate in order to keep up with new technologies and regulations and to increase the trust of their consumer base. Through tools such as anonymization and data minimization, businesses have countless ways to advance security and privacy innovation while maintaining business viability and demonstrating their consumer-centric priorities. A couple of areas that could serve as starting points for additional research are discussed below.

Although many pieces of information are useful for a business to deliver its products or services, many companies collect far more information than is actually necessary for this purpose. Auditing the information collected and categorizing it as either absolutely required or optional can give companies the ability to collect less personal information, or at minimum to give customers the option to opt out of providing the additional, unnecessary information.
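Once such an audit has sorted fields into required and optional, the result can be enforced at the point of collection. The following is a minimal sketch under assumed names: the field lists and the minimize function are hypothetical and would come from a company's own audit.

```python
# Hypothetical outcome of an audit: which fields the service truly needs.
REQUIRED_FIELDS = {"email", "shipping_address"}
OPTIONAL_FIELDS = {"phone", "birth_date"}


def minimize(submitted: dict, opted_in: set) -> dict:
    """Keep required fields plus only those optional fields the customer chose to share."""
    missing = REQUIRED_FIELDS - set(submitted)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    allowed = REQUIRED_FIELDS | (OPTIONAL_FIELDS & opted_in)
    # Anything not on the allowed list is dropped before it is ever stored.
    return {name: value for name, value in submitted.items() if name in allowed}


record = minimize(
    {
        "email": "pat@example.com",
        "shipping_address": "1 Main St",
        "phone": "555-0100",
        "birth_date": "1990-01-01",
        "favorite_color": "blue",
    },
    opted_in={"phone"},
)
# record now holds only email, shipping_address, and phone.
```

Filtering against an explicit allow list, rather than deleting known-unwanted fields, means a new form field is not stored until someone has decided it is needed.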

Similarly, companies can identify information that never needs to be read back but does need to be verified. This data can be hashed prior to storage. Hashing differs from encryption in that it is designed to be irreversible: a one-way hash function uses an algorithm to transform input of any length into seemingly random output of a fixed size. While the process is irreversible, it is repeatable, because the same input always produces the same output. Such functionality is useful for information that must be collected for identification purposes and must be secured, but that never needs to be accessed other than to confirm that it matches the same information entered during a later identification or authentication attempt. Hashing is already used successfully in some industries, such as advertising; however, its use could be expanded with additional research and innovation.
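To make the store-then-verify pattern concrete, here is a minimal sketch using Python's standard library. The function names, the sample identifier, and the iteration count are illustrative assumptions. The per-record salt and the use of PBKDF2 are additions beyond what is described above, included because identifiers drawn from a small set of possible values are otherwise easy to guess by hashing every candidate.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch; tune for real hardware


def hash_identifier(value: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these are stored; the original value is discarded."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored record
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def matches(value: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash a later entry with the stored salt and compare in constant time."""
    _, candidate = hash_identifier(value, salt)
    return hmac.compare_digest(candidate, stored_digest)


# At enrollment: keep the salt and digest, not the identifier itself.
salt, digest = hash_identifier("license-D1234567")

# At a later verification attempt: the same input reproduces the same digest.
matches("license-D1234567", salt, digest)   # True
matches("license-D7654321", salt, digest)   # False
```

The stored digest is 32 bytes regardless of input length, and nothing in the record allows the original identifier to be read back; it can only be confirmed when the person supplies it again.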

D. Implications of a Changing Approach to Personal Data

The increasing impact of collecting and storing personal information may inspire more businesses to step back and re-evaluate the circumstances in which it is important to know a person’s specific identity, as well as those when it is not precisely relevant. As a digital society, we are slowly moving further from face-to-face encounters and towards digital interactions. Intentionally or not, the privacy laws which seek to restore some of the anonymity of the analog version of interaction may be spurring us toward this future.  Perhaps privacy is not dead, but simply lives on through the understanding that not all aspects of a person need to be known in order to serve them.

[i] http://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/

[ii] http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html

[iii] http://content.healthaffairs.org/content/21/4/168.full

[iv] http://content.healthaffairs.org/content/21/4/168.full

[v] https://www.census.gov/retail/mrts/www/data/pdf/ec_current.pdf

[vi] https://www.javelinstrategy.com/coverage-area/2016-identity-fraud-fraud-hits-inflection-point

[vii] http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html

[viii] http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html

[ix] http://www.bjs.gov/content/pub/pdf/vit14_sum.pdf

[x] https://www.javelinstrategy.com/coverage-area/2016-identity-fraud-fraud-hits-inflection-point

[xi] http://www.bjs.gov/content/pub/pdf/vit14_sum.pdf

[xii] http://www.zdnet.com/article/businesses-take-over-six-months-to-detect-data-breaches/