Breaking Down the Complexities of Data De-identification in Layman’s Terms
Now that so much of the world is online and connected, we are collectively generating immeasurable quantities of data. An industry, a business, a market segment, or any other entity may view data as a single aggregate resource. For individuals, however, data is better described as our digital footprint.
From the moment we visit a website or open an app, we give out pieces of our Personally Identifiable Information (PII), ranging from names and email addresses to extremely confidential Protected Health Information (PHI) such as our diagnoses, symptoms, diseases, allergies, and more.
This data is stored online and tagged and labelled systematically for easy retrieval and access. However, with growing data exploitation, greater awareness of how data is used, stricter privacy policies, and other factors, healthcare professionals, businesses, and organizations that handle health data are required to comply with HIPAA policies and protocols that protect user data and identities.
This article sheds light on the key mandates and regulations to follow when handling PII and PHI. To that end, we explore the process of data de-identification in detail and explain it in layman’s terms for easy comprehension.
Let’s get started.
What Is Data De-identification?
In the simplest terms, data de-identification is the process of separating an individual’s personal identity from their data. With today’s machine learning (ML) technologies, it is easy for a machine to detect patterns and identify someone from the personal information available. But with such power comes great responsibility. Regulations need to be in place to stop machines or people from tracing an individual’s identity back from the data or information provided.
Contrary to popular belief, data de-identification is not a single-step process. It is a systematic collection of procedures, algorithms, protocols, and tools deployed at various levels to redact specific features and types of personal information. With data de-identification, healthcare professionals can still access the data they need for their work while protecting patients’ privacy.
Data De-identification Methods
HIPAA recognizes two methods for de-identifying data:
● Expert determination
● Safe Harbor method
Though these methods are recognized by HIPAA, neither completely separates personal identity from information; there is still a chance that the data could be re-identified. Their advantage is that they keep the risk of re-identification acceptably low, so if you implement either one properly, you won’t be violating HIPAA’s de-identification requirements. Let’s take a look at each individually to help you decide which may be right for your needs.
Expert Determination
As the name suggests, expert determination involves bringing in an expert who assesses the de-identification techniques, processes, and results. The expert primarily uses the available data and information to connect the dots and see whether records can be traced back to an individual’s identity and, if they can, to what extent.
Based on their observations, experts assess the risks involved in the implementation and procedures followed and share their opinion on how safe and effective the de-identification is. Their review and formal reports must be properly documented and made available to the board or regulators when an audit or investigation is called for.
Experts should also avoid being vague or ambiguous in their assessment of de-identification models. If their reports conclude that the risks are negligible, they need to define precisely what ‘negligible’ means in the context and volume of the dataset.
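HIPAA does not prescribe a single numeric metric for expert determination, but one widely used way to put a number on re-identification risk is k-anonymity. The minimal Python sketch below groups records by quasi-identifiers (attributes such as a ZIP prefix, birth year, and sex that are not identifying alone but can be in combination), finds the smallest group size k, and reports 1/k as the worst-case chance of singling out one record by matching those attributes. The field names and sample records are hypothetical; a real expert would choose the quasi-identifiers and thresholds for the specific dataset.

```python
from collections import Counter

# Hypothetical quasi-identifier fields; an expert would choose these per dataset.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def equivalence_class_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group of indistinguishable records."""
    return min(equivalence_class_sizes(records, keys).values())


def max_reidentification_risk(records, keys=QUASI_IDENTIFIERS):
    """Worst-case chance (1/k) of singling out a record by matching quasi-identifiers."""
    return 1 / k_anonymity(records, keys)


if __name__ == "__main__":
    records = [
        {"zip3": "021", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
        {"zip3": "021", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
        {"zip3": "021", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
    ]
    print(k_anonymity(records))                # 1 (the 1975/M record is unique)
    print(max_reidentification_risk(records))  # 1.0
```

With a measure like this, a report can define ‘negligible’ concretely, for example as k of at least 10 (a worst-case risk of 10%). That threshold is only an illustration, not a HIPAA requirement.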
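Safe Harbor Method
The second method recognized by HIPAA requires removing 18 types of identifiers from the data, and the organization must not have actual knowledge that the remaining information could be used to identify the individual. These identifiers can directly reveal an individual’s identity or give away just enough information to trace them. Below are the 18 identifiers listed under the Safe Harbor method:
● Names
● Geographic subdivisions smaller than a state, such as street address, city, county, and ZIP code (the first three ZIP digits may be kept only if that area contains more than 20,000 people)
● All elements of dates (except the year) directly related to an individual, such as birth, admission, discharge, and death dates, plus all ages over 89 (which are grouped as 90 or older)
● Telephone numbers
● Fax numbers
● Email addresses
● Social Security numbers
● Medical record numbers
● Health plan beneficiary numbers
● Account numbers
● Certificate and license numbers
● Vehicle identifiers and serial numbers, including license plate numbers
● Device identifiers and serial numbers
● Web URLs
● IP addresses
● Biometric identifiers, including finger and voice prints
● Full-face photographs and any comparable images
● Any other unique identifying number, characteristic, or code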
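For structured records, several of these rules can be applied mechanically. The Python sketch below assumes each record is a dict with hypothetical field names. It drops fields treated as direct identifiers, reduces dates to the year, collapses ages over 89 into a single "90+" category (dropping the birth date, since its year would reveal the age), and truncates ZIP codes to three digits. HIPAA requires "000" for three-digit prefixes covering 20,000 or fewer people; that list comes from current Census data, so it is passed in as a parameter rather than hard-coded. This is a simplified subset of the rules, not a complete Safe Harbor implementation.

```python
from datetime import date

# Hypothetical field names treated as direct identifiers; map them to your own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "biometric_id", "photo",
}


def deidentify_record(record, restricted_zip3=frozenset()):
    """Apply a simplified subset of the Safe Harbor rules to one record (a dict).

    restricted_zip3: 3-digit ZIP prefixes whose area holds 20,000 or fewer
    people. The real list comes from current Census data, so it is a parameter.
    """
    # Drop direct identifiers entirely (the input dict is left unchanged).
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Ages over 89 become "90+"; a birth year would reveal the age, so drop it.
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
        out.pop("birth_date", None)

    # Reduce every remaining date field (hypothetically named "*_date") to its year.
    for key, value in list(out.items()):
        if key.endswith("_date") and isinstance(value, date):
            out[key] = value.year

    # Keep only the first three ZIP digits, or "000" for sparsely populated prefixes.
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    return out


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe", "email": "jane@example.com", "zip": "02139",
        "age": 92, "birth_date": date(1931, 5, 4),
        "admission_date": date(2023, 3, 14), "diagnosis": "asthma",
    }
    print(deidentify_record(patient))
    # {'age': '90+', 'admission_date': 2023, 'diagnosis': 'asthma', 'zip3': '021'}
```

Field-by-field rules like these only cover structured columns; identifiers hidden in free text, such as a patient name inside a clinical note, need separate detection.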
Are There Any Tools or Software Applications to De-identify Data?
There are two ways companies can de-identify their datasets. Here is a brief overview:
- Companies can remove identifiers, or the data fields containing them, from their records completely, or encrypt them in a tamper-proof way. They can also change the values of these identifiers to reduce the chance of data re-identification.
- Alternatively, they can use a de-identification API, like the one provided by Shaip, to remove these identifiers from their datasets. Intelligent software solutions can automatically recognize identifiers such as names, personal details, and gender, and eliminate them.
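Shaip’s API is not shown here. As a rough, rule-based stand-in for automated detection, the Python sketch below uses regular expressions to find a few identifier types in free text and replace each one with a placeholder naming its type. The patterns are simplified, US-centric assumptions chosen for illustration.

```python
import re

# Simplified, hypothetical patterns for a few US-style identifiers.
# Real tools use many more rules plus machine-learning entity recognition.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Contact Jane at jane.doe@example.com or (617) 555-0123. SSN 123-45-6789."
    print(scrub(note))
    # Contact Jane at [EMAIL] or [PHONE]. SSN [SSN].
```

The name "Jane" survives, which shows why pattern matching alone falls short: names have no fixed format, so tools typically rely on trained entity-recognition models to catch them.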
Though the first approach is effective, you may later want to re-identify data in-house for research or study purposes. If the identifiers were deleted outright, recovering them at that point becomes difficult.
That’s why it can ultimately be more practical to use a third-party tool or platform to perform the cumbersome task of data de-identification.
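One common way to keep in-house re-identification possible is tokenization with a separately stored mapping. In the minimal Python sketch below, the `TokenVault` class and its methods are hypothetical, not any vendor’s API. Each identifier is replaced with a random token, and the token-to-value mapping is kept so that authorized staff can reverse it later. Because the tokens are random rather than computed from the identifier, they reveal nothing about the person on their own.

```python
import secrets


class TokenVault:
    """Hypothetical reversible tokenization: identifiers are swapped for random
    tokens, and the mapping is kept so authorized staff can re-identify later."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same person keeps one token across records.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Random token, not computed from the value, so it reveals nothing by itself.
        token = "TKN-" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against a very unlikely collision
            token = "TKN-" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In a real system, access here would be restricted and logged.
        # Raises KeyError for a token the vault never issued.
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    record = {"name": "Jane Doe", "diagnosis": "asthma"}
    shared = {**record, "name": vault.tokenize(record["name"])}
    print(shared)                            # name replaced by a random "TKN-..." token
    print(vault.detokenize(shared["name"]))  # Jane Doe
```

In practice, the mapping would live in an encrypted, access-controlled store rather than an in-memory dict, since anyone holding it can reverse every token.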
HIPAA involves many compliance requirements and mandates, so we recommend automating the process for optimum efficiency. To get started, you can use our data de-identification services for your needs. Having de-identified over 100 million documents, we know the protocols and mandates like no other. Get in touch with us today.