This article is based on a July 24 Computerworld piece by Jay Cline, a former chief privacy officer at a Fortune 500 company who is now president of Minnesota Privacy Consultants.
Many organizations use de-identified patient data (that is, data from which all information that could be used to identify individuals has been removed) to support clinical research and development. The Department of Veterans Affairs, for example, has recently announced that it will use such data to improve care for medical conditions ranging from cancer to PTSD. Similarly, private-sector organizations such as IMS Health and SDI Health use de-identified prescriber-level data to create reports that inform the CDC, the US Department of Public Health and most major pharmaceutical companies.
At the moment, the HITECH provisions contained within the American Recovery and Reinvestment Act (also known as the stimulus plan) do not require patient notification when de-identified data are breached.
However, at the request of privacy advocates, the US Department of Health and Human Services has decided to review this policy.
A reversal of the status quo would force health care organizations to notify patients if their de-identified data has been breached.
The consequences of such a reversal would be enormous, as the California experience has shown. California recently mandated that patients be notified when their de-identified data has been breached. This year alone, more than 800 breaches have been reported to the beleaguered state, which has managed to investigate 122 of them.
The state has yet to detect a case involving criminal mischief or an intent to use the information for personal financial gain.
How is Patient Data De-Identified?
The Health Insurance Portability and Accountability Act (HIPAA) says that health information can be linked with particular individuals if it is associated with any of the following “direct identifiers”:
- Names
- Geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of a ZIP code when the area sharing those three digits contains more than 20,000 people
- All elements of dates (except year) for dates directly related to an individual (e.g., date of birth, admission)
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web Universal Resource Locators (URLs)
- IP address numbers
- Biometric identifiers such as fingerprints and voice prints
- Full-face photographic images
- Other unique identifying numbers, characteristics or codes
It follows that de-identification involves scrubbing these elements from the patient files before they are used for research purposes.
According to HIPAA, there are three acceptable ways to de-identify patient data. The first is the “safe harbor” option, in which all 18 identifiers are removed. The second is the “statistical” option, in which a retained statistician determines which of the 18 identifiers can be kept without creating more than a “very small” risk that the data could be re-identified. The third is the “limited data set” technique, in which the organization removes 16 identifiers and protects what remains with special security precautions.
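Because the safe harbor option is essentially a mechanical filter, it can be sketched in a few lines of code. The Python below is a minimal illustration under stated assumptions, not a compliance tool: it assumes a flat patient record stored as a dict, and the field names (`name`, `ssn`, `zip`, `birth_date` and so on) and the `deidentify` function are hypothetical. It drops the direct-identifier fields, truncates ZIP codes to their first three digits (substituting `000` when the caller flags a prefix as covering 20,000 or fewer people), and reduces dates to the year.

```python
from datetime import date
from typing import Any

# Hypothetical field names for a flat patient record; a real system would
# map its own schema onto the HIPAA safe harbor identifier categories.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "face_photo", "other_unique_id",
}

# Hypothetical date fields directly related to the individual; only the year survives.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def deidentify(record: dict[str, Any],
               low_population_prefixes: frozenset[str] = frozenset()) -> dict[str, Any]:
    """Return a new dict with safe-harbor-style scrubbing applied to `record`."""
    clean: dict[str, Any] = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop direct identifiers entirely
        if field in DATE_FIELDS and isinstance(value, date):
            clean[field] = value.year  # keep only the year
        elif field == "zip":
            prefix = str(value)[:3]
            # Three-digit areas with 20,000 or fewer people (supplied by the
            # caller in this sketch) must be suppressed as "000".
            clean[field] = "000" if prefix in low_population_prefixes else prefix
        else:
            clean[field] = value  # non-identifying clinical data passes through
    return clean


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip": "55401",
    "birth_date": date(1970, 5, 17),
    "diagnosis": "PTSD",
}
print(deidentify(patient))  # {'zip': '554', 'birth_date': 1970, 'diagnosis': 'PTSD'}
```

Matching on field names is the weak point of this approach: identifiers buried in free-text fields such as clinical notes would pass straight through, so a real pipeline would need additional scrubbing for those.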
In Part II of this post, we discuss the benefits and risks associated with de-identified patient data and argue that HHS should not reverse current HITECH provisions on the matter.
Glenn Laffel MD, PhD, Sr. VP, Clinical Affairs