The NY Times reported this story on Friday. Sure, this is a great thing for Big Pharma, but sucks for the rest of us. The article mentions how data in so-called anonymous databases can be matched back to patients.
I saw a presentation by Purdue's Prof. Tiancheng Li on how easily this can be done. Here's an example.
The Massachusetts Group Insurance Commission (GIC), which is responsible for purchasing health insurance for state employees, publishes a record for each employee containing zip, dob, sex, diagnosis, procedure, ... A researcher then purchased the Massachusetts voter registration list, which contained name, party, ..., zip, dob, sex. By matching the two lists on just the three attributes they share--dob, sex, zip--the researcher was able to identify the medical record of then Governor William Weld.
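The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the researcher's actual method: the function name `link_records`, the field names, and every record below are made up for the example. It joins a "de-identified" medical table to a named voter table on zip, dob, and sex, and reports a match only when that combination is unique in both tables.

```python
from collections import defaultdict

# The three attributes shared by both lists in the example.
QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def link_records(medical_rows, voter_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, medical_row) pairs whose key combination is
    unique in both tables. Hypothetical helper for illustration."""

    def index(rows):
        # Group rows by their (zip, dob, sex) combination.
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in keys)].append(row)
        return groups

    medical_index = index(medical_rows)
    voter_index = index(voter_rows)

    matches = []
    for key, medical_group in medical_index.items():
        voters = voter_index.get(key, [])
        # Only an unambiguous one-to-one match re-identifies someone.
        if len(medical_group) == 1 and len(voters) == 1:
            matches.append((voters[0]["name"], medical_group[0]))
    return matches


# Entirely fictional data: no names, but quasi-identifiers remain.
medical = [
    {"zip": "00001", "dob": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "00001", "dob": "1962-03-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "00002", "dob": "1971-07-04", "sex": "F", "diagnosis": "condition C"},
    {"zip": "00002", "dob": "1971-07-04", "sex": "F", "diagnosis": "condition D"},
]

# Entirely fictional data: names plus the same three attributes.
voters = [
    {"name": "Voter One", "party": "X", "zip": "00001", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "party": "Y", "zip": "00001", "dob": "1962-03-15", "sex": "F"},
    {"name": "Voter Three", "party": "X", "zip": "00002", "dob": "1971-07-04", "sex": "F"},
]

reidentified = link_records(medical, voters)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this toy data, two of the three voters are tied to a diagnosis. The third is not, because two medical records share the same zip, dob, and sex, so the match is ambiguous. That ambiguity is the kind of protection anonymization techniques try to guarantee for every record.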
This was a fairly benign example. But consider, for example, insurance companies using similar techniques to identify pre-existing conditions, or employers using them to dig into backgrounds of present or potential employees.
We know we can't trust industry to self-regulate, or to place the protection of PII above its own self-interest.
It just so happens that we have two new books that deal with this problem, should you care to solve it.
Guide to the De-Identification of Personal Health Information by Khaled El Emam and
The Complete Book of Data Anonymization: From Planning to Implementation by Balaji Raghunathan.
Click here to read An Overview of Data Anonymization.