HIPAA regulations safeguard patients’ personal health information but can also complicate the process of safeguarding public health. Information aggregated from the medical records of vast numbers of individuals is needed to develop new drug therapies and novel medical treatments, to stop epidemics, and to prevent other life-threatening events.
In a study published in the September-October issue of the Journal of the American Medical Informatics Association, Jeff Friedlin, D.O., of the Regenstrief Institute, Inc., and the Indiana University School of Medicine, discusses a new computer program that may resolve the complex issue of privacy vs. public good. Dr. Friedlin writes about the Medical De-identification System (MeDS), a highly accurate and speedy software program he has developed and successfully tested for de-identifying patient information while retaining the data essential to medical research.
“Medical researchers need data from really large numbers of actual patients, but must protect their privacy. The more data we can access, the better our studies will be. This is not the first software program to remove or ‘scrub’ patient identifiers from medical records, but compared to programs that have been evaluated and described in peer-reviewed studies, it is both broader and more accurate,” said Dr. Friedlin, who is a research scientist at Regenstrief and an assistant professor of family medicine at the IU School of Medicine.
MeDS can eliminate identifying data from history and physicals, discharge notes, and laboratory, pathology and radiology reports. The current generation of de-identifying software concentrates on removing patient identifiers from pathology reports.
The new software program replaces the deleted identifying data with a symbol so the researcher knows something was taken out. To further ensure confidentiality, MeDS does not indicate the nature of what was removed. “This software does something that a human could easily do but in a fraction of the time and expense. A human could ‘white out’ personal identifying information in 10 hefty medical records in about 6 hours. MeDS can do the same thing in under two minutes,” said Dr. Friedlin.
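The placeholder idea can be illustrated with a minimal Python sketch. It is not the MeDS implementation, which the study summary here does not describe. The function name `scrub`, the `***` marker and the sample note are all hypothetical, and the sketch assumes the identifiers to remove are already known. Every identifier is replaced with the same marker, so the output shows that something was removed without revealing whether it was a name, a date or something else.

```python
import re

# Hypothetical marker; the source does not say which symbol MeDS uses.
PLACEHOLDER = "***"

def scrub(text, identifiers):
    """Replace each known identifier with the same uniform placeholder."""
    for identifier in identifiers:
        # Whole-word, case-insensitive match; re.escape keeps characters
        # such as "/" or "." in the identifier literal.
        pattern = r"\b" + re.escape(identifier) + r"\b"
        text = re.sub(pattern, PLACEHOLDER, text, flags=re.IGNORECASE)
    return text

# Invented sample note, for illustration only.
note = "Mr. Smith was seen on 03/14/2008 for chest pain."
scrubbed = scrub(note, ["Smith", "03/14/2008"])
print(scrubbed)  # Mr. *** was seen on *** for chest pain.
```

Because the name and the date receive the identical marker, a reader of the scrubbed note cannot tell what kind of information was taken out.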
MeDS is the first system described in peer-reviewed literature that attempts to detect and eliminate misspelled names. In addition to deleting the patient’s name, Smith, for example, MeDS also is able to find and delete misspellings like Ssmith or Smithh or Smmith or even mith. While acknowledging that this sometimes leads to eliminating information that does not identify the patient (“red” being eliminated from the record of a patient whose name is “Reed”), Dr. Friedlin says he would rather accept some degree of what he calls over-scrubbing than risk release of personal data by setting the bar too low.
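One common way to catch misspellings of this kind is approximate string matching. The Python sketch below uses Levenshtein edit distance as an assumed stand-in. The source does not say which technique MeDS actually uses, and the names `edit_distance`, `scrub_name` and `max_distance` are hypothetical. Any word within one single-character edit of the patient's name is replaced, which also reproduces the over-scrubbing trade-off described above.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def scrub_name(text, name, max_distance=1, placeholder="***"):
    """Replace every word within max_distance edits of name (assumed threshold)."""
    def replace(match):
        word = match.group(0)
        if edit_distance(word.lower(), name.lower()) <= max_distance:
            return placeholder
        return word
    # Examine each run of letters as a candidate word.
    return re.sub(r"[A-Za-z]+", replace, text)

print(scrub_name("Ssmith Smithh Smmith mith Smith", "Smith"))
# *** *** *** *** ***
print(scrub_name("The red rash has faded.", "Reed"))
# The *** rash has faded.
```

The second call shows the cost of a permissive threshold: "red" is one edit away from "Reed", so a clinically meaningful word is removed along with the name. Tightening `max_distance` to 0 would keep "red" but would miss every misspelling in the first call.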
No system is infallible. What information might MeDS neglect to protect? Not much, according to Dr. Friedlin, although something similar to “the patient is a former president of the United States with Alzheimer disease” would not be caught.
MeDS has been tested on data from the Regenstrief Medical Record System, a large repository (more than 660 million distinct observations) of 35 years of patient data, and on data from other institutions.
Regenstrief Institute medical informatics research scientists comprise one of the largest medical informatics physician brain trusts in the United States.
Study co-author Clement McDonald, M.D., is an internationally recognized medical informatician. He is the “father” of the Regenstrief Medical Record System and directed the informatics program at the Regenstrief Institute for three decades. He is currently the director of the Lister Hill Research Center at the National Library of Medicine.