I am not an information technology guy, but by now even I have heard the term "big data" applied to all sorts of topics in our public discourse, from government and accounting to the environment and biology.
Well, consider this big-data issue: There are more than 3 billion (3,000,000,000) nucleotide letters in an individual’s genetic code – and the diagnostic use of "whole genome sequencing" aims to spell out the entire code of an individual to find the one or two variant letters that are causing that individual’s rare disease presentation. Oh, and by the way: There will be about 3 million (3,000,000) variant letters in each genome, meaning positions where that person’s sequence differs from the standard reference genome – you just need to figure out which of these matter and which do not.
Some relatively simple concepts can be applied to filter the data. One is to compare the variants found in an individual against those found in large population groups, then eliminate the "common variants" as candidates for the cause of a rare disease. Another is to generate the complete data set from the patient’s relatives as well, then compare the patient’s data with the family members’ data.
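For readers curious what the "eliminate the common variants" step looks like in practice, here is a minimal sketch in Python. It assumes each variant is identified by chromosome, position, reference letter and alternate letter, and that a lookup table of population allele frequencies is already available; the `Variant` class, the `filter_common` function, the 1% cutoff and the toy coordinates are all hypothetical choices for illustration, not part of any particular laboratory’s pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal description of one variant letter.
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_common(variants, population_af, max_af=0.01):
    """Keep variants whose population allele frequency is at most max_af.

    The 1% default cutoff is an illustrative assumption. Variants missing
    from the population table are treated as frequency 0.0 (novel) and kept.
    """
    return [v for v in variants if population_af.get(v, 0.0) <= max_af]


# Toy data: three variants found in a patient (coordinates are made up).
patient = [
    Variant("1", 1000, "A", "G"),
    Variant("2", 2000, "C", "T"),
    Variant("7", 3000, "G", "A"),
]
# Toy population frequencies: the first variant is common, the second rare,
# and the third was never seen in the reference population.
population_af = {
    Variant("1", 1000, "A", "G"): 0.35,
    Variant("2", 2000, "C", "T"): 0.002,
}

candidates = filter_common(patient, population_af)
print(candidates)
```

Here the common variant on chromosome 1 is dropped and the rare and novel variants remain as candidates. Applied to a real genome, this one step shrinks the roughly 3 million variants considerably, but it cannot by itself single out the causal one.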
Family data can be used in several ways: to positively select variants that a patient shares with family members who have the same medical diagnosis, or to reveal that both parents each carry a single mutation in cases where their child has a rare autosomal recessive disease (one caused by two mutations, one inherited from each parent).
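The two family-based strategies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each person’s genotype is recorded as a hypothetical mapping from a variant ID to the number of copies of the variant letter they carry (0, 1 or 2), and a hypothetical `gene_of` table maps each variant ID to its gene. The recessive check covers both the case where the child inherits the same mutation from both parents and the case where the child inherits two different mutations in the same gene, one from each parent. A real analysis would also have to handle sequencing errors, new (de novo) mutations and sex-linked inheritance, which this sketch ignores.

```python
from collections import defaultdict


def shared_by_affected(genotypes, affected):
    """Variant IDs carried (at least one copy) by every affected relative."""
    carried = [
        {vid for vid, count in genotypes[person].items() if count >= 1}
        for person in affected
    ]
    return set.intersection(*carried) if carried else set()


def recessive_candidates(child, mother, father, gene_of):
    """Variants fitting an autosomal recessive pattern in a parent-child trio.

    Homozygous: the child has two copies and each parent exactly one.
    Compound heterozygous: within one gene, the child carries one variant
    inherited only from the mother and a different one only from the father.
    """
    homozygous = {
        vid for vid, c in child.items()
        if c == 2 and mother.get(vid, 0) == 1 and father.get(vid, 0) == 1
    }
    # Group the child's single-copy variants by which parent they came from.
    from_mother = defaultdict(set)
    from_father = defaultdict(set)
    for vid, c in child.items():
        if c != 1:
            continue
        m, f = mother.get(vid, 0), father.get(vid, 0)
        if m == 1 and f == 0:
            from_mother[gene_of[vid]].add(vid)
        elif f == 1 and m == 0:
            from_father[gene_of[vid]].add(vid)
    # A gene is a candidate only if it received a hit from each parent.
    compound = set()
    for gene in from_mother.keys() & from_father.keys():
        compound |= from_mother[gene] | from_father[gene]
    return homozygous, compound


# Toy trio data (all IDs and gene names are made up).
child = {"v1": 2, "v2": 1, "v3": 1, "v4": 1}
mother = {"v1": 1, "v2": 1, "v4": 1}
father = {"v1": 1, "v3": 1, "v4": 1}
gene_of = {"v1": "GENE_A", "v2": "GENE_B", "v3": "GENE_B", "v4": "GENE_C"}
homozygous, compound = recessive_candidates(child, mother, father, gene_of)

# Toy data for a patient and an affected relative.
genotypes = {
    "patient": {"v1": 1, "v5": 1, "v6": 0},
    "affected_uncle": {"v1": 1, "v6": 1},
}
shared = shared_by_affected(genotypes, ["patient", "affected_uncle"])
print(homozygous, compound, shared)
```

In the toy trio, v1 fits the pattern of the same mutation inherited from both carrier parents, while v2 and v3 are two different mutations in the same gene, one from each parent. The child’s copy of v4 could have come from either parent, so it is not counted.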
The simple concepts aside, this is a massive data analysis task requiring many kinds of clinical and scientific expertise – and the strategies for this interpretation are not fully worked out yet.
Enter into this complex situation a challenge, and a prize.
The Children’s Hospital of Boston (CHB) designed its recent CLARITY (Children’s Leadership Award for the Reliable Interpretation and Appropriate Transmission of Your Genomic Information) Challenge as a way to "inform the creation of much-needed ‘best practices’ in genome analysis, interpretation, and reporting – providing the most meaningful results to patients and their families."
The initiative attracted 23 research teams from around the world, which agreed to receive abstracted medical records and genomic information for three children with genetic disorders of unknown cause, and for their parents. The teams’ challenge was to interpret the genomic data and identify the diagnostic genetic changes in each child’s genome. Two of the children had severe neuromuscular disease, and one had died of congenital heart defects.
As incentives, the initiative offered a $15,000 prize to the winning team and $5,000 prizes to finalists.
The 23 groups all approached the diagnostic problem in slightly different ways. None of the teams was perfect, but the judges recognized eight of the teams for excellence in one or more aspects of the process. And a team led by Brigham and Women’s Hospital and including researchers from Massachusetts General Hospital, Partners Laboratory for Molecular Medicine, Brown University, and Utrecht University shared the $15,000 prize.
The three families’ stories are shared on the CHB website.
Medical details aside, we as a medical community and as a society are indebted to the brave families who agreed to share their difficult stories.
Although the "best practices" remain to be better defined, the CLARITY Challenge process has reassured many of us in the field that we are getting closer to standardizing solutions for the most intimate of "big data" issues – namely, the one each of us carries around in every cell of our bodies.
Dr. Murray is the director of clinical genomics at Geisinger Health System in Danville, Pa.
