Thank you to all of you who joined the conference call yesterday to help us decide on the initial set of trios from the Personal Genome Project (PGP), in addition to possibly NA12878. NIST began the call with an update on the current status of the NIST IRB process for genomes to be distributed as NIST Reference Materials. NIST is currently seeking IRB approval for distribution of CEPH/Utah NA12878 and genomes from the PGP, as summarized in a recent blog post (http://genomeinabottle.org/blog-entry/post-ashg-update-genome-bottle). We then discussed the criteria for selecting an initial set of trios from the PGP, as well as desired characteristics that are not currently represented in the PGP. You can see the existing PGP trios, along with comments about distinctive trios (click on “Comments”), at https://docs.google.com/open?id=0B7Ao1qqJJDHQUXdXNHFVRjNyOVE. To select the proposed initial set of 3 trios described below, we discussed several characteristics that might be attractive for Reference Materials:
1. Ethnic diversity: Since the goal of Genome in a Bottle Reference Materials is to assess technical sequencing performance, we do not inherently need people from every ethnic group, as a population genetics study would. However, the general consensus was that some ethnic diversity is useful for assessing technical performance. For example, African-Americans generally have more variants relative to the NCBI Reference Assembly, and admixed Hispanic genomes would have distinct phasing challenges. Francisco de la Vega suggested that a recently admixed genome and a distantly admixed genome could pose different types of phasing challenges. Lisa Brooks suggested the following distribution of ethnicities:
2 European-ancestry – one northern/western, one southern/eastern
2 African-American – one AA, one African, or two AA from different parts of the US.
2 Latino – from different ancestral places, US or South/Central America
1 East Asian
1 South Asian
2. Family size: One PGP quartet has two monozygotic twin daughters, which might have interesting properties, though it is not immediately apparent how useful they would be for assessing technical performance. It was suggested that it would be valuable to have at least one quartet with two children who are not monozygotic twins, since sequencing both children would allow phasing of the parents and not just the children (see http://www.ncbi.nlm.nih.gov/pubmed/20220176). Currently, no such quartet is represented among the PGP participants.
3. Male/female split – ideally half of the children should be male and half female.
4. Presence of known genetic disease – The primary goal of the Reference Materials is to assess technical performance rather than clinical interpretation methods, but since one trio has Stickler syndrome, we discussed whether there is added benefit to having genomes with mutations known to cause simple genetic diseases. The consensus was that this is a very low priority, since every genome has variants that are related to diseases, so any of them can still be used to assess some clinical interpretation methods if needed. In addition, labs could easily tune their pipelines to detect the single mutation in a reference material.
5. Cell lines with engineered mutations: Joshua Kapp from Horizon Diagnostics said that they have been looking into the informed consent for the cell line MCF10A from ATCC, and will send information to NIST. This cell line is not from a trio, but Horizon has engineered many mutations into it for reference materials to test detection of specific mutations. Horizon will follow up with NIST about possibly developing normal MCF10A into a reference material. We are interested in the relative priority this has for consortium members, so we would appreciate your feedback. In particular, how much value would be added by characterizing the whole genome sequence of the unengineered MCF10A and making it into a NIST Reference Material?
Based on these discussions, we propose that we select three PGP trios for the initial Reference Materials (along with using existing data for NA12878 for bioinformatics methods development and releasing it as a Reference Material if the NIST IRB approves it): the East Asian trio with son (hu91BD69/hu38168C/huCA017E), the Ashkenazi Jewish trio with Eastern European ancestry with son (huAA53E0/hu8E87A9/hu6E4515), and the Caucasian quartet with 2 monozygotic twin daughters (huCDC3B8/huFE01E1/hu1E8957/hu961968). Blood from these trios has been sent to Coriell, and cell lines are under development but not yet completed. The PGP has posted a blog entry (http://blog.personalgenomes.org/2012/11/29/seeking-diversity/) asking for volunteers with more ethnic diversity, so we plan to revisit their list of trios over the next year, with the hope of achieving a diverse set of 8 trios. As always, we seek your feedback, so please let us know if you have any other suggestions.
Other discussions on the call:
NCI has been developing synthetic DNA spike-ins with known cancer-associated mutations that are barcoded so they can be spiked into sequencing experiments, and they have started testing these.
Several people on the call thought that it would be valuable to characterize DNA from blood and/or saliva of at least some of the trios, since it would be useful for better understanding Mendelian inconsistencies. The Reference Materials would still be developed from cell lines so that they can be renewed indefinitely, but having DNA sequence from blood or saliva would be useful supporting information for understanding changes induced by cell line immortalization. PGP noted that they occasionally take blood draws from participants, but they don’t want to over-burden them. Therefore, they have developed methods to extract 40-50 mg of DNA per ml of saliva, with the majority being human DNA, and have sequenced ~50 genomes from saliva. Saliva is less of a burden to participants, so it can be collected more frequently, and it also has medical relevance.
We discussed whether the PGP consent allows participants to withdraw from the study. It does, but with the understanding that it will likely be impossible to destroy existing cell lines and sequence data.
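For readers less familiar with the Mendelian inconsistencies mentioned above, here is a minimal Python sketch of how one might flag them in a trio. This was not presented on the call. The function name, site names, and genotype calls are all hypothetical, and the check assumes diploid autosomal sites with no missing calls. A flagged site could reflect a sequencing error, a de novo mutation, or a change introduced during cell line immortalization, which is why sequence from blood or saliva would help tell these apart.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype could be inherited from the parents.

    Each genotype is a pair of alleles, e.g. ("A", "G"). Assumes a diploid
    autosomal site with no missing calls (an illustrative simplification).
    """
    child_sorted = sorted(child)
    # The child receives exactly one allele from each parent.
    return any(sorted((f, m)) == child_sorted for f, m in product(father, mother))


# Hypothetical genotype calls at three sites: (father, mother, child).
sites = {
    "site1": (("A", "A"), ("A", "G"), ("A", "G")),
    "site2": (("C", "C"), ("C", "C"), ("C", "T")),
    "site3": (("G", "T"), ("G", "T"), ("T", "T")),
}

inconsistent = [
    name for name, trio in sites.items() if not is_mendelian_consistent(*trio)
]
print(inconsistent)
```

In this made-up example only site2 is flagged, because neither parent carries the T allele seen in the child.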
Thanks again to all who participated on the call, and please leave comments and questions here.