Personal Genome Project genomes and ethnic diversity

Jason Bobe and Pete Estep from the Personal Genome Project (PGP) have made a list of 11 enrolled participants who are part of father-mother-child trios that might be useful for Reference Materials. They are at different stages of having cell lines made at Coriell, with the parents of one Caucasian trio in the QC process. Most of the trios self-identify as White or Caucasian. The exceptions are one trio that self-identifies as Asian (hu91BD69/hu38168C/huCA017E), one that self-identifies as mixed White plus American Indian/Alaskan Native (hu620F18/huD4BF17/huD62596), and one of unknown ethnicity (hu1053CC/huFAF1FE/hu40D515). Note that rows 6 and 7 of the list are twin sisters with the same parents.

We are interested in your feedback about the desired ethnic diversity for NIST Reference Materials. Currently, there are no trios with African-American or Hispanic ancestry. Genomes with these ancestries may be of interest because, I think, they generally have more differences from the reference assembly and shorter haplotype blocks. How important are such genomes for performance assessment of whole genome sequencing? To achieve greater diversity, one path forward would be to select a few trios from the PGP now, after which the PGP could ask participants whether they have family members interested in enrolling, particularly for ethnicities not currently represented in the set of trios. Your input as part of the consortium will help the PGP focus on which ethnicities or family structures are most important to include, so please leave comments on the webpage or contact us if you have opinions. You are also welcome to click on the links for each genome here (https://docs.google.com/open?id=0B7Ao1qqJJDHQUXdXNHFVRjNyOVE) to see whether any have phenotypes or other characteristics that would be useful for Reference Materials.

(1) Trio w/ cell lines farthest in development at Coriell
Father:
PGP Public Profile: https://my.personalgenomes.org/profile/hu6E4515
Coriell accession ID: GM24149
Lymphoblast Cell Line Status: As of 9/7 this cell line was established and "pending QC".
Genome Data: Publicly available (linked in public profile above), Complete Genomics platform
DNA from blood or saliva: Available from PGP

Mother:
PGP Public Profile: https://my.personalgenomes.org/profile/hu8E87A9
Coriell accession ID: GM24143
Lymphoblast Cell Line Status: As of 9/7 the cell line was established and "pending QC".
Genome Data: Genome is expected from Complete Genomics before November 8.
DNA from blood or saliva: Available from PGP

Son:
PGP Public Profile: https://my.personalgenomes.org/profile/huAA53E0
Coriell accession ID: GM24385
Lymphoblast Cell Line Status: Coriell received the blood sample on 9/27 and hopefully began creating a cell line immediately.
Genome Data: Genome sequence expected from Complete Genomics in 3-6 months.
DNA from blood or saliva: Available from PGP