In the Reference Material (RM) Selection and Design working group, there was considerable discussion about selecting genomes with consents that are appropriate for use as RMs from the consortium.
Two primary issues were raised:
(1) The risk of re-identification of the individual through use of genomic and other information will be higher for a genome chosen as a national RM than for genomes used in population genetics or other research.
(2) How extensive should the consent for commercialization be (e.g., for commercial use, commercial redistribution of derived products, etc.)?
Because of the extensive data available for the HapMap/CEPH/1000 Genomes sample NA12878 and her pedigree, much discussion has revolved around whether her consent (most recently for the HapMap Project) is appropriate for a NIST RM. Personal Genome Project samples have also been proposed as attractive genomes due to their broad open consent for re-identification and commercialization, as well as other materials such as iPSCs and tissues.
Since the meeting, we've had a number of discussions by email and in person, which we've included below. In this forum, we hope to make this a transparent, public discussion to get input from all interested parties so that we can make the best decision in consultation with our NIST IRB, legal staff, and others.
Email discussion so far, with most recent emails at the top:
---------------------------------------------
Dear Colleagues --
Thank you all for your input and discussion regarding the propriety of the consent for NA12878 to be used as the first NIST Whole Human Genome Reference Material. After discussing the HapMap consent with Jean McEwen and Lisa Brooks at NHGRI, we think that it is probably best to use PGP samples for NIST Reference Materials. The primary concerns with NA12878 involve (1) the high profile of this as the first NIST genomic Reference Material, leading to a greater risk of re-identification than originally anticipated by the consent and (2) the lack of consent for commercial redistribution and other possible uses in the future, such as creation of induced pluripotent stem cells.
Fortunately, we can still learn much from analyses of the existing data for NA12878, and will certainly apply these lessons to the NIST Reference Materials. We are still really interested in this discussion and in your input. If there is consensus that we should move forward with PGP samples, we hope to select at least the first one or two trios from the PGP project, and will start the process to gain IRB approval here at NIST.
Please feel free to respond directly to this mail, to cc others as appropriate, or contact us directly if you have concerns or other opinions.
Best regards,
Marc Salit
---------------------------------------------
On Aug 25, 2012, at 8:32 AM, george church wrote:
I agree. A transparent public record (as you mentioned at the Aug 16-17 meeting) sounds like a good idea.
---------------------------------------------
On Aug 24, 2012, at 5:00 PM, Salit, Marc L. wrote:
Ditto Linda Beth's thanks -- this is a good analysis that will help us ask the right questions and make the right decisions.
Any further thoughts from anyone?
We'll keep this email list apprised of anything we learn from further discussion at NIST -- and unless I hear objection to it, we'll plan to make this email chain public on the very-soon to be spun-up genomeinabottle.org consortium site, so everyone knows what's going on. I think this conversation should be transparent and open for discussion on that site.
Best regards -
Marc Salit
---------------------------------------------
On Aug 24, 2012, at 9:15 AM, Schilling, Linda Beth wrote:
Thanks so much for all the additional insights. This will be very helpful.
Linda Beth
Linda Beth Schilling
Senior Coordinator and Policy Advisor for Human & Animal Subjects Research at NIST
Office of Special Programs
Laboratory Programs Office
National Institute of Standards and Technology
---------------------------------------------
From: george church
Sent: Friday, August 24, 2012 7:12 AM
Subject: Re: NA12878 consent?
Below are some comments for NIST legal folks to consider, including some from Dan Vorhaus http://www.genomicslawreport.com/index.php/author/dvorhaus/
The three issues for 12878 are: 1) consent for non-research commercial use, 2) explicit consent for re-identification, and 3) removing samples (not just data) after withdrawal of the HapMap participant (or child, in case of death of participant, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2106154). As was said by several people at the Aug 16-17 meeting, this is a higher profile project and less controllable than previous research use; the general sentiment was to see this as a new project and keep it from getting off on the wrong foot for an expedient that will probably seem tiny soon, since the technology is changing so swiftly. So why not reconsent 12878 (or children, if deceased) to specifically address these three issues? If it is hard to get reconsent, then that is another red flag.
LB: "The possibility of accumulated genetic information being used eventually to identify people is very general for genomic research, but does not prevent use of samples, by NIH policy."
GC: Yes somewhat general, but not totally, since some protocols do squarely address re-identification (and enable frequent recontact). Also, recent NIH policy (e.g. for dbGAP samples) aims at requiring researchers to promise to keep samples away from people who might re-identify and/or identify high penetrance traits. This is hard enough with research-use, but will be even harder for the proposed much more widespread use.
LB: The one possible sticky point is “The Repository does not let anyone sell material from samples or cell lines.” This was meant to prevent secondary distribution by companies, not necessarily to prevent a government agency from distributing a standard.
GC: I agree that this is sticky. What "was meant" is not aligned with what is said. "Anyone" includes governments and their employees. Even if we stick to spirit rather than letter of the law, and even if we can guarantee that all DNA recipients refrain from redistribution, nevertheless the non-research uses (for example as part of diagnostic clinics leading to abortions) may not meet expectations of the HapMap participant.
=============================================
HapMap Consent Provisions & DBV Commentary
=============================================
"It also will not include any information that could identify who the individual people or families are." (pg 1)
[Note the explicit and absolute promise of de-identification, although this is qualified by the disclaimer on pg 2.]
"Because the database will be public, people who do identity testing, such as for paternity testing or law enforcement, may also use the samples, the database, and the HapMap, to do general research. However, it will be very hard for anyone to learn anything about you personally from any of this research because none of the samples, the database, or the HapMap will include your name or any other information that could identify you or your family." (pg 2)
[The de-identification language is, as we have discussed, problematic. It suggests that any re-identification is very unlikely, and does not discuss any of the potential consequences should such identification occur. I am not entirely familiar with how these samples are being proposed to be used but, presumably, if one or more is being used as a "national reference standard genome" then the risk of re-identification increases simply due to frequency of use and the potential for greater interest in breaking anonymity. Note also that the language here is all framed around "research" uses.]
"The Repository will send the cell lines to researchers around the world to create the HapMap and to use in many future genetic studies as described in this form. The researchers will have to follow all U.S. and international laws and guidelines that apply to research. All studies using the cell lines from the Repository will have to be approved by the Institutional Review Board (IRB) of the Repository." (pg 2)
[As above, the discussed uses are all research in nature, with no mention of anything other than an IRB-approved research study.]
"The Repository does not let anyone sell material from samples or cell lines. However, information from genetics research sometimes helps companies make products to diagnose or treat diseases. If information from your family’s cell lines leads to making a product, it would probably contribute only in a very small way. Also, because the cell lines will not have names on them, neither the researchers nor anyone at the Repository would know if your samples were even used. So you will not get any additional payment for having your sample used in this project." (pg 2-3)
[This appears to me to be a fairly explicit ban on the direct sale of materials, including cell lines. What it does not prohibit is secondary commercial uses, for instance the commercialization of a diagnostic or therapeutic emerging out of a research laboratory where the underlying research was performed using the HapMap consented sample. I think that, per Lisa's email, "commercial research" is really only allowed in the sense that it is disclosed as an unavoidable byproduct of primary scientific research performed in non-commercial settings. I do think that is a distinction.
To help see the distinction more clearly, it might be useful to compare the HapMap's language to the explicit authorization for third party commercial use provided by the PGP's consent, Section 8.4, which reads in relevant part: "However, information and materials that you provide, including DNA sequence data and cell lines derived from your tissue samples or specimens, may be made available to third parties for research, patient care, commercial or other purposes, and these third parties may commercially profit from the data or other information that you contribute to the PGP."
On balance, I think the language in this section, as well as the focus elsewhere in the document on exclusively research uses, suggests that a participant's expectation would almost certainly be that his or her samples and resultant cell lines would not be made directly available for commercial research, even if they might incidentally or indirectly further commercial objectives.]
"How will you protect my privacy? We will protect your privacy carefully, just as we have always done in the past. The only people who will know your name or any other personal identifying information will be the clinic coordinator, the physician, and the principal investigator for the Utah Genetic Reference Project at the University of Utah. We will not give this information to anybody else. While the University of Utah will keep your new, signed consent form, nobody else will see it. The sample stored at the Repository and used for the HapMap will not have your name on it. Although it will have a code number, nobody except us will know the name of the person the code number is linked to. So nobody at the Repository or who studies your sample will know that it came from you." (3)
[In the section devoted specifically to privacy, no mention is made of the possibility that privacy might be compromised, either intentionally or accidentally, furthering the assessment that, despite the earlier and arguably insufficient disclaimer, the participant would be led to believe that there are no meaningful privacy or related concerns associated with participation.]
"What are the risks of having my sample used for this project? If your family’s samples are used, lots of genetic information from your samples will be put in the database, and lots of people will be able to look at it for any purpose. However, there are only a couple of ways anybody could trace the information back to you. One is if they thought your information might be in the database, got another sample from you, did many tests on that sample, and then compared the genetic information from those tests with the information in the database. The other is if somebody compared the information in the database with genetic information known to be from you that was in another database and figured out who you were. The risk of either of these things happening is very small, but it may grow in the future."
[This section at least acknowledges that re-identification is possible, but asserts that its risk is exceedingly small. It also acknowledges that this risk profile may change. I understand that NIH's current policy is to permit research premised upon de-identification, even when there is a risk of re-identification. However, as mentioned above, additional consideration may need to be given to this issue if this or similar samples might be used in a manner that creates greater visibility and, potentially, risk of re-identification than was perhaps initially contemplated.]
=============================================
/HapMap Consent Provisions & DBV Commentary
=============================================
---------------------------------------------
On Aug 23, 2012, at 4:38 PM, Salit, Marc L. wrote:
Thanks Mark -
We'll hold off on celebrating for now -- we haven't put anything through our legal folks for approval yet. The HapMap reconsent looks reasonably appropriate; there is at least one concern I've heard upon a more careful read. We're not certain that we understand all the implications of the terms of withdrawal in that document (see attached page 4 "Can I change my mind…" and checkbox 2 on the consent signature page). We'll be looking at that, and the history of the original and re-consents for this genome, when we meet with our colleagues at NHGRI.
We hope to know more in the next week or so.
Best regards,
Marc
---------------------------------------------
On Aug 23, 2012, at 2:28 PM, Mark Depristo wrote:
Hi Marc and Justin,
Thank you for the update. Your conclusion is very reassuring.
Best,
Mark
---------------------------------------------
On Wed, Aug 22, 2012 at 12:05 PM, Salit, Marc L. wrote:
Hi Lisa --
Thanks for the information and the invite to chat with you and Jean -- we'll take you up on that. We've spoken with Linda Beth Schilling, who's championing the IRB process for biologicals here at NIST, to give her an update on the questions that arose at last week's meeting, and we're now comfortable with the HapMap consent for NA12878, which we'll plan to use as our pilot reference material.
At this point, the PGP samples are very attractive for the balance of the reference material portfolio, as there is a broad consent for commercial use, and primary tissues and iPSCs are available for some of them.
We'd like to follow up with Jean and you to learn more of the history of the re-consenting of the HapMap samples from the CEPH/NIGMS collection (this mostly so we can answer questions at NIST), and to review any ELSI considerations you might see for our reference material project.
Best regards,
Marc and Justin
--
Marc Salit, Ph.D.
Leader, Multiplexed Biomolecular Science Group
NIST
Materials Measurement Laboratory
100 Bureau Drive, Stop 8313
Gaithersburg, MD 20899-8313
---------------------------------------------
On Aug 21, 2012, at 2:15 PM, Brooks, Lisa (NIH/NHGRI) [E] wrote:
The CEU consent form is at
http://hapmap.ncbi.nlm.nih.gov/downloads/elsi/CEPH_Reconsent_Form.pdf
All the CEU samples were reconsented specifically to be in the HapMap project and future projects, and to have their data released publicly on the web. The cell lines may be used for RNA and protein studies. Commercial research and forensic research are allowed.
The form says that "However, it will be very hard for anyone to learn anything about you personally from any of this research because none of the samples, the database, or the HapMap will include your name or any other information that could identify you or your family."
The possibility of accumulated genetic information being used eventually to identify people is very general for genomic research, but does not prevent use of samples, by NIH policy.
The one possible sticky point is "The Repository does not let anyone sell material from samples or cell lines." This was meant to prevent secondary distribution by companies, not necessarily to prevent a government agency from distributing a standard.
Jean McEwen, the NHGRI ELSI expert who oversaw the HapMap consent process, is out this week but will return on July 27. We should talk with her when she returns.
Best regards, Lisa.
---------------------------------------------
From: Mark Depristo
Sent: Tuesday, August 21, 2012 12:45 PM
Subject: NA12878 consent?
Hi all,
At the Genomes in a Bottle meeting last week George Church (CC'd) suggested that NA12878 might not be consented for commercial activities. This seems in direct opposition to her being included in HapMap and 1000 Genomes, as well as her cell lines being sold by Coriell. Is her cell line restricted to research purposes only? If so this is a great surprise to me. It's critical to resolve this issue as there's some discussion that NA12878 could not be used as a national reference standard genome because of this issue.
Best,
Mark
--
Mark A. DePristo, Ph.D.
Associate Director, Medical and Population Genetics Analysis
Broad Institute of MIT and Harvard

Table of possible genomes to be used as RMs and characteristics
To help us decide which genomes to select as RMs for the consortium, I've made a spreadsheet in Google Docs (https://docs.google.com/spreadsheet/ccc?key=0ArAo1qqJJDHQdEhiei04aDQ4b0Z...). Please add additional information to this, as well as any columns or rows we are missing. Thanks for your input!
What are the relative priorities for genomes?
Based on our discussions so far regarding selection of the first NIST Reference Materials, it looks like we have a few options with different advantages and disadvantages, so we are interested in your input. You can see the table I posted above for characteristics of various potential genomes.
Option 1: Use a PGP genome that already has a cell line established, so we could get started with sequencing immediately with a genome that has a broad, open consent for full commercialization and re-identification. The most likely choices for this path would be GM21846/hu604D39 (an African-American male with CG data available) or PGP1/GM20431/hu43860C (a Caucasian with CG-LFR data and iPSCs already established). The major disadvantage of this option is that cell lines from parents are not available for either of these individuals. Future genomes would be selected from trios, but we need to decide if parents are needed for the first "trial" reference material.
Option 2: Use one of the PGP genomes that is already consented and part of a trio. There are a few Caucasian trios, one Asian trio, and a few trios of undeclared ethnicity. The major disadvantage of choosing one of these as the first NIST RM is that it could take a few months before cell lines are available for any of them and the consortium could get started with sequencing.
Option 3: Use NA12878 as the first NIST RM. The major advantage of this genome is the momentum and existing data and analyses that are available, allowing us to characterize the genome more quickly. The major disadvantage of this genome is the uncertainty regarding whether the consent for it is appropriate for a NIST RM (see discussion at the beginning of this forum). It would likely be a long process to move through the IRB and legal processes at NIST and the Utah repository, with significant uncertainty about whether it would be approved in the end.
We're very interested in your thoughts about these options or any other possible options, so please leave comments here or contact us directly.
HuRef
Looking at the Google Docs spreadsheet of the possible samples, it looks to me like HuRef has many obvious advantages, except for the lack of parents. Data is also available now for piloting the methods. Can somebody then explain to me why HuRef is not high on the list (or actually not even in the 3 options above)?
Regarding the other priorities, I also agree that NA12878 should be used as a minimum to pilot the methods. So far I have not seen anything convincing in this forum on why the consent is not appropriate. NHGRI seems to think so.
Finally, PGP samples may be very good indeed. But we are going to need someone to spend some money on Sanger sequencing of fosmid clones for one of these samples. I am not convinced that sequencing any of the samples solely with short-read technology will give us a good standard.
Francisco
Francisco M. De La Vega, D.Sc.
VP Genome Science
Real Time Genomics, Inc.
Options for NIST WGS RMs
I like option #2 or #3, or even better, a hybrid of the two. There is a critical need to get going fast, so I think we could:
Option 4) develop a lot of tools and techniques on the NA12878 family in the next year, while we build up enough trios and larger families within PGP, and then aim to repeat the "official" RM with PGP genomes.
Option 5) Can't we ask if we can re-consent with broader terms from the NA12878 family?
Support for NA12878
We've recently been hearing significant support from members of the consortium to use NA12878 at least as a pilot reference material if possible. Here's a recent email from Mark DePristo at the Broad Institute:
Hi all,
I'd just like to reiterate my concerns about not using NA12878. In particular there is so much existing supplementary data for her -- genotype, expression, etc. data, all of which will be absent for PGP samples. Using PGP samples will disconnect the NIST reference sample from the broader scientific community, which I think will significantly reduce the value of the NIST effort to standardize NGS analysis assessment.
Best,
Mark
George Church comment
Also note that extensive genotype and expression data are available for PGP samples (e.g., a more extensive set of tissue types), and additional similar data sets can be quickly produced for any particular cell lots chosen. A "broader scientific community" is arguably also connected to PGP samples because of fewer restrictions on available cell types, re-identifiability and commercial use.
--George Church
Morris Foster comments regarding HapMap consent
We'd like to thank Morris Foster, who was co-chair of the Communication Group for the International Haplotype Map Project, for taking the time to comment on the HapMap consent for NA12878. He recently wrote us in support of the consent for NA12878 being appropriate for a NIST Reference Material. He gave permission to share his comments here.
Message from Morris:
There is a question about the use of HapMap & CEPH samples for the purpose
of providing a standard genome, and about NA 12878 in particular.
I can't see why the HapMap samples should be considered as ethically
deficient.
Here is why:
(1) As you know, the HapMap samples have been used in a variety of
genotyping and sequencing projects since HapMap. Most prominently, the
NIH-funded 1000 Genomes project used all available HapMap collections for
its genotyping and sequencing activities
(http://www.1000genomes.org/category/frequently-asked-questions/population).
(2) HapMap samples from various populations have been approved by Coriell
for whole genome sequencing and public data accessibility, perhaps most
prominently the Yoruba trio that Illumina sequenced in 2008. I'm unaware
of any subsequent ethical issues that have resulted from WGS of HapMap
samples. If the HapMap consent was approved for those sequences and public
data deposit, I'm not sure what may have changed subsequently to render
those consents deficient.
(3) The sample you note below, NA12878, is from the Utah CEPH collection.
Again, as I'm sure you know, that collection pre-existed the HapMap
project (in some cases by decades) and is a very good example of how
samples obtained with "old" consents have continued to be used for many
years after. Although some of the HapMap Utah CEPH samples were
re-consented for HapMap purposes, not all were (mostly because the donors
were deceased).
(4) HapMap data have been publicly available for nearly a decade now
(starting with initial public data releases as samples were genotyped)
without a single report of a privacy harm to any of the anonymous donors.
That would seem pretty good empirical evidence to me that any privacy
risks from future uses such as you outline below are vanishingly
infinitesimal.
(5) In contrast, I would have some concerns about use of a sample from a
self-identified individual (as is the case with the PGP samples) for a
standardized genome reference. In that regard, you may recall some of the
public reaction to the knowledge that Craig Venter's DNA was used as the
"reference" in Celera's human genome sequencing effort. Absent a
significant privacy risk, there are advantages to using an anonymous
sample or samples for a standardized reference.
(6) There may be some concern about "selling" a sample donated as part of
an altruistic research project as part of a standard genome kit. In that
case, I'd suggest providing the DNA sample at cost. Of course, Coriell
"sells" such DNA samples all the time, as do many hospitals and academic
medical centers -- and often to for-profit drug and health research
companies. Selling a DNA sample, even one collected for a non-profit
scientific protocol, is not that unusual.
(7) The advantage of using samples that already have been extensively
genomically characterized for purposes of a standard reference is
considerable compared with using a "naive" sample newly collected for the
purpose. While that scientific advantage should not in itself over-rule
any ethical concerns, it nonetheless should be taken into account when
weighed against what appear to be fairly unlikely ethical concerns (as
noted above).
Morris