Participate in a Follow-up Replication Study of Genetic Patterns in PANS/PANDAS

If you or your child have a PANS or PANDAS diagnosis from a licensed medical practitioner, and you have 23andMe genetic data for that person, you are invited to participate in a follow-up research study titled “Replication of a Genetic Association Among Patients with PANDAS or PANS”.

This study is being conducted by Bob Horvath, Michaela Holden and Sam Keating. We are “citizen scientists” with some qualifications in statistics and data manipulation, and direct experience (ourselves or family members) with PANS and other autoimmune or immunological conditions, as well as autism.

The purpose of this study is to replicate a finding from a previous pilot study: a specific variation in DNA that occurs more commonly among those with PANS or PANDAS (P/P) than in the general population.  You can see details of that pilot study here: https://osf.io/pf7q2/.  In this replication study, we also hope to demonstrate statistical significance for up to 9 other genetic variations known as SNPs, and begin development of a predictive model of P/P based on genetics.  We will post a link here to the results.

Participation in this study is entirely voluntary. You can choose to participate anonymously, or with your name attached to the data. Choosing not to participate will have no effect on your relationship with the researchers, and no other negative consequences.

If you agree to participate, you will be asked to click on a link below and upload 23andMe data for one person.  You may also email your data (see below).  Note that for this study we cannot use additional data of close relations, and, if you participated in the previous pilot study, we also will not be able to make use of that same data, or that of close relations of those that participated in the pilot study.

The data will be collected regularly from the upload site, until a total of 70 valid data sets is reached. All data uploaded will be safely stored (with no direct identification of participants) on two computers only. Those received after the 70th data set will also be retained for possible later use. Even if you give your name, the data will be separated from and stripped of that name before being stored. After the initial upload and de-identification steps, no other person, website or online service will have access to your data with your identification attached to it. The de-identified data will be uploaded to GEDmatch in order to obtain ancestry, and to confirm no close relatedness to other participants.

For those that contribute anonymously, the only link between the principal researcher (Bob Horvath) and you will be a fake name and email address that you give at the upload site. You are free to withdraw from this study at any time. However, once you submit your data, the only way to withdraw anonymous data is to contact Bob Horvath and reveal your fake name, so that it can be known which data is to be removed. This step could reveal your identity, but your data will be removed from the study.

Only Bob Horvath will have access to this full data. De-identified results for the SNPs of interest and analysis of the data will be made known at the Open Science Framework website.

There are no known risks associated with this study, beyond any risk there may be associated with the original data existing (e.g. on the originating site, such as 23andMe). While you will not likely experience any immediate direct benefits from participation, information collected in this study may benefit you and others in the future by helping to determine genetic factors associated with P/P.

If you have any questions regarding the survey or this research project in general, please contact the principal investigator, Bob Horvath, at bobhorvath@alumni.uwaterloo.ca

By clicking on one of the links below to the upload site, or sending data to the email address above, you are indicating your consent to participate in this study. If you want to contribute anonymously, submit only a fake name and email address at one of the links below. If you use a fake name, make it unique (unidentifiable by others) and make a record of it, in case there is any need to try to contact you (via a comment to this poll in the online groups it is listed in).

To upload data for a person that has been diagnosed by a licensed medical practitioner(s) with both PANS and PANDAS, click on this link:

To upload data for a person that has been diagnosed by a licensed medical practitioner with PANDAS (but not PANS), click on this link:

To upload data for a person that has been diagnosed by a licensed medical practitioner with PANS (but not PANDAS), click on this link:

I am going to post here paraphrased questions and answers from other private places where this post was also placed.

Q: I can't remember - Can you tell me if I participated in the original study?

A: ... I can't tell you right now because I don't have a list of the names on my computer where the de-identified data is. I've never had a computer hack or theft or break-in at my house, but just in case any of that happens, I kept the list of names identifying which data belonged to whom written on an (unhackable) piece of paper locked in a completely different place, and I am not there right now.  I will have the same security in place for this study, and will let you know later whether you participated.

More Q & A from elsewhere:

Q: Can you use DNA data from another source ... or do you need the raw data?

A: Because the frequencies of risk alleles can vary quite a bit by ethnicity, I use the full raw data to give an ethnicity report (at GEDmatch, where I load only de-identified data - this is mentioned in the post above that doubles as a "consent form"). I use the frequencies associated with 6 main ethnicities to effectively create a control group from dbSNP (a database of allele frequencies).
So, sorry - I do use a good chunk of the raw data.
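
As a rough illustration of the control-group idea in the answer above, here is a small Python sketch. It is not the study's actual code: the ancestry labels, proportions and allele frequencies are invented placeholders, and a simple proportion-weighted average is only an assumption about how per-ethnicity frequencies might be combined.

```python
# Hypothetical sketch: expected risk-allele frequency for one participant,
# weighting population frequencies by that participant's ancestry mix.
# All numbers below are made-up placeholders, not values from dbSNP or the study.

def expected_allele_frequency(ancestry_mix, population_freqs):
    """Return the ancestry-weighted average allele frequency.

    ancestry_mix: dict mapping population label -> proportion (should sum to ~1)
    population_freqs: dict mapping population label -> allele frequency (0..1)
    """
    total = sum(ancestry_mix.values())
    if total <= 0:
        raise ValueError("ancestry proportions must sum to a positive number")
    # Normalise in case the reported proportions do not sum exactly to 1.
    return sum(
        (share / total) * population_freqs[pop]
        for pop, share in ancestry_mix.items()
    )

# Placeholder ancestry report for one de-identified participant.
example_mix = {"European": 0.75, "East Asian": 0.25}
# Placeholder population frequencies for one SNP's risk allele.
example_freqs = {"European": 0.20, "East Asian": 0.40, "African": 0.10}

example_expected = expected_allele_frequency(example_mix, example_freqs)
print(round(example_expected, 3))
```

Averaging such expected frequencies over all participants would give one possible baseline to compare against the frequency actually observed in the P/P group.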

Q: Will participants get to see the final results of the research?

A: Most definitely! You can see most of the results from the original pilot study at a link quoted near the beginning of the post (which actually serves as a consent form). We are hiding the identity of the SNP of significance so that this replication study cannot be accused of collecting data from people that knew they had the risk allele for that SNP!  ... you (and everyone else) will be able to see results at an Open Science Framework website. It will be somewhat similar to the results shown for the pilot study. What I will do for any participant that asks is point out (by PM) which column of data represents you or your child, so that you don't have to look up those 10 SNPs we are looking at in your raw data.

Q: Can you use AncestryDNA raw data?

A:  We did use AncestryDNA for the pilot study, but it doesn't call 7 of the 10 SNPs I am looking at this time around. Because I am going to stop when I get 70 data sets, getting AncestryDNA data sets will mean less data for those 7 SNPs, which means weaker results for them.  So, we won't be taking AncestryDNA data sets this time.

Q:  So are you looking at genetic vulnerability? What are the long term goals of your study and how might it affect kids with these conditions in the future?

A:  Yes. The “holy grail” of a genetic study would be an “if and only if” genetic variation as has been found for Cystic Fibrosis (CF), where if you have the genetic variation you have the disease, and if you don’t have the defective gene, you don't have the disease.
But autoimmune diseases don’t seem to be like that. Instead, dozens or hundreds or maybe even more than a thousand genetic variations seem to increase or decrease the odds of getting the disease/disorder/syndrome. Nobody (that I know of) has established any genetic variations associated with PANS, and for PANDAS, there have been (as far as I can find) 3 published studies (all from Turkey) that haven’t yet been replicated. Two of them didn’t replicate in our pilot study (that doesn’t prove either study is wrong; different ethnicities could have some different causal variants), and for the other I have concerns about the way the stats were done.  PANS and PANDAS research is way behind that of other diseases and disorders in laying this genetic groundwork.
Once the genetic groundwork is laid, and replicated, experts in biological pathways need to go at it, to figure out what each genetic variation’s role is, if any, to understand which processes in the human body are affected. Here is an example of some of this kind of biological pathway discussion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654249/ . We don't have all processes and pathways in the human body mapped, so this step can't be timelined. I am not well informed around this part of the puzzle, but I can easily imagine it being years or decades before there are significant understandings established. Or, we might get lucky and stumble across something sooner rather than later. Then, the final phase would be treatment development, which could also be a long timeline. There is nothing innate about genetic studies that dictates what the ultimate treatment could be. They don't have to beget pharmaceutical interventions, though our financial and academic structures do seem to be geared that way. But still, there is no reason a genetic understanding can't spawn a dietary or other non-pharmaceutical treatment solution.
These details tend to make the long road feel overwhelming, and there is still so much more to the overall story. But as an example, the discovery of the CF gene (in 1989) has over the past 30 years greatly extended the life and comfort of those with the disease (http://www.sickkids.ca/AboutSickKids/Newsroom/Past-News/2014/25-years-later-the-impact-of-the-cystic-fibrosis-gene-discovery.html). With PANS and PANDAS, we need to get going.
There is one other nearer-term possible benefit from this kind of study that I have stumbled across, an idea inspired by the very Turkish study whose math bothered me. It's the idea of a predictive model based on genetics: a long shot, and bound to be controversial (23andMe is "not to be used for diagnostic purposes"). This would ultimately require more data than I am getting here now, and likely more SNPs of significance than I can show in this study (which is 10, if luck is in the wind), in order to work even half-decently. But I have to at least poke at it.

For those that may not have 23andMe raw data on their own computer, here are some instructions for re-downloading from 23andMe:
1. Sign into your account at www.23andMe.com (the sign-in button is near the top right).
2. On the right hand side click on the arrow beside your name and scroll down to "browse raw data"
3. Click on "browse raw data" and on the page that opens just under 'Your Raw Data' there is a note "You can view or download your data at anytime". Click on the word download which is highlighted in blue. At the bottom of the page that appears (read, and scroll down), click the "I understand" box, and then the blue 'Submit Request' button.
4. Once 23andMe have retrieved your data, you will receive an email that the data is ready to be downloaded. It might be a while.
5. Following the instructions in that email, you can then save the downloaded data to your computer - remember where you save it to.
6. Then return here, and click on one of the links at the bottom of the starting post to this thread.
7. That link takes you to Dropbox, which requests your data (pick the left button), and you can then browse to your downloaded data file on your computer to select it.
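
If you would like to check that the downloaded file looks right, or look up a few SNPs yourself, a short Python sketch follows. It assumes the usual 23andMe text layout ('#' comment lines, then tab-separated rsid, chromosome, position and genotype columns). The function name, the rsIDs and the genotypes shown are invented for illustration and are not the SNPs used in this study.

```python
# Hypothetical sketch: look up genotypes for a few rsIDs in a 23andMe raw data file.
# Assumes '#' comment lines followed by tab-separated columns:
# rsid, chromosome, position, genotype.
import csv
import io

def read_genotypes(lines, wanted_rsids):
    """Return {rsid: genotype} for the wanted rsIDs found in the given lines."""
    wanted = set(wanted_rsids)
    found = {}
    # Drop blank lines and the '#' comment header before parsing.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip malformed rows
        rsid, _chrom, _pos, genotype = row[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found

# Tiny in-memory stand-in for a real file; the rsIDs and genotypes are invented.
sample_text = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t2\t2000\tCC\n"
    "rs0000003\tX\t3000\t--\n"
)
sample_result = read_genotypes(io.StringIO(sample_text), ["rs0000002", "rs0000003"])
print(sample_result)
```

With a real download, the same function can be given an opened file instead of the in-memory sample, for example `with open(path_to_your_file) as f: read_genotypes(f, [...])`, where `path_to_your_file` is wherever you saved the data in step 5. A genotype of `--` means 23andMe did not call that SNP.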

Another paraphrased Q&A from one of the 10 or so FB groups this is also posted in:

Q: I'm happy there is research being done but ppl need to be careful where u r sending your child's identity and genetics! ... This doesn't sound safe and not sure results would be publishable.

A: The publishing is less formal, and in a different arena than most researchers publish in: the Open Science Framework.

If you don’t know me, you can always upload anonymously. ...
 If data is completely anonymous (the first thing I do is anonymize it anyway), I can’t fathom any danger. 

I personally don’t trust the corporations.  When I got my son's 23andMe data, I put in a fake name, and didn’t answer any health questions. That’s not exactly per their policy, but that way some insurance company (or anyone) in the future can never know my son's data. I have similar misgivings about the various websites that let you upload data, like Genetic Genie and NutraHacker. What do they do with your data, and your name, and health questions that you might answer? Doing it anonymously for people and organizations that you don’t know and trust - that is the ultimate protection if you are concerned.

13 hours ago, Cristo-Krista said:

When paying for 23 and me how did you do that ? And genetic genie ? I worry about that too.  

Oh - I did use my credit card, so they have my name - but it wasn't my data.   So, I need to admit that I was not completely hidden from them.  It would have been slightly smarter to use my wife's (she kept her different last name), but that wouldn't have been totally bullet-proof either.  I wonder if a PayPal account can be temporarily set up.

I think not answering health questions is important.

I used a fake name for Genetic Genie, and it was free.

The companies just want to learn from people’s data.  I get it.  Maybe cures can come out of it.  That’s good.  But there is another side too.

 People just want to know about their lives. Some want to find people genetically linked to them. That’s a whole other deal.  That is good if that’s what they want.  But not everyone does.  

Some just want to know about their genetics but want to just know themselves. 

On 8/11/2019 at 4:42 PM, Cristo-Krista said:

Was genetic genie easy to use 

It has been a while since I used it, and I don't fully remember.  I don't recall struggling, though.  I also used Promethease, and that was more difficult, but gave many more results.  SelfDecode (www.selfdecode.com) was recently recommended to me, but I haven't used it.

More Q&A from elsewhere:

Some of you have looked at or downloaded the report for the previous pilot study at https://osf.io/pf7q2/ (there have been 174 downloads to date).  Recently, I did an important update to that report.

It was unfortunately a disappointing update to do, because it was to include statements (and corresponding analysis) about a Turkish genetic association study (the one presented at the Common Threads conference last year) that didn't replicate amongst our kids.

It could be the case that genetics among Turkish P/P kids are not the same as among the mostly European kids of our study, so this doesn't mean that one or the other study is wrong. But even that scenario is disappointing, if P/P genetics is that complex.

This lack of replication also does not mean that mannose-binding lectin (the gene implicated in that Turkish study) is not important among our P/P kids - there are very very many other SNPs in the MBL2 gene; this was just one.

Q: Please forgive me ... but can you clarify your comments in more simplified terms?

A: Sure, and I can go just a little further, too.

Someone else (in Turkey) says that there is a genetic variation they showed was related to PANDAS. Anyone with genetic data from lots of P/P kids can check that. We have some (collected back in January), but it doesn't show that genetic variation to be associated with PANS and PANDAS.

This doesn't mean their study is wrong, or ours is wrong. It could be that the genetic variation they found was associated with PANDAS in Turkish kids, but not the (mostly) European kids we had in our data.

This is disappointing. But we still do also have a result from our pilot study, that we are trying to show is true with a 2nd independent batch of data - that is one of the goals of this post requesting participation. It is a good idea to check a result by replicating (doing it again).


