Introduction

I. Data summary
II. Data collection

1. Literature search
2. Inclusion criteria
3. Data extraction

III. Data analysis

1. Annotation

1.1 Linkage disequilibrium analysis and functional annotation for SNP
1.2 Annotation for gene

2. Analysis for significant genes

2.1 Pathway enrichment analysis
2.2 Protein-protein interaction analysis

IV. Data update

1. Data statistics for V2018q1 (till Jan 1 2018)

1.1 Update information
1.2 Pathway enrichment analysis for significant genes
1.3 Protein-protein interaction analysis for significant genes

I. Data summary

Through full-text reading of 127 papers and careful data extraction, PTSDgene provides multiple types of PTSD-related genetic factors. As of Jan. 1, 2018, no copy number variation (CNV) or chromosomal region had been reported for PTSD. Statistics for all PTSD-related genetic factors are summarized in Table 1.

Table 1 PTSDgene data content and statistics as of 01 Jan. 2018

II. Data collection

1. Literature search

To collect PTSD genetic factors reported in the literature, we performed a comprehensive search for PTSD genetic publications in PubMed, PsycINFO and PsycARTICLES using the following search terms:

For PubMed:

((("posttraumatic stress disorder"[Title/Abstract] OR "post-traumatic stress disorder"[Title/Abstract] OR PTSD[Title/Abstract] OR "delayed psychogenic reaction"[Title/Abstract]) AND (polymorphism[Title/Abstract] OR SNP[Title/Abstract] OR haplotype[Title/Abstract] OR interaction [Title/Abstract] OR variant[Title/Abstract] OR variation[Title/Abstract] OR mutation[Title/Abstract] OR CNV[Title/Abstract] OR "copy number variation"[Title/Abstract] OR repeats[Title/Abstract] OR deletion[Title/Abstract] OR duplication[Title/Abstract] OR (gene[Title/Abstract] OR locus[Title/Abstract] OR loci[Title/Abstract] OR chromosome[Title/Abstract] OR genetic[Title/Abstract] OR genome[Title/Abstract] OR genomic[Title/Abstract])) AND (linkage[Title/Abstract] OR associat*[Title/Abstract] OR meta-analysis[Title/Abstract] OR gene x environment[Title/Abstract] OR gene-environment[Title/Abstract])))

For PsycINFO and PsycARTICLES:

(Any Field ("posttraumatic stress disorder" OR "post-traumatic stress disorder" OR PTSD OR "delayed psychogenic reaction") AND Any Field : ("polymorphism" OR "SNP" OR "haplotype" OR "interaction" OR "variant" OR "variation" OR "mutation" OR "CNV" OR "copy number variation" OR " repeats" OR " deletion" OR " duplication" OR "gene" OR "locus" OR "loci" OR "chromosome" OR "genetic" OR "genome" OR "genomic") AND Any Field : ("linkage" OR "associat*" OR "meta-analysis" OR "gene x environment" OR "gene-environment") AND Language: English AND Document Type: Journal Article)

These searches returned 1,762 English-language publications as of March 15, 2016.
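The PubMed search combines three concept groups (disease names, genetic/variant terms, study-design terms) with AND, each group being an OR list of Title/Abstract terms. The Python sketch below rebuilds a logically equivalent query from plain term lists, which may make the search easier to rerun for later updates. It is an illustration, not the script used to build PTSDgene: the helper names `or_block` and `build_pubmed_query` are hypothetical, the nested gene/locus group is flattened into the variant group (equivalent under OR), and multi-word terms are quoted as phrases, which the original query does not do for gene x environment.

```python
def or_block(terms, field="Title/Abstract"):
    """OR together search terms, tagging each with a PubMed field tag.

    Multi-word terms are wrapped in double quotes so they are searched as
    phrases; single words (including truncations such as associat*) are not.
    """
    tagged = []
    for term in terms:
        text = f'"{term}"' if " " in term else term
        tagged.append(f"{text}[{field}]")
    return "(" + " OR ".join(tagged) + ")"


def build_pubmed_query(*concept_groups):
    """AND together one OR block per concept group."""
    return " AND ".join(or_block(group) for group in concept_groups)


# Term lists transcribed from the PubMed search (nested OR groups flattened).
DISEASE_TERMS = [
    "posttraumatic stress disorder", "post-traumatic stress disorder",
    "PTSD", "delayed psychogenic reaction",
]
VARIANT_TERMS = [
    "polymorphism", "SNP", "haplotype", "interaction", "variant", "variation",
    "mutation", "CNV", "copy number variation", "repeats", "deletion",
    "duplication", "gene", "locus", "loci", "chromosome", "genetic",
    "genome", "genomic",
]
DESIGN_TERMS = [
    "linkage", "associat*", "meta-analysis", "gene x environment",
    "gene-environment",
]

QUERY = build_pubmed_query(DISEASE_TERMS, VARIANT_TERMS, DESIGN_TERMS)
print(QUERY)
```

Keeping the terms as lists means a new synonym only has to be added once, and the quoting rule is applied consistently across all groups.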

2. Inclusion criteria

Abstracts of these publications were manually screened against the following inclusion criteria: 1) written in English; 2) genetic studies (e.g., association, linkage, meta-analysis and gene-environment interaction studies) conducted in human patients and controls, or in families, to identify associations between genetic markers (genes, SNPs, other variants, etc.) and PTSD phenotypes; 3) the PTSD phenotype (e.g., current PTSD, lifetime PTSD, PTSD symptoms, PTSD severity, PTSD diagnosis) was unambiguously assessed with specific scales by trained interviewers or expert psychologists. After filtering, 105 articles were retained and included in PTSDgene.

3. Data extraction

The full text of each eligible article was read carefully, and detailed information on each genetic factor in the study was extracted manually, including statistical values (P-value, odds ratio, etc.), the authors' comments and the phenotype. For gene-environment interaction studies, the corresponding environmental variable was also extracted. In addition to SNPs, other markers such as VNTRs, microsatellites and haplotypes were included. To aid interpretation of the results, the study design, population, sample size, age and gender are also displayed. Given the particular features of PTSD, we also extracted and presented, for each article, the type of traumatic event, the diagnostic criteria, the assessment scales, whether controls had been exposed to traumatic events, and other relevant information.

To describe the association between genetic candidates and PTSD, every statistical result for a genetic marker under a given phenotype was classified as "Significant", "Non-significant" or "Trend" according to the criteria described in our previous studies[1, 2]: 1) for candidate-gene association studies, a result with p < .05 was defined as "Significant"; 2) for GWAS, p < 1 × 10^-8 indicates a "Significant" result, p > 1 × 10^-5 a "Non-significant" result, and a value between these thresholds a "Trend" result. When other statistics were reported, the classification followed the criteria of the statistical method used in the original paper. The association result of a gene under a given phenotype was classified as "Significant" if, and only if, at least one of its related markers was classified as "Significant".
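The classification rules above can be written as two small functions: one that labels a single marker result from its P-value, and one that rolls marker labels up to the gene level. This is a minimal sketch, assuming only P-values are available; results based on other statistics follow the original papers and are not covered. The `study_type` labels are hypothetical, and because no "Trend" band is defined for candidate-gene studies, non-significant candidate-gene results are simply labelled "Non-significant" here (an assumption).

```python
SIGNIFICANT = "Significant"
NON_SIGNIFICANT = "Non-significant"
TREND = "Trend"


def classify_marker(p_value, study_type):
    """Classify one marker-phenotype result from its P-value.

    study_type is a hypothetical label: "candidate" or "gwas".
    """
    if study_type == "candidate":
        # p < .05 is "Significant". The criteria give no "Trend" band for
        # candidate-gene studies, so all other results are treated as
        # "Non-significant" here (an assumption).
        return SIGNIFICANT if p_value < 0.05 else NON_SIGNIFICANT
    if study_type == "gwas":
        if p_value < 1e-8:
            return SIGNIFICANT
        if p_value > 1e-5:
            return NON_SIGNIFICANT
        # 1e-8 <= p <= 1e-5: between the two thresholds.
        return TREND
    raise ValueError(f"unknown study type: {study_type!r}")


def gene_is_significant(marker_categories):
    """A gene is "Significant" for a phenotype iff any of its markers is."""
    return SIGNIFICANT in marker_categories


# Hypothetical markers of one gene under one phenotype.
results = [("candidate", 0.20), ("gwas", 3e-6), ("gwas", 4e-9)]
categories = [classify_marker(p, kind) for kind, p in results]
print(categories)                       # ['Non-significant', 'Trend', 'Significant']
print(gene_is_significant(categories))  # True
```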

III. Data analysis

1. Annotation
1.1 Linkage disequilibrium analysis and functional annotation for SNP

The LD data used in the LD analysis were downloaded from the MaCH website[3]. They were calculated with Haploxt (windowsize = 3000) from the 1000 Genomes Project Integrated Phase 1 Release ASN population data[4]. SNPs in LD (r^2 > 0.8) with a published SNP were defined as its LD proxies. The population used in the LD analysis matches that of the original study. For SNPs studied in multiple populations within an article, LD proxies for all of those populations are included. SNPs with uncertain population information were not extended by LD.

All SNPs and their LD proxies were annotated in three ways to explore their function: (1) the effect of the variant on transcripts, from Ensembl[5], with predictions from SIFT[6] and PolyPhen[7]; (2) whether the SNP is a regulatory SNP (rSNP), together with its chromatin state, regulatory elements and target genes, from rVarBase[8], where the chromatin states come from the Roadmap Epigenomics project[9] and the regulatory elements mainly from the ENCODE project[10]; (3) eQTL associations, indicating whether the SNP regulates the expression of a gene under a given condition. The eQTL data were downloaded from several eQTL resources, including the GTEx Portal[11], SCAN[12], seeQTL[13], the eQTL Browser, a skin eQTL database[14], BRAINEAC[15], the RTeQTL database[16], and data from an eQTL study of peripheral venous blood in twins[17].
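The LD-proxy rule (r^2 > 0.8, population matched to the original study, all populations for multi-population SNPs, no extension when the population is uncertain) can be sketched as a lookup over a table of pairwise LD values. This is an illustration under assumptions: the `(snp_a, snp_b, population, r2)` tuple layout is hypothetical and is not the Haploxt output format, and the SNP IDs in the usage example are placeholders.

```python
from collections import defaultdict


def ld_proxies(published, ld_pairs, r2_threshold=0.8):
    """Collect LD proxies (r^2 > r2_threshold) for published SNPs.

    published: dict SNP id -> set of population codes in which the SNP was
        studied; an empty set means the population is uncertain, and such
        SNPs are not extended.
    ld_pairs: iterable of (snp_a, snp_b, population, r2) tuples; this
        layout is an assumption, not the Haploxt output format.
    """
    proxies = defaultdict(set)
    for snp_a, snp_b, population, r2 in ld_pairs:
        if r2 <= r2_threshold:
            continue
        # LD is symmetric, so check the pair in both directions.
        for query, partner in ((snp_a, snp_b), (snp_b, snp_a)):
            populations = published.get(query)
            if populations and population in populations:
                proxies[query].add(partner)
    return {snp: proxies.get(snp, set()) for snp in published}


# Placeholder SNPs: SNP1 studied in ASN only, SNP2 in ASN and EUR,
# SNP3 with uncertain population.
published = {"SNP1": {"ASN"}, "SNP2": {"ASN", "EUR"}, "SNP3": set()}
pairs = [
    ("SNP1", "SNP10", "ASN", 0.90),
    ("SNP1", "SNP11", "ASN", 0.50),  # below threshold
    ("SNP1", "SNP12", "EUR", 0.95),  # population not studied for SNP1
    ("SNP20", "SNP2", "EUR", 0.85),
    ("SNP3", "SNP30", "ASN", 0.99),  # SNP3 is not extended
]
print(ld_proxies(published, pairs))
# {'SNP1': {'SNP10'}, 'SNP2': {'SNP20'}, 'SNP3': set()}
```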

1.2 Annotation for gene

To ensure a comprehensive collection of PTSD candidate genes, published SNPs that were not mapped to genes in the original publications, as well as the LD proxies from the LD analysis, were mapped to genes according to their chromosomal locations. All genes obtained from these mappings, together with the target genes from the rSNP annotation, were regarded as extended candidate genes for PTSD. Gene positions were taken from Ensembl (GRCh38) and other gene attributes from the HUGO Gene Nomenclature Committee (HGNC). Each gene, including extended genes, was annotated with Gene Ontology (GO) terms[18] and KEGG pathways[19]. Gene interactions were annotated with data from PINA v2[20], including the interaction type and the experimental method used to detect each interaction.
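Mapping SNPs to genes "according to their chromosomal locations" can be illustrated as an interval-overlap lookup. This sketch assumes a SNP maps to a gene only when it falls inside the gene's start-end span; the text does not say whether flanking regions were also used, so a window would be an easy extension. The `Gene` type and the gene symbols and coordinates in the usage example are hypothetical placeholders, not Ensembl data.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def map_snps_to_genes(snp_positions, genes):
    """Map each SNP to the genes whose span contains it.

    snp_positions: dict SNP id -> (chromosome, position) on the same
    assembly as the gene coordinates (GRCh38 in PTSDgene).
    Returns dict SNP id -> sorted list of gene symbols (empty if intergenic).
    """
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene.chrom].append(gene)

    mapping = {}
    for snp, (chrom, pos) in snp_positions.items():
        hits = [g.symbol for g in genes_by_chrom.get(chrom, [])
                if g.start <= pos <= g.end]
        mapping[snp] = sorted(hits)
    return mapping


# Placeholder genes and SNP positions.
genes = [
    Gene("GENE_A", "1", 100, 200),
    Gene("GENE_B", "1", 150, 300),  # overlaps GENE_A
    Gene("GENE_C", "2", 100, 200),
]
snps = {"SNP_X": ("1", 160), "SNP_Y": ("1", 250), "SNP_Z": ("2", 50)}
print(map_snps_to_genes(snps, genes))
# {'SNP_X': ['GENE_A', 'GENE_B'], 'SNP_Y': ['GENE_B'], 'SNP_Z': []}
```

A linear scan per chromosome is enough for a few thousand SNPs; a sorted interval index would be the natural next step for genome-wide lists.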

2. Analysis for significant genes
2.1 Pathway enrichment analysis

To interpret the functions of PTSD genes with at least one reported significant result, we performed pathway enrichment analysis by submitting the list of significant genes to DAVID 6.7[21, 22] to highlight the pathways most relevant to these genes. Pathways with FDR < 0.01 and fewer than 350 genes were retained.
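The retention rule (FDR < 0.01 and fewer than 350 genes) can be applied to an exported enrichment table as a simple filter. This is a sketch under assumptions: the tab-separated layout and the column names "Term", "FDR" and "Pop Hits" are assumed, "number of genes" is interpreted as the pathway size, and the units of the FDR column in any real export should be checked before applying a 0.01 cut-off. The pathway rows in the example are placeholders.

```python
import csv
import io


def filter_pathways(tsv_text, max_fdr=0.01, max_genes=350):
    """Return (term, fdr, pathway_size) for pathways passing both cut-offs,
    sorted by FDR. Column names "Term", "FDR" and "Pop Hits" are assumed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        fdr = float(row["FDR"])
        size = int(row["Pop Hits"])
        if fdr < max_fdr and size < max_genes:
            kept.append((row["Term"], fdr, size))
    return sorted(kept, key=lambda item: item[1])


example = (
    "Term\tPop Hits\tFDR\n"
    "pathway_1\t120\t0.001\n"
    "pathway_2\t400\t0.0001\n"  # too many genes
    "pathway_3\t80\t0.05\n"     # FDR too high
    "pathway_4\t200\t0.005\n"
)
print(filter_pathways(example))
# [('pathway_1', 0.001, 120), ('pathway_4', 0.005, 200)]
```

The size cap removes very broad pathways that tend to be enriched for almost any gene list.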

2.2 Protein-protein interaction analysis

Accumulating evidence shows that disease risk genes tend to function together within protein-protein interaction (PPI) networks[23, 24]. To further investigate the relationship between genetic variation and disease risk, we performed a PPI analysis of the PTSD susceptibility genes reported as significant in published studies. PPI data from PINA v2 were used to extract the genes that interact with these significant genes. To keep the network figure readable, we removed nodes connected to only one of the significant genes.
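The network construction can be sketched as: start from the significant ("seed") genes, collect their interaction partners, and keep only partners linked to at least two seeds. This is an illustration under assumptions: the pruning rule is applied to non-seed interactors only, seeds are always kept, and edges between two non-seed interactors are dropped, none of which the text specifies. The gene names in the example are placeholders, not PINA data.

```python
from collections import defaultdict


def ppi_subnetwork(seed_genes, interactions):
    """Build a seed-centred PPI network.

    seed_genes: genes with a significant published result.
    interactions: iterable of (gene_a, gene_b) pairs, e.g. from PINA.
    Returns (nodes, edges); edges are sorted (gene, gene) tuples.
    """
    seeds = set(seed_genes)
    pairs = [(a, b) for a, b in interactions if a != b]  # drop self-loops

    # For every non-seed gene, record which seed genes it touches.
    linked_seeds = defaultdict(set)
    for a, b in pairs:
        if a in seeds and b not in seeds:
            linked_seeds[b].add(a)
        elif b in seeds and a not in seeds:
            linked_seeds[a].add(b)

    # Keep interactors that connect at least two seed genes.
    interactors = {g for g, s in linked_seeds.items() if len(s) >= 2}
    nodes = seeds | interactors
    edges = {
        tuple(sorted((a, b)))
        for a, b in pairs
        if a in nodes and b in nodes and (a in seeds or b in seeds)
    }
    return nodes, edges


seeds = ["S1", "S2", "S3"]
interactions = [("S1", "X"), ("S2", "X"), ("S1", "Y"),
                ("S3", "Z"), ("S1", "S2"), ("X", "Y")]
nodes, edges = ppi_subnetwork(seeds, interactions)
print(sorted(nodes))  # ['S1', 'S2', 'S3', 'X']
print(sorted(edges))  # [('S1', 'S2'), ('S1', 'X'), ('S2', 'X')]
```

Here Y and Z are removed because each touches only one seed gene, while X bridges S1 and S2.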

IV. Data update

During the update process, relatively few new publications appeared each quarter, so the database will be updated every six months or annually.

1. Data statistics for V2018q1 (till Jan 1 2018)
1.1 Update information

V1 stage (2016/03/15-2018/01/01): a total of 339 records were identified from PubMed, PsycINFO and PsycARTICLES, of which 194 remained after removing duplicates.
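The text does not say how duplicates across the three databases were identified. As an illustration only, the sketch below treats two records as duplicates when they share a DOI or a normalized title; the record fields (`title`, `doi`, `source`) and the example records are hypothetical, and reference-manager deduplication would be an equally valid approach.

```python
import re


def normalize_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record for each DOI or normalized title.

    records: list of dicts with a "title" key and an optional "doi" key
    (hypothetical field names). A record is a duplicate if either its DOI
    or its normalized title was already seen.
    """
    seen = set()
    unique = []
    for rec in records:
        keys = {("title", normalize_title(rec["title"]))}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember all keys, even for duplicates
        if not is_duplicate:
            unique.append(rec)
    return unique


records = [
    {"title": "Gene X and PTSD.", "doi": "10.1000/abc", "source": "PubMed"},
    {"title": "Gene X and PTSD", "doi": None, "source": "PsycINFO"},
    {"title": "Another study", "doi": "10.1000/ABC", "source": "PsycARTICLES"},
    {"title": "A third study", "source": "PubMed"},
]
print([r["title"] for r in deduplicate(records)])
# ['Gene X and PTSD.', 'A third study']
```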

1.2 Pathway enrichment analysis for significant genes

Pathway enrichment analysis was repeated using DAVID 6.8, and the updated results have been uploaded to the database.

1.3 Protein-protein interaction analysis for significant genes

Protein-protein interaction analysis was performed with the GeneMANIA web server (http://www.genemania.org), a flexible, user-friendly interface for generating hypotheses about gene function, analyzing gene lists and prioritizing genes for functional assays. The updated results have been uploaded to the database.

References:
1. Zhang, L., et al., ADHDgene: a genetic database for attention deficit hyperactivity disorder. Nucleic Acids Res, 2012. 40(Database issue): p. D1003-9.
2. Chang, S.H., et al., BDgene: a genetic database for bipolar disorder and its overlap with schizophrenia and major depressive disorder. Biol Psychiatry, 2013. 74(10): p. 727-33.
3. Li, Y., et al., Genotype imputation. Annu Rev Genomics Hum Genet, 2009. 10: p. 387-406.
4. Abecasis, G.R., et al., An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56-65.
5. Kersey, P.J., et al., Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res, 2014. 42(1): p. D546-52.
6. Kumar, P., S. Henikoff, and P.C. Ng, Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc, 2009. 4(7): p. 1073-81.
7. Adzhubei, I.A., et al., A method and server for predicting damaging missense mutations. Nat Methods, 2010. 7(4): p. 248-9.
8. Guo, L., et al., rVarBase: an updated database for regulatory features of human variants. Nucleic Acids Res, 2016. 44(D1): p. D888-93.
9. Roadmap Epigenomics Consortium, et al., Integrative analysis of 111 reference human epigenomes. Nature, 2015. 518(7539): p. 317-30.
10. PsychENCODE Consortium, et al., The PsychENCODE project. Nat Neurosci, 2015. 18(12): p. 1707-12.
11. GTEx Consortium, Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science, 2015. 348(6235): p. 648-60.
12. Gamazon, E.R., et al., SCAN: SNP and copy number annotation. Bioinformatics, 2010. 26(2): p. 259-62.
13. Xia, K., et al., seeQTL: a searchable database for human eQTLs. Bioinformatics, 2012. 28(3): p. 451-2.
14. Ding, J., et al., Gene expression in skin and lymphoblastoid cells: Refined statistical method reveals extensive overlap in cis-eQTL signals. Am J Hum Genet, 2010. 87(6): p. 779-89.
15. Ramasamy, A., et al., Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat Neurosci, 2014. 17(10): p. 1418-28.
16. Liang, L., et al., A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines. Genome Res, 2013. 23(4): p. 716-26.
17. Wright, F.A., et al., Heritability and genomics of gene expression in peripheral blood. Nat Genet, 2014. 46(5): p. 430-7.
18. Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000. 25(1): p. 25-9.
19. Kanehisa, M., et al., From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res, 2006. 34(Database issue): p. D354-7.
20. Cowley, M.J., et al., PINA v2.0: mining interactome modules. Nucleic Acids Res, 2012. 40(Database issue): p. D862-5.
21. Huang da, W., B.T. Sherman, and R.A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 2009. 4(1): p. 44-57.
22. Huang da, W., B.T. Sherman, and R.A. Lempicki, Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res, 2009. 37(1): p. 1-13.
23. Oti, M. and H.G. Brunner, The modular nature of genetic diseases. Clin Genet, 2007. 71(1): p. 1-11.
24. Oti, M., et al., Predicting disease genes using protein-protein interactions. J Med Genet, 2006. 43(8): p. 691-8.