########################################################
# DDIG: Detecting DIsease-causing Genetic variations   #
#   due to indels, nonsense, and synonymous variants   #
########################################################

DDIG predicts how likely a genetic variant is to be disease-causing. It uses
support vector machine (SVM) models trained on a dataset of putatively benign
variants from the 1000 Genomes Project and disease-causing variants from the
Human Gene Mutation Database (HGMD). Each variant is encoded with six to nine
predictive features, depending on the type of genetic variation (NFS/FS indel,
nonsense, or synonymous; NFS/FS = non-frameshifting/frameshifting).

Please note that DDIG currently supports prediction of NFS/FS indels, nonsense,
and synonymous variants only. All missense and non-coding variants are ignored
during prediction.


DESCRIPTION OF THE OUTPUT FIELDS:
#################################

chrom.            --> chromosome on which the variant is located (user input)
position          --> 1-based coordinate of the variant on the given chromosome (user input)
ref. allele       --> reference genome allele at the given chromosome and position (user input)
alt. allele       --> alternative allele describing the variation; 'ref. allele' is substituted by 'alt. allele'
gene name         --> gene overlapping the location of the variant
protein sequence  --> a protein product of the given gene such that the variant lies in an exon of its transcript
                  --> if there are several such candidate transcripts, only this one was used for prediction
                  --> transcripts are taken from the Consensus CoDing Sequence (CCDS) project reference set
protein position  --> 1-based position on the protein sequence affected by the variant
ref.              --> reference amino acid residue at the given protein position
alt.              --> protein-level variation; 'ref.' is substituted by 'alt.'
prediction        --> whether the variant was predicted as disease-causing or neutral, based on a binarizing threshold
prob.             --> probability that the above prediction is correct (thus, 'prob.' is never below 50%)
                  --> 'neutral 90.0%' means 90.0% confidence that the variant is neutral
                  --> 'disease-causing 90.0%' means 90.0% confidence that the variant is disease-causing
raw score         --> LibSVM output score with probability estimates turned on
                  --> ranges over (0, 1), where 0 means certainly benign and 1 means certainly disease-causing
                  --> 'prob.' is derived from 'raw score' as follows:
                      --> 100 * (1 - 'raw score') for neutral predictions
                      --> 100 * 'raw score' for disease-causing predictions
DDIG model        --> DDIG comprises three models: one for NFS indels, one for FS indels and nonsense variants,
                      and one for synonymous variants
                  --> see the related publications below for details


RELATED PUBLICATIONS:
#####################

NFS indels:                       https://doi.org/10.1186/gb-2013-14-3-r23
FS indels and nonsense variants:  https://doi.org/10.1093/bioinformatics/btu862
Synonymous variants:              https://doi.org/10.1002/humu.23283
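The mapping from 'raw score' to the 'prediction' and 'prob.' output fields can be illustrated with a short Python sketch. The binarizing threshold is not stated in this README; the sketch assumes a cut-off of 0.5, which is the only threshold consistent with 'prob.' never falling below 50%. It also assumes that a score of exactly 0.5 is labelled disease-causing. The function names are hypothetical and are not part of DDIG itself.

```python
from typing import Tuple

# ASSUMPTION: a 0.5 cut-off on the raw score. This is the only threshold that
# keeps 'prob.' at or above 50% for every prediction.
ASSUMED_THRESHOLD = 0.5


def prediction_from_raw_score(raw_score: float) -> Tuple[str, float]:
    """Return the (prediction, prob. in percent) pair for a LibSVM raw score.

    Hypothetical helper illustrating the formula in this README; not DDIG's code.
    """
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError(f"raw score must lie in [0, 1], got {raw_score}")
    if raw_score >= ASSUMED_THRESHOLD:
        # disease-causing: prob. = 100 * raw score
        return "disease-causing", 100.0 * raw_score
    # neutral: prob. = 100 * (1 - raw score)
    return "neutral", 100.0 * (1.0 - raw_score)


def format_prediction(raw_score: float) -> str:
    """Render the pair as in the 'prob.' examples, e.g. 'neutral 90.0%'."""
    label, prob = prediction_from_raw_score(raw_score)
    return f"{label} {prob:.1f}%"


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, "->", format_prediction(score))
```

For example, format_prediction(0.1) returns 'neutral 90.0%' and format_prediction(0.9) returns 'disease-causing 90.0%', which matches the examples given for the 'prob.' field.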