########################################################
#  DDIG: Detecting DIsease-causing Genetic variations  #
#   due to indels, nonsense, and synonymous variants   #
########################################################

 DDIG predicts how likely it is for a genetic variant
 to be disease-causing. It uses a support vector machine
 (SVM) model trained on a dataset of putatively benign
 variants from the 1000 Genomes Project and
 disease-causing variants from the Human Gene Mutation
 Database (HGMD). Each variant is encoded with six to
 nine predictive features depending on the type of
 genetic variation: non-frameshifting (NFS) or frameshifting
 (FS) indel, nonsense, or synonymous.

 Please note that DDIG currently supports only NFS/FS
 indels, nonsense variants, and synonymous variants.
 Missense and non-coding variants are ignored during
 prediction.


 DESCRIPTION OF THE OUTPUT FIELDS:
 #################################

 chrom. --> chromosome on which the variant is located (user input)

 position --> 1-based coordinate position of the variant on the given chromosome (user input)

 ref. allele --> reference genome allele located on the given chromosome in the given position (user input)

 alt. allele --> alternative allele describing the variation: 'ref. allele' is substituted by 'alt. allele'
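 As a minimal illustration of how 'position', 'ref. allele', and 'alt. allele' fit together, the
 Python sketch below applies a variant to a toy reference sequence using 1-based coordinates.
 The function `apply_variant` and the toy sequence are hypothetical and not part of DDIG; the
 sketch also assumes the reference sequence is available as a plain string.

```python
def apply_variant(reference: str, position: int, ref_allele: str, alt_allele: str) -> str:
    """Return `reference` with `ref_allele` replaced by `alt_allele`.

    `position` is 1-based: the first base of `reference` is position 1.
    Hypothetical helper for illustration only (not part of DDIG).
    """
    start = position - 1  # convert the 1-based position to a 0-based Python index
    end = start + len(ref_allele)
    if position < 1 or reference[start:end] != ref_allele:
        raise ValueError("ref. allele does not match the reference at this position")
    return reference[:start] + alt_allele + reference[end:]


# Toy example: deleting the 'A' at position 5 (written as TA -> T at position 4).
print(apply_variant("ACGTACGT", 4, "TA", "T"))  # ACGTCGT
# Toy example: a single-base substitution C -> A at position 2.
print(apply_variant("ACGTACGT", 2, "C", "A"))  # AAGTACGT
```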

 gene name --> gene overlapping the location of the variant

 protein sequence --> protein product of a transcript of the given gene such that the variant lies in one of its exons
		  --> if several candidate transcripts qualify, only the reported one was used for prediction
		  --> transcripts are taken from the Consensus CoDing Sequence (CCDS) project reference set

 protein position --> 1-based position of the variant's impact on the protein sequence

 ref. --> reference amino acid residue at the given protein position

 alt. --> alternative amino acid residue describing the protein-level variation: 'ref.' is substituted by 'alt.'

 prediction --> denotes whether the variant was predicted as disease-causing or neutral, based on a binarizing threshold

 prob. --> probability that the above prediction is true (thus, 'prob.' is never < 50%)
       --> 'neutral 90.0%' means 90.0% confident that the variant is neutral
       --> 'disease-causing 90.0%' means 90.0% confident that the variant is disease-causing

 raw score --> LibSVM output score with probability estimates turned on
	   --> ranges over (0,1), where 0 means certainly benign and 1 means certainly disease-causing
	   --> 'prob.' is derived from 'raw score' by taking:
		--> [100 * (1 - 'raw score')] for neutral predictions
		--> (100 * 'raw score') for disease-causing predictions
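 The relationship between 'raw score', 'prediction', and 'prob.' can be sketched in Python as
 follows. This is a minimal sketch, not DDIG's code: the function name `ddig_call` is hypothetical,
 and the binarizing threshold of 0.5 is an assumption (it is the value consistent with 'prob.'
 never falling below 50%).

```python
from typing import Tuple

# Assumed binarizing threshold; the actual DDIG threshold is not stated in this README.
THRESHOLD = 0.5


def ddig_call(raw_score: float, threshold: float = THRESHOLD) -> Tuple[str, float]:
    """Turn a probability-style raw score into (prediction, prob. in percent).

    Hypothetical helper for illustration only (not part of DDIG).
    """
    if not 0.0 < raw_score < 1.0:
        raise ValueError("raw score is expected to lie in the open interval (0, 1)")
    if raw_score >= threshold:
        # disease-causing prediction: prob. = 100 * raw score
        return "disease-causing", round(100.0 * raw_score, 1)
    # neutral prediction: prob. = 100 * (1 - raw score)
    return "neutral", round(100.0 * (1.0 - raw_score), 1)


for score in (0.1, 0.9):
    label, prob = ddig_call(score)
    print(f"{label} {prob:.1f}%")
# neutral 90.0%
# disease-causing 90.0%
```

 With a threshold of 0.5, whichever class is predicted always receives at least 50%, which matches
 the note on 'prob.' above.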

 DDIG model --> the DDIG model used for the prediction; DDIG comprises three models: one for NFS indels,
                one for FS indels and nonsense variants, and one for synonymous variants
	    --> see the related publications below for more details
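 How a variant might be routed to one of the three models listed above, with missense and
 non-coding variants skipped, can be sketched as below. This is a simplified, hypothetical sketch
 rather than DDIG's implementation: the function name, the model labels, the use of '*' for a stop
 codon, and the use of missing protein residues to mean "non-coding" are illustrative assumptions.
 The NFS/FS split uses the standard definition that an indel preserves the reading frame only when
 its length change is a multiple of 3.

```python
from typing import Optional

STOP = "*"  # assumed notation for a stop codon in the 'alt.' residue field


def choose_ddig_model(ref_allele: str, alt_allele: str,
                      ref_residue: Optional[str],
                      alt_residue: Optional[str]) -> Optional[str]:
    """Pick which of the three DDIG models would score a variant.

    Hypothetical helper; returns None for variants DDIG ignores.
    """
    if ref_residue is None or alt_residue is None:
        return None  # no protein-level change reported: treated as non-coding, ignored
    length_change = len(alt_allele) - len(ref_allele)
    if length_change != 0:
        # Indel: it keeps the reading frame only if the length change is a multiple of 3.
        if length_change % 3 == 0:
            return "NFS indel"
        return "FS indel / nonsense"
    if alt_residue == STOP and ref_residue != STOP:
        return "FS indel / nonsense"  # substitution creating a premature stop codon
    if alt_residue == ref_residue:
        return "synonymous"  # substitution that leaves the amino acid unchanged
    return None  # missense (amino acid changes): ignored


print(choose_ddig_model("CTTG", "C", "L", "-"))  # NFS indel
print(choose_ddig_model("CA", "C", "Q", "X"))    # FS indel / nonsense
print(choose_ddig_model("C", "T", "Q", STOP))    # FS indel / nonsense
print(choose_ddig_model("C", "T", "L", "L"))     # synonymous
print(choose_ddig_model("C", "T", "L", "P"))     # None
```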


 RELATED PUBLICATIONS:
 #####################

 NFS indels: https://doi.org/10.1186/gb-2013-14-3-r23
 FS indels and nonsense variants: https://doi.org/10.1093/bioinformatics/btu862
 Synonymous variants: https://doi.org/10.1002/humu.23283