Scripps Genome ADVISER | Annotation and Distributed Variant Interpretation Server

Result Description

SG-ADVISER will send the user an email once results are available. Please note variant files submitted to the SG-ADVISER server will be destroyed after successful completion of variant annotation. Annotation files will be destroyed 30 days after their generation.

Annotations are provided as a tab-delimited flat file. The output file can be manipulated with our UI tool. Download the UI.

Variants are annotated at the transcript level and presented as a single line per variant. Any column containing annotations relevant to multiple transcripts is therefore subdivided by triple forward slashes ("///"). When an annotation is not applicable to a variant or transcript, the null value is represented by a "-" character, often following the format of the column. For example, a column where entries are formatted as "Value1~Value2", if null, will receive a value of "-~-".
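The delimiter and null conventions described above can be handled with two small helpers. This is a minimal sketch, not part of SG-ADVISER itself; the function names split_transcripts and is_null are hypothetical, and it assumes "~" is the sub-field separator of the column being checked.

```python
TRANSCRIPT_SEP = "///"  # separates per-transcript entries within one cell
NULL = "-"              # null marker described in the text


def split_transcripts(cell):
    """Split one cell into its per-transcript entries."""
    return [part.strip() for part in cell.split(TRANSCRIPT_SEP)]


def is_null(entry, field_sep="~"):
    """Return True when every sub-field of an entry is the null marker.

    Covers both a bare "-" and formatted nulls such as "-~-".
    """
    return all(field.strip() == NULL for field in entry.split(field_sep))


if __name__ == "__main__":
    for entry in split_transcripts("Value1~Value2///-~-"):
        print(entry, is_null(entry))
```

Checking every sub-field rather than comparing against a single "-" means the same helper works for columns with any number of "~"-separated fields.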

An example of an annotated file can be found here.

The specific source and format of each column are presented below. The full citations for references listed in this section can be found in References.


Columns 1-5
Columns 1 - 5 are user supplied or derived from alternate input formats. Each column (except free-form columns) contains a single value.


Chromosome - Chromosome containing the variant in "chr#" format where # is 1 - 22 or X or Y.
Begin - Physical start position of the variant. 0-based coordinates. Coordinates correspond to hg19.
End - Physical end position of the variant. 0-based coordinates. Coordinates correspond to hg19.
VarType - Variant type ('loss' or 'gain').
Notes - Any free text to be carried over to the annotation file. Examples include genotypes, quality scores, etc.
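As an illustration of the five leading columns, the sketch below parses one tab-delimited line into a small record. It assumes the columns appear in the order listed above and that VarType is written exactly as 'loss' or 'gain'; the names CnvCall and parse_call are hypothetical and the sample line is made up.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    chromosome: str  # "chr#" format
    begin: int       # 0-based, hg19
    end: int         # 0-based, hg19
    var_type: str    # 'loss' or 'gain'
    notes: str       # free text carried over from the input


def parse_call(line):
    """Parse columns 1-5 of one annotation line; later columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-delimited columns")
    chrom, begin, end, var_type, notes = fields[:5]
    if var_type not in ("loss", "gain"):
        raise ValueError(f"unexpected VarType: {var_type!r}")
    return CnvCall(chrom, int(begin), int(end), var_type, notes)


if __name__ == "__main__":
    print(parse_call("chr1\t100\t200\tloss\tGT=0/1"))
```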


Columns 6-8

Gene - The transcript(s) nearest to the variant by physical distance. Gene models are derived from the UCSC genome browser known genes track (Meyer et al. 2012). HUGO gene symbol is provided for each gene followed by the UCSC transcript ID in parenthesis. Format: Gene_Symbol(UCSC_transcript_ID)
Gene_Type - The transcript type. Possible values are "Protein_Coding" or "Noncoding_RNA".
Location - Location of the variant relative to the nearest transcript(s). Exons and introns are numbered in the direction of the reading frame. Multiple nucleotide substitutions may span multiple locations (e.g. Exon_6-Intron_6). Potential values are "Upstream", "Downstream", "5UTR" (5' untranslated region), "3UTR" (3' untranslated region), Exon_# (where # is the coding exon number), Intron_# (includes introns flanking coding and non-coding exons), and noncoding_rna for variants landing in non-intronic noncoding RNA sites.
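The Gene column format, Gene_Symbol(UCSC_transcript_ID), can be split into its two parts with a regular expression. The following is a sketch under the assumption that neither part contains parentheses; parse_gene_column is a hypothetical helper and the gene symbols and transcript IDs in the example are placeholders, not real records.

```python
import re

# Gene_Symbol(UCSC_transcript_ID), assuming no nested parentheses
GENE_RE = re.compile(r"^(?P<symbol>[^()]+)\((?P<transcript>[^()]+)\)$")


def parse_gene_column(cell):
    """Return one (symbol, transcript_id) tuple per transcript, or None
    for entries that do not match the format (for example the null "-")."""
    parsed = []
    for entry in cell.split("///"):
        match = GENE_RE.match(entry.strip())
        if match:
            parsed.append((match.group("symbol"), match.group("transcript")))
        else:
            parsed.append(None)
    return parsed


if __name__ == "__main__":
    print(parse_gene_column("GENE1(uc000aaa.1)///GENE1(uc000aab.2)"))
```

Keeping a None placeholder for unmatched entries preserves the position of each transcript, so the result can be lined up with other "///"-delimited columns on the same line.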


Columns 9 - 11

Distance - Absolute physical distance from nearest transcript. Shortest distance from transcription start or stop site is calculated. All variants within the transcription start and end site receive a value of "0".
Coding_Impact - The impact of a variant on a protein coding transcript(s). Multiple transcripts are delimited by "///". It is possible for transcripts with the same gene symbol to receive different values. Potential values include:

Protein_Pos - The position(s) of the variant within the amino acid sequence of a protein coding gene.


Columns 12-16


Prop_Cons_Affected_Inside - the fraction of the conserved portion of the protein coding sequence affected by the CNV.
Prop_Cons_Affected_Outside - the minimum of:

Exonic_Bases_Inside_CNV - the fraction of the conserved portion of the protein coding sequence affected by the CNV.
Exonic_Bases_Outside_CNV - the minimum of:



Columns 17-20

DGV_Gain_Overlap - Overlap of the CNV with previously identified gains in the Database of Genomic Variants. Format: identifier ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
DGV_Loss_Overlap - Overlap of the CNV with previously identified losses in the Database of Genomic Variants. Format: identifier ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
Wellderly_Gain_Overlap - Overlap of the CNV with previously identified gains in the Scripps Wellderly Genomes. Format: identifier ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
Wellderly_Loss_Overlap - Overlap of the CNV with previously identified losses in the Scripps Wellderly Genomes. Format: identifier ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
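The four-field overlap format shared by the DGV and Wellderly columns can be unpacked as sketched below. The names Overlap and parse_overlap are hypothetical. Values are kept as strings because the exact numeric formatting (for example whether a "%" sign is present) is not specified here, and the sample identifier is made up.

```python
from typing import NamedTuple, Optional


class Overlap(NamedTuple):
    identifier: str
    pct_reference_in_overlap: str
    pct_cnv_in_overlap: str
    allele_frequency: str


def parse_overlap(entry) -> Optional[Overlap]:
    """Parse one 'identifier ~ % ~ % ~ allele_frequency' entry.

    Returns None for a null entry such as "-" or "-~-~-~-".
    """
    parts = [part.strip() for part in entry.split("~")]
    if all(part == "-" for part in parts):
        return None
    if len(parts) != 4:
        raise ValueError(f"expected 4 '~'-separated fields, got {len(parts)}")
    return Overlap(*parts)


if __name__ == "__main__":
    print(parse_overlap("example_id ~ 80 ~ 35 ~ 0.02"))
```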


Columns 21 - 25

Known_Gain_Syndrome - Overlap of CNV with region known to result in pathogenic syndrome due to chromosomal gain. Format: syndrome_name ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
Known_Loss_Syndrome - Overlap of CNV with region known to result in pathogenic syndrome due to chromosomal loss. Format: syndrome_name ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ allele_frequency.
ClinVar_Gain - Overlap of CNV with gain region annotated in ClinVar. Format: Disease_Name ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ Pathogenicity ~ Evidence ~ Accession.
ClinVar_Loss - Overlap of CNV with loss region annotated in ClinVar. Format: Disease_Name ~ %_of_reference_region_contained_in_overlap ~ %_of_CNV_contained_in_overlap ~ Pathogenicity ~ Evidence ~ Accession.


Columns 26 - 29

1000genomes_Gain - Overlap of CNV with gain region annotated in 1000 genomes. Format: ID ~ percOverlapIn1000genomes ~ percOverlapInAnnotatedCNV ~ Aggregate_AF ~ AFR_AF ~ AMR_AF ~ ASN_AF ~ EUR_AF.
1000genomes_Loss - Overlap of CNV with loss region annotated in 1000 genomes. Format: ID ~ percOverlapIn1000genomes ~ percOverlapInAnnotatedCNV ~ Aggregate_AF ~ AFR_AF ~ AMR_AF ~ ASN_AF ~ EUR_AF.
miRNA_genomic - List of microRNAs whose non-coding pre-miRNA reading frame within the genome houses the variant. Multiple microRNAs are separated by '///'. Note the different microRNAs listed here have no assumed relationship with the nearest gene nor does the order of presentation have any bearing on the order of presentation of the nearest transcript.
omimGene_ID~omimGene_association - OMIM gene ID and associated phenotype if any (McKusick 1998). Presented as OMIM_ID~Phenotype. Transcripts delimited by "///".
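For the 1000genomes_Gain and 1000genomes_Loss columns, the eight "~"-separated fields can be mapped onto the field names given in the format description. This is a sketch only; parse_1000g is a hypothetical helper, the sample values are invented, and numeric conversion is left to the caller.

```python
# Field names taken from the format description of the 1000genomes columns
FIELDS_1000G = (
    "ID",
    "percOverlapIn1000genomes",
    "percOverlapInAnnotatedCNV",
    "Aggregate_AF",
    "AFR_AF",
    "AMR_AF",
    "ASN_AF",
    "EUR_AF",
)


def parse_1000g(entry):
    """Map one 1000 genomes entry onto its field names.

    Returns None for a null entry (every field equal to "-").
    """
    parts = [part.strip() for part in entry.split("~")]
    if all(part == "-" for part in parts):
        return None
    if len(parts) != len(FIELDS_1000G):
        raise ValueError(f"expected {len(FIELDS_1000G)} fields, got {len(parts)}")
    return dict(zip(FIELDS_1000G, parts))


if __name__ == "__main__":
    record = parse_1000g("example_id~50~25~0.1~0.2~0.05~0.0~0.1")
    print(record["EUR_AF"])
```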


Columns 30-33

Protein_Domain_Gene_Ontology - Gene Ontology (Ashburner et al. 2000) annotations assigned to Protein_Domains detected by InterProScan. Transcript specific - i.e. all gene ontology annotations per protein domain in each individual transcript are combined.
HGMD_Gene~disease_association - List of phenotype associated in HGMD with the gene(s) nearest to the variant. Different phenotypes for the same transcript delimited by "$", transcripts delimited by "///".
COSMIC_Gene~NumSamples - Number of times the gene(s) impacted by the variant have been observed mutated in cancer samples. Transcript specific. Format: cancer_type~number_of_observations. Multiple tumor types separated by "$". Transcripts separated by "///".
MSKCC_CancerGenes - Determination as to whether the impacted gene(s) are considered cancer genes as catalogued by the Memorial Sloan Kettering Cancer Center (Higgins et al. 2007). Potential values: Tumor Suppressor or Oncogene.
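Several columns above nest three delimiters: "///" between transcripts, "$" between items, and "~" between the parts of an item. The sketch below unpacks the COSMIC_Gene~NumSamples column under the assumption that number_of_observations is an integer; parse_cosmic is a hypothetical helper and the sample cell is made up.

```python
def parse_cosmic(cell):
    """Return one {cancer_type: observations} dict per transcript.

    Null entries ("-" or "-~-") yield an empty dict so that positions
    still line up with the other '///'-delimited columns.
    """
    per_transcript = []
    for transcript in cell.split("///"):
        counts = {}
        for item in transcript.split("$"):
            cancer_type, _, number = item.strip().partition("~")
            if cancer_type in ("", "-"):
                continue  # null marker, nothing recorded for this item
            counts[cancer_type] = int(number)
        per_transcript.append(counts)
    return per_transcript


if __name__ == "__main__":
    print(parse_cosmic("lung~12$breast~3///-~-"))
```

Splitting from the outermost delimiter inward keeps each level independent, so the same pattern applies to other "$"-delimited columns such as HGMD_Gene~disease_association.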


Columns 34-37

Atlas_Oncology - Determination as to whether the impacted gene(s) are considered cancer genes as catalogued by Atlas Oncology.
Sanger_Germline_CancerGenes - Determination as to whether the impacted gene(s) are considered germline cancer genes as catalogued by the Sanger Cancer Gene Census. Format: cancer type name. Multiple cancer types separated by "$".
Sanger_network-informed_CancerGenes~Pval - Significant cancer genes imputed by network connectivity to known cancer genes. Manuscript under preparation. Format: gene_name~p-value.
Sanger_Tumor_Type(Somatic) - Sanger Amplifications or Large Deletions tumor type.


Columns 38-41

Sanger_Cancer_Syndrome - Sanger Large Deletions cancer syndromes.
Mitelman_Database - Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. Format: gene_name~Pubmed ID(s).
DrugBank - DrugBank ID (Wishart et al. 2006) of compounds known to target the impacted gene(s).
Reactome_Pathway - Biological pathway to which the impacted gene(s) belong, as annotated by Reactome (Joshi-Tope et al. 2005). Multiple pathways separated by "~". Transcripts separated by "///".


Columns 42-45

Gene_Onotology - Gene Ontology (Ashburner et al. 2000) annotations for impacted gene(s). Multiple terms separated by "$". Transcripts separated by "///".
Disease_Ontology - Disease Ontology (Osborne et al. 2009) annotations for impacted gene(s). Multiple terms separated by "$". Transcripts separated by "///".
ADVISER_Score~Disease_Entry~Explanation - Modified American College of Medical Genetics summary categorization for variants residing in genes previously causally associated with disease. See ADVISER Scoring for details.
Warning - If a CNV is classified as Class 1 but has no coding impact (all coding impact entries are "-"), the warning column states "Prior knowledge conflicts with prediction". Otherwise the warning column is "-".
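The Warning rule can be expressed as a short function. This is a sketch of the rule as described, not the server's implementation; it assumes the ADVISER class has already been extracted as an integer (how the class is encoded in the ADVISER_Score column is not covered here), and warning_for is a hypothetical name.

```python
CONFLICT = "Prior knowledge conflicts with prediction"


def warning_for(adviser_class, coding_impact_cell):
    """Apply the Warning rule to one variant.

    adviser_class: ADVISER class as an integer (assumed already parsed).
    coding_impact_cell: raw Coding_Impact cell, transcripts split by '///'.
    """
    impacts = [entry.strip() for entry in coding_impact_cell.split("///")]
    no_coding_impact = all(entry == "-" for entry in impacts)
    if adviser_class == 1 and no_coding_impact:
        return CONFLICT
    return "-"


if __name__ == "__main__":
    print(warning_for(1, "-///-"))
```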
