General introduction

What is dbSAP?

The Single Amino-acid Polymorphism Database (dbSAP) is a free public archive of human protein variations. dbSAP mainly contains three types of protein variations, caused by: i) single nucleotide polymorphisms (SNPs), ii) mutations, and iii) post-translational modifications (PTMs).
A mutation is a permanent alteration of the nucleotide sequence of a genome, of extrachromosomal DNA or of other genetic elements.
A single amino acid polymorphism (abbreviated to SAP) is a variation of a single amino acid at a specific position in a protein, resulting from a non-synonymous nucleotide substitution in the coding region of a gene, where each variant is present to some appreciable degree within a population (e.g. > 1%).
Post-translational modification (PTM) refers to the covalent and generally enzymatic modification of proteins during or after protein biosynthesis.
In proteomics, the mass shift caused by a SAP in a peptide can be very similar to the mass shift caused by a PTM, because both are detected as the mass difference between the unaltered (unmodified) and altered (modified) forms of the peptide.
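To make the SAP/PTM ambiguity concrete, the sketch below computes the peptide mass shift of an amino-acid substitution from standard monoisotopic residue masses and lists any PTM whose mass shift falls within a tolerance of it. The small PTM table, the function names `sap_shift` and `isobaric_ptms`, and the 0.01 Da tolerance are illustrative assumptions, not part of dbSAP.

```python
# Standard monoisotopic residue masses (Da) of the 20 amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# A few common PTM mass shifts (Da); an illustrative subset, not dbSAP's list.
PTM_SHIFT = {
    "Methylation": 14.01565,
    "Oxidation": 15.99491,
    "Acetylation": 42.01057,
    "Phosphorylation": 79.96633,
}


def sap_shift(ref, alt):
    """Mass change (Da) of a peptide when residue `ref` is replaced by `alt`."""
    return RESIDUE_MASS[alt] - RESIDUE_MASS[ref]


def isobaric_ptms(ref, alt, tol=0.01):
    """Names of PTMs whose mass shift lies within `tol` Da of the SAP shift."""
    delta = sap_shift(ref, alt)
    return [name for name, shift in PTM_SHIFT.items() if abs(shift - delta) <= tol]


if __name__ == "__main__":
    for ref, alt in [("A", "S"), ("G", "A"), ("K", "Q")]:
        print(f"{ref}->{alt}: {sap_shift(ref, alt):+.5f} Da", isobaric_ptms(ref, alt))
```

An Ala-to-Ser substitution adds one oxygen atom, so it shifts the peptide mass by the same +15.995 Da as an oxidation, and Gly-to-Ala adds CH2, matching a methylation. From the precursor mass alone the two explanations cannot be told apart.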

Why did we build dbSAP?

Variations at the genomic level in humans, such as single nucleotide polymorphisms (SNPs) and mutations, have been shown to correlate strongly with various diseases. Millions of genetic variations have been identified in humans using high-throughput sequencing technologies. Variations located in coding regions can alter the translated amino acids, producing variations at the amino-acid level called single amino-acid polymorphisms (SAPs). Although some studies have tried to identify human SAPs, to the best of our knowledge only a small number of SAPs have been detected. To further explore human variation at the protein level, we developed a workflow and detected 16,854 unique variant peptides in a large amount of proteomic mass spectrometry data (11,865 experiments).

What's new in dbSAP?

We first built a comprehensive human variation database, with variation data collected from eight public databases: the NCBI dbSNP database, the Ensembl variation database, the Catalogue of Somatic Mutations in Cancer (COSMIC), the protein mutant database (PMD), the human protein mutant database (HPMD), the UniProt variation database, the MSIPI database and the MS-CanProVar database. We then constructed a workflow to identify variant peptides and their associated proteins in a large amount of proteomic mass spectrometry data (11,865 experiments) collected from public databases. After a series of strict quality control steps (global FDR ≤ 0.01, group FDR ≤ 0.01), we identified 16,854 unique variant peptides supported by 439,537 unique spectra. dbSAP integrates multi-level information and the corresponding evidence into a new landscape of the human proteome, to facilitate research on protein function and cancer pathogenesis.
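The two FDR filters can be illustrated with a target-decoy sketch. The FAQ does not say how dbSAP estimates FDR, so the following is an assumption: FDR at a score cutoff is estimated as (decoy hits) / (target hits) at or above that cutoff, the global FDR is computed over all peptide-spectrum matches (PSMs), and the group FDR is computed over variant PSMs only. The `PSM` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    score: float      # higher means a better match
    is_decoy: bool    # hit against a decoy (e.g. reversed) sequence
    is_variant: bool  # peptide carries a SAP


def score_threshold(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    Returns None when no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best = None
    for i, p in enumerate(ranked):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff after the last PSM of a group of tied scores.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1].score != p.score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            best = p.score
    return best


def variant_psms_passing(psms, max_fdr=0.01):
    """Target variant PSMs that pass both the global and the group FDR cutoff."""
    global_cut = score_threshold(psms, max_fdr)
    group_cut = score_threshold([p for p in psms if p.is_variant], max_fdr)
    if global_cut is None or group_cut is None:
        return []
    cutoff = max(global_cut, group_cut)
    return [p for p in psms if p.is_variant and not p.is_decoy and p.score >= cutoff]
```

The separate group filter matters because variant PSMs are a small minority: a decoy among them barely changes the global FDR, but it sharply raises the FDR within the variant group, so the group cutoff is usually the stricter one.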

Do we intend to commercialize the database?

No, we have no intention of profiting from dbSAP. We aim to provide a free online search platform to facilitate human protein variation identification and cancer research.

What do variations, variation proteins and SAPs mean?

Variations: SNPs or mutations in coding regions that cause non-synonymous substitutions in the corresponding amino-acid sequence.
Variation proteins: proteins that harbor non-synonymous variations (SNPs or mutations).
SAPs: single amino-acid polymorphisms, i.e. the amino-acid changes caused by non-synonymous variations (SNPs or mutations).
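Whether a coding variation is non-synonymous, and therefore a candidate SAP, comes down to translating the reference and altered codons. The sketch below assumes the standard genetic code and a codon given on the coding (sense) strand; `classify_snv` is a hypothetical helper, not a dbSAP function.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(codon, pos, alt):
    """Classify a single-nucleotide change at 0-based `pos` within a codon.

    Returns (category, change), where change is written as "ref->alt".
    """
    codon = codon.upper()
    mutant = codon[:pos] + alt.upper() + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous", None                 # protein sequence unchanged
    if "*" in (ref_aa, alt_aa):
        return "nonsense_or_stop_loss", f"{ref_aa}->{alt_aa}"
    return "missense", f"{ref_aa}->{alt_aa}"      # single amino-acid change: a candidate SAP


if __name__ == "__main__":
    print(classify_snv("GCT", 0, "T"))  # GCT (Ala) -> TCT (Ser)
    print(classify_snv("GCT", 2, "C"))  # GCT (Ala) -> GCC (Ala)
```

Only the missense case produces a single amino-acid substitution of the kind described above; synonymous changes leave the protein unchanged, and stop-related changes truncate or extend it instead.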

Usage

What information can you find in dbSAP?

(1) Novel SAP
(2) Variation position of the related SAP
(3) Related SNP evidence for the SAP
(4) Peptide variation sequence
(5) Spectra related to the corresponding SAP

How to query information in dbSAP?

There are four ways to query data in the current version of dbSAP:
1. Search Gene: users can query the SNP information of a single gene by its gene symbol.
2. Search SNP: users can query a single nucleotide polymorphism and its related single amino acid polymorphism at the protein level using its location ID.
3. Search Peptide: users can query a peptide by its amino acid sequence.
4. Search Protein: users can query all variations of a protein using its Ensembl or Swiss-Prot protein name.

What is included in the results page after a query?

After a user submits a query on the Query Page, four types of information are generally returned: the related gene, SNP, peptide and protein.

How often do we update dbSAP?

We plan to update dbSAP annually, once we have collected enough new mass spectrometry data, novel SNPs and mutations, and completed the related analyses.