ValidatorDB

From WebChemistry Wiki
Revision as of 13:19, 19 June 2014 by Deepti

ValidatorDB contains precomputed validation results for ligands and residues in the Protein Data Bank. The database is updated on a weekly basis.

The residues deemed relevant for validation are all ligands and residues of reasonable size (more than six heavy atoms), with the exception of standard amino acids and nucleotides. The validation is performed using MotiveValidator, and the residue models from the wwPDB Chemical Component Dictionary (wwPDB CCD) are used as reference structures for validation.

Availability and technical details

Where to find ValidatorDB

ValidatorDB has been freely available online since May 2014 at http://ncbr.muni.cz/MotiveValidatorDB. No login is required to access ValidatorDB.

What you need in order to access ValidatorDB

ValidatorDB is a database, or more precisely a collection of validation results for ligands and residues in the Protein Data Bank. The database is maintained on the ncbr.chemi.muni.cz server at the National Centre for Biomolecular Research within Masaryk University, Czech Republic, and updated weekly. All you need in order to access ValidatorDB is a working internet connection and an up-to-date web browser with JavaScript enabled. The only functionality that relies on your system is the display of 3D models, for which your browser needs to support WebGL. If you have trouble displaying the 3D models, please check http://get.webgl.org to find out how to enable WebGL on your system.

How to get around the web page

As soon as you type in the address http://ncbr.muni.cz/ValidatorDB, you will reach the ValidatorDB synopsis page, which contains a brief, general description of ValidatorDB, along with 3 tabs (Figure 1A). The tabs on the synopsis page provide access to overviews and statistical evaluation of validation results for the entire PDB, for each residue across all PDB IDs containing that residue, and for all analyzed residues in each PDB ID, in graphical or tabular form. Click on each tab to discover what type of overview it offers. Further, the ValidatorDB specifics page (Figure 1B), which is reached by looking up specific residues or PDB IDs on the synopsis page, lets you view the results for selected residues in more detail. The specifics page is also organized into tabs that allow different levels of analysis of the results. Last but not least, remember to check the tool tips by hovering the mouse cursor over any graphical or textual element in the ValidatorDB interface.


Before moving on to more extensive descriptions of features, it is important to clearly establish the meaning of a few key terms and principles within the ValidatorDB environment. See Terminology.


Basic Principles

Residues and ligands relevant for validation

As mentioned in section 1, well-studied residues like amino acids and nucleotides are routinely validated upon submission of new structures to the PDB, and reports on the quality of their structure are already accessible. The challenge addressed by ValidatorDB lies in providing access to validation results for residues other than the well-studied amino acids and nucleotides. This generally includes ligands and uncommon residues (e.g., substituted amino acids), whose structures are highly diverse and nontrivial, and for which there is generally much less information regarding correct structures. Thus, within the ValidatorDB environment, we further refine the meaning of the terms residue and ligand to refer to residues and ligands relevant for validation. Specifically, these are all ligands and residues of reasonable size (more than six heavy atoms), with the exception of amino acids and nucleotides. All other features of the terms residue and ligand described in sections 2.1 and 2.2 are maintained. Henceforth, all references to residues and ligands in this manual and in the ValidatorDB web pages (including Wiki and tutorial) have the meaning of residues and ligands relevant for validation. The PDB currently holds over 17,000 residues and ligands relevant for validation.
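The selection rule described above (more than six heavy atoms, standard amino acids and nucleotides excluded by name) can be sketched in a few lines of Python. This is only an illustration: the `Residue` class, the function names, and the contents of the exclusion sets are assumptions made for the example and are not taken from ValidatorDB or MotiveValidator.

```python
from dataclasses import dataclass

# Illustrative exclusion lists (assumption): a real list of standard
# residue names would likely be longer than this.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
STANDARD_NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
EXCLUDED_NAMES = STANDARD_AMINO_ACIDS | STANDARD_NUCLEOTIDES


@dataclass
class Residue:
    name: str            # residue name (3-letter code)
    elements: list[str]  # element symbol of every atom in the residue


def heavy_atom_count(residue: Residue) -> int:
    """Count atoms that are not hydrogen (or deuterium)."""
    return sum(1 for e in residue.elements if e.upper() not in {"H", "D"})


def is_relevant_for_validation(residue: Residue) -> bool:
    """True for residues with more than six heavy atoms whose name is
    not that of a standard amino acid or nucleotide."""
    if residue.name.upper() in EXCLUDED_NAMES:
        return False
    return heavy_atom_count(residue) > 6
```

For example, a galactose residue (GAL, 12 heavy atoms) passes this filter, whereas a water molecule or an alanine residue does not.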


Validation

As stated above, the validation results stored in ValidatorDB are updated every week. Within the ValidatorDB environment, the term validation refers to the process of determining whether a residue or ligand is structurally complete and correctly annotated. This means checking whether the topology and chirality of each motif of a validated residue (section 2.4) correspond to those of the model residue (section 2.5) with the same name as the validated residue.

The validation of residues and ligands in the entire PDB takes place in a few distinct steps. First, for each PDB entry, the residues relevant for validation are detected based on their name (3-letter code) and number of atoms (more than 6 heavy atoms). Amino acid residues and nucleotides are excluded based on their residue name. Then, for each validated residue, the corresponding model (same 3-letter code as the validated residue) is retrieved from wwPDB CCD, and each motif of this residue is validated against the model. ValidatorDB is then built as the collection of validation results for all motifs of all residues in all PDB entries (Figure 2A).

The validation of each motif against the model residue can be illustrated on a galactose (GAL) motif from the PDB entry 1bzw (Figure 2B). The validated residue GAL is extracted from PDB entry 1bzw in the form of an input motif, which contains all the atoms of the validated residue, together with all atoms found within one or two bonds of any atom from the validated residue (surroundings). Then, by superimposing the input motif and model residue, the validated motif is obtained as the subset of atoms in the input motif which have a corresponding atom in the model residue. Comparing each atom and bond in the validated motif to those in the model residue produces the validation results.
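The construction of the input motif and of the validated motif can be sketched as follows. The sketch assumes that the bonds are available as an adjacency dictionary and that the atom correspondence produced by the superimposition is already known; the superimposition itself is not shown. The function names, the data layout, and the default of two bonds are assumptions made for illustration, not the MotiveValidator implementation.

```python
from collections import deque


def input_motif(residue_atoms, bonds, max_bonds=2):
    """Return the residue atoms plus every atom within `max_bonds` bonds.

    residue_atoms: iterable of atom identifiers of the validated residue
    bonds: dict mapping an atom identifier to the set of atoms bonded to it
    """
    # Breadth-first search started from all residue atoms at once.
    distance = {atom: 0 for atom in residue_atoms}
    queue = deque(distance)
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            continue  # do not expand beyond the allowed number of bonds
        for neighbour in bonds.get(atom, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return set(distance)


def validated_motif(motif_atoms, mapping):
    """Keep only the motif atoms that have a counterpart in the model.

    mapping: dict from model atom name to the matched motif atom identifier
    (assumed to come from a superimposition step that is not shown here).
    """
    matched = set(mapping.values())
    return {atom for atom in motif_atoms if atom in matched}
```

With a chain of bonds r1-r2-x1-x2-x3 and a validated residue made of r1 and r2, the input motif contains r1, r2, x1 and x2, while x3 (three bonds away) is left out.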

Validation results

For each validated motif, ValidatorDB contains several types of results. Since the evaluation of the validated motif relies on comparing all atoms and bonds in the validated motif to those in the model residue, the first results that can be encountered are errors. Namely:

  • Missing atoms: an atom in the model residue has no corresponding atom in the validated motif.
  • Missing rings: at least one missing atom originates from cycles (rings).
  • Wrong chirality: an atom from the validated motif has different chirality than the corresponding atom from the model residue.
  • Wrong chirality (planar): the chirality error was found on a planar chiral center. Because of their spatial arrangement, planar chiral centers are very sensitive even to small perturbations in the position of the substituents. Therefore, some of the errors reported here might not be significant.
  • Uncertain chirality: the presence of unusual bonds may cause an improper evaluation of chirality.

Chirality is only evaluated for those motifs which are complete. This is because the absence of some atoms can prevent the proper evaluation of chirality on the chiral centers present in the validated motif. Therefore, note that all motifs which are counted in the Wrong chirality category are in fact complete. At the same time, the motifs with no missing atoms and no chirality error are actually counted in a category called Correct chirality.
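The rule that chirality is evaluated only for complete motifs can be expressed as a small decision function. The sketch below assumes that the atom mapping and the per-center chirality comparison have already been computed; the function name, its arguments, and the returned dictionary are hypothetical and serve only to illustrate the order of the checks.

```python
def classify_motif(model_atoms, mapping, chirality_matches):
    """Assign a motif to one error category.

    model_atoms: names of all atoms in the model residue
    mapping: dict from model atom name to the matched motif atom
        (a model atom with no counterpart is simply absent)
    chirality_matches: dict from the name of each chiral center in the
        model to True/False, i.e. whether the motif atom has the same
        chirality as the corresponding model atom
    """
    missing = sorted(a for a in model_atoms if a not in mapping)
    if missing:
        # Incomplete motifs are not evaluated for chirality.
        return {"category": "Missing atoms", "missing": missing,
                "wrong_chirality": []}
    wrong = sorted(a for a, same in chirality_matches.items() if not same)
    category = "Wrong chirality" if wrong else "Correct chirality"
    return {"category": category, "missing": [], "wrong_chirality": wrong}
```

In this sketch, a motif with a missing atom is reported under Missing atoms even if one of its chiral centers also differs from the model, which mirrors the statement that every motif counted under Wrong chirality is complete.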


Suspicious discrepancies between the atoms and inter-atomic bonds in the validated motif and in the model residue are reported as warnings. Namely:


  • Substitution: an atom from the validated motif is of a different chemical element than the corresponding atom in the model residue (e.g., O mapped to N). This happens often at linkage sites.
  • Different naming: an atom from the validated motif has a different PDB atom name than the corresponding atom from the model residue (e.g., the C1 atom mapped to the C7 atom). This happens often when the original PDB files were produced by different software.
  • Foreign atom: an atom from the model residue was mapped to an atom from outside the validated residue (i.e., from its surroundings).
  • Alternate locations: in the original PDB file, the validated residue contains atoms which were given in alternate locations (i.e., most probably different rotamers). Only the first rotamer was considered during validation.
  • Zero model RMSD: the superimposition between the model residue and the validated motif has a root mean square deviation of zero, i.e., the validated motif is identical to the model residue used as reference.
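Three of the warnings above (Substitution, Different naming, Foreign atom) follow directly from comparing each pair of matched atoms. The sketch below assumes that the mapping between model atoms and motif atoms is given as a list of pairs; the `Atom` class and the `find_warnings` function are hypothetical names, and treating the three checks as independent of each other is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str      # PDB atom name, e.g. "C1"
    element: str   # chemical element symbol, e.g. "C"
    residue: str   # identifier of the residue the atom belongs to


def find_warnings(mapping, validated_residue):
    """Return (warning, model atom name, motif atom name) tuples.

    mapping: list of (model_atom, motif_atom) pairs of Atom objects
    validated_residue: identifier of the residue being validated
    """
    warnings = []
    for model_atom, motif_atom in mapping:
        if motif_atom.element != model_atom.element:
            warnings.append(("Substitution", model_atom.name, motif_atom.name))
        if motif_atom.name != model_atom.name:
            warnings.append(("Different naming", model_atom.name, motif_atom.name))
        if motif_atom.residue != validated_residue:
            warnings.append(("Foreign atom", model_atom.name, motif_atom.name))
    return warnings
```

For instance, a model oxygen O1 mapped to a nitrogen ND2 that belongs to a neighbouring residue would raise all three warnings in this sketch, which is the kind of situation described above for linkage sites.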

Discrepancies between the atoms and inter-atomic bonds in the validated motif and in the model residue that prevent validation altogether are reported as processing errors, and such motifs are not validated.

Typical validation results that can be found in ValidatorDB are illustrated on the galactose motif mentioned in section 2.6 (Figure 2C). As a general rule, in the ValidatorDB interface, errors are marked in red (missing atoms) or dark yellow (wrong chirality), correct structures in green, and warnings in cyan.

Database contents

ValidatorDB contains precomputed validation results for ligands and residues in the Protein Data Bank. The database is updated on a weekly basis. The validation is performed using MotiveValidator, and the residue models from wwPDB CCD are used as reference templates for validation. All residues of significant size (more than 6 heavy atoms) have been included in ValidatorDB, with the exception of amino acids and nucleotides, which are checked thoroughly upon submission of the structure to the PDB, and thus do not require additional validation.

The validation results available in ValidatorDB indicate whether each motif (occurrence, instance) of a ligand or residue in the PDB exhibits the topology and stereochemistry expected from its annotation (3-letter code), or how it differs from this annotation. Additionally, all issues related to incorrect or suspicious topology and stereochemistry are explicitly described in a comprehensive and intuitive manner (e.g., location of missing atoms or chirality inversions).

ValidatorDB is organized on two main levels, namely PDB-wide results (synopsis page), and results restricted to specific residues of interest (specifics page). The two levels present the same type of validation results (as described in section 2.7), although the available features differ to some extent (e.g., the specifics page allows 3D visualization of motifs). We shall describe each level of the database in detail below.

Synopsis page

The ValidatorDB synopsis page (Figures 1A, 3) contains a brief description of ValidatorDB, along with information about the last database update (date and number of structures processed during the validation). Specifically, in May 2014, over 100,000 PDB entries had been processed, containing over 230,000 motifs of 17,000 residues relevant for validation. Additionally, the synopsis page gives access to the validation results for specific residues of interest via the LookUp bar (Figure 1A). Simply type a comma-separated list of residue names (3-letter codes) into the LookUp bar, and you will be redirected to the specifics page containing validation results for the residues you requested. If you specify a list of PDB IDs (4-letter codes) instead, then the corresponding specifics page will contain validation results for all relevant residues and ligands in the PDB entries you specified. See section 3.2 for a description of the contents of the specifics page, and how to interpret these contents. The ValidatorDB synopsis page further provides access to various data sets of PDB-wide validations via 3 different tabs, namely Overview, Details by Residue, and Details by PDB entry. A full description of each of these tabs is given below (sections 2.1.1-2.1.3).
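If you prepare LookUp queries in a script, the distinction between residue names and PDB IDs can be checked before pasting the list into the LookUp bar. The sketch below assumes that the two kinds of identifiers are told apart purely by length (3 characters for residue names, 4 for PDB IDs), as the description above suggests; how the ValidatorDB interface itself parses the input is not documented here, and `parse_lookup` is a hypothetical helper.

```python
def parse_lookup(query):
    """Split a comma-separated query into residue names and PDB IDs.

    Assumption: 3-character alphanumeric tokens are residue names and
    4-character alphanumeric tokens are PDB IDs; anything else is rejected.
    """
    residues, pdb_ids, rejected = [], [], []
    for token in query.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as "GAL,,MAN"
        if len(token) == 3 and token.isalnum():
            residues.append(token.upper())
        elif len(token) == 4 and token.isalnum():
            pdb_ids.append(token.lower())
        else:
            rejected.append(token)
    return residues, pdb_ids, rejected
```

Calling `parse_lookup("GAL, man")` yields the residue names GAL and MAN, while `parse_lookup("1bzw")` yields a single PDB ID.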

Overview

The Overview tab of the synopsis page provides a very general statistical evaluation of results across the entire PDB in graphical form (Figures 1A, 3A). The elements of the graph represent percentages of the total number of motifs (over 200,000) of residues relevant for validation. A graphic element is displayed in the Overview graph only if it represents at least 0.5% of the total number of motifs. Each element of the graph is described in a tool tip, but note that here the term residue actually refers to an occurrence of a residue (motif).
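The 0.5% display threshold amounts to a simple filter over per-category motif counts. The following sketch assumes that such counts and the total number of motifs are available; the function name and the numbers used in the usage note are made up for illustration and are not ValidatorDB statistics.

```python
def overview_elements(category_counts, total_motifs, min_percent=0.5):
    """Return the percentage share of each category that is large enough
    to be displayed, i.e. at least `min_percent` of all motifs.

    category_counts: dict from category name to number of motifs
    total_motifs: total number of motifs of residues relevant for validation
    """
    if total_motifs <= 0:
        return {}
    shares = {
        name: 100.0 * count / total_motifs
        for name, count in category_counts.items()
    }
    return {name: share for name, share in shares.items() if share >= min_percent}
```

With 1,000 motifs in total, a category holding 96 motifs (9.6%) would be displayed, whereas a category holding 4 motifs (0.4%) would be omitted from the graph.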