ValidatorDB

ValidatorDB contains precomputed validation results for ligands and residues in the Protein Data Bank. The database is updated on a weekly basis.

The residues deemed relevant for validation are all ligands and residues of reasonable size (more than six heavy atoms), with the exception of standard amino acids and nucleotides. The validation is performed using MotiveValidator, and the residue models from the wwPDB Chemical Component Dictionary (wwPDB CCD) are used as reference structures for validation.

Availability and technical details

Where to find ValidatorDB

ValidatorDB has been freely available online since May 2014 at http://ncbr.muni.cz/MotiveValidatorDB. No login is required to access ValidatorDB.

What you need in order to access ValidatorDB

ValidatorDB is a database, or more precisely a collection of validation results for ligands and residues in the Protein Data Bank. The database is maintained on the ncbr.chemi.muni.cz server at the National Centre for Biomolecular Research within Masaryk University, Czech Republic, and updated weekly. To access ValidatorDB, all you need is a working internet connection and an up-to-date internet browser with JavaScript enabled. The only functionality that depends on your system is the display of 3D models, for which your browser must support WebGL. If you have trouble displaying the 3D models, please check http://get.webgl.org to find out how to enable WebGL on your system.

How to get around the web page

As soon as you type in the address http://ncbr.muni.cz/ValidatorDB, you will reach the ValidatorDB synopsis page, which contains a brief, general description of ValidatorDB, along with three tabs (Figure 1A). The tabs on the ValidatorDB synopsis page provide access to overviews and statistical evaluation of validation results for the entire PDB, for each residue across all PDB IDs containing that residue, and for all analyzed residues in each PDB ID, in graphical or tabular form. Click on each tab to discover what type of overview can be accessed. Further, the ValidatorDB specifics page (Figure 1B), which is accessible by looking up specific residues or PDB IDs in the synopsis page, allows you to view the results for selected residues in more detail. The specifics page is also organized into tabs that allow different levels of analysis of the results. Last but not least, remember to check the tool tips by hovering the mouse cursor over any graphical or textual element in the ValidatorDB interface.

Terminology

Before moving on to more extensive descriptions of features, it is important to clearly establish the meaning of a few key terms and principles within the ValidatorDB environment.

Residue

The term residue is used to refer to any component of a biomacromolecule or a biomacromolecular complex. This includes amino acid residues and nucleotides, which are commonly referred to as residues as they form proteins and nucleic acids. Within the ValidatorDB environment, any collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same residue name (3-letter code) and residue identifier (index internal to the input PDB file).
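The grouping rule above can be illustrated with a minimal Python sketch. It assumes the standard fixed-column layout of ATOM and HETATM records in PDB files, and it adds the chain identifier and insertion code to the grouping key, which is common practice for PDB files but is not stated in this manual. The function names pdb_record and group_residues and the sample records (with coordinates set to zero) are hypothetical and serve only as illustration; this is not ValidatorDB's own code.

```python
from collections import defaultdict

def pdb_record(kind, serial, atom, resname, chain, resseq, element):
    """Format one synthetic fixed-column ATOM/HETATM record (coordinates are zero)."""
    return (f"{kind:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}{'':4}"
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}")

def group_residues(pdb_lines):
    """Map (chain, residue number, insertion code, residue name) to atom names."""
    residues = defaultdict(list)
    for line in pdb_lines:
        # Only coordinate records describe atoms; everything else is skipped.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # Assumption: chain ID and insertion code are part of the key,
        # in addition to the residue name and residue identifier.
        key = (line[21], int(line[22:26]), line[26].strip(), line[17:20].strip())
        residues[key].append(line[12:16].strip())
    return dict(residues)

# Synthetic example: two copies of GAL in different chains, plus one alanine atom.
sample = [
    pdb_record("HETATM", 1, "C1", "GAL", "A", 301, "C"),
    pdb_record("HETATM", 2, "O1", "GAL", "A", 301, "O"),
    pdb_record("HETATM", 3, "C1", "GAL", "B", 301, "C"),
    pdb_record("ATOM", 4, "CA", "ALA", "A", 12, "C"),
]

for key, atoms in group_residues(sample).items():
    print(key, atoms)
```

In this sketch, the two GAL copies in chains A and B end up under separate keys, so each instance can be handled on its own.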

Ligand

We use the term ligand to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as standalone ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and 3-letter code, and is composed of HETATM records. The ValidatorDB term residue (section 2.1) thus fully covers ligands, in addition to typical components like amino acids and nucleotides. Within the ValidatorDB environment, any statements pertaining to ligands also hold for residues.

Residues and ligands relevant for validation

As mentioned in section 1, well-studied residues like amino acids and nucleotides are routinely validated upon submission of new structures to the PDB. Furthermore, reports on the quality of their structure are already accessible. The challenge addressed by ValidatorDB lies in providing access to validation results for residues other than the well-studied amino acids and nucleotides. This generally includes ligands and uncommon residues (e.g., substituted amino acids), whose structures are highly diverse and nontrivial, and for which there is generally much less information regarding correct structures. Thus, within the ValidatorDB environment, we further refine the meaning of the terms residue and ligand to refer to residues and ligands relevant for validation. Specifically, these are all ligands and residues of reasonable size (more than six heavy atoms), with the exception of amino acids and nucleotides. All other features of the terms residue and ligand described in sections 2.1 and 2.2 are maintained. Henceforth, all references to residues and ligands in this manual will have the meaning of residues and ligands relevant for validation. Similarly, all references to residues and ligands in the ValidatorDB web pages (including Wiki and tutorial) have the meaning of residues and ligands relevant for validation. The PDB currently holds over 17,000 residues and ligands relevant for validation.
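The relevance criterion can be written down as a short Python sketch. The exclusion sets below (the 20 standard amino acids and eight common RNA and DNA nucleotide codes) are an assumption made for illustration; the manual does not list the exact residue names that ValidatorDB excludes. The function names are likewise hypothetical.

```python
# Assumed exclusion lists; the exact names excluded by ValidatorDB are not given here.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
STANDARD_NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}

def heavy_atom_count(elements):
    """Count atoms that are not hydrogen (or deuterium)."""
    return sum(1 for e in elements if e.strip().upper() not in {"H", "D"})

def is_relevant_for_validation(resname, elements):
    """Apply the rule from the text: more than six heavy atoms, and neither
    a standard amino acid nor a standard nucleotide."""
    if resname in STANDARD_AMINO_ACIDS or resname in STANDARD_NUCLEOTIDES:
        return False
    return heavy_atom_count(elements) > 6

# Galactose has 6 carbons and 6 oxygens, so 12 heavy atoms.
galactose = ["C"] * 6 + ["O"] * 6 + ["H"] * 12
print(is_relevant_for_validation("GAL", galactose))        # relevant
print(is_relevant_for_validation("HOH", ["O", "H", "H"]))  # too small
print(is_relevant_for_validation("ALA", ["N", "C", "C", "O", "C"]))  # excluded by name
```

Note that the size threshold is strict: a residue with exactly six heavy atoms does not qualify.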

Motif

With respect to the chemistry of biomolecules, the term motif is used to refer to a well-defined distribution of structural elements in a biomolecule or biomolecular complex, with characteristics generally associated with a specific function. Within the ValidatorDB environment, a motif is generally a fragment of a biomacromolecule, biomacromolecular complex or ligand, made up of one or more residues or parts of residues. A motif can in principle be any fragment of a biomolecule. Nonetheless, ValidatorDB is focused on the validation of residues, thus here motif generally refers to a fragment made up of the residue under study, together with its surroundings (i.e., atoms from neighboring residues). We can generally say that, within the ValidatorDB environment, all residues can be thought of as motifs. Therefore, different instances of the same residue (such as copies of the same ligand in different monomers) can be considered and processed as different motifs, making their identification straightforward and unambiguous. The PDB currently holds over 230,000 motifs of over 17,000 different residues and ligands relevant for validation (section 2.3).

Model residue

We use the term model residue (or simply model) to refer to a particular structure that is known to be correct. Within the ValidatorDB environment, this structure was used as the reference template in the validation process, whereby each instance of each residue was compared against the model residue with the same name (3-letter code). The models originate from the wwPDB Chemical Component Dictionary (wwPDB CCD).

Validation

As stated above, the validation results stored in ValidatorDB are updated every week. Within the ValidatorDB environment, the term validation refers to the process of determining whether a residue or ligand is structurally complete and correctly annotated. This means checking whether the topology and chirality of each motif of a validated residue (section 2.4) correspond to those of the model residue (section 2.5) with the same name as the validated residue.

The validation of residues and ligands in the entire PDB takes place in a few distinct steps. First, for each PDB entry, the residues relevant for validation are detected based on their name (3-letter code) and number of atoms (more than six heavy atoms). Amino acid residues and nucleotides are excluded based on their residue name. Then, for each validated residue, the corresponding model (same 3-letter code as the validated residue) is retrieved from wwPDB CCD, and each motif of this residue is validated against the model. ValidatorDB is then built as the collection of validation results for all motifs of all residues in all PDB entries (Figure 2A).

The validation of each motif against the model residue can be illustrated on a galactose (GAL) motif from the PDB entry 1bzw (Figure 2B). The validated residue GAL is extracted from PDB entry 1bzw in the form of an input motif, which contains all the atoms of the validated residue, together with all atoms found within one or two bonds of any atom from the validated residue (surroundings). Then, by superimposing the input motif and the model residue, the validated motif is obtained as the subset of atoms in the input motif which have a corresponding atom in the model residue. Comparing each atom and bond in the validated motif to those in the model residue produces the validation results.
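The construction of the input motif and the validated motif can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MotiveValidator implementation: the surroundings are collected with a breadth-first search over a bond list, and the atom correspondence that the superimposition step would produce is simply passed in as a dictionary, since the superimposition itself is not implemented here. The function names, atom labels and toy bond list are hypothetical.

```python
from collections import deque

def input_motif(residue_atoms, bonds, max_bonds=2):
    """Return the residue atoms plus all atoms within max_bonds bonds of any of them."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Multi-source breadth-first search: every residue atom starts at distance 0.
    distance = {atom: 0 for atom in residue_atoms}
    queue = deque(residue_atoms)
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            continue  # do not expand beyond the allowed number of bonds
        for nxt in neighbors.get(atom, ()):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    return set(distance)

def validated_motif(motif_atoms, correspondence, model_atoms):
    """Keep motif atoms that have a counterpart in the model, and report
    model atoms without a counterpart (candidates for missing atoms).
    Assumption: correspondence maps motif atom -> model atom and would
    come from the superimposition step, which is not implemented here."""
    matched = {atom for atom in motif_atoms if atom in correspondence}
    missing = set(model_atoms) - {correspondence[atom] for atom in matched}
    return matched, missing

# Toy data: a two-atom "residue" bonded to a chain of surrounding atoms.
bonds = [("C1", "O1"), ("O1", "X1"), ("X1", "X2"), ("X2", "X3")]
motif = input_motif(["C1", "O1"], bonds, max_bonds=2)
matched, missing = validated_motif(motif, {"C1": "C1", "O1": "O1"}, {"C1", "O1", "C2"})
print(sorted(motif), sorted(matched), sorted(missing))
```

In the toy data, X3 lies three bonds away from the residue and is therefore left out of the input motif, while the model atom C2 has no counterpart and is reported as missing. Checks of bonds and chirality, which the validation also covers, are beyond this sketch.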