The advancement of research in structural biology has provided a large body of structural data deposited in various databases. One great example is the Protein Data Bank (PDB), which has been growing exponentially, and which currently consists of more than 100,000 structures of biomolecules and their complexes. Such large bodies of data, especially accumulated over a short period of time using high throughput techniques, will inherently be plagued by various problems.

Validation arose as a major issue in the structural biology community when it became apparent that some published structures contained serious errors, which are only sometimes properly explained (e.g., due to insufficient electron density in a certain area). Structural databases generally require that new submissions be checked prior to acceptance. The tools employed for presubmission validation work fairly well for residues like amino acids or nucleotides. However, an essential step in the validation process is checking the ligand structure, because ligands play a key role in protein function, and also because they are the main source of errors in structures. Ligand validation and the validation of uncommon residues are very challenging tasks, because of the high diversity and nontriviality of their structures, and the general lack of information about correct structures. Therefore, software tools focused on ligand validation were developed relatively recently<ref name="Lutteke_2004"/><ref name="Kleywegt_2007"/>, and the topic is still under active development<ref name="Berman_2014"/>. These tools are able to validate one or more structures (even thousands of structures), but they do not provide the broad scientific community with a comprehensive picture of the quality of structures in dedicated and well established structural databases. For example, a general overview and corresponding statistical evaluation of validation results for residues and ligands in the entire PDB is not yet available, despite the exponential growth of the PDB and the development of structural validation tools in recent years.

We recently developed MotiveValidator<ref name="Varekova_2014"/>, an interactive platform for the speedy validation of ligands, residues and fragments using a novel, straightforward approach based on the validation of residue annotation. MotiveValidator employs advanced algorithms for the detection and comparison of structural motifs<ref name="Sehnal_2012"/>, along with tools for chirality verification<ref name="Boyle_2011" /> and interactive visualization of 3D structures<ref name="web"/>. Using MotiveValidator, we further created ValidatorDB, a comprehensive resource of validation results for residues and ligands in the Protein Data Bank. Along with validation results for individual residues and ligands, ValidatorDB also provides a summary and statistical evaluation of the validation results at various levels of detail within the PDB. Thus, ValidatorDB offers a comprehensive overview of the quality of the ligand structures in the entire PDB. The validation results are precomputed, and the database is updated on a weekly basis.

The residues deemed relevant for validation are all ligands and residues with reasonable size (more than six heavy atoms), with the exception of standard amino acids and nucleotides. The validation is performed using MotiveValidator, and the residue models from wwPDB Chemical Component Dictionary (wwPDB CCD) are used as reference structures for validation.

If you are a first-time user, please visit our introduction to ValidatorDB, the terminology used in the following sections, or the results analysis.

Availability and technical details

Where to find ValidatorDB

ValidatorDB has been freely available online since May 2014 at http://ncbr.muni.cz/ValidatorDB. No login is required to access ValidatorDB.

What you need in order to run ValidatorDB

An up-to-date internet browser with WebGL support.
JavaScript enabled.

Check whether your browser supports WebGL and JavaScript.

How to get around the web page

For a quick tour of the ValidatorDB service, please go here. Whenever you are unsure about the function of a user interface element, please click the '?' button to open the guide.

Before moving on to more extensive descriptions of features, it is important to clearly establish the meaning of a few key terms and principles within the ValidatorDB environment. See terminology.

Basic Principles

Residues and ligands relevant for validation

As mentioned in the introduction, well studied residues like amino acids and nucleotides are routinely validated upon submission of new structures to the PDB. Furthermore, reports of the quality of their structure are already accessible. The challenge addressed by ValidatorDB lies in providing access to validation results for residues other than the well studied amino acids and nucleotides. This generally includes ligands and uncommon residues (e.g. substituted amino acids), which exhibit high diversity and nontriviality in their structure, and for which there is generally much less information regarding correct structures. Thus, within the ValidatorDB environment, we further refine the meaning of the terms residue and ligand to refer to residues and ligands relevant for validation. Specifically, these are all ligands and residues with reasonable size (more than six heavy atoms), with the exception of amino acids and nucleotides. All other features of the terms residue and ligand described in terminology are maintained. Henceforth, all references to residues and ligands in the ValidatorDB web pages (including Wiki and tutorial) have the meaning of residues and ligands relevant for validation. The PDB currently holds over 17,000 residues and ligands relevant for validation.

Validation

A) Overview of validation process. For each PDB entry, the relevant residues are detected based on their name (3-letter code) and number of atoms. Then, for each validated residue, the corresponding model is retrieved from wwPDB CCD, and each motif of this residue is validated against the model. ValidatorDB is then built as the collection of validation results for all motifs of all residues in all PDB entries.

B) Example of validation for a sialic acid (SIA) motif in the PDB entry 4jtv. The validated residue SIA is extracted from PDB entry 4jtv in the form of an input motif, which contains all the atoms of the validated residue, together with all atoms found within one or two bonds of any atom from the validated residue (surroundings). By superimposing the input motif and model residue, the validated motif results as the subset of atoms in the input motif which correspond to atoms in the model residue. Evaluation of the validated motif (which atoms are present and where, compared to the model residue) produces the validation results.

C) Typical validation results for a SIA motif. If the validated motif contains all atoms at their expected positions, it is marked as correct (green). If any atoms are missing, the validated motif is marked as incomplete (red). If some atoms have wrong chirality, the validated motif is marked accordingly (dark yellow). If any unusual features are detected, the validated motif is marked with a warning (cyan) (not shown). The full list of graphical results is available here.
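The input motif described in panel B (the atoms of the validated residue plus their bonded surroundings) can be illustrated with a short graph traversal. This is only a minimal sketch and not MotiveValidator's actual implementation: the function name `input_motif`, the `max_bonds` parameter and the representation of bonds as pairs of atom identifiers are all assumptions made for illustration.

```python
from collections import deque


def input_motif(bonds, residue_atoms, max_bonds=2):
    """Return the residue atoms plus every atom within `max_bonds` bonds of them.

    `bonds` is an iterable of (atom_a, atom_b) pairs and `residue_atoms` is an
    iterable of atom identifiers. Both are hypothetical representations.
    """
    # Build an undirected adjacency map from the bond list.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Multi-source breadth-first search starting from every residue atom.
    distance = {atom: 0 for atom in residue_atoms}
    queue = deque(distance)
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            continue  # do not expand beyond the allowed number of bonds
        for nxt in neighbors.get(atom, ()):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    return set(distance)
```

With `max_bonds=1` only directly bonded atoms are added to the residue, while `max_bonds=2` also adds their bonded neighbors, matching the "one or two bonds" surroundings described above.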

Within the ValidatorDB environment, the term validation refers to the process of determining whether a residue or ligand is structurally complete and correctly annotated. This means checking if the topology and chirality of each motif of a validated residue correspond to those of the model residue with the same name as the validated residue.
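The way a validated motif is sorted into the color-coded result categories can be sketched as follows. This is a simplified illustration under stated assumptions, not the ValidatorDB code: the function `classify_motif`, its arguments, the chirality labels and the order in which the checks are applied are all hypothetical.

```python
def classify_motif(model_atoms, motif_atoms, model_chirality, motif_chirality,
                   warnings=()):
    """Return (category, details) for one validated motif.

    `model_atoms` and `motif_atoms` are collections of atom names;
    `model_chirality` and `motif_chirality` map chiral atom names to a label
    such as "R" or "S". The check order below is an assumption.
    """
    # Completeness: every atom of the model residue must be present.
    missing = sorted(set(model_atoms) - set(motif_atoms))
    if missing:
        return "incomplete", missing

    # Chirality: each chiral center must carry the same label as in the model.
    # A center without a label in the motif is treated as a mismatch here.
    inverted = sorted(atom for atom, label in model_chirality.items()
                      if motif_chirality.get(atom) != label)
    if inverted:
        return "wrong chirality", inverted

    # Any other unusual feature is only reported as a warning.
    if warnings:
        return "warning", list(warnings)
    return "correct", []
```

For example, a motif that lacks one atom of the model is reported as incomplete together with the name of the missing atom, regardless of its chirality.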

The validation of residues and ligands in the entire PDB takes place in a few distinct steps. First, for each PDB entry, the residues which are relevant for validation are detected based on their name (3-letter code) and number of atoms (more than 6 heavy atoms). Amino acid residues and nucleotides are excluded based on their residue name. Then, for each validated residue, the corresponding model (same 3-letter code as the validated residue) is retrieved from wwPDB CCD, and each motif of this residue is validated against the model.
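The first step, detecting which residues are relevant for validation, can be sketched as a simple filter. The input format (one `(residue_id, residue_name, element)` tuple per atom), the function name `relevant_residues` and the exact list of excluded residue names are assumptions made for illustration; the list actually used by ValidatorDB is not given here.

```python
from collections import defaultdict

# Assumed exclusion list: the 20 standard amino acids and the common
# RNA/DNA nucleotides. The real list used by ValidatorDB may differ.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "A", "C", "G", "U", "DA", "DC", "DG", "DT",
}


def relevant_residues(atoms):
    """Map residue_id -> residue_name for residues relevant for validation.

    `atoms` is an iterable of (residue_id, residue_name, element) tuples,
    one per atom of a PDB entry (a hypothetical, pre-parsed representation).
    """
    heavy_count = defaultdict(int)
    names = {}
    for residue_id, residue_name, element in atoms:
        names[residue_id] = residue_name
        # Hydrogen (and deuterium) atoms are not heavy atoms.
        if element.upper() not in {"H", "D"}:
            heavy_count[residue_id] += 1

    # Keep residues with more than six heavy atoms that are not excluded by name.
    return {
        residue_id: name
        for residue_id, name in names.items()
        if name not in STANDARD_RESIDUES and heavy_count[residue_id] > 6
    }
```

Each residue returned by such a filter would then be paired with the wwPDB CCD model that has the same 3-letter code.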

Validation results

For a thorough description of the ValidatorDB outputs, please follow this link.

Database contents

ValidatorDB contains precomputed validation results for ligands and residues in the Protein Data Bank. The database is updated on a weekly basis. The validation is performed using MotiveValidator, and the residue models from wwPDB CCD are used as reference templates for validation. All residues of significant size (more than 6 heavy atoms) have been included in ValidatorDB, with the exception of amino acids and nucleotides, which are checked thoroughly upon submission of the structure to the PDB, and thus do not require additional validation.

The validation results available in ValidatorDB inform whether each motif (occurrence, instance) of a ligand or residue in the PDB exhibits the appropriate topology and stereochemistry expected from its annotation (3-letter code), or how it differs from this annotation. Additionally, all issues related to incorrect or suspicious topology and stereochemistry are explicitly described in a comprehensive and intuitive manner (e.g., location of missing atoms or chirality inversions).

ValidatorDB is organized on two main levels, namely PDB-wide results (synopsis page), and results restricted to specific residues of interest (specifics page). The two levels present the same type of validation results (as described below), although the available features differ to some extent (e.g., the specifics page allows 3D visualization of motifs). We shall describe each level of the database in detail below.

Synopsis page

The ValidatorDB synopsis page contains a brief description of ValidatorDB, along with information about the last database update (date and number of structures that have been processed during the validation). Specifically, in May 2014, over 100,000 PDB entries had been processed, containing over 230,000 motifs of 17,000 residues relevant for validation.

Additionally, the synopsis page allows you to access the validation results for specific residues of interest via the LookUp bar. Simply type a comma-separated list of residue names (3-letter codes) into the LookUp bar, and you will be redirected to the specifics page containing validation results for the residues you requested. If you specify a list of PDB IDs (4-letter codes) instead, then the corresponding specifics page will contain validation results for all relevant residues and ligands in the PDB entries you specified.

See the specifics page for a description of these contents and how to interpret them. The ValidatorDB synopsis page further provides access to various data sets of PDB-wide validations via 4 different tabs, namely Overview, Details by Residue, Details by PDB entry and Custom Search. A full description of each of these tabs is given below.

Overview

The Overview tab of the synopsis page provides a very general statistical evaluation of results across the entire PDB in graphical form. The elements of the graph represent percentages of the total number of motifs (over 200,000) of residues relevant for validation. A graphic element will be displayed in the Overview graph only if it represents at least 0.5% of the total number of motifs. Each element of the graph is described in a tool tip, but note that here the term residue actually refers to occurrence of residue (motif).
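The display rule for the Overview graph (each element is a percentage of all motifs and is shown only if it reaches 0.5%) can be sketched as below. The function name `overview_elements` and the category labels and counts in the usage note are hypothetical and do not reflect actual ValidatorDB data.

```python
def overview_elements(counts, threshold=0.5):
    """Convert motif counts per graph element into percentages of the total.

    Elements whose share is below `threshold` percent are dropped, mirroring
    the 0.5% display rule described above.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {label: 100.0 * n / total for label, n in counts.items()}
    return {label: pct for label, pct in shares.items() if pct >= threshold}
```

For instance, with made-up counts such as `{"correct": 900, "incomplete": 96, "rare issue": 4}`, the last element accounts for 0.4% of the motifs and would not be drawn.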

The elements of the graph can be assigned to roughly 6 categories, depending on which kind of information they contain (e.g., incomplete residue, chirality issues, warnings, etc). The categories are marked by different colors. All the graph elements are explained in the results section.

Note that the data shown in the graph can be downloaded in *.csv format by clicking 'CSV' in the bottom right corner of the infographic.

Details by Residue

The Details by Residue tab contains an interactive table summarizing the results for each residue validated across the entire PDB. Each row corresponds to one residue, identified by its residue name (3-letter code). The information in the table is organized according to the validation result. The color coding for the table header and the font inside the table is the same as in the categories defined in the Overview tab. Each element of the table header is described in a tool tip, but note that here the term residue actually refers to occurrence of residue (motif).

The table is interactive. Clicking on any element in the table header sorts the table entries according to that element. Click on any residue name to access the ValidatorDB specifics page with detailed validation results for that residue.

The filter at the top right corner allows you to retrieve the table row for a specific residue. Simply type the residue name into the filter. All results can be downloaded in *.csv format using the download button at the top left corner.

Details by PDB entry

The Details by PDB Entry tab contains an interactive table summarizing the results for all residues validated in each PDB entry. Each row corresponds to one PDB entry, identified by its PDB ID (4-letter code). The information in the table is organized according to the validation result. The color coding for the table header and the font inside the table is the same as in the categories defined in the Overview tab. Each element of the table header is described in a tool tip, but note that here the term residue actually refers to occurrence of residue (motif).

The table is interactive. Clicking on any element in the table header sorts the table entries according to that element. Click on any PDB ID to access the ValidatorDB specifics page with detailed validation results for all residues in that PDB entry.

The filter at the top right corner allows you to retrieve the table rows with a specific residue, or the table rows with selected PDB IDs. Simply type the residue name or PDB ID into the filter. All results can be downloaded in *.csv format using the download button at the top left corner.

Custom Search

The Custom Search tab allows you to create your own view of ligand validation across the PDB. Simply paste a list of your desired ligands (3-letter codes) and/or PDB entries (4-letter codes) into the provided text boxes, separated by commas or newlines. This is particularly convenient when you need to retrieve a validation report for a large number of structures, such as all glycosyltransferases or all NMR structures. Note that you can retrieve such lists by using the advanced search in the PDB. Also bear in mind that each custom search is assigned a unique permanent web address, so you can access the results later on. A list of your most recent custom searches is also provided for your convenience.
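When preparing a long list of codes for the Custom Search text boxes, it can help to clean the list first. The following is a minimal sketch of such a preprocessing step on the user's side; the function name `parse_codes` is hypothetical, and the normalization to upper case and the strict length check are assumptions rather than documented ValidatorDB behavior.

```python
import re


def parse_codes(text, length):
    """Split `text` on commas and newlines into a clean list of codes.

    Codes are stripped, upper-cased and deduplicated while keeping their
    original order. `length` is 3 for residue names and 4 for PDB IDs.
    """
    codes = []
    for token in re.split(r"[,\n]+", text):
        code = token.strip().upper()
        if not code:
            continue  # ignore empty entries such as trailing commas
        if len(code) != length or not code.isalnum():
            raise ValueError(f"unexpected code: {code!r}")
        if code not in codes:
            codes.append(code)
    return codes
```

A call such as `parse_codes("SIA, NAG\nMAN,sia", 3)` yields each residue name once, and `", ".join(...)` of the result gives a string ready to paste into the corresponding text box.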

Specifics page

For a description of all the elements of the specifics page, please follow this link.

References

<references> <ref name="Lutteke_2004">Lütteke,T. and von der Lieth,C.-W. (2004) pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics, 5, 69.</ref>

<ref name="Kleywegt_2007">Kleywegt,G.J. and Harris,M.R. (2007) ValLigURL: a server for ligand-structure comparison and validation. Acta Crystallogr. D. Biol. Crystallogr., 63, 935–8.</ref>

<ref name="Berman_2014">Berman, H.M., Kleywegt, G.J., Nakamura, H. and Markley, J.L. (2014) The Protein Data Bank archive as an open data resource. J. Comput. Aided. Mol. Des. </ref>

<ref name="Varekova_2014">Vařeková,R.S., Jaiswal,D., Sehnal,D., Ionescu,C.-M., Geidl,S., Pravda,L., Horský,V., Wimmerová,M. and Koča,J. (2014) MotiveValidator: interactive web-based validation of ligand and residue structure in biomolecular complexes. Nucleic Acids Res., 42, W227–W233.</ref>

<ref name="Sehnal_2012">Sehnal,D., Vařeková,R.S., Huber,H.J., Geidl,S., Ionescu,C.-M., Wimmerová,M. and Koča,J. (2012) SiteBinder: an improved approach for comparing multiple protein structural motifs. J. Chem. Inf. Model., 52, 343–59.</ref>

<ref name="Boyle_2011">O’Boyle,N.M., Banck,M., James,C.A., Morley,C., Vandermeersch,T. and Hutchison,G.R. (2011) Open Babel: An open chemical toolbox. J. Cheminform., 3, 33.</ref> <ref name="web">http://www.chemdoodle.com</ref> </references>