ValidatorDB: Difference between revisions

From WebChemistry Wiki
Deepti (talk | contribs)

Revision as of 13:18, 18 June 2014

Advances in structural biology have produced a large body of structural data deposited in various databases. A prominent example is the Protein Data Bank (PDB), which has been growing exponentially and currently holds more than 100,000 structures of biomolecules and their complexes. Such large bodies of data, especially when accumulated over a short period of time using high-throughput techniques, are inherently prone to various problems. Validation became a major issue in the structural biology community when it became apparent that some published structures contained serious errors, whether documented (e.g., due to insufficient electron density in a certain area) or not. Structural databases generally require that new submissions be checked prior to acceptance. The tools employed for pre-submission validation work fairly well for well-studied residues such as amino acids and nucleotides. However, an essential step in the validation process is checking the ligand structure, because ligands play a key role in protein function and because they are the main source of errors in structures. Validating ligands and uncommon residues is very challenging because of the high diversity and nontriviality of their structures and the general lack of information about correct structures. Software tools focused on ligand validation were therefore developed relatively recently, and the topic is still under active development. These tools can validate one or more structures (even thousands of structures), but they do not give the broad scientific community an overall picture of the quality of structures in dedicated, well-established structural databases.
For example, a general overview and a corresponding statistical evaluation of validation results for residues and ligands in the entire PDB are not yet available, despite the exponential growth of the PDB and the development of structural validation tools in recent years.

We recently developed MotiveValidator, an interactive platform for the rapid validation of ligands, residues and fragments using a novel, straightforward approach based on the validation of residue annotation.

MotiveValidator employs advanced algorithms for the detection and comparison of structural motifs, along with tools for chirality verification and interactive visualization of 3D structures. Using MotiveValidator, we further created ValidatorDB, a comprehensive resource of validation results for residues and ligands in the Protein Data Bank. Along with validation results for individual residues and ligands, ValidatorDB also provides a summary and statistical evaluation of the validation results at various levels of detail within the PDB. Thus, ValidatorDB offers a comprehensive overview of the quality of the ligand structures in the entire PDB.

Availability and technical details

Where to find ValidatorDB

ValidatorDB has been freely available online since May 2014 at http://ncbr.muni.cz/MotiveValidatorDB. No login is required to access ValidatorDB.

What you need in order to access ValidatorDB

ValidatorDB is a database, or more precisely a collection of validation results for ligands and residues in the Protein Data Bank. The database is maintained on the ncbr.chemi.muni.cz server at the National Centre for Biomolecular Research within Masaryk University, Czech Republic, and is updated weekly. All you need in order to access ValidatorDB is a working internet connection and an up-to-date internet browser with JavaScript enabled. The only functionality that relies on your system is the display of 3D models, for which your browser needs to support WebGL. If you have trouble displaying the 3D models, please check http://get.webgl.org to find out how to enable WebGL on your system.

How to get around the web page

As soon as you type in the address http://ncbr.muni.cz/ValidatorDB, you will reach the ValidatorDB synopsis page, which contains a brief, general description of ValidatorDB, along with 3 tabs (Figure 1A). The tabs on the ValidatorDB synopsis page provide access to overviews and statistical evaluations of validation results for the entire PDB, for each residue across all PDB IDs containing that residue, and for all analyzed residues in each PDB ID, in graphical or tabular form. Click on each tab to discover what type of overview can be accessed. Further, the ValidatorDB specifics page (Figure 1B), which is accessible by looking up specific residues or PDB IDs on the synopsis page, lets you view the results for selected residues in more detail. The specifics page is also organized into tabs that allow different levels of analysis of the results. Last but not least, remember to check the tooltips by hovering the mouse cursor over any graphical or textual element in the ValidatorDB interface.

Basic terms and principles

Terminology

Before moving on to more extensive descriptions of features, it is important to clearly establish the meaning of a few key terms and principles within the ValidatorDB environment.

Residue

The term residue is used to refer to any component of a biomacromolecule or a biomacromolecular complex. This includes amino acid residues and nucleotides, which are commonly referred to as residues as they form proteins and nucleic acids. Within the ValidatorDB environment, any collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same residue name (3-letter code) and residue identifier (index internal to the input PDB file).
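The grouping rule above can be illustrated with a minimal Python sketch that collects the atoms of a PDB file into residues. It is not ValidatorDB's own code. The function name `group_atoms_by_residue` is hypothetical, and including the chain ID and insertion code in the grouping key is an assumption made here so that equally numbered residues in different chains stay separate. The column positions follow the fixed-width PDB format for ATOM and HETATM records.

```python
from collections import defaultdict


def group_atoms_by_residue(pdb_lines):
    """Group atom names by residue, reading fixed-width ATOM/HETATM records.

    The key is (residue name, chain ID, residue number, insertion code).
    Chain ID and insertion code are an assumption of this sketch; the text
    only requires a shared residue name and residue identifier.
    """
    residues = defaultdict(list)
    for line in pdb_lines:
        # Only coordinate records describe atoms; skip everything else.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20 (3-letter code)
        chain_id = line[21]               # column 22
        res_seq = int(line[22:26])        # columns 23-26 (residue identifier)
        i_code = line[26].strip()         # column 27
        residues[(res_name, chain_id, res_seq, i_code)].append(atom_name)
    return dict(residues)
```

Calling `group_atoms_by_residue(open("structure.pdb"))` on a local PDB file (the file name is a placeholder) would return one entry per residue, each listing the names of the atoms that share that residue name and identifier.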

Ligand

We use the term ligand to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as standalone ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and 3-letter code, and is composed of HETATM records. The ValidatorDB term residue (section 2.1) thus fully covers ligands, in addition to typical components like amino acids and nucleotides. Within the ValidatorDB environment, any statements pertaining to ligands also hold for residues.

Residues and ligands relevant for validation

As mentioned in section 1, well-studied residues like amino acids and nucleotides are routinely validated upon submission of new structures to the PDB. Furthermore, reports on the quality of their structure are already accessible. The challenge addressed by ValidatorDB lies in providing access to validation results for residues other than the well-studied amino acids and nucleotides. This generally includes ligands and uncommon residues (e.g., substituted amino acids), which exhibit high diversity and nontriviality in their structure, and for which there is generally much less information regarding correct structures. Thus, within the ValidatorDB environment, we further refine the meaning of the terms residue and ligand to refer to residues and ligands relevant for validation. Specifically, these are all ligands and residues of reasonable size (more than six heavy atoms), with the exception of amino acids and nucleotides. All other features of the terms residue and ligand described in sections 2.1 and 2.2 are maintained. Henceforth, all references to residues and ligands in this manual will have the meaning of residues and ligands relevant for validation. Similarly, all references to residues and ligands in the ValidatorDB web pages (including Wiki and tutorial) have the meaning of residues and ligands relevant for validation. The PDB currently holds over 17,000 residues and ligands relevant for validation.
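As an illustration only, the selection rule above (more than six heavy atoms, amino acids and nucleotides excluded) can be sketched in Python as follows. This is not how ValidatorDB itself is implemented. The function names, the lists of standard residue names, and the choice to treat every element other than H and D as heavy are all assumptions of this sketch. It also assumes that the element column (columns 77-78) of each ATOM/HETATM record is filled in.

```python
from collections import defaultdict

# Assumption: residue names treated as "well studied" and therefore excluded.
# These are the common standard codes, not a list taken from ValidatorDB.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
STANDARD_NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
EXCLUDED_NAMES = STANDARD_AMINO_ACIDS | STANDARD_NUCLEOTIDES


def heavy_atom_counts(pdb_lines):
    """Count non-hydrogen atoms per residue in fixed-width PDB records."""
    counts = defaultdict(int)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        element = line[76:78].strip().upper()  # columns 77-78
        if element in ("H", "D"):              # hydrogen and deuterium are not heavy
            continue
        key = (line[17:20].strip(), line[21], int(line[22:26]), line[26].strip())
        counts[key] += 1
    return dict(counts)


def relevant_for_validation(pdb_lines):
    """Keep residues with more than six heavy atoms that are not standard."""
    return {
        key: n
        for key, n in heavy_atom_counts(pdb_lines).items()
        if key[0] not in EXCLUDED_NAMES and n > 6
    }
```

Under these assumptions, water molecules and single ions drop out through the size threshold alone, while a standard residue such as TRP is removed by name even though it has more than six heavy atoms.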