PatternQuery:Use Cases: Difference between revisions

From WebChemistry Wiki
Latest revision as of 13:42, 25 April 2015

In this section you can find several biologically relevant examples of different queries.

Find all post-translationally modified amino acids

  • i.e. those incorporated in the protein backbone and not recorded as heteroatoms
NotAminoAcids().
    Filter(lambda m: m.Count(HetResidues()) == 0)

This query takes all residues that are not standard amino acids and keeps only those that are absent from the HETATM entries. Equivalently:

NotAminoAcids().
    Filter(lambda m: m.Contains(HetResidues()).Not())
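The equivalence of the two forms can be illustrated outside the query language. The following is a minimal Python sketch on toy data, not the query engine itself: the Residue class, its is_het flag, the STANDARD set and the example entries are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy model, not part of the query language: a residue name
# plus a flag saying whether the residue is treated as a HET residue.
@dataclass(frozen=True)
class Residue:
    name: str
    is_het: bool

# Abbreviated, illustrative set of standard amino acid names.
STANDARD = {"ALA", "CYS", "GLY", "HIS", "SER", "TYR"}

# Toy entries; the is_het flags are assumptions made for this example.
residues = [
    Residue("ALA", False),
    Residue("SEP", False),  # modified residue, assumed here not flagged as HET
    Residue("HEM", True),
    Residue("HOH", True),
]

def count_het(pattern):
    """Number of HET residues in a pattern (a list of residues)."""
    return sum(1 for r in pattern if r.is_het)

def contains_het(pattern):
    """True if the pattern holds at least one HET residue."""
    return any(r.is_het for r in pattern)

# Each non-standard residue becomes its own single-residue pattern.
patterns = [[r] for r in residues if r.name not in STANDARD]

by_count = [p for p in patterns if count_het(p) == 0]
by_contains = [p for p in patterns if not contains_het(p)]

print([p[0].name for p in by_count])     # ['SEP']
print(by_count == by_contains)           # True
```

Both filters keep the same patterns because a count of zero and the absence of any match express the same condition.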

Find all heteroatoms that are not covalently bonded to the protein structure

  • Takes all the heteroatom residues and keeps those that are not connected to any amino acid of the given protein
HetResidues().
    Filter(lambda m: m.IsNotConnectedTo(AminoAcids()))
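The idea behind this filter can be sketched in plain Python as a lookup in a bond list. This is only an illustration under stated assumptions: the residue identifiers, the kinds table and the bonds set are hypothetical toy data, and the sketch says nothing about how the query engine actually determines connectivity.

```python
# Hypothetical toy data: residue id -> kind, and covalent bonds between residues.
kinds = {"A1": "amino", "A2": "amino", "L1": "het", "L2": "het"}
bonds = {("A1", "A2"), ("A2", "L1")}  # L1 is bound to the protein, L2 is free

def is_connected_to_amino(res_id):
    """True if res_id shares a bond with at least one amino acid residue."""
    for a, b in bonds:
        if a == res_id:
            other = b
        elif b == res_id:
            other = a
        else:
            continue  # this bond does not involve res_id
        if kinds[other] == "amino":
            return True
    return False

# Keep only HET residues without any bond to an amino acid.
free_het = sorted(
    r for r, kind in kinds.items()
    if kind == "het" and not is_connected_to_amino(r)
)

print(free_het)  # ['L2']
```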

Identify zinc fingers

  • There is a variety of zinc fingers that differ in the residues surrounding the zinc atom. In our example we focus on those comprising two Cys and two His residues (Cys2His2).
Atoms("Zn").
    ConnectedResidues(1).
    Filter(lambda m: 
      (m.Count(Residues("His")) == 2) & (m.Count(Residues("Cys")) == 2))

First, the zinc atoms are selected together with the residues bonded to them. These patterns are then filtered by amino acid content: only those containing exactly two His and two Cys residues are kept.
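The counting condition in the filter can be sketched in plain Python. This is a minimal illustration, not the query engine: the zinc_sites dictionary and its site labels are hypothetical toy data standing in for "a zinc atom plus its bonded residues".

```python
from collections import Counter

# Hypothetical toy data: each zinc site lists the names of the residues
# bonded to its zinc atom.
zinc_sites = {
    "ZN1": ["CYS", "CYS", "HIS", "HIS"],
    "ZN2": ["CYS", "CYS", "CYS", "CYS"],
    "ZN3": ["HIS", "HIS", "HIS", "ASP"],
}

def is_cys2his2(residue_names):
    """True if the site has exactly two His and exactly two Cys residues."""
    counts = Counter(residue_names)
    return counts["HIS"] == 2 and counts["CYS"] == 2

cys2his2 = [site for site, names in zinc_sites.items() if is_cys2his2(names)]

print(cys2his2)  # ['ZN1']
```

Both conditions must hold at the same time, which is why the query combines the two counts with & rather than testing them separately.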

Identify all residues that contain a sugar ring

  • This task can be decomposed into two subtasks, since sugars contain either a furanose or a pyranose ring. A furanose ring contains 4 carbon atoms and an oxygen atom. Similarly, a pyranose ring is composed of 5 carbon atoms and an oxygen atom.
Or(Rings(4 * ["C"] + ["O"]).ConnectedResidues(0), 
   Rings(5 * ["C"] + ["O"]).ConnectedResidues(0))

By specifying the Rings() queries, we select only the ring part of the molecule. Extending a Rings() query with ConnectedResidues(0) selects only the residue that includes this ring. Finally, Or() joins both queries in order to merge their results.
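The ring patterns in the query are built with ordinary list arithmetic, which is also valid Python: 4 * ["C"] + ["O"] evaluates to a list of four carbons and one oxygen. The sketch below shows this and a toy composition check. It assumes that a ring matches a pattern when the element counts agree regardless of atom order; that matching rule, the helper names and the example rings are assumptions made for illustration, not a description of how Rings() is implemented.

```python
from collections import Counter

# The same expressions used in the query, evaluated as plain Python lists.
furanose_pattern = 4 * ["C"] + ["O"]   # five-membered ring: 4 C + 1 O
pyranose_pattern = 5 * ["C"] + ["O"]   # six-membered ring: 5 C + 1 O

def ring_matches(ring_elements, pattern):
    """Assumed rule: same element counts, atom order ignored."""
    return Counter(ring_elements) == Counter(pattern)

def is_sugar_ring(ring_elements):
    """Mirrors the Or(...) in the query: either pattern is accepted."""
    return (ring_matches(ring_elements, furanose_pattern)
            or ring_matches(ring_elements, pyranose_pattern))

# Hypothetical toy rings given as lists of element symbols.
rings = {
    "five_membered_sugar": ["C", "O", "C", "C", "C"],
    "six_membered_sugar": ["C", "C", "O", "C", "C", "C"],
    "all_carbon_ring": 6 * ["C"],
}

sugar_rings = [name for name, elems in rings.items() if is_sugar_ring(elems)]

print(furanose_pattern)  # ['C', 'C', 'C', 'C', 'O']
print(sugar_rings)       # ['five_membered_sugar', 'six_membered_sugar']
```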

Identify all binding sites of PA-IIL lectin in different organisms

  • Binding sites of this type of lectin comprise two calcium atoms close to each other and a bound sugar residue.
Near(4, Atoms("Ca"), Atoms("Ca"))
  .ConnectedResidues(1)
  .Filter(lambda l:
    l.Count(Or(Rings(5 * ["C"] + ["O"]), Rings(4 * ["C"] + ["O"]))) > 0)
  .Filter(lambda l: l.Count(Atoms("P")) == 0)

First, all pairs of calcium atoms lying within 4 Å of each other are selected by Near(4, Atoms("Ca"), Atoms("Ca")). Subsequently, the bonded residues are added and each pattern is checked for a sugar ring: only the patterns containing either a pyranose ring (Rings(5 * ["C"] + ["O"])) or a furanose ring (Rings(4 * ["C"] + ["O"])) are kept. Since a sugar moiety is an integral part of nucleotides, a final simple check ensures that no patterns containing phosphorus, i.e. nucleotides, are retained.
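The first step, finding calcium atoms that lie close together, can be sketched in plain Python as a pairwise distance check. This is a minimal illustration under stated assumptions: the atom labels and coordinates are hypothetical toy data, and the sketch assumes the distance criterion is a simple Euclidean cutoff, which may differ from how Near() is evaluated internally.

```python
from itertools import combinations
from math import dist

# Hypothetical toy coordinates in angstroms for three calcium atoms.
calcium = {
    "CA1": (0.0, 0.0, 0.0),
    "CA2": (3.0, 0.0, 0.0),
    "CA3": (10.0, 0.0, 0.0),
}

def close_pairs(atoms, max_distance):
    """Return all unordered atom pairs at most max_distance apart."""
    return [
        (name_a, name_b)
        for (name_a, pos_a), (name_b, pos_b) in combinations(sorted(atoms.items()), 2)
        if dist(pos_a, pos_b) <= max_distance
    ]

pairs = close_pairs(calcium, 4.0)

print(pairs)  # [('CA1', 'CA2')]
```

Only CA1 and CA2 are 3 Å apart and pass the 4 Å cutoff; the other two pairs are 7 Å and 10 Å apart and are dropped.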