
PatternQuery:Use Cases

From WebChemistry Wiki
<div class="toclimit-4">
In this section you can find several biologically relevant examples of different queries.


==Statement of purpose==
MotiveQuery (MQ) is a powerful platform for the fast and accurate selection and analysis of complex molecular motifs across a wide variety of structural data. The following text presents numerous use cases that will help you dive into the language and the web service itself. It can also serve as a brief language reference; the full language reference is accessible via our [[MotiveQuery_Language_Reference | wiki pages]].  '''This is not an exhaustive list of queries; new features will come in the next version.'''
To tune your queries, or simply to try working with the language interactively, feel free to use the [http://webchem.ncbr.muni.cz/Platform/App/MotiveExplorer MotiveExplorer (ME)] application. There you can upload a PDB molecule of your choice and immediately see the results of your queries. ME requires an up-to-date web browser and the [http://www.microsoft.com/silverlight/ Silverlight] framework. It has been tested under the Windows and Mac OS operating systems.
==How to read the following text==
The text is divided into three parts plus one additional part. The three parts differ in the data type the queries operate on (Atoms, Residues and Motives). The first two categories deal only with basic atom or residue selections, and their outputs are atoms and residues, respectively. The last category, Motives, contains numerous advanced queries demonstrating the versatility and power of MQ. These queries operate on the results of both atom and residue queries. The additional part presents a few use cases of complex queries that demonstrate possible applications of the MQ language.
Each query looks like this:
===Name of the query()===
*'''[PDB ID]''' Here you find an example PDB ID where you can try out the query with MotiveExplorer and a rough description of the query function.
* This is followed by examples of the query with an explanation, such as <code>Residues("HEM")</code>. Copy the query into the command text box of the MotiveExplorer application to see the results immediately.
* The type of data the query operates on and the expected return value,
* e.g. ''Type: Atoms(symbols: String*) -> Atoms''. The query Atoms() takes 0..n strings in parentheses ("C", "N", etc.) and, based on the input, returns a list of individual atoms.
==Terminology==
By an atom we mean an individual point in the Cartesian coordinate system, as provided by the ATOM or HETATM records of PDB input files. A residue is any component of a biomacromolecule or a biomacromolecular complex. This includes amino acid residues and nucleotides, the building blocks of proteins and nucleic acids, as well as ligands. All of these are commonly referred to as residues. A single residue is defined by its ID and name in the PDB input file. Finally, a motive is any sequence of atoms or residues generated by MQ. Both atoms and residues can therefore be considered motives.
=Queries=
==Atoms==
=== Basic queries ===
====Atoms()====
* '''[2hhb]''' Returns a sequence of individual atoms based on the element type provided in the argument. More elements can be specified, separated by commas. If no argument is provided, all atoms are returned.
*<code>Atoms("Fe")</code> - Returns all iron atoms in the given structure.
*<code>Atoms("Fe", "N")</code> - Returns all iron and nitrogen atoms in the given structure.
* ''Type: Atoms(symbols: String*) -> Atoms.''
====AtomNames()====
* '''[2hhb]''' Returns a sequence of atoms with defined name or names.
* <code>AtomNames("CA")</code> – Returns all CA atoms.
* <code>AtomNames("CA", "N")</code> – Returns atoms named CA (the Cα carbon) or N (the backbone amide nitrogen).
* ''Type: AtomNames(names: String+) -> Atoms.''
====AtomIds()====
* '''[2hhb]''' Returns a sequence of atoms with the given ID or IDs.
* <code>AtomIds(1)</code> – Returns the atom with id=1 from the given structure.
* <code>AtomIds(1,2,5)</code> – Returns the atoms with id=1, 2 and 5 from the given structure.
* ''Type: AtomIds(ids: Integer+) -> Atoms.''
====AtomIdRange()====
* '''[2hhb]''' Returns a sequence of atoms with IDs from a given range (both bounds inclusive).
* <code>AtomIdRange(1, 10)</code> – Returns the atoms with IDs from the interval <1, 10>, as specified in the input file, i.e. at most 10 atoms.
* ''Type: AtomIdRange(minId: Integer, maxId: ?Integer) -> Atoms''
====NotAtomNames()====
* '''[2hhb]''' Returns a sequence of atoms whose names are not among those given in the argument.
* <code>NotAtomNames("C", "N", "CA", "O")</code> – Returns all atoms with names other than C, CA, N and O. For the protein part, these are essentially only the side-chain atoms.
* ''Type: NotAtomNames(names: String+) -> Atoms.''
====NotAtomIds()====
* '''[2hhb]''' Returns a sequence of atoms that do not have the given ID or IDs.
* <code>NotAtomIds(1)</code> – Returns all atoms with an ID other than 1 from the given structure.
* <code>NotAtomIds(1,2,5)</code> – Returns all atoms with IDs other than 1, 2 and 5 from the given structure.
* ''Type: NotAtomIds(ids: Integer+) -> Atoms.''
====NotAtoms()====
* '''[2hhb]''' Returns all atoms except those of the elements given in the argument. More elements can be specified, separated by commas.
* <code>NotAtoms("Fe")</code> – Returns all atoms of the structure except iron.
* <code>NotAtoms("Fe", "N")</code> – Returns all atoms of the structure except iron and nitrogen.
* ''Type: NotAtoms(symbols: String+) -> Atoms.''
====RingAtoms()====
* '''[2hhb]''' Returns the specified atoms that lie on the detected rings.
* <code>RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"]))</code> – Returns all the nitrogen atoms on the histidine sidechain.
* ''Type: RingAtoms(atom: Atoms, ring: ?Rings) -> Atoms.''
==Residues==
=== Basic queries===
====Residues()====
* '''[2hhb]''' Returns a sequence of individual residues specified by a function argument. More residues can be specified, if separated by a comma.
* <code>Residues("HEM")</code> – Returns a list of HEM residues.
* <code>Residues("HEM", "ALA")</code> – Returns a set of HEM and ALA residues.
* ''Type: Residues(names: Value*) -> Residues.''
====NotResidues()====
* '''[2hhb]''' Returns a sequence of residues which are not defined by the argument.
* <code>NotResidues("HEM")</code> – Returns all residues except the HEM residues.
* ''Type: NotResidues(names: Value+) -> Residues.''
====ResidueIds()====
* '''[2hhb]''' Returns the specified residues, provided the structure contains them. Each residue is identified by its residue number and chain, such as "14 A".
* <code>ResidueIds("14 A", "15 A")</code> – Returns residues number 14 and 15 of chain A.
* ''Type: ResidueIds(ids: String+) -> Residues''.
====ResidueIdRange()====
* '''[2hhb]''' Returns a set of residues on a given chain from the lower to the upper index. In case a residue is not provided in the structure, it is skipped.
* <code>ResidueIdRange("A", 50,  100)</code> – returns a set of residues on chain A from the ID 50 to 100.
* ''Type: ResidueIdRange(chain: String, min: Integer, max: Integer) -> Residues''.
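The inclusive range, together with the skipping of residues missing from the structure, can be sketched in plain Python. This is a hypothetical illustration, not MQ's implementation. Residues are modelled as (chain, number, name) tuples, and the function name <code>residue_id_range</code> is made up for this example.
<syntaxhighlight lang="python">
from typing import List, Tuple

# Hypothetical residue record: (chain ID, residue number, residue name).
Residue = Tuple[str, int, str]


def residue_id_range(residues: List[Residue], chain: str,
                     min_id: int, max_id: int) -> List[Residue]:
    """Return residues of `chain` whose numbers lie in [min_id, max_id].

    Numbers that do not occur in the structure are simply absent from the
    result, i.e. they are skipped.
    """
    return [r for r in residues
            if r[0] == chain and min_id <= r[1] <= max_id]


structure = [("A", 48, "GLY"), ("A", 50, "HIS"), ("A", 51, "ALA"),
             ("A", 53, "LYS"), ("A", 100, "VAL"), ("A", 101, "SER"),
             ("B", 60, "LEU")]

selected = residue_id_range(structure, "A", 50, 100)
print([number for _, number, _ in selected])  # [50, 51, 53, 100]
</syntaxhighlight>
Residues 52 and 54 to 99 do not exist in this toy chain, so they are skipped silently. Residue 60 of chain B is excluded because only chain A was requested.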
====NotAminoAcids()====
* '''[2hhb]''' Returns a sequence of residues that are not among the 20 standard amino acids. The optional parameter NoWaters accepts the values True and False.
* <code>NotAminoAcids()</code> – Returns all non-standard residues in the protein structure, with the exception of the HOH and WAT residues, which represent solvent water.
* <code>NotAminoAcids(NoWaters=False)</code> – Returns all non-standard residues in the protein structure, including the solvent (HOH and WAT residues).
* ''Type: NotAminoAcids() -> Residues''.
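The effect of the NoWaters parameter can be sketched by filtering plain residue names in Python. The name-based check and the helper <code>not_amino_acids</code> are assumptions made for illustration, because MQ operates on residue objects rather than strings. In the example, MSE (selenomethionine) stands in for a non-standard amino acid.
<syntaxhighlight lang="python">
# The 20 standard amino acids (three-letter PDB codes).
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Residue names treated as solvent water.
WATER_NAMES = {"HOH", "WAT"}


def not_amino_acids(residue_names, no_waters=True):
    """Keep names outside the standard 20; drop water unless no_waters is False."""
    return [name for name in residue_names
            if name not in STANDARD_AMINO_ACIDS
            and not (no_waters and name in WATER_NAMES)]


names = ["ALA", "HEM", "HOH", "MSE", "GLY", "WAT"]
print(not_amino_acids(names))                   # ['HEM', 'MSE']
print(not_amino_acids(names, no_waters=False))  # ['HEM', 'HOH', 'MSE', 'WAT']
</syntaxhighlight>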
====AminoAcids()====
*  '''[2hhb]''' Returns a sequence of residues that are among the 20 standard amino acids.
* Allowed values of the optional ChargeType parameter: <code>Positive, Negative, Aromatic, Polar, NonPolar</code>
* <code>AminoAcids()</code> – Returns all standard amino acids.
* <code>AminoAcids(ChargeType="Polar")</code> – Returns all polar amino acids, based on the type of their side chain.
* ''Type: AminoAcids() -> Residues.''
====HetResidues()====
* '''[2hhb]''' Returns a sequence of heteroatom residues as specified in the input PDB file (HETATM records), excluding the standard residues.
* <code>HetResidues()</code> – Returns a set of heteroatom residues.
* ''Type: HetResidues() -> Residues''
==Motives==
===Structure specification===
MQ can select residues or their parts based on their name or ID. However, the true power of MQ lies in its ability to use the chemical nature of the input structures and to select motives purely on the basis of elements and the connectivity among them.
====Rings()====
* '''[3d12]''' Returns all rings in the structure that match a user-specified composition. A ring is specified as a list of the element symbols it is composed of. This list can be built by concatenating (<code>+</code>) and repeating (<code>*</code>) lists of elements.
* <code>Rings(2 * ["C"] + ["N"] + ["C"] + ["N"])</code> – Returns the histidine aromatic ring.
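* <code>Rings(4 * ["C"] + ["O"])</code> – Returns furanose rings (five-membered rings of four carbon atoms and one oxygen atom, such as the ribose ring of nucleotides).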
* ''Type: Rings(atoms: Value*) -> Ring''
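The ring specifications are ordinary Python list expressions. For example, <code>2 * ["C"] + ["N"] + ["C"] + ["N"]</code> evaluates to <code>["C", "C", "N", "C", "N"]</code>. The sketch below shows one way such a specification could be compared with a detected ring. It assumes (this is not stated above) that the element order is read cyclically along the ring and may start at any atom and run in either direction. <code>ring_spec_matches</code> is a hypothetical helper, not part of MQ.
<syntaxhighlight lang="python">
from typing import Sequence


def ring_spec_matches(ring: Sequence[str], spec: Sequence[str]) -> bool:
    """Hypothetical check: does the cyclic element sequence `ring` match `spec`
    for some starting atom and either traversal direction?"""
    if len(ring) != len(spec):
        return False
    target = list(spec)
    n = len(target)
    for seq in (list(ring), list(reversed(ring))):
        for shift in range(n):
            # Rotate the ring so that the walk starts at atom `shift`.
            if seq[shift:] + seq[:shift] == target:
                return True
    return False


histidine_spec = 2 * ["C"] + ["N"] + ["C"] + ["N"]
furanose_spec = 4 * ["C"] + ["O"]
print(histidine_spec)  # ['C', 'C', 'N', 'C', 'N']

# Histidine imidazole ring walked atom by atom: CG, ND1, CE1, NE2, CD2.
his_ring = ["C", "N", "C", "N", "C"]
print(ring_spec_matches(his_ring, histidine_spec))  # True
print(ring_spec_matches(his_ring, furanose_spec))   # False
</syntaxhighlight>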
====ToAtoms()====
* '''[1hho]''' Converts the input motive into a sequence of individual atoms, i.e. each motive contains a single atom.
* <code>Residues("HEM").ToAtoms()</code> – Returns all the atoms of HEM residues as a sequence of individual atoms.
* ''Type: ToAtoms(motives: Motives) -> Motives''
====ToResidues()====
* '''[1hho]''' Converts the input motive into a sequence of individual residues. It can either decompose a motive with multiple residues into a sequence of motives, each containing a single residue, or, if applied to a sequence of atoms, merge the atoms into a single motive per residue.
* <code>Residues("HEM").AmbientResidues(2).ToResidues()</code> – Returns the sequence of individual residues from the 2 Å surroundings of the HEM residue, including HEM itself.
* <code>Atoms("C").ToResidues()</code> – Returns a sequence of motifs. Each motif contains only carbon atoms grouped together according to their parent residue.
* ''Type: ToResidues(motives: Motives) -> Motives''
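For the atom-to-residue direction, the grouping can be sketched in Python as follows. Atoms are modelled as hypothetical (ID, element, residue key) tuples, and <code>to_residues</code> is an illustrative name. The real query works on MQ motives.
<syntaxhighlight lang="python">
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical atom record: (atom ID, element symbol, parent residue key).
Atom = Tuple[int, str, str]


def to_residues(atoms: List[Atom]) -> Dict[str, List[int]]:
    """Group atom IDs by parent residue, one group per residue."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for atom_id, _element, residue in atoms:
        groups[residue].append(atom_id)
    return dict(groups)


atoms = [(1, "N", "ALA 1 A"), (2, "C", "ALA 1 A"), (3, "C", "ALA 1 A"),
         (4, "O", "ALA 1 A"), (5, "C", "ALA 1 A"),            # N, CA, C, O, CB
         (6, "N", "GLY 2 A"), (7, "C", "GLY 2 A"), (8, "C", "GLY 2 A"),
         (9, "O", "GLY 2 A")]                                 # N, CA, C, O

# Select the carbon atoms first, as Atoms("C") would, then group them by residue.
carbons = [a for a in atoms if a[1] == "C"]
print(to_residues(carbons))  # {'ALA 1 A': [2, 3, 5], 'GLY 2 A': [7, 8]}
</syntaxhighlight>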
====Union()====
* '''[1hho]''' Merges the sequence of input motives into a single motive.
* <code>Residues("HEM").ConnectedResidues(1).Union()</code> – Takes the two HEM residues of 1hho together with their covalently bonded residues (2 motives) and merges them into a single motive.
* ''Type: Union(motives: Motives) -> Motives''
====RegularMotives()====
* Extracts motives from the primary sequence based on the input regular expression over one-letter residue codes. Note that MQ does not check whether there is a gap in the chain. For example, if HIS 28 is followed by ALA 30, the query 'HA' returns a match.
* <code> RegularMotives("HH")</code> – Finds two consecutive histidine residues.
* <code> RegularMotives("G.{1,2}G")</code> – Finds two glycine residues separated by one or two other residues.
* <code> RegularMotives("G.{1,2}G").Filter(lambda m: m.IsConnected())</code> – Finds two glycine residues separated by one or two residues and verifies that all of them are bonded.
* ''Type: RegularMotives(regex: Value) -> Motives''
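The gap caveat above can be reproduced with Python's <code>re</code> module on a one-letter sequence. This is a sketch under assumptions. The helpers <code>regular_motives</code> and <code>has_numbering_gap</code> are hypothetical, and <code>re.finditer</code> reports non-overlapping matches, which may differ from how MQ handles overlaps. The numbering check is only a heuristic; within MQ, the text above suggests <code>IsConnected()</code> instead.
<syntaxhighlight lang="python">
import re
from typing import List, Tuple

# One-letter codes for the residues used in this example (extend as needed).
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "HIS": "H", "LYS": "K", "SER": "S"}

Residue = Tuple[int, str]  # hypothetical record: (residue number, name)


def regular_motives(chain: List[Residue], pattern: str) -> List[List[Residue]]:
    """Match `pattern` against the one-letter sequence of `chain` and return
    the residues covered by each non-overlapping match."""
    sequence = "".join(THREE_TO_ONE.get(name, "X") for _, name in chain)
    return [chain[m.start():m.end()] for m in re.finditer(pattern, sequence)]


def has_numbering_gap(motive: List[Residue]) -> bool:
    """True if neighbouring residues in the motive are not numbered consecutively."""
    return any(b[0] - a[0] != 1 for a, b in zip(motive, motive[1:]))


# Residue 29 is missing from the model, so HIS 28 is directly followed by ALA 30.
chain = [(27, "GLY"), (28, "HIS"), (30, "ALA"), (31, "GLY")]

for motive in regular_motives(chain, "HA"):
    print(motive, has_numbering_gap(motive))
# [(28, 'HIS'), (30, 'ALA')] True
print(regular_motives(chain, "G.{1,2}G"))
# [[(27, 'GLY'), (28, 'HIS'), (30, 'ALA'), (31, 'GLY')]]
</syntaxhighlight>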
====Cluster()====
* Clusters the identified results into single motives based on their distance in ångströms. Unlike the <code>Near()</code> query, <code>Cluster()</code> does not perform a count check (see the last example below).
* <code>Cluster(5, Residues("Ala"))</code> – Returns clusters of alanine residues that are at most 5 Å from each other.
* <code>Cluster(3, RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"])))</code> – Groups the nitrogen atoms of the histidine rings into a single motive if they are at most 3 Å from each other.
* <code>Cluster(2, Atoms("C"), Atoms("C"))</code> – Returns carbon atoms that are pairwise closer than 2 Å. Any other carbon atom inside this 2 Å sphere is also included in the result. If you need exactly 2 carbon atoms to be returned, use the <code>Near()</code> query instead.
* ''Type: Cluster(r: Number, motives: Motives+) -> Motives''
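The difference between grouping and the count check can be sketched in Python on bare 3D coordinates. We assume single-linkage grouping, where atoms are joined through chains of neighbours at most r Å apart. This is one plausible reading of the description, not MQ's documented algorithm. <code>cluster</code> and <code>with_exact_count</code> are hypothetical helpers.
<syntaxhighlight lang="python">
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster(points: Sequence[Point], r: float) -> List[List[int]]:
    """Single-linkage clustering (an assumed reading of Cluster()): two points share
    a cluster if they are linked by a chain of points, each step at most r apart."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def with_exact_count(clusters: List[List[int]], k: int) -> List[List[int]]:
    """Keep only clusters with exactly k members (a 'count check')."""
    return [c for c in clusters if len(c) == k]


# Three atoms in a row 1.5 Å apart, a pair 1.2 Å apart, and an isolated atom.
carbons = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
           (10.0, 0.0, 0.0), (11.2, 0.0, 0.0), (20.0, 0.0, 0.0)]

clusters = cluster(carbons, 2.0)
print(clusters)                       # [[0, 1, 2], [3, 4], [5]]
print(with_exact_count(clusters, 2))  # [[3, 4]]
</syntaxhighlight>
With r = 2 Å, the first three atoms form one cluster even though atoms 0 and 2 are 3 Å apart. The count check then keeps only groups of exactly two atoms, which roughly corresponds to the extra check the text attributes to <code>Near()</code>.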
===Boolean operations===
====Or()====
* '''[2hhb]''' Merges any number of motive sequences into a single sequence.
* <code>Or(AminoAcids(ChargeType="Negative"),AminoAcids(ChargeType= "Positive"))</code> – returns all charged amino acids both positively charged and negatively charged.
* ''Type: Or(motives: Motives+) -> Motives''
===Topology function===
====AmbientAtoms()====
* '''[1hho]''' Returns all atoms that are within n Å of the geometrical center of a given motive.
* <code>Atoms("Fe").AmbientAtoms(4)</code> – Returns all atoms closer than 4 Å to the iron atom.
* ''Type: AmbientAtoms(motive: Motives, r: Number) -> Motives''
====AmbientResidues()====
* '''[1hho]''' Returns all residues that are within n Å of the geometrical center of a given motive.
* <code>Residues("HEM").AmbientResidues(4)</code> – Returns all residues closer than 4 Å to the HEM residues.
* ''Type: AmbientResidues(motive: Motives, r: Number) -> Motives''
====Near()====
* '''[1hho]''' Clusters the specified motives that are pairwise closer than the given distance (in ångströms). Unlike the <code>Cluster()</code> query, it additionally performs a 'count check': each resulting motive must contain exactly the specified motives.
* <code>Near(0, Rings(6*['C']), Rings(4*['C'] + ['N']))</code> – Returns motives composed of a six-carbon ring fused to a pyrrole ring, i.e. indole. In this example, the indole part of the tryptophan side chain is returned.
* <code>Near(2, Atoms("C"), Atoms("C"), Atoms("C"), Atoms("C"))</code> – Returns motives that contain exactly 4 carbon atoms within a sphere of 2 Å.
* ''Type: Near(r: Number, motives: Motives+) -> Motives''
===Filtering===
====Filter()====
* '''[1hho]''' The argument of the function is a Boolean condition that is evaluated for each motive of the input sequence (referred to as 'm' in the following text). Only motives satisfying the condition are returned; the others are filtered out. Technically, the condition is a lambda abstraction written the same way as in the Python programming language, so previous Python experience is helpful.
* <code>Residues().Filter(lambda m: m.IsConnectedTo(Atoms("Fe")))</code> – Returns all residues that are covalently bonded to the iron atom (excluding the HEM residue itself).
* <code>Residues().Filter(lambda m: m.Count(Atoms("O")) == 2)</code> – Returns all residues containing exactly two oxygen atoms.
* ''Type: Filter(motives: Motives, filter: Motive->Bool) -> Motives''
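Because the filter condition is an ordinary Python lambda, its mechanics can be mimicked with Python's built-in <code>filter()</code>. In this sketch, a motive is reduced to a plain list of element symbols. <code>count</code> is a hypothetical helper standing in for <code>m.Count(Atoms(...))</code>.
<syntaxhighlight lang="python">
# Hypothetical stand-in for MQ motives: residue name -> element symbols of its heavy atoms.
residues = {
    "SER": ["N", "C", "C", "O", "C", "O"],            # N, CA, C, O, CB, OG
    "CYS": ["N", "C", "C", "O", "C", "S"],            # N, CA, C, O, CB, SG
    "ASP": ["N", "C", "C", "O", "C", "C", "O", "O"],  # ..., CG, OD1, OD2
}


def count(motive, element):
    """Number of atoms of the given element in a motive."""
    return motive.count(element)


# Same shape as Residues().Filter(lambda m: m.Count(Atoms("O")) == 2)
two_oxygens = list(filter(lambda name: count(residues[name], "O") == 2, residues))
print(two_oxygens)  # ['SER']
</syntaxhighlight>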
====Count()====
* '''[1hho]''' Usually it is convenient to utilize this function inside a filtering query. The Count() query counts the number of occurrences of a motive inside a different motive.
* <code>Residues("HEM").ConnectedResidues(2).Filter(lambda m: m.Count(Atoms("S")) == 1)</code> – Returns motives composed of a HEM residue surrounded by two layers of bonded residues in case that the whole motive contains exactly one sulphur atom.
* <code>Residues("CYS").ConnectedResidues(1).Filter(lambda m: m.Count(Residues("VAL")) == 2)</code> – Returns cysteine residues flanked on both sides by valine residues.
* ''Type: Count(where: Motive, what: Motives) -> Integer''.
====Contains()====
* '''[4hhb]''' The <code>Contains()</code> query checks whether the input motive contains a specified motive of interest. In other words, it is equivalent to checking that the number of occurrences is greater than zero (<code>Count() > 0</code>).
* <code>Residues("HEM").AmbientResidues(2).Filter(lambda m: m.Contains(Residues("HIS")))</code> – Returns the 2 Å surroundings of HEM residues that contain at least one histidine residue.
* <code>Residues().Filter(lambda m: m.Contains(Atoms("S")))</code> – Returns all residues that contain a sulphur atom. For this particular structure, the result is similar to that of the query <code>Residues("CYS", "MET")</code>.
* ''Type: Contains(where: Motive, what: Motives) -> Bool''
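The equivalences between <code>Contains()</code> and <code>Count() > 0</code>, and between <code>Contains(...).Not()</code> and <code>Count() == 0</code>, can be checked with a similar plain-Python stand-in. The negated form is used in the first use case below. The residue data and helper names here are illustrative only.
<syntaxhighlight lang="python">
# Hypothetical stand-in for MQ motives: residue name -> element symbols of its heavy atoms.
residues = {
    "ALA": ["N", "C", "C", "O", "C"],                 # N, CA, C, O, CB
    "CYS": ["N", "C", "C", "O", "C", "S"],            # ..., CB, SG
    "MET": ["N", "C", "C", "O", "C", "C", "S", "C"],  # ..., CB, CG, SD, CE
}


def count(motive, element):
    return motive.count(element)


def contains(motive, element):
    """Contains() as described above: true exactly when Count() > 0."""
    return count(motive, element) > 0


with_sulphur = [name for name, atoms in residues.items() if contains(atoms, "S")]
print(with_sulphur)  # ['CYS', 'MET']

# The negated form corresponds to Count() == 0.
without_sulphur = [name for name, atoms in residues.items() if not contains(atoms, "S")]
print(without_sulphur)  # ['ALA']
</syntaxhighlight>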
===Connectivity===
====IsConnected()====
* '''[4m9e, 4m9v]''' Checks whether a particular motive consists of a single connected component.
* <code>Atoms("Zn").AmbientAtoms(4).Filter(lambda m: m.IsConnected())</code> – Returns the 4 Å surroundings of zinc atoms in which all atoms are bonded together, i.e. there is no outlier.
* Compare with the results of the query <code>Atoms("Zn").AmbientAtoms(4)</code>.
* ''Type: IsConnected(motive: Motive) -> Bool.''
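One way to picture the check is a breadth-first search over the bond graph, restricted to the motive's atoms. This is a hypothetical sketch using integer atom IDs and an explicit bond list, not MQ's implementation. The text above does not say how bonds leaving the motive are treated, so this version ignores them, and it treats an empty motive as not connected.
<syntaxhighlight lang="python">
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def is_connected(atoms: Iterable[int], bonds: Iterable[Tuple[int, int]]) -> bool:
    """True if the atoms form a single component, using only bonds whose
    both ends belong to the motive."""
    atom_set: Set[int] = set(atoms)
    if not atom_set:
        return False
    neighbours = defaultdict(set)
    for a, b in bonds:
        if a in atom_set and b in atom_set:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(atom_set))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary atom
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == atom_set


# Zn (1) bonded to two sulphur atoms (2, 3); atom 4 is a nearby water oxygen with no bond.
bonds = [(1, 2), (1, 3)]
print(is_connected({1, 2, 3}, bonds))     # True
print(is_connected({1, 2, 3, 4}, bonds))  # False
</syntaxhighlight>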
====IsConnectedTo()====
* '''[1hho]''' Checks whether the two provided motives are connected to each other.
* <code>Residues("ALA").Filter(lambda m: m.IsConnectedTo(Residues("GLY")))</code> – Returns all the alanine residues which are directly connected to the glycine residues.
* ''Type: IsConnectedTo(current: Motive, motive: Motives) -> Bool''
====IsNotConnectedTo()====
* '''[1hho]''' Checks whether the two provided motives are NOT connected to each other.
* <code>Residues("ALA").Filter(lambda m: m.IsNotConnectedTo(Residues("GLY")))</code> – Returns all the alanine residues which are not directly connected to the glycine residues.
* ''Type: IsNotConnectedTo(current: Motive, motive: Motives) -> Bool''
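Conceptually, both checks look for a covalent bond with one end in the current motive and the other end in one of the given motives. The sketch below models this with hypothetical integer atom IDs and an explicit bond list. It also mirrors the second use case below, where heteroatom residues with no bond to any amino acid are kept.
<syntaxhighlight lang="python">
from typing import Iterable, List, Set, Tuple


def is_connected_to(motive: Set[int], others: List[Set[int]],
                    bonds: Iterable[Tuple[int, int]]) -> bool:
    """True if some bond joins an atom of `motive` to an atom of one of
    `others` (atoms shared with `motive` itself are ignored)."""
    target: Set[int] = set().union(*others) - motive
    return any((a in motive and b in target) or (b in motive and a in target)
               for a, b in bonds)


def is_not_connected_to(motive: Set[int], others: List[Set[int]],
                        bonds: Iterable[Tuple[int, int]]) -> bool:
    return not is_connected_to(motive, others, bonds)


ala = {1, 2, 3}    # hypothetical alanine atoms
gly = {4, 5, 6}    # hypothetical glycine atoms, peptide-bonded to ala
ligand = {10, 11}  # hypothetical free ligand
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (10, 11)]

print(is_connected_to(ala, [gly], bonds))              # True
print(is_not_connected_to(ligand, [ala, gly], bonds))  # True
</syntaxhighlight>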
====ConnectedAtoms()====
* '''[1hho]''' Extends a given motive by n layers of directly bonded atoms.
* <code>Atoms("Fe").ConnectedAtoms(1)</code> – The iron atom and all atoms covalently bonded to it directly (one bond away).
* <code>Atoms("Fe").ConnectedAtoms(2)</code> – The previous selection plus all atoms covalently bonded to it (i.e. an additional layer of bonded atoms). In other words, the output consists of all atoms that are at most 2 bonds away from the iron atom.
* ''Type: ConnectedAtoms(motive: Motives, n: Integer) -> Motives''
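Layer-by-layer expansion amounts to a breadth-first search limited to n steps. The Python sketch below shows the idea on hypothetical atom IDs and a bond list. With n = 0, only the seed atoms are returned.
<syntaxhighlight lang="python">
from collections import defaultdict
from typing import Iterable, Set, Tuple


def connected_atoms(seed: Iterable[int], bonds: Iterable[Tuple[int, int]],
                    n: int) -> Set[int]:
    """Extend the seed atoms by n layers of covalently bonded neighbours,
    i.e. return every atom at most n bonds away from the seed."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    result = set(seed)
    frontier = set(result)
    for _ in range(n):
        # Next layer: neighbours of the current layer that are not yet selected.
        frontier = {nb for atom in frontier for nb in neighbours[atom]} - result
        result |= frontier
    return result


# A linear chain of atoms 1-2-3-4-5; atom 1 plays the role of the iron atom.
bonds = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(sorted(connected_atoms({1}, bonds, 0)))  # [1]
print(sorted(connected_atoms({1}, bonds, 1)))  # [1, 2]
print(sorted(connected_atoms({1}, bonds, 2)))  # [1, 2, 3]
</syntaxhighlight>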
====ConnectedResidues()====
* '''[1hho]''' Extends a given motive by n layers of directly bonded residues.
* <code>Residues("HEM").ConnectedResidues(1)</code> – The HEM residue and all the residues which are covalently bonded to any atom of the HEM residue.
* <code>Residues("HEM").ConnectedResidues(2)</code> – The previous selection plus all residues covalently bonded to it (i.e. an additional layer of bonded residues).
* ''Type: ConnectedResidues(motive: Motives, n: Integer) -> Motives''
==Selected Use Cases==
==== Find all post-translationally modified amino acids ====
*i.e. those incorporated in the protein backbone rather than being heteroatom residues
<syntaxhighlight lang="python">
NotAminoAcids().
    Filter(lambda m: m.Count(HetResidues()) == 0)</syntaxhighlight>
This query checks each non-standard residue for its presence among the heteroatom entries and keeps only those that are absent. Equivalently:<syntaxhighlight lang="python">
NotAminoAcids().
    Filter(lambda m: m.Contains(HetResidues()).Not())</syntaxhighlight>


====Find all heteroatoms that are not covalently bonded to the protein structure====
* Takes all heteroatom residues and keeps those that are not connected to any amino acid of the given protein.
<syntaxhighlight lang="python">
HetResidues().
    Filter(lambda m: m.IsNotConnectedTo(AminoAcids()))</syntaxhighlight>


====Identify Zinc fingers====
*There is a variety of zinc fingers, differing in the residues that coordinate the zinc ion. In our example, we focus on those with two cysteine and two histidine residues (Cys2His2).
<syntaxhighlight lang="python">
Atoms("Zn").
    ConnectedResidues(1).
    Filter(lambda m: 
      (m.Count(Residues("His")) == 2) & (m.Count(Residues("Cys")) == 2))</syntaxhighlight>


First, the zinc atoms are selected together with the residues bonded to them. These patterns are then filtered by their amino acid content, keeping only those with exactly two histidine and two cysteine residues.
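The filter condition above combines two comparisons with <code>&</code>. Since MQ lambdas use Python syntax, the parentheses around each comparison likely matter for the same reason as in Python: <code>&</code> binds more tightly than <code>==</code>. The plain-Python sketch below illustrates the Cys2His2 count check on a hypothetical list of residue names. It also shows what goes wrong without the parentheses. <code>is_cys2his2</code> is an illustrative helper, not part of MQ.
<syntaxhighlight lang="python">
from collections import Counter


def is_cys2his2(residue_names):
    """True if a zinc site's bonded residues include exactly two His and two Cys."""
    counts = Counter(name.upper() for name in residue_names)
    return (counts["HIS"] == 2) & (counts["CYS"] == 2)


print(is_cys2his2(["ZN", "CYS", "CYS", "HIS", "HIS"]))  # True
print(is_cys2his2(["ZN", "CYS", "CYS", "CYS", "HIS"]))  # False

# Why the parentheses are needed: with his = 2 and cys = 3 the site should be rejected.
his, cys = 2, 3
print((his == 2) & (cys == 2))  # False (intended)
print(his == 2 & cys == 2)      # True, parsed as his == (2 & cys) == 2
</syntaxhighlight>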


====Identify all residues that contain a sugar ring====
* This task can be decomposed into two subtasks, since sugars contain either a furanose or a pyranose ring. A furanose ring contains 4 carbon atoms and an oxygen atom; a pyranose ring is composed of 5 carbon atoms and an oxygen atom.
<syntaxhighlight lang="python">
Or(Rings(4 * ["C"] + ["O"]).ConnectedResidues(0), 
   Rings(5 * ["C"] + ["O"]).ConnectedResidues(0))</syntaxhighlight>
The <code>Rings()</code> queries select only the ring part of the molecule. Extending each <code>Rings()</code> query with <code>ConnectedResidues(0)</code> selects the whole residue that contains the ring. Finally, <code>Or()</code> merges the results of both queries.
====Identify all binding sites of PA-IIL lectin in different organisms====
*Binding sites of this type of lectin consist of two calcium atoms close to each other and a bound sugar residue.
<syntaxhighlight lang="python">
Near(4, Atoms("Ca"), Atoms("Ca"))
  .ConnectedResidues(1)
  .Filter(lambda l:
    l.Count(Or(Rings(5 * ["C"] + ["O"]), Rings(4 * ["C"] + ["O"]))) > 0)
  .Filter(lambda l: l.Count(Atoms("P")) == 0)</syntaxhighlight>
First, all pairs of calcium atoms that are at most 4 Å apart are selected by <code>Near(4, Atoms("Ca"), Atoms("Ca"))</code>. Subsequently, the bonded residues are checked for a pyranose (<code>Rings(5 * ["C"] + ["O"])</code>) or furanose (<code>Rings(4 * ["C"] + ["O"])</code>) ring, and only the patterns containing one of them are returned. Since a sugar moiety is an integral part of nucleotides, a final simple check ensures that no patterns containing phosphorus, i.e. nucleotides, are retained.

The calcium pairs can also be found with <code>Cluster()</code>, which clusters calcium atoms lying within 4 Å of each other. Because <code>Cluster()</code> performs no count check, the number of calcium atoms is verified explicitly. This variant does not exclude nucleotides:
<pre>Cluster(4, Atoms("Ca")).Filter(lambda m: m.Count(Atoms("Ca")) == 2).ConnectedResidues(1).Filter(lambda m: m.Contains(Or(Rings(5 * ["C"] + ["O"]), Rings(4 * ["C"] + ["O"]))))</pre>

Latest revision as of 13:42, 25 April 2015
