
PatternQuery:How to build a query: Difference between revisions

From WebChemistry Wiki
 
(13 intermediate revisions by the same user not shown)
<div class="toclimit-4">
<div class="toclimit-3">


To fine-tune your queries before running them on the whole database, or just to work with the language interactively, use the [http://webchem.ncbr.muni.cz/Platform/PatternQuery/Index '''Explorer'''] application, where you can upload a PDB molecule of your choice or load a random sample from the PDB database based on selected properties.


You can either try one of our ready-to-use examples or make up one of your own.


==How to think about queries==
When building queries, you have to decompose the problem into smaller chunks, think of a query for each of the pieces, and then bind them together wisely. Let us have a goal:


'''Get all the residues which are at most 5 Å away from any histidine residue and check whether the given pattern contains at least 2 negatively charged amino acids.'''
 
You probably can't think of a solution from scratch right away, but once you decompose the problem into individual subproblems, it is not so difficult after all. (Whenever you are unsure about the meaning of the queries, consult the [[PatternQuery:Language Reference | language reference]].) For the given problem, the decomposition and the corresponding queries are as follows:
{{col-begin}}
{{col-2}}
#Select histidine residues.
#Select the surrounding residues of a pattern up to 5 Å.
#Find negatively charged residues.
#Count the amino acids.
#Filter the histidine plus its surroundings matching the condition above.
 
{{col-2}}
Now we are ready for constructing individual queries:
#<code>[[PatternQuery:Language Reference#Residues | Residues("HIS")]]</code>
#<code>[[PatternQuery:Language Reference#AmbientResidues | AmbientResidues(5)]]</code>
#<code>[[PatternQuery:Language Reference#AminoAcids| AminoAcids(ChargeType="Negative")]]</code>
#<code>[[PatternQuery:Language Reference#Count| Count(residues)]]</code>
#<code>[[PatternQuery:Language Reference#Filter| Filter(condition)]]</code>
{{col-end}}
It wasn't that difficult, was it? Once we have written all the individual queries, we can combine them to achieve our goal, as highlighted in the query below and in the illustrative infographic:


Line 33: Line 33:
</syntaxhighlight>
</syntaxhighlight>


[[Image:PatternQuery-Representation.png | center]]
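
The five steps of the decomposition can also be mimicked in plain Python to see how the pieces fit together. This is only a toy model and not PatternQuery itself: the <code>Residue</code> tuple, the coordinates and the helper names are invented for illustration, each residue is reduced to a single point, and ASP and GLU are assumed to be the negatively charged amino acids.

```python
from math import dist
from typing import NamedTuple


class Residue(NamedTuple):
    name: str
    center: tuple  # hypothetical residue centre, in Å


NEGATIVE = {"ASP", "GLU"}  # assumption: the negatively charged amino acids


def select(residues, name):
    """Step 1: pick residues by name."""
    return [r for r in residues if r.name == name]


def ambient(residues, center, radius):
    """Step 2: residues whose centre lies within radius of the given point."""
    return [r for r in residues if dist(r.center, center) <= radius]


def count_negative(pattern):
    """Steps 3 and 4: count negatively charged residues in one pattern."""
    return sum(r.name in NEGATIVE for r in pattern)


structure = [
    Residue("HIS", (0.0, 0.0, 0.0)),
    Residue("ASP", (3.0, 0.0, 0.0)),
    Residue("GLU", (0.0, 4.0, 0.0)),
    Residue("ALA", (9.0, 0.0, 0.0)),
    Residue("HIS", (20.0, 0.0, 0.0)),
    Residue("ASP", (22.0, 0.0, 0.0)),
]

# One pattern per histidine: the histidine plus everything within 5 Å.
surroundings = [ambient(structure, h.center, 5.0) for h in select(structure, "HIS")]

# Step 5: keep only patterns with at least two negatively charged residues.
hits = [p for p in surroundings if count_negative(p) >= 2]
```

Only the first histidine passes the filter here, because the second one has a single aspartate nearby.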


==Ready-to-use examples==

Now that we know how to think about queries, you are ready to browse the many examples listed below. The text is divided into three plus one parts, which differ in the data type the queries operate on (Atoms, Residues and Patterns). The first two categories deal only with basic ''Atom'' or ''Residue'' selections; the outputs of these two types of queries are Atoms and Residues, respectively. The last category, Patterns, contains a number of advanced queries that demonstrate the versatility and power of '''PQ'''. These queries operate on all results provided by both Atom and Residue queries. On top of that, you can browse several biologically relevant [[PatternQuery:Use Cases | use cases]].




==Structure of the text==
*'''[PDB id]''' Here you find an example PDB id on which you can try out the query in '''PatternQuery Explorer''', together with a rough description of what the query does.
* This is followed by examples of the query with an explanation, such as <syntaxhighlight lang="python" inline="">Residues("HEM")</syntaxhighlight>. Copy the query into the command text box of the '''PatternQuery Explorer''' application to see the results immediately.
* The type of data the query operates on and the expected return value.
* For example, ''Type: Atoms(symbols: String*) -> Atoms'': the query <code>[[PatternQuery:Language Reference#Atoms | Atoms()]]</code> takes 0..n strings representing elements ("C", "N", etc.) in parentheses and, based on the input, returns a list of individual atoms.
 


=Queries=

==Atoms==
{{big | Basic queries}}
====Atoms()====
* '''[PDB id: 2hhb]''' Returns a sequence of individual atoms based on the element types provided as arguments. Several elements can be specified, separated by commas. If no argument is provided, a list of all the atoms is returned.
*<syntaxhighlight lang="python" inline="">Atoms("Fe")</syntaxhighlight> - Returns all iron atoms in the given structure.
*<syntaxhighlight lang="python" inline="">Atoms("Fe", "N")</syntaxhighlight> - Returns all iron and nitrogen atoms in the given structure.
* ''Type: Atoms(symbols: String*) -> Atoms.''
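
The ''String*'' in the type signature means "zero or more strings". A rough Python analogue of this behaviour is sketched below; it assumes atoms are modelled as hypothetical <code>(id, element)</code> tuples and is not how PatternQuery is implemented.

```python
def atoms(structure, *symbols):
    """Return atoms whose element is among symbols; all atoms if none are given."""
    if not symbols:
        return list(structure)
    wanted = set(symbols)
    return [atom for atom in structure if atom[1] in wanted]


# Hypothetical toy structure: (atom id, element symbol)
toy = [(1, "N"), (2, "C"), (3, "Fe"), (4, "O")]

iron = atoms(toy, "Fe")                    # one element
iron_and_nitrogen = atoms(toy, "Fe", "N")  # several elements
everything = atoms(toy)                    # no argument: every atom
```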


====AtomNames()====
* '''[PDB id: 2hhb]''' Returns a sequence of atoms with the given name or names.
* <syntaxhighlight lang="python" inline="">AtomNames("CA")</syntaxhighlight> – Returns all CA atoms.
* <syntaxhighlight lang="python" inline="">AtomNames("CA", "N")</syntaxhighlight> – Returns atoms named CA (Cα carbon) or N (backbone nitrogen of amino acids).
* ''Type: AtomNames(names: String+) -> Atoms.''


====AtomIds()====
* '''[PDB id: 2hhb]''' Returns a sequence of atoms with the given id or ids.
* <syntaxhighlight lang="python" inline="">AtomIds(1)</syntaxhighlight> – Returns the atom with id=1 from the given structures.
* <syntaxhighlight lang="python" inline="">AtomIds(1,2,5)</syntaxhighlight> – Returns the atoms with id=1, 2 and 5 from the given structures.
* ''Type: AtomIds(ids: Integer+) -> Atoms.''


====AtomIdRange()====
* '''[PDB id: 2hhb]''' Returns a sequence of atoms with ids from a given range (the specified bounds are included).
* <syntaxhighlight lang="python" inline="">AtomIdRange(1, 10)</syntaxhighlight> – Returns the 10 atoms with ids in the closed interval [1, 10], as specified in the input file.
* ''Type: AtomIdRange(minId: Integer, maxId: ?Integer) -> Atoms''
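
The inclusive bounds can be pictured with a small Python sketch. The function and the list of ids are hypothetical; the sketch assumes that ids absent from the input file are simply skipped, and it does not model the optional ''maxId''.

```python
def atom_id_range(atom_ids, min_id, max_id):
    """Keep ids with min_id <= id <= max_id (both bounds included)."""
    return [i for i in atom_ids if min_id <= i <= max_id]


# Hypothetical ids present in an input file (4, 6, 7 and 9 are missing).
ids_in_file = [1, 2, 3, 5, 8, 10, 11, 12]

selected = atom_id_range(ids_in_file, 1, 10)
```

Under these assumptions both 1 and 10 are selected, and fewer than 10 atoms come back because some ids in the interval do not exist in the file.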


====NotAtomNames()====
* '''[PDB id: 2hhb]''' Returns a sequence of atoms whose names are not among those given as arguments.
* <syntaxhighlight lang="python" inline="">NotAtomNames("C", "N", "CA", "O")</syntaxhighlight> – Returns all the atoms with names other than C, CA, N and O, i.e. only the side-chain atoms of the protein.
* ''Type: NotAtomNames(names: String+) -> Atom.''


====NotAtomIds()====
* '''[PDB id: 2hhb]''' Returns a sequence of atoms whose ids are not among those given as arguments.
* <syntaxhighlight lang="python" inline="">NotAtomIds(1)</syntaxhighlight> - Returns all atoms with an id other than 1 from the given structures.
* <syntaxhighlight lang="python" inline="">NotAtomIds(1,2,5)</syntaxhighlight> - Returns all atoms with an id other than 1, 2 and 5 from the given structures.
* ''Type: NotAtomIds(ids: Integer+) -> Atoms.''


====NotAtoms()====
* '''[PDB id: 2hhb]''' Returns all the atoms whose element is not specified in the arguments. Several elements can be specified, separated by commas.
* <syntaxhighlight lang="python" inline="">NotAtoms("Fe")</syntaxhighlight> – Returns all the atoms of the structure except iron.
* <syntaxhighlight lang="python" inline="">NotAtoms("Fe", "N")</syntaxhighlight> – Returns all the atoms of the structure except iron and nitrogen.
* ''Type: NotAtoms(symbols: String+) -> Atoms.''


====RingAtoms()====
* '''[PDB id: 2hhb]''' Returns the specified atoms found on detected rings.
* <syntaxhighlight lang="python" inline="">RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"]))</syntaxhighlight> – Returns all the nitrogen atoms of the histidine side-chain ring.
* ''Type: RingAtoms(atom: Atoms, ring: ?Rings) -> Atoms.''


==Residues==
{{big | Basic queries}}
====Residues()====
* '''[PDB id: 2hhb]''' Returns a sequence of individual residues specified by the function arguments. Several residues can be specified, separated by commas.
* <syntaxhighlight lang="python" inline="">Residues("HEM")</syntaxhighlight> – Returns a list of HEM residues.
* <syntaxhighlight lang="python" inline="">Residues("HEM", "ALA")</syntaxhighlight> – Returns a set of HEM and ALA residues.
* ''Type: Residues(names: Value*) -> Residues.''




====NotResidues()====
* '''[PDB id: 2hhb]''' Returns a sequence of residues whose names are not among those given as arguments.
* <syntaxhighlight lang="python" inline="">NotResidues("HEM")</syntaxhighlight> – Returns all residues whose name is not HEM.
* ''Type: NotResidues(names: Value+) -> Residues.''




====ResidueIds()====
* '''[PDB id: 2hhb]''' Returns a sequence of the specified residues, provided the structure contains them. Each residue is identified by its residue number and chain, such as "14 A".
* <syntaxhighlight lang="python" inline="">ResidueIds("14 A", "15 A")</syntaxhighlight> – Returns the residues with ids 14 and 15 on chain A.
* ''Type: ResidueIds(ids: String+) -> Residues''.


====ResidueIdRange()====
* '''[PDB id: 2hhb]''' Returns a set of residues on a given chain from the lower to the upper index. If a residue is missing from the structure, it is skipped.
* <syntaxhighlight lang="python" inline="">ResidueIdRange("A", 50, 100)</syntaxhighlight> – Returns the set of residues on chain A with ids from 50 to 100.
* ''Type: ResidueIdRange(chain: String, min: Integer, max: Integer) -> Residues''.




====NotAminoAcids()====
* '''[PDB id: 2hhb]''' Returns a sequence of residues that are not among the 20 standard amino acids. The optional parameter NoWaters accepts the values True and False.
* <syntaxhighlight lang="python" inline="">NotAminoAcids()</syntaxhighlight> – Returns all the nonstandard residues in the protein structure, with the exception of HOH and WAT residues, which stand for solvent.
* <syntaxhighlight lang="python" inline="">NotAminoAcids(NoWaters=False)</syntaxhighlight> – Returns all the nonstandard residues in the protein structure, including solvent (HOH and WAT residues).
* ''Type: NotAminoAcids() -> Residues''.




====AminoAcids()====
* '''[PDB id: 2hhb]''' Returns a sequence of residues that are among the 20 standard amino acids.
* Allowed values of the optional ChargeType parameter: <syntaxhighlight lang="python" inline="">Positive, Negative, Aromatic, Polar, NonPolar</syntaxhighlight>
* <syntaxhighlight lang="python" inline="">AminoAcids()</syntaxhighlight> – Returns all standard amino acids.
* <syntaxhighlight lang="python" inline="">AminoAcids(ChargeType="Polar")</syntaxhighlight> – Returns all polar amino acids based on the type of their side chain.
* ''Type: AminoAcids() -> Residues.''


====HetResidues()====
* '''[PDB id: 2hhb]''' Returns a sequence of heteroatom residues as specified in the input PDB files, excluding residues.
* <syntaxhighlight lang="python" inline="">HetResidues()</syntaxhighlight> – Returns the set of hetatom residues.
* ''Type: HetResidues() -> Residues''


==Patterns==
{{big | Structure specification}}
PQ can select residues or their parts based on their name or id; however, the true power of PQ lies in its ability to utilize the geometrical and chemical nature of the input structures and select patterns purely based on elements, topology and the connectivity among them.




====Rings()====
* '''[PDB id: 3d12]''' Returns all the rings of a user-specified composition found in the protein structure. Any ring can be described by concatenating the individual elements it is composed of.
* <syntaxhighlight lang="python" inline="">Rings(2 * ["C"] + ["N"] + ["C"] + ["N"])</syntaxhighlight> – Returns the histidine aromatic ring.
* <syntaxhighlight lang="python" inline="">Rings(4 * ["C"] + ["O"])</syntaxhighlight> – Returns pentose rings.
* ''Type: Rings(atoms: Value*) -> Ring''
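
The ring arguments in the examples use ordinary Python list arithmetic: multiplying a list repeats it, and adding lists concatenates them. Evaluated in a plain Python interpreter (outside PatternQuery), the two expressions expand as follows; the variable names are only illustrative.

```python
# List repetition (*) and concatenation (+) build the element sequence of a ring.
histidine_ring = 2 * ["C"] + ["N"] + ["C"] + ["N"]
pentose_ring = 4 * ["C"] + ["O"]

print(histidine_ring)  # ['C', 'C', 'N', 'C', 'N']
print(pentose_ring)    # ['C', 'C', 'C', 'C', 'O']
```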




====ToAtoms()====
* '''[PDB id: 1hho]''' Converts the input pattern into a sequence of individual atoms, i.e. each resulting pattern contains a single atom.
* <syntaxhighlight lang="python" inline="">Residues("HEM").ToAtoms()</syntaxhighlight> – Returns all the atoms of HEM residues as a sequence of individual atoms.
* ''Type: ToAtoms(patterns: Patterns) -> Patterns''




====ToResidues()====
* '''[PDB id: 1hho]''' Converts the input pattern into a sequence of individual residues: it can either decompose a pattern with multiple residues into a sequence of patterns each containing a single residue or, if applied to a sequence of atoms, merge the atoms into a single pattern per residue.
* <syntaxhighlight lang="python" inline="">Residues("HEM").AmbientResidues(2).ToResidues()</syntaxhighlight> – Returns the sequence of individual residues within 2 Å of the HEM residue, including HEM itself.
* <syntaxhighlight lang="python" inline="">Atoms("C").ToResidues()</syntaxhighlight> – Returns a sequence of patterns. Each pattern contains only carbon atoms grouped together according to their parent residue.
* ''Type: ToResidues(patterns: PatternSeq) -> Patterns''




====Union()====
* '''[PDB id: 1hho]''' Merges the sequence of input patterns into a single pattern.
* <syntaxhighlight lang="python" inline="">Residues("HEM").ConnectedResidues(1).Union()</syntaxhighlight> – Takes the two HEM residues of the 1hho protein together with their covalently bonded residues (2 patterns) and merges them into a single pattern.
* ''Type: Union(patterns: PatternSeq) -> Patterns''




====RegularMotifs()====
* '''[PDB id: 1het]''' Sequence motifs are extracted from the primary sequence based on the input regular expression. Please note that PQ does not check for gaps in the chain: e.g. if HIS 28 is followed by ALA 30, the query ‘HA’ returns a positive match.
* If information about posttranslational modifications is present in the ''MODRES'' or ''_pdbx_struct_mod_residue'' fields, modified residues are treated as standard amino acids. Therefore, the letter ''P'' stands for a proline residue as well as all its modifications, e.g. '''HYP''' (hydroxyproline).
* <syntaxhighlight lang="python" inline="">RegularMotifs("HH")</syntaxhighlight> – Finds two consecutive histidine residues or their modifications.
* <syntaxhighlight lang="python" inline="">RegularMotifs("G.{1,2}G")</syntaxhighlight> – Finds two glycine residues separated by one or two other residues.
* <syntaxhighlight lang="python" inline="">RegularMotifs("G.{1,2}G").Filter(lambda m: m.IsConnected())</syntaxhighlight> – Finds two glycine residues separated by one or two residues and verifies that all of them are bonded.
* <syntaxhighlight lang="python" inline="">RegularMotifs(".P.").Filter(lambda l: l.Count(NotAminoAcids()) == 0)</syntaxhighlight> - Finds 3 consecutive residues where the middle one is proline and verifies that none of them is outside the ''standard 20''.
* ''Type: RegularMotifs(regex: Value) -> Patterns''
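
To get a feeling for what the regular expressions in these examples match, they can be tried on a one-letter sequence with Python's standard <code>re</code> module. This is only a sketch: the sequence is made up, <code>motifs</code> is a hypothetical helper, and PatternQuery's own regular expression engine may differ in details such as how overlapping matches are handled.

```python
import re

# Made-up one-letter amino acid sequence, not taken from any PDB entry.
sequence = "MGAGHHKGPLGA"


def motifs(seq, pattern):
    """Return (start index, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]


two_his = motifs(sequence, "HH")            # two consecutive histidines
gly_gap_gly = motifs(sequence, "G.{1,2}G")  # glycines separated by 1-2 residues
around_pro = motifs(sequence, ".P.")        # proline with one neighbour per side
```

Because the match works on letters alone, a pure sequence search like this cannot see chain gaps, which is why the <code>IsConnected()</code> filter exists as a separate step.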
 


====Cluster()====
* Clusters the identified results into a single pattern based on their distance in Å. In contrast to the <syntaxhighlight lang="python" inline="">Near()</syntaxhighlight> query, <syntaxhighlight lang="python" inline="">Cluster()</syntaxhighlight> does not perform a count check. See the examples below.
* <syntaxhighlight lang="python" inline="">Cluster(5, Residues("Ala"))</syntaxhighlight> – Returns all the alanine residues which are at most 5 Å distant from each other.
* <syntaxhighlight lang="python" inline="">Cluster(3, RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"])))</syntaxhighlight> – Merges all the nitrogen atoms from the given rings into a single pattern if the atoms are at most 3 Å distant from each other.
* <syntaxhighlight lang="python" inline="">Cluster(2 , Atoms("C"), Atoms("C"))</syntaxhighlight> – Returns all the carbon atoms which are pairwise closer than 2 Å. If any other carbon atom lies inside this 2 Å sphere, it is also included in the result. Therefore, if you insist on exactly 2 carbon atoms being returned, use the <syntaxhighlight lang="python" inline="">Near()</syntaxhighlight> query instead.
* ''Type: Cluster(r: Number, patterns: PatternSeq) -> Patterns''




===Boolean operations===
====Or()====
* '''[PDB id: 2hhb]''' Merges up to ''n'' different patterns together.
* <syntaxhighlight lang="python" inline="">Or(AminoAcids(ChargeType="Negative"), AminoAcids(ChargeType="Positive"))</syntaxhighlight> – Returns all charged amino acids, both positively and negatively charged.
* ''Type: Or(patterns: PatternSeq+) -> Patterns''




===Topology function===
====AmbientAtoms()====
* '''[PDB id: 1hho]''' Returns all the atoms which are within n Å of the geometrical center of a given pattern.
* <syntaxhighlight lang="python" inline="">Atoms("Fe").AmbientAtoms(4)</syntaxhighlight> – Returns all the atoms which are closer than 4 Å to the center of mass of the iron atom.
* ''Type: AmbientAtoms(pattern: PatternSeq, r: Number) -> Patterns''
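
The geometry behind an "ambient" selection can be sketched in a few lines of Python. The sketch assumes atoms are hypothetical <code>(element, (x, y, z))</code> tuples, uses the unweighted centroid as the geometrical center, and treats the radius as inclusive; none of these details are confirmed by the PatternQuery documentation.

```python
from math import dist


def centroid(points):
    """Unweighted geometrical center of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ambient_atoms(all_atoms, pattern, radius):
    """Atoms lying within radius of the pattern's geometrical center."""
    center = centroid([xyz for _, xyz in pattern])
    return [atom for atom in all_atoms if dist(atom[1], center) <= radius]


# Hypothetical coordinates in Å.
atoms = [
    ("Fe", (0.0, 0.0, 0.0)),
    ("N", (2.0, 0.0, 0.0)),
    ("N", (0.0, 2.0, 0.0)),
    ("C", (3.0, 3.0, 0.0)),
    ("O", (6.0, 0.0, 0.0)),
]

iron = [atom for atom in atoms if atom[0] == "Fe"]
nearby = ambient_atoms(atoms, iron, 4.0)
```

In this toy data the iron and the two nitrogen atoms are selected, while the carbon (about 4.2 Å away) and the oxygen fall outside the sphere.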




====AmbientResidues()====
* '''[PDB id: 1hho]''' Returns all the residues which are within n Å of the geometrical center of a given pattern.
* <syntaxhighlight lang="python" inline="">Residues("HEM").AmbientResidues(4)</syntaxhighlight> – Returns all the residues which are closer than 4 Å to the center of mass of the HEM residues.
* ''Type: AmbientResidues(pattern: PatternSeq, r: Number) -> Patterns''


====Near()====
====Near()====
* '''[PDB id: 1hho]''' Clusters all the specified fragments, which are pairwise closer in Angströms than a specified argument. Additionally, it checks if the fragment contains exactly specified fragments. On the contrary to the Cluster() query this does the ‘count check’.
* '''[PDB id: 1hho]''' Clusters all the specified patterns, which are pairwise closer [Å] than a specified argument. Additionally, it checks if the pattern contains exactly specified patterns. On the contrary to the Cluster() query this does the ‘count check’.
* <code>Near(0, Rings(6*['C']), Rings(4*['C'] + ['N']))</code> – Returns fragments containing only purine. In this example the purine part of tryptophan side chain will be returned.
* <syntaxhighlight lang="python" inline="">Near(0, Rings(6*['C']), Rings(4*['C'] + ['N']))</syntaxhighlight> – Returns patterns containing only purine. In this example the purine part of tryptophan side chain will be returned.
* <code>Near(2 , Atoms("C"), Atoms("C"), Atoms("C"), Atoms("C"))</code> – Returns fragments which contain exactly 4 carbon atoms within a sphere of 2A.
* <syntaxhighlight lang="python" inline="">Near(2, Atoms("C"), Atoms("C"), Atoms("C"), Atoms("C"))</syntaxhighlight> – Returns patterns which contain exactly 4 carbon atoms within a sphere of 2Å.
* ''Type: Near(r: Number, fragments: FragmentSeq+) -> Fragments''
* ''Type: Near(r: Number, patterns: PatternSeq+) -> Patterns''


====Inside()====
* '''[PDB id: 1hho]''' The Inside() query identifies the given patterns within another pattern. It is particularly useful for locating specific atomic arrangements within more general ones.
* <syntaxhighlight lang="python" inline="">Residues("Gly").Inside(Chains("A"))</syntaxhighlight> - Returns glycine residues found on the chain A.
* <syntaxhighlight lang="python" inline="">Atoms("S").Inside(HetResidues())</syntaxhighlight> - Returns sulphur atoms found in the heteroatom residues.
* ''Type: Inside(patterns: PatternSeq, where: PatternSeq) -> PatternSeq''


===Filtering===
====Filter()====
* '''[PDB id: 1hho]''' Basically, an argument of the function is a Boolean condition, which is evaluated of each and every fragment of input fragment sequence. Referred to as ‘m’ in the following text. Only results satisfying the condition are returned, others are filtered out. Technically, it uses lambda abstraction for filtering a collection of input fragments. The usage is the same as in the Python programming language, and therefore, your previous Python experiences are beneficial.  
* '''[PDB id: 1hho]''' Filtering removes unwanted patterns from the result. The condition given as the function argument is evaluated for each pattern of the input pattern sequence; the pattern is referred to as ‘m’ in the following text. Only patterns satisfying the condition are returned, the others are filtered out. Technically, the condition is written as a lambda expression. The usage is the same as in the Python programming language, so previous Python experience is beneficial.
* <code>Residues().Filter(lambda m: m.IsConnectedTo(Atoms("Fe"))) </code> – returns a set of fragments from all residues which are covalently bonded to the iron atom (excluded the HEM residue).
* <syntaxhighlight lang="python" inline="">Residues().Filter(lambda m: m.IsConnectedTo(Atoms("Fe")))</syntaxhighlight> – returns a set of patterns from all residues which are covalently bonded to the iron atom (excluding the HEM residue).
* <code> Residues().Filter(lambda m: m.Count(Atoms("O")) == 2)</code> returns all the residues containing exactly two oxygen atoms.
* <syntaxhighlight lang="python" inline="">Residues().Filter(lambda m: m.Count(Atoms("O")) == 2)</syntaxhighlight> – returns all the residues containing exactly two oxygen atoms.
* ''Type: Filter(fragments: FragmentSeq, filter: Fragment->Bool) -> Fragments''
* ''Type: Filter(patterns: PatternSeq, filter: Pattern->Bool) -> Patterns''
 


====Count()====
* '''[PDB id: 1hho]''' Usually it is convenient to utilize this function inside a filtering query. <code>The Count()</code> query counts the number of occurrences of a fragment inside a different fragment.
* '''[PDB id: 1hho]''' Usually it is convenient to utilize this function inside a filtering query. The <syntaxhighlight lang="python" inline="">Count()</syntaxhighlight> query counts the number of occurrences of a pattern inside a different pattern.
* <code>Residues("HEM").ConnectedResidues(2).Filter(lambda m: m.Count(Atoms("S")) == 1)</code> – Returns fragments composed of a HEM residue surrounded by two layers of bonded residues in case that the whole fragment contains exactly one sulphur atom.
* <syntaxhighlight lang="python" inline="">Residues("HEM").ConnectedResidues(2).Filter(lambda m: m.Count(Atoms("S")) == 1)</syntaxhighlight> – Returns patterns composed of a HEM residue surrounded by two layers of bonded residues in case that the whole pattern contains exactly one sulphur atom.
* <code>Residues("CYS").ConnectedResidues(1).Filter(lambda m: m.Count(Residues("VAL")) == 2)</code> – Returns a cysteine residue which is surrounded from both sides by valine residues.
* <syntaxhighlight lang="python" inline="">Residues("CYS").ConnectedResidues(1).Filter(lambda m: m.Count(Residues("VAL")) == 2)</syntaxhighlight> – Returns a cysteine residue which is surrounded from both sides by valine residues.
* ''Type: Count(where: Fragment, what: FragmentSeq) -> Integer''.
* ''Type: Count(where: Pattern, what: PatternSeq) -> Integer''.
 
 


====Contains()====
* '''[PDB id: 4hhb]''' <code>Contains()</code> query checks if the input fragment contains a specified fragment of interest. In other words Contains() query is similar to a query, where the number of occurrences is higher than zero <code>(Count() > 0)</code>
* '''[PDB id: 4hhb]''' The <syntaxhighlight lang="python" inline="">Contains()</syntaxhighlight> query checks if the input pattern contains a specified pattern of interest. In other words, Contains() behaves like a query where the number of occurrences is higher than zero, <syntaxhighlight lang="python" inline="">Count() > 0</syntaxhighlight>.
* <code>Residues("HEM").AmbientResidues(2).Filter(lambda m: m.Contains(Residues("HIS")))</code> – Returns fragments, where any atom of HEM residue is at most 2A distant from the histidine.
* <syntaxhighlight lang="python" inline="">Residues("HEM").AmbientResidues(2).Filter(lambda m: m.Contains(Residues("HIS")))</syntaxhighlight> – Returns patterns where any atom of the HEM residue is at most 2Å distant from a histidine.
* <code>Residues().Filter(lambda m: m.Contains(Atoms("S")))</code> – Returns all the residue which has a sulphur incorporated in their structure. For this particular example it is similar to the query <code>Residues("CYS", "MET")</code>.
* <syntaxhighlight lang="python" inline="">Residues().Filter(lambda m: m.Contains(Atoms("S")))</syntaxhighlight> – Returns all the residues which have a sulphur atom incorporated in their structure. For this particular example it is similar to the query <syntaxhighlight lang="python" inline="">Residues("CYS", "MET")</syntaxhighlight>.
* ''Type: Contains(where: Fragment, what: FragmentSeq) -> Bool''
* ''Type: Contains(where: Pattern, what: PatternSeq) -> Bool''




===Connectivity===
====IsConnected()====
* '''[PDB id: 4m9e, 4m9v]''' Checks, whether a particular fragment is composed of a single component
* '''[PDB id: 4m9e, 4m9v]''' Checks, whether a particular pattern is composed of a single component
* <code>Atoms("Zn").AmbientAtoms(4).Filter(lambda m: m.IsConnected())</code> – Atoms which are at most 4 A distant from the zinc atom and are all binded together, i.e. there is no outlier.
* <syntaxhighlight lang="python" inline="">Atoms("Zn").AmbientAtoms(4).Filter(lambda m: m.IsConnected())</syntaxhighlight> – Atoms which are at most 4Å distant from the zinc atom and are all bonded together, i.e. there is no outlier.
* For comparison please compare with the results of the query <code>Atoms("Zn").AmbientAtoms(4)</code>
* For comparison, see the results of the query <syntaxhighlight lang="python" inline="">Atoms("Zn").AmbientAtoms(4)</syntaxhighlight>.
* ''Type: IsConnected(fragment: Fragment) -> Bool.''
* ''Type: IsConnected(pattern: Pattern) -> Bool.''
 


====IsConnectedTo()====
* '''[PDB id: 1hho]''' Checks if the two provided fragments are connected one to another.  
* '''[PDB id: 1hho]''' Checks if the two provided patterns are connected one to another.  
* <code>Residues("ALA").Filter(lambda m: m.IsConnectedTo(Residues("GLY")))</code> – Returns all the alanine residues and directly connected glycine residues.
* <syntaxhighlight lang="python" inline="">Residues("ALA").Filter(lambda m: m.IsConnectedTo(Residues("GLY")))</syntaxhighlight> – Returns all the alanine residues which are directly connected to a glycine residue.
* ''Type: IsConnectedTo(current: Fragment, fragments: FragmentSeq) -> Bool''
* ''Type: IsConnectedTo(current: Pattern, patterns: PatternSeq) -> Bool''




====IsNotConnectedTo()====
* '''[PDB id: 1hho]''' Checks if the two provided fragments are NOT connected one to another.
* '''[PDB id: 1hho]''' Checks if the two provided patterns are NOT connected one to another.
* <code>Residues("ALA").Filter(lambda m: m.IsNotConnectedTo(Residues("GLY")))</code> – Returns all the alanine residues which are not directly connected to the glycine residues.
* <syntaxhighlight lang="python" inline="">Residues("ALA").Filter(lambda m: m.IsNotConnectedTo(Residues("GLY")))</syntaxhighlight> – Returns all the alanine residues which are not directly connected to the glycine residues.
* ''Type: IsNotConnectedTo(current: Fragment, fragments: FragmentSeq) -> Bool''
* ''Type: IsNotConnectedTo(current: Pattern, patterns: PatternSeq) -> Bool''
 




====ConnectedAtoms()====
* '''[PDB id: 1hho]''' Returns ''n'' directly bonded layer of atoms to a given fragment.
* '''[PDB id: 1hho]''' Returns ''n'' layers of atoms directly bonded to a given pattern.
* <code>Atoms("Fe").ConnectedAtoms(1)</code> – The iron atom and all the atoms which are covalently bonded to it over a single bond.
* <syntaxhighlight lang="python" inline="">Atoms("Fe").ConnectedAtoms(1)</syntaxhighlight> – The iron atom and all the atoms which are covalently bonded to it over a single bond.
* <code>Atoms("Fe"). ConnectedAtoms (2)</code> – Previous selection and all the atoms which are covalently bonded to them (i.e. additional layer of bonded atom). In other words the output composes of all the atoms which are 2 bonds away from the iron atom.
* <syntaxhighlight lang="python" inline="">Atoms("Fe").ConnectedAtoms(2)</syntaxhighlight> – The previous selection and all the atoms which are covalently bonded to it (i.e. an additional layer of bonded atoms). In other words, the output consists of all the atoms which are at most 2 bonds away from the iron atom.
* ''Type: ConnectedAtoms(fragment: FragmentSeq, n: Integer) -> Fragments''
* ''Type: ConnectedAtoms(pattern: PatternSeq, n: Integer) -> Patterns''
 




====ConnectedResidues()====
* '''[PDB id: 1hho]''' Returns n directly bonded layer of residues to a given fragment.
* '''[PDB id: 1hho]''' Returns ''n'' layers of residues directly bonded to a given pattern.
* <code>Residues("HEM").ConnectedResidues(1)</code> – The HEM residue and all the residues which are covalently bonded to any atom of the HEM residue.
* <syntaxhighlight lang="python" inline="">Residues("HEM").ConnectedResidues(1)</syntaxhighlight> – The HEM residue and all the residues which are covalently bonded to any atom of the HEM residue.
* <code>Residues("HEM").ConnectedResidues(2)</code> – previous selection and all the residues which are covalently bonded to them (i.e. additional layer of bonded residues).
* <syntaxhighlight lang="python" inline="">Residues("HEM").ConnectedResidues(2)</syntaxhighlight> – previous selection and all the residues which are covalently bonded to them (i.e. additional layer of bonded residues).
* ''Type: ConnectedResidues(fragment: FragmentSeq, n: Integer) -> Fragments''
* ''Type: ConnectedResidues(pattern: PatternSeq, n: Integer) -> Patterns''

Latest revision as of 12:57, 26 July 2016

In order to tune up your queries before executing them on the whole database or just to work with the language interactively, feel free to use the Explorer application, where you can upload a PDB molecule of choice, or load a random sample from the PDB database based on selected properties.

You can either try one of our ready-to-use examples or try to make up one of your own.

How to think about queries


When building queries, you have to decompose the problem into smaller chunks, think of a query for each of the pieces and then wisely bind them together. Let us have a goal:

Get all the residues which are as much as 5Å away from any histidine residue and check if the given pattern contains at least 2 negatively charged amino acids.

You probably cannot think of a complete solution from scratch, but once you decompose the problem into separate sub-problems, it is not so difficult after all. (Whenever you are unsure about the meaning of the queries, consult the language reference). For the given problem, the decomposition and the corresponding queries are as follows:

It wasn't that difficult, was it? Once we have written all the individual queries, we can combine them in order to achieve our goal, as highlighted in the query below and in the illustrative infographic:

Residues("HIS").
  AmbientResidues(5).
  Filter(lambda l: l.Count(AminoAcids(ChargeType = "Negative")) >= 2)

Ready-to-use examples


Now that we know how to think about queries, you are ready to browse the many examples listed below. The text is divided into three parts, plus one, according to the data type the queries operate on (Atoms, Residues and Patterns). The first two categories deal only with basic Atom or Residue selections; the outputs of these two types of queries are Atoms and Residues, respectively. The last category, 'Patterns', contains a number of advanced queries which demonstrate the versatility and power of PQ. These queries operate on all results provided by both Atom and Residue queries. On top of that, you can browse several biologically relevant use cases.


Structure of the text

  • [PDB id] Here you find an example PDB id with which you can try out the query in PatternQuery Explorer, and a rough description of what the query does.
  • This is followed by examples of the query with an explanation, such as Residues("HEM"). Copy the query to the command text box in the PatternQuery Explorer application and immediately see the results.
  • The type of data the query operates on and the expected return value.
  • e.g. Type: Atoms(symbols: String*) -> Atoms. The query Atoms() takes 0..n strings in parentheses, representing elements ("C", "N", etc.), and based on the input returns a list of individual atoms.

Queries


Atoms


Basic queries

Atoms()

  • [PDB id: 2hhb] Returns a sequence of individual atoms based on element type provided in the argument. More elements can be specified, if separated by a comma. In case no argument is provided a list of all the atoms is returned.
  • Atoms("Fe") - Returns all iron atoms in the given structure.
  • Atoms("Fe", "N") - Returns all iron and nitrogen atoms in the given structure.
  • Type: Atoms(symbols: String*) -> Atoms.


AtomNames()

  • [PDB id: 2hhb] Returns a sequence of atoms with defined name or names.
  • AtomNames("CA") – Returns all CA atoms.
  • AtomNames("CA", "N") – Returns atoms with names CA (Cα carbon) or N (terminal part of amino acids).
  • Type: AtomNames(names: String+) -> Atoms.


AtomIds()

  • [PDB id: 2hhb] Returns a sequence of atoms with the given id or ids.
  • AtomIds(1) – Returns the atom with id=1 from the given structures.
  • AtomIds(1,2,5) – Returns the atoms with id=1, 2 and 5 from the given structures.
  • Type: AtomIds(ids: Integer+) -> Atoms.


AtomIdRange()

  • [PDB id: 2hhb] Returns a sequence of atoms with ids from a given range (the specified bounds are included).
  • AtomIdRange(1, 10) – returns 10 atoms with IDs from the interval <1, 10>, as specified in the input file.
  • Type: AtomIdRange(minId: Integer, maxId: ?Integer) -> Atoms


NotAtomNames()

  • [PDB id: 2hhb] Returns a sequence of atoms whose names are not among those given in the argument.
  • NotAtomNames("C", "N", "CA", "O") – returns all the atoms with names other than C, CA, N and O, i.e. only the side chain atoms of the protein.
  • Type: NotAtomNames(names: String+) -> Atoms.


NotAtomIds()

  • [PDB id: 2hhb] Returns a sequence of atoms which do not have the given id or ids.
  • NotAtomIds(1) – Returns the atoms with an id other than 1 from the given structures.
  • NotAtomIds(1,2,5) – Returns the atoms with ids other than 1, 2 and 5 from the given structures.
  • Type: NotAtomIds(ids: Integer+) -> Atoms.


NotAtoms()

  • [PDB id: 2hhb] Returns all the atoms not specified in the argument. More elements can be specified, if separated by a comma.
  • NotAtoms("Fe") – returns all the atoms of the structure except iron.
  • NotAtoms("Fe", "N") – returns all the atoms of the structure except iron and nitrogen.
  • Type: NotAtoms(symbols: String+) -> Atoms.


RingAtoms()

  • [PDB id: 2hhb] Returns the specified atoms found on detected rings.
  • RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"])) – Returns all the nitrogen atoms on the histidine side chain.
  • Type: RingAtoms(atom: Atoms, ring: ?Rings) -> Atoms.


Residues


Basic queries

Residues()

  • [PDB id: 2hhb] Returns a sequence of individual residues specified by a function argument. More residues can be specified, if separated by a comma.
  • Residues("HEM") – Returns a list of HEM residues.
  • Residues("HEM", "ALA") – Returns a set of HEM and ALA residues.
  • Type: Residues(names: Value*) -> Residues.


NotResidues()

  • [PDB id: 2hhb] Returns a sequence of residues which are not defined by the argument.
  • NotResidues("HEM") – returns a set of residues whose name is not HEM.
  • Type: NotResidues(names: Value+) -> Residues.


ResidueIds()

  • [PDB id: 2hhb] Returns a sequence of specified residues in case the structure contains them. Each residue is represented by its PDB ID and chain, such as “8 A”.
  • ResidueIds("14 A", "15 A") – Returns residues 14 and 15 of chain A.
  • Type: ResidueIds(ids: String+) -> Residues.


ResidueIdRange()

  • [PDB id: 2hhb] Returns a set of residues on a given chain from the lower to the upper index. In case a residue is not provided in the structure, it is skipped.
  • ResidueIdRange("A", 50, 100) – returns a set of residues on chain A from the ID 50 to 100.
  • Type: ResidueIdRange(chain: String, min: Integer, max: Integer) -> Residues.


NotAminoAcids()

  • [PDB id: 2hhb] Returns a sequence of residues that are not among the 20 standard amino acids. Allowed values for an optional parameter NoWaters: True, False.
  • NotAminoAcids() – Returns all the nonstandard residues incorporated in the protein structure with the exception of HOH and WAT residues, which stand for solvent.
  • NotAminoAcids(NoWaters=False) – Returns all the nonstandard residues incorporated in the protein structure, including solvent (HOH and WAT residues).
  • Type: NotAminoAcids() -> Residues.


AminoAcids()

  • [PDB id: 2hhb] Returns a sequence of residues that are among the 20 standard amino acids.
  • Allowed values for the optional ChargeType parameter: Positive, Negative, Aromatic, Polar, NonPolar
  • AminoAcids() – Returns all standard amino acids.
  • AminoAcids(ChargeType="Polar") – Returns all polar amino acids based on the type of their side chain.
  • Type: AminoAcids() -> Residues.


HetResidues()

  • [PDB id: 2hhb] Returns a sequence of heteroatom residues as specified in input PDB files, excluding residues.
  • HetResidues() – A set of hetatom residues.
  • Type: HetResidues() -> Residues


Patterns


Structure specification

PQ can select residues or their parts based on their name or id. However, the true power of PQ lies in its ability to utilize the geometrical and chemical nature of the input structures and to select patterns purely based on elements, topology and the connectivity among them.


Rings()

  • [PDB id: 3d12] Returns all the rings in the protein structure that match the user's specification. Any structural ring can be identified by concatenating the individual elements the ring is composed of.
  • Rings(2 * ["C"] + ["N"] + ["C"] + ["N"]) – Returns the histidine aromatic ring.
  • Rings(4 * ["C"] + ["O"]) – Returns pentose ring.
  • Type: Rings(atoms: Value*) -> Ring
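
The ring argument is ordinary Python list arithmetic, so its value can be checked in any Python interpreter before it is passed to Rings(). A minimal sketch in plain Python (not PatternQuery itself; the variable names are illustrative only):

```python
# Plain Python, no PatternQuery needed: list repetition (*) and concatenation (+)
# build the sequence of element symbols that is handed to Rings().
histidine_ring = 2 * ["C"] + ["N"] + ["C"] + ["N"]
pentose_ring = 4 * ["C"] + ["O"]

print(histidine_ring)  # ['C', 'C', 'N', 'C', 'N']
print(pentose_ring)    # ['C', 'C', 'C', 'C', 'O']
```

Writing the repetition explicitly, e.g. ["C", "C", "N", "C", "N"], yields the same list.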


ToAtoms()

  • [PDB id: 1hho] Converts the input pattern into a sequence of individual atoms, i.e. each pattern contains a single atom.
  • Residues("HEM").ToAtoms() – Returns all the atoms of HEM residues as a sequence of individual atoms.
  • Type: ToAtoms(patterns: Patterns) -> Patterns


ToResidues()

  • [PDB id: 1hho] Converts the input pattern into a sequence of individual residues. It can either decompose a pattern with multiple residues into a sequence of patterns each containing a single residue or, if applied to a sequence of atoms, merge the atoms into a single pattern per residue.
  • Residues("HEM").AmbientResidues(2).ToResidues() – Returns the sequence of individual residues from the 2Å surrounding of the HEM residue, including HEM.
  • Atoms("C").ToResidues() – Returns a sequence of patterns. Each pattern contains only carbon atoms grouped together according to their parent residue.
  • Type: ToResidues(patterns: PatternSeq) -> Patterns
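
The grouping behaviour described for Atoms("C").ToResidues() can be pictured with a small plain-Python model. This is only a sketch under stated assumptions: the atom list and the helper group_by_residue are hypothetical and are not part of PatternQuery.

```python
from collections import defaultdict

# Hypothetical toy atoms: (element symbol, parent residue label). Not real PDB data.
atoms = [
    ("C", "HIS 28 A"), ("N", "HIS 28 A"), ("C", "HIS 28 A"),
    ("C", "ALA 30 A"), ("O", "ALA 30 A"),
]

def group_by_residue(atom_list, element):
    """Keep the atoms of one element and group them by their parent residue."""
    groups = defaultdict(list)
    for symbol, residue in atom_list:
        if symbol == element:
            groups[residue].append(symbol)
    return dict(groups)

carbon_groups = group_by_residue(atoms, "C")
print(carbon_groups)  # {'HIS 28 A': ['C', 'C'], 'ALA 30 A': ['C']}
```

Each dictionary entry plays the role of one returned pattern: only carbon atoms, one group per residue.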


Union()

  • [PDB id: 1hho] Merges the sequence of input patterns to a single pattern.
  • Residues("HEM").ConnectedResidues(1).Union() – Takes two HEM residues of the 1hho protein with covalently bonded residues (2 patterns) and merges them into a single pattern.
  • Type: Union(patterns: PatternSeq) -> Patterns


RegularMotifs()

  • [PDB id: 1het] Sequence motifs are extracted from the primary sequence based on the input regular expression. Please note that PQ does not check for the presence of a gap in the chain, e.g. if HIS 28 is followed by residue ALA 30, the query ‘HA’ returns a positive match.
  • In case the information about posttranslational modifications is present in the MODRES or _pdbx_struct_mod_residue fields, modified residues are treated as standard amino acids. Therefore, the letter P stands for a proline residue as well as all its modifications, e.g. HYP (hydroxyproline).
  • RegularMotifs("HH") – Finds two consecutive histidine residues or their modifications.
  • RegularMotifs("G.{1,2}G") – Finds two glycine residues separated by one or two other residues.
  • RegularMotifs("G.{1,2}G").Filter(lambda m: m.IsConnected()) – Finds two glycine residues separated by one or two residues and verifies that all of them are bonded.
  • RegularMotifs(".P.").Filter(lambda l: l.Count(NotAminoAcids()) == 0) – Finds 3 consecutive residues, where the middle one is proline, and verifies that none of them is outside the standard 20.
  • Type: RegularMotifs(regex: Value) -> Patterns
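
The gap caveat above is easy to reproduce outside PQ. The sketch below uses Python's own re module on a hypothetical chain in which residue 29 is missing; the chain data and the helper find_motifs are assumptions made for illustration, and PQ's actual regular expression engine may differ in details such as the handling of overlapping matches.

```python
import re

# Hypothetical chain: (residue number, one-letter code). Residue 29 is missing on purpose.
chain = [(26, "A"), (27, "G"), (28, "H"), (30, "A"), (31, "G"), (32, "P")]
sequence = "".join(letter for _, letter in chain)  # "AGHAGP"

def find_motifs(pattern):
    """Return (first residue number, last residue number, letters, spans_gap) per match."""
    hits = []
    for m in re.finditer(pattern, sequence):
        first = chain[m.start()][0]
        last = chain[m.end() - 1][0]
        # A match spans a gap when the numbering covers more residues than were matched.
        spans_gap = (last - first + 1) != len(m.group())
        hits.append((first, last, m.group(), spans_gap))
    return hits

print(find_motifs("HA"))        # [(28, 30, 'HA', True)]
print(find_motifs("G.{1,2}G"))  # [(27, 31, 'GHAG', True)]
```

Because the sequence string carries no numbering, 'HA' matches HIS 28 followed by ALA 30. This is why the IsConnected() filter shown above is useful whenever the matched residues must really be bonded.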


Cluster()

  • Clusters the results into a single pattern based on their mutual distance [Å]. In contrast to the Near() query, Cluster() does not perform a count check. See the examples below.
  • Cluster(5, Residues("Ala")) – Returns all the alanine residues which are at most 5Å distant from each other.
  • Cluster(3, RingAtoms(Atoms("N"), Rings(2 * ["C"] + ["N"] + ["C"] + ["N"]))) – Merges all the nitrogen atoms from the given rings into a single pattern in case the atoms are at most 3Å distant from each other.
  • Cluster(2, Atoms("C"), Atoms("C")) – Returns all the carbon atoms which are pairwise closer than 2Å. If any other carbon atom lies inside this 2Å sphere, it is also included in the result. Therefore, if you insist on exactly 2 carbon atoms being returned, use the Near() query instead.
  • Type: Cluster(r: Number, patterns: PatternSeq) -> Patterns


Boolean operations


Or()

  • [PDB id: 2hhb] Merges up to n different patterns together.
  • Or(AminoAcids(ChargeType="Negative"), AminoAcids(ChargeType= "Positive")) – returns all charged amino acids both positively charged and negatively charged.
  • Type: Or(patterns: PatternSeq+) -> Patterns


Topology function


AmbientAtoms()

  • [PDB id: 1hho] Returns all the atoms, which are within nÅ from the geometrical center of a given pattern.
  • Atoms("Fe").AmbientAtoms(4) – returns all the atoms, which are closer than 4Å from the center of mass of iron atom.
  • Type: AmbientAtoms(pattern: PatternSeq, r: Number) -> Patterns
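
The distance criterion can be sketched in plain Python as follows. The coordinates, labels and the helpers centroid and ambient are hypothetical, and treating the boundary as inclusive (<=) is an assumption; the sketch only mirrors the description "within nÅ from the geometrical center".

```python
import math

# Hypothetical coordinates in Å: (label, (x, y, z)). Not taken from a real PDB entry.
atoms = [
    ("FE", (0.0, 0.0, 0.0)),
    ("N1", (2.0, 0.0, 0.0)),
    ("N2", (0.0, 2.1, 0.0)),
    ("C1", (3.0, 3.0, 0.0)),
    ("O1", (6.0, 0.0, 0.0)),
]

def centroid(points):
    """Geometrical center: the mean of the coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def ambient(pattern_labels, radius):
    """Labels of all atoms within `radius` Å of the pattern's geometrical center."""
    center = centroid([xyz for label, xyz in atoms if label in pattern_labels])
    return [label for label, xyz in atoms if math.dist(center, xyz) <= radius]

print(ambient({"FE"}, 4))  # ['FE', 'N1', 'N2']
```

C1 lies about 4.24Å from the iron atom and O1 lies 6Å away, so both fall outside the 4Å radius.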


AmbientResidues()

  • [PDB id: 1hho] Returns all the residues, which are within nÅ from the geometrical center of a given pattern.
  • Residues("HEM").AmbientResidues(4) – returns all the residues, which are closer than 4Å from the center of mass of HEM residues.
  • Type: AmbientResidues(pattern: PatternSeq, r: Number) -> Patterns


Near()

  • [PDB id: 1hho] Clusters all the specified patterns which are pairwise closer [Å] than the specified argument. Additionally, it checks that the resulting pattern contains exactly the specified patterns. In contrast to the Cluster() query, this one does the ‘count check’.
  • Near(0, Rings(6*['C']), Rings(4*['C'] + ['N'])) – Returns patterns consisting of a six-membered carbon ring fused with a five-membered ring containing one nitrogen, i.e. an indole ring system. In this example the indole part of the tryptophan side chain will be returned.
  • Near(2, Atoms("C"), Atoms("C"), Atoms("C"), Atoms("C")) – Returns patterns which contain exactly 4 carbon atoms within a sphere of 2Å.
  • Type: Near(r: Number, patterns: PatternSeq+) -> Patterns


Inside()

  • [PDB id: 1hho] The Inside() query identifies the given patterns within another pattern. It is particularly useful for locating specific atomic arrangements within more general ones.
  • Residues("Gly").Inside(Chains("A")) - Returns glycine residues found on the chain A.
  • Atoms("S").Inside(HetResidues()) - Returns sulphur atoms found in the heteroatom residues.
  • Type: Inside(patterns: PatternSeq, where: PatternSeq) -> PatternSeq

Filtering


Filter()

  • [PDB id: 1hho] Filtering removes unwanted patterns from the result. The condition given as the function argument is evaluated for each pattern of the input pattern sequence; the pattern is referred to as ‘m’ in the following text. Only patterns satisfying the condition are returned, the others are filtered out. Technically, the condition is written as a lambda expression. The usage is the same as in the Python programming language, so previous Python experience is beneficial.
  • Residues().Filter(lambda m: m.IsConnectedTo(Atoms("Fe"))) – returns a set of patterns from all residues which are covalently bonded to the iron atom (excluding the HEM residue).
  • Residues().Filter(lambda m: m.Count(Atoms("O")) == 2) – returns all the residues containing exactly two oxygen atoms.
  • Type: Filter(patterns: PatternSeq, filter: Pattern->Bool) -> Patterns


Count()

  • [PDB id: 1hho] Usually it is convenient to utilize this function inside a filtering query. The Count() query counts the number of occurrences of a pattern inside a different pattern.
  • Residues("HEM").ConnectedResidues(2).Filter(lambda m: m.Count(Atoms("S")) == 1) – Returns patterns composed of a HEM residue surrounded by two layers of bonded residues in case that the whole pattern contains exactly one sulphur atom.
  • Residues("CYS").ConnectedResidues(1).Filter(lambda m: m.Count(Residues("VAL")) == 2) – Returns a cysteine residue which is surrounded from both sides by valine residues.
  • Type: Count(where: Pattern, what: PatternSeq) -> Integer.

Contains()

  • [PDB id: 4hhb] The Contains() query checks if the input pattern contains a specified pattern of interest. In other words, Contains() behaves like a query where the number of occurrences is higher than zero (Count() > 0).
  • Residues("HEM").AmbientResidues(2).Filter(lambda m: m.Contains(Residues("HIS"))) – Returns patterns where any atom of the HEM residue is at most 2Å distant from a histidine.
  • Residues().Filter(lambda m: m.Contains(Atoms("S"))) – Returns all the residues which have a sulphur atom incorporated in their structure. For this particular example it is similar to the query Residues("CYS", "MET").
  • Type: Contains(where: Pattern, what: PatternSeq) -> Bool
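
How Filter(), Count() and Contains() work together can be mimicked in plain Python. In this sketch a "pattern" is reduced to a list of element symbols; the data and the helpers count, contains and filter_patterns are hypothetical stand-ins, not the PQ implementation.

```python
# Toy stand-ins: each "pattern" is just a list of element symbols (hypothetical data).
patterns = {
    "CYS": ["N", "C", "C", "O", "C", "S"],
    "GLY": ["N", "C", "C", "O"],
    "ASP": ["N", "C", "C", "O", "C", "C", "O", "O"],
}

def count(pattern, element):
    """Number of occurrences of `element` inside `pattern`."""
    return pattern.count(element)

def contains(pattern, element):
    """True when the element occurs at least once, i.e. count(...) > 0."""
    return count(pattern, element) > 0

def filter_patterns(named_patterns, predicate):
    """Keep only the names of the patterns for which the predicate returns True."""
    return [name for name, m in named_patterns.items() if predicate(m)]

print(filter_patterns(patterns, lambda m: contains(m, "S")))    # ['CYS']
print(filter_patterns(patterns, lambda m: count(m, "O") == 1))  # ['CYS', 'GLY']
```

As in PQ, the lambda receives one pattern at a time (named m here) and the filter keeps only those for which it evaluates to True.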


Connectivity


IsConnected()

  • [PDB id: 4m9e, 4m9v] Checks, whether a particular pattern is composed of a single component
  • Atoms("Zn").AmbientAtoms(4).Filter(lambda m: m.IsConnected()) – Atoms which are at most 4Å distant from the zinc atom and are all bonded together, i.e. there is no outlier.
  • For comparison, see the results of the query Atoms("Zn").AmbientAtoms(4).
  • Type: IsConnected(pattern: Pattern) -> Bool.
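
"Composed of a single component" is the usual graph notion of connectivity. The following plain-Python sketch tests it with a breadth-first search; the atom labels, the bond list and the function is_connected are hypothetical, and treating an empty pattern as connected is an assumption.

```python
from collections import deque

def is_connected(atoms, bonds):
    """True if every atom can be reached from the first one via the given bonds."""
    atoms = list(atoms)
    if not atoms:
        return True  # assumption: an empty pattern counts as connected
    neighbours = {a: set() for a in atoms}
    for a, b in bonds:
        # Only bonds with both ends inside the pattern are relevant.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen = {atoms[0]}
    queue = deque([atoms[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(neighbours)

# Hypothetical atom labels and bonds around a zinc atom.
bonds = [("ZN", "N1"), ("ZN", "S1"), ("N1", "C1")]
print(is_connected(["ZN", "N1", "S1", "C1"], bonds))        # True
print(is_connected(["ZN", "N1", "S1", "C1", "O9"], bonds))  # False: O9 is an outlier
```

The second call corresponds to the unfiltered AmbientAtoms(4) result, where a nearby but unbonded atom makes the pattern fall apart into two components.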


IsConnectedTo()

  • [PDB id: 1hho] Checks if the two provided patterns are connected one to another.
  • Residues("ALA").Filter(lambda m: m.IsConnectedTo(Residues("GLY"))) – Returns all the alanine residues which are directly connected to a glycine residue.
  • Type: IsConnectedTo(current: Pattern, patterns: PatternSeq) -> Bool


IsNotConnectedTo()

  • [PDB id: 1hho] Checks if the two provided patterns are NOT connected one to another.
  • Residues("ALA").Filter(lambda m: m.IsNotConnectedTo(Residues("GLY"))) – Returns all the alanine residues which are not directly connected to the glycine residues.
  • Type: IsNotConnectedTo(current: Pattern, patterns: PatternSeq) -> Bool


ConnectedAtoms()

  • [PDB id: 1hho] Returns n layers of atoms directly bonded to a given pattern.
  • Atoms("Fe").ConnectedAtoms(1) – The iron atom and all the atoms which are covalently bonded to it over a single bond.
  • Atoms("Fe").ConnectedAtoms(2) – The previous selection and all the atoms which are covalently bonded to it (i.e. an additional layer of bonded atoms). In other words, the output consists of all the atoms which are at most 2 bonds away from the iron atom.
  • Type: ConnectedAtoms(pattern: PatternSeq, n: Integer) -> Patterns
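
The idea of bonded layers can be sketched as a layer-by-layer expansion over a bond graph. The bond list and the function connected_atoms below are hypothetical and written in plain Python; they illustrate the description above and are not PQ's own code.

```python
def connected_atoms(start, bonds, n):
    """Atoms reachable from `start` over at most `n` bonds (start included)."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    selected = {start}
    frontier = {start}
    for _ in range(n):
        # One more layer: everything bonded to the current frontier and not yet selected.
        frontier = {nb for atom in frontier for nb in neighbours.get(atom, ())} - selected
        selected |= frontier
    return selected

# Hypothetical bond list around an iron atom.
bonds = [("FE", "NA"), ("FE", "NB"), ("NA", "C1"), ("C1", "C2")]
print(sorted(connected_atoms("FE", bonds, 1)))  # ['FE', 'NA', 'NB']
print(sorted(connected_atoms("FE", bonds, 2)))  # ['C1', 'FE', 'NA', 'NB']
```

The same expansion, performed on whole residues instead of single atoms, gives the behaviour described for ConnectedResidues().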


ConnectedResidues()

  • [PDB id: 1hho] Returns n layers of residues directly bonded to a given pattern.
  • Residues("HEM").ConnectedResidues(1) – The HEM residue and all the residues which are covalently bonded to any atom of the HEM residue.
  • Residues("HEM").ConnectedResidues(2) – The previous selection and all the residues which are covalently bonded to it (i.e. an additional layer of bonded residues).
  • Type: ConnectedResidues(pattern: PatternSeq, n: Integer) -> Patterns