==Basic Principles of the Language==

===Intuitive Description===

As we've seen in the example above, it is very easy to ''compose'' our ideas about the final shape of the pattern we are interested in. This works because the input molecule is decomposed into a stream of patterns. These streams can then be modified and combined into new streams, which can in turn be modified and combined again.

As an example, take the query <code>Atoms('Ca')</code>. The '''PatternQuery''' language extracts all calcium atoms from the input molecule and represents them as a stream of sets, each containing one atom, as illustrated in the image below:

[[Image:PatternQuery-Principles Atoms(Ca).png|center|600px]]

Now, each element of this stream can be modified, for example to include all atoms within 4 Å of the original calcium atom. The result is a stream of sets of atoms, where each set contains the original Ca atom and the atoms within the given radius. This is represented by the query <code>Atoms('Ca').AmbientAtoms(4)</code> and is illustrated in the image below:

[[Image:PatternQuery-Principles Atoms(Ca) surr.png|center|600px]]

In the next step, we might wish to keep only those patterns that contain at least 6 atoms. This is achieved by looking at each pattern, counting its atoms, and discarding the patterns that do not meet the criterion. Written as a query, this could be represented as <code>Atoms('Ca').AmbientAtoms(4).Filter(lambda m: m.Count(Atoms()) >= 6)</code>.
In the graphical form:

[[Image:PatternQuery-Principles Atoms(Ca) surr filt.png|center|600px]]

The previous filter query also demonstrates another interesting concept of the language: the ability to identify patterns within patterns. This is what the expression <code>m.Count(Atoms())</code> does: the <code>[[PatternQuery:Language_Reference#Atoms | Atoms()]]</code> query is executed for each pattern from the original input sequence produced by the expression <code>Atoms('Ca').AmbientAtoms(4)</code>, and creates, for that pattern, a new sequence of patterns that each contain a single atom. The Count function then returns the number of patterns produced by its argument. Consequently, the query <code>[[PatternQuery:Language_Reference#Atoms | Atoms()]]</code> inside the Count function can be replaced by any function that also produces a sequence of patterns, for example <code>[[PatternQuery:Language_Reference#Rings | Rings()]]</code>.

Finally, streams of patterns can be combined. For example, let's say we want to find all pairs of calcium atoms that are no further than 4 Å apart. This can be achieved using the query <code>Near(4, Atoms('Ca'), Atoms('Ca'))</code>. This query takes as input two identical streams of calcium atoms and, for each pair of them, determines whether the atoms are within 4 Å of each other. For each pair that satisfies this condition, a new pattern consisting of the two atoms is created. Therefore, the result of the above <code>[[PatternQuery:Language_Reference#Near | Near()]]</code> query is a stream of sets of atoms (patterns), each containing two calcium atoms that are no further than 4 Å from each other:

[[Image:PatternQuery-Principles-Near.png|center|600px]]

With the basic types of queries outlined in the previous paragraphs, the sky's the limit. Because the language is composable, when a new type of pattern emerges, only a single function needs to be added to the language for it to work with all of its other parts.
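The stream model walked through above can be mimicked in a few lines of plain Python. The sketch below is not the PatternQuery implementation: the <code>Atom</code> record, the toy coordinates, and the functions <code>atoms</code>, <code>ambient_atoms</code>, <code>filter_patterns</code>, <code>count</code> and <code>near</code> are hypothetical stand-ins for <code>Atoms()</code>, <code>AmbientAtoms()</code>, <code>Filter()</code>, <code>Count()</code> and <code>Near()</code>. It assumes that a pattern is a frozen set of atoms, that distances are Euclidean and measured in Å, that <code>ambient_atoms</code> measures distance to any atom of the pattern, and that <code>near</code> never pairs a pattern with itself and keeps each resulting pair only once.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, FrozenSet, List, Optional


# Hypothetical toy atom: serial number, element symbol and coordinates in Å.
@dataclass(frozen=True)
class Atom:
    serial: int
    element: str
    x: float
    y: float
    z: float

    def pos(self):
        return (self.x, self.y, self.z)


Pattern = FrozenSet[Atom]   # a pattern is a set of atoms
PatternSeq = List[Pattern]  # a stream (sequence) of patterns


def atoms(molecule: List[Atom], element: Optional[str] = None) -> PatternSeq:
    """Stand-in for Atoms(): one single-atom pattern per matching atom."""
    return [frozenset([a]) for a in molecule if element is None or a.element == element]


def ambient_atoms(molecule: List[Atom], seq: PatternSeq, radius: float) -> PatternSeq:
    """Stand-in for AmbientAtoms(): add every atom within `radius` Å of any atom of the pattern."""
    return [
        p | frozenset(a for a in molecule if any(dist(a.pos(), b.pos()) <= radius for b in p))
        for p in seq
    ]


def filter_patterns(seq: PatternSeq, predicate: Callable[[Pattern], bool]) -> PatternSeq:
    """Stand-in for Filter(): keep only the patterns satisfying the predicate."""
    return [p for p in seq if predicate(p)]


def count(pattern: Pattern, generator: Callable[[List[Atom]], PatternSeq]) -> int:
    """Stand-in for m.Count(...): run a generator inside one pattern and count its results."""
    return len(generator(list(pattern)))


def near(radius: float, seq_a: PatternSeq, seq_b: PatternSeq) -> PatternSeq:
    """Stand-in for Near(): merge pattern pairs that have atoms within `radius` Å."""
    result: PatternSeq = []
    for p in seq_a:
        for q in seq_b:
            if p == q:  # assumption: a pattern is never paired with itself
                continue
            if any(dist(a.pos(), b.pos()) <= radius for a in p for b in q):
                merged = p | q
                if merged not in result:  # identical input streams yield each pair twice
                    result.append(merged)
    return result


molecule = [
    Atom(1, "Ca", 0.0, 0.0, 0.0),
    Atom(2, "Ca", 3.0, 0.0, 0.0),
    Atom(3, "Ca", 20.0, 0.0, 0.0),
    Atom(4, "O", 1.0, 0.0, 0.0),
    Atom(5, "O", 0.0, 1.5, 0.0),
    Atom(6, "O", 0.0, 0.0, 2.0),
    Atom(7, "O", -2.0, 0.0, 0.0),
]

# Atoms('Ca').AmbientAtoms(4)
surr = ambient_atoms(molecule, atoms(molecule, "Ca"), 4)
# ... .Filter(lambda m: m.Count(Atoms()) >= 6)
big = filter_patterns(surr, lambda m: count(m, atoms) >= 6)
# Near(4, Atoms('Ca'), Atoms('Ca'))
pairs = near(4, atoms(molecule, "Ca"), atoms(molecule, "Ca"))

print([len(p) for p in surr])                        # [6, 5, 1]
print([len(p) for p in big])                         # [6]
print([sorted(a.serial for a in p) for p in pairs])  # [[1, 2]]
```

Because every stand-in consumes and returns a plain list of patterns, the calls chain in the same order as the query <code>Atoms('Ca').AmbientAtoms(4).Filter(...)</code>, and any other generator function could be passed to <code>count</code> in place of <code>atoms</code>, mirroring the swap of <code>Atoms()</code> for <code>Rings()</code>.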
As an example, assume we didn't know that proteins have a secondary structure called a "sheet", and we had just discovered it along with a fancy algorithm to identify these sheets. We would then be interested in how this new type of protein substructure interacts with other parts of the molecule. All that would be needed is to add a function called <code>[[PatternQuery:Language_Reference#Sheets | Sheets()]]</code> to the language, and we would immediately be able to analyze and filter its neighborhood using the functions <code>[[PatternQuery:Language_Reference#AmbientAtoms | AmbientAtoms()]]</code> and <code>[[PatternQuery:Language_Reference#Filter | Filter()]]</code>.

===A More Formal Description===

The language is built upon two basic data structures:
* '''Pattern'''. A pattern is simply an arbitrary set of atoms.
* '''Pattern Sequence'''. A sequence of patterns. In mathematical terms, it can be understood as a "set of patterns", which is another way of saying "set of sets of atoms".

On these data structures, there are three basic types of queries:
* '''Generator queries'''. As the name suggests, generator queries generate sequences of patterns from the original input. They are the tool that transforms the input molecule into a stream of patterns that can later be modified or combined. Examples of these queries include <code>[[PatternQuery:Language_Reference#Atoms | Atoms()]]</code>, <code>[[PatternQuery:Language_Reference#Residues | Residues()]]</code>, and <code>[[PatternQuery:Language_Reference#RegularMotifs | RegularMotifs()]]</code>.
* '''Modifier queries'''. These queries operate on individual patterns and either modify them or discard them. Examples include <code>[[PatternQuery:Language_Reference#AmbientAtoms | AmbientAtoms()]]</code>, <code>[[PatternQuery:Language_Reference#ConnectedResidues | ConnectedResidues()]]</code>, and <code>[[PatternQuery:Language_Reference#Filter | Filter()]]</code>.
* '''Combinator queries'''. Combinator queries take as input two or more sequences of patterns and combine them into a single new sequence that satisfies given criteria. Examples include <code>[[PatternQuery:Language_Reference#Or | Or()]]</code>, <code>[[PatternQuery:Language_Reference#Near | Near()]]</code>, and <code>[[PatternQuery:Language_Reference#Path | Path()]]</code>.
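The two data structures and three kinds of queries can also be written down as function types. The following Python sketch is a simplified model, not the actual PatternQuery implementation: atoms are plain string labels, and <code>generator_by_prefix</code>, <code>apply_modifier</code>, <code>or_combinator</code> and <code>add_oxygens_skip_zinc</code> are hypothetical helpers. <code>or_combinator</code> assumes that an <code>Or()</code>-like combinator simply collects the patterns of all its inputs and keeps each distinct pattern once.

```python
from typing import Callable, FrozenSet, List, Optional

# Simplified model: an "atom" is just a label such as "Ca1".
Atom = str
Pattern = FrozenSet[Atom]   # Pattern: an arbitrary set of atoms
PatternSeq = List[Pattern]  # Pattern Sequence: a sequence of patterns
Molecule = List[Atom]

# The three kinds of queries as function types.
GeneratorQuery = Callable[[Molecule], PatternSeq]         # input molecule -> patterns
PatternModifier = Callable[[Pattern], Optional[Pattern]]  # None means "throw away"
CombinatorQuery = Callable[..., PatternSeq]               # several sequences -> one


def generator_by_prefix(prefix: str) -> GeneratorQuery:
    """Hypothetical generator: one single-atom pattern per atom whose label starts with `prefix`."""
    return lambda mol: [frozenset([a]) for a in mol if a.startswith(prefix)]


def apply_modifier(modifier: PatternModifier, seq: PatternSeq) -> PatternSeq:
    """Modifier query: apply `modifier` to each pattern on its own, dropping None results."""
    out: PatternSeq = []
    for p in seq:
        modified = modifier(p)
        if modified is not None:
            out.append(modified)
    return out


def or_combinator(*seqs: PatternSeq) -> PatternSeq:
    """Or()-like combinator (assumed semantics): patterns from all inputs, each kept once."""
    seen = set()
    out: PatternSeq = []
    for seq in seqs:
        for p in seq:
            if p not in seen:
                seen.add(p)
                out.append(p)
    return out


molecule: Molecule = ["Ca1", "Ca2", "Zn1", "O1", "O2"]


def add_oxygens_skip_zinc(p: Pattern) -> Optional[Pattern]:
    """Hypothetical modifier: discard patterns containing zinc, add all oxygens to the rest."""
    if any(a.startswith("Zn") for a in p):
        return None
    return p | frozenset(a for a in molecule if a.startswith("O"))


calcium = generator_by_prefix("Ca")(molecule)
zinc = generator_by_prefix("Zn")(molecule)
metals = or_combinator(calcium, zinc, calcium)  # repeated calcium patterns are kept once
result = apply_modifier(add_oxygens_skip_zinc, metals)

print([sorted(p) for p in metals])  # [['Ca1'], ['Ca2'], ['Zn1']]
print([sorted(p) for p in result])  # [['Ca1', 'O1', 'O2'], ['Ca2', 'O1', 'O2']]
```

Representing a modifier as a per-pattern function that returns either a new pattern or <code>None</code> captures both halves of the definition above (modify or throw away), and lifting it over a whole sequence with <code>apply_modifier</code> keeps modifiers independent of whichever generator or combinator produced that sequence.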