
SecStrAnnotator:OneToOne: Difference between revisions

From WebChemistry Wiki
Midlik (talk | contribs)
No edit summary
Midlik (talk | contribs)
No edit summary
 
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page is about '''SecStrAnnotator 2.0''' and higher (using SecStrAnnotator 1.0 is discouraged, the old page is [[SecStrAnnotator:OneToOne_v1 | here]]).
This page is about '''SecStrAnnotator 2.3''' (the page for SecStrAnnotator 1.0 is [[SecStrAnnotator:OneToOne_v1.0 | here]]).




Line 9: Line 9:


==Dependencies==
==Dependencies==
===.NET 6.0 runtime===
Download and install from the official [https://dotnet.microsoft.com/ .NET website]. (For SecStrAnnotator 2.0–2.2, use .NET Core 3.0 or .NET Core 3.1)


===PyMOL===
===PyMOL===
Line 14: Line 18:
PyMOL is used by SecStrAnnotator for structural alignment and visualization. It can be downloaded from the [https://pymol.org/2/ PyMOL website]. In Ubuntu Linux it can also be installed by running <code>sudo apt install pymol</code>.
PyMOL is used by SecStrAnnotator for structural alignment and visualization. It can be downloaded from the [https://pymol.org/2/ PyMOL website]. In Ubuntu Linux it can also be installed by running <code>sudo apt install pymol</code>.


===.NET Core Runtime===
===(DSSP)===


(Only needed for SecStrAnnotator 2.0 and higher)
Only needed if using DSSP secondary structure assignment method (<code>--ssa dssp</code>). In Ubuntu Linux it can be installed by running <code>sudo apt install dssp</code>.


.NET Core Runtime is usually pre-installed in Windows systems. To install in Linux, follow the official [https://dotnet.microsoft.com/ .NET Core website].
==Execution==


===Mono===
SecStrAnnotator is executed from command line:


(Only needed for SecStrAnnotator 1.0 outside Windows)
dotnet SecStrAnnotator.dll <span style=color:gray>[</span><span style=color:green>OPTIONS</span><span style=color:gray>]</span> <span style=color:green>DIRECTORY</span> <span style=color:green>TEMPLATE</span> <span style=color:green>QUERY</span>


On Windows, SecStrAnnotator.exe can be executed directly; however, on other operating systems it must be run using [https://www.mono-project.com/ Mono]. In Ubuntu Linux it can be installed by running <code>sudo apt install mono-devel</code>.
Example of a call:


===DSSP===
<source lang="bash">
dotnet SecStrAnnotator.dll --help                                                                          # Show help


(Only needed when using DSSP secondary structure assignment method (by <code>--ssa dssp</code>))
dotnet SecStrAnnotator.dll --onlyssa my_data_directory 1tqn,A,7:478                                        # Only detect SSEs


==Execution==
dotnet SecStrAnnotator.dll my_data_directory 2nnj,A 1tqn,A,7:478                                          # Detect and annotate SSEs


SecStrAnnotator is executed from command line.
  dotnet SecStrAnnotator.dll my_data_directory 2nnj,A 1tqn,A,7:478 --align cealign --matching mom --session # Detect and annotate SSEs
 
</source>
SecStrAnnotator 1.0 on Windows:
  SecStrAnnotator.exe <span style=color:gray>[</span><span style=color:green>OPTIONS</span><span style=color:gray>]</span> <span style=color:green>DIRECTORY</span> <span style=color:green>TEMPLATE</span> <span style=color:green>QUERY</span>
 
SecStrAnnotator 1.0 on Linux:
  mono SecStrAnnotator.exe <span style=color:gray>[</span><span style=color:green>OPTIONS</span><span style=color:gray>]</span> <span style=color:green>DIRECTORY</span> <span style=color:green>TEMPLATE</span> <span style=color:green>QUERY</span>
 
SecStrAnnotator 2.0:
dotnet SecStrAnnotator2.dll <span style=color:gray>[</span><span style=color:green>OPTIONS</span><span style=color:gray>]</span> <span style=color:green>DIRECTORY</span> <span style=color:green>TEMPLATE</span> <span style=color:green>QUERY</span>


Example of a call:
dotnet SecStrAnnotator2.dll --align cealign --ssa geom-hbond --matching mom --session  my_data_directory 1og2,A,30:491 1tqn,A,28:499


===Arguments===
===Arguments===
Line 56: Line 51:
** <code>1h9r,A,123:183,252:261</code> (residues 123–183 and 252–261 of the chain A)
** <code>1h9r,A,123:183,252:261</code> (residues 123–183 and 252–261 of the chain A)
* <code><span style=color:green>QUERY</span></code> describes the query protein domain and uses the same format as <code><span style=color:green>TEMPLATE</span></code>.
* <code><span style=color:green>QUERY</span></code> describes the query protein domain and uses the same format as <code><span style=color:green>TEMPLATE</span></code>.
Keep in mind that the chains and residues are numbered according to the label_* numbering scheme in mmCIF file format (i.e. chain identifier is <code>label_asym_id</code>, residue number is <code>label_seq_id</code>).


===Options===
===Options===
Line 64: Line 61:
===Input files===
===Input files===


* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>TEMPLATEPDB</span>.pdb</code>  – structure of the template protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>TEMPLATEPDB</span>.cif</code>  – structure of the template protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>TEMPLATEPDB</span>-template.sses.json</code>  – annotation of the template domain
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>TEMPLATEPDB</span>-template.sses.json</code>  – annotation of the template domain
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>.pdb</code>  – structure of the query protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>.cif</code>  – structure of the query protein
 
For SecStrAnnotator 2.0, all structural files should be in mmCIF format instead of PDB format.


===Output files===
===Output files===


* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-aligned.pdb</code> – structure of the query protein after superimposition on the template protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-aligned.cif</code> – structure of the query protein after superimposition on the template protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-detected.sses.json</code> – secondary structure assignment of the query protein, i.e. all detected SSEs
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-detected.sses.json</code> – secondary structure assignment of the query protein, i.e. all detected SSEs
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-annotated.sses.json</code> – annotated SSEs in the query protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-annotated.sses.json</code> – annotated SSEs in the query protein
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-annotated.pse</code> – PyMOL session with the visualization of the resulting annotation (only when executed with <code>--session</code> option)
* <code><span style=color:green>DIRECTORY</span>/<span style=color:green>QUERYPDB</span>-annotated.pse</code> – PyMOL session with the visualization of the resulting annotation (only when executed with <code>--session</code> option)
* Additional output files and more detailed .sses.json files are produced when executed with <code>--verbose</code> option


===Auxiliary files and programs===
===Auxiliary files and programs===


SecStrAnnotator has dependencies on other programs (PyMOL, optionally DSSP) and scripts. These auxiliary files need to be available in the system, and their location must be specified in the configuration file <code>SecStrAnnotator_config.json</code>. The configuration file itself must be in the same directory as <code>SecStrAnnotator.exe</code>. The default content of the configuration file is:
SecStrAnnotator has dependencies on other programs (PyMOL, optionally DSSP) and scripts. These auxiliary files need to be available in the system, and their location must be specified in the configuration file <code>SecStrAnnotator_config.json</code>. The configuration file itself must be in the same directory as <code>SecStrAnnotator.dll</code>. The default content of the configuration file is:
<source lang="json>
<source lang="json>
{
{
"PymolExecutable":    "pymol",
"PymolExecutable":    "pymol",
"DsspExecutable":     "./dssp",
"PymolScriptAlign":   "./scripts/script_align.py",
"PymolScriptAlign":   "./script_align.py",
"PymolScriptSession": "./scripts/script_session.py",
"PymolScriptSession": "./script_session.py"
"DsspExecutable":     "mkdssp"
}     
}     
</source>
</source>


which assumes that <code>pymol</code> is installed and can be run directly (from <code>$PATH</code>) and that the other files are present in the same directory as <code>SecStrAnnotator.exe</code>.  
which assumes that <code>pymol</code> is installed and can be run directly (from <code>$PATH</code>) and that the other files are present in subdirectory <code>scripts</code>.
 
On Windows, you must find the location of PyMOL executable and insert it into the configuration file.
 
* It will be probably some impossible-to-find file like <code>C:\Program Files\PyMOL\PyMOL\PyMOL.exe</code> or <code>C:\ProgramData\PyMOL\Scripts\pymol.exe</code> (or <code>C:/Users/Adam/miniconda3/Scripts/pymol.exe</code> if you installed PyMOL via <code>conda</code>).
* To make it more fun, <code>ProgramData</code> is a hidden directory, which is by default invisible in File Explorer.
* We're looking for PyMOL.exe, not PyMOLWin.exe. PyMOLWin.exe will run asynchronously and it will probably not work.
* Note that <code>\</code> is a special escaping character in JSON. Therefore use <code>/</code> or <code>\\</code> instead of <code>\</code>.


On Windows, the location of PyMOL executable must be manually inserted into the modification file (it is usually <code>C:\Program Files\PyMOL\PyMOL\PyMOL.exe</code>, but can be different; it should be PyMOL.exe, not PymolWin.exe):
<source lang="json>
<source lang="json>
{
{
"PymolExecutable":    "C:\\Program Files\\PyMOL\\PyMOL\\PyMOL.exe",
"PymolExecutable":    "C:/ProgramData/PyMOL/Scripts/pymol.exe",
"DsspExecutable":     ".\\dssp",
"PymolScriptAlign":   "./scripts/script_align.py",
"PymolScriptAlign":   ".\\script_align.py",
"PymolScriptSession": "./scripts/script_session.py",
"PymolScriptSession": ".\\script_session.py"
"DsspExecutable":     "./dssp.exe"
}     
}     
</source>
</source>
Note that <code>\\</code> must be used instead of <code>\</code> in the configuration file.


===Annotation file format===
===Annotation file format===
Line 107: Line 108:
<source lang="json>
<source lang="json>
{
{
   "1og2": {
   "1tqn": {
     "comment": "This is a demonstration of the annotation format.",
     "comment": "This is a demonstration of the annotation format. It shows a few SSEs selected from the real annotation of 1tqn.",
     "secondary_structure_elements": [
     "secondary_structure_elements": [
       { "label":   "A", "chain_id": "A", "start": 50, "end": 61, "type": "H" },
       { "label": "A", "chain_id": "A", "start": 36, "end": 47, "type": "H", "sequence": "FCMFDMECHKKY" },
       { "label": "1.1", "chain_id": "A", "start": 65, "end": 69, "type": "E" },
       { "label": "B", "chain_id": "A", "start": 66, "end": 76, "type": "h", "sequence": "PDMIKTVLVKE" },
       { "label": "1.2", "chain_id": "A", "start": 72, "end": 77, "type": "E" },
       { "label": "1-1", "chain_id": "A", "start": 50, "end": 55, "type": "E", "sheet_id": 1, "sequence": "VWGFYD" },
       { "label":   "B", "chain_id": "A", "start": 80, "end": 90, "type": "H" },
       { "label": "1-2", "chain_id": "A", "start": 58, "end": 63, "type": "E", "sheet_id": 1, "sequence": "QPVLAI" },
       { "label": "1.3", "chain_id": "A", "start": 386, "end": 389, "type": "E" }
       { "label": "1-3", "chain_id": "A", "start": 372, "end": 375, "type": "E", "sheet_id": 1, "sequence": "VVMI" }
     ],
     ],
     "beta_connectivity": [
     "beta_connectivity": [
       [ "1.1", "1.2", -1 ],
       [ "1-1", "1-2", -1 ],
       [ "1.2", "1.3", 1 ]
       [ "1-2", "1-3", 1 ]
     ]
     ]
   }
   }
Line 124: Line 125:
</source>
</source>


The example describes two helices, named A and B, and a β-sheet consisting of three strands, named 1.1, 1.2, and 1.3. Strands 1.1 and 1.2 are connected by an anti-parallel β-ladder, strands 1.2 and 1.3 by a parallel β-ladder. All the SSEs are located on chain A of structure 1og2.
The example describes two helices, named A and B, and a β-sheet consisting of three strands, named 1-1, 1-2, and 1-3. Strands 1-1 and 1-2 are connected by an anti-parallel β-ladder, strands 1-2 and 1-3 by a parallel β-ladder. All the SSEs are located on chain A of structure 1tqn.





Latest revision as of 13:59, 17 May 2022

This page is about SecStrAnnotator 2.3 (the page for SecStrAnnotator 1.0 is here).


SecStrAnnotator annotates a query protein Q based on a template protein T. Thus, the input consists of the structure of T, the structure of Q, and the annotation of T.

Sometimes a single protein consists of several domains. In such cases, T and Q do not refer to the whole protein but only to one domain.

The annotation algorithm consists of three major steps. The first step is structural alignment and superimposition of the query protein onto the template protein, so that the corresponding parts of the two proteins are located close to each other. In the second step, secondary structure assignment (SSA) is performed: secondary structure elements (SSEs) are detected in the query protein Q. The third step is called matching: the algorithm matches the template SSEs to the query SSEs and, for each annotated SSE in T, selects the corresponding SSE in Q.

Dependencies

.NET 6.0 runtime

Download and install the .NET 6.0 runtime from the official .NET website (https://dotnet.microsoft.com/). For SecStrAnnotator 2.0–2.2, use .NET Core 3.0 or .NET Core 3.1 instead.

PyMOL

PyMOL is used by SecStrAnnotator for structural alignment and visualization. It can be downloaded from the PyMOL website (https://pymol.org/2/). On Ubuntu Linux it can also be installed by running sudo apt install pymol.

(DSSP)

Only needed if you use the DSSP secondary structure assignment method (--ssa dssp). On Ubuntu Linux it can be installed by running sudo apt install dssp.

Execution

SecStrAnnotator is executed from the command line:

dotnet SecStrAnnotator.dll [OPTIONS] DIRECTORY TEMPLATE QUERY

Example calls:

 dotnet SecStrAnnotator.dll --help                                                                          # Show help

 dotnet SecStrAnnotator.dll --onlyssa my_data_directory 1tqn,A,7:478                                        # Only detect SSEs

 dotnet SecStrAnnotator.dll my_data_directory 2nnj,A 1tqn,A,7:478                                           # Detect and annotate SSEs

 dotnet SecStrAnnotator.dll my_data_directory 2nnj,A 1tqn,A,7:478 --align cealign --matching mom --session  # Detect and annotate SSEs with explicitly chosen alignment and matching methods, and save a PyMOL session
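
When many structures are processed, the calls above can be assembled by a small wrapper script. The following is only a sketch: build_command and run are hypothetical helper names (not part of SecStrAnnotator), the options are placed before DIRECTORY as in the synopsis, and it is an assumption that a failed run is signalled by a non-zero exit code.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(directory: str, template: Optional[str], query: str,
                  options: Sequence[str] = (),
                  dll: str = "SecStrAnnotator.dll") -> List[str]:
    """Assemble the argument list: dotnet DLL [OPTIONS] DIRECTORY [TEMPLATE] QUERY.

    Pass template=None for calls such as --onlyssa, which take only a query.
    """
    cmd = ["dotnet", dll, *options, directory]
    if template is not None:
        cmd.append(template)
    cmd.append(query)
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code
    (assumption: SecStrAnnotator reports failure this way)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run(...) to actually execute it.
    print(build_command("my_data_directory", "2nnj,A", "1tqn,A,7:478",
                        options=["--session"]))
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.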


Arguments

  • DIRECTORY is the directory containing all the input files. The output files will also be saved to this directory.
  • TEMPLATE describes the template protein domain in one of the following formats: PDB or PDB,CHAIN or PDB,CHAIN,RANGES. The whole argument must be written without spaces. Examples:
    • 1og2 (structure 1og2, chain A by default)
    • 1og2,B (chain B)
    • 1og2,B,100:400 (residues 100–400 of the chain B)
    • 1og2,B,:400 (residues up to 400 of the chain B)
    • 1h9r,A,123:183,252:261 (residues 123–183 and 252–261 of the chain A)
  • QUERY describes the query protein domain and uses the same format as TEMPLATE.

Keep in mind that the chains and residues are numbered according to the label_* numbering scheme of the mmCIF file format (i.e. the chain identifier is label_asym_id and the residue number is label_seq_id), not the author-assigned auth_* scheme.
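
The TEMPLATE and QUERY format described above can be illustrated by a short parser. This is a minimal sketch, not the parser used by SecStrAnnotator: Domain and parse_domain are hypothetical names, and accepting an open-ended range such as 100: is an assumption (the examples above only show the form :400).

```python
from typing import List, NamedTuple, Optional, Tuple


class Domain(NamedTuple):
    pdb: str
    chain: str                                         # label_asym_id
    ranges: List[Tuple[Optional[int], Optional[int]]]  # label_seq_id; None = open end


def parse_domain(spec: str) -> Domain:
    """Parse PDB, PDB,CHAIN or PDB,CHAIN,RANGES (written without spaces)."""
    if " " in spec:
        raise ValueError("the argument must be written without spaces")
    parts = spec.split(",")
    pdb = parts[0]
    chain = parts[1] if len(parts) > 1 else "A"        # chain A by default
    ranges = []
    for item in parts[2:]:
        start, sep, end = item.partition(":")
        if not sep:
            raise ValueError(f"range {item!r} must have the form START:END")
        ranges.append((int(start) if start else None,
                       int(end) if end else None))
    return Domain(pdb, chain, ranges)


if __name__ == "__main__":
    print(parse_domain("1h9r,A,123:183,252:261"))
```

An empty list of ranges stands for the whole chain.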

Options

There is a range of options which can be used to modify the behaviour of SecStrAnnotator. The most important option is:

  • --help Prints the help message, which includes the description of all the other options.

Input files

  • DIRECTORY/TEMPLATEPDB.cif – structure of the template protein
  • DIRECTORY/TEMPLATEPDB-template.sses.json – annotation of the template domain
  • DIRECTORY/QUERYPDB.cif – structure of the query protein

Output files

  • DIRECTORY/QUERYPDB-aligned.cif – structure of the query protein after superimposition on the template protein
  • DIRECTORY/QUERYPDB-detected.sses.json – secondary structure assignment of the query protein, i.e. all detected SSEs
  • DIRECTORY/QUERYPDB-annotated.sses.json – annotated SSEs in the query protein
  • DIRECTORY/QUERYPDB-annotated.pse – PyMOL session with the visualization of the resulting annotation (only when executed with the --session option)
  • Additional output files and more detailed .sses.json files are produced when executed with the --verbose option
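
The file naming conventions listed under Input files and Output files can be checked before and after a run with a few lines of Python. The helper names below (input_files, output_files, missing_inputs) are hypothetical and only mirror the file names given above; the extra files produced by --verbose are not covered.

```python
from pathlib import Path
from typing import List


def input_files(directory: str, template_pdb: str, query_pdb: str) -> List[Path]:
    """Files that must exist in DIRECTORY before annotation."""
    d = Path(directory)
    return [
        d / f"{template_pdb}.cif",                 # structure of the template
        d / f"{template_pdb}-template.sses.json",  # annotation of the template
        d / f"{query_pdb}.cif",                    # structure of the query
    ]


def output_files(directory: str, query_pdb: str, session: bool = False) -> List[Path]:
    """Files expected in DIRECTORY after annotation."""
    d = Path(directory)
    files = [
        d / f"{query_pdb}-aligned.cif",
        d / f"{query_pdb}-detected.sses.json",
        d / f"{query_pdb}-annotated.sses.json",
    ]
    if session:  # written only when --session is used
        files.append(d / f"{query_pdb}-annotated.pse")
    return files


def missing_inputs(directory: str, template_pdb: str, query_pdb: str) -> List[Path]:
    """Return the required input files that are not present."""
    return [p for p in input_files(directory, template_pdb, query_pdb)
            if not p.is_file()]


if __name__ == "__main__":
    for path in missing_inputs("my_data_directory", "2nnj", "1tqn"):
        print("missing:", path)
```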

Auxiliary files and programs

SecStrAnnotator depends on other programs (PyMOL, optionally DSSP) and scripts. These auxiliary files need to be available in the system, and their location must be specified in the configuration file SecStrAnnotator_config.json. The configuration file itself must be in the same directory as SecStrAnnotator.dll. The default content of the configuration file is:

{
	"PymolExecutable":    "pymol",
	"PymolScriptAlign":   "./scripts/script_align.py",
	"PymolScriptSession": "./scripts/script_session.py",
	"DsspExecutable":     "mkdssp"
}

which assumes that pymol is installed and can be run directly (from $PATH) and that the script files are present in the subdirectory scripts.

On Windows, you must find the location of the PyMOL executable and insert it into the configuration file.

  • It will probably be some impossible-to-find file like C:\Program Files\PyMOL\PyMOL\PyMOL.exe or C:\ProgramData\PyMOL\Scripts\pymol.exe (or C:/Users/Adam/miniconda3/Scripts/pymol.exe if you installed PyMOL via conda).
  • To make it more fun, ProgramData is a hidden directory, which is invisible in File Explorer by default.
  • We're looking for PyMOL.exe, not PyMOLWin.exe. PyMOLWin.exe runs asynchronously and will probably not work.
  • Note that \ is the escape character in JSON. Therefore use / or \\ instead of \.

An example configuration on Windows:

{
	"PymolExecutable":    "C:/ProgramData/PyMOL/Scripts/pymol.exe",
	"PymolScriptAlign":   "./scripts/script_align.py",
	"PymolScriptSession": "./scripts/script_session.py",
	"DsspExecutable":     "./dssp.exe"
}
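
One way to avoid escaping mistakes is to generate the configuration file instead of editing it by hand. The sketch below uses the standard json module, which doubles the backslashes automatically; make_config is a hypothetical helper, and the defaults are copied from the default configuration shown above.

```python
import json

# Default values as listed in the default configuration file.
DEFAULTS = {
    "PymolExecutable": "pymol",
    "PymolScriptAlign": "./scripts/script_align.py",
    "PymolScriptSession": "./scripts/script_session.py",
    "DsspExecutable": "mkdssp",
}


def make_config(**overrides: str) -> str:
    """Return the text of SecStrAnnotator_config.json with some values replaced."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return json.dumps({**DEFAULTS, **overrides}, indent=4)


if __name__ == "__main__":
    # A raw string keeps the Windows path readable; json.dumps writes \\ for each \.
    text = make_config(PymolExecutable=r"C:\ProgramData\PyMOL\Scripts\pymol.exe")
    print(text)  # save this text as SecStrAnnotator_config.json next to SecStrAnnotator.dll
```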

Annotation file format

All files with extension .sses.json are in SecStrAnnotator annotation format. A short example of this format:

{
  "1tqn": {
    "comment": "This is a demonstration of the annotation format. It shows a few SSEs selected from the real annotation of 1tqn.",
    "secondary_structure_elements": [
      { "label": "A", "chain_id": "A", "start": 36, "end": 47, "type": "H", "sequence": "FCMFDMECHKKY" },
      { "label": "B", "chain_id": "A", "start": 66, "end": 76, "type": "h", "sequence": "PDMIKTVLVKE" },
      { "label": "1-1", "chain_id": "A", "start": 50, "end": 55, "type": "E", "sheet_id": 1, "sequence": "VWGFYD" },
      { "label": "1-2", "chain_id": "A", "start": 58, "end": 63, "type": "E", "sheet_id": 1, "sequence": "QPVLAI" },
      { "label": "1-3", "chain_id": "A", "start": 372, "end": 375, "type": "E", "sheet_id": 1, "sequence": "VVMI" }
    ],
    "beta_connectivity": [
      [ "1-1", "1-2", -1 ],
      [ "1-2", "1-3", 1 ]
    ]
  }
}

The example describes two helices, named A and B, and a β-sheet consisting of three strands, named 1-1, 1-2, and 1-3. Strands 1-1 and 1-2 are connected by an anti-parallel β-ladder, strands 1-2 and 1-3 by a parallel β-ladder. All the SSEs are located on chain A of structure 1tqn.
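
Because the annotation format is plain JSON, it can be read with any JSON library. The following sketch (summarize and LADDER are hypothetical names, and the embedded data is a shortened copy of the example above) lists the SSEs with their lengths and interprets the third item of each beta_connectivity entry as described in the preceding paragraph: -1 for an anti-parallel ladder, 1 for a parallel one.

```python
import json
from typing import List

# Orientation codes as used in the example: 1 = parallel, -1 = anti-parallel.
LADDER = {1: "parallel", -1: "antiparallel"}

# Shortened copy of the example annotation; a real file would be read with
# json.load(open("DIRECTORY/QUERYPDB-annotated.sses.json")).
EXAMPLE = json.loads("""
{
  "1tqn": {
    "secondary_structure_elements": [
      { "label": "1-1", "chain_id": "A", "start": 50, "end": 55, "type": "E" },
      { "label": "1-2", "chain_id": "A", "start": 58, "end": 63, "type": "E" }
    ],
    "beta_connectivity": [
      [ "1-1", "1-2", -1 ]
    ]
  }
}
""")


def summarize(annotation: dict) -> List[str]:
    """Return one line per SSE and one line per beta-ladder."""
    lines = []
    for pdb, entry in annotation.items():
        for sse in entry["secondary_structure_elements"]:
            length = sse["end"] - sse["start"] + 1  # both ends are inclusive
            lines.append(f'{pdb} {sse["chain_id"]} {sse["label"]}: type {sse["type"]}, '
                         f'residues {sse["start"]}-{sse["end"]} ({length} residues)')
        for first, second, orientation in entry.get("beta_connectivity", []):
            lines.append(f"{pdb} {first}/{second}: {LADDER[orientation]} ladder")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(EXAMPLE)))
```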



