SecStrAnnotator:OneToMany

From WebChemistry Wiki
Midlik (talk | contribs)

Revision as of 13:56, 9 April 2020

This page describes the procedure for annotating secondary structure elements (SSEs) in a whole protein family using the SecStrAnnotator Suite (SecStrAnnotator plus supplementary scripts).

A protein family is understood as a set of structurally similar protein domains. A protein domain can be either a whole protein chain or only a part of it (in multidomain proteins).

Dependencies

Python3

Most steps in the procedure are carried out by scripts that run under the Python3 interpreter (pre-installed on some Linux distributions).

Procedure

Preparing structural data

A list of PDB structures corresponding to a protein family can be obtained from the PDBe REST API using domains_from_pdbeapi.py. The protein family can be identified by a CATH code, such as 1.10.630.10, or by a Pfam accession, such as PF00067:

python3 domains_from_pdbeapi.py 1.10.630.10 > family_from_cath.json

python3 domains_from_pdbeapi.py PF00067 > family_from_pfam.json
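The two kinds of family identifier have different shapes, so a wrapper script can tell them apart before calling domains_from_pdbeapi.py. The following is a minimal sketch and is not part of SecStrAnnotator Suite: the function name classify_family_id is hypothetical, and the patterns assume a four-level CATH code and a Pfam accession written as PF followed by five digits.

```python
import re

# Assumed identifier shapes (not taken from SecStrAnnotator Suite):
# a four-level CATH code such as 1.10.630.10, a Pfam accession such as PF00067.
CATH_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")
PFAM_PATTERN = re.compile(r"PF\d{5}")


def classify_family_id(family_id: str) -> str:
    """Return 'cath' or 'pfam' for a family identifier, or raise ValueError."""
    if CATH_PATTERN.fullmatch(family_id):
        return "cath"
    if PFAM_PATTERN.fullmatch(family_id):
        return "pfam"
    raise ValueError(f"Not a CATH code or Pfam accession: {family_id!r}")


if __name__ == "__main__":
    for example in ("1.10.630.10", "PF00067"):
        print(example, classify_family_id(example))
```

Rejecting anything else early avoids sending a mistyped identifier, such as a PDB ID, to the API.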

The structures are then downloaded from PDBe by download_from_pdbe.py:

python3 download_from_pdbe.py family_from_cath.json my_structure_directory

At this point, all necessary structures should be in the directory my_structure_directory.
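Whether the download really fetched everything can be checked with a short script. The sketch below is illustrative only: the helper missing_structures is hypothetical, the list of PDB IDs has to be supplied by the caller (the layout of the family JSON file is not described on this page), and it assumes that each structure file is named after its PDB ID, for example 2nnj.cif.

```python
from pathlib import Path


def missing_structures(directory, pdb_ids):
    """Return the PDB IDs that have no matching file in `directory`.

    Assumption: a structure file is named <pdb_id>.<extension>, so the part
    of the file name before the first dot is compared with each PDB ID.
    """
    present = {
        path.name.split(".")[0].lower()
        for path in Path(directory).iterdir()
        if path.is_file()
    }
    return [pdb_id for pdb_id in pdb_ids if pdb_id.lower() not in present]


if __name__ == "__main__":
    # Example call; the IDs listed here are placeholders chosen by the caller.
    print(missing_structures("my_structure_directory", ["2nnj", "1og2"]))
```

An empty list means that every listed entry has a file in the directory.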

Selection of a template domain and obtaining its annotation

In this step, one of the domains from the protein family is selected as the template domain. Suppose that we have selected the domain located on chain A of PDB entry 2nnj.

If an SSE annotation of the template domain is available in the literature, it can be converted into the required format and used as the template annotation.

If no annotation is available, it must be created from scratch. The easiest way to create an annotation file is to perform secondary structure assignment (SSA) with SecStrAnnotator:

dotnet SecStrAnnotator.dll --onlyssa my_structure_directory 2nnj,A

This creates the file 2nnj-detected.sses.json in my_structure_directory, which should be renamed to 2nnj-template.sses.json.
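The renaming can also be scripted. The following is a minimal sketch using only the file names given above; the helper name promote_to_template is hypothetical and not part of SecStrAnnotator Suite.

```python
from pathlib import Path


def promote_to_template(directory, pdb_id):
    """Rename <pdb_id>-detected.sses.json to <pdb_id>-template.sses.json."""
    detected = Path(directory) / f"{pdb_id}-detected.sses.json"
    template = Path(directory) / f"{pdb_id}-template.sses.json"
    if not detected.is_file():
        raise FileNotFoundError(detected)
    if template.exists():
        # Refuse to overwrite a template that may already have been refined by hand.
        raise FileExistsError(template)
    detected.rename(template)
    return template


if __name__ == "__main__":
    print(promote_to_template("my_structure_directory", "2nnj"))
```

The check for an existing template is a design choice of this sketch: it protects a manually refined annotation from being replaced when the SSA step is rerun.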

Additional refinement of the template annotation is recommended – this includes removing unnecessary SSEs, adding SSEs which should be annotated, and possibly renaming the SSEs according to a transparent scheme (e.g. helices A, B, C, D... instead of H0, H1, H4, H6...).
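The renaming part of the refinement can be sketched as a small mapping step. This is only an illustration: the function names are hypothetical, and the code assumes that each SSE is available as a dictionary with a 'label' key, which should be checked against the annotation file format before use.

```python
import string


def letter_labels(labels):
    """Map labels to A, B, C, ... in the order given (at most 26 labels)."""
    if len(labels) > len(string.ascii_uppercase):
        raise ValueError("More labels than letters; choose another scheme.")
    if len(set(labels)) != len(labels):
        raise ValueError("Labels must be unique.")
    return dict(zip(labels, string.ascii_uppercase))


def rename_sses(sses, mapping):
    """Return copies of the SSE records with 'label' replaced via `mapping`.

    Assumption: each SSE is a dict with a 'label' key. Labels that are not
    in the mapping are left unchanged.
    """
    return [{**sse, "label": mapping.get(sse["label"], sse["label"])} for sse in sses]


if __name__ == "__main__":
    mapping = letter_labels(["H0", "H1", "H4", "H6"])
    print(mapping)
```

Removing and adding SSEs still requires inspecting the structure; only the relabelling is mechanical.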

Running the annotation algorithm on each member of the family

python3 SecStrAnnotator_batch.py my_structure_directory 2nnj,A family_from_cath.json
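Before starting a long batch run, it can help to confirm that the template annotation is in place. The sketch below builds the command shown above only after that check; the helper name batch_command is hypothetical, and the sketch assumes that the batch script expects the template file in the structure directory under the name described earlier (for example 2nnj-template.sses.json).

```python
from pathlib import Path


def batch_command(directory, template_domain, family_file):
    """Build the argv list for SecStrAnnotator_batch.py after checking the template.

    template_domain is written as on the command line, e.g. '2nnj,A'.
    Assumption: the template annotation is <pdb_id>-template.sses.json
    inside `directory`.
    """
    pdb_id = template_domain.split(",")[0]
    template_file = Path(directory) / f"{pdb_id}-template.sses.json"
    if not template_file.is_file():
        raise FileNotFoundError(f"Template annotation not found: {template_file}")
    return [
        "python3",
        "SecStrAnnotator_batch.py",
        str(directory),
        template_domain,
        str(family_file),
    ]
```

The returned list can be passed to subprocess.run(..., check=True) from the Python standard library, which runs the command and raises an error if it exits with a non-zero status.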
