SecStrAnnotator:OneToMany
Revision as of 01:23, 1 May 2018
This page describes the procedure for annotating SSEs in a whole protein family.
A protein family is understood as a set of structurally similar protein domains. A protein domain can be either a whole protein chain or only a part of it (in multidomain proteins).
Dependencies
Python3
Procedure
Preparing structural data
A list of PDB structures corresponding to a protein family can be obtained from the PDBe REST API using domains_from_pdbeapi.py. The protein family can be identified by a CATH code, such as 1.10.630.10 (CATH), or by a Pfam accession, such as PF00067 (Pfam):
python3 domains_from_pdbeapi.py 1.10.630.10 > family_from_cath.json
or
python3 domains_from_pdbeapi.py PF00067 > family_from_pfam.json
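The two kinds of identifiers have distinct shapes, so a wrapper script can tell them apart before choosing an output file name. The following is a minimal sketch and is not part of domains_from_pdbeapi.py, whose own argument handling is not described here. The helper names `family_id_kind` and `output_name` are hypothetical, and the sketch only accepts full four-level CATH codes like the example above.

```python
import re

# A full CATH code: Class.Architecture.Topology.Homologous-superfamily.
CATH_CODE = re.compile(r"\d+\.\d+\.\d+\.\d+")
# A Pfam family accession: "PF" followed by five digits.
PFAM_ACCESSION = re.compile(r"PF\d{5}")


def family_id_kind(family_id: str) -> str:
    """Return "cath" or "pfam" for a family identifier (hypothetical helper)."""
    family_id = family_id.strip()
    if CATH_CODE.fullmatch(family_id):
        return "cath"
    if PFAM_ACCESSION.fullmatch(family_id):
        return "pfam"
    raise ValueError(f"not a CATH code or Pfam accession: {family_id!r}")


def output_name(family_id: str) -> str:
    """Build an output file name following the examples above."""
    return f"family_from_{family_id_kind(family_id)}.json"
```

For example, `output_name("1.10.630.10")` gives `family_from_cath.json`, and an identifier such as `1og2` raises `ValueError` instead of being passed on silently.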
The structures are then downloaded from PDBe by download_from_pdbe.py:
python3 download_from_pdbe.py family_from_cath.json my_structure_directory
At this point, all the necessary structures should be in the directory my_structure_directory.
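Before moving on, it can help to confirm that the download actually produced files. The sketch below is hypothetical (the function name `list_structure_files` is illustrative). Because the file naming used by download_from_pdbe.py is not documented here, it only checks that the directory exists and contains at least one file; it does not verify that every family member is present.

```python
from pathlib import Path
from typing import List


def list_structure_files(structure_dir: str) -> List[str]:
    """Return the sorted names of regular files in structure_dir.

    Raises FileNotFoundError if the directory does not exist and
    RuntimeError if it contains no files (e.g. the download failed).
    """
    directory = Path(structure_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"structure directory not found: {directory}")
    # Subdirectories are ignored; only regular files are counted.
    files = sorted(p.name for p in directory.iterdir() if p.is_file())
    if not files:
        raise RuntimeError(f"no files in {directory}; did the download succeed?")
    return files
```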
Selecting a template domain and obtaining its annotation
In this step, one of the domains from the protein family should be selected as the template domain. Suppose that we have selected the domain located on chain A of PDB entry 1og2.
If an SSE annotation of the template domain is available in the literature, it can be converted into the required format (described under Annotation file format on the SecStrAnnotator:OneToOne page) and used as the template annotation.
If no annotation is available, it must be created from scratch. The easiest way to create an annotation file is to perform secondary structure assignment (SSA) with SecStrAnnotator:
mono SecStrAnnotator.exe --onlyssa my_structure_directory 1og2,A
This will create the file 1og2-detected.sses.json in my_structure_directory, which should be renamed to 1og2-template.sses.json.
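The rename can also be scripted. This is a minimal sketch with a hypothetical helper `promote_to_template`; it assumes that the `<pdb_id>-detected.sses.json` naming seen for 1og2 holds for other entries, and it refuses to overwrite an existing template file, which might already contain manual refinements.

```python
from pathlib import Path


def promote_to_template(structure_dir: str, pdb_id: str) -> Path:
    """Rename <pdb_id>-detected.sses.json to <pdb_id>-template.sses.json.

    Raises FileNotFoundError if the detected file is missing and
    FileExistsError if a template file already exists.
    """
    directory = Path(structure_dir)
    detected = directory / f"{pdb_id}-detected.sses.json"
    template = directory / f"{pdb_id}-template.sses.json"
    # Path.rename silently overwrites on POSIX, so check explicitly.
    if template.exists():
        raise FileExistsError(f"template already exists: {template}")
    detected.rename(template)
    return template
```

For the example above, `promote_to_template("my_structure_directory", "1og2")` performs the rename described in this step.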
Additional refinement of the template annotation is recommended.
Running the annotation algorithm on each member of the family
python3 SecStrAnnotator_batch.py my_structure_directory 1og2,A family_from_cath.json
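The commands on this page can be chained from Python. The sketch below builds the argument lists exactly as written above and runs them with subprocess, without a shell. The names `Step`, `build_steps` and `run_steps` are hypothetical, and the sketch assumes that the helper scripts and SecStrAnnotator.exe are in the working directory with `python3` and `mono` on PATH. It deliberately leaves out the rename and refinement of the template annotation, which happen between the third and fourth steps.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One command of the workflow (hypothetical helper type)."""
    argv: List[str]             # program and arguments, passed without a shell
    stdout_path: Optional[str]  # file that receives stdout, or None


def build_steps(family_id: str, structure_dir: str, pdb_id: str, chain: str,
                family_json: str) -> List[Step]:
    """Return the commands from this page, in order, for one template domain."""
    domain = f"{pdb_id},{chain}"  # e.g. "1og2,A"
    return [
        # 1. List the family's domains; stdout goes to family_json (like "> file").
        Step(["python3", "domains_from_pdbeapi.py", family_id], family_json),
        # 2. Download the listed structures into structure_dir.
        Step(["python3", "download_from_pdbe.py", family_json, structure_dir], None),
        # 3. Secondary structure assignment on the template domain only.
        Step(["mono", "SecStrAnnotator.exe", "--onlyssa", structure_dir, domain], None),
        # 4. Run the annotation on each member of the family, using the template domain.
        Step(["python3", "SecStrAnnotator_batch.py", structure_dir, domain, family_json], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order; raise CalledProcessError on the first failure."""
    for step in steps:
        if step.stdout_path is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout_path, "w") as out:
                subprocess.run(step.argv, stdout=out, check=True)


if __name__ == "__main__":
    steps = build_steps("1.10.630.10", "my_structure_directory", "1og2", "A",
                        "family_from_cath.json")
    for step in steps:  # dry run: print the commands instead of executing them
        redirect = f" > {step.stdout_path}" if step.stdout_path else ""
        print(" ".join(step.argv) + redirect)
```

In practice one would call `run_steps(steps[:3])`, rename and refine the template annotation, and then call `run_steps(steps[3:])`. Passing argument lists instead of a shell string avoids quoting problems with directory names.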