Escherichia coli is a Gram-negative,
rod-shaped bacterium that is commonly found
in the lower intestine of warm-blooded organisms. Although most E. coli strains
are harmless, some serotypes can cause serious food poisoning in humans, and
are occasionally responsible for product recalls due to food contamination.
E. coli cells can survive outside the body for a limited amount
of time, which makes them ideal indicator organisms for testing environmental
samples for fecal contamination.
This page contains protein structure and function modeling data for the
Escherichia coli genome, generated using state-of-the-art computational
methods.
-
Ab initio protein structure prediction dataset
contains 495 E. coli proteins with lengths ranging from 32 to 567 residues.
These proteins have no homologous templates that can be detected by
LOMETS from the PDB library, and are judged as Hard distant-homology
protein targets.
The structural models were generated by
QUARK-based structure assembly simulations.
Based on the confidence score estimation, 72 proteins are predicted to
have a correct fold (TM-score > 0.5) and 321 have a substantial portion of
structure correctly modeled (TM-score > 0.35).
Please refer to the following papers for details of the data:
- Dong Xu, Yang Zhang.
Ab Initio Structure Prediction for Escherichia coli: Towards Genome-wide Protein Structure Modeling and Fold Assignment.
Scientific Reports. 3: 1895 (2013)
(download the PDF file)
- Dong Xu, Yang Zhang.
Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.
Proteins, 80: 1715-1735 (2012) (download the PDF file).
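The two confidence cutoffs quoted for the ab initio dataset (TM-score > 0.5 and TM-score > 0.35) can be applied to a table of predicted TM-scores with a few lines of Python. This is only a minimal sketch: the function name, the category labels, and the example scores are hypothetical and are not taken from the dataset or from QUARK.

```python
def fold_category(predicted_tm_score: float) -> str:
    """Bin a predicted TM-score using the two cutoffs quoted in the text."""
    if predicted_tm_score > 0.5:
        return "correct fold"
    if predicted_tm_score > 0.35:
        return "substantial portion correct"
    return "low confidence"


# Hypothetical predicted TM-scores, NOT values from the dataset.
example_scores = {"protein_a": 0.62, "protein_b": 0.41, "protein_c": 0.22}
categories = {name: fold_category(score) for name, score in example_scores.items()}

for name, category in categories.items():
    print(name, category)
```

Any model above 0.5 also passes the 0.35 cutoff; the function simply reports the strictest category a score reaches.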
-
Template based protein structure prediction dataset
contains 3,784 E. coli proteins with lengths ranging from 14 to 2,358 residues.
These proteins generally have homologous templates that can be detected by
LOMETS from the PDB library, and are judged as Easy or Medium homology
protein targets.
To model the structure of these proteins, possible templates are first identified
from the PDB library by LOMETS,
which consists of nine state-of-the-art threading programs.
The continuous fragments are then excised from the templates and used
to re-assemble the full-length models by the
I-TASSER simulations.
Finally, FG-MD
simulations are used to refine the atomic models.
Please refer to the following papers for details of the data:
- Dong Xu, Yang Zhang.
Ab Initio Structure Prediction for Escherichia coli: Towards Genome-wide Protein Structure Modeling and Fold Assignment.
Scientific Reports. 3: 1895 (2013)
(download the PDF file)
-
Ambrish Roy, Alper Kucukural, Yang Zhang.
I-TASSER: a unified platform for automated protein structure and function prediction.
Nature Protocols, 5: 725-738 (2010). (download the PDF file).
-
Protein-protein interaction dataset
contains quaternary structure models for 35,125 protein-protein interactions
in the E. coli genome.
To generate these models, we used
Threpp
to first thread the monomer
sequences in the E. coli genome through the oligomer structures in the
PDB. The quaternary complex structures are then constructed by matching
the other E. coli sequences with the binding partners of the templates
in the oligomer entries of the PDB.
Finally, the best complex models are selected based on a composite score
that combines the threading Z-score, the interface structure match, and the
statistical contact potential.
Please refer to the following papers for details of the data:
- Weikang Gong, Aysam Guerler, Chengxin Zhang, Elisa Warner, Chunhua Li, Yang Zhang.
Integrating multimeric threading with high-throughput experiments for structural interactome of Escherichia coli.
Journal of Molecular Biology, 433: 166944 (2021).
[PDF]
[Support Information]
[Server]
-
Aysam Guerler, Brandon Govindarajoo and Yang Zhang.
Mapping monomeric threading to protein-protein structure prediction,
Journal of Chemical Information and Modeling
2013, 53: 717-725.
[PDF]
[Server]
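The model selection step for the protein-protein interaction dataset ranks candidate complexes by a composite of three terms: the threading Z-score, the interface structure match, and the statistical contact potential. The sketch below illustrates that idea only. The linear form, the unit weights, the sign convention (a lower contact potential is treated as better), and all names and numbers are assumptions made for illustration; the actual scoring function is described in the cited papers.

```python
from dataclasses import dataclass


@dataclass
class ComplexModel:
    name: str
    z_score: float            # threading Z-score (higher assumed better)
    interface_match: float    # interface structure match (higher assumed better)
    contact_potential: float  # statistical contact potential (lower assumed better)


def composite_score(model: ComplexModel,
                    w_z: float = 1.0,
                    w_interface: float = 1.0,
                    w_contact: float = 1.0) -> float:
    """Hypothetical linear combination of the three terms; weights are placeholders."""
    return (w_z * model.z_score
            + w_interface * model.interface_match
            - w_contact * model.contact_potential)


def best_model(models: list[ComplexModel]) -> ComplexModel:
    """Return the candidate with the highest composite score."""
    return max(models, key=composite_score)


# Hypothetical candidates, NOT values from the dataset.
candidates = [
    ComplexModel("model_a", z_score=5.0, interface_match=0.6, contact_potential=-2.0),
    ComplexModel("model_b", z_score=6.0, interface_match=0.3, contact_potential=1.0),
]
print(best_model(candidates).name)
```

Keeping the weights as parameters makes it easy to see how the ranking shifts when one term is emphasized over the others.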