EcoCyc Project Overview

EcoCyc is a bioinformatics database that describes the genome and the biochemical machinery of E. coli K-12 MG1655. The long-term goal of the project is to describe the molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists, and for biologists who work with related microorganisms.

This page provides an overview of the data content of EcoCyc. See also the list of data sources from which EcoCyc integrates data.

Genome. EcoCyc contains the complete genome sequence of E. coli, and describes the nucleotide position and function of every E. coli gene. A staff of five full-time curators updates the annotation of the E. coli genome on an ongoing basis using a literature-based curation strategy. Mini-review summaries of E. coli gene products can be found in EcoCyc protein and RNA pages. Users can retrieve the nucleotide sequence of a gene, and the amino-acid sequence of a gene product.

Regulation. EcoCyc describes several types of E. coli cellular regulation.

Membrane transporters. EcoCyc annotates E. coli transport proteins, and the associated transport reactions that they mediate.

Metabolism. EcoCyc describes all known metabolic pathways and signal-transduction pathways of E. coli. It describes each metabolic enzyme of E. coli, including its cofactors, activators, inhibitors, and subunit structure. See also the MetaCyc project.

Database links. EcoCyc is linked to other biological databases containing protein and nucleic acid sequence data, bibliographic data, protein structures, and descriptions of different E. coli strains.

Literature-Based Curation. Curation is the process of manually refining and updating a bioinformatics database. The EcoCyc project uses a literature-based curation approach in which database updates are based on evidence in the experimental literature. EcoCyc is largely up to date with respect to its curation activities. As of October 2013, EcoCyc had encoded information from 25,406 publications.

Curators collect gene, protein, pathway, and compound names and synonyms. They classify genes and gene products using the Gene Ontology and the MultiFun ontology, and they classify pathways within the Pathway Tools pathway ontology. They capture protein complex components and the stoichiometry of these subunits; the cellular localization of polypeptides and protein complexes; experimentally determined protein molecular weights; and enzyme activities together with any enzyme prosthetic groups, cofactors, activators, or inhibitors. Operon structure and gene regulation information are also encoded.

Curators also author textual summaries with extensive citations. Within the summaries for proteins, RNAs, pathways, and operons, curators record information that does not fit the highly structured database fields of EcoCyc. For example, curators use the free-text summary sections to describe phenotypes caused by mutation, depletion, or overproduction of each gene product; any known genetic interactions; protein domain architecture and structural studies; similarity to other proteins; and any functional complementation experiments that have been described. Summaries can also note cases in which published reports present contradictory results. In such cases, both viewpoints are presented with proper attribution. This approach ensures that no information is lost.

Query and visualization. Scientists can use the EcoCyc web site or the downloadable Pathway Tools software to visualize the layout of genes within the E. coli chromosome, or of an individual biochemical reaction, or of a complete biochemical pathway (with compound structures displayed). The navigation capabilities of the software allow a user to move from a display of an enzyme to a display of a reaction that the enzyme catalyzes, or to the gene that encodes the enzyme.

Analysis of omics data. EcoCyc provides four tools for analysis of omics datasets.

Underlying software. The Pathway Tools software that underlies EcoCyc is not specific to E. coli, but has been applied to manage genomic and biochemical data for hundreds of organisms.

"EcoCyc" is pronounced "eeko-sike": "Eco" as in "ecology" and "Cyc" as in "encyclopedia".

The Roles of EcoCyc in Microbial Genome Annotation

The EcoCyc database can impact two aspects of microbial genome annotation: annotation of gene function, and annotation of metabolic pathways.

Annotation of Gene Function

We suggest that microbial genome annotation pipelines include a BLAST search (or a search with another sequence-similarity tool) against all EcoCyc proteins with experimentally defined functions. As discussed in our article "Multidimensional annotation of the Escherichia coli K-12 genome", E. coli contains more proteins with experimentally determined functions than any other organism.

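When assigning functions to newly sequenced genes, strong similarity hits to these experimentally characterized proteins should be preferred over hits to other proteins. This minimizes the chance of annotation errors caused by transitive annotation, in which a function is copied from a protein whose own annotation was itself inferred by similarity.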
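The preference rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of EcoCyc or Pathway Tools: it assumes BLAST hits in the standard 12-column tabular format (`-outfmt 6`), and the identifiers (`geneA`, `ECO_P1`, `OTHER_X`), the function names, and the E-value and identity thresholds are all hypothetical choices that a real pipeline would need to tune.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tabular(text):
    """Parse tab-separated BLAST hits into dicts with numeric scores."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        for column in ("pident", "evalue", "bitscore"):
            hit[column] = float(hit[column])
        hits.append(hit)
    return hits


def pick_hits(hits, experimental_ids, max_evalue=1e-20, min_identity=40.0):
    """Choose one hit per query gene.

    A strong hit to a protein in experimental_ids always wins over any
    other hit; ties within each group are broken by bit score.
    The thresholds are assumptions made for this sketch.
    """
    best = {}
    for hit in hits:
        strong = hit["evalue"] <= max_evalue and hit["pident"] >= min_identity
        experimental = strong and hit["sseqid"] in experimental_ids
        key = (experimental, hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or key > best[query][0]:
            best[query] = (key, hit)
    return {query: hit for query, (_, hit) in best.items()}


# Hypothetical hits: geneA matches both an experimentally characterized
# protein (ECO_P1) and a higher-scoring uncharacterized one (OTHER_X).
ROWS = [
    ["geneA", "ECO_P1", "62.0", "300", "114", "0", "1", "300", "1", "300", "1e-80", "250"],
    ["geneA", "OTHER_X", "70.0", "300", "90", "0", "1", "300", "1", "300", "1e-90", "280"],
    ["geneB", "OTHER_Y", "55.0", "200", "90", "0", "1", "200", "1", "200", "1e-40", "150"],
]
EXAMPLE = "\n".join("\t".join(row) for row in ROWS)

chosen = pick_hits(parse_blast_tabular(EXAMPLE), experimental_ids={"ECO_P1"})
for query, hit in sorted(chosen.items()):
    print(query, "->", hit["sseqid"])
```

In this toy input, `geneA` is assigned to `ECO_P1` even though `OTHER_X` has the higher bit score, while `geneB` falls back to its only available hit. In practice the set of experimentally characterized protein identifiers would have to be exported from EcoCyc.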

Annotation of Metabolic Pathways

EcoCyc and its cousin MetaCyc can be used to predict the metabolic pathways of an organism from its sequenced genome using the PathoLogic component of the Pathway Tools software.

MetaCyc serves as the reference pathway database for pathway prediction by PathoLogic. As of March 2013, MetaCyc contained more than 2,000 experimentally elucidated metabolic pathways. PathoLogic predicts the presence of MetaCyc pathways in an annotated genome by matching enzymes in the annotated genome against enzymes within MetaCyc pathways.
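The general idea of matching a genome's enzymes against reference pathways can be illustrated with a deliberately simplified sketch. This is not PathoLogic's actual algorithm, whose scoring is not described here; the function name, the `min_fraction` cutoff, and the pathway and activity labels below are all hypothetical.

```python
def score_pathways(pathways, genome_activities, min_fraction=0.5):
    """Return {pathway name: fraction of its enzyme activities found}.

    pathways maps a pathway name to the set of enzyme activities it
    requires; genome_activities is the set of activities annotated in
    the genome. Pathways below min_fraction (an assumed cutoff) are
    not reported.
    """
    predicted = {}
    for name, required in pathways.items():
        if not required:
            continue  # an empty pathway definition cannot be scored
        fraction = len(required & genome_activities) / len(required)
        if fraction >= min_fraction:
            predicted[name] = fraction
    return predicted


# Toy reference pathways and a toy annotated genome (placeholder labels).
PATHWAYS = {
    "toy-pathway-1": {"act-a", "act-b", "act-c", "act-d"},
    "toy-pathway-2": {"act-e", "act-f"},
}
GENOME = {"act-a", "act-b", "act-c", "act-x"}

print(score_pathways(PATHWAYS, GENOME))
```

Here `toy-pathway-1` is reported because three of its four activities are present, and `toy-pathway-2` is dropped because none are. A coverage fraction alone is a crude criterion; it is used here only to make the matching step concrete.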

We Encourage Your Feedback

Feedback from the scientific community has been invaluable to improving EcoCyc during its many years of development. We strongly encourage your comments and suggestions for improvements. Please email us your suggestions or questions.

Acknowledgments

The development of EcoCyc is funded by grants GM77678 and GM71962 from the NIH National Institute of General Medical Sciences.

Contributors to EcoCyc are listed on the credits page.