Integration of bioinformatics to biodegradation
Biological Procedures Online volume 16, Article number: 8 (2014)
Bioinformatics and biodegradation are two primary scientific fields in applied microbiology and biotechnology. The present review describes the development of various bioinformatics tools that may be applied in the field of biodegradation. Several databases, including the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), the Biodegradation Network-Molecular Biology Database (Bionemo), MetaCyc, and BioCyc, have been developed to enable access to information related to the biochemistry and genetics of microbial degradation. In addition, several bioinformatics tools for predicting the toxicity and biodegradation of chemicals have been developed. Furthermore, the whole genomes of several potential degrading bacteria have been sequenced and annotated using bioinformatics tools.
Millions of toxic chemicals have been produced for use in a variety of industries. These chemicals have often been released into the environment due to anthropogenic activities, where they contaminate soil and water. Furthermore, many chemicals persist in the environment, causing severe problems to living organisms; accordingly, it is crucial that these compounds be removed from the environment.
Biodegradation is the break-down of chemicals or xenobiotic compounds by microbes and plants. Biodegrading microbes degrade toxic chemicals via either mineralization or co-metabolism. In the process of mineralization, microbes completely degrade toxic chemicals by utilizing them as carbon and energy sources, whereas co-metabolism results in biotransformation of toxic compounds into less toxic compounds [4, 5].
Microbial remediation is an emerging technology for the removal of toxic chemicals from the environment [4–6]. A large number of microbes capable of utilizing toxic chemicals as their sole sources of carbon and energy have been isolated, many of which break complex chemical compounds down to carbon dioxide and water through a series of chemical reactions catalyzed by microbial enzymes [5–8], such as monooxygenases, dioxygenases, reductases, deaminases, and dehalogenases. The genes encoding these enzymes have been identified in a variety of microbes and cloned into bacteria to increase the efficiency of bioremediation. Degradation of a given toxic chemical requires a suitable microbe; which microbes are suitable depends on the structure of the chemical and on whether the microbe possesses the enzyme systems needed to degrade it. Therefore, knowledge regarding chemicals (classification, identification, environmental properties, toxicity, distribution, and associated risks) as well as their microbial biodegradation (xenobiotic-degrading bacteria, enzymes, genes, proteins) can improve the bioremediation process.
Bioinformatics, which has been incorporated into each branch of life sciences, provides a platform for researchers to develop valuable computational tools for human and environmental welfare [9, 10]. In the last few decades, bioinformatics has been integrated with biodegradation and several bioinformatics tools useful in the field of biodegradation have been developed. These include databases [11–14], chemical toxicity prediction systems [15, 16], biodegradation pathway prediction systems [17–20], and next-generation sequencing [21–24]. Here, we discuss the relationship of bioinformatics tools with biodegradation.
In recent years, an increasing number of databases have been developed to provide information regarding chemicals and their biodegradation. These databases may be divided into two categories: chemical databases and biodegradative databases. Table 1 provides a list of various chemical databases that enable classification, identification and risk assessment of chemicals or describe their environmental properties, toxicity and distribution.
Biodegradative databases store information related to biodegradation of chemicals including xenobiotics-degrading bacteria, metabolic degradation pathways of toxic chemicals, enzymes and genes involved in the biodegradation. These databases include the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), Biodegradation Network-Molecular Biology database (Bionemo), MetaCyc, and BioCyc.
The UM-BBD is a well-known database in the field of biodegradation that is freely available at http://umbbd.ethz.ch/. It provides information on microbes, biotransformation rules, enzymes, genes and reactions involved in microbial degradation. The database focuses mainly on the metabolic pathways of xenobiotic compounds, which are available in both text and graphic formats. Each pathway is a series of enzymatic reactions that starts from the parent compound and proceeds via the formation of intermediates. Different bacteria can degrade the same chemical compound via different pathways. All known pathways for a single compound are shown on that compound's UM-BBD metabolic pathway page (known as the pathway map), together with the bacteria and enzymes involved in its degradation. Figure 1 shows the UM-BBD pathway map of 2-nitrobenzoic acid, which contains two bacterial degradation pathways. Both pathways begin with the formation of 2-hydroxylaminobenzoic acid, which is then degraded via two different routes in different bacteria. Currently, the UM-BBD database comprises (i) 219 microbial degradation pathways; (ii) 1503 chemical reactions; (iii) 993 enzymes; (iv) 543 microbes; (v) 250 biotransformation rules; (vi) 50 functional groups; (vii) 76 reactions of naphthalene 1,2-dioxygenase and (viii) 109 reactions of toluene dioxygenase. This database is cross-linked to several others, including ExPASy, BRENDA, ENZYME and NCBI, to provide information describing genes and enzymes involved in the degradation of xenobiotic compounds.
Another database, OxDBase (http://www.imtech.res.in/raghava/oxdbase/), which was developed by the CSIR-Institute of Microbial Technology, Chandigarh, India, stores information regarding oxygenases derived from published literature and databases. Oxygenases are the most important enzymes involved in aerobic degradation of aromatic compounds. There are two types of oxygenases, monooxygenases and dioxygenases. Monooxygenases catalyze incorporation of one atom of molecular oxygen into the substrate, whereas dioxygenases catalyze incorporation of two atoms of molecular oxygen. Dioxygenases are further divided into aromatic ring hydroxylating dioxygenases (ARHDs) and aromatic ring cleavage dioxygenases (ARCDs). ARHDs catalyze hydroxylation of aromatic rings, whereas ARCDs catalyze cleavage of aromatic rings. ARCDs are further divided into intradiol and extradiol enzymes. Intradiol ARCDs cleave the aromatic ring between the two hydroxylated carbons, whereas extradiol ARCDs cleave the ring between a hydroxylated carbon and an adjacent non-hydroxylated carbon. OxDBase provides information about 237 distinct oxygenases, comprising 118 monooxygenases and 119 dioxygenases (ARHDs and both intradiol and extradiol ARCDs). All enzyme entries contain information about (a) the reaction(s) in which the enzymes are involved, (b) their common names and synonyms, (c) structures and gene links, (d) families and subfamilies, (e) literature citations and (f) links to several external databases including the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/), UM-BBD, BRENDA, and ENZYME. This database is user-friendly and increases our understanding of aerobic degradation of aromatic compounds.
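The intradiol/extradiol distinction described above can be expressed as a simple rule on the positions of the hydroxyl groups and the cleaved bond. The following Python sketch is only an illustration of that definition; the function name and the ring-position numbering are assumptions made here and are not part of OxDBase.

```python
def classify_ring_cleavage(hydroxylated, bond):
    """Classify a ring-cleavage site as 'intradiol' or 'extradiol'.

    hydroxylated: set of ring-carbon positions (1-6) carrying an OH group.
    bond: pair of adjacent ring-carbon positions between which the ring is cut.
    The numbering scheme is an assumption made for this illustration.
    """
    a, b = bond
    # Positions on a six-membered ring are adjacent if they differ by 1,
    # or if they are positions 1 and 6.
    if abs(a - b) not in (1, 5):
        raise ValueError("bond must join two adjacent ring carbons")
    n_hydroxylated = (a in hydroxylated) + (b in hydroxylated)
    if n_hydroxylated == 2:
        return "intradiol"      # cut between the two hydroxylated carbons
    if n_hydroxylated == 1:
        return "extradiol"      # cut next to, not between, the hydroxyls
    return "unclassified"


# Catechol carries hydroxyl groups on two neighbouring ring carbons.
catechol = {1, 2}
print(classify_ring_cleavage(catechol, (1, 2)))  # intradiol
print(classify_ring_cleavage(catechol, (2, 3)))  # extradiol
```

For catechol, cleavage of the 1,2 bond corresponds to the intradiol case and cleavage of the 2,3 bond to the extradiol case.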
The Bionemo database (http://bionemo.bioinfo.cnio.es) was developed by the Structural Computational Biology Group at the Spanish National Cancer Research Center. Bionemo is a manually curated database that provides information regarding proteins and genes involved in biodegradation metabolism. The protein information comprises sequences, domains and structures, whereas the genomic information comprises sequences, regulatory elements and transcription units. Bionemo complements UM-BBD, which focuses on the biochemical aspects of biodegradation. Bionemo has been developed by manually associating sequence database entries with biodegradation reactions based on information extracted from published articles. Information on the transcription units of biodegradation genes and their regulation is linked to the underlying biochemical network. This database is composed of (i) 145 biochemical pathways, (ii) 945 reactions, of which 342 have associated complexes, (iii) 537 enzymatic complexes, (iv) 1107 proteins, (v) 234 microbial species, (vi) 212 transcription units, (vii) 90 transcription factors, (viii) 90 effectors, (ix) 128 TF DNA binding sites and (x) 100 promoters. Like other databases, Bionemo is cross-linked to the following databases: (i) UM-BBD for metabolic reactions; (ii) GenBank for DNA sequences; (iii) UniProt for proteins; (iv) NCBI Taxonomy for microbial species and (v) PubMed for references. The information provided by Bionemo may be helpful for cloning, primer design and directed evolution experiments. The full database is downloadable as a PostgreSQL dump.
MetaCyc is a database of metabolic pathways derived from the experimental scientific literature that comprises more than 2097 experimentally determined metabolic pathways from more than 2460 different organisms. It is the largest curated database of metabolic pathways from all domains of life. The database provides information regarding the metabolic pathways involved in primary and secondary metabolism, together with the associated compounds, enzymes and genes. It is freely available at http://metacyc.org/. MetaCyc can be used for multiple scientific applications. Specifically, it can (i) provide reference data for computational prediction of the metabolic pathways of organisms from their sequenced genomes, (ii) support metabolic engineering, (iii) facilitate comparison of biochemical networks, and (iv) serve as an encyclopedia of metabolism. This database was developed and is curated by the BioCyc group at SRI International.
BioCyc (http://biocyc.org/) is a collection of more than 2988 organism-specific Pathway/Genome Databases (PGDBs). Each PGDB contains the full genome and predicted metabolic pathways of a single organism. The Pathway Tools software predicts pathways using MetaCyc as a reference database. The predicted metabolic pathways contain information about metabolites, enzymes, and reactions. In addition, BioCyc PGDBs contain information about predicted operons, transport systems and pathway-hole fillers. BioCyc web sites based on Pathway Tools offer multiple tools for querying and analysis of PGDBs, including analysis of gene expression, metabolomics, and other large-scale datasets. This database was developed by the Bioinformatics Research Group at SRI International.
Pathway prediction systems
Only a small portion of toxic chemicals have been tested for microbial degradation; a large number remain untested even though they have been released into the environment. Knowledge regarding the degradation of these compounds is essential for determining their fate in the environment. In such cases, computational tools may be used to predict biodegradation pathways for these toxic chemicals. Several pathway prediction systems have been developed using either non-biochemically based or biochemically based methods [56, 57]. Non-biochemically based pathway prediction systems use statistical inference methods to generate reactions between compounds. These systems include machine learning methods, the Bayesian method, comparative genomics and metabolic network alignment. These methods are very useful for identifying missing links in the network [57, 62]. Their disadvantage is that the reactions are based on statistical inference alone; therefore, many of them could be biochemically infeasible. Biochemically based pathway prediction systems rely on knowledge-based biotransformation rules. Table 2 summarizes the role of various pathway prediction systems useful in the field of biodegradation. Here, we present some details of biochemically based pathway prediction systems.
The UM-BBD-Pathway Prediction System (PPS) is a part of UM-BBD that may be accessed at http://umbbd.ethz.ch/predict/. The PPS can be used to predict metabolic pathways for microbial degradation of chemical compounds. Predictions are based on biotransformation rules derived from reactions found in the UM-BBD database or in the scientific literature. Users can predict both aerobic and anaerobic degradation pathways of chemicals and can choose to view either all transformations or only the more likely aerobic ones. Predictions are most accurate for compounds that are similar to compounds whose biodegradation pathways have been reported in the scientific literature. For example, the degradation pathways of 4-nitrophenol have been thoroughly investigated, while those of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol have not. However, the structures of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol are similar to that of 4-nitrophenol; therefore, PPS can provide accurate predictions for the degradation of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol. To run a prediction, users may enter a compound into the system by either drawing the structure and generating SMILES or entering SMILES directly.
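The idea of applying a biotransformation rule to a compound entered as SMILES can be illustrated with a very small sketch. The rule below is modeled on the reduction of a nitro group to a hydroxylamino group, the first step of 2-nitrobenzoic acid degradation mentioned earlier. The rule table, the function name and the use of plain substring replacement are all assumptions made for illustration; the actual PPS rules are not reproduced here, and real systems match substructures with a chemistry toolkit rather than with string operations.

```python
# Toy biotransformation rules: (name, substructure text, replacement text).
# Plain substring replacement is a stand-in for real substructure matching
# and only works when the group is written exactly this way in the SMILES.
RULES = [
    ("nitro -> hydroxylamino", "[N+](=O)[O-]", "NO"),
]


def apply_rules(smiles, rules=RULES):
    """Return (rule name, product SMILES) for every rule that matches."""
    products = []
    for name, pattern, replacement in rules:
        if pattern in smiles:
            # Transform only the first occurrence of the group.
            products.append((name, smiles.replace(pattern, replacement, 1)))
    return products


# 2-Nitrobenzoic acid written as a SMILES string.
query = "OC(=O)c1ccccc1[N+](=O)[O-]"
for rule_name, product in apply_rules(query):
    print(rule_name, "=>", product)
```

Here the single toy rule converts the SMILES of 2-nitrobenzoic acid into that of 2-hydroxylaminobenzoic acid; a compound without a nitro group written in this form yields no prediction.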
Another pathway prediction system, PathPred (http://www.genome.jp/tools/pathpred/), is a knowledge-based prediction system that uses data derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) in the form of the KEGG REACTION database and the KEGG RPAIR database. The KEGG REACTION database contains not only all known enzymatic reactions taken from the IUBMB enzyme nomenclature, but also additional reactions taken from the KEGG metabolic pathways. KEGG RPAIR is a collection of biochemical structure transformation patterns (RDM patterns) for substrate–product pairs (reactant pairs) in KEGG REACTION. PathPred is a web-based server that predicts plausible enzyme-catalyzed reaction pathways from a query compound using information regarding RDM patterns and chemical structure alignments of substrate-product pairs. This server provides plausible reactions and transformed compounds and displays all predicted reaction pathways in a tree-shaped graph. PathPred-based predictions are very accurate for compounds that are biochemically similar to KEGG compounds. PathPred contains reference pathways (i) for microbial biodegradation of environmental compounds and (ii) for biosynthesis of plant secondary metabolites. Users can select one of the reference pathways according to their purpose. A query can be submitted in several user-friendly ways. Specifically, a query compound can be input (i) in the MDL mol file format, (ii) in the SMILES representation, or (iii) by its KEGG compound identifier. In the case of the xenobiotics biodegradation reference pathway, users should use the compound to be degraded as the query, while in the case of the reference pathway for biosynthesis of secondary metabolites the query should be the end product of biosynthesis. The prediction results are linked to genomic information. The PathPred server provides new and alternative reactions, regardless of whether enzymes for these reactions are known or not.
If the enzyme is unknown, users can use the E-zyme tool (http://www.genome.jp/tools/e-zyme/) to assign a possible EC number (up to the EC sub-subclass). After assigning EC numbers, it is also possible to search for putative genes in the genome based on sequence similarity to known genes with the same EC sub-subclass.
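The first part of this search, collecting known genes that share the predicted EC sub-subclass, can be sketched as follows. This is a minimal illustration and not the E-zyme implementation: the function names and the gene annotation table are hypothetical, and the subsequent sequence similarity search is not shown.

```python
def ec_sub_subclass(ec_number):
    """Return the first three EC fields, e.g. '1.14.13.7' -> '1.14.13'."""
    fields = ec_number.strip().split(".")
    if len(fields) < 3:
        raise ValueError(f"need at least three EC fields: {ec_number!r}")
    return ".".join(fields[:3])


def candidate_genes(predicted_ec, annotated_genes):
    """List genes whose annotated EC number shares the predicted sub-subclass."""
    target = ec_sub_subclass(predicted_ec)
    return [gene for gene, ec in annotated_genes.items()
            if ec_sub_subclass(ec) == target]


# Hypothetical annotation table (gene name -> EC number), for illustration only.
genes = {"geneA": "1.14.13.7", "geneB": "1.14.13.2", "geneC": "1.13.11.1"}
print(candidate_genes("1.14.13.-", genes))  # ['geneA', 'geneB']
```

The genes returned in this way would then serve as query sequences for a similarity search against the genome of interest.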
Biochemical Network Integrated Computational Explorer (BNICE) is a computational approach for the development of novel pathways based on the reaction rules of the Enzyme Commission classification system. BNICE generates all possible pathways from a given target or starting molecule. In the next step, BNICE screens the generated pathways for thermodynamic feasibility based on the Gibbs free energies of the reactions and selects the novel pathways that are thermodynamically feasible. Soh and Hatzimanikatis suggested that the pathways generated by BNICE can be further evaluated using established pathway analysis approaches, such as thermodynamics-based flux balance analysis (FBA) and GrowMatch, which allow investigation of the overall effects of these novel pathways on metabolic network performance in host organisms. FBA can help predict maximum yield, phenotypic changes, effects of gene knockouts, and changes in the bioenergetics of the system for metabolic engineering, synthetic biology, and biodegradation of xenobiotics. BNICE can be applied in multiple areas: (i) to discover novel pathways for metabolic engineering; (ii) for 'retrosynthesis' of metabolic chemicals; (iii) to investigate evolution between metabolic pathways of various organisms; (iv) to analyze metabolic pathways; (v) for mining of omics data; (vi) to select targets for enzyme engineering; and (vii) for analysis of degradation pathways of xenobiotic compounds.
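The screening step can be pictured as a filter over candidate pathways. The sketch below is an illustration under stated assumptions and does not reproduce BNICE: the feasibility criterion (every step must have a Gibbs free energy change at or below a threshold), the function names and all numerical values are hypothetical.

```python
def is_feasible(step_energies, threshold=0.0):
    """Assumed criterion: every step has a Gibbs energy change <= threshold (kJ/mol)."""
    return all(dg <= threshold for dg in step_energies)


def screen_pathways(pathways, threshold=0.0):
    """Keep the names of pathways whose steps all pass the threshold."""
    return [name for name, steps in pathways.items()
            if is_feasible(steps, threshold)]


# Hypothetical per-step Gibbs energy changes in kJ/mol, invented for illustration.
candidates = {
    "pathway_1": [-12.0, -3.5, -20.1],
    "pathway_2": [-8.0, 15.2, -30.0],   # contains one uphill step
    "pathway_3": [-1.0, -0.5],
}
print(screen_pathways(candidates))  # ['pathway_1', 'pathway_3']
```

Other criteria are conceivable, for example requiring only that the overall pathway be downhill; the per-step rule was chosen here because it is the simplest to state.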
From Metabolite to Metabolite (FMM) is a web server freely available at http://FMM.mbc.nctu.edu.tw/ that is able to search all possible pathways between known input and output compounds among various species based on the KEGG database and other integrated biological databases. FMM can generate combined pathway maps by combining the KEGG maps and KEGG LIGAND information. This server provides information regarding the corresponding enzymes, genes and organisms and provides a platform called "comparative analysis," in which metabolic pathways can be compared between several species. FMM is an efficient tool for drug production, biofuel production, synthetic biology and metabolic engineering. For biodegradation purposes, metabolic pathways can be searched only for those xenobiotic compounds for which information is available in the KEGG database. One example is presented in Figure 2, which shows the search for a pathway between 4-nitrophenol and 2-maleylacetate.
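Searching all pathways between an input and an output compound amounts to enumerating paths in a directed graph whose nodes are compounds. The following sketch shows this with a depth-first search. The small reaction graph is written by hand for illustration, using two routes of 4-nitrophenol degradation that have been described in bacteria; it is not taken from Figure 2, from KEGG or from the FMM server, and the function name is an assumption.

```python
def all_simple_paths(graph, source, target, path=None):
    """Yield every cycle-free path from source to target in a directed graph."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:          # never revisit a compound on this path
            yield from all_simple_paths(graph, nxt, target, path)


# Hand-written toy graph (substrate -> list of products), for illustration only.
graph = {
    "4-nitrophenol": ["1,4-benzoquinone", "4-nitrocatechol"],
    "1,4-benzoquinone": ["hydroquinone"],
    "hydroquinone": ["4-hydroxymuconic semialdehyde"],
    "4-hydroxymuconic semialdehyde": ["2-maleylacetate"],
    "4-nitrocatechol": ["1,2,4-benzenetriol"],
    "1,2,4-benzenetriol": ["2-maleylacetate"],
}

for route in all_simple_paths(graph, "4-nitrophenol", "2-maleylacetate"):
    print(" -> ".join(route))
```

On this toy graph the search returns two routes, one via hydroquinone and one via 1,2,4-benzenetriol.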
A recently developed web tool, Metabolic Tinker (http://osslab.ex.ac.uk/tinker.aspx), can be used to design synthetic metabolic pathways between user-defined target and source compounds. Metabolic Tinker uses a tailored heuristic search strategy to search for thermodynamically feasible paths in the entire known metabolic universe. The program contains a directed graph known as the Universal Reaction Network (URN), which represents the entire set of known reactions and compounds from the Rhea database. Nodes and edges on this graph represent metabolites and reactions, respectively, and thus the entire graph represents the currently known metabolic universe. Metabolic Tinker searches for possible biochemical paths between two compounds within this URN using standard search algorithms developed in computer science and graph theory. The Rhea/ChEBI identification codes of both the source and target compounds are needed to complete the search.
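A path search on such a network can be sketched with breadth-first search, one of the standard graph algorithms. This is a simplified stand-in and not the tailored, thermodynamics-aware heuristic used by Metabolic Tinker. The network, the metabolite labels (M1 to M5) and the reaction labels (R1 to R5) are hypothetical placeholders rather than Rhea or ChEBI identifiers.

```python
from collections import deque


def shortest_path(network, source, target):
    """Breadth-first search for a path that uses the fewest reactions.

    network maps a metabolite to a list of (reaction, product) edges.
    The returned list alternates metabolites and reactions.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]              # the last element is always a metabolite
        if node == target:
            return path
        for reaction, product in network.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append(path + [reaction, product])
    return None                      # target not reachable from source


# Hypothetical miniature reaction network, for illustration only.
network = {
    "M1": [("R1", "M2"), ("R2", "M4")],
    "M2": [("R3", "M3")],
    "M3": [("R4", "M4")],
    "M4": [("R5", "M5")],
}
print(shortest_path(network, "M1", "M5"))  # ['M1', 'R2', 'M4', 'R5', 'M5']
```

Because the graph is directed, the reverse query from M5 to M1 finds no path in this example.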
Computational methods for predicting chemical toxicity
Computational methods for estimating chemical toxicity are evolving rapidly. In recent years, several models have been developed in which computational programs are used to predict the toxicity of chemical compounds [22–24, 67, 68]. Quantitative structure-activity relationship (QSAR) models use mathematical algorithms to calculate toxicity from molecular descriptors, that is, characteristics of the chemical structure such as the molecular weight or the number of benzene rings. The following are some examples of commercial and publicly available models:
Sarah Nexus for prediction of the mutagenicity of chemicals.
VirtualToxLab for prediction of the toxic potential (endocrine and metabolic disruption, some aspects of carcinogenicity and cardiotoxicity) of drugs, chemicals and natural products.
Toxicity Estimation Software Tool (TEST) for prediction of the acute toxicity of organic chemicals based on their molecular structures.
TOPKAT for prediction of the ecotoxicity, mutagenicity, and reproductive/developmental toxicity of chemicals.
Ecological Structure Activity Relationships (ECOSAR) for estimation of the acute (short-term) toxicity and chronic (long-term or delayed) toxicity of industrial chemicals to aquatic organisms such as fish, aquatic invertebrates, green algae and aquatic plants using computerized structure-activity relationships.
Estimation Programs Interface (EPI) suite for prediction of physical/chemical properties and environmental fate (eco-toxicity). The software calculates chemical property data using programs including KOWWIN, AOPWIN, HENRYWIN, MPBPWIN, BIOWIN, KOCWIN, WSKOWWIN, WATERNT, BCFBAF, HYDROWIN and ECOSAR.
CAESAR for assessment of chemical toxicity under the REACH regulation.
ToxiPred for prediction of the aqueous toxicity of small chemical molecules in Tetrahymena pyriformis.
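The principle behind a QSAR model, relating a molecular descriptor to a toxicity value through a fitted mathematical function, can be illustrated with a one-descriptor linear regression. The sketch below is not taken from any of the tools listed above. The descriptor (number of benzene rings) is one of those mentioned earlier, but the training values are synthetic numbers constructed to lie exactly on a straight line, and the function name is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic training set, NOT measured data:
# descriptor = number of benzene rings,
# response   = a made-up toxicity score lying exactly on y = 1 + 2x.
rings = [0, 1, 2, 3]
score = [1.0, 3.0, 5.0, 7.0]

intercept, slope = fit_line(rings, score)
predicted = intercept + slope * 4    # prediction for a compound with 4 rings
print(round(intercept, 6), round(slope, 6), round(predicted, 6))  # 1.0 2.0 9.0
```

Real QSAR models typically combine many descriptors and are validated against experimental toxicity data; the single-descriptor fit only shows the form of the calculation.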
Genome sequences of xenobiotic degrading bacteria
The automated Sanger method for sequencing is known as first generation sequencing, whereas newer methods developed for sequencing are considered next generation sequencing (NGS). Commercially available NGS technologies include Roche/454, Illumina/Solexa, SOLiD/Life/APG, Helicos BioSciences, and the Polonator Instrument.
The initial steps of NGS involve generation of short reads and their subsequent alignment to a reference genome or assembly into contigs. The latter step is crucial for NGS technologies, and a variety of computational tools have been applied for genome sequence assembly, including SSAKE, SOAPdenovo, ABySS, and Velvet. Once the sequence reads are assembled into contigs, the next steps are gene prediction and functional annotation. The most common gene prediction system for microbial genomes is GLIMMER (Gene Locator and Interpolated Markov ModelER), which identifies the coding regions of a microbial genome based on interpolated Markov models [83, 84]. The predicted coding sequences may be analyzed and evaluated manually or by automatic annotation software to identify homologous genes. A variety of automatic pipelines are available for bacterial annotation, including online tools such as RAST, BASys, WeGAS and MaGe/Microscope, as well as offline tools such as AGeS, DIYA and PIPA. Furthermore, MICheck may be used to check for syntactic errors in annotated sequences.
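To give a feeling for what locating candidate coding regions on a contig involves, the sketch below scans the forward strand for open reading frames (ORFs) that run from an ATG start codon to the first in-frame stop codon. This is a deliberately simple stand-in and not the interpolated Markov model approach used by GLIMMER; the function name, the minimum-length parameter and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon and must
    contain at least min_codons codons, counting both start and stop.
    Coordinates are 0-based, with end exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j        # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# Short made-up sequence containing one ORF of four codons.
print(find_orfs("GGATGAAACCCTAAGG"))  # [(2, 14, 'ATGAAACCCTAA')]
```

A practical gene finder would also scan the reverse strand, consider alternative start codons and score candidates statistically, which is where models such as those in GLIMMER come in.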
NGS ignited a revolution in biodegradation and bioremediation with the concept of "from genomics to metabolomics." Bacterial genomics is the study of the whole genomes of bacteria, from which genes involved in biodegradation and other metabolic processes can be predicted. The whole genomes of several xenobiotic-degrading bacteria have been sequenced using NGS technology, and several xenobiotic-degrading genes have been identified through gene prediction and annotation of the bacterial genomes [93–97]. In silico analysis of a bacterial genome leads to prediction of metabolic pathways for the biodegradation of xenobiotics and gives a holistic view of the metabolic network of the bacterium. Several metabolic pathways may be predicted from the genomes of xenobiotic-degrading bacteria [99, 100]. For example, the whole genome of Cupriavidus necator JMP134 (previously known as Ralstonia eutropha strain JMP134), which utilizes a variety of aromatic and chloroaromatic compounds as sole carbon and energy sources, was sequenced, and several genes coding for enzymes involved in the degradation of various xenobiotic compounds were identified [100, 101]. The genome of strain JMP134 comprises four replicons (two chromosomes and two plasmids) with a total of 6631 protein-coding genes. The C. necator JMP134 genome contains 300 genes putatively involved in central ring-cleavage pathways of various aromatic compounds.
In silico analysis of the genome of Pseudomonas putida KT2440 showed the presence of the following pathways for degradation of aromatic compounds: (i) the ortho pathway for the catabolism of protocatechuate (pca genes) and catechol (cat genes), (ii) the phenylacetate pathway (pha genes), and (iii) the homogentisate pathway (hmg genes). Additionally, gene clusters for the catabolism of N-heterocyclic aromatic compounds (nic cluster) and for a central meta-cleavage pathway (pcm genes) were identified in the genome of this microorganism.
Whole-genome sequences are useful not only for prediction of genes and their functions, but also for identification of novel biocatalysts. Combining the genomic approach with proteomic approaches will lead to new insights into metabolism at the organism level. Kim et al. used metabolic, genomic and proteomic approaches to construct a complete and integrated pathway for pyrene degradation in Mycobacterium vanbaalenii PYR-1, identifying 27 enzymes involved in the pathway on the basis of genomic and proteomic data.
Several databases have been developed to provide information on chemicals and their biodegradation, and users can query them according to their research interests. For example, chemical databases provide information on the toxicity, risk assessment and environmental properties of chemicals. Several bioinformatics tools have also been developed for predicting the toxicity of chemicals. In addition, several pathway prediction systems are available for predicting the degradation pathways of chemicals whose degradation has not been reported in the literature. The UM-BBD PPS and PathPred are well-known pathway prediction systems for biodegradation. Using these systems, users can not only predict degradation pathways but also identify the enzymes involved in them. This approach would be very useful for metabolic engineering and for developing bioremediation strategies. The major limitation of pathway prediction is that the predicted pathways have not yet been experimentally verified; future experimental studies should address this. Furthermore, the genomes of several xenobiotic-degrading bacteria have been sequenced using NGS, and the genes and enzymes involved in biodegradation have been identified through gene annotation. In the future, molecular techniques combined with bioinformatics tools may provide new insights into the genetics of biodegradation.
Ellis LB, Wackett LP: Use of the University of Minnesota Biocatalysis/Biodegradation Database for study of microbial degradation. Microb Inform Exp. 2012, 2: 1. 10.1186/2042-5783-2-1.
Arora P, Shi W: Tools of bioinformatics in biodegradation. Rev Environ Sci Biotechnol. 2010, 9: 211-213. 10.1007/s11157-010-9211-x.
Andrady AL: Biodegradation of plastics: monitoring what happens. Plastics Additives. 1998, Springer Netherlands, 1: 32-40. 10.1007/978-94-011-5862-6_5.
Arora PK, Sasikala C, Ramana CV: Degradation of chlorinated nitroaromatic compounds. Appl Microbiol Biotechnol. 2012, 93 (6): 2265-2277. 10.1007/s00253-012-3927-1.
Arora PK, Srivastava A, Singh VP: Bacterial degradation of nitrophenols and their derivatives. J Hazard Mater. 2014, 266: 42-59.
Arora PK, Bae H: Bacterial degradation of chlorophenols and their derivatives. Microb Cell Fact. 2014, 13: 31. 10.1186/1475-2859-13-31.
Karigar CH, Rao SS: Role of microbial enzymes in the bioremediation of pollutants: a review. Enzyme Res. 2011, 2011: 11-
Arora PK, Srivastava A, Singh VP: Application of monooxygenases in dehalogenation, desulphurization, denitrification and hydroxylation of aromatic compounds. J Bioremed Biodegrad. 2010, 1: 112-
Katara P: Role of bioinformatics and pharmacogenomics in drug discovery and development process. Netw Modeling Anal Health Inform Bioinforma. 2013, 2 (4): 225-230. 10.1007/s13721-013-0039-5.
Debes JD, Urrutia R: Bioinformatics tools to understand human diseases. Surgery. 2004, 135: 579-585. 10.1016/j.surg.2003.11.010.
Ellis LBM, Roe D, Wackett LP: The University of Minnesota Biocatalysis/Biodegradation Database: the first decade. Nucleic Acids Res. 2006, 34: D517-D521. 10.1093/nar/gkj076.
Arora PK, Kumar M, Chauhan A, Raghava GP, Jain RK: OxDBase: a database of oxygenases involved in biodegradation. BMC Res Notes. 2009, 2: 67. 10.1186/1756-0500-2-67.
Carbajosa G, Trigo A, Valencia A, Cases I: Bionemo: molecular information on biodegradation metabolism. Nucleic Acids Res. 2009, 37 (Database issue): D598-602.
Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A, Kubo A, Krummenacker M, Latendresse M, Mueller LA, Ong Q, Paley S, Subhraveti P, Weaver DS, Weerasinghe D, Zhang P, Karp PD: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2012, 40 (D1): D742-D753. 10.1093/nar/gkr1014.
Greene N: Computer systems for the prediction of toxicity: an update. Adv Drug Deliv Rev. 2002, 54 (3): 417-431. 10.1016/S0169-409X(02)00012-1.
Mohan CG, Gandhi T, Garg D, Shinde R: Computer-assisted methods in chemical toxicity prediction. Mini Rev Med Chem. 2007, 7 (5): 499-507. 10.2174/138955707780619554.
Gao J, Ellis LB, Wackett LP: The University of Minnesota pathway prediction system: multi-level prediction and visualization. Nucleic Acids Res. 2011, 39 (Suppl 2): W406-W411.
Moriya Y, Shigemizu D, Hattori M, Tokimatsu T, Kotera M, Goto S, Kanehisa M: PathPred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic Acids Res. 2010, 38: W138-W143. 10.1093/nar/gkq318.
Finley SD, Broadbelt LJ, Hatzimanikatis V: Computational framework for predictive biodegradation. Biotechnol Bioeng. 2009, 104: 1086-1097. 10.1002/bit.22489.
Chou CH, Chang WC, Chiu CM, Huang CC, Huang HD: FMM: a web server for metabolic pathway reconstruction and comparative analysis. Nucleic Acids Res. 2009, 37: W129-W134. 10.1093/nar/gkp264.
McClymont K, Soyer OS: Metabolic tinker: an online tool for guiding the design of synthetic metabolic pathways. Nucleic Acids Res. 2013, 41 (11): e113. 10.1093/nar/gkt234.
Zheng M, Liu Z, Xue C, Zhu W, Chen K, Luo X, Jiang H: Mutagenic probability estimation of chemical compounds by a novel molecular electrophilicity vector and support vector machine. Bioinformatics. 2006, 22: 2099-2106. 10.1093/bioinformatics/btl352.
Wang Y, Lu J, Wang F, Shen Q, Zheng M, Luo X, Zhu W, Jiang H, Chen K: Estimation of carcinogenicity using molecular fragments tree. J Chem Inf Model. 2012, 52: 1994-2003. 10.1021/ci300266p.
Chen L, Lu J, Zhang J, Feng KR, Zheng MY, Cai YD: Predicting chemical toxicity effects based on chemical-chemical interactions. PLoS One. 2013, 8 (2): e56517. 10.1371/journal.pone.0056517.
The ChemIDplus. [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CHEM]
Schöning G: Classification & labelling inventory: role of ECHA and notification requirements. Ann Ist Super Sanita. 2011, 47 (2): 140-145.
The NCLASS (the Nordic N-Class Database on Environmental Hazard Classification). [http://apps.kemi.se/nclass/default.asp]
The Hazardous Substances Data Bank (HSDB). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?HSDB]
The Toxicology Literature Online (TOXLINE). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TOXLINE]
The Chemical Carcinogenesis Research Information System (CCRIS). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CCRIS]
The Developmental and Reproductive Toxicology Database (DART). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?DARTETIC]
The Genetic Toxicology Data Bank (GENE-TOX). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?GENETOX]
The Integrated Risk Information System (IRIS). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?IRIS]
Wullenweber A, Kroner O, Kohrman M, Maier A, Dourson M, Rak A, Wexler P, Tomljanovic C: Resources for global risk assessment: The International Toxicity Estimates for Risk (ITER) and Risk Information Exchange (RiskIE) databases. Toxicol Appl Pharmacol. 2008, 233: 45-53. 10.1016/j.taap.2007.12.035.
Wexler P: TOXNET: an evolving web resource for toxicology and environmental health information. Toxicology. 2001, 157: 3-10. 10.1016/S0300-483X(00)00337-1.
Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R: SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Res. 2009, 37 (Database issue): D295-D299.
Kinsner-Ovaskainen A, Rzepka R, Rudowski R, Coecke S, Cole T, Prieto P: Acutoxbase, an innovative database for in vitro acute toxicity studies. Toxicol In Vitro. 2009, 23: 476-485. 10.1016/j.tiv.2008.12.019.
The CTD (Comparative Toxicogenomics Database). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CTD]
The Carcinogenic Potency Database. [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?CPDB.htm]
The IUCLID - International Uniform Chemical Information Database. [http://iuclid.eu/]
The Haz Map. [http://hazmap.nlm.nih.gov/]
Hochstein C, Szczur M: TOXMAP: a GIS-based gateway to environmental health resources. Med Ref Serv Q. 2006, 25 (3): 13-31. 10.1300/J115v25n03_02.
The Toxics Release Inventory (TRI). [http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?TRI]
The Household Products Database. [http://hpd.nlm.nih.gov/]
The ESIS, European chemical Substances Information System. [http://esis.jrc.ec.europa.eu/]
The ECOTOX (AQUIRE, PHYTOTOX, TERRETOX). [http://cfpub.epa.gov/ecotox/]
The eChemPortal. [http://www.echemportal.org/echemportal/index?pageID=0&request_locale=en]
The ACToR (Aggregated Computational Toxicology Resource). [http://actor.epa.gov/actor/faces/BasicInfo.jsp]
The EPA Human Health Benchmarks for Pesticides (HHBP). [http://iaspub.epa.gov/apex/pesticides/f?p=HHBP:home]
The EPA Office of Pesticide Programs’ Aquatic Life Benchmarks (OPPALB). [http://www.epa.gov/oppefed1/ecorisk_ders/aquatic_life_benchmark.htm]
The Chemical Safety Information from Intergovernmental Organizations-INCHEM. [http://www.inchem.org/pages/about.html]
The JECDB: Japan Existing Chemical Data Base. [http://dra4.nihs.go.jp/mhlw_data/jsp/SearchPageENG.jsp]
The SPIN (Substances in Preparations In the Nordic countries). [http://www.spin2000.net/]
The US EPA: Substance Registry Services (SRS). [http://iaspub.epa.gov/sor_internet/registry/substreg/home/overview/home.do]
Medema MH, van Raaphorst R, Takano E, Breitling R: Computational tools for the synthetic design of biochemical pathways. Nat Rev Microbiol. 2012, 10 (3): 191-202. 10.1038/nrmicro2717.
Soh KC, Hatzimanikatis V: DREAMS of metabolism. Trends Biotechnol. 2010, 28 (10): 501-508. 10.1016/j.tibtech.2010.07.002.
Dale JM, Popescu L, Karp PD: Machine learning methods for metabolic pathway prediction. BMC Bioinformatics. 2010, 11 (1): 15. 10.1186/1471-2105-11-15.
Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics. 2004, 5 (1): 76. 10.1186/1471-2105-5-76.
Piškur J, Schnackerz KD, Andersen G, Björnberg O: Comparative genomics reveals novel biochemical pathways. Trends Genet. 2007, 23 (8): 369-372. 10.1016/j.tig.2007.05.007.
Cheng Q, Harrison R, Zelikovsky A: MetNetAligner: a web service tool for metabolic network alignments. Bioinformatics. 2009, 25 (15): 1989-1990. 10.1093/bioinformatics/btp287.
Osterman A, Overbeek R: Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol. 2003, 7 (2): 238-251. 10.1016/S1367-5931(03)00027-9.
Hatzimanikatis V, Li C, Ionita JA, Henry CS, Jankowski MD, Broadbelt LJ: Exploring the diversity of complex metabolic networks. Bioinformatics. 2005, 21 (8): 1603-1609. 10.1093/bioinformatics/bti213.
Rodrigo G, Carrera J, Prather KJ, Jaramillo A: DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics. 2008, 24 (21): 2554-2556. 10.1093/bioinformatics/btn471.
Heath AP, Bennett GN, Kavraki LE: Finding metabolic pathways using atom tracking. Bioinformatics. 2010, 26: 1548-1555. 10.1093/bioinformatics/btq223.
Pharkya P, Burgard AP, Maranas CD: OptStrain: a computational framework for redesign of microbial production systems. Genome Res. 2004, 14: 2367-2376. 10.1101/gr.2872004.
Benfenati E: Predicting toxicity through computers: a changing world. Chem Cent J. 2007, 1 (1): 1-7. 10.1186/1752-153X-1-1.
Mishra NK: Computational modeling of P450s for toxicity prediction. Expert Opin Drug Metab Toxicol. 2011, 7 (10): 1211-1231. 10.1517/17425255.2011.611501.
Eriksson L, Jaworska J, Worth A, Cronin M, McDowell RM, Gramatica P: Methods for reliability, uncertainty assessment, and applicability evaluations of regression based and classification QSARs. Environ Health Perspect. 2003, 111: 1361-1375. 10.1289/ehp.5758.
The Sarah Nexus. [http://www.lhasalimited.org/products/sarah-nexus.htm]
Vedani A, Smiesko M, Spreafico M, Peristera O, Dobler M: Virtual ToxLab–in silico prediction of the toxic (endocrine-disrupting) potential of drugs, chemicals and natural products: two years and 2,000 compounds of experience: a progress report. ALTEX. 2009, 26 (3): 167-176.
The Toxicity Estimation Software Tool (TEST). [http://www.epa.gov/nrmrl/std/qsar/qsar.html]
Prival MJ: Evaluation of the TOPKAT system for predicting the carcinogenicity of chemicals. Environ Mol Mutagen. 2001, 37 (1): 55-69. 10.1002/1098-2280(2001)37:1<55::AID-EM1006>3.0.CO;2-5.
The Ecological Structure Activity Relationships. [http://www.epa.gov/oppt/newchems/tools/21ecosar.htm]
The Estimation Programme Interface (EPI) Suite. US EPA. [http://www.epa.gov/opptintr/exposure/pubs/episuite.htm]
Cassano A, Manganaro A, Martin T, Young D, Piclin N, Pintore M, Bigoni D, Benfenati E: CAESAR models for developmental toxicity. Chem Cent J. 2010, 4 (Suppl 1): S4. 10.1186/1752-153X-4-S1-S4.
Mishra NK, Singla D, Agarwal S, Consortium OSDD, Raghava GPS: ToxiPred: a server for prediction of aqueous toxicity of small chemical molecules in T. pyriformis. J Transl Toxicol. 2014, 1: 21-27.
Metzker ML: Sequencing technologies–the next generation. Nat Rev Genet. 2010, 11: 31-46. 10.1038/nrg2626.
Warren RL, Sutton GG, Jones SJ, Holt RA: Assembling millions of short DNA sequences using SSAKE. Bioinformatics. 2007, 23: 500-501. 10.1093/bioinformatics/btl629.
Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20: 265-272. 10.1101/gr.097261.109.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19: 1117-1123. 10.1101/gr.089532.108.
Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999, 27: 4636-4641. 10.1093/nar/27.23.4636.
Richardson EJ, Watson M: The automatic annotation of bacterial genomes. Brief Bioinform. 2013, 14 (1): 1-12. 10.1093/bib/bbs007.
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O: The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS: BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 2005, 33: W455-W459. 10.1093/nar/gki593.
Lee D, Seo H, Park C, Park K: WeGAS: a web-based microbial genome annotation system. Biosci Biotechnol Biochem. 2009, 73: 213-216. 10.1271/bbb.80567.
Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, Lajus A, Pascal G, Scarpelli C, Médigue C: MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res. 2006, 34: 53-65. 10.1093/nar/gkj406.
Kumar K, Desai V, Cheng L, Khitrov M, Grover D, Satya RV, Yu C, Zavaljevski N, Reifman J: AGeS: a software system for microbial genome sequence annotation. PLoS One. 2011, 6: e17469. 10.1371/journal.pone.0017469.
Stewart AC, Osborne B, Read TD: DIYA: a bacterial annotation pipeline for any genomics lab. Bioinformatics. 2009, 25: 962-963. 10.1093/bioinformatics/btp097.
Yu C, Zavaljevski N, Desai V, Johnson S, Stevens FJ, Reifman J: The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation. BMC Bioinformatics. 2008, 9: 52. 10.1186/1471-2105-9-52.
Cruveiller S, Le Saux J, Vallenet D, Lajus A, Bocs S, Médigue C: MICheck: a web tool for fast checking of syntactic annotations of bacterial genomes. Nucleic Acids Res. 2005, 33: W471-W479. 10.1093/nar/gki498.
Lee SH, Jin HM, Lee HJ, Kim JM, Jeon CO: Complete genome sequence of the BTEX-degrading bacterium Pseudoxanthomonas spadix BD-a59. J Bacteriol. 2012, 194 (2): 544. 10.1128/JB.06436-11.
Köhler KA, Rückert C, Schatschneider S, Vorhölter FJ, Szczepanowski R, Blank LM, Niehaus K, Goesmann A, Pühler A, Kalinowski J, Schmid A: Complete genome sequence of Pseudomonas sp. strain VLB120 a solvent tolerant, styrene degrading bacterium, isolated from forest soil. J Biotechnol. 2013, 168 (4): 729-730. 10.1016/j.jbiotec.2013.10.016.
Schneiker S, Martins dos Santos VA, Bartels D, Bekel T, Brecht M, Buhrmester J, Chernikova TN, Denaro R, Ferrer M, Gertler C, Goesmann A, Golyshina OV, Kaminski F, Khachane AN, Lang S, Linke B, McHardy AC, Meyer F, Nechitaylo T, Pühler A, Regenhardt D, Rupp O, Sabirova JS, Selbitschka W, Yakimov MM, Timmis KN, Vorhölter FJ, Weidner S, Kaiser O, Golyshin PN: Genome sequence of the ubiquitous hydrocarbon-degrading marine bacterium Alcanivorax borkumensis. Nat Biotechnol. 2006, 24: 997-1004. 10.1038/nbt1232.
Vikram S, Kumar S, Vaidya B, Pinnaka AK, Raghava GPS: Draft genome sequence of the 2-chloro-4-nitrophenol-degrading bacterium Arthrobacter sp. strain SJCon. Genome Announc. 2013, 1 (2): e00058-13.
Kumar S, Vikram S, Raghava GPS: Genome sequence of the nitroaromatic compound-degrading bacterium Burkholderia sp. strain SJ98. J Bacteriol. 2012, 194 (12): 3286. 10.1128/JB.00497-12.
Vilchez-Vargas R, Junca H, Pieper DH: Metabolic networks, microbial ecology and ‘omics’ technologies: towards understanding in situ biodegradation processes. Environ Microbiol. 2010, 12 (12): 3089-3104. 10.1111/j.1462-2920.2010.02340.x.
Romero-Silva MJ, Méndez V, Agulló L, Seeger M: Genomic and functional analyses of the gentisate and protocatechuate ring-cleavage pathways and related 3-hydroxybenzoate and 4-hydroxybenzoate peripheral pathways in Burkholderia xenovorans LB400. PLoS One. 2013, 8 (2): e56038. 10.1371/journal.pone.0056038.
Pérez-Pantoja D, De la Iglesia R, Pieper DH, González B: Metabolic reconstruction of aromatic compounds degradation from the genome of the amazing pollutant-degrading bacterium Cupriavidus necator JMP134. FEMS Microbiol Rev. 2008, 32: 736-794. 10.1111/j.1574-6976.2008.00122.x.
Lykidis A, Pérez-Pantoja D, Ledger T, Mavromatis K, Anderson IJ, Ivanova NN, Hooper SD, Lapidus A, Lucas S, González B, Kyrpides NC: The complete multipartite genome sequence of Cupriavidus necator JMP134, a versatile pollutant degrader. PLoS One. 2010, 5 (3): e9729. 10.1371/journal.pone.0009729.
Jiménez JI, Miñambres B, Garcia JL, Díaz E: Genomic analysis of the aromatic catabolic pathways from Pseudomonas putida KT2440. Environ Microbiol. 2002, 4 (12): 824-841. 10.1046/j.1462-2920.2002.00370.x.
Kim SJ, Kweon O, Jones RC, Freeman JP, Edmondson RD, Cerniglia CE: Complete and integrated pyrene degradation pathway in Mycobacterium vanbaalenii PYR-1 based on systems biology. J Bacteriol. 2007, 189: 464-472. 10.1128/JB.01310-06.
This work was supported by a grant from the Next-Generation Biogreen 21 Program (PJ00806302), Rural Development Administration, Republic of Korea.
The authors declare that they have no competing interests.
PKA collected all the relevant publications, arranged the general structure of the review, drafted the text, and produced the figures. HHB revised and formatted the review and helped to draft the manuscript. All authors read and approved the final manuscript.
Cite this article
Arora, P.K., Bae, H. Integration of bioinformatics to biodegradation. Biol Proced Online 16, 8 (2014). https://doi.org/10.1186/1480-9222-16-8