Metabolomics/Databases




Overview

The vast amount of metabolomic information harvested using high-throughput techniques has necessitated an effective means of storage to organize, disseminate, and facilitate the analysis and annotation of that information. This need has driven the development of databases as repositories for the metabolomic data being produced. The data housed in these databases cover the wide spectrum of metabolomics research, from NMR spectra to metabolic pathway substrates and products.

Metabolomic databases serve the primary purpose of organizing information on the large catalog of metabolites encountered in metabolic pathways. Many such databases are available on the World Wide Web, and together they house a wide variety of information on a large variety of organisms.

Example Databases

Biological Magnetic Resonance Data Bank

The Biological Magnetic Resonance Data Bank (BMRB) focuses on quantitative data generated by spectroscopic investigations of biological macromolecules. It links to search resources such as PubChem, which connect to recent articles and new data, and to projects and other databases related to metabolomics and metabonomics. This database focuses on the NMR research aspect of metabolite discovery and on the roles metabolites play in metabolism. BMRB offers a large list of known compounds and the information associated with them.

Terms:

Relevance: This information relates to what we have studied in class because we have been studying metabolism and the metabolites involved. This resource is a collection of the knowledge currently available on those metabolites. The field of metabolomics is growing, and with the help of NMR spectroscopy more compounds and metabolites will be discovered along with their functions. The information studied in class forms the foundation for this knowledge.

Metabolomics: Resources, Reagents, and Kits for Metabolomic Analysis

The Sigma-Aldrich database provides access to a number of metabolomic kits and reagents, as well as a number of resources, including information on cell signaling pathways, enzyme structures/functions/specificities, animations, charts, and an online library. This site also provides links to other resources.

Terms:

Relevance: This website shows cell signaling and other metabolic pathways (including glycolysis) in an animated, in-depth way. This site also provides a search feature to find pathways related to molecules of your choosing.

Madison Metabolomics Consortium Database

The Madison Metabolomics Consortium Database contains metabolites determined through NMR and MS. Its main focus is Arabidopsis thaliana, but it also refers to many other species. The database also contains information on the presence of metabolites under several different physiological conditions, their structures in 2D and 3D, and links to related resources and other databases.

Terms:

Relevance: Using this website, it is possible to enter a molecule of interest into the search engine and obtain links that lead to a list of pathways in which that molecule participates. Doing this for glucose displayed two pathways that were covered in class: starch degradation (aka glycolysis) and glycogen degradation.

MetaCyc

The main focus of the MetaCyc Database is to collect and display information on experimentally studied pathways from a variety of organisms. Pathways are divided into five categories: biosynthesis, degradation/utilization/assimilation, detoxification, generation of precursor metabolites and energy, and Super-Pathways. Clicking on any of these opens more specific categories in outline format. This eventually leads to individual pathways that are described graphically. There are also descriptions with details about their history and connected pathways. The database can also be browsed by compounds and reactions, though these sections tend to be less detailed.

MetaCyc allows anyone to submit newly identified pathways, but its curators unsurprisingly require detailed, experimentally verified data, which are closely examined before any addition is accepted.

Terms:

Relevance: MetaCyc is closely related to the material that we have been learning about in class because it is a comprehensive database that covers many of the same pathways, such as glycolysis I (http://biocyc.org/META/NEW-IMAGE?type=PATHWAY&object=GLYCOLYSIS).

The Scripps Center for Mass Spectrometry: Metabolomics Science Webpage

The main focus of the Scripps Center for Mass Spectrometry is to provide a user-friendly website for scientists in the field of metabolomics. It provides general information on analytical tools, timelines of metabolomics history, metabolomics events held around the world, databases of metabolic systems, and bioinformatics software.

Terms:

Relevance: This website relates to the information that we have been studying in class because it is full of information about pathways and links to numerous databases. One such database is the KEGG Pathway Database, which contains a large collection of the pathways involved in metabolism. It shows such pathways as glycolysis, gluconeogenesis, the citrate cycle, the pentose phosphate pathway, galactose metabolism, pyruvate metabolism, and hundreds more. The glycolysis pathway can be viewed at http://www.genome.jp/kegg/pathway/map/map00010.html This website does a good job of showing how all the pathways are interconnected with one another.

The Human Metabolome Database

The Human Metabolome Database is an extremely comprehensive, free electronic database that gives a detailed overview of human metabolites divided into chemical, clinical, and molecular biology/biochemistry data.

Terms:

Relevance: The Human Metabolome Database is connected to our coursework by its thorough data on the metabolites that we have been studying. Reaction intermediates and products such as glucose, 3-phosphoglycerate, and citrate can all be looked up, and everything from the 3D structure to associated disorders is provided.

KNApSAcK

KNApSAcK is a Java application that presents an interactive display of biochemical information that can be searched by organism or metabolite name. KNApSAcK focuses primarily on the origin and mass spectra of particular metabolites.

Terms:

Relevance: KNApSAcK connects to our coursework because it allows for comparison of metabolites important to different organisms. One example search that was attempted was to see the metabolites shared by cyanobacteria and plants for photosynthesis.

BRENDA



The BRENDA developers describe it as the scientific community's main internet repository of functional enzyme data. An extremely robust system, it allows for searching of more than 4000 enzymes and provides comprehensive information on each of them, including indispensable reaction diagrams.

Terms:

Relevance: Information in this database reinforces what was covered in class. Material covered in class is also the foundation for the material in this database.

Reactome

The Reactome is a collaboration between Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and the Gene Ontology Consortium to provide a curated database that catalogs core pathways and reactions in human biology. The Reactome obtains information from researchers with expertise in their fields, and that information is cross-validated by a Reactome editorial team which references other databases such as the NCBI, Ensembl, and UniProt. Alongside the human pathways and reactions, the Reactome also contains inferred data from 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast, two plants and E. coli.

Current versions of the Reactome support searching by keyword and also offer a more visual approach: researchers can view a map of much of the data housed in the database, then select individual reactions and zoom in on them from the top level.

Terms

Relevance: Much of the data housed in the Reactome database covers pathways and reactions we have covered in the course, such as intermediary metabolism and regulatory pathways. Like many of the other metabolomics databases, it can be thought of almost like a textbook containing thousands of entries on metabolism and its associated events.

KEGG Pathway DB

The KEGG Pathway Database is a major component of the collection of databases that make up the Kyoto Encyclopedia of Genes and Genomes. The Pathway database is known for its extensive collection of metabolic pathways and its handling of their interconnections, as well as other non-metabolic cellular interactions. The database does an excellent job of integrating genomic, chemical and systemic functional information into an easily readable format.

Instead of new terms, enjoy this list of subsections of the database.

BMRB, MMCD and the Sesame laboratory module


Several databases have recently been developed as metabolomics resources, some of them intended specifically to assist in MS and NMR analyses. Among these are the BioMagResBank (BMRB), the Madison Metabolomics Consortium Database (MMCD) and a module for the Sesame laboratory information management system.

The BMRB comprises experimental spectral data for over 270 pure compounds. Each molecule entry includes five or six one- and two-dimensional NMR data sets, as well as compound source information, solution conditions, the data collection protocol and the NMR pulse sequences. Database entries can be accessed by name, monoisotopic mass and chemical shift. Currently in development is an open access feature that will allow users to contribute their own data and so expand the BMRB.
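As a rough illustration of what access by monoisotopic mass involves, the following is a minimal sketch that filters a small in-memory table by a mass window. The table contents, the function name find_by_mass and the ppm tolerance are assumptions made for this example; they are not the BMRB's actual interface or data.

```python
# Hypothetical in-memory table of (name, monoisotopic mass in Da).
# These few entries are illustrative; they are not taken from the BMRB.
COMPOUNDS = [
    ("glucose", 180.06339),
    ("fructose", 180.06339),
    ("citric acid", 192.02700),
    ("pyruvic acid", 88.01604),
]


def find_by_mass(query_mass, tolerance_ppm=10.0, table=COMPOUNDS):
    """Return the names whose mass lies within tolerance_ppm of query_mass."""
    # Convert the relative tolerance (parts per million) to an absolute window.
    window = query_mass * tolerance_ppm / 1e6
    return sorted(name for name, mass in table if abs(mass - query_mass) <= window)


print(find_by_mass(180.0634))
```

The query returns both glucose and fructose, because isomers share the same monoisotopic mass. This is one reason a mass query is usefully complemented by a search on NMR chemical shifts.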

The MMCD holds information on over 10,000 metabolites, consisting primarily of data collected on Arabidopsis metabolites. Users may submit queries based on MS and/or NMR spectra.

The Sesame laboratory module collects all metabolomics based experimental protocols, background information, and data for a particular study.

Link to article:

http://psb.stanford.edu/psb-online/proceedings/psb07/markley.pdf

References

CellCircuits: a database of protein network models

General Overview: This article provides a rationale for the development of CellCircuits, an open-access database that focuses on molecular network models. The database covers models that have been derived computationally and published in journal articles. The article explains that the ultimate goal of the project is to bridge the gap between molecular databases, even those with unconfirmed data, and strictly regulated pathway databases. The body of the article explores not only the rationale of CellCircuits, but also the computational process that went into developing it and some example results of molecular network models.

Terms:

Relevance: This article relates to our coursework because it shows some of the dizzying heights of complexity involved in trying to collate the growing body of metabolomics data into a usable form for the general science community.

ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites

General Overview – This article explains the development and use of the ProMEX mass spectral library database. The goal of the expanding database is to allow users to compare an unknown sample to a body of confirmed mass spectra for known proteins. The article explores some of the theory and algorithms that go into making that possible.

Terms

Threshold: When the mass spectra of an unknown, user-provided sample are compared against those in the database, the threshold value is the cutoff beyond which mass spectra hits are ignored because they are not considered matches.

Relevance – ProMEX is a relevant resource to our coursework because it shows how quickly the field of metabolomics is advancing. Using the search algorithms described in this article, users can now identify unknown proteins from experimental data in a quick and highly automated process.
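The idea of a match threshold can be sketched in a few lines. The example below assumes cosine similarity between binned spectra as the match score and a cutoff of 0.8; both are assumptions chosen for illustration, since the scoring method ProMEX actually uses is not described here, and the library entries are made up.

```python
import math


def cosine_score(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z bin: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query, library, threshold=0.8):
    """Return (name, score) hits at or above the threshold, best first."""
    hits = [(name, cosine_score(query, ref)) for name, ref in library.items()]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: -hit[1])


# Made-up reference spectra and query, for illustration only.
LIBRARY = {
    "peptide_A": {100: 1.0, 200: 0.5, 300: 0.2},
    "peptide_B": {150: 1.0, 250: 0.7},
}
query = {100: 0.9, 200: 0.6, 300: 0.1}

print(search_library(query, LIBRARY))
```

Only peptide_A is reported: peptide_B shares no peaks with the query, so its score falls below the threshold and the hit is ignored. Raising the threshold makes the search stricter, while lowering it returns more, but less reliable, candidates.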

Correcting ligands, metabolites, and pathways

General Overview – The authors of this article explain that the goal of their database, BioMeta, is to provide an example of the need to correct inaccurate pathways and chemical structures. After originally developing this database, they built tools to validate the data it contained by checking stereochemistry and stoichiometric balance, only to find that the data had a high error rate. The article explains the creation of the database and validation tools and the steps the authors took to make corrections.

Terms

Relevance – This article is relevant to our coursework because it explains that, given the incredibly vast amount of metabolomics data and the speed at which it is growing, errors are inevitable. The authors offer some insight into how this problem can be corrected and the necessity of doing so.

(Figure: the compound query window from the BioMeta database.)
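A stoichiometric check of the kind mentioned in the overview can be illustrated with a simple atom-balance test. The sketch below is an assumption about what such a check might look like, not the BioMeta authors' actual tool: the function names are hypothetical, it handles only plain formulas without brackets or charges, and it does not address stereochemistry.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side):
    """Sum atoms over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(substrates, products):
    """A reaction is atom-balanced when both sides contain the same atoms."""
    return side_total(substrates) == side_total(products)


# Glucose -> 2 lactic acid conserves every atom.
print(is_balanced([(1, "C6H12O6")], [(2, "C3H6O3")]))
# Glucose -> 2 pyruvic acid, written without its redox cofactors, loses hydrogens.
print(is_balanced([(1, "C6H12O6")], [(2, "C3H4O3")]))
```

The first reaction passes and the second is flagged, which is the kind of signal that lets a curator spot a pathway entry with a missing cofactor or a mistyped formula.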

HMDB: The Human Metabolome Database

The Human Metabolome Database (HMDB) was established in 2004 with the explicit aim of cataloging the whole human metabolome, just as the Human Genome Project unraveled the mysteries behind our genetic code. This paper covers the information contained in the database, which includes compound description, synonyms, physico-chemical data, structure, disease associations, pathway information, and NMR and MS spectra, among other things; each entry in the database contains 90 data fields filled with relevant information. The paper also serves as design documentation for the database, detailing how it was built with care to allow for efficient searching and explaining the quality control and curation of the database.

The HMDB is built upon a MySQL database that serves as the backend to the graphical web-page interface. Raw text found in the database is translated to HTML via special Perl scripts that also generate links and graphics. The MySQL database is part of a generalized metabolomic LIMS system called MetaboLIMS that utilizes Java to handle input and queries.

The robustness of the database allows researchers to search from many different angles including by chemical structure, BLAST, single and multiple sequences, MS and NMR spectra, and boolean text searches via GLIMPSE.

Terms

Relevance: The information housed in the HMDB can be traced through all of the coursework that has been covered so far. Many of the metabolites housed in the HMDB were directly discussed in both the textbook and the lectures. Of course, this is only a surface connection between the text and this paper, as the HMDB and other metabolic databases encompass the majority of the metabolism world: they serve as repositories in which all past and future research data can be stored.

Toward Pathway Engineering: A New Database of Genetic and Molecular Pathways

The KEGG database was created with the sole purpose of providing diagrams of molecular and genetic interactions to aid in the understanding of biological systems. Its creation was fueled in part by the completion of the Human Genome Project, as a way to take this massive amount of information and place it in the proper locations in a system. KEGG is connected to DNA and protein databases through integration with the tool DBGET, which searches across databases.

Terms

Relevance: The KEGG database is just another entry in the long line of databases which sum up much of the metabolic pathway information we have learned in class.

Articles and Web Pages for Review and Inclusion

Nutritional Metabolomics Database

A Liquid Chromatography-Mass Spectrometry-Based Metabolome Database for Tomato

Plant Physiology 141:1205-1218 (2006)

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.