SPECIAL ARTICLE
1 From the Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA (JJM and JRY), and the National Institute of Biological Sciences, Beijing, China (M-QD)
2 Supported by grant no. DK074798 from the National Institutes of Health.
3 Reprints not available. Address correspondence to JR Yates III, Department of Chemical Physiology, 10550 North Torrey Pines Road, SR11, The Scripps Research Institute, La Jolla, CA 92037. E-mail: jyates{at}scripps.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Nutrigenomics, the study of the influences of nutrition or dietary components on the transcriptome, proteome, and metabolome of cells, tissues, or organisms at a given time, attempts to obtain this type of information (1). Genomic techniques to analyze transcript concentrations have been useful, but the transcript concentration is not always correlated with the protein concentration, and posttranslational modifications that alter protein activity cannot be detected. Nutritional proteomics applies proteomic methods to understand the effects of diet on proteins. Diet can alter the abundance or degree of posttranslational modification of a protein; quantitative proteomic methods are able to measure these changes (2). Although proteomic methods include any large-scale or high-throughput protein analysis, this review focuses only on quantitative proteomics using mass spectrometry (MS). Initially, researchers using MS-based methods were limited to generating lists of protein parts for various biological samples. Recent developments in MS and bioinformatics enable the quantification of thousands of proteins from complex samples. These advances are of great value to nutrition research because of the inherent complexity of typical samples collected in nutritional studies, including blood, cells, and tissues. With the use of quantitative MS, the status of multiple sets of proteins can now be monitored. Quantitative proteomic data also provide a more complete understanding of the effects of nutrition on cells, tissues and organisms.
The desire to obtain quantitative data has inspired the development of multiple proteomic techniques. Ideally, these techniques would be able to measure the absolute concentrations of each protein in a sample. Although determining the absolute concentration of a protein in a sample is possible, absolute quantitation requires that a known amount of labeled standard be spiked into the test sample. Because it is practical to synthetically generate such standards for only a small number of peptides, absolute quantitation is not possible for the large numbers of proteins typically measured in proteomic experiments. Therefore, large-scale proteomic experiments compare relative concentrations of proteins or protein modifications between 2 experimental conditions. Before going into details, we list in Table 1 several commonly used methods in quantitative proteomics and provide comments about their advantages and limitations. In brief, metabolic labeling with stable isotopes is the method of choice when possible, whereas chemical labeling may be more practical for clinical samples. Label-free methods are simple but sensitive enough only for large changes.
|
| SAMPLE PREPARATION |
|---|
|
|
|---|
After the sample has been prepared, there are 2 options for separating the individual components for analysis: gel-based 2-dimensional (2-D) electrophoresis, which separates proteins, or liquid chromatography (LC)–based multidimensional protein identification technology (MudPIT), which digests proteins into peptides before separation. One major problem for both 2-D gel and LC-based methods is the enormous dynamic range displayed by the proteome. The proteome has a range of 10 orders of magnitude, whereas these techniques have ranges of 10^2 to 10^4 (13). Separation methods to reduce the range and increase the depth of analysis are being developed (14, 15).
Gel-based label-free quantitative proteomics
Gel-based label-free quantitative methods were once considered the workhorse of proteomics (3). In these methods, the samples to be compared are run on separate 2-D gels, which separate proteins according to mass and isoelectric point. The gels are stained with Coomassie blue or silver stain, and computer programs are used to compare spot intensities; those that differ between treatments are selected for identification. The selected spot is cut from the gel and digested in-gel, and the resulting peptides are purified and analyzed by using MS. In this method, relative protein concentrations are determined by visualization within the gel, and the mass spectrometer is used only for protein identification.
These techniques were applied to nutritional research in a study to determine the effect of soy isoflavones on proteomic biomarkers from blood mononuclear cells isolated from postmenopausal women (16). Blood mononuclear cells were isolated from subjects after 8 wk of a diet supplemented with either a placebo or soy isoflavones. The cells were lysed, and the soluble protein fraction was isolated. Using 2-D gel separation, the investigators were able to resolve >700 different protein spots, 41 of which showed significant changes between the samples, and 29 of which were identified. The findings suggested that soy isoflavones may prevent atherosclerosis by increasing the antiinflammatory response of blood mononuclear cells.
Chemicals used for labeling samples are expensive, and thus the benefit of using label-free quantitative proteomic methods is the relatively low cost per experiment. However, the financial benefit may be outweighed by several other factors. 2-D gel separation of label-free samples is a labor-intensive process, because as many as 5 gels/sample may be required to determine statistically significant differences between treatments (3). Moreover, for extremely complex samples, co-migration can compromise the accuracy of protein quantification (4). Most significantly, the many steps of sample preparation, even when control and experimental conditions are handled in parallel, allow the introduction of errors into the experiment. Slight differences in running conditions for 2-D gels can complicate the alignment of protein spots, even with the aid of computer software. This possibility for variation places statistical requirements on results such that only large differences between samples can be considered biologically relevant. Many of these concerns are addressed by using labeled internal standards, which allow relative quantitation between experiments. This approach enables the researcher to run biological rather than technical replicates. The following 3 sections discuss the use of labeled standards, which may be combined with either gel-based or LC-based separation methods.
Internal standards
Quantitative proteomic methods using labeled internal standards attempt to generate standard proteins that are biologically and chromatographically indistinguishable from sample proteins but are easily distinguished by the researcher. Generating the internal standard is not a simple matter, because thousands of proteins or tens of thousands of peptides need to be differentially labeled.
Two strategies for generating internal standards are chemical labeling and metabolic labeling. Chemical labeling is done after the proteins have been isolated from a sample. Metabolic labeling occurs when organisms or cells incorporate a stable isotope that was added to their food or culture medium. Thus, the added steps of chemical labeling after the isolation of protein from a sample are avoided.
Gel-based quantitative proteomics using labeled internal standards
Difference gel electrophoresis uses chemical labeling of each sample with a different fluorescent dye (17). Once the samples are labeled and mixed, they are separated on the same 2-D gel. Individual spots usually represent a single protein, although each spot may have different amounts of the fluorescent dyes. By comparing the intensity of the fluorescence characteristic of each dye, relative concentrations of protein can be determined. Differentially expressed proteins are selected for identification, enzymatically digested, removed from the gel, and analyzed by MS. The resulting spectra are used to identify the protein. The use of fluorescent dyes increases the dynamic range of quantification but may interfere with protein identification (3).
Liquid chromatography–based quantitative proteomics using isotopic labeling
Unlike gel-based methods that quantify the protein concentration before identification in the mass spectrometer, LC-based methods use the mass spectrometer for both quantification and identification. Current mass spectrometers have sufficient resolution to differentiate isotopic peaks of peptides. This allows nonradioactive heavy isotopes, incorporated into proteins, to serve as internal standards for MS-based quantitative proteomics. Proteins contain atoms of nitrogen, carbon, oxygen, hydrogen, and sulfur, and, theoretically, heavy isotopes of any of these could be used for metabolic labeling. However, deuterium can alter chromatography, sulfur is the least abundant, and oxygen is expensive. Therefore, 13C and 15N are the practical choices for metabolic labeling, with 15N being least expensive and most commonly used (18).
The principle behind metabolic labeling of biological samples using stable isotopes is that heavy atoms and light atoms are chemically identical and that they appear identical to the organism yet are distinguishable by researchers. In their classic study, Meselson and Stahl (19) found that metabolically labeling Escherichia coli with a stable isotope can provide biological insights. E. coli were grown in media containing 15N-labeled ammonium chloride for many generations to thoroughly label their DNA with heavy nitrogen. These heavy-labeled bacteria were then fed unlabeled (14N) ammonium chloride for 2 generations. Using density-gradient centrifugation, the DNA from this second generation was found to consist of either fully unlabeled DNA or half-labeled DNA. Thus, Meselson and Stahl showed the semiconservative nature of DNA replication.
A less glamorous role for the bacteria is found in experiments by Dong et al (20), in which 15N-labeled bacteria were used as food to metabolically label the nematode Caenorhabditis elegans. After many generations of eating only 15N-labeled bacteria, all of the proteins in the worms had incorporated 15N. These heavy-labeled proteins are biologically, chemically, and chromatographically indistinguishable from their 14N counterparts, but they can be distinguished by a mass spectrometer. The heavy-labeled proteins serve as an internal standard in comparisons of unlabeled protein samples from worms grown under various conditions. Thus, the relative protein abundance can be compared between samples. Using stable isotopes and MudPIT (as discussed below), Dong et al determined the relative abundance of 1685 soluble proteins and found 86 to be differentially expressed in worms with an insulin-signaling defect. Further analysis proved the validity of these identifications and showed a novel regulatory mechanism of insulin signaling.
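The way a shared heavy-labeled standard links two unlabeled samples can be sketched in a few lines. Each sample is mixed with the same 15N standard, a light/heavy ratio is measured per protein, and dividing the two ratios cancels the standard. The function name, protein identifiers, and ratio values below are hypothetical and serve only as an illustration; they are not the procedure or data of Dong et al.

```python
def relative_abundance(light_heavy_a, light_heavy_b):
    """Compare two unlabeled samples through a shared heavy-labeled standard.

    Each argument maps a protein identifier to the measured light/heavy
    ratio (unlabeled sample / 15N standard). Only proteins quantified in
    both samples are returned.
    """
    shared = light_heavy_a.keys() & light_heavy_b.keys()
    return {p: light_heavy_a[p] / light_heavy_b[p] for p in sorted(shared)}


# Hypothetical light/heavy ratios for a control and a mutant sample.
control = {"protein_1": 1.0, "protein_2": 0.5, "protein_3": 2.0}
mutant = {"protein_1": 1.0, "protein_2": 2.0}

ratios = relative_abundance(control, mutant)
```

In this sketch, protein_3 is dropped because it was quantified in only one sample, which mirrors the requirement that both measurements be made against the same standard.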
Incorporation of 15N has been applied to many model organisms, including yeast (5), Arabidopsis (6), Drosophila (5), and rat (4, 21). Incorporation of 15N is not well suited for clinical samples because animals typically need to be raised on diets with only 15N to ensure complete labeling of their proteins. Obviously, this is not an option for human samples. However, it is conceivable that 15N-labeled mouse or rat tissue may be used as internal standards for clinical samples because of their conserved regions of proteins (22). In addition, data-processing algorithms may remove the need for complete labeling, thereby enabling metabolic labeling of other organisms (23). An extra benefit of 15N metabolic labeling is that these samples are excellent internal standards not only for quantitative proteomics, but also for quantitation of any nitrogen-containing metabolites, including amino acids and nucleic acids.
Although the incorporation of 15N into proteins is a possibility for cell culture by supplying 15N-labeled forms of all 20 amino acids (24), stable isotope labeling with amino acids in cell culture (SILAC) is the de facto method of choice. Rather than beginning with inorganic forms of heavy isotopes, as are used with bacteria or plants, commercially available forms of heavy-labeled amino acids are supplied to cells in culture (25). By using multiple forms of heavy arginine and lysine, proteins from up to 5 different biological states can be compared simultaneously. Although SILAC cannot be used to label tissues, it can be used to label differentiated cell lines. The labeled proteins from these cells are termed culture-derived isotope tags; they can be used as internal standards for comparing tissue samples. Even though the protein components of the culture-derived isotope tags will not completely overlap with the tissue proteome, hundreds of proteins can still be quantified (26). Therefore, with the use of culture-derived isotope tags, SILAC may be applicable to analysis of clinical samples.
By combining samples much earlier in the experiment, metabolic labeling techniques are able to minimize the variability introduced by sample processing, and thus they are generally preferable to chemical labeling techniques. However, chemical labeling is currently the best option for generating internal standards for analyzing clinical samples. There are many techniques for chemical labeling (27). Reagents are commercially available for cleavable isotope–coded affinity tag (cICAT) labeling and isobaric labeling, and these methods are more commonly used.
cICAT analysis relies on chemical labeling of cysteine residues with a 4-part tag containing a cysteine-reactive iodoacetamide, a biotin affinity tag, an isotopic linker, and an acid cleavage site (7). Samples and controls are labeled with tags of identical chemical composition but differing isotopic composition. Equal amounts of sample and control proteins are then mixed and enzymatically digested. The biotin tags are used for affinity purification. After cleavage of the tags, the peptides are separated by LC. One advantage afforded by the cICAT system is the selection of cysteine-containing peptides by affinity purification, which reduces sample complexity. However, this often results in the identification and quantitation of proteins on the basis of a single peptide, and both processes are prone to errors in the analysis of single peptides.
Chemical labeling techniques using isobaric tags, eg, isobaric tags for relative and absolute quantitation (iTRAQ) and tandem mass tags (TMTs), allow the combination of multiple samples without an increase in complexity. The 3-part isobaric tags have a reactive group to covalently link to primary amines, a reporter ion used for quantification, and a mass normalization group to compensate for the mass differences in the reporter ion. iTRAQ can use up to 8 different isotopic tags (8), whereas TMT has 6 tags available (9). Although isobaric labeling reagents are expensive and cannot be used with ammonium-based buffers (28), they are well suited for clinical studies and systems in which metabolic labeling is not feasible. However, chemical labeling must be completed before samples can be combined (Figure 1). These steps provide opportunities to introduce error (4), potentially compromising a key advantage of using internal standards.
|
Metabolic labeling with SILAC was applied by Kruger et al (30) to quantify changes in tyrosine-phosphorylated proteins in insulin-treated adipocytes. Cells were grown in either unlabeled media or media containing heavy-labeled arginine and lysine. After treatment with insulin and cell lysis, antiphosphotyrosine antibodies were used to enrich for tyrosine-phosphorylated proteins. This dramatically reduced sample complexity and allowed proteins to be separated by one-dimensional gel electrophoresis. Proteins extracted from the gel were digested, and the resulting peptides were analyzed by LC-MS. Kruger et al identified 40 proteins involved in insulin signaling and were able to identify 16 sites of tyrosine phosphorylation.
Although chemical labeling methods can be used for analyzing some protein modifications, metabolic labeling is more widely applicable because the modification may interfere with the site of chemical labeling. For example, iTRAQ labeling, which requires an intact lysine, can be used to study phosphorylation, which occurs on serine, tyrosine, and threonine (31). However, iTRAQ cannot be used for analyzing ubiquitination or other modifications that occur on lysine.
Researchers should also be aware that iodoacetamide, an alkylating reagent commonly used during sample preparation, can cause artifacts that appear as sites of ubiquitination (32). Therefore, chloroacetamide should be used in analyses of ubiquitinated proteins (32).
| DATA ACQUISITION |
|---|
|
|
|---|
Data acquisition for multidimensional protein identification technology
The MudPIT strategy takes the "shotgun" approach, using proteolytic enzymes to digest proteins into peptides before separation (33, 34). Trypsin, the most widely used proteolytic enzyme, cuts specifically after arginine and lysine, producing peptides of various sizes; typically, peptides of 7–25 amino acids are analyzed. Converting each protein into many peptides increases the number of objects that need to be separated. Although having more objects complicates the separation, similarly sized peptides behave more uniformly during chromatographic separation than do their parent proteins. Because each protein has a unique chromatographic profile, a given purification condition will likely exclude some proteins. Digesting a protein into multiple peptides, each with chromatographic properties different from their parent protein, increases the likelihood that at least some peptides from each protein will be identified. For example, a difficult-to-handle hydrophobic membrane protein may be identified through analysis of peptides derived from an intracellular domain.
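The digestion step described above can be illustrated with a minimal in silico sketch that cleaves after every arginine and lysine and keeps peptides of 7–25 residues. The function name and the example sequence are hypothetical, and the cleavage rule is deliberately simplified, as noted in the comments.

```python
def tryptic_digest(sequence, min_len=7, max_len=25):
    """Cleave after every K or R and keep peptides in the stated size window.

    Simplification: missed cleavages and sequence-context effects on
    trypsin activity are not modeled here.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical sequence used only for illustration.
protein = "MAGTEDLLKSVNDPAAWQRGKLLESTFHDGAIVKEL"
peptides = tryptic_digest(protein)
```

Here the short fragments "GK" and "EL" fall outside the size window and are discarded, whereas three peptides remain to represent the protein.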
After the peptides have been generated, they are bound to the strong cation exchange resin portion of a biphasic column. Typically, the peptides are eluted in multiple steps. During the first part of each step, a short salt gradient will cause a subset of peptides to elute from the strong cation exchange resin and onto the reverse-phase section of the column. During the second part of the step, the peptides are separated and enter the mass spectrometer. Often a series of 10–12 steps of increasing salt concentration is used to elute different sets of peptides from the cation exchange resin. A more detailed review of MudPIT appears in the article by Fournier et al (35).
Typically, tandem MS (MS/MS) is performed during LC-based methods. In MS/MS, a survey scan (MS1) detects peptide ions as they elute from the reverse-phase column. This scan contains the m/z and the relative intensity of all of the ions eluting at that moment. On the basis of the data in the MS1 scan, the mass spectrometer selects individual m/z values for further analysis. Generally, the most abundant ions are chosen, provided they were not chosen from a recent MS1 scan. Peptide ions selected from an MS1 scan are fragmented in a collision cell with helium. The relative abundance and the molecular masses of the fragments are measured in the second scan (MS2). The fragmentation pattern measured in the MS2 scan can be used to determine the amino acid sequence or the site of posttranslational modification on the peptide.
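The selection rule described above (most abundant ions first, skipping those chosen from a recent survey scan) can be sketched as follows. This is an illustrative sketch, not instrument control software; the function name, the number of precursors per scan, and the m/z tolerance are assumptions.

```python
def select_precursors(ms1_peaks, recently_selected, top_n=3, tolerance=0.01):
    """Pick the top_n most intense ions not selected from a recent MS1 scan.

    ms1_peaks: list of (m/z, intensity) tuples from one survey scan.
    recently_selected: m/z values chosen from recent scans (excluded here).
    """
    def excluded(mz):
        return any(abs(mz - old) <= tolerance for old in recently_selected)

    candidates = [(mz, inten) for mz, inten in ms1_peaks if not excluded(mz)]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan: (m/z, relative intensity).
scan = [(445.12, 100.0), (512.30, 80.0), (623.84, 55.0),
        (701.40, 30.0), (810.95, 10.0)]
selected = select_precursors(scan, recently_selected=[512.30])
```

Because the ion at m/z 512.30 was selected recently, the fourth most intense ion is fragmented in its place, which is how less abundant peptides eventually get sampled.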
Mass spectrometry requirements
Mass accuracy, resolution, mass range, and scan speed are factors to consider in choosing an instrument for quantitation. To quantify proteins in a complex mixture by using a shotgun approach, it is important to choose a fast-scanning mass spectrometer to identify as many peptides as possible for quantitation. At the same time, sufficient mass accuracy and resolution are required for reliable identification and quantitation. Therefore, it is important to balance the need for speed with the need for accuracy and resolution. Hybrid instruments combining a fast-scanning, low-resolution ion trap and a slow-scanning, high-resolution mass analyzer, such as the LTQ-FT or the LTQ-Orbitrap (Thermo Electron, San Jose, CA), are suited for this purpose. These instruments can simultaneously perform a high-resolution, high-mass accuracy MS1 scan and several low-resolution MS2 scans. As such, a large number of MS2 scans are collected to maximize the number of peptides identified, while the high quality of the MS1 spectra supports quantitation. Q-TOF instruments are also used for quantitative applications, eg, with SILAC (36). They are also well suited for measuring the iTRAQ reporter ions of m/z 114–117. These small ions cannot be trapped, and hence they cannot be detected by an ion trap under normal conditions. However, a technique has been developed by which ion-trap instruments can detect these ions (37).
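As a rough illustration of how the iTRAQ reporter ions at m/z 114–117 are read out of an MS2 scan, the sketch below sums the intensity near each nominal reporter m/z and expresses it relative to one channel. The function name, the 0.5 m/z window, the choice of channel 114 as reference, and the peak list are assumptions made for this example.

```python
def reporter_ratios(ms2_peaks, channels=(114, 115, 116, 117),
                    reference=114, window=0.5):
    """Sum intensity near each reporter m/z and express it relative to one channel.

    ms2_peaks: list of (m/z, intensity) tuples from one MS2 scan.
    """
    totals = {ch: 0.0 for ch in channels}
    for mz, intensity in ms2_peaks:
        for ch in channels:
            if abs(mz - ch) <= window:
                totals[ch] += intensity
    if totals[reference] == 0:
        raise ValueError("reference reporter ion not observed")
    return {ch: totals[ch] / totals[reference] for ch in channels}


# Hypothetical MS2 peaks; the two largest m/z values stand in for fragment ions.
spectrum = [(114.1, 200.0), (115.1, 400.0), (116.1, 100.0),
            (117.1, 200.0), (245.2, 900.0), (502.3, 650.0)]
ratios = reporter_ratios(spectrum)
```

Peaks outside the reporter windows are ignored for quantitation in this sketch; in an experiment they would be the fragment ions used to identify the peptide.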
Quantitative proteomics with isotopic labels
The information available in each scan is dependent on the type of isotopic labeling. For 15N metabolic labeling and cICAT, heavy and light peptides co-elute and thus are detected in the same MS1 scan in which their relative abundances are determined (Figure 2). The peptide fragmentation pattern generated in the MS2 scan is used for peptide identification (see Data processing, below).
|
Label-free liquid chromatography quantitative proteomics
Label-free quantitative methods for LC-based proteomics are well suited for high-throughput applications. When peptide ions are continuously eluted from the column and introduced to the mass spectrometer, total ion chromatograms containing relative abundance and retention time data are generated; these chromatograms can be aligned and compared between runs to detect differences in ion intensities. America et al (39) used label-free LC-based methods to measure differentially abundant proteins in ripe red and unripe green tomatoes. Analysis of total ion chromatograms of MS1 scans showed peaks corresponding to peptides that were differentially abundant in the samples. These peaks were selected for MS2 scans during a second MS analysis to determine the peptide sequence. They were able to generate chromatograms for 7000 peptides per run, and they found that the concentration of 3230 peptides was altered by at least a factor of 2 between ripe and unripe tomatoes.
A second label-free LC-based method is spectral counting, which uses MS2 data to determine relative protein abundance. Spectral counting records the number of times a peptide was identified from an MS2 scan. Because the mass spectrometer selects the most abundant peptides from an MS1 scan for a subsequent MS2 scan, the number of MS2 scans is correlated to protein abundance (11).
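Spectral counting as described above amounts to tallying identified MS2 spectra per protein and comparing the tallies between runs. The sketch below is a minimal illustration; the function names, the pseudocount used to avoid division by zero, and the identifications are assumptions and do not correspond to any particular software.

```python
from collections import Counter


def spectral_counts(identifications):
    """Count identified MS2 spectra per protein.

    identifications: iterable of (protein, peptide) pairs, one per MS2
    spectrum that was matched to a peptide.
    """
    return Counter(protein for protein, _ in identifications)


def count_ratio(counts_a, counts_b, pseudocount=1):
    """Ratio of spectral counts between two runs.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when a protein is missing from one run.
    """
    proteins = sorted(set(counts_a) | set(counts_b))
    return {p: (counts_a[p] + pseudocount) / (counts_b[p] + pseudocount)
            for p in proteins}


# Hypothetical identifications from two runs.
run_a = [("protein_1", "AAAK"), ("protein_1", "AAAK"),
         ("protein_1", "GGGR"), ("protein_2", "LLLK")]
run_b = [("protein_1", "AAAK")]

ratios = count_ratio(spectral_counts(run_a), spectral_counts(run_b))
```

With counts this small, a single extra spectrum changes the ratio substantially, which is one way to see why the method is better suited to large changes in abundance.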
These inexpensive methods are accurate for large changes in abundance but are not as well suited for small changes. Higher-precision data can be obtained by using differentially labeled samples.
| DATA PROCESSING |
|---|
|
|
|---|
There are also multiple algorithms for quantifying protein abundance (40). Database search programs usually report spectral counts for each protein or peptide identified. Label-free quantitation of ion chromatograms is possible with the use of a variety of software, including CENSUS (Internet: http://fields.scripps.edu/census) (44) and METALIGN (Internet: http://www.metalign.nl) (39, 40).
Heavy and light peptides labeled by SILAC, cICAT, and 15N appear as peaks of differing m/z values in the MS1 scan. For SILAC, the difference in molecular mass between the peaks depends on the number of arginines and lysines in the peptide; for 15N, the difference depends on the number of nitrogens; and for cICAT, the difference is 9 atomic mass units, the mass difference between the heavy and light tags. Quantitation of labeled peptides requires that both heavy and light forms can be seen in the MS1 spectra. Once either the heavy or the light peptide is identified, the m/z value of the sister peptide can be calculated and searched for in the MS1 spectra. After both heavy and light peptides are identified, an ion chromatogram derived from surrounding MS1 scans is generated for each, and the area under the curve is calculated. For iTRAQ and TMT experiments, the intensity of the reporter ions in the MS2 scan is compared.
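For 15N labeling, the two calculations described above (predicting the m/z of the sister peptide and integrating the ion chromatograms) can be sketched as follows. The sketch assumes complete labeling, counts one backbone nitrogen per residue plus side-chain nitrogens, and uses trapezoidal integration; the function names and chromatogram values are hypothetical and are not taken from CENSUS or any other program.

```python
N15_SHIFT = 0.997035  # mass difference between 15N and 14N, in Da

# Side-chain nitrogen atoms per residue; every residue also has one backbone N.
EXTRA_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_count(peptide):
    """Number of nitrogen atoms in an unmodified peptide."""
    return sum(1 + EXTRA_N.get(res, 0) for res in peptide)


def heavy_mz(light_mz, peptide, charge):
    """Expected m/z of the fully 15N-labeled sister peptide."""
    return light_mz + nitrogen_count(peptide) * N15_SHIFT / charge


def peak_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram."""
    return sum((t1 - t0) * (i0 + i1) / 2
               for t0, t1, i0, i1 in zip(times, times[1:],
                                         intensities, intensities[1:]))


# Hypothetical chromatograms sampled at the same retention times (min).
times = [10.0, 10.5, 11.0, 11.5, 12.0]
light = [0.0, 40.0, 80.0, 40.0, 0.0]
heavy = [0.0, 20.0, 40.0, 20.0, 0.0]
ratio = peak_area(times, light) / peak_area(times, heavy)
```

Because the mass shift is divided by the charge, the spacing between light and heavy peaks in the MS1 scan narrows as the charge state increases.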
| VALIDATION |
|---|
|
|
|---|
For MudPIT analysis, the most abundant ions detected in the MS1 scan are selected for analysis in the MS2 scan, and therefore abundant proteins are more reproducibly detected. Abundant proteins will likely be detected in most technical replicates, whereas less abundant proteins may be detected in <20% of replicate runs (11). This means that running technical replicates can verify results for abundant proteins but is unlikely to provide confirmation of results for less abundant proteins, as was observed in an analysis of human saliva. The abundance of the 20 most abundant proteins varied by only 10% between replicate runs, whereas less abundant proteins were sampled sporadically and had greater variation (45). It is interesting that the study by Millea et al also found significant variation between persons, which highlights the difficulty of interpreting highly specific and sensitive MS data, given the wide ranges observed in the general population (45, 46).
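The two quantities discussed above, how often a protein is detected across technical replicates and how much its measured abundance varies, can be summarized with a short sketch. The function name and the abundance values are hypothetical; the coefficient of variation is used here as one reasonable measure of run-to-run variation, not necessarily the one used in the cited studies.

```python
from statistics import mean, stdev


def replicate_summary(measurements):
    """Detection frequency and coefficient of variation per protein.

    measurements: dict mapping protein -> list of abundances, one entry per
    replicate run, with None where the protein was not detected.
    """
    summary = {}
    for protein, values in measurements.items():
        seen = [v for v in values if v is not None]
        detected = len(seen) / len(values)
        # At least two observations are needed for a standard deviation.
        cv = stdev(seen) / mean(seen) if len(seen) >= 2 else None
        summary[protein] = (detected, cv)
    return summary


# Hypothetical abundances from five replicate runs.
runs = {
    "abundant_protein": [100.0, 110.0, 90.0, 105.0, 95.0],
    "rare_protein": [None, 4.0, None, None, None],
}
summary = replicate_summary(runs)
```

A protein seen in only one run yields no estimate of variation at all, which illustrates why technical replicates alone cannot confirm results for less abundant proteins.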
Selected reaction monitoring and multiple reaction monitoring are MS-based approaches that focus on specific predefined peptides for analysis and that therefore can be used to verify results of less abundant proteins (47). Complex samples are reanalyzed, but only select ions, determined from previous experiments, are selected for MS2 analysis. When all other ions are ignored, more data are gathered for the ion of interest, and the data are held in higher confidence.
Given the diverse array of test conditions, large numbers of proteins analyzed, and biological variability, strict guidelines for selecting proteins for follow-up analyses are not easily determined. Often this decision is based on the difficulty of follow-up experiments, the gene ontology information, or the quality of data—and not on an expected difference (46).
Alternatively or in addition, protein abundance differences can be verified by the performance of biological replicates, in which new samples are analyzed. Biological replicates control for variability in the biological system being tested and not for variability in sample processing. Ideally, biological and technical replicates would be analyzed; however, limited resources often may result in compromise.
Traditional biochemical techniques, such as Western blots, can verify protein changes if antibodies are available. And, although protein and transcript concentrations may not always agree, similar changes observed in mRNA concentrations may provide encouragement for further analysis.
Determination of the biological significance of protein abundance changes for a large number of targets is a daunting task. RNA interference technology, which is amenable to high-throughput applications, may be the frontline tool for validation. Gene ontology analysis of target proteins can identify systems and pathways modulated during the experiment, which may provide hypotheses for future experiments.
| SUMMARY |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|
The authors' responsibilities were as follows—JJM and MQD: wrote the draft of the manuscript; and JRY: assisted in the revision of the manuscript. None of the authors had a personal or financial conflict of interest.
| REFERENCES |
|---|
|
|
|---|