
Ebook Analysis Of Dynamic Protein Expression Data

While biochemical research in the last decade focused on the genome, attention is now turning to the proteome. The large gene-expression data sets obtained from DNA microarrays made it necessary to develop statistical methods for drawing correct inferences from these measurements. For quantitative protein expression analysis, either mass spectrometry (cf. Aebersold and Goodlett ([1]) and Gygi et al. ([7])) or two-dimensional gel electrophoresis (2-DE) (cf. Westermeier et al. ([14])) is applied. In this paper we focus on the analysis of protein expression data obtained with a new detection method, Difference Gel Electrophoresis (DIGE), which is based on fluorescence labelling prior to 2-DE. 2-DE separates the proteins of a mixture into distinct spots according to their isoelectric point (pI) and molecular size. After separation, the proteins are detected with a confocal fluorescence scanner, and the fluorescence intensity of a spot can be regarded as a measure of the expression of the corresponding protein. DIGE allows the user to load up to three different protein mixtures onto the same gel.

The different mixtures are labelled with different fluorescent dyes (Cy2, Cy3 and Cy5). For quantitative proteome analysis, image analysis software automatically determines the boundaries and sizes of the spots. Typically, a DIGE experiment is designed so that m independent replicates of the treatment mixture and of the control mixture are run together on the same m gels. An internal standard, a mixture of equal amounts of all m treatment and m control samples, is also loaded onto each gel. Because the same internal standard appears on every gel, it allows the expression values to be calibrated with high accuracy. Calibration and normalization of protein expression data are reviewed in section 2. To capture interactions of treatment and control with time, DIGE experiments often include measurements at several time points. Established statistical methods for longitudinal data can be used to analyze such experiments; one possible approach is detailed in section 3. In addition, 2-DE data often contain up to 50% missing values.
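The role of the internal standard can be illustrated with a small sketch. It assumes, as is common but not stated here, that Cy2 carries the internal standard while Cy3 and Cy5 carry treatment and control, and it takes calibration to mean expressing each spot volume as a log2 ratio to the internal-standard volume of the same spot on the same gel. The function names, dictionary keys and toy volumes are hypothetical, and the calibration reviewed in section 2 may differ in its details.

```python
import math
from typing import Dict, List, Tuple


def standardized_abundance(volume: float, standard_volume: float) -> float:
    """log2 of a spot volume relative to the internal-standard volume
    of the same spot on the same gel."""
    if volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return math.log2(volume / standard_volume)


def calibrate(gels: List[Dict[str, float]]) -> Tuple[List[float], List[float]]:
    """Return per-gel standardized abundances (treatment, control) for one spot.

    Each gel is a dict with the hypothetical keys 'treatment' (assumed Cy3),
    'control' (assumed Cy5) and 'standard' (assumed Cy2 internal standard).
    """
    treatment = [standardized_abundance(g["treatment"], g["standard"]) for g in gels]
    control = [standardized_abundance(g["control"], g["standard"]) for g in gels]
    return treatment, control


# Toy volumes for one spot on m = 2 gels; gel 2 has twice the overall signal.
gels = [
    {"treatment": 200.0, "control": 100.0, "standard": 100.0},
    {"treatment": 400.0, "control": 200.0, "standard": 200.0},
]
treatment, control = calibrate(gels)
print(treatment, control)  # [1.0, 1.0] [0.0, 0.0]
```

The raw volumes on the two toy gels differ by a factor of two, yet the standardized abundances agree: a gel-wide loading or scanning factor cancels in the ratio to the internal standard.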

The missing values arise because, when samples are replicated on several gels, not every protein is visible on every gel. For example, gel one shows 1732 protein spots and gel two shows 1967, but only 1447 of these spots belong to proteins represented on both gels. Some statistical methods, however, require complete data sets, for example some methods for detecting differentially expressed genes (cf. Gannoun et al. ([6])) or correspondence analysis for microarray data (cf. Fellenberg et al. ([5])). These methods could also be applied to protein expression data if the data sets were complete. One way to overcome this problem is to estimate the missing values from the available measurements of other proteins. In section 4, we investigate how the k-nearest-neighbor (kNN) method behaves when applied to DIGE data. Troyanskaya et al. ([13]) also used this method to estimate missing values in gene expression data. The idea behind the method is that proteins fall into groups with similar expression profiles, so a missing value of a protein can be estimated from the available values of other proteins in the same group.
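The kNN idea can be sketched as follows, in the spirit of Troyanskaya et al. ([13]) but not necessarily matching the variant studied in section 4. Rows are proteins, columns are measurements (for example gels or time points), and `None` marks a spot that was not detected. For each missing entry, the sketch finds the k proteins closest over the columns both have observed, using a root-mean-square distance (Euclidean distance scaled by the number of shared columns, an assumption), and fills the gap with their inverse-distance-weighted mean. The function names, the distance scaling and the toy matrix are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def rms_distance(a: Row, b: Row) -> Optional[float]:
    """Root-mean-square difference over columns observed in both rows,
    or None if the rows share no observed column."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing entry data[i][j] from the k nearest rows that observe column j."""
    filled = [list(row) for row in data]  # copy so the input is left untouched
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other proteins that were measured in column j.
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                d = rms_distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            if not candidates:
                raise ValueError(f"no neighbour observes column {j} for row {i}")
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            exact = [v for d, v in nearest if d == 0.0]
            if exact:
                # Identical profiles: use them directly to avoid dividing by zero.
                filled[i][j] = sum(exact) / len(exact)
            else:
                weights = [1.0 / d for d, _ in nearest]
                filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


# Toy matrix: 4 proteins x 4 measurements; protein 2 is missing its third value.
spots = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, None, 3.9],
    [5.0, 1.0, 0.0, 2.0],
]
filled = knn_impute(spots, k=2)
print(round(filled[2][2], 4))  # 3.0333
```

Distances are always computed on the originally observed values, so an imputed entry never influences another imputation.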
