Microarray Gene Expression Analysis: Data Transformation and Multiple-Comparison Bootstrapping
A simple transform function is proposed for preprocessing gene expression intensities, where the intensity can be that of a colored dye for cDNA microarrays or a gauge of probe matching for oligonucleotide arrays. A new measure of skewness is introduced to show that the transform effectively reduces the asymmetry of intensity values in the Affymetrix data of Golub et al. (1999). The transform approaches a logarithmic transform for large intensities but a linear transform for small intensities, which avoids the spurious ratios that arise from small intensities. When the intensity is the average difference (AD) score, the suggested transform preserves the stochastic nature of AD values rather than resetting negative values to arbitrary positive values. A conservative estimator of the fold change based on this transform is proposed. After the B-cell ALL and AML data of Golub et al. (1999) were transformed, a nonparametric bootstrapping method found 172 genes to be differentially expressed when controlling the family-wise error rate at the 5% level, and 709 when controlling the false-discovery rate at the 1% level.
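The abstract does not give the transform's functional form. As an illustration only, the sketch below uses the inverse hyperbolic sine, a well-known function with the properties described above: it is roughly linear near zero, roughly logarithmic for large arguments, and defined for negative inputs such as AD scores. The function name `asinh_transform` and the `scale` parameter are assumptions made for this example and may differ from the transform actually proposed.

```python
import math


def asinh_transform(x, scale=1.0):
    """Illustrative stand-in (an assumption, not necessarily the paper's
    transform): asinh(x / scale), which accepts negative intensities."""
    return math.asinh(x / scale)


# Small intensities: asinh(x) is close to x, so the transform is nearly linear.
small = 1e-3
print(asinh_transform(small), small)

# Large intensities: asinh(x) is close to log(2x), so the transform is
# nearly logarithmic.
large = 1e4
print(asinh_transform(large), math.log(2 * large))

# Negative values (e.g. negative AD scores) are mapped to negative values
# instead of being reset to an arbitrary positive floor.
print(asinh_transform(-50.0))
```

With a transform of this kind, a difference of transformed values behaves like a log ratio when both intensities are large, but like a plain difference when both are small, which is one way the spurious ratios of small intensities can be avoided.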