Robust Cluster Analysis of Microarray Gene Expression Data with the Number of Clusters Determined Biologically
The success of any method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive methods of clustering genes are robust against violations of model assumptions. A measure of dissimilarity that combines the advantages of the Euclidean distance and the correlation coefficient is introduced; the measure can be made robust by using a rank-order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to the data of DeRisi et al. (1997), showing that rank-based methods perform better than log-based methods. Software is available from . or will have updates and related articles.
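The abstract does not give the formula for the combined Euclidean/correlation dissimilarity, so the sketch below illustrates only the robustness ingredient it names: a rank-order (Spearman) correlation between two expression profiles, turned into a dissimilarity as 1 - ρ. The 1 - ρ convention and all function names (`average_ranks`, `spearman`, `rank_dissimilarity`) are assumptions for illustration, not the author's software. Because only ranks enter the calculation, a single extreme value cannot pull the result the way it can with a Pearson correlation on raw or log-transformed values.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical rank-based dissimilarity 1 - rho, ranging from 0 to 2."""
    return 1.0 - spearman(x, y)


# Two gene profiles with the same ordering across four conditions; the
# outlier 100.0 in gene_b does not change the rank-based result.
gene_a = [0.1, 0.5, 2.0, 8.0]
gene_b = [1.0, 1.2, 1.5, 100.0]
print(rank_dissimilarity(gene_a, gene_b))  # 0.0
```

Profiles that rise and fall together get a dissimilarity near 0, unrelated profiles near 1, and perfectly opposed profiles 2; such a value could then be fed to any standard hierarchical or partitioning clustering routine.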