Abstract: Sarcomas in children, adolescents, and young adults account for about 10 percent of cancer in this age range, with survivals ranging from about 95 percent in favorable rhabdomyosarcoma to nearly 10 percent in most patients with metastatic disease. Currently, there is no known biologic reason for these vastly different behaviors. For the majority of these sarcomas, we lack reliable methods to predictively segregate histologically similar tumors with very different outcomes, and we do not know the molecular basis for their cellular or disease phenotypes, including different drug responsiveness. To address these problems, we propose a scalable, high-throughput functional genomics approach centered on generating and analyzing large-scale gene expression profiles. A primary goal is to obtain the most comprehensive gene expression state measurements possible for approximately 500 tumors per year. Samples will come from the three pediatric cooperative groups that, in aggregate, account for nearly 95 percent of children in North America with cancer. To maximize gene representation, productivity, and economics, we will use a mix of two different kinds of array measurements already established in our labs. Selected results from array measurements will be subjected to confirmatory experimental analyses (Northerns, quantitative PCR, tissue in situ hybridization, immunohistochemistry, etc.). We believe, however, that the greatest challenge in this work is in data management and analysis. To meet this challenge, arrays are made and expression data are acquired and stored in an integrated, web-accessible object database. The database is linked to MIMIR, an evolving suite of both novel and standard clustering algorithms and statistical methods that will be used to analyze expression data and other types of pertinent data.
Gene expression “signatures” derived from initial clustering analyses will then be mined for correlations with clinical data, and those correlations will be evaluated for significance. Within this project we are also developing ways to measure the robustness of gene expression clusters, the strength of membership of a gene in one or more clusters, and the relatedness of clusters to each other and to other data types. Proposed work also includes ongoing development of user-friendly interfaces for viewing data and its annotations, to help biologists use the results to generate new hypotheses about drug targets, the biological basis for metastasis, drug sensitivities, and tumor classification.
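
As an illustration of what "strength of membership of a gene in one or more clusters" could mean in practice, the following minimal sketch scores a gene against each cluster by the Pearson correlation between its expression profile and the cluster's mean profile. This is an assumed, simplified measure chosen for illustration only; it is not a description of the methods in MIMIR, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation is undefined, report 0 here
    return cov / (sx * sy)


def centroid(profiles):
    """Mean profile of a cluster, computed sample by sample."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def membership_strengths(gene_profile, clusters):
    """Return {cluster name: correlation of the gene with that cluster's centroid}."""
    return {
        name: pearson(gene_profile, centroid(members))
        for name, members in clusters.items()
    }


# Hypothetical toy data: two clusters measured across four tumor samples.
clusters = {
    "up": [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]],
    "down": [[4.0, 3.0, 2.0, 1.0], [5.0, 4.0, 3.0, 2.0]],
}
gene = [0.5, 1.5, 2.5, 3.5]
strengths = membership_strengths(gene, clusters)
```

Under this assumed measure, a gene can have appreciable scores for more than one cluster, which is one simple way to express partial or shared membership rather than a single hard assignment.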