CAGE

   
 
 

Compendium of Arabidopsis Gene Expression

   
   
     
    Objectives:
   

The CAGE project aims to build a gene expression reference database. A consortium of European Arabidopsis functional genomics centres has teamed up with bioinformatics partners that contribute expertise in microarray data processing, analysis, storage and distribution. A total of 2000 Arabidopsis samples will be produced and analysed under largely standardised conditions. These samples will be profiled on CATMA microarrays, which contain gene-specific probes for most Arabidopsis genes, to build a Compendium of expression profiles. The data will be assessed for statistical significance and submitted to the ArrayExpress database at the European Bioinformatics Institute (EBI). EBI will deliver a CAGE-specific ontology and data submission pipelines. The Compendium data will be annotated and analysed for content and for confirmation of gene function. EBI will continue to maintain the Compendium.

   
   
    Aims:
   
  • To demonstrate the utility of the CATMA array, a novel microarray format for whole-genome Arabidopsis transcription profiling (see: http://www.catma.org);
  • To deliver a prototype Compendium of Arabidopsis expression profiles which is a catalogue of molecular phenotypes describing plant development and function; such a catalogue will be a reference for future transcription analysis in Arabidopsis as it will be maintained and accessible through the European Bioinformatics Institute;
  • To demonstrate that adherence to guidelines, standard protocols and procedures will generate added value through coherence and quality of data;
  • To establish data submission and processing pipelines, and analytical tools for the Arabidopsis research community, facilitating future genome-wide Arabidopsis transcription profiling.
     
     
    Description of the work:
   
Arabidopsis functional genomics today faces the immense challenge of mapping genomic sequence to function, since most of the 29,000 or more genes identified in the Arabidopsis genome have not been characterised experimentally. Microarray-based expression analysis is a particularly powerful technology for associating genes with function. In the CAGE project we will build a publicly available functional genomics knowledge base using the novel CATMA microarray. The project will demonstrate both the power of this microarray, which is designed to discriminate strongly between gene homologues, and the added value of analysing microarray data in a Compendium format.
       
   

Arabidopsis thaliana Columbia, growth stage 1.04, BBCH scale (Boyes et al., Plant Cell, 200,)

     
   
To accomplish this, we have brought together a consortium of European laboratories: a series of plant research centres (URGV/France, VIB/Belgium, HRI/United Kingdom, MPI/Germany, SLU/Sweden, RUU/Netherlands, CSIS/Spain and UNIL/Switzerland); the VIB-Micro Array Facility (VIB-MAF); and partners that excel in developing statistics and mining algorithms for the analysis of gene expression (ESAT/Belgium, EBI/United Kingdom). All microarrays will be produced by VIB-MAF, thereby controlling variance and reducing cost. A total of 4000 microarrays will be provided to the project partners. Together they will analyse 2000 carefully chosen Arabidopsis samples (two chips per sample, reference design) to explore Arabidopsis’ “developmental and functional space”. Samples will consist of biological replicates; some tissues and organs will be sampled even more extensively.
   
   
The resulting data will be statistically analysed for quality and significance by ESAT, and subsequently submitted to the central ArrayExpress database at EBI. EBI, in collaboration with TAIR, will deliver a CAGE-specific ontology and data submission pipelines. The Compendium data will be annotated with pre-computed results and thoroughly analysed for content and for proof of gene function, as will be demonstrated in publications.
     
   
     
    Deliverables:
   
The duration of the project is 3 years. A total of 4000 microarrays with at least 25,000 features each will be produced over a period of 18 months. The partners will assemble 2000 biological samples, and processing them on microarrays will generate close to 100,000,000 data points. The reference sample used in all comparisons will be made publicly available. The first data will be produced in the summer of 2003, and data production will continue until the end of 2004. All data will first be analysed by the consortium partners and submitted for publication before the Compendium data are released to the scientific community. However, we will keep the lag time for data submission as short as possible (< 6 months). Data processing pipelines and pre-computed results will be released for public use. Submission of the final data will be completed by the end of 2005. The Arabidopsis thaliana ArrayExpress database will be maintained by the EBI.
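
As a rough check on the figures above, the short Python sketch below recomputes the array count and the data volume from the stated numbers. It assumes one data point per feature per array, which is an illustrative simplification and not a statement about how the consortium counts measurements; the function and constant names are hypothetical.

```python
# Figures taken from the project description; names are illustrative only.
SAMPLES = 2000                    # biological samples to be assembled
CHIPS_PER_SAMPLE = 2              # two chips per sample (reference design)
MIN_FEATURES_PER_ARRAY = 25_000   # "at least 25,000 features" per microarray


def total_arrays(samples: int, chips_per_sample: int) -> int:
    """Number of microarrays needed to profile all samples."""
    return samples * chips_per_sample


def min_data_points(arrays: int, features_per_array: int) -> int:
    """Lower bound on data points, assuming one value per feature per array."""
    return arrays * features_per_array


arrays = total_arrays(SAMPLES, CHIPS_PER_SAMPLE)
points = min_data_points(arrays, MIN_FEATURES_PER_ARRAY)
print(f"{arrays} arrays, {points:,} data points")
```

Under that assumption, 2000 samples on two chips each account for the 4000 microarrays, and 4000 arrays of 25,000 features give the order of magnitude quoted for the data points.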
   
     

 
