Chapter 9 The NCBI GEO Gene Expression database

(NCBI GEO: finding and analyzing expression profiles)

9.1 Overview

9.1.1 Abstract:

Introduction to the contents and utilities of the GEO mRNA expression database.

9.1.2 Objectives:

This unit will:

  • introduce the contents and utilities of the GEO mRNA expression database.

9.1.3 Outcomes:

After working through this unit you:

  • can access GEO, find expression datasets and analyze them with the provided tools.

9.1.4 Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

9.1.5 Prerequisites:

This unit builds on material covered in the following prerequisite units:

BIN-Data_integration (Data Integration)
BIN-EXPR-Analysis (Expression Analysis)

The transcriptome is the set of a cell's mRNA molecules. It originates from the genome (mostly) and gives rise to the proteome (again, mostly). RNA that is transcribed from the genome is not yet fit for translation but must first be processed: splicing is ubiquitous (see footnote 1), and RNA editing has also been encountered in many species. Some authors therefore refer to the exome, the set of transcribed exons, to indicate the actual coding sequence.

Microarray technology, the quantitative, sequence-specific hybridization of labelled nucleotides in chip format, was the first domain of "high-throughput biology". Today, it has largely been replaced by RNA-seq: quantification of transcribed mRNA by high-throughput sequencing and mapping of reads to genes. Quantifying gene expression levels in a tissue-, development-, or response-specific way has yielded detailed insight into cellular function at the molecular level, and recent single-cell sequencing experiments add a new level of precision. But not all transcripts map to genes: we increasingly realize that the transcriptome is not merely a passive buffer of expressed information on its way to being translated into proteins, but contains multiple levels of complex regulation through hybridization of small nuclear RNAs [(2015) The noncoding explosion. Nat Struct Mol Biol 22:1. (pmid: 25565024); Jarvis & Robertson (2011) The noncoding universe. BMC Biol 9:52. (pmid: 21798102)].

NCBI's GEO database stores expression data together with experiment metadata and makes both publicly available.
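
To make the split between data and metadata concrete, here is a minimal sketch of reading metadata from a GEO record in SOFT format, one of the plain-text formats GEO offers for download. In SOFT files, lines starting with "^" open a new entity (a series, sample or platform) and lines starting with "!" carry that entity's attributes. The record text below is invented and heavily shortened for illustration; the accessions, titles, and the function name parse_soft are assumptions, not real GEO content or part of any GEO tool.

```python
from collections import defaultdict

# Invented, heavily shortened SOFT-style text; NOT a real GEO record.
EXAMPLE_SOFT = """\
^SERIES = GSE0000
!Series_title = Hypothetical heat-shock time course
!Series_sample_id = GSM0001
!Series_sample_id = GSM0002
^SAMPLE = GSM0001
!Sample_title = control, 0 min
!Sample_organism_ch1 = Saccharomyces cerevisiae
"""


def parse_soft(text):
    """Group SOFT attribute lines under the entity they belong to.

    Returns a dict of the form
    {(entity_type, accession): {attribute_name: [value, ...]}}.
    Attributes may repeat, so every value is collected in a list.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SERIES = GSE0000"
            kind, _, accession = line[1:].partition("=")
            current = (kind.strip(), accession.strip())
            records[current] = defaultdict(list)
        elif line.startswith("!") and current is not None:
            # Attribute line, e.g. "!Series_title = ..."
            key, _, value = line[1:].partition("=")
            records[current][key.strip()].append(value.strip())
        # "#" column descriptions and data-table rows are skipped here.
    return records


records = parse_soft(EXAMPLE_SOFT)
for (kind, accession), attributes in records.items():
    print(kind, accession, dict(attributes))
```

The sketch deliberately ignores the expression values themselves, which appear in tab-separated data tables inside real SOFT files. For actual analysis, an established parser is the safer choice.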

9.2 Task 20

Read the article below for a comprehensive introduction to the GEO database. Read actively: access the GEO database yourself and follow along on the Web with what the paper describes.

Clough & Barrett (2016) The Gene Expression Omnibus Database. Methods Mol Biol 1418:93-110. (pmid: 27008011)
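
While following along, you will meet GEO accession numbers whose prefix tells you the record type: GPL for platforms, GSM for samples, GSE for series and GDS for curated datasets. The following minimal sketch turns an accession into its record type and the URL of its GEO accession page, so that you can jump directly to records mentioned in the paper. The function name describe_accession and the example accession GSE12345 are placeholders chosen for illustration; the sketch only builds a URL and does not check that the record exists.

```python
import re
from urllib.parse import urlencode

# Base URL of the GEO accession display page.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Accession prefixes and the record types they denote.
RECORD_TYPES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}


def describe_accession(accession):
    """Return (record_type, url) for a GEO accession such as 'GSE12345'.

    Raises ValueError if the string does not look like a GEO accession.
    """
    acc = accession.strip().upper()
    match = re.fullmatch(r"(GPL|GSM|GSE|GDS)(\d+)", acc)
    if match is None:
        raise ValueError(f"not a recognized GEO accession: {accession!r}")
    url = f"{GEO_ACC_URL}?{urlencode({'acc': acc})}"
    return RECORD_TYPES[match.group(1)], url


# 'GSE12345' is a placeholder, not a record chosen for its content.
record_type, url = describe_accession("GSE12345")
print(record_type, url)
```

Pasting the printed URL into a browser opens the corresponding record page, which is a convenient starting point for exploring the sample and platform records linked from a series.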

9.3 Self-evaluation

If in doubt, ask!
If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.


Author: Boris Steipe boris.steipe@utoronto.ca
Created: 2017-08-05
Modified: 2017-11-10
Version: 1.0
Version history:
1.0 First live version
0.1 First stub

  1. Strictly speaking, splicing is a eukaryotic achievement; however, there are examples of splicing in prokaryotes as well.