Data Analysis

====**This section describes the analysis of gene expression data and the use of the GEO database, R, and Bioconductor to analyze microarray data. It also describes the file formats used to store gene expression data.**====

**This video discusses gene expression and NCBI's GEO website. It is a three-part lecture series covering an introduction to gene expression and how the data are stored in the GEO database.**
media type="custom" key="24720900"

**General Information for R and Bioconductor**
To get started with R and [|Bioconductor], it is important to know where to find help for the numerous functions, classes, and concepts you are about to come across. The ? operator is the most immediate source of information about R objects. More specialized sources of help are the R and Bioconductor mailing lists (http://www.r-project.org/mail.html, http://www.bioconductor.org/mailList.html).

To make the installation of Bioconductor packages as easy as possible, Bioconductor provides a Web-accessible script called biocLite that installs any Bioconductor package along with its dependencies. You can also use biocLite to install packages hosted on CRAN. The code below shows how to install Bioconductor packages using biocLite. The command update.packages can be used to check for and install new versions of already installed packages.
code
# Note: biocLite applies to older Bioconductor releases; from Bioconductor 3.8 onward, BiocManager::install() replaces it.
source("http://bioconductor.org/biocLite.R")
biocLite(c("graph", "GEOquery"))

# 1) To update Bioconductor packages from the Bioconductor repository
source("http://bioconductor.org/biocLite.R")
update.packages(repos=biocinstallRepos(), ask=FALSE)
code

Because genomic microarray data can be very complex, the Biobase package contains standardized data structures to represent genomic data. If you have access to .CEL or other files produced by microarray chip manufacturer hardware, the usual strategy is to read them with a Bioconductor package such as **affyPLM, affy, oligo, limma**, or **arrayMagic**. These packages have functions (e.g., // ReadAffy, expresso, or justRMA // in **affy**) that read CEL files, perform preliminary preprocessing, and represent the resulting data as an ExpressionSet or other type of object.

The // ExpressionSet // class is designed to combine several different sources of information into a single convenient structure. The data in an ExpressionSet consist of:
 * assayData: Expression data from microarray experiments (the name assayData hints at the methods used to access the different data components).
 * metadata: A description of the samples in the experiment (phenoData), metadata about the features on the chip or technology used for the experiment (featureData), and further annotations for the features, for example gene annotations from biomedical databases (annotation).
 * experimentData: A flexible structure to describe the experiment.
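
The components listed above can be illustrated with a small sketch that assembles an ExpressionSet from simulated values and then reads each part back. It assumes the Biobase package is installed. The gene names, sample names, lab details, and the annotation string "hypotheticalChip" are all invented for illustration and do not come from a real experiment.

```r
# Minimal sketch: assemble an ExpressionSet from simulated data.
# All gene, sample, and lab names below are invented for illustration.
library(Biobase)

set.seed(1)
# assayData: a features x samples matrix of expression values
exprsMat <- matrix(rnorm(12, mean = 8), nrow = 4,
                   dimnames = list(paste0("gene", 1:4),
                                   paste0("sample", 1:3)))

# phenoData: one row per sample, row names matching the matrix columns
pheno <- data.frame(group = c("control", "case", "case"),
                    row.names = colnames(exprsMat))

# experimentData: a free-form description of the experiment
expInfo <- MIAME(name = "Example Investigator",
                 lab = "Example Lab",
                 title = "Simulated experiment")

eset <- ExpressionSet(assayData = exprsMat,
                      phenoData = AnnotatedDataFrame(pheno),
                      experimentData = expInfo,
                      annotation = "hypotheticalChip")

dim(exprs(eset))      # dimensions of the expression matrix
pData(eset)           # sample descriptions
featureNames(eset)    # feature identifiers
experimentData(eset)  # experiment description
```

Keeping the matrix columns and the phenoData row names identical is what lets the object keep samples and their descriptions in step when it is subset.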

**The following diagram shows the R packages and functions used in each step of microarray data analysis.**

**Building an ExpressionSet by hand**
code
dataDirectory <- system.file("extdata", package="Biobase")
exprsFile <- file.path(dataDirectory, "exprsData.txt")
exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t", row.names=1, as.is=TRUE))
head(exprs) # to see the expression matrix
code

GEO distributes its data files in a format known as SOFT, which stands for Simple Omnibus Format in Text. Four types of GEO SOFT file are available:
 * **GEO Platform (GPL)** - These files describe a particular type of microarray. They are annotation files.
 * **GEO Sample (GSM)** - Files that contain all the data from the use of a single chip. For each gene there will be multiple scores including the main one, held in the VALUE column.
 * **GEO Series (GSE)** - Lists of GSM files that together form a single experiment.
 * **GEO Dataset (GDS)** - These are curated files that hold a summarised combination of a GSE file and its GSM files. They contain normalised expression levels for each gene from each sample (i.e. just the VALUE field from the GSM file).
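
SOFT files are plain text in which the first character of a line marks its role: `^` introduces an entity (such as a sample), `!` an attribute, and `#` a description of a data table column. The data table itself sits between `_table_begin` and `_table_end` marker lines. The following base R sketch splits a SOFT-style fragment along those lines. The fragment, including every identifier and value in it, is made up for illustration and is not a real GEO record. For real files, the getGEO function in the GEOquery package handles the download and parsing.

```r
# Minimal sketch: split a SOFT-style text into its parts by line prefix.
# The fragment below is invented for illustration; it is not a real GEO record.
softLines <- c(
  "^SAMPLE = example_sample",
  "!Sample_title = Example title",
  "!Sample_platform_id = example_platform",
  "#ID_REF = Probe identifier",
  "#VALUE = Normalised signal",
  "!sample_table_begin",
  "ID_REF\tVALUE",
  "probe_1\t7.25",
  "probe_2\t9.50",
  "!sample_table_end"
)

# Locate the data table between the begin and end markers
tableStart <- grep("_table_begin$", softLines)
tableEnd   <- grep("_table_end$", softLines)
tableLines <- softLines[(tableStart + 1):(tableEnd - 1)]
header     <- softLines[seq_len(tableStart - 1)]

# Classify the header lines by their first character
entity      <- sub("^\\^", "", grep("^\\^", header, value = TRUE))
attrLines   <- grep("^!", header, value = TRUE)
columnLines <- grep("^#", header, value = TRUE)

# Read the tab-delimited table; VALUE holds the main score for each probe
values <- read.delim(text = paste(tableLines, collapse = "\n"),
                     stringsAsFactors = FALSE)

entity
values
```

The VALUE column read here corresponds to the field that a GDS file carries forward from each GSM file.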

**Video Showing Differential Gene Expression Data Analysis**
**Video 1**
media type="custom" key="25175030"
**Video 2**

**Video showing the analysis of microarray gene expression data from a study cohort comprising 190 samples from patients suffering from acute lymphoblastic leukemia (ALL), from [|Den Boer et al.]. The major goal of this tutorial is to introduce:**
1. Unsupervised and supervised classification
2. Training, testing, and prediction against a random set

media type="youtube" key="oS94lFPfvbI" height="344" width="425"