


Statistical Analysis of Microarray Data with R

Dr. Shailaja Deshmukh and Dr. Sudha Purohit

Aim of Course:

In this course, participants will learn the statistical tools required for the analysis of microarray data, how to apply them using R, and how to interpret the results meaningfully. We will review the biology relevant to microarray data, then cover microarray experimental setup, quantification of the information generated by the experiment, preprocessing of data (including statistical tools for between-array and within-array normalization), statistical inference procedures to identify genes that are differentially expressed under two conditions, and the extension of these procedures to more than two conditions. The course will also introduce multivariate statistical tools such as principal component analysis and cluster analysis. These tools help to identify differentially expressed genes and sets of co-regulated genes, which in turn helps to assign functions to genes.

Who Should Take This Course:

Biologists and geneticists who need to use statistical methods to analyze microarray data, as well as computer scientists and statisticians involved in microarray analysis projects. The course is designed to bridge the gap between several disciplines by providing the necessary information to participants with varied backgrounds.

For those enrolled in Professional Advancement Programs, this is a required or elective course in the following Programs:

  • Biostatistics (epidemiology) - elective
  • Data Mining - elective

Course Program:

The course is structured as follows:

SESSION 1: Introduction to R*
  • Starting and stopping R, data types, using R as a simple calculator, methods of data input: c function, scan function and sequence function.
  • Importing data from text file, using data editor for entering and editing data.
  • Data frames and lists.
  • Using resident data sets.
  • Accessing (indexing) data in a vector and in a data frame; the attach, detach, and transform functions.
  • Graphics with R: Histogram, box plot, scatter plot.
  • Some functions such as mean, variance, standard deviation, coefficient of variation, mean absolute deviation, quantiles, sort, order.
  • Using on-line help.
  • Writing simple functions.

*If you are familiar with R and this material, you may join the course in its second week.
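
To give a feel for the level of the Session 1 material, the following is a minimal sketch in base R of a few of the listed topics (data input with c and seq, data frames, indexing, transform, summary functions, and writing a simple function). The data values and the function name cv are made up for illustration and are not taken from the course materials.

```r
# Data input with c() and seq(); the values are hypothetical intensities
x <- c(12.1, 9.8, 15.3, 11.0, 13.6)
idx <- seq(1, 5, by = 2)          # 1 3 5
x[idx]                            # indexing a vector

# A small data frame, row selection by condition, and transform()
df <- data.frame(gene = paste0("g", 1:5), intensity = x)
df[df$intensity > 12, ]
df <- transform(df, log2_intensity = log2(intensity))

# Some descriptive functions
mean(x); var(x); sd(x)
quantile(x, probs = c(0.25, 0.5, 0.75))
sort(x)                           # sorted values
order(x)                          # positions that would sort x

# Writing a simple function: coefficient of variation (illustrative name)
cv <- function(v) sd(v) / mean(v)
cv(x)
```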

SESSION 2: Background of Microarrays and Normalization

  • Microarray experimental set up and quantification of information available from microarray experiments.
  • Data cleaning.
  • Transformation of data.
  • Between-array and within-array normalization.
  • Concordance coefficients and their use in normalization.
  • Numerical illustration for 4-6 with a complete set of annotated R commands.
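
As a rough illustration of the transformation and between-array normalization topics above, here is a minimal base R sketch on simulated data. It uses a log2 transformation followed by quantile normalization, which is one common between-array method; the course text may well use different methods, and the matrix raw, its dimensions, and the helper quantile_normalize are assumptions made for this example.

```r
set.seed(1)
# Hypothetical raw intensities: 100 genes (rows) x 4 arrays (columns)
raw <- matrix(rexp(400, rate = 1 / 500) + 50, nrow = 100, ncol = 4,
              dimnames = list(paste0("gene", 1:100), paste0("array", 1:4)))
raw[, 2] <- raw[, 2] * 1.8        # simulate one array that is brighter overall

# Transformation: work on the log2 scale
logx <- log2(raw)

# Quantile normalization (illustrative helper, not from the course text):
# every array is forced to share the same empirical distribution
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  ref    <- rowMeans(sorted)                          # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to ref
  dimnames(out) <- dimnames(x)
  out
}
norm <- quantile_normalize(logx)

# Compare array medians before and after normalization
apply(logx, 2, median)
apply(norm, 2, median)
```

After this step the columns of norm contain the same set of values in different orders, so side-by-side box plots of the arrays line up.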

SESSION 3: Statistical Inference Procedures in Comparative Experiments
  • Basics of statistical hypothesis testing.
  • Two-sample t-test.
  • Paired t-test.
  • Tests for validating assumptions of t-test.
  • Welch test.
  • Wilcoxon rank sum test, signed rank test.
  • Adjustments for multiple hypothesis testing, including the false discovery rate.
  • Numerical illustration for 2-8 with a complete set of annotated R commands.
  • One-way ANOVA.
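
The gene-by-gene testing and multiple-testing adjustment listed above can be sketched in base R roughly as follows. This is a simulated example under stated assumptions, not the course's own illustration: the group sizes, the 20 shifted genes, and the object names are invented, and the Benjamini-Hochberg procedure is used as one standard way of controlling the false discovery rate.

```r
set.seed(42)
n_genes <- 200
# Hypothetical log2 expression: 5 control and 5 treated arrays per gene
control <- matrix(rnorm(n_genes * 5, mean = 8, sd = 0.5), nrow = n_genes)
treated <- matrix(rnorm(n_genes * 5, mean = 8, sd = 0.5), nrow = n_genes)
treated[1:20, ] <- treated[1:20, ] + 2   # first 20 genes are truly shifted

# One test per gene; t.test() performs the Welch test by default
# (var.equal = FALSE), so equal variances are not assumed
p_raw <- sapply(seq_len(n_genes), function(i)
  t.test(treated[i, ], control[i, ])$p.value)

# Benjamini-Hochberg adjustment for the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

# Number of genes flagged before and after adjustment
sum(p_raw < 0.05)
sum(p_bh < 0.05)
```

Replacing t.test with wilcox.test in the same loop gives the rank-based analogue, and adding paired = TRUE gives the paired versions when the two conditions are measured on matched samples.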

SESSION 4: Multivariate Techniques
  • Principal component analysis.

SESSION 5: Clustering
  • Cluster analysis.
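
For orientation, principal component analysis and cluster analysis of arrays can be run in base R along the following lines. This is only a sketch on simulated data: the matrix expr, the array labels, and the choice of Euclidean distance with average linkage are assumptions for illustration, and the course may present other variants.

```r
set.seed(7)
# Hypothetical log2 expression: 50 genes x 6 arrays (3 control, 3 treated)
expr <- matrix(rnorm(300), nrow = 50,
               dimnames = list(paste0("gene", 1:50),
                               c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3   # genes 1-10 raised in treated arrays

# PCA with arrays as observations (rows) and genes as variables (columns)
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
summary(pca)$importance[2, 1:2]          # proportion of variance, PC1 and PC2
head(pca$x[, 1:2])                       # array scores on the first two PCs

# Hierarchical clustering of arrays: Euclidean distance, average linkage
hc <- hclust(dist(t(expr)), method = "average")
groups <- cutree(hc, k = 2)              # cut the tree into two clusters
table(groups, rep(c("ctrl", "trt"), each = 3))
```

Calling plot(hc) draws the dendrogram, and clustering genes instead of arrays only requires passing expr rather than t(expr) to dist.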

Note: This course is not intended as a comprehensive introduction to either statistics or the biology of genetics. Rather, it is intended for participants who have some background in one or the other or both. Recognizing that this background may be varied, considerable review material is provided in both biology and statistics, as part of the regular course readings, as noted below. It is anticipated that participants will pick and choose to focus their attention on areas of need. The more of this material you need to cover in the review, the more time (perhaps even beyond the projected 10-15 hours per week) you should budget for the course.

Supplementary Background in Biology: Genome project, structure of eukaryotic cell, DNA, RNA, gene expression, transcription, splicing, translation, microarray experimental setup, quantification of information generated by microarray experiment.

Supplementary Background in Statistics: Descriptive statistics for univariate data, correlation and regression for bivariate data, basics of statistical hypothesis testing, one-sample and two-sample t-tests, paired t-test, F-test for equality of variances, Welch test, Shapiro-Wilk test, Wilcoxon rank sum test, signed rank test, one-way ANOVA, Bartlett's test, problem of multiple hypothesis testing, false discovery rate, principal component analysis, cluster analysis.

The Instructor:

Dr. Shailaja Deshmukh is Professor of Statistics at the University of Pune, India. Her areas of interest are inference in stochastic processes, applied probability, analysis of microarray data, and actuarial statistics. Her book Microarray Data: Statistical Analysis Using R (jointly with Dr. Sudha Purohit) is published by Narosa. She is a coauthor (jointly with Dr. Sudha Purohit and Dr. Sharad Gore) of Statistics Using R (forthcoming from Narosa). Her book Introduction to Actuarial Statistics is in preparation. She has a number of research publications in various peer-reviewed journals.

Dr. Sudha Purohit is a Visiting Lecturer in Statistics at the University of Pune and, before her retirement in 2000, was Head of the Department of Statistics at A. G. College, Pune, India. She is a coauthor of three books: Life-Time Data: Statistical Models and Methods, Introduction to Biometry, and (with Dr. Shailaja Deshmukh) Microarray Data: Statistical Analysis Using R. She is also a coauthor (jointly with Prof. Shailaja Deshmukh and Dr. Sharad Gore) of Statistics Using R (forthcoming from Narosa). Her areas of interest are survival analysis, reliability, programming with R, and analysis of microarray data. She has published a number of research papers in various peer-reviewed journals.

Organization of the Course:

The course takes place over the internet, at statistics.com. During each course week, you participate at times of your own choosing - there are no set times when you must be online. Course participants will be given access to a private discussion board. In class discussions led by the instructor, you can post questions, seek clarification, and interact with your fellow students and the instructor. The course is scheduled to take place over 5 weeks, and typically requires 10-15 hours per week. At the beginning of each week, you receive the relevant material, in addition to answers to exercises from the previous session. During the week, you are expected to go over the course materials and work through exercises. Discussion among participants is encouraged. The instructor will provide answers and comments.

Certificates and Grades:

You may be interested only in learning the material presented, and not be concerned with grades or certificates. Or you may be enrolled in a statistics.com Professional Advancement Program that requires demonstration of proficiency in the subject, in which case your work will be assessed for purposes of issuing a grade. Or you may require only a "Certificate of Course Completion," along with professional development credit in the form of Continuing Education Units (CEUs). As you begin the class, you will be asked to specify your category.

Credit:

This course offers continuing education units (CEUs). For those who successfully complete the course (generally this means marks of 50% or better on the homework), 6.25 CEUs and a certificate will be issued by statistics.com upon request.

Dates:

Apr. 11 - May 16, 2008
Oct. 31 - Nov. 28, 2008
Click here to be notified of future course offerings.

Participants gain access to the online materials on the first day of the course, and typically spend about 10-15 hours per week (at their convenience). You retain full access to course materials, including discussion board, for two weeks after the course closing date.

Level:

Intermediate

Prerequisite:

The equivalent of Introduction to Statistics I: Inference for a Single Variable, and Introduction to Statistics II: Working with Bivariate Data (and, if necessary before these courses, Introduction to Statistics for Beginners or Survey of Statistics for Beginners). Some familiarity with statistical modeling will also be helpful. Any of the following statistics.com courses would provide useful background in modeling: Regression, Logistic Regression, Introduction to Data Mining. Participants should also be familiar with basic molecular biology and microarray experiments, including gene expression, transcription, splicing, and translation.

Please also read the note at the end of the course outline concerning the course's review materials in biology and statistics, and the time that you should budget for this course. Also, please note the use of R software, as described below.

Course Text:

The required text is Microarray Data: Statistical Analysis Using R by Deshmukh and Purohit, Narosa Publishing Company, ISBN # 978-81-7319-850-2.

Use this printed order form for orders in the US, Canada, and anywhere in North or South America. You will need to fax it to the number indicated on the form.

Orders elsewhere in the world should use this online form.

PLEASE ORDER YOUR COPY IN TIME FOR THE COURSE! Neither of these distributors is accustomed to taking retail orders online, and if you are submitting your order within 2 weeks of the course date, please confirm it by phone with the book distributor (not statistics.com).

IF YOUR BOOK ORDER IS DELAYED OR OUT OF STOCK: Please fax us proof of your book order along with your email address (fax to statistics.com at 703-522-5846) and we will get you an electronic copy (PDF) to use while you are awaiting the book.

Software:

The software used in course illustrations and assignments is R, an open-source, freely available statistical programming environment. Click here for information on obtaining a free copy. Note that the initial week of the course is devoted to a brief introduction to (or refresher on) R. Participants who are unfamiliar with R should download and install the software before the course begins. If you are confident and comfortable with learning new software, the brief introduction in this course may suffice. If not, you should consider taking statistics.com's Introduction to R.

Registration:

Register Online - $449
Register Online (academic) - $349 (you must be affiliated with a college, university or high school)

Add a $50 service fee if you require a prior invoice, or if you need to submit a purchase order or voucher, pay by wire transfer or EFT, or refund and reprocess a prior payment. Please use this printed registration form for these and other special orders.

Note: Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date, unless you specify otherwise.