Chapter 4
Running Analysis

4.1 Enrichment analysis

Enrichment analysis is a quantitative method for testing whether the values of a biological condition (e.g. fold changes) are statistically significantly concordant, or occur in an increased proportion, within a particular set of biological annotations (a module or gene set).

4.1.1 How to run an enrichment analysis

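Before running an enrichment analysis, you should have the following data prepared in files: the data to analyze, the modules and, optionally, the background population. Each of these files is described in the wizard steps below.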

To start the enrichment wizard, go to the menu File > New > Analysis > Enrichment analysis ...

Selection of the destination file

The first wizard page lets you specify the name prefix for the files generated during the analysis and the folder where they will be created. Both the name and the folder can be typed in directly, and the folder can also be chosen by pressing the [Browse] button and navigating the system folders.

Selection of the data

Data format & Data file: Select the file containing the data by pressing the [Browse...] button. The format is detected automatically if the file extension matches one of the known file formats; otherwise a warning appears and you must select the appropriate format yourself.

Population file: This field can be left blank if the data file contains data for all background elements; otherwise, select a file listing the elements of the background population. This is a plain text file with one element per line, for example the list of all protein-coding genes, each on its own row.
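As an illustration of the population file format, the following sketch reads one element per line. The function name `read_population`, the gene labels and the decision to skip blank lines are assumptions made for this example; they do not describe how Gitools itself parses the file.

```python
import io
from typing import List, TextIO


def read_population(handle: TextIO) -> List[str]:
    """Read one element per line; blank lines are skipped (an assumption)."""
    return [line.strip() for line in handle if line.strip()]


# Hypothetical file content: three made-up element labels, one per line.
example_file = io.StringIO("gene1\ngene2\ngene3\n")
population = read_population(example_file)
print(population)
```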

Filter rows by label: This feature is not yet implemented. It will allow selecting a subset of the rows of the data file using a list of labels from another file.

Transform to 1 / 0: Some statistical tests are designed to work with discrete events (such as the binomial or Fisher’s exact tests). This option transforms a matrix of real values into a binary matrix containing only 1s and 0s for the analysis. All values that satisfy the condition are transformed to 1 and the rest to 0. For example, if the data file is an expression matrix of log2 ratios, it can be transformed to a binary matrix with a 1 for every log2 ratio greater than 1.5. Another possible application is a matrix of p-values: with a significance threshold of 0.05, all values less than 0.05 can be transformed to 1s.
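The transformation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Gitools implementation; the function name `binarize` and the example matrices are hypothetical.

```python
from typing import Callable, List


def binarize(matrix: List[List[float]],
             condition: Callable[[float], bool]) -> List[List[int]]:
    """Return a same-shaped matrix with 1 where the condition holds, else 0."""
    return [[1 if condition(value) else 0 for value in row] for row in matrix]


# Hypothetical expression matrix of log2 ratios (rows: genes, columns: conditions).
log2_ratios = [
    [2.1, 0.3],
    [1.5, -1.8],
    [0.0, 1.7],
]
# 1 for every log2 ratio strictly greater than 1.5.
up_regulated = binarize(log2_ratios, lambda v: v > 1.5)

# Hypothetical matrix of p-values, cut at a significance threshold of 0.05.
p_values = [
    [0.01, 0.20],
    [0.05, 0.049],
]
# 1 for every p-value strictly less than 0.05.
significant = binarize(p_values, lambda v: v < 0.05)

print(up_regulated)
print(significant)
```

Note that values exactly equal to the threshold (1.5 or 0.05 here) become 0, because both conditions are strict inequalities.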

Filter out rows for which no information appears in the module: Sometimes it is convenient to restrict the background population to only those elements that belong to at least one module. For example, the data file could have information for all the genes of a microarray, but only the genes with GO biological process annotations should be considered for the background.
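A minimal sketch of this restriction, assuming the data rows are keyed by element label and each module is a set of labels. The names `restrict_to_annotated`, `module_A`, `gene1` and so on are hypothetical and used only for illustration.

```python
from typing import Dict, List, Set


def restrict_to_annotated(rows: Dict[str, List[float]],
                          modules: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """Keep only the rows whose label belongs to at least one module."""
    annotated = set().union(*modules.values())
    return {label: values for label, values in rows.items() if label in annotated}


# Hypothetical data rows (label -> values per condition).
rows = {
    "gene1": [0.4, 1.9],
    "gene2": [2.2, 0.1],
    "gene3": [1.0, 1.0],  # not annotated in any module
}
# Hypothetical modules (module name -> member labels).
modules = {
    "module_A": {"gene1", "gene9"},
    "module_B": {"gene1", "gene2"},
}

background = restrict_to_annotated(rows, modules)
print(sorted(background))
```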

Selection of the modules

File format & File: Select the file containing the module information by pressing the [Browse...] button. The format is detected automatically if the file extension matches one of the known file formats; otherwise a warning appears and you must select the appropriate format yourself.

Modules filtering: If a module has too few elements, some tests may not produce reliable results (e.g. the Z-score or binomial tests); on the other hand, some tests, such as Fisher’s exact test, are better suited to small modules. These filters make it possible to discard the modules that have fewer or more than a given number of elements.
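The size filter can be pictured as follows. This is a sketch under stated assumptions: the function name `filter_modules_by_size` and its parameters are hypothetical, and treating the limits as inclusive is a choice made for this example rather than documented Gitools behaviour.

```python
from typing import Dict, Optional, Set


def filter_modules_by_size(modules: Dict[str, Set[str]],
                           min_size: Optional[int] = None,
                           max_size: Optional[int] = None) -> Dict[str, Set[str]]:
    """Discard modules with fewer than min_size or more than max_size elements."""
    kept = {}
    for name, members in modules.items():
        size = len(members)
        if min_size is not None and size < min_size:
            continue  # too small
        if max_size is not None and size > max_size:
            continue  # too large
        kept[name] = members
    return kept


# Hypothetical modules with 2, 5 and 12 elements.
modules = {
    "small": {f"gene{i}" for i in range(2)},
    "medium": {f"gene{i}" for i in range(5)},
    "large": {f"gene{i}" for i in range(12)},
}

kept = filter_modules_by_size(modules, min_size=3, max_size=10)
print(sorted(kept))
```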

Selection of the test

Several kinds of statistical tests are available; on this page you select the one you want to use. See section 4.1.2 for more details.

Analysis details

This step is optional but recommended, as it lets you record details about the analysis so that the results are better organized and annotated for future review.

It will be possible to specify free attributes for the analysis, such as Organization, Operator, Platform and so forth, but this is not finished yet. Note that this option is already available through the command line interface.

4.1.2 How is this calculated?

Currently there are three different statistical tests implemented for enrichment analysis in Gitools.

We are preparing detailed information about the implemented tests, but it is not ready yet. In the meantime, you can look the tests up on Wikipedia to get a general idea.

Binomial (Bernoulli) test

This part of the documentation is still missing.
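In the meantime, the general idea of a binomial enrichment test can be sketched as follows. This is a generic textbook illustration, not a description of how Gitools computes the test: it assumes binary (1 / 0) data, takes the proportion of 1s in the background as the success probability `p`, and asks how likely it is to observe at least `k` ones among the `n` elements of a module. The function name and the numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided enrichment p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 10% of background elements are 1s,
# and a module of 10 elements contains 5 ones.
p_value = binomial_upper_tail(k=5, n=10, p=0.1)
print(p_value)
```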

Fisher’s exact test

This part of the documentation is still missing.
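In the meantime, a generic one-sided Fisher’s exact test on a 2x2 table can be sketched as follows. It is a textbook illustration rather than the Gitools implementation, and it assumes binary (1 / 0) data arranged as: `a` module elements with 1, `b` module elements with 0, `c` non-module elements with 1 and `d` non-module elements with 0. The function name and the tables are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided p-value: probability of at least `a` ones inside the module,
    given fixed row and column totals (hypergeometric upper tail)."""
    in_module = a + b          # module size
    ones = a + c               # total number of 1s
    total = a + b + c + d      # background size
    upper = min(in_module, ones)
    favourable = sum(
        comb(in_module, x) * comb(total - in_module, ones - x)
        for x in range(a, upper + 1)
    )
    return favourable / comb(total, ones)


# Hypothetical table: all 3 module elements are 1s, all 3 other elements are 0s.
p_value = fisher_exact_greater(3, 0, 0, 3)
print(p_value)
```

Because the calculation is exact and uses only the table counts, it remains valid for very small modules, which fits the remark in the module filtering step.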

Z-score test

The Z-score test is used to see if there is a significant deviation from random expectation for an estimator value (mean or median) of each module in each of the conditions:

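$$Z_i = \frac{X - \mu_i}{\sigma_i}$$

where $X$ is the observed value of the estimator for the module in condition $i$, and $\mu_i$ and $\sigma_i$ are the mean and the standard deviation of the estimator over different permutations for condition $i$.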
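The calculation can be sketched in Python as follows. This is a minimal illustration, not the Gitools implementation: it assumes that each permutation draws a random set of background elements of the same size as the module, and the function name `zscore_enrichment`, its parameters and the example data are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, Sequence


def zscore_enrichment(values: Dict[str, float],
                      module: Sequence[str],
                      n_permutations: int = 1000,
                      seed: int = 0,
                      estimator: Callable = statistics.mean) -> float:
    """Z-score of the module estimator against random same-sized samples."""
    rng = random.Random(seed)
    observed = estimator([values[label] for label in module])  # X
    population = list(values.values())
    size = len(module)
    # Estimator values for random element sets of the same size (assumed null model).
    null = [estimator(rng.sample(population, size)) for _ in range(n_permutations)]
    mu = statistics.mean(null)       # mean over permutations
    sigma = statistics.stdev(null)   # standard deviation over permutations
    return (observed - mu) / sigma


# Hypothetical condition: 50 elements with values 0.0 .. 49.0.
values = {f"gene{i}": float(i) for i in range(50)}
top_module = [f"gene{i}" for i in range(45, 50)]    # highest values
bottom_module = [f"gene{i}" for i in range(0, 5)]   # lowest values

z_top = zscore_enrichment(values, top_module)
z_bottom = zscore_enrichment(values, bottom_module)
print(z_top, z_bottom)
```

A module made of the highest values yields a large positive Z-score and a module made of the lowest values a large negative one. Passing `estimator=statistics.median` switches the estimator from the mean to the median.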

4.2 GSEA

This analysis is not yet implemented but is planned for a future release.

4.3 Oncodrive analysis

This analysis is not yet implemented but is planned for a future release.

4.4 Correlations

This analysis is not yet implemented but is planned for a future release.

4.5 Combinations

This analysis is not yet implemented but is planned for a future release.