Skip to content

enrichment (subset of the xstats package) is a Python library providing two methods to perform enrichment analysis, such as gene set enrichment analysis in bioinformatics

License

Notifications You must be signed in to change notification settings

ajmazurie/xstats.enrichment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

xstats.enrichment

enrichment is a Python library, part of the xstats toolkit, which you can use to perform to type of enrichment analysis:

  • you can compare how enriched a subset of objects is in some annotation when compared with a general population. E.g., how significant it is to find 12 women in a group of 20 people knowing that help of the world's population is female? The significance is evaluated using a Fisher's exact test.
  • you can evaluate how enriched the top of a ranked list of objects is in some annotation. There is no need to apply a cut-off to decide what is the top of the list; the significance of this enrichment is evaluated using methods from [Eden2007a] and [Eden2007b]

A typical use of such library is in bioinformatics, to perform gene set enrichment analysis. Given a set of genes for which a property (such as the expression level) is measured, enrichment can evaluate how enriched is the subset of all genes with expression level above a threshold in some functional annotations. It can also evaluate how enriched the top of a list of genes, ranked by decreasing expression level, is in some functional annotations.

[Eden2007a]Eden E, Lipson D, Yogev S and Yakhini Z. Motif discovery in ranked lists of DNA sequences. PLoS Computational Biology, 2007 Mar 23;3(3):e39
[Eden2007b]Eden E. Discovering motifs in ranked lists of DNA sequences. Research thesis, 2007 Jan
  • Contact Aurelien Mazurie <[email protected]>
  • Keywords Python, Enrichment analysis, Statistics, Bioinformatic, Fisher's exact test, GOrilla, mHG

Getting started

  • Download the latest version of the library from http://github/ajmazurie/xstats.enrichment/downloads
  • Unzip the downloaded file, and cd in the resulting directory
  • Run python setup.py install. Alternatively, you can package the library by typing python setup.py bdist, which will result in the creation of a file dist/xstats.enrichment-xxx.tar.gz, with 'xxx' being the version number and the name of your platform. Installing the library is then as simple as easy_install dist/Enrichment-xxx.tar.gz (see the setuptools documentation)

From then you only have to import enrichment to use the library:

import xstats.enrichment

# Analysis 1: how significant is it to have 10 objects out of 500
# that share a given annotation, knowing that 120 out of the 1800
# objects in the general population have this annotation?
l, r, t = xstats.enrichment.evaluate_subset(10, 500, 120, 1800)

# the left-tailed probability is the probability of having less
# than 10 objects out of 500 with this annotation:
print "left-tailed:", l # 5.25e-8

# the right-tailed probability is the probability of having more
# than 10 objects out of 500 with this annotation:
print "right-tailed:", r # 0.99

# the two-tailed probability is the probability of observing 10
# objects out of 500 with this annotation, plus the probabilities
# of observing even less likely proportions:
print "two-tailed:", t # 7.78e-8

# as a result, we demonstrate that finding 10 objects with this
# annotation is unexpected, as shown by the two-tailed p-value
# (significant at an error rate of 5%). To know exactly in which
# way the finding is unexpected, just look at the left- and right-
# tailed p-values. In this example the left-tailed p-value is very
# low, while the right-tailed p-value is almost 1. It means that
# finding 10 objects is unexpectedly low in regard of the general
# population.

# conversely, finding 100 objects in the selection of 500 with
# this property is unexpectedly high:
l, r, t = xstats.enrichment.evaluate_subset(100, 500, 120, 1800)

# the right-tailed p-value is very low, while the left-tailed is 1
print "left- and right-tailed:", l, r # 1, 1.32e-39


# Analysis 2: in a ranked list of 1000 objects, of which half of
# them share a given annotation, how significant is it to find 20
# objects with this annotation at the top of the list?

# we build an occurrence vector which, for each object in the list,
# contains either True or False to represent if this object have
# the annotation considered.

# we start with an homogeneous distribution:
occurrences = [True, False] * 500

# as expected, there is no significant enrichment at the top of the list:
print xstats.enrichment.evaluate_list(occurrences) # 0.99

# we now build a second occurrence vector, in which the first 20 objects
# all have the annotations:
occurrences = [True] * 20 + [False] * 20 + [True, False] * 480

# this time we found a significant enrichment at the top of the list,
# which in this case is determined as the 20 first entries:
p_value, pivot = xstats.enrichment.evaluate_list(occurrences, with_pivot = True)

print "p-value:", p_value # 3.67e-5
print "pivot:", pivot # 20

About

enrichment (subset of the xstats package) is a Python library providing two methods to perform enrichment analysis, such as gene set enrichment analysis in bioinformatics

Resources

License

Stars

Watchers

Forks

Packages

No packages published