CIS: Compound Importance Sampling Method for Protein-DNA Binding Site p-value Estimation

Yoseph Barash (1,3), Gal Elidan (1,3), Tommy Kaplan (1,2,3) and Nir Friedman (1)

1. School of Computer Science & Engineering, The Hebrew University, Jerusalem 91904, Israel
2. Hadassah Medical School, The Hebrew University, Jerusalem 91120, Israel
3. These authors contributed equally to this manuscript

Motivation: In transcription regulation, transcription factors bind to sequence-specific sites and control the expression of nearby genes. Given binding site models, one can scan regulatory regions for putative binding sites and construct a genome-wide regulatory network. Several recent works have demonstrated the importance of modeling dependencies between positions in the binding site. The challenge is to evaluate the statistical significance of putative binding sites scored with these richer models.

Results: We present a general, accurate and efficient method for this task, applicable to any probabilistic binding site and background models. We demonstrate the accuracy of the method on synthetic and real-life data.
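To make the task concrete, the sketch below estimates the p-value of a binding site score with plain importance sampling. It is an illustration only and not the CIS algorithm itself: it uses a single proposal distribution (the motif model) rather than a compound one, and the motif, the uniform background, the threshold and all function names are hypothetical choices made for this example. The p-value of a score threshold t is the probability that a random background sequence scores at least t. Sampling sequences from the motif model and reweighting each by P_background(seq) / P_motif(seq) gives an unbiased estimate, and for a log-odds score s that weight is simply exp(-s).

```python
import itertools
import math
import random

ALPHABET = "ACGT"

# Hypothetical 4-position motif with independent positions (illustration only).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(seq):
    """Score = log P_motif(seq) - log P_background(seq)."""
    return sum(math.log(MOTIF[i][c] / BACKGROUND[c]) for i, c in enumerate(seq))


def sample_from_motif(rng):
    """Draw one sequence from the motif model (the proposal distribution)."""
    return "".join(
        rng.choices(ALPHABET, weights=[col[c] for c in ALPHABET])[0]
        for col in MOTIF
    )


def is_pvalue(threshold, n_samples, rng):
    """Importance sampling estimate of P_background(score >= threshold)."""
    total = 0.0
    for _ in range(n_samples):
        s = log_odds(sample_from_motif(rng))
        if s >= threshold:
            # Importance weight P_background(seq) / P_motif(seq) = exp(-score).
            total += math.exp(-s)
    return total / n_samples


def exact_pvalue(threshold):
    """Exact p-value by enumeration; feasible only for very short motifs."""
    return sum(
        math.prod(BACKGROUND[c] for c in seq)
        for seq in itertools.product(ALPHABET, repeat=len(MOTIF))
        if log_odds(seq) >= threshold
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print("importance sampling:", is_pvalue(2.0, 20000, rng))
    print("exact enumeration:  ", exact_pvalue(2.0))
```

The proposal concentrates samples on high-scoring sequences, which are exactly the ones that are rare under the background, so far fewer samples are needed than with naive sampling from the background. Enumeration is included only as a check for this toy motif; it becomes infeasible as the site length grows, which is where a sampling approach is useful.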

Availability: The algorithm used to compute the statistical significance of putative binding site scores is available online at http://compbio.cs.huji.ac.il/CIS

Download ISMB/ECCB 2004 short paper.
A short explanation about importance sampling and the CIS method.

The CIS software is available upon request. Please contact tommy@cs.huji.ac.il.