BLiC

This site supports a paper published in PLoS Computational Biology on 29 Feb 2008.

A Novel Bayesian DNA Motif Comparison Method
for Clustering and Retrieval

Naomi Habib1,2, Tommy Kaplan1,2, Hanah Margalit2, Nir Friedman1

Abstract

Characterizing the DNA-binding specificities of transcription factors
is a key problem in computational biology that has been addressed
by multiple algorithms. These usually take as input sequences that
are putatively bound by the same factor and output one or more DNA
motifs. A common practice is to apply several such algorithms
simultaneously to improve coverage at the price of redundancy.
In interpreting such results, two tasks are crucial: clustering
of redundant motifs, and attributing the motifs to transcription factors
by retrieval of similar motifs from previously characterized
motif libraries. Both tasks inherently involve motif comparison.
Here we present a novel method for comparing and merging motifs,
based on Bayesian probabilistic principles. This method takes into
account both the similarity in positional nucleotide distributions
of the two motifs and their dissimilarity to the background distribution.
We demonstrate the use of the new comparison method as a basis for
motif clustering and retrieval procedures, and compare it to several
commonly used alternatives. Our results show that the new method
outperforms other available methods in accuracy and sensitivity.
We incorporated the resulting motif clustering and retrieval
procedures in a large-scale automated pipeline for analyzing DNA
motifs. This pipeline integrates the results of various DNA motif
discovery algorithms and automatically merges redundant motifs from
multiple training sets into a coherent annotated library of motifs.
Application of this pipeline to recent genome-wide transcription
factor location data in S. cerevisiae successfully identified
DNA motifs, performing as well as the semi-automated analysis
reported in the literature. Moreover, we show how this analysis
elucidates the mechanisms of condition-specific preferences of
transcription factors.
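
As a rough illustration of the idea described in the abstract, the sketch below scores two aligned motifs by combining two log-likelihood ratios per column: how much better a single shared nucleotide distribution explains both columns than two separate distributions do, and how much better it explains them than the background does. This is only a minimal sketch and not necessarily the exact formulation used in the paper. The uniform background, the symmetric pseudocount of 1, the assumption that the motifs are already aligned and of equal length, and all function names are assumptions made for illustration.

```python
import math

# Assumed uniform background distribution (illustrative, not from the paper).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Assumed symmetric Dirichlet pseudocount added to every nucleotide.
PSEUDOCOUNT = 1.0


def posterior_mean(counts):
    """Estimate a nucleotide distribution from counts plus pseudocounts."""
    total = sum(counts.values()) + PSEUDOCOUNT * len(counts)
    return {base: (counts[base] + PSEUDOCOUNT) / total for base in counts}


def log_likelihood(counts, probs):
    """Log-likelihood of the observed counts under a given distribution."""
    return sum(counts[base] * math.log(probs[base]) for base in counts)


def column_score(counts1, counts2):
    """Score one pair of aligned columns (hypothetical helper).

    First term: shared source versus two independent sources.
    Second term: shared source versus the background distribution.
    """
    merged = {base: counts1[base] + counts2[base] for base in counts1}
    common = log_likelihood(merged, posterior_mean(merged))
    separate = (log_likelihood(counts1, posterior_mean(counts1))
                + log_likelihood(counts2, posterior_mean(counts2)))
    background = log_likelihood(merged, BACKGROUND)
    return (common - separate) + (common - background)


def motif_score(motif1, motif2):
    """Sum column scores over two aligned motifs of equal length."""
    return sum(column_score(c1, c2) for c1, c2 in zip(motif1, motif2))


if __name__ == "__main__":
    strong_a = {"A": 10, "C": 0, "G": 0, "T": 0}
    strong_t = {"A": 0, "C": 0, "G": 0, "T": 10}
    print(motif_score([strong_a], [strong_a]))
    print(motif_score([strong_a], [strong_t]))
```

Under these assumptions, two columns that agree on an informative nucleotide score higher than two columns that disagree, and two columns that merely match the background contribute nothing to the score.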

1 School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
2 Department of Molecular Genetics and Biotechnology, Faculty of Medicine,
The Hebrew University, Jerusalem, Israel