Overview:
The BLiC Score

A small introduction:

(A) Distinguishing between informative and noninformative positions: Two pairs of aligned motifs. Both pairs have three identical positions and two different ones. However, the identical positions in pair 1 are noninformative, while the identical positions in pair 2 are informative. The desired score should distinguish between these two types of similarity and assign a higher similarity score to pair 2. The nucleotide distribution in each motif is represented schematically with a sequence logo, in which the size of each nucleotide is proportional to its probability and the nucleotides are ordered by probability.

(B) Problematic aspects of currently used motif similarity functions: Because the model assumes that positions are independent, the similarity score of two PFMs decomposes into a sum of similarities of single aligned positions. Here we present the scores that the various similarity functions assign to pairs of positions in DNA motifs, alongside a desired score. The nucleotide distribution in each position is represented schematically (with a sequence logo using probabilities, as in A). As shown here, the Pearson correlation does not reflect the true sequence similarity, and the Jensen-Shannon divergence and the Euclidean distance do not distinguish between informative positions and uniform background positions. Clearly, position 1 should get a higher score than position 2, but the Pearson correlation scores for these positions are equal. Position 3 should get the lowest possible score, but the Pearson correlation does not capture this. In both positions 1 and 4, identical distributions are compared. Position 1 should get a higher score than position 4, but all three methods fail to achieve this.
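As a rough numerical check of the behaviour described in this caption, the following Python sketch computes the three existing similarity functions for a few made-up nucleotide distributions. The distributions, variable names and function names are illustrative assumptions, not the positions shown in the figure, and the BLiC score itself is not implemented here.

```python
import math

# Each distribution lists probabilities for (A, C, G, T).
# All values below are hypothetical and chosen only for illustration.

def pearson(p, q):
    """Pearson correlation coefficient between two distributions."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return None  # undefined when a distribution is exactly uniform
    return cov / math.sqrt(var_p * var_q)

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms of p."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 means identical distributions)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def euclidean(p, q):
    """Euclidean distance between two distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

informative = [0.97, 0.01, 0.01, 0.01]   # strongly prefers A
weak = [0.28, 0.24, 0.26, 0.22]          # close to a uniform background
near_a = [0.27, 0.23, 0.26, 0.24]        # two near-uniform positions with
near_b = [0.23, 0.27, 0.24, 0.26]        # small differences between them

cases = [
    ("informative vs itself", informative, informative),
    ("weak vs itself", weak, weak),
    ("near-uniform pair", near_a, near_b),
]
for name, p, q in cases:
    print(name,
          "| Pearson:", round(pearson(p, q), 3),
          "| JS:", round(jensen_shannon(p, q), 4),
          "| Euclidean:", round(euclidean(p, q), 4))
```

Under these assumed distributions, the two identical pairs receive the same score from every function, so none of them rewards the informative match over the near-background one. For the near-uniform pair, the Jensen-Shannon divergence and the Euclidean distance stay small while the Pearson correlation turns strongly negative, which shows how sensitive the Pearson correlation is to small deviations around the uniform distribution.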
Positions 4 and 5 should get similar scores, since position 4 compares two identical distributions and position 5 compares distributions with only small differences. However, the Pearson correlation grades position 5 significantly lower than position 4 because of small deviations from the uniform distribution.

The BLiC (Bayesian Likelihood 2-Components) score