

A combined statistical model for multiple motifs search


Gao LiFeng 1   Liu Xin 2   Guan Shan 3  
Abstract Transcription factor binding sites (TFBS) play key roles in gene expression and regulation. They are short sequence segments with a definite structure that the corresponding transcription factors recognize correctly. From a statistical viewpoint, candidate TFBS should differ markedly from segments formed by random combinations of nucleotides. This paper proposes a combined statistical model for finding over-represented short sequence segments in different kinds of data sets. When an over-represented short sequence segment is described by a position weight matrix, the nucleotide distribution at most sites of the segment should be far from the background nucleotide distribution. The central idea of the approach is to search for this kind of signal. The algorithm is tested on three data sets: a binding-site data set for the cyclic AMP receptor protein in E. coli; PlantProm DB, a non-redundant collection of proximal promoter sequences from different species; and a collection of the intergenic sequences of the whole E. coli genome. Even though the complexity of these three data sets differs considerably, the results show that the model is rather general and sensible.
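The abstract's central idea, that most positions of a motif described by a position weight matrix should be far from the background nucleotide distribution, can be illustrated with a minimal sketch. The abstract does not state which statistic the authors use, so the Kullback-Leibler divergence below is an assumption, as are the function names, the pseudocount, the threshold, and the toy sequences (which are made up and not taken from the paper's data sets).

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0):
    """Column-wise nucleotide frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def column_divergence(column, background):
    """Kullback-Leibler divergence (bits) of one PWM column from the background.

    Assumption: KL divergence stands in for the paper's unspecified measure.
    """
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)


def divergent_positions(pwm, background, threshold=0.4):
    """Indices of columns whose divergence exceeds a hypothetical threshold."""
    return [i for i, col in enumerate(pwm)
            if column_divergence(col, background) > threshold]


# Uniform background and made-up aligned sites, for illustration only.
background = {b: 0.25 for b in ALPHABET}
sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]

pwm = position_weight_matrix(sites)
scores = [column_divergence(col, background) for col in pwm]
```

In this sketch, a fully conserved column scores higher than a column with one mismatch, and `divergent_positions` picks out the columns that deviate most from the background. A real search would also need to scan candidate segments and assess over-representation, which the abstract mentions but does not detail.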
Source Chinese Physics B, 2008, 17(12): 4396-4400 [Core Collection]
DOI 10.1088/1674-1056/17/12/011
Keywords transcription factor binding sites; motif; position weight matrix

1. Chinese Academy of Agriculture Science, Beijing, 100081  

2. Institute of Theoretical Physics, Lanzhou University, Lanzhou  

3. Physics Science and Technology Department, Yangzhou University, Yangzhou, 225009

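Language English
ISSN 1674-1056
Subject Physics
Funding National Natural Science Foundation of China; National Natural Science Foundation of China
Accession number CSCD:3437045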

References (14 in total)

1. Bussemaker H J. Proc. Natl. Acad. Sci. USA, 2000, 97(18): 10096 (CSCD citations: 5)
2. Bussemaker H J. Proc. Int. Conf. Intell. Syst. Mol. Biol., 2000, 8: 67 (CSCD citations: 1)
3. Helden J V. J. Mol. Biol., 1998, 281: 827 (CSCD citations: 2)
4. Sinha S. Nucleic Acids Res., 2002, 30: 5549 (CSCD citations: 12)
5. Stormo G D. Proc. Natl. Acad. Sci. USA, 1989, 86(4): 1183 (CSCD citations: 10)
6. Bailey T L. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1994, 2: 28 (CSCD citations: 15)
7. Dempster A P. J. R. Statist. Soc. Ser. B, 1977, 39: 1 (CSCD citations: 994)
8. Lawrence C. Proteins, 1990, 7: 41 (CSCD citations: 1)
9. Lawrence C. Science, 1993, 262(5131): 208 (CSCD citations: 7)
10. Liu J S. J. Am. Statist. Assoc., 1995, 90: 1156 (CSCD citations: 7)
11. Li H. Proc. Natl. Acad. Sci. USA, 2002, 99(18): 11772 (CSCD citations: 1)
12. McCue L A. Nucleic Acids Res., 2001, 29: 774 (CSCD citations: 1)
13. McGuire A M. Nucleic Acids Res., 2000, 28: 4523 (CSCD citations: 1)
14. Shahmuradov I A. Nucleic Acids Res., 2003, 31(1): 114 (CSCD citations: 18)

Citing documents (2)

1. Sun Zhonghua. Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method. Chinese Physics B, 2010, 19(11): 110502-1-110502-6
CSCD citations: 6

2. Li Ying. Synchronization between different motifs. Chinese Physics B, 2010, 19(11): 110501-1-110501-7
CSCD citations: 2



