Table 5

An example of constructing string features

Input document, processed document, and candidate string features


Input document

The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, Have Distinct Chromosomal Locations, and Each Bind to Dystrophin and Its Relatives

Processed document

the three human syntrophin genes are expressed in diverse tissues have distinct chromosomal locations and each bind to dystrophin and its relatives

Candidate string features

the thr

he thre

e three

three h

hree hu

ree hum

ee huma

e human


The substring length is fixed at 7. The example document contains a single sentence (the title of the document with PMID:8576247). A seven-character window moves along the text. All characters are converted to lower case. Only alphabetical letters and the space character are processed. Punctuation is converted to the space character.
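The procedure described in this footnote can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The function names `preprocess` and `string_features` are hypothetical. Two details are assumptions: runs of spaces are collapsed into one (inferred from the processed document shown in the table, where "tissues, have" becomes "tissues have"), and every non-letter character, not only punctuation, is mapped to a space.

```python
import re


def preprocess(text):
    """Lower-case the text and keep only letters and single spaces."""
    lowered = text.lower()
    # Assumption: any character that is not a-z (punctuation, digits, ...)
    # is turned into a space; the source only states this for punctuation.
    spaced = re.sub(r"[^a-z]", " ", lowered)
    # Assumption: consecutive spaces are collapsed, as the processed
    # document in the table suggests.
    return re.sub(r" +", " ", spaced).strip()


def string_features(text, width=7):
    """Return every substring of `width` characters from the processed text."""
    processed = preprocess(text)
    # Slide the window one character at a time; a text shorter than
    # `width` yields an empty list.
    return [processed[i:i + width] for i in range(len(processed) - width + 1)]


title = (
    "The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, "
    "Have Distinct Chromosomal Locations, and Each Bind to Dystrophin "
    "and Its Relatives"
)

features = string_features(title)
print(features[:3])  # ['the thr', 'he thre', 'e three']
```

A plain one-character step also produces windows that begin with a space, such as " three ", which the table does not list. The source does not say whether such windows are excluded, so the sketch keeps them.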

Huang et al. Genome Biology 2008 9(Suppl 2):S12   doi:10.1186/gb-2008-9-s2-s12
