In vivo regulatory screen of all 6-mers. (a) Top left: naively concatenating three 6-mers creates an oligomer with multiple representatives of ATTGCG (red bars). Center and left: cartoon of a de Bruijn graph. Nodes (colored boxes) represent 6-mers; edges (arrows) represent overlaps. A standard de Bruijn sequence library is built from one path that traverses each of the 4,096 nodes once. Constructed from multiple paths, our MRCC library instead uses one representative for each pair of reverse-complementary 6-mers (green and yellow; self-reverse-complementary palindromes in blue), making it nearly 50% smaller (Additional file 1). Right: 16 of the 16,384 edges are shown. Our algorithm removes reverse-complementary paths (black, red) between palindrome pairs and then decomposes the remaining graph into reverse-complementary cycles. This allowed us to design an ultra-compact library of DNA sequences with uniform 6-mer coverage. (b) Schematic depicting the sub-cloning of each 15-bp multiplexed oligomer into the E1b-Tol2 vector and subsequent injection into single-cell zebrafish embryos. (c) Violin plots depicting the distribution of expression patterns for each tissue at 24 hpf. White lines indicate the fractional expression values for the empty vector construct. (d) Scatter plot depicting the method by which we selected consistently expressed multiplexed oligomers whose expression was not significantly correlated with minimal-promoter bias. The vertical dotted line denotes the 40% fractional expression threshold that was used, whereas the horizontal dotted line corresponds to a false discovery rate-adjusted P-value of 0.05. (e) Histogram depicting the tissue specificity of the 22 uncorrelated, consistently expressed constructs at 24 hpf. (f) Representative images at 24 hpf for embryos injected with 1EF03 (top) and 3EF09 (bottom), exhibiting broad expression that was correlated with that of the empty vector.
The full sequence of each construct (with XhoI and BglII flanking sites) is listed below each image. Both constructs have a 5' GC-rich region. GFP, green fluorescent protein.
Smith et al. Genome Biology 2013 14:R72 doi:10.1186/gb-2013-14-7-r72
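The counts quoted in panel (a) can be checked with a short sketch. It enumerates all 6-mers, groups each with its reverse complement, and counts palindromes and graph edges. The function names are illustrative, and choosing the lexicographically smaller member of each pair as its representative is an assumption made here for counting only; it is not the cycle-decomposition procedure used to build the library.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k=6):
    """Yield every DNA k-mer over the alphabet A, C, G, T."""
    for letters in product("ACGT", repeat=k):
        yield "".join(letters)


def canonical_kmers(k=6):
    """One representative per reverse-complementary pair.

    Assumption: the representative is the lexicographically smaller
    member of the pair. Palindromes represent themselves.
    """
    return sorted({min(s, reverse_complement(s)) for s in all_kmers(k)})


def palindromes(k=6):
    """k-mers that equal their own reverse complement."""
    return [s for s in all_kmers(k) if s == reverse_complement(s)]


def edge_count(k=6):
    """Edges in the de Bruijn graph whose nodes are k-mers.

    Each node has four outgoing edges (one per appended base),
    so the edges correspond to the (k+1)-mers.
    """
    return 4 ** (k + 1)


if __name__ == "__main__":
    total = 4 ** 6
    reps = canonical_kmers(6)
    print("6-mers:", total)
    print("palindromes:", len(palindromes(6)))
    print("representatives:", len(reps))
    print("fraction saved:", 1 - len(reps) / total)
    print("edges:", edge_count(6))
```

Under these assumptions the 4,096 nodes reduce to 2,080 representatives (64 palindromes plus 2,016 pairs), a saving of about 49%, which is consistent with the "nearly 50% smaller" figure in the caption.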