next up previous
Next: Set Pattern Mining Up: Home

1.1 Algorithm for distance constrained pattern mining



HybridGenePatternMining($ T_\delta, T_p, T_z, T_s,G$): 
1)	$ {\mathcal R}$ = dSetPatternMining($ T_\delta, T_p, T_z, G = \{G_1,\dots,G_n\}$)   
2)	foreach $ R^{T_p} \in {\mathcal R}$ 
3)		twinPatternMining($ R^{T_p}, T_p, T_s, G = \{G_1,\dots,G_n\}$)
	endforeach 


dSetPatternMining($ T_\delta, T_p, T_z, G = \{G_1,\dots,G_n\}$): // Compute $ T_p$-tuples from $ T_p$ different genomes. 1) $ {\mathcal R} = \emptyset$; 2) Compute $ DMSET(G_i)$ for $ 1 \le i \le n$. 3) Let $ {\mathcal G}^k$ be a set of $ k$-tuples, { $ (G_{i_1},\dots,G_{i_k}) ~\vert~ G_{i_j} \in G$ } 4) $ {\mathcal G}^1 = \{(G_1),\dots,(G_n)\}$; 5) $ k = 2$; 6) while ( $ k \le T_p $) // Level $ k$ enumeration 7) $ {\mathcal G}^k = \emptyset$; 8) foreach $ (G_{i_1},\dots,G_{i_{k-1}}) \in {\mathcal G}^{k-1}$ 9) $ DMSET(G_{i_1},\dots,G_{i_{k-1}},G_j) = \emptyset$ 10) foreach ($ R^{k-1} \in DMSET(G_{i_1},\dots,G_{i_{k-1}})$ and $ G_j$ such that $ i_{k-1} < j$ for $ 1 \le j < k$) 11) $ DMSET(G_{i_1},\dots,G_{i_{k-1}},G_j)$ $ = DMSET(G_{i_1},\dots,G_{i_{k-1}},G_j) \bigcup$ Compute_$ DMSET_k(T_z, R^{k-1}, G_{j})$; endforeach 12) if ($ DMSET(G_{i_1},\dots,G_{i_{k-1}},G_j) \ne \emptyset$) $ {\mathcal G}^k = {\mathcal G}^k \bigcup \{(G_{i_1},\dots,G_{i_{k-1}},G_j)\}$; 13) if ($ k +1 == T_p$) $ {\mathcal R} = {\mathcal R} \bigcup DMSET(G_{i_1},\dots,G_{i_{k-1}},G_j)$; endforeach 14) $ k = k +1$; endwhile 15) return $ {\mathcal R}$;

Compute_$ DMSET_k(T_z, R^k ,G_j)$ // Compute $ (k+1)$-tuples by adding a run form $ G_j$ to $ R^k$. 1) Let $ {\mathcal R}^{k+1} = \emptyset$ 2) foreach $ s \in RUN(G_j)$ 3) $ {\mathcal R}^{k+1} = {\mathcal R}^{k+1} \bigcup$ ComputeMdsetMatch($ T_z, R_k, s$) endforeach 4) return $ {\mathcal R}^{k+1}$

ComputeMdsetMatch($ T_z, R^k=(r_1, r_2, \dots, r_k), s$) // Compute $ T^{k+1} = R^k \wedge s$ 1) if ($ P_{dset}(R^k) == P_{dset}(s) $) return {$ (r_1, r_2, \dots, r_k, s$)}; // Each run $ r_i$ will be iteratively dset-matched 2) $ {\mathcal T}^2$ = dSetMatch($ T_z, r_1, s_j$) 3) for $ i = 3 $ to $ k$ do 4) foreach (($ r'_1,\dots,r'_{i-1}$) $ \in {\mathcal T}^{i-1}$) 5) $ {\mathcal T}^i$ = ComputeMdsetMatch($ T_z,(r'_1,\dots,r'_{i-1}), s_j$); endforeach endfor 6) return $ {\mathcal T}^{k+1}$

dSetMatch($ T_z, r, s$) // Compute $ r \wedge s$ 1) if ($ \vert P_{dset}(r)\vert < T_z$  or $ \vert P_{dset}(s)\vert < T_z$) return $ \emptyset$ 2) if ($ \vert P_{dset}(r)\vert == \vert P_{dset}(s)\vert$) return $ \{(r,s)\}$ 3) if ($ P_{dset}(s) - P_{dset}(r) \ne \emptyset$) 4) Split $ s$ into $ B = \{s_1,\dots s_m\}$ by breaking where $ g_i \in P_{dset}(s)$ and $ g_i \notin P_{dset}(r)$; 5) Remove $ s_i$ from $ B$ if $ \vert P_{dset}(s_i)\vert < T_z$; else 6) $ B = \{s\}$; endif 7) $ {\mathcal R}^2 = \emptyset$; 8) foreach $ s_j \in B$ 9) $ {\mathcal R}^2 = {\mathcal R}^2 \bigcup$ dSetMatch($ T_z, s_j, r$); endforeach 10) return $ {\mathcal R}^2$;


next up previous
Next: Set Pattern Mining Up: Home