We will compare four genomes,
AE000520.fna, AE000783.fna, L43967.fna, and U00089.fna.
The data are in /bioinfo/data/L519/data, but
create your own working directory and make symbolic links to the files there.
Three tools will be used: cross_match (an engine for Phrap),
MUMmer, and Blastz (the engine for PIPmaker).
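Setting up the working directory can be sketched as below. This is only one way to do it: the helper name link_genomes and the directory $HOME/L519_work are assumptions for illustration, not part of the assignment. Pass the data directory as an absolute path so the links resolve from anywhere.

```shell
# link_genomes SRC WORK: create WORK and symlink every file in SRC into it.
# Hypothetical helper; SRC should be an absolute path.
link_genomes() {
    src=$1
    work=$2
    mkdir -p "$work" || return 1
    for f in "$src"/*; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        ln -sf "$f" "$work/"
    done
}

# Example call (commented out so that sourcing this block has no side effects;
# the working directory name is an assumption):
# link_genomes /bioinfo/data/L519/data "$HOME/L519_work"
```

Symbolic links keep a single copy of the genomes on disk while letting each tool write its output next to the inputs in your own directory.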
1. Compare a pair of genomes that are very close,
L43967.fna and U00089.fna, using the three tools,
and compare the results.
2. Repeat the experiment with a second pair of genomes,
AE000520.fna and AE000783.fna.
These two are related, but more distantly than the pair
L43967.fna and U00089.fna.
3. For each experiment, look for matches that correspond to
protein-coding genes and also look for intergenic regions.
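The pairwise comparisons above might be driven by a small wrapper like the one below. It is a sketch under several assumptions: compare_pair and the output file names are hypothetical, each tool is run with its default parameters, and nucmer (with show-coords) is used as the MUMmer entry point. Check each tool's own documentation before relying on these invocations.

```shell
# compare_pair A B: run whichever of the three tools is installed on two FASTA files.
# Hypothetical helper; the output file names are assumptions for illustration.
compare_pair() {
    a=$1
    b=$2
    # Build a prefix such as L43967_vs_U00089 from the two file names.
    pa=${a##*/}; pa=${pa%.fna}
    pb=${b##*/}; pb=${pb%.fna}
    prefix="${pa}_vs_${pb}"

    if command -v cross_match >/dev/null 2>&1; then
        # cross_match prints its alignments to standard output.
        cross_match "$a" "$b" > "$prefix.cross_match.out"
    else
        echo "cross_match not found, skipping" >&2
    fi

    if command -v nucmer >/dev/null 2>&1; then
        # nucmer writes $prefix.delta; show-coords turns it into a coordinate table.
        nucmer --prefix="$prefix" "$a" "$b" &&
            show-coords -r "$prefix.delta" > "$prefix.coords"
    else
        echo "nucmer not found, skipping" >&2
    fi

    if command -v blastz >/dev/null 2>&1; then
        # blastz prints its alignments to standard output.
        blastz "$a" "$b" > "$prefix.blastz.out"
    else
        echo "blastz not found, skipping" >&2
    fi
}

# Example calls (commented out; they need the course data and the tools):
# compare_pair L43967.fna U00089.fna
# compare_pair AE000520.fna AE000783.fna
```

Using the same prefix for all three outputs makes it easier to line up the match coordinates from the different tools against the gene annotations.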
We will build a PSSM (position-specific scoring matrix)
using PSI-BLAST.
1. Select a query (any protein sequence) from the four genomes (.faa files).
2. Run PSI-BLAST for two iterations and save the PSSM:
$ blastpgp -i query -d allfour.faa -j 2 -Q pssm.txt
3. Compare pssm.txt with BLOSUM62.
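The steps above assume that a single-sequence query file and a formatted database allfour.faa already exist. A minimal sketch of that preparation follows. The helper first_record is hypothetical, and the .faa file names are assumed to mirror the .fna names; adjust them to the files actually present in the data directory.

```shell
# first_record FILE: print only the first FASTA record of FILE (hypothetical helper).
first_record() {
    # Count header lines; print every line while we are still inside record 1.
    awk '/^>/ { n++ } n == 1' "$1"
}

# Assumed workflow, commented out because it needs the course data and NCBI tools:
# cat AE000520.faa AE000783.faa L43967.faa U00089.faa > allfour.faa
# formatdb -i allfour.faa -p T        # -p T marks the database as protein
# first_record L43967.faa > query
# blastpgp -i query -d allfour.faa -j 2 -Q pssm.txt
```

Any protein in the four .faa files can serve as the query; taking the first record is simply the shortest way to get a valid single-sequence FASTA file.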
We will perform a motif discovery experiment.
1. Select a COG family from /bioinfo/data/COG/COGs
and run MEME
$ meme your-family -nmotifs 5
2. Select a couple of sequences from your family and
perform a Pfam search using the 'hmmpfam' command.
$ hmmpfam /bioinfo/data/Pfam/Pfam_fs query > output1
$ hmmpfam /bioinfo/data/Pfam/Pfam_ls query > output2
3. Compare the motifs found by MEME against the domains detected
in Pfam 8.0.
4. Select another COG family and perform a
Pfam search to find its domains in Pfam 8.0.
5. Merge the two COG families into one and run MEME
to find motifs and then compare the motifs against
the known domains.
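Picking a couple of sequences (step 2) and merging two families (step 5) are plain FASTA file manipulations, sketched below. The helpers first_n_records and merge_families and the file names familyA, familyB, and merged are hypothetical placeholders; the meme and hmmpfam invocations are the ones given in the steps above.

```shell
# first_n_records N FILE: print the first N FASTA records of FILE (hypothetical helper).
first_n_records() {
    awk -v n="$1" '/^>/ { c++ } c <= n' "$2"
}

# merge_families OUT FILE...: concatenate FASTA files into OUT (hypothetical helper).
merge_families() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Example use (commented out; familyA and familyB stand for the two COG
# family files you selected):
# first_n_records 2 familyA > query
# hmmpfam /bioinfo/data/Pfam/Pfam_fs query > output1
# merge_families merged familyA familyB
# meme merged -nmotifs 5
```

Concatenation is enough for the merge because both inputs are already FASTA files; if the same protein appears in both families it will appear twice in the merged file.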