
ImplementingSearch

This is an exercise to demonstrate the power of Suffix-Array and FM-Index based searching.

Usage

For FU Berlin students: I recommend looking at the ssh tutorial to learn how to log into our compute servers. Use one of compute01 to compute09 (e.g. compute04.mi.fu-berlin.de). For longer-running processes, I advise looking at this tmux tutorial for some basic instructions.

Cloning (very cool)

To check out the code, run:

  • git clone --recurse-submodules https://github.com/SGSSGene/ImplementingSearch

gitpod:

If you are not an FU student and don't have access to a Linux machine, you can also use gitpod, which provides an online IDE.

How to build the software

$ # We are assuming you are in the terminal/console inside the repository folder
$ mkdir build # creates a folder for our build system
$ cd build
$ cmake ..    # configures our build system
$ make        # builds our software, repeat this command to recompile your software
$ ./bin/naive_search --reference ../data/hg38_partial.fasta.gz --query ../data/illumina_reads_40.fasta.gz --query_ct 100 # calls the code in src/naive_search.cpp
$ ./bin/suffixarray_search --reference ../data/hg38_partial.fasta.gz --query ../data/illumina_reads_40.fasta.gz # calls the code in src/suffixarray_search.cpp

$ ./bin/fmindex_construct --reference ../data/hg38_partial.fasta.gz --index myIndex.index # creates an index, see src/fmindex_construct.cpp
$ ./bin/fmindex_search --index myIndex.index --query ../data/illumina_reads_40.fasta.gz --query_ct 100 --errors 0  # searches by using the fmindex, see src/fmindex_search.cpp

$ ./bin/fmindex_pigeon_search --reference ../data/hg38_partial.fasta.gz --index myIndex.index --query ../data/illumina_reads_40.fasta.gz --query_ct 100 --errors 0  # searches by using the fmindex, see src/fmindex_pigeon_search.cpp

What to do?

This demonstration is supposed to show you the power of the FM-Index. To fully feel that power, compare a naively implemented search to an FM-Index based search.
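As a point of reference for this comparison, here is a minimal sketch of a naive search together with a timing helper. It works on plain std::string values instead of the sequence types used in src/naive_search.cpp, and the names naive_find_all and time_queries are hypothetical, chosen only for this illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every start position at which `query`
// occurs in `reference`, including overlapping occurrences.
std::vector<std::size_t> naive_find_all(std::string const& reference, std::string const& query) {
    std::vector<std::size_t> hits;
    if (query.empty()) return hits; // an empty query has no meaningful position
    for (std::size_t pos = reference.find(query); pos != std::string::npos;
         pos = reference.find(query, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// Hypothetical helper: runs all queries and returns {total hits, elapsed seconds}.
std::pair<std::size_t, double> time_queries(std::string const& reference,
                                            std::vector<std::string> const& queries) {
    auto const start = std::chrono::steady_clock::now();
    std::size_t total_hits = 0;
    for (auto const& query : queries) {
        total_hits += naive_find_all(reference, query).size();
    }
    auto const stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> const elapsed = stop - start;
    return {total_hits, elapsed.count()};
}
```

Every query scans the whole reference again, so the total run time grows with the number of queries times the reference length. That is the cost the index-based searches are meant to avoid.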

  1. Naive Search:
  • Check out the src/naive_search.cpp file. Fill in the //!TODO ImplementMe.
  • Change the --query_ct argument to play around with a different number of searches.
  • Run ./bin/naive_search for different query sizes and measure the time.
  2. SuffixArray Search:
  • Check out src/suffixarray_search.cpp. Fill in the //!TODO !ImplementMe
  • Change the --query_ct argument to play around with a different number of searches.
  • Run ./bin/suffixarray_search for different query sizes and measure the time.
  3. FMIndex Search:
  • Check out src/fmindex_construct.cpp (nothing to do here). This builds an FM-Index for you.
  • Run ./bin/fmindex_construct to build an FM-Index. (It is saved to the file given with --index, e.g. myIndex.index in the build commands above.)
  • Check out src/fmindex_search.cpp. Fill in the //!TODO !ImplementMe; use the seqan3::search function to search.
  • Change the --query_ct argument to play around with a different number of searches.
  • Run ./bin/fmindex_search for different query sizes and measure the time.
  4. Which search is faster?
  • Check different query lengths: 40, 60, 80 and 100.
  5. Which search is more memory efficient?
  • Check different query lengths: 40, 60, 80 and 100.
  6. FMIndex with errors:
  • Configure fmindex_search to search with up to 2 errors.
  • Implement src/fmindex_pigeon_search.cpp: use the error-free FM-Index search together with the pigeonhole principle to find matches with up to 2 errors.
  • Compare run times between fmindex_search and fmindex_pigeon_search with up to 2 errors.
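To make the suffix array and pigeonhole steps more concrete, here is a minimal sketch using only the standard library. It rests on several assumptions: a plainly sorted suffix array stands in for the FM-Index as the exact search, only substitutions are counted as errors (no insertions or deletions), and all function names are hypothetical. It is not the layout of the files in src/.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <set>
#include <string_view>
#include <vector>

// Builds a suffix array by sorting all suffix start positions.
// Simple but slow for large texts; meant for toy inputs only.
std::vector<std::size_t> build_suffix_array(std::string_view text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        return text.substr(a) < text.substr(b);
    });
    return sa;
}

// Exact search: binary search for the block of suffixes that start with `query`.
std::vector<std::size_t> sa_find_all(std::string_view text, std::vector<std::size_t> const& sa,
                                     std::string_view query) {
    auto const lo = std::lower_bound(sa.begin(), sa.end(), query,
        [&](std::size_t suffix, std::string_view q) { return text.substr(suffix, q.size()) < q; });
    auto const hi = std::upper_bound(lo, sa.end(), query,
        [&](std::string_view q, std::size_t suffix) { return q < text.substr(suffix, q.size()); });
    return std::vector<std::size_t>(lo, hi);
}

// Counts mismatching characters; both views must have the same length.
std::size_t hamming(std::string_view a, std::string_view b) {
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) distance += (a[i] != b[i]);
    return distance;
}

// Pigeonhole search (substitutions only): cut `query` into errors+1 pieces.
// At most `errors` mismatches cannot touch every piece, so at least one piece
// matches exactly. Each exact piece hit is then verified against the full query.
std::vector<std::size_t> pigeon_search(std::string_view text, std::vector<std::size_t> const& sa,
                                       std::string_view query, std::size_t errors) {
    std::set<std::size_t> hits; // a set, because several pieces can report the same start
    std::size_t const pieces = errors + 1;
    if (query.size() < pieces || query.size() > text.size()) return {};
    for (std::size_t p = 0; p < pieces; ++p) {
        std::size_t const begin = p * query.size() / pieces;
        std::size_t const end = (p + 1) * query.size() / pieces;
        for (std::size_t piece_pos : sa_find_all(text, sa, query.substr(begin, end - begin))) {
            if (piece_pos < begin) continue;                  // query would start before the text
            std::size_t const start = piece_pos - begin;
            if (start + query.size() > text.size()) continue; // query would run past the end
            if (hamming(text.substr(start, query.size()), query) <= errors) hits.insert(start);
        }
    }
    return std::vector<std::size_t>(hits.begin(), hits.end());
}
```

On the toy text ACGTACGTTAGCACGT, searching ACGA with 1 error yields the start positions 0, 4 and 12. The verification step is what keeps the result correct: an exact hit for one piece is only a candidate until the whole query has been compared at that position.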

Hints:
