At the beginning of a computation, data resides on the hard disk. Hard disks are slow, but their biggest benefit is that they retain data after the computer is turned off. During the computation, part or all of the input data is pulled from the hard drive into memory (RAM). The processor and other logic units then work on the data in memory, transform it into its final form and save the output to disk.
In functional terms, RAM and hard disk are equivalent storage units. Data is brought into memory, and the output saved back to disk, because hard disk access takes far more time than RAM access. Modern processors take the concept one step further by adding another layer of memory (cache) within the processor. Part of the data is moved from RAM to cache, processed and moved back to RAM, which speeds up processing many times over.
Bioinformaticians working with NGS data often face the challenge of having too little RAM for their programs. Some operating systems and machines (Windows-based laptops, for example) limit the maximum amount of RAM that can be installed in the computer. Does that mean some types of bioinformatics programs are out of reach for such computers?
Given that memory and hard disk are functionally equivalent, there is no mathematical or algorithmic limitation on executing such programs. Any RAM-based program can be rewritten to run from the hard drive, except that execution will be slowed by the movement of data back and forth between hard drive and processor. However, by replacing the mechanical hard drive with a faster solid-state drive (SSD), a user can ameliorate the speed problem.
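To make the idea of moving a RAM-based structure to disk concrete, here is a minimal sketch in Python. It keeps an array of counters in a file and reaches it through the standard `mmap` module, so the operating system pages data in and out instead of the program holding everything in RAM. The class name `DiskBackedCounters`, the 32-bit counter width and the file name are assumptions made for illustration only.

```python
import mmap
import os
import struct
import tempfile


class DiskBackedCounters:
    """Fixed-size array of unsigned 32-bit counters that lives in a file."""

    ITEM = struct.Struct("<I")  # one little-endian 32-bit unsigned integer

    def __init__(self, path, size):
        self.size = size
        # Pre-allocate the file; most platforms fill the new area with zeros.
        with open(path, "wb") as fh:
            fh.truncate(size * self.ITEM.size)
        self._fh = open(path, "r+b")
        # Map the whole file; reads and writes now go through the page cache.
        self._map = mmap.mmap(self._fh.fileno(), 0)

    def get(self, index):
        return self.ITEM.unpack_from(self._map, index * self.ITEM.size)[0]

    def increment(self, index):
        offset = index * self.ITEM.size
        value = self.ITEM.unpack_from(self._map, offset)[0]
        self.ITEM.pack_into(self._map, offset, value + 1)

    def close(self):
        self._map.flush()  # push pending changes to the file
        self._map.close()
        self._fh.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        counters = DiskBackedCounters(os.path.join(tmp, "counts.bin"), size=16)
        for index in (3, 3, 7):
            counters.increment(index)
        result = (counters.get(3), counters.get(7), counters.get(0))
        counters.close()
        return result
```

The calling code looks the same as it would for an in-memory array. What changes is the cost of each access, which is why the speed of the underlying drive matters so much.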
One elegant algorithm that relies on disk rather than RAM in this way is DSK, a program for k-mer counting (SSD and k-mer counting).
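The general idea of disk-based k-mer counting can be sketched as follows. This is a simplified illustration of partitioning k-mers to disk and counting one partition at a time. It is not DSK's actual implementation. The function name `count_kmers_on_disk`, the text file format and the use of `zlib.crc32` as the hash are assumptions chosen to keep the example short. It also leaves out things a real tool would need, such as compact k-mer encoding and treating a k-mer and its reverse complement as the same.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_on_disk(reads, k, n_partitions, workdir):
    """Yield (kmer, count) pairs, holding one partition's table in RAM at a time."""
    paths = [os.path.join(workdir, f"part_{i}.txt") for i in range(n_partitions)]

    # Pass 1: stream the reads and scatter each k-mer to a file chosen by hash.
    # Identical k-mers always hash to the same partition file.
    files = [open(p, "w") for p in paths]
    try:
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                part = zlib.crc32(kmer.encode("ascii")) % n_partitions
                files[part].write(kmer + "\n")
    finally:
        for fh in files:
            fh.close()

    # Pass 2: load one partition, count it in memory, emit it, then move on.
    for path in paths:
        with open(path) as fh:
            counts = Counter(line.strip() for line in fh)
        yield from counts.items()


def demo():
    reads = ["ACGTACGT", "CGTACG"]
    with tempfile.TemporaryDirectory() as tmp:
        return dict(count_kmers_on_disk(reads, k=3, n_partitions=4, workdir=tmp))
```

Peak memory is set by the largest partition rather than by the whole data set, so raising `n_partitions` trades RAM for more disk traffic. That traffic is where a fast drive pays off.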
Storage and retrieval of data from disk is the weakest link in the entire computing chain. That time can be reduced by moving data to a faster disk. The concept is nothing new. In earlier days, bulk data was kept on tape drives for long-term storage, and during computation the data from those tapes was loaded onto mechanical hard drives.
With improvements in technology, mechanical hard drives have become the slow storage tier, and solid-state drives (SSDs) have taken their place as the fast drives.
Drawback - very large RAM requirement.
One possibility - replace the large RAM with an inexpensive hard drive and interact with the data there.
Example - k-mer counting.
SSDs are about 7-8 times more expensive than mechanical hard drives. However, they do not need to store all the data.
Considerations -
operating system support
http://en.wikipedia.org/wiki/Solid-state_drive
RAM - huge SSD - persistent
Second option - bring a large block of data from the hard drive into memory in one cycle, process it, and save it back.
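The second option can be sketched as a loop that reads one large block per request, transforms it in memory and writes the result out. The function name `process_in_blocks`, the default block size and the example transform are assumptions for illustration. The sketch is only correct for transforms that act on each byte independently, since a block boundary can fall anywhere in the data.

```python
import os
import tempfile


def process_in_blocks(src, dst, transform, block_size=64 * 1024 * 1024):
    """Read src in large blocks, apply transform to each block, write to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)  # one large sequential read
            if not block:
                break
            fout.write(transform(block))  # one large sequential write


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        data = b"acgtnnACGT\n" * 1000
        with open(src, "wb") as fh:
            fh.write(data)
        # Upper-case soft-masked bases, using a small block to force many cycles.
        process_in_blocks(src, dst, bytes.upper, block_size=4096)
        with open(dst, "rb") as fh:
            return fh.read() == data.upper()
```

Larger blocks mean fewer trips to the drive, at the cost of holding more data in RAM at once.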