Probabilistic Data Structures - an Excellent Introduction

Probabilistic data structures are being used to significantly cut down the memory requirements of de Bruijn graphs. We discussed their application to the NGS world in earlier articles (here and here).
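One common choice for this purpose is a Bloom filter, which stores k-mers in a compact bit array and answers "have I seen this k-mer?" with occasional false positives but no false negatives. The de Bruijn graph's edges are then not stored at all: the neighbours of a k-mer are found by querying its four possible one-base extensions. The Python sketch below is only a minimal illustration, not the implementation used by any tool discussed in the earlier articles. The names `BloomFilter`, `kmers` and `successors`, the SHA-256 double-hashing scheme and the filter sizes are all hypothetical choices, and reverse complements are ignored for brevity.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash functions: may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumption): derive k bit positions from two
        # 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq, k):
    """All overlapping substrings of length k (reverse complements ignored)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def successors(bf, kmer):
    # Edges are implicit: try all four one-base extensions, keep those in the filter.
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


if __name__ == "__main__":
    bf = BloomFilter(num_bits=2048, num_hashes=3)
    for km in kmers("ACGTTGCA", 4):
        bf.add(km)
    print(successors(bf, "ACGT"))  # includes "CGTT"; may also include false positives
```

The price of the memory saving is that `successors` can return a k-mer that was never inserted, which shows up as a spurious branch in the graph; a larger bit array or a better-tuned number of hashes lowers that rate.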

We found a blog post (hat tip: Rayan Chikhi) that provides an excellent introduction to probabilistic data structures. The blog post is written in the context of the internet world, but many of the issues it raises apply equally to those working on NGS. If you do not immediately recognize the similarity, here is a clue: the typical questions asked in the blog post's ‘random integer count’ example are identical to those of the k-mer counting problem in NGS.
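To make the parallel concrete, one of those questions is "how many times has this item appeared?", which for NGS reads becomes "how often does this k-mer occur?". A Count-Min sketch is one structure commonly used to answer it approximately in fixed memory. The Python sketch below is a minimal illustration under assumed parameters; the class and function names, the width and depth values, and the row-salted SHA-256 hashing are hypothetical choices, not taken from the blog post.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width, depth):
        self.width = width    # counters per row
        self.depth = depth    # number of independently hashed rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get a different hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only ever add to a counter, so the smallest counter is the tightest bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def kmers(seq, k):
    """All overlapping substrings of length k (reverse complements ignored)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    read = "ACGTACGTACGTTTGA"
    cms = CountMinSketch(width=256, depth=4)
    for km in kmers(read, 4):
        cms.add(km)
    print(cms.estimate("ACGT"))  # at least 3, the true count
```

Memory is fixed at `width * depth` counters no matter how many distinct k-mers arrive, which is the trade the blog post's web-analytics examples also make: bounded overestimation in exchange for bounded space.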

Probabilistic Data Structures for Web Analytics and Data Mining

The original article by Whang, Vander-Zanden and Taylor, which the above-mentioned blog post summarizes, can be found at this link.
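That article describes linear counting: each item is hashed to one bit of an m-bit bitmap, and if V is the fraction of bits still zero at the end, the number of distinct items is estimated as n ≈ -m · ln(V). Applied to NGS, this estimates the number of distinct k-mers without storing them. The Python sketch below is a minimal illustration, assuming SHA-256 as the hash and one byte per bitmap slot for readability; the function names and the random test genome are hypothetical.

```python
import hashlib
import math
import random


def linear_count(items, m):
    """Linear counting: estimate the number of distinct items using an m-slot bitmap."""
    bitmap = bytearray(m)  # one byte per slot for readability; a real bitmap would pack bits
    for item in items:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "little")
        bitmap[h % m] = 1
    empty_fraction = (m - sum(bitmap)) / m
    if empty_fraction == 0:
        # Every slot is set, so ln(0) is undefined: the bitmap was too small.
        raise ValueError("bitmap is full; choose a larger m")
    return -m * math.log(empty_fraction)


def kmers(seq, k):
    """All overlapping substrings of length k (reverse complements ignored)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    exact = len(set(kmers(genome, 21)))
    approx = linear_count(kmers(genome, 21), m=4096)
    print(exact, round(approx))
```

The bitmap size m has to be chosen with a rough idea of the expected number of distinct k-mers: if the bitmap fills up, the estimator breaks down, which is why the sketch raises an error in that case.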



Written by M.