A beachcomber is interested in obtaining up to 10 examples of every type of shell present on a beach. The shells are individually easy to find, but some types are really rare and some are really abundant. The beachcomber has access to a large group of helpers, and wants to know how to most efficiently divide up the helpers across the beach to gather the shells most quickly, while minimizing discussion and communication between the helpers.
Or, more generally, given a data set D containing some unknown number of item types M with an unknown abundance distribution, how can we most efficiently recover at least N items of each type from D using Z workers, in the absence of fast communication between the workers?
There is a fairly easy I/O bound streaming online solution for this, I think, but I’m interested in knowing if this is isomorphic to an established algorithms problem.
Parallel and Distributed Association Mining: A Survey
Mohammed J. Zaki, Rensselaer Polytechnic Institute
The author surveys the state of the art in parallel and distributed association-rule-mining algorithms and uncovers the fields challenges and open research problems. This survey can serve as a reference for both researchers and practitioners.
C. Titus Brown also posted a detailed manual on how to use digital normalization for various types of data (genome, metagenome, transcriptome, etc.). Looks like we need to create a new category for diginorm.