Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

Top 8.2% on proxy.golang.org

proxy.golang.org : github.com/tylertreat/boomfilters

Package boom implements probabilistic data structures for processing continuous, unbounded data streams. This includes Stable Bloom Filters, Scalable Bloom Filters, Counting Bloom Filters, Inverse Bloom Filters, several variants of traditional Bloom filters, HyperLogLog, Count-Min Sketch, and MinHash. Classic Bloom filters generally require a priori knowledge of the data set in order to allocate an appropriately sized bit array. This works well for offline processing, but online processing typically involves unbounded data streams. With enough data, a traditional Bloom filter "fills up", after which it has a false-positive probability of 1. Boom Filters are useful for situations where the size of the data set isn't known ahead of time. For example, a Stable Bloom Filter can be used to deduplicate events from an unbounded event stream with a specified upper bound on false positives and minimal false negatives. Alternatively, an Inverse Bloom Filter is ideal for deduplicating a stream where duplicate events are relatively close together. This results in no false positives and, depending on how close together duplicates are, a small probability of false negatives. Scalable Bloom Filters place a tight upper bound on false positives while avoiding false negatives but require allocating memory proportional to the size of the data set. Counting Bloom Filters and Cuckoo Filters are useful for cases which require adding and removing elements to and from a set. For large or unbounded data sets, calculating the exact cardinality is impractical. HyperLogLog uses a fraction of the memory while providing an accurate approximation. Similarly, Count-Min Sketch provides an efficient way to estimate event frequency for data streams. TopK tracks the top-k most frequent elements. MinHash is a probabilistic algorithm to approximate the similarity between two sets. This can be used to cluster or compare documents by splitting the corpus into a bag of words.

Registry - Source - Documentation - JSON
purl: pkg:golang/github.com/tylertreat/boomfilters
Keywords: bloom-filter, count-min-sketch, counting-bloom-filters, cuckoo-filter, data-stream, filter, go, probabilistic-programming, scalable-bloom-filters, stable-bloom-filters
License: Apache-2.0
Latest release: about 3 years ago
First release: about 3 years ago
Namespace: github.com/tylertreat
Stars: 1,519 on GitHub
Forks: 110 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 29 days ago

Readme