In the top 8.7% of packages on proxy.golang.org by dependent packages
proxy.golang.org : github.com/cockroachdb/swiss
Package swiss is a Go implementation of Swiss Tables as described in https://abseil.io/about/design/swisstables. See also: https://faultlore.com/blah/hashbrown-tldr/. The original C++ implementation is part of Google's Abseil library (see the design link above).

Swiss tables are hash tables that map keys to values, similar to Go's builtin map type. Swiss tables use open-addressing rather than chaining to handle collisions; if you're not familiar with open-addressing, see https://en.wikipedia.org/wiki/Open_addressing. A hybrid between linear and quadratic probing is used: linear probing within groups of small fixed size and quadratic probing at the group level.

The key design choice of Swiss tables is the use of a separate metadata array that stores 1 byte per slot in the table. 7 bits of this "control byte" are taken from hash(key) and the remaining bit indicates whether the slot is empty, full, or deleted. The metadata array allows quick probes: the Google implementation uses SIMD on x86 CPUs to check 16 slots at a time for a match. Neon on arm64 CPUs is apparently too high latency, but the generic version is still able to compare 8 bytes at a time through bit tricks (SWAR, SIMD Within A Register).

Google's Swiss Tables layout is N-1 slots, where N is a power of 2, and N+groupSize control bytes. The [N:N+groupSize] control bytes mirror the first groupSize control bytes so that probe operations at the end of the control byte array do not have to perform additional checks. The separation of control bytes and slots implies 2 cache misses for a large map (larger than the L2 cache size) or a cold map. The swiss.Map implementation differs from Google's layout: it groups together 8 control bytes and 8 slots, which often results in 1 cache miss for a large or cold map rather than separate accesses for the controls and slots. The mirrored control bytes are no longer needed, and groups no longer start at an arbitrary slot index, but only at indexes that are multiples of 8.
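The SWAR comparison described above can be sketched as follows. This is an illustrative reimplementation of the bit trick, not the package's actual code; the names `matchH2`, `loBits`, and `hiBits` are invented here:

```go
package main

import "fmt"

const (
	loBits = 0x0101010101010101 // low bit of each byte
	hiBits = 0x8080808080808080 // high bit of each byte
)

// matchH2 returns a mask with the high bit set in every byte position
// where the control byte equals h2, comparing all 8 control bytes of a
// group at once (SWAR). The zero-byte detection can produce rare false
// positives in the byte directly above a true match, which is harmless
// because a lookup compares the full key for every candidate slot anyway.
func matchH2(ctrls uint64, h2 uint8) uint64 {
	x := ctrls ^ (loBits * uint64(h2)) // matching bytes become zero
	return (x - loBits) &^ x & hiBits  // high bit set where a byte was zero
}

func main() {
	// Little-endian group of 8 control bytes: slots 1 and 5 hold
	// h2 = 0x42, the remaining slots are empty (0x80).
	ctrls := uint64(0x8080428080804280)
	fmt.Printf("%#x\n", matchH2(ctrls, 0x42)) // high bits at bytes 1 and 5
}
```

The returned mask can then be iterated with bit tricks (e.g. `bits.TrailingZeros64`) to visit each candidate slot.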
Probing is done by taking the top 57 bits of hash(key), modulo the number of groups, as the index into the groups slice, and then checking the groupSize control bytes within the group. Probing walks through groups in the table using quadratic probing until it finds a group that has at least one empty slot. See the comments on probeSeq for more details on the order in which groups are probed and the guarantee that every group is examined, which means that in the worst case probing will end when an empty slot is encountered (the map can never be 100% full).

Deletion is performed using tombstones (ctrlDeleted), with an optimization to mark a slot as empty if we can prove that doing so would not violate the probing behavior that a group of full slots causes probing to continue. It is invalid to take a group of full slots and mark one as empty, as doing so would cause subsequent lookups to terminate at that group rather than continue to probe. We prove a slot was never part of a full group by checking whether any of the groupSize-1 neighbors to the left and right of the slot being deleted are empty, which indicates that the slot was never part of a full group.

The Swiss table design has a significant caveat: resizing of the table is done all at once rather than incrementally. This can cause long-tail latency blips in some use cases. To address this caveat, extendible hashing (https://en.wikipedia.org/wiki/Extendible_hashing) is applied on top of the Swiss table foundation. In extendible hashing, there is a top-level directory containing entries pointing to buckets. In swiss.Map each bucket is a Swiss table as described above. The high bits of hash(key) are used to index into the bucket directory, which is effectively a trie. The number of bits used is the globalDepth, resulting in 2^globalDepth directory entries. Adjacent entries in the directory are allowed to point to the same bucket, which enables resizing to be done incrementally, one bucket at a time.
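The hash split and group-level quadratic probing described above can be sketched as follows. This is a hypothetical illustration, assuming triangular-number probing (a common choice that provably visits every group once when the group count is a power of two); `splitHash` and `probeGroups` are not the package's actual identifiers:

```go
package main

import "fmt"

// splitHash separates a 64-bit hash: the top 57 bits (h1) select the
// starting group and the low 7 bits (h2) are stored in the control byte.
func splitHash(h uint64) (h1 uint64, h2 uint8) {
	return h >> 7, uint8(h & 0x7f)
}

// probeGroups returns the order in which groups are visited: quadratic
// probing via triangular numbers (offset += i; i++). When numGroups is a
// power of two this visits every group exactly once, matching the
// every-group-is-examined guarantee mentioned in the text.
func probeGroups(h1, numGroups uint64) []uint64 {
	mask := numGroups - 1
	seq := make([]uint64, 0, numGroups)
	offset := h1 & mask
	for i := uint64(1); uint64(len(seq)) < numGroups; i++ {
		seq = append(seq, offset)
		offset = (offset + i) & mask
	}
	return seq
}

func main() {
	h1, h2 := splitHash(0x1234)
	fmt.Println(h1, h2)            // 0x1234>>7 = 36, 0x1234&0x7f = 52
	fmt.Println(probeGroups(h1, 8)) // a permutation of groups 0..7
}
```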
Each bucket has a localDepth which is less than or equal to the globalDepth. If the localDepth for a bucket equals the globalDepth then only a single directory entry points to the bucket. Otherwise, more than one directory entry points to the bucket.

As one possible scenario for the directory and buckets: with a globalDepth of 2 the directory contains 4 entries; the first 2 entries point to the same bucket, which has a localDepth of 1, while the last 2 entries point to different buckets, each with a localDepth of 2. The index into the directory is "hash(key) >> (64 - globalDepth)".

When a bucket gets too large (specified by a configurable threshold) it is split. When a bucket is split its localDepth is incremented. If its localDepth is less than or equal to the globalDepth then the newly split bucket can be installed in the directory. If the bucket's localDepth is greater than the globalDepth then the globalDepth is incremented and the directory is reallocated at twice its current size. In the scenario above, if the bucket at dir[3] (localDepth 2) is split, its localDepth becomes 3, which exceeds the globalDepth, so the globalDepth is incremented to 3 and the directory grows to 8 entries. Note that such an uneven scenario is very unlikely with a good hash function, as the buckets will tend to fill at a similar rate.

The split operation redistributes the records in a bucket into two buckets. This is done by walking over the records in the bucket being split, computing hash(key), and using the localDepth to extract the bit which determines whether to leave the record in the current bucket or to move it to the new bucket.

Maps containing only a single bucket are optimized to avoid the directory indexing, resulting in performance equivalent to a Swiss table without extendible hashing. A single bucket can be guaranteed by configuring a very large bucket size threshold via the WithMaxBucketCapacity option. To avoid a level of indirection when accessing a bucket, the bucket directory points to buckets by value rather than by pointer.
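The directory indexing and the split decision above reduce to two small bit operations. A minimal sketch, using the "hash(key) >> (64 - globalDepth)" expression quoted in the text; `dirIndex` and `splitBit` are illustrative names, not the package's own:

```go
package main

import "fmt"

// dirIndex computes the bucket-directory slot for a key: the top
// globalDepth bits of the hash.
func dirIndex(hash uint64, globalDepth uint) uint64 {
	return hash >> (64 - globalDepth)
}

// splitBit extracts the bit consulted when a bucket of the given
// localDepth is split: records with bit 0 stay in the current bucket,
// records with bit 1 move to the new bucket. After the split both
// halves have localDepth+1.
func splitBit(hash uint64, localDepth uint) uint64 {
	return (hash >> (63 - localDepth)) & 1
}

func main() {
	// With globalDepth 2 the directory has 4 entries indexed by the
	// top 2 bits of the hash: 0b11... lands in dir[3].
	fmt.Println(dirIndex(0xC000000000000000, 2)) // 3
	// Splitting a localDepth-1 bucket consults the next bit down (bit 62).
	fmt.Println(splitBit(0x6000000000000000, 1)) // 1
}
```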
Adjacent bucket[K,V]'s which are logically the same bucket share the bucket.groups slice and have the same values for bucket.{groupMask,localDepth,index}. The other fields of a bucket are only valid for buckets where &m.dir[bucket.index] == &bucket (i.e. the first bucket in the directory with the specified index). During Get operations, any of the buckets with the same index may be used for retrieval. During Put and Delete operations an additional indirection is performed, though the common case is that this indirection is within the same cache line, as it is to the immediately preceding bucket in the directory.

The implementation follows Google's Abseil implementation of Swiss Tables, and is heavily tuned, using unsafe and raw pointer arithmetic rather than Go slices to squeeze out every drop of performance. In order to support hashing of arbitrary keys, a hack is performed to extract the hash function from Go's implementation of map[K]struct{} by reaching into the internals of the type. (This might break in future versions of Go, but is likely fixable unless the Go runtime does something drastic.)

A swiss.Map has similar or slightly better performance than Go's builtin map for small map sizes, and is much faster at large map sizes. See the README (https://github.com/cockroachdb/swiss/blob/main/README.md) for details.
purl: pkg:golang/github.com/cockroachdb/swiss
License: Apache-2.0
Latest release: 4 days ago
Namespace: github.com/cockroachdb
Dependent packages: 6
Stars: 158 on GitHub
Forks: 8 on GitHub