Top 5.8% dependent packages on proxy.golang.org
Top 4.8% dependent repos on proxy.golang.org
proxy.golang.org : github.com/openacid/slimarray
Package slimarray uses polynomial to compress and store an array of uint32. A uint32 costs only 5 bits in a sorted array of a million number in range [0, 1000*1000]. We use a polynomial y = a + bx + cx² to describe the overall trend of the numbers. And for every number i we add a residual to fit the gap between y(i) and nums[i]. E.g. If there are 4 numbers: 0, 15, 33, 50 The polynomial and residuals are: In this case the residuals require 3 bits for each of them. To retrieve the numbers, we evaluate y(i) and add the residual to it: Another space efficient data structure to store uint32 array is trie or prefix tree or radix tree. It is possible to use bitmap-based btree like structure to reduce space(very likely in such case it provides higher compression rate). But it requires the array to be sorted. SlimArray does not have such restriction. It is more adaptive with data layout. To achieve high compression rate, it only requires the data has a overall trend, e.g., roughly sorted, as seen in the above 4 integers examples. Additionally, it also accept duplicated element in the array, which a bitmap based or tree-like data structure does not allow. SlimArray splits the entire array into segments(Seg), each of which has 1024 numbers. And then it splits every segment into several spans. Every span has its own polynomial. A span has 16*k numbers. A segment has at most 64 spans. A SlimArray is a compacted data structure. The original data structures are defined as follow(assumes original user data is `nums []uint32`): A span stores 16*k int32 in it, where k ∈ [1, 64). `Seg.SpansBitmap` describes the layout of Span-s in a Seg. The i-th "1" indicates where the last 16 numbers are in the i-th Span. e.g.: In the above example: `Seg.Rank` caches the total count of "1" in all preceding Seg.SpansBitmap. This accelerate locating a Span in the packed field SlimArray.Polynomials . `Span.width` is the count of numbers stored in this span. It does not need to be stored because it can be calculated by counting the "0" between two "1" in `Seg.SpansBitmap`. `Span.Polynomial` stores 3 coefficients of the polynomial describing the overall trend of this span. I.e. the `[a, b, c]` in `y = a + bx + cx²` `Span.Config.Offset` adjust the offset to locate a residual. In a span we want to have that: But if the preceding span has smaller residual width, the "offset" could be negative, e.g.: span[0] has residual of width 0 and 16 residuals, span[1] has residual of width 4. Then the "offset" of span[1] is `-16*4` in order to satisfy: `(-16*4) + i * 4` is the correct residual position, for i in [16, 32). `Span.Config.ResidualWidth` specifies the number of bits to store every residual in this span, it must be a power of 2: `2^k`. `Span.Residuals` is an array of residuals of length `Span.width`. Every elt in it is a `ResidualWidth`-bits integers. SlimArray compact `Seg` into a dense format: `SlimArray.Residuals` simply packs the residuals of every nums[i] together.
Registry
-
Source
- Documentation
- JSON
purl: pkg:golang/github.com/openacid/slimarray
Keywords:
array
, compacted
, compress
, go
, golang
, memory
, space
License: MIT
Latest release: over 4 years ago
First release: over 4 years ago
Namespace: github.com/openacid
Dependent packages: 1
Dependent repositories: 1
Stars: 40 on GitHub
Forks: 2 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 15 days ago