proxy.golang.org : github.com/changkun/gomat
Package gomat is a Go matrix package with cache-aware, lock-free tiling optimization. The following explains how the multiplication algorithm works.

Memory model: assume only 2 levels in the hierarchy, fast (registers/cache) and slow (main memory), with all data initially in slow memory. Let f be the number of arithmetic operations, t_f the time per arithmetic operation, m the number of words moved between fast and slow memory, and t_m the time per slow memory access. The computational intensity is q = f/m, and the total running time is f*t_f + m*t_m = f*t_f * (1 + (t_m/t_f) * (1/q)). A larger q means the running time is closer to the minimum f*t_f.

Tiled matrix multiply: consider A, B, C to be N-by-N matrices of b-by-b sub-blocks, where b = n/N is the block size. Each block of A and B is read from slow memory N times, and each block of C is read and written once. Thus

    m = N*n*n (N*N blocks of B, read N times each, b*b words per block)
      + N*n*n (same for the blocks of A)
      + 2*n*n (read and write each block of C once)
      = 2*(N+1)*n*n

Hence the computational intensity is

    q = f/m = 2*n*n*n / (2*(N+1)*n*n) ≈ n/N = b

So we can improve performance by increasing the block size b. This can be much faster than matrix-vector multiply (q = 2). The blocked algorithm has computational intensity q ≈ b: the larger the block size, the more efficient the algorithm will be.

Limit: all three blocks from A, B, C must fit in fast memory (cache), so we cannot make these blocks arbitrarily large. If our fast memory has size M_fast, then 3*b*b <= M_fast, so q ≈ b <= sqrt(M_fast/3). To build a machine that runs matrix multiply at 1/2 the peak arithmetic speed of the machine, we need a fast memory of size M_fast >= 3*b*b = 3*q*q = 3*(t_m/t_f)^2. This size is reasonable for an L1 cache, but not for register sets. Note: the analysis assumes it is possible to schedule the instructions perfectly.

The tiled algorithm changes the order in which values are accumulated into each C[i, j] by applying commutativity and associativity, so it gets slightly different answers from the naive version because of roundoff.

The previous analysis showed that the blocked algorithm has computational intensity q ≈ b <= sqrt(M_fast/3). There is a lower bound result that says we cannot do much better than this (using only associativity). Theorem (Hong & Kung, 1981): any reorganization of this algorithm (that uses only associativity) is limited to q = O(sqrt(M_fast)).
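The tiled loop nest itself is not shown above, so here is a minimal Go sketch of the blocked algorithm as described. It is an illustration, not the gomat implementation or API: the matrices are plain [][]float64, the function name blockedMul is made up, and the block size b is assumed to divide n and to be small enough that three b-by-b blocks fit in fast memory (3*b*b <= M_fast).

```go
package main

import "fmt"

// blockedMul computes C += A*B for n-by-n matrices using b-by-b tiles,
// where b is assumed to divide n evenly. Each trip through the k loop
// keeps one block each of A, B and C in fast memory while 2*b*b*b flops
// are performed on them, which is what gives computational intensity q ≈ b.
func blockedMul(C, A, B [][]float64, n, b int) {
	for i := 0; i < n; i += b {
		for j := 0; j < n; j += b {
			for k := 0; k < n; k += b {
				// C[i:i+b, j:j+b] += A[i:i+b, k:k+b] * B[k:k+b, j:j+b]
				for ii := i; ii < i+b; ii++ {
					for kk := k; kk < k+b; kk++ {
						a := A[ii][kk]
						for jj := j; jj < j+b; jj++ {
							C[ii][jj] += a * B[kk][jj]
						}
					}
				}
			}
		}
	}
}

func newMatrix(n int) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	return m
}

func main() {
	n, b := 4, 2 // tiny sizes just to show the call shape; a real b is cache-sized
	A, B, C := newMatrix(n), newMatrix(n), newMatrix(n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			A[i][j], B[i][j] = 1, 1
		}
	}
	blockedMul(C, A, B, n, b)
	fmt.Println(C[0]) // all-ones inputs: every entry of C is n, i.e. [4 4 4 4]
}
```

Per the package description, gomat additionally processes tiles with a lock-free scheme; this single-threaded sketch omits that part.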
- Registry
- Source
- Documentation
- JSON
purl: pkg:golang/github.com/changkun/gomat
Keywords: cache-aware, lock-free, matrix-library
License: MIT
Latest release: about 7 years ago
First release: about 7 years ago
Namespace: github.com/changkun
Stars: 2 on GitHub
Forks: 0 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: about 1 month ago