spack.io "gpu" keyword
py-transformer-engine
A library for accelerating Transformer models on NVIDIA GPUs, including fp8 precision on Hopper ...
Latest release: 1 day ago - 2,840 stars on GitHub - 1 maintainer
alpaka 0.8.0
Abstraction Library for Parallel Kernel Acceleration.
6 versions - Latest release: almost 4 years ago - 1 dependent package - 393 stars on GitHub - 1 maintainer
omega-h 9.34.1
Omega_h is a C++11 library providing data structures and algorithms for adaptive discretizations....
13 versions - Latest release: almost 4 years ago - 2 dependent packages - 102 stars on GitHub - 1 maintainer
care 0.3.0
CHAI and RAJA extensions (includes data structures and algorithms).
4 versions - Latest release: almost 4 years ago - 31 stars on GitHub - 2 maintainers
cutlass
CUDA Templates for Linear Algebra Subroutines
Latest release: 6 days ago - 8,604 stars on GitHub
nccl 2.29.2-1
Optimized primitives for collective multi-GPU communication.
51 versions - Latest release: about 1 month ago - 14 dependent packages - 4,124 stars on GitHub - 1 maintainer
Top 3.3% on spack.io
hipfort 7.2.0
Radeon Open Compute Parallel Primitives Library
48 versions - Latest release: 9 days ago - 81 stars on GitHub - 4 maintainers
arborx 1.2
ArborX is a performance-portable library for geometric search
5 versions - Latest release: almost 4 years ago - 4 dependent packages - 210 stars on GitHub - 1 maintainer
umpire 6.0.0
An application-focused API for memory management on NUMA & GPU architectures
29 versions - Latest release: almost 4 years ago - 8 dependent packages - 382 stars on GitHub - 3 maintainers
Top 8.9% on spack.io
hipsycl 0.9.1
hipSYCL is an implementation of the SYCL standard programming model over NVIDIA CUDA/AMD HIP
3 versions - Latest release: almost 4 years ago - 586 stars on GitHub - 1 maintainer
py-fastfold 0.2.0
Optimizing Protein Structure Prediction Model Training and Inference on GPU Clusters.
1 version - Latest release: about 3 years ago - 453 stars on GitHub - 1 maintainer
nvtop 3.0.1
Nvtop stands for Neat Videocard TOP, a (h)top like task monitor for AMD and NVIDIA GPUs. It can h...
10 versions - Latest release: over 3 years ago - 9,582 stars on GitHub - 1 maintainer
rocprim 7.2.0
Radeon Open Compute Parallel Primitives Library
50 versions - Latest release: 14 days ago - 10 dependent packages - 134 stars on GitHub - 4 maintainers
Top 9.8% on spack.io
aluminum 1.0.0
Aluminum provides a generic interface to high-performance communication libraries, with a focus o...
12 versions - Latest release: almost 4 years ago - 3 dependent packages - 85 stars on GitHub - 2 maintainers
py-qiskit-aer 0.11.1
Aer is a high performance simulator for quantum circuits that includes noise models
2 versions - Latest release: over 3 years ago - 596 stars on GitHub - 1 maintainer
celeritas 0.5.1
Celeritas is a new Monte Carlo transport code designed for high-performance (GPU-targeted) simul...
19 versions - Latest release: about 1 year ago - 92 stars on GitHub - 2 maintainers
neon
NeoN is a PDE solver for CFD frameworks.
Latest release: 16 days ago - 75 stars on GitHub - 2 maintainers
py-cuml 0.15.0
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitiv...
1 version - Latest release: almost 4 years ago - 4,968 stars on GitHub - 1 maintainer
py-mpi4jax 0.3.11.post3
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python.
1 version - Latest release: about 3 years ago - 496 stars on GitHub - 1 maintainer
cuda-memtest master
Maintained and updated fork of cuda_memtest. original homepage: http://sourceforge.net/projects/c...
1 version - Latest release: almost 4 years ago - 134 stars on GitHub - 1 maintainer
rocrand 7.2.0
The rocRAND project provides functions that generate pseudo-random and quasi-random numbers.
47 versions - Latest release: 17 days ago - 8 dependent packages - 130 stars on GitHub - 4 maintainers
cans 1.1.4
CaNS (Canonical Navier-Stokes) is a code for massively-parallel numerical simulations of fluid fl...
4 versions - Latest release: almost 4 years ago - 250 stars on GitHub - 4 maintainers
libceed 0.1
The CEED API Library: Code for Efficient Extensible Discretizations.
8 versions - Latest release: almost 4 years ago - 2 dependent packages - 236 stars on GitHub - 4 maintainers
rocfft 7.1.1
Radeon Open Compute FFT library
49 versions - Latest release: 18 days ago - 7 dependent packages - 133 stars on GitHub - 5 maintainers
Top 9.5% on spack.io
nekrs 21.0
nekRS is an open-source Navier Stokes solver based on the spectral element method targeting class...
1 version - Latest release: almost 4 years ago - 1 dependent package - 356 stars on GitHub - 2 maintainers
py-gpustat 0.6.0
A utility to monitor NVIDIA GPU status and usage.
2 versions - Latest release: almost 4 years ago - 1 dependent package - 4,200 stars on GitHub - 1 maintainer
Top 9.5% on spack.io
sycl
hipSYCL is an implementation of the SYCL standard programming model over NVIDIA CUDA/AMD HIP
Latest release: 22 days ago - 4 dependent packages - 673 stars on GitHub - 1 maintainer
Top 8.2% on spack.io
chai 2.4.0
Copy-hiding array interface for data migration between memory spaces
13 versions - Latest release: almost 4 years ago - 2 dependent packages - 109 stars on GitHub - 4 maintainers
py-stringzilla 4.2.1
Search, hash, sort, and process strings faster via SWAR and SIMD
1 version - Latest release: 5 months ago - 2,903 stars on GitHub
tsne-cuda 3.0.1
tsne-cuda is an optimized CUDA version of FIt-SNE algorithm with associated python modules. Autho...
2 versions - Latest release: 4 months ago - 1,881 stars on GitHub - 1 maintainer
babelstream
Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in ...
Latest release: 22 days ago - 348 stars on GitHub - 3 maintainers
mpibind 0.8.0
A portable runtime library that automatically maps parallel applications to heterogeneous hardwar...
4 versions - Latest release: almost 4 years ago - 46 stars on GitHub - 1 maintainer
tiled-mm
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to bo...
Latest release: 22 days ago - 17 stars on GitHub - 3 maintainers
elbencho 2.0-7
Elbencho storage benchmark
8 versions - Latest release: over 2 years ago - 234 stars on GitHub - 1 maintainer
bohrium 0.9.1
Library for automatic acceleration of array operations
3 versions - Latest release: almost 4 years ago - 218 stars on GitHub - 1 maintainer
rpp 7.1.1
Radeon Performance Primitives (RPP) library is a comprehensive high-performance computer vision ...
28 versions - Latest release: 23 days ago - 1 dependent package - 66 stars on GitHub - 2 maintainers
py-heat 1.6.0
Heat is a flexible and seamless open-source software for high performance data analytics and mach...
8 versions - Latest release: 2 months ago - 158 stars on GitHub - 3 maintainers
sirius 7.3.0
Domain specific library for electronic structure calculations
30 versions - Latest release: almost 4 years ago - 2 dependent packages - 91 stars on GitHub - 5 maintainers
rocm-tensile 7.1.1
Radeon Open Compute Tensile library
49 versions - Latest release: about 2 months ago - 173 stars on GitHub - 4 maintainers
Related Keywords
cuda (22)
hpc (12)
rocm (10)
hip (10)
cpp (8)
high-performance-computing (7)
mpi (7)
gpu-computing (6)
opencl (6)
python (6)
amd (6)
parallel (6)
parallel-computing (5)
nvidia (5)
machine-learning (4)
deep-learning (4)
openmp (4)
gpgpu (4)
portability (4)
gpu-acceleration (3)
radiuss (3)
distributed (3)
cfd (3)
sycl (3)
linux (3)
pytorch (3)
fortran (2)
fft (2)
blas (2)
high-performance (2)
nvidia-cuda (2)
clang (2)
memory-management (2)
random (2)
blt (2)
benchmark (2)
kokkos (2)
turbulence (2)
numpy (2)
jax (2)
tensors (2)
raja (2)
openacc (2)
matrix-multiplication (2)
monitoring (2)
high-order (2)
parallelism (2)
tsne-cuda (1)
search (1)
data-abstraction (1)
memory-bandwidth (1)
parallel-processing (1)
mpibind (1)
performance (1)
productivity (1)
scientific-computing (1)
system-software (1)
cublas (1)
cublasxt (1)
fp8 (1)
simd (1)
sorting-algorithms (1)
parser (1)
levenshtein-distance (1)
string (1)
information-retrieval (1)
string-manipulation (1)
string-matching (1)
string-parsing (1)
string-search (1)
substring (1)
barnes-hut (1)
hashing (1)
barnes-hut-tsne (1)
hash (1)
data-analysis (1)
edit-distance (1)
data-visualization (1)
fit-tsne (1)
mnist (1)
multithreading (1)
tsne (1)
tsne-algorithm (1)
dataset (1)
matmul (1)
mivisionx (1)
openvx (1)
radeon-performance-primitives (1)
rpp (1)
warp-affine (1)
analytics (1)
data (1)
density-functional-theory (1)
electronic-structure-calculations (1)
full-potential (1)
lapw (1)
planewave (1)
pseudopotential (1)
assembly (1)
auto-tuning (1)