What is GPULlama3.java?
GPU-accelerated Llama3 inference in pure Java using TornadoVM.
The Llama3 models are written in plain Java, and TornadoVM accelerates them on GPUs automatically, with no hand-written GPU kernels required.
Currently, it supports Llama3, Mistral, Qwen2.5, Qwen3 and Phi3 models in the GGUF format.
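As a quick sanity check before loading a model, the sketch below verifies that a file looks like GGUF by reading its first eight bytes: the ASCII magic `GGUF` followed by a little-endian 32-bit version number. This is only an illustration of the file format and is not taken from the GPULlama3.java code base; the class name `GgufHeaderCheck` and the method `ggufVersion` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of GPULlama3.java.
class GgufHeaderCheck {
    // A GGUF file starts with the 4 ASCII bytes "GGUF".
    static final byte[] MAGIC = "GGUF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the GGUF version from the header bytes, or -1 if the magic does not match. */
    static long ggufVersion(byte[] header) {
        if (header == null || header.length < 8) {
            return -1;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return -1;
            }
        }
        // The version is an unsigned 32-bit little-endian integer right after the magic.
        int raw = ByteBuffer.wrap(header, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return Integer.toUnsignedLong(raw);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GgufHeaderCheck <model.gguf>");
            return;
        }
        // Read only the first 8 bytes; model files are typically several gigabytes.
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            long version = ggufVersion(in.readNBytes(8));
            System.out.println(version < 0 ? "Not a GGUF file" : "GGUF version " + version);
        }
    }
}
```

Run it with the path to a downloaded model file as the only argument. A negative result usually means the file is in another format or the download was incomplete.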

Running on an RTX 5090, with nvtop at the bottom tracking GPU utilization and memory usage.
