What is GPULlama3.java?
GPU-accelerated Llama3 inference in pure Java using TornadoVM.
GPULlama3.java runs Llama3-family models written in plain Java and automatically accelerates them on GPUs through TornadoVM, with no hand-written GPU kernels required.
It currently supports Llama3, Mistral, Qwen2.5, Qwen3, and Phi3 models in the GGUF format.


LangChain4j Integration
Starting from LangChain4j v1.7.1, GPULlama3.java is officially supported as a model provider.
This means you can use GPULlama3.java directly inside your LangChain4j applications, with no custom glue code and with full GPU acceleration through TornadoVM.
If you're building Java LLM applications, agent workflows, or RAG pipelines, GPULlama3.java plugs in seamlessly as a first-class LangChain4j model.
Code Example in LangChain4j
Use this model anywhere a ChatLanguageModel is accepted:
agentic workflows, tool-based agents, retrieval pipelines, chat endpoints, or LangChain4j AI Services.
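A minimal sketch of what this looks like in code. The `GPULlama3ChatModel` builder name, its options (`modelPath`, `onGPU`, `temperature`), and the model filename are assumptions based on typical LangChain4j provider builders; verify the exact API against the LangChain4j docs for this module.

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import java.nio.file.Path;

public class GpuLlamaExample {
    public static void main(String[] args) {
        // Hypothetical builder: class name and option names are assumptions,
        // modeled on other LangChain4j model providers.
        ChatLanguageModel model = GPULlama3ChatModel.builder()
                .modelPath(Path.of("models/Llama-3.2-1B-Instruct-Q4_0.gguf")) // local GGUF file
                .onGPU(true)        // run inference on the GPU via TornadoVM
                .temperature(0.7)
                .build();

        // The model can now be passed anywhere a ChatLanguageModel is accepted:
        // AI Services, agents, RAG chains, or called directly.
        String answer = model.chat("Explain TornadoVM in one sentence.");
        System.out.println(answer);
    }
}
```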
Integrating GPULlama3.java with LangChain4j in your project

For Maven projects (pom.xml)
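The dependency snippet was not preserved here; a plausible fragment follows, assuming the module is published under the `dev.langchain4j` group with the version matching the v1.7.1 release mentioned above. The artifactId is an assumption, so verify the coordinates in the LangChain4j documentation.

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <!-- artifactId is an assumption; check the LangChain4j docs -->
    <artifactId>langchain4j-gpu-llama3</artifactId>
    <version>1.7.1</version>
</dependency>
```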
For Gradle projects (build.gradle)
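The equivalent Gradle snippet, under the same assumption about the artifact coordinates:

```gradle
dependencies {
    // coordinates are assumptions; verify against the LangChain4j docs
    implementation 'dev.langchain4j:langchain4j-gpu-llama3:1.7.1'
}
```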
Quarkus Integration
Since v1.4.2, Quarkus applications can use GPULlama3.java to run locally hosted Llama3 and other compatible models (e.g., Mistral, Qwen3, Phi3) for chat-based inference, leveraging GPU acceleration without requiring native OpenCL or CUDA code.
You can use GPULlama3.java seamlessly to build and run your Quarkus application.
Code Example in Quarkus
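A sketch of the typical quarkus-langchain4j pattern: declare an AI service interface and let the extension wire it to the configured GPULlama3.java model. The `Assistant` interface, the REST resource, and the prompt are illustrative assumptions; `@RegisterAiService` is the standard quarkus-langchain4j annotation.

```java
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

// Hypothetical AI service: quarkus-langchain4j generates the implementation
// backed by the GPULlama3.java model configured in application.properties.
@RegisterAiService
interface Assistant {
    String chat(String question);
}

@Path("/chat")
public class ChatResource {
    private final Assistant assistant;

    ChatResource(Assistant assistant) {
        this.assistant = assistant; // injected by Quarkus
    }

    @GET
    public String ask() {
        return assistant.chat("Tell me about TornadoVM.");
    }
}
```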
Integrating GPULlama3.java with Quarkus in your project

For Maven projects (pom.xml)
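The dependency block was lost in extraction; a plausible fragment follows, assuming a Quarkiverse extension artifact with the version matching the v1.4.2 release mentioned above. The artifactId is an assumption, so verify it against the Quarkus LangChain4j extension docs.

```xml
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <!-- artifactId is an assumption; check the Quarkus LangChain4j docs -->
    <artifactId>quarkus-langchain4j-gpu-llama3</artifactId>
    <version>1.4.2</version>
</dependency>
```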
Integrating GPULlama3.java in your project

For Maven projects (pom.xml)
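For direct (framework-free) use, the dependency snippet was also lost; the fragment below is a placeholder sketch. The groupId, artifactId, and version here are all assumptions, so check the GPULlama3.java README for the published coordinates.

```xml
<!-- groupId/artifactId/version are placeholders; see the GPULlama3.java README -->
<dependency>
    <groupId>org.beehive.gpullama3</groupId>
    <artifactId>gpullama3</artifactId>
    <version>LATEST_RELEASE</version>
</dependency>
```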
Running on an RTX 5090, with nvtop at the bottom tracking GPU utilization and memory usage.
