What is GPULlama3.java?

GPU-accelerated Llama3 inference in pure Java using TornadoVM.

GPULlama3.java runs Llama3-family models written in plain Java and automatically accelerates them on GPUs through TornadoVM, with no native bindings required. It currently supports Llama3, Mistral, Qwen2.5, Qwen3, and Phi3 models in the GGUF format.


LangChain4j Integration

Starting from LangChain4j v1.7.1, GPULlama3.java is officially supported as a model provider.
This means you can use GPULlama3.java directly inside your LangChain4j applications, with no custom glue code and with full GPU acceleration through TornadoVM.


If you're building Java LLM applications, agent workflows, or RAG pipelines, GPULlama3.java plugs in seamlessly as a first-class LangChain4j model.

Code Example in LangChain4j

Use this model anywhere a ChatLanguageModel is accepted:
agentic workflows, tool-based agents, retrieval pipelines, chat endpoints, or LangChain4j AI Services.
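The sketch below shows what that integration point can look like. It is a minimal, illustrative example: the provider class name GPULlama3ChatModel, its builder options (modelPath, onGPU), and the GGUF file path are assumptions based on LangChain4j's usual builder conventions, not verbatim from the module; consult the LangChain4j GPULlama3 documentation for the exact API (recent LangChain4j versions also name the interface ChatModel rather than ChatLanguageModel).

```java
import java.nio.file.Path;

import dev.langchain4j.model.chat.ChatLanguageModel;
// Assumed package/class name for the GPULlama3 provider; check the
// langchain4j GPULlama3 module docs for the real one.
import dev.langchain4j.model.gpullama3.GPULlama3ChatModel;

public class GpuLlama3Example {
    public static void main(String[] args) {
        // Build the model from a local GGUF file (hypothetical path and options).
        ChatLanguageModel model = GPULlama3ChatModel.builder()
                .modelPath(Path.of("models/Llama-3.2-1B-Instruct.Q8_0.gguf"))
                .temperature(0.7)
                .onGPU(true) // assumed flag: execute inference via TornadoVM on the GPU
                .build();

        // Anywhere LangChain4j accepts a ChatLanguageModel, this model plugs in:
        String answer = model.chat("Explain TornadoVM in one sentence.");
        System.out.println(answer);
    }
}
```

From here the model can be handed to LangChain4j AI Services, tool-based agents, or retrieval pipelines exactly like any other provider.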

Integrating GPULlama3.java in your project

For Maven projects (pom.xml)
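A sketch of the dependency entry; the groupId, artifactId, and version below are placeholders, so take the published coordinates from the GPULlama3.java README.

```xml
<!-- Illustrative placeholders: check the GPULlama3.java README
     for the actual published groupId/artifactId/version. -->
<dependency>
    <groupId>org.beehive.gpullama3</groupId>
    <artifactId>gpu-llama3</artifactId>
    <version><!-- see README --></version>
</dependency>
```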

Integrating LangChain4j in your project

For Maven projects (pom.xml)
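A sketch of the Maven entries, using the v1.7.1 release mentioned above. The core dev.langchain4j:langchain4j coordinate is real; the provider artifactId is an assumption based on LangChain4j's usual langchain4j-&lt;provider&gt; naming, so verify it against the LangChain4j documentation.

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>1.7.1</version>
</dependency>
<!-- GPULlama3 provider module: artifactId assumed from LangChain4j's
     langchain4j-<provider> naming convention; verify before use. -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-gpu-llama3</artifactId>
    <version>1.7.1</version>
</dependency>
```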

For Gradle projects (build.gradle)
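The Gradle equivalent, under the same assumptions as the Maven snippet (the provider artifactId follows LangChain4j's usual naming and should be verified against the LangChain4j documentation).

```groovy
dependencies {
    implementation 'dev.langchain4j:langchain4j:1.7.1'
    // Assumed provider coordinate; verify against the LangChain4j docs.
    implementation 'dev.langchain4j:langchain4j-gpu-llama3:1.7.1'
}
```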

[Demo: running on an RTX 5090, with nvtop at the bottom tracking GPU utilization and memory usage]