🧠 Llama.cpp + OpenCL (RX 6600 XT on Ubuntu 24.04)

📅 Summary

  • GPU: AMD RX 6600 XT
  • Runtime: Mesa Rusticl (no ROCm needed)
  • Model Format: .gguf (quantized)
  • Server: llama-server HTTP + Web UI
  • Client: Web browser or PowerShell Invoke-RestMethod

🧰 Step-by-step Procedure

# 1. Optional: ROCm's OpenCL stack (tried first, but not needed with Rusticl)
sudo apt install rocm-opencl-dev
 
# 2. Mesa drivers + OpenCL ICD loader
sudo apt install mesa-opencl-icd clinfo
 
# 3. Confirm GPU visibility
clinfo | grep 'Device Name'
# → Should list RX 6600 XT
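# If no device shows up, Rusticl may need to be enabled explicitly; some
# Mesa builds gate its drivers behind an environment variable
# (assumption: radeonsi is the right driver name for this card):
export RUSTICL_ENABLE=radeonsi
clinfo | grep 'Device Name'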
 
# 4. Get the source
mkdir -p ~/oclLlama
cd ~/oclLlama
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
 
# 5. Install build deps
sudo apt install cmake build-essential \
  libclblast-dev ocl-icd-opencl-dev \
  libcurl4-openssl-dev
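# libclblast-dev supplies CLBlast, the tuned OpenCL BLAS library the build
# links against; ocl-icd-opencl-dev provides the OpenCL ICD loader headers.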
 
# 6. Build with OpenCL
mkdir build && cd build
cmake .. -DLLAMA_CLBLAST=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j$(nproc)
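# Note: recent llama.cpp revisions removed the CLBlast backend. If CMake no
# longer recognizes LLAMA_CLBLAST, try the newer OpenCL backend instead
# (flag name per upstream docs; verify against your checkout):
# cmake .. -DGGML_OPENCL=ON -DCMAKE_BUILD_TYPE=Release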
 
# 7. Download or upload a model
mkdir -p ~/oclLlama/llama.cpp/models
# (Copy from your laptop or download a .gguf file)
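# Example download (assumed source: TheBloke/phi-2-GGUF on Hugging Face;
# any quantized .gguf model works here):
wget -P ~/oclLlama/llama.cpp/models \
  https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf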
 
# 8. Run the API server
./bin/llama-server \
  --model ~/oclLlama/llama.cpp/models/phi-2.Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 11434 \
  --n-gpu-layers 100
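# --n-gpu-layers 100 requests more layers than phi-2 actually has, which
# simply offloads the entire model to the GPU.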

🌍 Test Access from Browser
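Browse to http://ryzen-ubuntu.facundoitest.space:11434/ (hostname taken from the PowerShell example below; substitute your server's address). llama-server serves its built-in chat Web UI at the root URL, so a page loading there confirms the server is reachable.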

🧪 Test from PowerShell

Invoke-RestMethod -Uri "http://ryzen-ubuntu.facundoitest.space:11434/v1/completions" `
  -Method Post `
  -ContentType "application/json" `
  -Body '{
    "model": "phi-2.Q4_K_M.gguf",
    "prompt": "OpenCL advantages?",
    "max_tokens": 64
  }'
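
The same OpenAI-style /v1/completions endpoint can be exercised from a Linux shell; a minimal curl sketch using the same host, port, and payload as above:

curl http://ryzen-ubuntu.facundoitest.space:11434/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi-2.Q4_K_M.gguf", "prompt": "OpenCL advantages?", "max_tokens": 64}'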

✅ Confirm GPU Usage

radeontop                                      # real-time GPU load (sudo apt install radeontop)
strings ./bin/llama-server | grep -i clblast   # CLBlast symbols in the server binary?
ldd ./bin/llama-server | grep -i opencl        # OpenCL loader linked in?
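# A correctly linked build should list libOpenCL.so.1 (the ocl-icd loader)
# in the ldd output.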

🧹 Optional Cleanup

sudo apt purge rocm-opencl-dev amdgpu-install   # only if these were installed in step 1