====== Llama.cpp + OpenCL (RX 6600 XT on Ubuntu 24.04) ======
=== Summary ===
* **GPU**: AMD RX 6600 XT
* **Runtime**: Mesa Rusticl (no ROCm needed)
* **Model Format**: .gguf (quantized)
* **Server**: llama-server HTTP + Web UI
* **Client**: Web browser or PowerShell Invoke-RestMethod
----
=== Step-by-step Procedure ===
Based on this shell command history (raw, with typos and retries left in); the cleaned-up procedure follows below:
<code bash>
  4  sudo apt install rocm-opencl-dev
  5  mkdir rocm
  6  cd rocm
  7  wget https://repo.radeon.com/amdgpu-install/6.3.3/ubuntu/noble/amdgpu-install_6.3.60303-1_all.deb
  8  sudo apt install ./amdgpu-install_6.3.60303-1_all.deb
 11  sudo reboot
 16  cd rocm/
 18  sudo apt update
 19  sudo apt full-upgrade
 20  sudo apt install rocm-opencl-dev
 22  sudo apt install clinfo
 24  clinfo
 31  sudo apt install mesa-opencl-icd clinfo
 32  sudo nano /etc/drirc
 33  sudo reboot
 35  sudo clinfo | grep 'Device Name'
 36  mkdir oclLlama
 37  cd oclLlama/
 38  git clone https://github.com/ggerganov/llama.cpp
 39  cd llama.cpp
 46  make LLAMA_OPENCL=1
 47  sudo apt install cmake build-essential libclblast-dev ocl-icd-opencl-dev
 48  mkdir build
 49  cd bould
 50  cd build/
 51  cmake .. -DLLAMA_CLBLAST=on
 52  sudo apt install curl
 53  sudo apt install libcurl4-openssl-dev
 54  cmake .. -DLLAMA_CLBLAST=on
 55  cmake --build . --config Release -j$(nproc)
 59  ls -lh /home/facundo/oclLlama/llama.cpp/models/*.gguf
 60  cd ~/oclLlama/llama.cpp/models
 68  ls *.gguf
 69  ./bin/llama-server --model /home/facundo/oclLlama/llama.cpp/models/phi-2.Q4_K_M.gguf --host 0.0.0.0 --port 11434 --n-gpu-layers 100
 70  ls ..
 71  cd ..
 72  ls
 73  cd build/
# (here I uploaded the .gguf from my laptop)
 74  ./bin/llama-server --model /home/facundo/oclLlama/llama.cpp/models/phi-2.Q4_K_M.gguf --host 0.0.0.0 --port 11434 --n-gpu-layers 100
</code>
Gepetto's cleaned-up version of the same procedure:
<code bash>
# 1. Optional: ROCm OpenCL was tried first (not strictly needed with Rusticl)
sudo apt install rocm-opencl-dev

# 2. Mesa drivers + OpenCL ICD loader
sudo apt install mesa-opencl-icd clinfo

# 3. Confirm GPU visibility
clinfo | grep 'Device Name'
# Should list the RX 6600 XT
</code>
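If clinfo does not report the card through Rusticl, the Mesa Rusticl driver can be enabled explicitly via an environment variable. A minimal sketch, assuming the Ubuntu 24.04 Mesa build ships Rusticl and that radeonsi is the right gallium driver for the RX 6600 XT:

<code bash>
# Assumption: this Mesa package was built with Rusticl; radeonsi covers RDNA2 cards
export RUSTICL_ENABLE=radeonsi
clinfo | grep -E 'Platform Name|Device Name'
</code>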
<code bash>
# 4. Get the source
mkdir -p ~/oclLlama
cd ~/oclLlama
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
</code>
<code bash>
# 5. Install build dependencies
sudo apt install cmake build-essential \
    libclblast-dev ocl-icd-opencl-dev \
    libcurl4-openssl-dev
</code>
<code bash>
# 6. Build with OpenCL (CLBlast backend)
mkdir build && cd build
cmake .. -DLLAMA_CLBLAST=on -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j$(nproc)
</code>
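A quick sanity check that the build produced the server binary and that the OpenCL libraries were linked in (run from the build/ directory; an empty grep result would be worth investigating before going further):

<code bash>
# The binary name matches the one started in step 8
ls -lh bin/llama-server
ldd bin/llama-server | grep -i -E 'opencl|clblast'
</code>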
<code bash>
# 7. Download or upload a model
mkdir -p ~/oclLlama/llama.cpp/models
# (SCP a .gguf from your laptop, or download one into the models/ directory)
</code>
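For example, pushing the quantized model over SSH from the laptop. The local path and SSH username here are assumptions; the hostname is the one used for testing later:

<code bash>
# Run on the laptop; adjust the local path to wherever the .gguf actually lives
scp ~/Downloads/phi-2.Q4_K_M.gguf \
    facundo@ryzen-ubuntu.facundoitest.space:~/oclLlama/llama.cpp/models/
</code>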
<code bash>
# 8. Run the API server (from the build/ directory)
./bin/llama-server \
    --model ~/oclLlama/llama.cpp/models/phi-2.Q4_K_M.gguf \
    --host 0.0.0.0 \
    --port 11434 \
    --n-gpu-layers 100
</code>
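Before testing from another machine, it is worth confirming locally that the server is listening; llama-server exposes a /health endpoint for readiness checks:

<code bash>
# From a second shell on the server
curl -s http://localhost:11434/health
</code>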
----
=== Test Access from Browser ===
* http://ryzen-ubuntu.facundoitest.space:11434
* Built-in Web UI opens
----
=== Test from PowerShell ===
<code powershell>
Invoke-RestMethod -Uri "http://ryzen-ubuntu.facundoitest.space:11434/v1/completions" `
    -Method Post `
    -ContentType "application/json" `
    -Body '{
        "model": "phi-2.Q4_K_M.gguf",
        "prompt": "OpenCL advantages?",
        "max_tokens": 64
    }'
</code>
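The same request from a Linux or macOS client, using curl against the identical OpenAI-compatible endpoint and body:

<code bash>
curl -s http://ryzen-ubuntu.facundoitest.space:11434/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "phi-2.Q4_K_M.gguf",
          "prompt": "OpenCL advantages?",
          "max_tokens": 64
        }'
</code>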
----
=== Confirm GPU Usage ===
<code bash>
radeontop                            # real-time GPU load
strings ./bin/main | grep -i clblast
ldd ./bin/main | grep -i opencl
</code>
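radeontop is not installed by default; a small sketch of watching GPU load while one of the requests above is in flight:

<code bash>
sudo apt install radeontop
# Watch the "gpu" and "vram" bars while a completion request runs
# (sudo may be needed if the user lacks access to the DRM device)
radeontop
</code>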
----
=== Optional Cleanup ===
<code bash>
sudo apt purge rocm-opencl-dev amdgpu-install
</code>
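After purging the ROCm packages, re-check that the Mesa OpenCL stack still sees the card; autoremove is optional and only clears the now-unneeded dependencies:

<code bash>
sudo apt autoremove
clinfo | grep 'Device Name'
</code>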