This post documents what worked for me to run Ollama in WSL on Windows while querying it from another machine using Open WebUI.
This yields a ChatGPT-like service that runs privately and is still usable on underpowered clients.
First, make sure ~/.wslconfig (in your Windows user profile) has the following contents:

[wsl2]
networkingMode = mirrored
Then restart WSL from PowerShell so the change takes effect:

wsl --shutdown
Next, allow inbound connections to Ollama's port (11434) through the Windows firewall, from an elevated PowerShell prompt:

New-NetFirewallRule -DisplayName "Allow Ollama" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 11434
Now start your WSL distro again (mine is Ubuntu):

ubuntu
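Before going further, it's worth confirming that mirrored networking actually took effect. Recent WSL builds ship a wslinfo utility that reports the active mode; this quick check is my own addition and the flag may not exist on older WSL versions:

# Should print "mirrored"; if it prints "nat", the .wslconfig change
# didn't apply (double-check the file location and re-run wsl --shutdown).
wslinfo --networking-mode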
Inside WSL, find the machine's local IP address (a private address like 172.16.*.* or 192.168.*.*):

ifconfig

For example, my local IP was 192.168.1.204.
Install ollama according to the official instructions:

curl -fsSL https://ollama.com/install.sh | sh
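The install script only sets up the ollama binary and service; you still need to pull at least one model before there's anything to query. The model name below is just an example that fits in the 1080 Ti's 11GB of VRAM, so substitute whatever you prefer:

# Download a model (example choice; pick one that fits your VRAM).
ollama pull llama3.1:8b

# Confirm what's available locally.
ollama list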
Set OLLAMA_HOST so that ollama listens on all network interfaces:
export OLLAMA_HOST="0.0.0.0:11434"
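That export only applies to the current shell. A minimal way to make it persistent, assuming you use bash, is to append it to ~/.bashrc:

# Persist the setting for future shells (bash assumed).
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.bashrc
source ~/.bashrc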
Run ollama in the background:
ollama serve &
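A bare & ties the server to the terminal that started it. If you'd rather it survive closing that window, one option (my own addition, not part of the original steps) is nohup with a log file; note that depending on your settings, WSL itself may still shut the VM down once nothing is using it:

# Keep the server running after the shell exits, and capture its logs.
nohup ollama serve > ~/ollama.log 2>&1 &

# Tail the log to confirm it bound to 0.0.0.0:11434.
tail -f ~/ollama.log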
Verify that ollama is actually listening on the network interface rather than just localhost:
curl http://192.168.1.204:11434/api/tags
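You can also sanity-check generation end to end from the client machine before involving Open WebUI. This assumes a model named llama3.1:8b is already pulled; swap in whatever you actually downloaded:

# From the client: ask the Ollama HTTP API for a short, non-streaming completion.
curl http://192.168.1.204:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'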
Finally, on the client machine, run Open WebUI in Docker. Its data lives at /app/backend/data inside the container, and it needs to know where to reach ollama:
export DATA_PATH="/app/backend/data"
export OLLAMA_BASE_URL="http://192.168.1.204:11434"
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL -v open-webui-data:${DATA_PATH} --name open-webui ghcr.io/open-webui/open-webui:main
You can watch the container's startup logs with:

docker logs open-webui --follow
Once it's up, open Open WebUI in a browser on the client:

http://localhost:3000
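If you later want to update Open WebUI, the chat history survives because it lives in the open-webui-data volume rather than in the container itself. A sketch of the update, run in a shell where DATA_PATH and OLLAMA_BASE_URL are still exported:

# Grab the newest image, then recreate the container against the same volume.
docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui && docker rm open-webui
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL -v open-webui-data:${DATA_PATH} --name open-webui ghcr.io/open-webui/open-webui:main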
That’s everything!
My poor MacBook Air (M1 with 16GB of shared RAM/VRAM) can run either an LLM or everything else I need, but not both at once, thanks to its limited shared memory. This setup offloads the LLM to the greatest GPU of all time, a GTX 1080 Ti (11GB VRAM).
I wanted to try a private LLM for several reasons, including not contributing to future training data and not being subject to odd court mandates. Also, it's fun to try things just because!
WSL offers the best of both worlds: Windows for hardware support and Linux for UNIX-like app setup. Historically, though, WSL's NAT and virtual adapters made running a server annoying. I've always struggled with port mapping and firewalls on Windows, and no LLM could help me defeat WSL's NAT setup.
Luckily, WSL now supports a networkingMode = mirrored setting, which magically does away with NAT and greatly simplifies the setup, with only minor caveats.
While I could run Ollama in a Docker container instead, WSL's GPU passthrough is pretty seamless, and this way I don't have to deal with orchestrating Ollama or Docker directly on Windows.
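If you want to confirm that the GPU passthrough is actually being used, here are a couple of quick checks inside WSL (assuming the NVIDIA driver is installed on the Windows side; these checks are my own addition):

# The Windows NVIDIA driver exposes the GPU to WSL; this should list the 1080 Ti.
nvidia-smi

# After sending a prompt, this shows loaded models and whether they're
# running on the GPU or falling back to the CPU.
ollama ps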
As to why I still have a Windows machine in 2025, I don’t really know either.
Power usage is significantly better on ARM Macs.
For reference, here's the additional system power usage while generating a response, based on my own power metering:
Responses are also noticeably lower quality than ChatGPT's. I find myself having to clarify things to the LLM more often, which ultimately makes queries slower.
Only for funsies, especially if new hardware is an option.
Honestly, the best new setup is likely a new Mac with tons of extra RAM. The M1's efficiency is otherworldly compared to an 8-year-old GPU, and more recent M-series chips are strictly faster still.
…But that’s expensive. This setup lets me use my old 1080 Ti as a modern appliance, and my slowly aging MacBook Air as the thin client it excels at being, which is good enough for me!