Since it’s hardly a unique experience these days, I too have been interested in LLMs. Unfortunately, I’m not a huge fan of sharing my thoughts with a third-party provider. I had some limited interactions with ChatGPT, checking in on the abilities I’d heard about and seeing how at risk my job might be (not very), but prior to DeepSeek-R1’s release I had little motivation to explore a local LLM. I understood it was likely to be expensive and yield a much poorer experience than the commercial offerings.
Enter Holocron
After hearing reports of DeepSeek-R1 models running successfully on systems with as little as a 1070 GPU, I decided to put together an AI-focused server for my homelab, fittingly named Holocron. I had a spare 2060 in my media center PC and happened to find a 3060 for $225, so I decided it was time.
Holocron’s specs:
- CPU: Ryzen 5 5600x
- RAM: 32GB DDR4
- GPU: 3060 12GB VRAM
I will say, you are going to want to hardwire any AI box you have, especially early in your exploration. I ended up downloading around 800GB of models before settling on a core group that collectively takes up 212GB. I suspect most of my traffic beyond model updates will come from real-time voice conversations.
blah… blah… blah (Links at bottom of post)
Oobabooga vs Ollama
It’s a real topic and I don’t know that I’m happy about it. The rundown between the approaches: Oobabooga is for the tinkerer who is less interested in the actual chat experience. Oobabooga is *fantastic* if you want to learn what each model and framework does and how all of the knobs affect the output. I’ll… probably want to do that, eventually? The refined models provided through Ollama have similar levels of tweakability; it’s simply not the main focus of the program and is a bit more cumbersome to work with if tweaking is your main interest.
I’ve opted to go with Ollama, accessed through Open-WebUI, which strives to replicate the interface of ChatGPT and provide a full suite of mature AI pipelines. It also attempts to provide that community: it boasts a large collection of community-made functions and tools, along with agents and prompts to go with them. I haven’t explored this facet as much; I’ve been far more interested in the OpenAI-compatible API endpoint it exposes and the AI-based tooling I can connect to it.
VSCode / Copilot
Since Open-WebUI/Ollama expose an OpenAI-compatible endpoint, you can use tools like vscode-openai with a user-provided base URL. You can choose the model from VSCode or hardcode it on the Ollama side so the API serves exactly the model you expect.
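If you want to sanity-check the endpoint outside of an editor, here’s a minimal Python sketch; the port, model name, and use of the `openai` client library are my assumptions, not specifics from the setup above:

```python
# Minimal sketch: pointing the official openai client at a local Ollama
# instance. Assumes Ollama listens on its default port (11434) and that
# a model like qwen2.5-coder:14b has already been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # the client requires a key, but Ollama ignores it
)

response = client.chat.completions.create(
    model="qwen2.5-coder:14b",
    messages=[{"role": "user", "content": "Suggest a fix for an off-by-one loop bug."}],
)
print(response.choices[0].message.content)
```

Tools like vscode-openai do the equivalent internally once you point their base URL at the same address.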
I have the luxury of running a full-fledged model on dedicated hardware, so I’m able to run a 14b model with code assist while doing dev work from my main desktop. There are smaller models available in the 7b range which are useful for code completion but start to fail in the advice and bug-fixing arenas.
Stable Diffusion / Images
Another feature of Open-WebUI that sold me was the ability to generate and display images natively in the chat UI. You have the option to prompt it to create an image directly, or to generate an image after the fact from its response.
There seemed to be only one option if I wanted to explore more than just Stable Diffusion models. I had heard of newer models such as Flux and didn’t want to invest my time in learning a system that wouldn’t be compatible with them in the future.
I ended up landing on ComfyUI and I’m really glad I did. It uses a node-based approach to image generation; you can even remove the AI image generation entirely and use AI models for photo touch-up and manipulation. I’ve been trying to make AI approachable to my creative friends who may be wary of AI ruining their medium. I highlighted the AI’s ability to discern lighting variations in a room and equalize them with much higher precision than a human could.
ComfyUI is built around a workflow concept: each image generation method is its own workflow, allowing you to copy/paste nodes between workflows and share your flows with the community effortlessly. You can import/export workflows as JSON, and the format is widely supported by the AI tooling ecosystem.
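Since workflows are plain JSON, you can also queue them programmatically. Here’s a minimal sketch against ComfyUI’s HTTP API; the port and the `workflow.json` filename are assumptions, and the file needs to be exported in ComfyUI’s API format:

```python
# Minimal sketch: queueing an exported workflow through ComfyUI's HTTP API.
# Assumes ComfyUI is listening on localhost:8188 and workflow.json was
# exported via "Save (API Format)".
import json
import urllib.request

with open("workflow.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # returns a prompt_id you can poll for the result
```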
I’ll automate your n8n?
I was looking for a way to orchestrate all of these moving pieces and provide a structured process for my AI agents to interact with the world. It turns out n8n is meant for exactly this purpose. It also uses a node-based design and lets you integrate with hundreds of third-party services, while also providing raw tools to script your own functions and automations.
The use case I decided on to learn n8n was an automatic MR review bot; there was an existing workflow I was able to import and learn from. I hooked the pipeline up to my private GitLab instance with an auth token, and whenever I comment “+Holocron” on an MR, the qwen2.5-coder model is loaded, provided with the codebase and the git diff, and set loose to write comments on the MR, providing feedback and insight. I’ll admit, a lot of the complaints are syntax or code-hygiene nitpicks, but I also haven’t given it a newer piece of code yet. I look forward to using the code assistant and this MR bot to improve my self-learning feedback cycle.
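For a sense of what the workflow does under the hood, here’s the same pipeline sketched as standalone Python. The GitLab URL, project ID, MR number, and token are all placeholders, and the real logic lives in n8n nodes rather than a script:

```python
# Illustrative sketch of the MR review pipeline: fetch the diff from GitLab,
# ask the local model for a review via Ollama, post the result as a note.
import requests

GITLAB = "https://gitlab.example.com/api/v4"  # hypothetical instance
PROJECT_ID, MR_IID = 42, 7                    # hypothetical project and MR
HEADERS = {"PRIVATE-TOKEN": "<auth-token>"}   # placeholder token

# 1. Pull the merge request diff from GitLab.
changes = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/merge_requests/{MR_IID}/changes",
    headers=HEADERS,
).json()
diff = "\n".join(c["diff"] for c in changes["changes"])

# 2. Ask qwen2.5-coder for a review through Ollama's native API.
review = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:14b",
        "prompt": f"Review this merge request diff and flag issues:\n{diff}",
        "stream": False,
    },
).json()["response"]

# 3. Post the review back onto the MR as a comment.
requests.post(
    f"{GITLAB}/projects/{PROJECT_ID}/merge_requests/{MR_IID}/notes",
    headers=HEADERS,
    json={"body": review},
)
```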
Technical Links
Model Links (Ollama)
Note: 14b models take somewhere between 8-10GB of VRAM. Where applicable I try to use 14b; some models only come in 7b, which typically takes 4-6GB of VRAM.
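Those numbers line up with a rough rule of thumb (my back-of-the-envelope estimate, not a figure from any of the projects above): quantized weights take roughly parameters × bits ÷ 8 bytes, plus runtime overhead for the KV cache and buffers:

```python
# Back-of-the-envelope VRAM estimate for quantized models (an assumption,
# not an official formula): weight bytes plus ~20% runtime overhead.
def estimate_vram_gb(params_billions: float, bits: int = 4, overhead: float = 1.2) -> float:
    weights_gb = params_billions * bits / 8  # e.g. 14B at 4-bit -> ~7 GB
    return weights_gb * overhead

for size in (7, 14):
    print(f"{size}b @ 4-bit: ~{estimate_vram_gb(size):.1f} GB")
# 7b -> ~4.2 GB, 14b -> ~8.4 GB, consistent with the ranges above
```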
Applications
- Open-WebUI
  - In order to expose Ollama, you need to add `OLLAMA_HOST="0.0.0.0"` to your Docker environment, and additionally expose port 11434
- n8n
  - A few environment variables should be set, but they will depend on your environment; the big ones are `N8N_SECURE_COOKIE`, `N8N_HOST`, and `WEBHOOK_URL`
- ComfyUI
- AllTalk (you must build locally)
Utilities/Extensions
- vscode-openai
  - Allows you to set a base URL which points to Ollama’s OpenAI-compatible API
- n8n workflow – MR Review