# Stilo AI Server
Welcome to the Stilo Solutions AI Server. This server provides a shared AI infrastructure for the team — local LLM inference, AI agent crews, vector memory, and a management UI.
## Quick access
| Service | URL | What it's for |
|---|---|---|
| Chat | [chat.stilosolutions.com](https://chat.stilosolutions.com) | Talk to AI models via Open WebUI |
| API | [api.stilosolutions.com](https://api.stilosolutions.com) | LiteLLM proxy — OpenAI-compatible endpoint |
| Traces | [traces.stilosolutions.com](https://traces.stilosolutions.com) | Langfuse — agent run tracing |
| Docker | [docker.stilosolutions.com](https://docker.stilosolutions.com) | Portainer — container management |
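Because the API endpoint speaks the OpenAI wire format, any OpenAI-compatible client works against it. Here is a minimal sketch using only the Python standard library; the `/v1` path and the `qwen3-coder` model alias are assumptions, so check the proxy's model list for the names your key can actually use:

```python
import json
import urllib.request

API_BASE = "https://api.stilosolutions.com/v1"  # assumed OpenAI-compatible path
API_KEY = "sk-..."  # your personal key from New User Setup

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

# To send: urllib.request.urlopen(build_chat_request("qwen3-coder", "Hello"))
```

The official `openai` SDK works the same way: point its `base_url` at the proxy and pass your key as `api_key`.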
## What's running
- Local LLM inference — Qwen3-Coder-Next-FP8 (coding) and Qwen3.6-27B-FP8 (general), served via vLLM on a 96GB NVIDIA RTX PRO 6000 Blackwell GPU
- Cloud fallback — Anthropic Claude (Sonnet, Opus, Haiku) when local models aren't enough
- AI crews — CrewAI-based agent teams that research, write, scrape, and execute code
- Memory — Qdrant (vector search) + Postgres (structured data) + Obsidian vault per user (synced via Syncthing)
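To give a sense of how the vector-memory layer is queried, here is a sketch of a Qdrant REST search request built with the standard library. The host, port, and collection name are placeholders, not the server's actual configuration:

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # assumed default Qdrant port
COLLECTION = "team_notes"             # hypothetical collection name

def build_search_request(vector: list[float], limit: int = 5) -> urllib.request.Request:
    """Build a Qdrant search request (POST /collections/{name}/points/search)."""
    payload = {"vector": vector, "limit": limit, "with_payload": True}
    return urllib.request.Request(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To send: urllib.request.urlopen(build_search_request(embedding)), where the
# embedding's dimensionality must match the collection's vector size.
```

In practice the crews handle this for you; direct queries are mainly useful for debugging what an agent has stored.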
## New to the server?
Start with New User Setup to get your account, API key, Obsidian vault, and first crew running.