# Stilo AI Server
Welcome to the Stilo Solutions AI Server. This server provides a shared AI infrastructure for the team — local LLM inference, AI agent crews, vector memory, and a management UI.
## Quick access
| Service | URL | What it's for |
|---|---|---|
| Chat | [chat.stilosolutions.com](https://chat.stilosolutions.com) | Talk to AI models via Open WebUI |
| API | [api.stilosolutions.com](https://api.stilosolutions.com) | LiteLLM proxy — OpenAI-compatible endpoint |
| Traces | [traces.stilosolutions.com](https://traces.stilosolutions.com) | Langfuse — agent run tracing |
| Docker | [docker.stilosolutions.com](https://docker.stilosolutions.com) | Portainer — container management |
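Because the API endpoint speaks the OpenAI wire format, any OpenAI-compatible client works against it. Here is a minimal sketch using only the Python standard library; the `/v1` path and the `qwen3-coder` model alias are assumptions, so check the proxy's model list for the names your key can actually use:

```python
import json
import urllib.request

API_BASE = "https://api.stilosolutions.com/v1"  # assumed OpenAI-compatible path
API_KEY = "sk-..."  # your personal key from New User Setup

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

# To send: urllib.request.urlopen(build_chat_request("qwen3-coder", "Hello"))
```

The official `openai` SDK works the same way: point its `base_url` at the proxy and pass your key as `api_key`.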
## What's running
- Local LLM inference — Qwen3-Coder-Next-FP8 (coding) and Qwen3.6-27B-FP8 (general), served via vLLM on a 96GB NVIDIA RTX PRO 6000 Blackwell GPU
- Cloud fallback — Anthropic Claude (Sonnet, Opus, Haiku) when local models aren't enough
- AI crews — CrewAI-based agent teams that research, write, scrape, and execute code
- Memory — Qdrant (vector search) + Postgres (structured data) + Obsidian vault per user (synced via Syncthing)
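To give a sense of how the vector-memory layer is queried, here is a sketch of a Qdrant REST search request built with the standard library. The host, port, and collection name are placeholders, not the server's actual configuration:

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # assumed default Qdrant port
COLLECTION = "team_notes"             # hypothetical collection name

def build_search_request(vector: list[float], limit: int = 5) -> urllib.request.Request:
    """Build a Qdrant search request (POST /collections/{name}/points/search)."""
    payload = {"vector": vector, "limit": limit, "with_payload": True}
    return urllib.request.Request(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To send: urllib.request.urlopen(build_search_request(embedding)), where the
# embedding's dimensionality must match the collection's vector size.
```

In practice the crews handle this for you; direct queries are mainly useful for debugging what an agent has stored.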
## New to the server?
Start with New User Setup to get your account, API key, Obsidian vault, and first crew running.