Stilo AI Server

Welcome to the Stilo Solutions AI Server. It provides the team's shared AI infrastructure — local LLM inference, AI agent crews, vector memory, and a management UI.

Quick access

Service   URL                         What it's for
Chat      chat.stilosolutions.com     Talk to AI models via Open WebUI
API       api.stilosolutions.com      LiteLLM proxy — OpenAI-compatible endpoint
Traces    traces.stilosolutions.com   Langfuse — agent run tracing
Docker    docker.stilosolutions.com   Portainer — container management
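Because the API endpoint speaks the OpenAI wire format, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch is below; the model alias, the `/v1` base path, and the placeholder key are assumptions — your real key and the available model names come from New User Setup.

```python
# Minimal sketch: calling the LiteLLM proxy's OpenAI-compatible
# /chat/completions endpoint with only the Python standard library.
# Model alias and API key are placeholders, not confirmed server values.
import json
import os
import urllib.request

API_BASE = "https://api.stilosolutions.com/v1"  # LiteLLM proxy from the table above


def build_chat_request(prompt: str,
                       model: str = "qwen3-coder",          # assumed alias
                       api_key: str = "sk-your-key-here"):  # placeholder
    """Build the POST request for /chat/completions (OpenAI wire format)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def chat(prompt: str, **kw) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kw)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Only hits the network when a real key is present in the environment.
if os.environ.get("STILO_API_KEY"):
    print(chat("Say hello in one sentence.",
               api_key=os.environ["STILO_API_KEY"]))
```

The same endpoint works with the official `openai` Python SDK by pointing its `base_url` at the proxy, which is usually the more convenient route in real code.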

What's running

  • Local LLM inference — Qwen3-Coder-Next-FP8 (coding) and Qwen3.6-27B-FP8 (general), served via vLLM on a 96GB NVIDIA RTX PRO 6000 Blackwell GPU
  • Cloud fallback — Anthropic Claude (Sonnet, Opus, Haiku) when local models aren't enough
  • AI crews — CrewAI-based agent teams that research, write, scrape, and execute code
  • Memory — Qdrant (vector search) + Postgres (structured data) + Obsidian vault per user (synced via Syncthing)
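The "vector search" half of the memory stack boils down to ranking stored embeddings by similarity to a query vector. Real code would use the `qdrant-client` library against the server's Qdrant instance; the stdlib-only toy below just illustrates the retrieval idea, with made-up 2-D vectors standing in for real embeddings.

```python
# Toy illustration of vector-memory retrieval (what Qdrant does at scale):
# store (text, embedding) pairs, return the entries most similar to a query
# vector by cosine similarity. Vectors here are tiny stand-ins for real
# embeddings, which typically have hundreds of dimensions.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def search(store, query_vec, top_k=3):
    """store: list of (text, vector). Returns the top_k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Qdrant adds the parts that matter in production — persistence, approximate-nearest-neighbor indexing, and payload filtering — but the query model is the same: embed, search, take the top hits.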

New to the server?

Start with New User Setup to get your account, API key, Obsidian vault, and first crew running.