obleth
Fairshare admission for shared AI capacity.
Put one Rust gateway between clients and your OpenAI-compatible backends. obleth resolves tenant identity, admits work by weighted share when GPUs are full, and records token-accurate usage for the teams sharing the cluster.
hot path: auth -> budget -> scheduler -> upstream
under load: queue by weighted share, not arrival time
ledger: tokens, wait time, model route, tenant
8-step hot path: Authenticate, budget, cache, schedule, proxy, stream, reconcile, and record every request.
OpenAI-compatible API: Chat, embeddings, images, audio, MCP, and model:auto routing behind one authenticated surface.
Weighted fairness under load: Production, research, and sandbox traffic keep their configured share when the pool saturates.
Live operator console: Tenants, keys, models, queue depth, model health, and usage are visible without shell-diving.
Control plane
The gateway is visible
The dashboard is not decoration. It is where operators see capacity, queue pressure, model health, tenants, and keys while the data plane keeps serving traffic.
Scheduler pressure, route health, and tenant load are shown using the same control-plane concepts documented here.
Admission path
Every request earns its place
Cache hits return before fairshare. Token budget exhaustion is a hard stop. Saturation is different: requests wait until their tenant deserves the next slot.
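A minimal Rust sketch of that ordering, assuming the budget check runs before the cache lookup as in the 8-step hot path. The `Admission` enum, the `RequestState` fields, and `admit` are hypothetical names for illustration, not obleth's API.

```rust
/// Possible outcomes for one request, checked in hot-path order:
/// budget first, then cache, then slot availability.
#[derive(Debug, PartialEq)]
pub enum Admission {
    /// Token budget exhausted: a hard stop, the request is rejected.
    RejectedOverBudget,
    /// Cache hit: answered before the request reaches fairshare scheduling.
    CacheHit,
    /// A slot is free: the request proceeds upstream immediately.
    Admitted,
    /// Pool saturated: the request waits for its tenant's turn.
    Queued,
}

/// Hypothetical snapshot of what the gateway knows about a request.
pub struct RequestState {
    pub budget_remaining: u64,
    pub estimated_tokens: u64,
    pub cache_hit: bool,
    pub free_slots: u32,
}

pub fn admit(req: &RequestState) -> Admission {
    // Budget exhaustion never waits; it is rejected outright.
    if req.estimated_tokens > req.budget_remaining {
        return Admission::RejectedOverBudget;
    }
    // Cache hits skip the scheduler entirely.
    if req.cache_hit {
        return Admission::CacheHit;
    }
    // Saturation is different: the request queues instead of failing.
    if req.free_slots > 0 {
        Admission::Admitted
    } else {
        Admission::Queued
    }
}
```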
Identify the tenant
Bearer keys are hashed, cached, and resolved into tenant context before any upstream work starts.
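Key resolution might look roughly like this. It is a sketch under stated assumptions: `KeyResolver`, `TenantContext`, and the in-memory `store` are hypothetical, and std's `DefaultHasher` stands in for a real credential digest only to keep the example dependency-free (it is not a cryptographic hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, PartialEq)]
pub struct TenantContext {
    pub tenant: String,
    pub weight: u32,
}

/// Placeholder digest. DefaultHasher is NOT a cryptographic hash; a real
/// gateway would store something like a SHA-256 digest of the key instead.
fn hash_key(raw: &str) -> u64 {
    let mut h = DefaultHasher::new();
    raw.hash(&mut h);
    h.finish()
}

pub struct KeyResolver {
    /// Hypothetical durable store: key digest -> tenant context.
    store: HashMap<u64, TenantContext>,
    /// Per-pod cache so repeat requests skip the store lookup.
    cache: HashMap<u64, TenantContext>,
    /// Counts store lookups, to show the cache doing its job.
    pub store_lookups: u32,
}

impl KeyResolver {
    pub fn new() -> Self {
        KeyResolver { store: HashMap::new(), cache: HashMap::new(), store_lookups: 0 }
    }

    /// Only the digest is kept; the raw key is never stored.
    pub fn register(&mut self, raw_key: &str, ctx: TenantContext) {
        self.store.insert(hash_key(raw_key), ctx);
    }

    /// Resolve an Authorization header value ("Bearer <key>") to a tenant.
    pub fn resolve(&mut self, header: &str) -> Option<TenantContext> {
        let raw = header.strip_prefix("Bearer ")?;
        let digest = hash_key(raw);
        if let Some(ctx) = self.cache.get(&digest) {
            return Some(ctx.clone());
        }
        self.store_lookups += 1;
        let ctx = self.store.get(&digest)?.clone();
        self.cache.insert(digest, ctx.clone());
        Some(ctx)
    }
}
```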
Reserve the budget
Redis-backed token buckets enforce per-tenant tokens-per-minute (TPM) limits and term budgets (lifetime or monthly) across every gateway pod.
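The token-bucket idea for one tenant's TPM limit can be sketched as below. Two simplifications are assumed: state lives in a local struct rather than Redis, and the current time is passed in as seconds so the behavior is deterministic. `TokenBucket` and `try_reserve` are hypothetical names.

```rust
/// In-memory token bucket for one tenant's tokens-per-minute limit.
/// A shared deployment would keep this state in Redis instead.
pub struct TokenBucket {
    capacity: f64,       // burst size, here equal to the TPM limit
    refill_per_sec: f64, // TPM / 60
    tokens: f64,
    last_refill: f64,    // seconds
}

impl TokenBucket {
    pub fn new(tpm: u64, now: f64) -> Self {
        let capacity = tpm as f64;
        TokenBucket { capacity, refill_per_sec: capacity / 60.0, tokens: capacity, last_refill: now }
    }

    /// Add tokens for the time elapsed since the last refill, capped at capacity.
    fn refill(&mut self, now: f64) {
        let elapsed = (now - self.last_refill).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
    }

    /// Try to reserve `n` tokens; on failure nothing is deducted.
    pub fn try_reserve(&mut self, n: u64, now: f64) -> bool {
        self.refill(now);
        if self.tokens >= n as f64 {
            self.tokens -= n as f64;
            true
        } else {
            false
        }
    }
}
```

With a 600 TPM limit the bucket refills at 10 tokens per second, so a tenant that drains it must wait before the next reservation succeeds.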
Earn a slot
When capacity is tight, the scheduler admits whichever tenant is most behind its fair share.
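One way to express "most behind its fair share" is the share_score used in the live console: served tokens divided by weight, lowest score wins. The `Waiting` struct and `pick_next` are hypothetical, and treating `served_tokens` as a per-window counter is an assumption.

```rust
use std::cmp::Ordering;

#[derive(Debug)]
pub struct Waiting {
    pub tenant: &'static str,
    pub weight: u32,        // configured share weight, assumed > 0
    pub served_tokens: u64, // tokens already served (assumed per window)
}

/// share_score = served_tokens / weight; lower means further behind its share.
pub fn share_score(w: &Waiting) -> f64 {
    w.served_tokens as f64 / w.weight as f64
}

/// Pick the waiting tenant that gets the next released slot.
/// Ties are broken by tenant name so the result is deterministic.
pub fn pick_next(queue: &[Waiting]) -> Option<&Waiting> {
    queue.iter().min_by(|a, b| {
        share_score(a)
            .partial_cmp(&share_score(b))
            .unwrap_or(Ordering::Equal)
            .then_with(|| a.tenant.cmp(b.tenant))
    })
}
```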
Reconcile reality
Actual input and output tokens land in ClickHouse with wait time, admission class, and model route.
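Reconciliation can be read as settling a reservation against actual usage. This sketch assumes the gateway reserved an estimate before proxying; the `UsageRow` fields and the sign convention of `reconcile` are assumptions, not obleth's ClickHouse schema.

```rust
/// One usage row as it might be written to the ledger (hypothetical fields
/// mirroring what the page says is recorded).
#[derive(Debug, PartialEq)]
pub struct UsageRow {
    pub request_id: String,
    pub tenant: String,
    pub model_route: String,
    pub admission_class: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub wait_ms: u64,
}

/// Budget adjustment after streaming: positive means tokens are refunded
/// to the tenant, negative means the tenant is charged the overrun.
pub fn reconcile(reserved: u64, row: &UsageRow) -> i64 {
    let actual = row.input_tokens + row.output_tokens;
    reserved as i64 - actual as i64
}
```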
Live scheduler
Fairshare you can see
This is an illustrative view of the scheduler: slots, queues, share_score, and target share move together so the fairness model is tangible.
Live admission console
Queue pressure becomes weighted admission, held permits, and reconciled usage rows.
Incoming demand: 12 waiting
Fairshare picker: next permit goes to api-batch. Waiting tenants are ranked by served tokens divided by weight, and the lowest score gets the next released slot.
Active pool: 64/64 request permits held by tenants (utilization 100%)
Usage ledger (ClickHouse):
req_84f2   chatbot     1.8k tok   12ms
req_0ac9   api-batch   7.2k tok   1.4s
req_91be   analytics   3.1k tok   34ms
After streaming completes, actual tokens and wait time are reconciled into ClickHouse for dashboards and rollups.
Slot utilization: 100%, queue depth: 12, entitlement gap: 0
What obleth adds
The layer inference backends skip
vLLM and Aibrix are excellent at serving models. obleth handles the multi-tenant policy layer they deliberately leave out.
Weighted admission
Hierarchical mode partitions global in-flight slots by group, then splits each group's slots among its tenants. Weighted mode has all tenants compete globally on share_score. Both are starvation-free.
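A sketch of the hierarchical split, assuming slots are allocated in proportion to weight with the largest-remainder method at both levels. The rounding method, `apportion`, and `hierarchical` are illustrative choices; the page does not say how obleth rounds, and this static split alone says nothing about starvation-freedom.

```rust
/// Split `total` slots proportionally to `weights` (largest-remainder method).
/// Assumes at least one weight is non-zero.
pub fn apportion(total: u32, weights: &[u32]) -> Vec<u32> {
    let sum: u64 = weights.iter().map(|&w| w as u64).sum();
    let mut shares: Vec<u32> = Vec::with_capacity(weights.len());
    let mut remainders: Vec<(u64, usize)> = Vec::with_capacity(weights.len());
    for (i, &w) in weights.iter().enumerate() {
        let exact = total as u64 * w as u64;
        shares.push((exact / sum) as u32); // floor of the exact share
        remainders.push((exact % sum, i));
    }
    // Leftover slots go to the largest remainders; ties favor the lower index.
    let assigned: u32 = shares.iter().sum();
    remainders.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    for &(_, i) in remainders.iter().take((total - assigned) as usize) {
        shares[i] += 1;
    }
    shares
}

/// Global slots by group weight first, then each group's slots by tenant weight.
pub fn hierarchical(total: u32, groups: &[(u32, Vec<u32>)]) -> Vec<Vec<u32>> {
    let group_weights: Vec<u32> = groups.iter().map(|(w, _)| *w).collect();
    let group_slots = apportion(total, &group_weights);
    groups
        .iter()
        .zip(group_slots)
        .map(|((_, tenants), slots)| apportion(slots, tenants))
        .collect()
}
```

For example, 64 slots over weights 6, 3, and 1 come out as 39, 19, and 6.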
Capacity-aware routing
Send model:auto and obleth chooses among registered providers by capacity, health, price, and tags, optionally guided by an intent classifier.
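A hypothetical model:auto ranking: drop providers that are unhealthy, full, or missing a required tag, then prefer the lowest price. The `Provider` fields, the price unit, and the tie-break on spare capacity are assumptions, and the optional intent classifier is left out.

```rust
#[derive(Debug)]
pub struct Provider {
    pub name: &'static str,
    pub healthy: bool,
    pub free_slots: u32,
    pub price_per_mtok_cents: u32, // hypothetical price unit
    pub tags: Vec<&'static str>,
}

/// Illustrative ranking only: cheapest eligible provider, ties broken by
/// whichever has more free slots.
pub fn route_auto<'a>(providers: &'a [Provider], required_tags: &[&str]) -> Option<&'a Provider> {
    providers
        .iter()
        .filter(|p| p.healthy && p.free_slots > 0)
        .filter(|p| required_tags.iter().all(|t| p.tags.iter().any(|pt| pt == t)))
        .min_by(|a, b| {
            a.price_per_mtok_cents
                .cmp(&b.price_per_mtok_cents)
                .then(b.free_slots.cmp(&a.free_slots))
        })
}
```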
Budgets that mean it
Per-tenant TPM, in-flight caps, model allowlists, and lifetime or monthly spend budgets stop overload without hiding why a request was held.
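Returning an explicit reason is what keeps a held request explainable. This sketch checks a model allowlist, an in-flight cap, and a spend budget in an assumed order; `TenantPolicy`, `TenantUsage`, and `HoldReason` are hypothetical types, and TPM limits are omitted here.

```rust
#[derive(Debug, PartialEq)]
pub enum HoldReason {
    ModelNotAllowed,
    InFlightCapReached,
    BudgetExhausted,
}

pub struct TenantPolicy {
    pub allowed_models: Vec<String>,
    pub max_in_flight: u32,
    pub budget_cents: u64, // lifetime or monthly, depending on configuration
}

pub struct TenantUsage {
    pub in_flight: u32,
    pub spent_cents: u64,
}

/// Ok(()) if the request may proceed, otherwise the reason it was held.
pub fn check(policy: &TenantPolicy, usage: &TenantUsage, model: &str) -> Result<(), HoldReason> {
    if !policy.allowed_models.iter().any(|m| m == model) {
        return Err(HoldReason::ModelNotAllowed);
    }
    if usage.in_flight >= policy.max_in_flight {
        return Err(HoldReason::InFlightCapReached);
    }
    if usage.spent_cents >= policy.budget_cents {
        return Err(HoldReason::BudgetExhausted);
    }
    Ok(())
}
```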
Operate the hot path
Create tenants, rotate keys, tune model slots, watch health, and inspect scheduler pressure from a dashboard backed by Postgres, Redis, and ClickHouse.
Operations
Built for shared clusters
Tune priority live
Raise a tenant's weight during an incident and every gateway pod honors it on the next request.
Find the model knee
Run capacity autotune probes and apply recommended model slot caps when you are ready.
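Finding the knee can be framed as picking the smallest concurrency whose throughput is close to the best observed. The `find_knee` function and its `fraction` threshold are assumptions for illustration; obleth's autotune probes may use a different rule.

```rust
/// Given (concurrency, throughput) probe results, return the smallest
/// concurrency whose throughput reaches `fraction` of the best observed.
pub fn find_knee(probes: &[(u32, f64)], fraction: f64) -> Option<u32> {
    let best = probes.iter().map(|&(_, t)| t).fold(f64::NEG_INFINITY, f64::max);
    if !best.is_finite() {
        return None; // no probes, no recommendation
    }
    probes
        .iter()
        .filter(|&&(_, t)| t >= fraction * best)
        .map(|&(c, _)| c)
        .min()
}
```

Past the knee, extra slots mostly add queueing inside the backend rather than throughput, which is why capping there is attractive.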
Compose with Aibrix
Let Aibrix or vLLM handle replica execution while obleth owns tenant policy before routing.
Deploy
Bring a real gateway online fast.
Docker Compose gives you the data plane, dashboard, Postgres, Redis, ClickHouse, HAProxy, Prometheus, Grafana, and a benchmark backend. Helm charts are ready when the same shape moves to Kubernetes.
Stack (Compose): gateway, dashboard, edge, and datastores
Ops (Grafana): pre-wired Prometheus dashboards
K8s (Helm): published chart and overridable values
Start (~5 min): first tenant, key, and chat request