Your AI bill should not be a surprise you discover at month end.
A corporate AI token governance layer that sits between your users, your agents, and every LLM provider API. Intercept every call. Attribute every token. Enforce every budget. Flag every anomaly — before it costs you.
Every enterprise deploying AI at scale hits these failure modes. TokenGuard was built to close all four.
01
Agentic loops consume 10–20x expected tokens
Autonomous agents running in a loop — a misconfigured retry, an infinite planning cycle, a context window overflow that triggers repeated calls — can consume a month's token budget in hours. Monthly budget checks catch this weeks after the damage is done. You need a circuit breaker that fires in minutes, not a finance review that fires at billing time.
02
Employees use corporate AI keys for personal projects
Corporate API keys are powerful and convenient. Employees route personal coding projects, side hustles, and family document drafting through corporate AI credentials. This is a cost problem, a data exposure problem, and an acceptable-use policy violation. Without token-level attribution and anomaly detection, it is also invisible until someone notices the bill.
03
No team-level visibility means no accountability
Most enterprises share one or two LLM API keys across all teams and all applications. Finance sees a total bill. No team lead knows what their team is spending. No cost centre is accountable. Optimisation conversations cannot happen because the data to have them does not exist. TokenGuard produces the attribution layer that makes AI spend manageable.
04
Budget enforcement happens outside agent code — or not at all
Putting budget logic inside agent code is fragile: it is easy to bypass, hard to audit, and inconsistent across teams. The only reliable enforcement layer is the gateway — the single point all LLM calls pass through. Enforcement that lives outside agent code cannot be circumvented by agent code.
Five enforcement layers, one gateway pass
Every LLM call passes through the same enforcement pipeline. The synchronous layers execute within a 50ms latency budget; anomaly scoring and final cost attribution run off the request path.
01
Attribute
The gateway identifies the user or agent making the call, maps it to a cost centre and team, and estimates the token cost of the request before it is sent. Key attribution takes under 5ms. No LLM call proceeds without identity attribution.
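To make the pre-send estimate concrete, here is a minimal sketch in Python, assuming a tiktoken-style tokenizer and an illustrative price table (the model names, rates, and function are placeholders, not TokenGuard's actual lookup):

```python
import tiktoken

# Illustrative per-1K-input-token prices (assumed values, not provider quotes).
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "gpt-4o-mini": 0.00015}

def estimate_request_cost(model: str, messages: list[dict]) -> float:
    """Rough pre-send cost estimate from the prompt alone; output tokens
    are unknown until the provider responds."""
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT[model]
```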
02
Budget check
The attributed cost is evaluated against per-request ceilings, per-session rolling budgets, and per-key monthly caps. If any limit is exceeded, the enforcement action (block, throttle, or alert) is applied before the call reaches the LLM provider. Under 10ms.
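A sketch of how the three limits might compose inside the gateway (the policy schema and names are illustrative assumptions):

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    THROTTLE = "throttle"
    ALERT = "alert"

@dataclass
class BudgetPolicy:
    per_request_ceiling: float  # max estimated $ per call
    session_budget: float       # rolling $ budget per session
    monthly_key_cap: float      # hard $ cap per key per month
    on_breach: Action = Action.BLOCK

def check_budget(policy: BudgetPolicy, estimated_cost: float,
                 session_spend: float, monthly_spend: float) -> Action:
    """Evaluate all three limits before the call leaves the gateway."""
    if (estimated_cost > policy.per_request_ceiling
            or session_spend + estimated_cost > policy.session_budget
            or monthly_spend + estimated_cost > policy.monthly_key_cap):
        return policy.on_breach
    return Action.ALLOW
```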
03
Circuit breaker
Rate-of-spend is evaluated against each key's rolling 7-day baseline. A key consuming tokens at 5x its baseline within a 15-minute window triggers the circuit breaker: the key is suspended, an alert is sent, and a human review is required to reinstate it. This catches runaway agent loops before they consume material budget.
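Reduced to its core, the trigger is a comparison of the current 15-minute window against the key's scaled 7-day baseline (a sketch under the stated defaults, not the production implementation):

```python
def breaker_should_fire(tokens_last_15m: int, tokens_last_7d: int,
                        multiplier: float = 5.0) -> bool:
    """Fire when the current 15-minute window exceeds `multiplier` times
    the key's average 15-minute consumption over the past 7 days."""
    windows_per_week = 7 * 24 * 4  # 672 fifteen-minute windows in 7 days
    baseline_per_window = tokens_last_7d / windows_per_week
    return tokens_last_15m > multiplier * baseline_per_window
```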
04
Anomaly score
Signal analysis runs asynchronously: time of day, volume pattern, model selection, session metadata, and role-based baseline deviation are combined into a personal-use anomaly score from 0 to 100. Scores above 70 are flagged for human review. This layer runs in parallel and does not add to request latency.
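One way to picture the combination is a weighted sum of normalised signals clipped to the 0-100 range; the weights and signal names below are illustrative assumptions, not the shipped model:

```python
# Assumed weights for illustration; each signal is normalised to 0..1 upstream.
SIGNAL_WEIGHTS = {
    "after_hours": 25,        # share of requests outside business hours
    "volume_deviation": 30,   # deviation from the role-based volume baseline
    "model_mismatch": 20,     # high-tier model use without business justification
    "session_diversity": 25,  # request patterns inconsistent with the role
}

def anomaly_score(signals: dict[str, float]) -> float:
    """Combine normalised signals into a 0-100 personal-use anomaly score."""
    raw = sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())
    return min(100.0, max(0.0, raw))

# Scores above 70 enter the human review queue; nothing is auto-blocked.
```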
05
Log & attribute cost
Every call — forwarded or blocked — is written to an immutable audit log. After the LLM provider responds, the actual token counts and cost are recorded and attributed to the user, agent, team, and cost centre. This is the source of truth for monthly chargeback reports.
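The shape of such a record might look like the following; the field names are illustrative, and the hash-chaining shown is one common way to make an append-only log tamper-evident, assumed here rather than confirmed:

```python
import hashlib
import json
import time

def audit_record(user: str, team: str, cost_centre: str, model: str,
                 decision: str, estimated_tokens: int,
                 actual_tokens: int | None, actual_cost: float | None,
                 prev_hash: str) -> dict:
    """Build an append-only audit entry; chaining each entry to the hash
    of the previous one makes after-the-fact edits detectable."""
    body = {
        "ts": time.time(), "user": user, "team": team,
        "cost_centre": cost_centre, "model": model, "decision": decision,
        "estimated_tokens": estimated_tokens, "actual_tokens": actual_tokens,
        "actual_cost": actual_cost, "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body
```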
What TokenGuard delivers
Five enforcement layers plus full observability — in a single gateway deployment.
| Capability | What it does | Enforcement layer |
| --- | --- | --- |
| Per-request ceiling | Block requests whose estimated token cost exceeds a configurable per-call ceiling | Layer 1 — per request |
| Per-session budget | Track rolling token spend within a session and enforce a session-level budget cap | Layer 2 — per session |
| Monthly key cap | Enforce a hard monthly spend limit per API key, with configurable alert thresholds (80%, 95%, 100%) | Layer 3 — per key / per month |
| Model-tier routing | Redirect requests to a lower-cost model tier when budget thresholds are approached, without blocking | Layer 4 — model routing |
| Circuit breaker | Suspend keys showing anomalous rate-of-spend vs. rolling baseline, catching runaway agent loops before they exhaust monthly budgets | Layer 5 — rate anomaly |
| Personal-use detection | Probabilistic anomaly scoring for personal-use misuse patterns; flags for human review, never auto-blocks | Async — human review queue |
| Cost attribution | Per-user, per-agent, per-team, per-cost-centre attribution of actual token costs after provider response | Audit & chargeback |
| Chargeback reports | Monthly spend reports by team and cost centre, schedulable delivery, CSV export and API access | Finance & accountability |
Three scenarios TokenGuard was built for
Real failure modes from enterprises that deployed AI without a governance layer first.
Engineering
Agentic loop consumes $40K in a weekend
Situation
An autonomous coding agent entered an infinite retry loop on a Friday afternoon. By Monday morning it had made 2.4 million LLM API calls, consuming the team's entire quarterly budget in 60 hours
Failure
No circuit breaker, no rate-of-spend monitoring, no per-session budget cap. Monthly budget check would have caught it 6 weeks later
With TokenGuard
Circuit breaker would have fired at 5x baseline within 15 minutes of loop start, suspending the key automatically and alerting the on-call engineer
Outcome prevented
$39,500 in avoidable LLM spend; Friday-to-Monday outage for other teams sharing the same key
Enterprise
30% of LLM spend traced to personal-use misuse
Situation
A financial services firm deployed TokenGuard for chargeback reporting. Anomaly scoring revealed a cluster of users with sustained high-volume after-hours usage inconsistent with their job functions
Finding
Eleven employees identified as likely personal-use misuse cases, accounting for 29% of total monthly LLM spend. HR process initiated after human review confirmed the pattern
With TokenGuard
Flagging happened within the first billing cycle. Without TokenGuard, the pattern would have been invisible — the total bill looked normal because it was spread across many users
Outcome
31% reduction in monthly LLM costs after misuse addressed; acceptable-use policy updated with enforcement mechanism; no further systematic misuse detected
Finance
CFO requests team-level AI spend accountability
Situation
CFO of a scale-up asked engineering and product leads to provide AI spend forecasts for the next quarter. No team had visibility into their own LLM spend — all calls went through shared corporate keys
Blockers
No attribution layer, no team-level reporting, no historical data to forecast from. Finance team could only see provider invoices with no breakdown
With TokenGuard
Deployed in attribution-only mode first to build historical data. Within 30 days: per-team spend reports available, cost-per-feature baseline established, Q3 forecast produced with 90% confidence
Outcome
First AI spend budget with team-level accountability; three teams voluntarily optimised model selection after seeing their per-request costs; 22% reduction in total LLM spend in Q3
Common questions
How does TokenGuard integrate with our existing LLM provider setup?
TokenGuard deploys as a transparent gateway between your users and agents, and the LLM provider APIs. If you are currently calling OpenAI, Anthropic, Google, Azure OpenAI, or Mistral directly, you point your SDK base URL at the TokenGuard gateway — one configuration change, no application logic changes. TokenGuard issues per-user or per-agent synthetic API keys that route through the gateway; the gateway translates them to your real provider credentials, which are stored in a secrets vault and never exposed to end users or agents. For LangChain, LlamaIndex, and other orchestration frameworks, we provide pre-built callback handlers and middleware that require no changes to agent logic.
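For the OpenAI Python SDK, for example, the switch would look like this; the gateway URL and the synthetic key are placeholders:

```python
from openai import OpenAI

# Before: OpenAI(api_key=CORPORATE_OPENAI_KEY), calling api.openai.com directly.
# After: same application code, one configuration change.
client = OpenAI(
    base_url="https://tokenguard.internal.example.com/v1",  # placeholder gateway URL
    api_key="tg-user-a1b2c3",  # per-user synthetic key issued by TokenGuard
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise this ticket."}],
)
```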
What happens when a budget limit is breached?
The enforcement action is configurable per budget policy: Block (the LLM call returns an error), Throttle (the call is queued and rate-limited), or Alert (the call goes through but the budget owner and administrator are notified immediately). Most teams start with Alert on first breach and escalate to Block after a review period. Budget violations produce an immutable audit log entry with the full context of the blocked or alerted call — who made it, which model, how many tokens were estimated at request time, and the actual cost recorded after the response. Budget resets are explicit administrative actions, not automatic — preventing end-of-month budget renewal surprises.
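A policy with escalating actions might be expressed along these lines (the schema is an illustrative assumption, not TokenGuard's actual policy format):

```python
policy = {
    "team": "platform-engineering",
    "monthly_cap_usd": 5000,
    "alert_thresholds": [0.80, 0.95, 1.00],  # notify at 80%, 95%, 100% of cap
    "on_first_breach": "alert",   # call proceeds; budget owner is notified
    "on_repeat_breach": "block",  # call returns an error at the gateway
    "reset": "manual",            # resets are explicit admin actions, never automatic
}
```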
How does the circuit breaker stop runaway agent loops?
The circuit breaker monitors rate-of-spend rather than cumulative spend, catching runaway loops that would otherwise exhaust a monthly budget within hours. If a user or agent key shows token consumption exceeding a configurable multiple of its rolling baseline (default: 5x the 7-day rolling average within a 15-minute window), the circuit breaker fires: the key is suspended, an alert is sent to the administrator, and a human review is required to reinstate it. This catches the agentic-loop failure pattern, where one misconfigured agent can consume $50,000–$200,000 in tokens long before a monthly budget check would catch it. The circuit breaker is per-key, so a single runaway agent does not affect other users or teams.
How does personal-use detection work without reading message content?
TokenGuard uses probabilistic signal analysis rather than content reading. Signals include: time of request (outside business hours), request volume pattern (single-user sustained high-volume sessions characteristic of personal projects), model selection (higher-capability models with no business justification on record), session metadata (request diversity patterns inconsistent with role-based use cases), and volume baseline deviation. These signals are combined into an anomaly score from 0 to 100. Scores above 70 are flagged for human review — they are never auto-blocked, because the signals are probabilistic, not deterministic. A reviewer sees the supporting signals and decides: dismiss or confirm. This approach avoids content analysis, preserving user privacy while still detecting systematic misuse.
What do chargeback reports include?
Monthly chargeback reports are generated per billing period and include: total cost by user, team, cost centre, model, and provider; request counts and average cost per request; month-over-month trend; top consumers ranked by spend; and a detailed event log available for download as CSV or via API. Reports are available in the TokenGuard dashboard and can be scheduled for automated delivery to finance and team leads. The cost attribution model is configurable — organisations can attribute by the API key used, by the user identity propagated in request headers, or by the project or cost-centre tag set in the request metadata.
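Pulling a monthly report over the API might look like this; the endpoint path and parameters are assumptions for illustration:

```python
import requests

response = requests.get(
    "https://tokenguard.internal.example.com/api/v1/reports/chargeback",  # placeholder
    params={"period": "2025-06", "group_by": "cost_centre", "format": "csv"},
    headers={"Authorization": "Bearer ADMIN_API_TOKEN"},
    timeout=30,
)
response.raise_for_status()
with open("chargeback-2025-06.csv", "wb") as f:
    f.write(response.content)
```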
Can TokenGuard run fully self-hosted?
Yes. TokenGuard is available as a fully self-hosted deployment for organisations in regulated industries that cannot route LLM call metadata through a cloud-hosted governance product. The on-premise deployment runs on standard Linux infrastructure, requires no external service dependencies at runtime, and stores all audit logs, budget policies, and anomaly events within your environment. Deployment is supported on Kubernetes, Docker Compose, or bare-metal Linux. For organisations already using GDPR Oversight, the two products share an infrastructure layer — reducing deployment footprint when both governance capabilities are needed.
Other products in the governance stack
Compliance
GDPR Oversight
Real-time personal data egress detection, PII blocking, and Article 33-ready breach evidence for AI-connected enterprises.
Ready to put a governance layer on your LLM spend?
Tell us your LLM provider setup, your team structure, and your biggest cost concern. We will show you what TokenGuard detects in your environment and how quickly you can have team-level attribution in place.