Phase 2 — Foundation (cost attribution, baselines, programme visibility) ships next. Phase 3 — Full (drift detection, fleet intelligence, air-gapped distribution) ships later. The Phase 1 governed runtime — registry, identity, lifecycle authority, audit-grade evidence, and Tier A deployment — is deployable today with design partners.
Agents are in production. Now you need to know if they're working.
Getting agents to production requires solving five things: runtime control, execution identity, lifecycle authority, audit-grade evidence, and a deployment model that works inside your environment. Those are the production problems.
Once agents are in production, a different set of questions emerges. These aren't questions about whether an agent is safe to deploy. They're questions about whether the estate is working.
- Which agents are degrading — and do you know before a customer tells you?
- Which agents are delivering measurable value — and can you show that to leadership?
- What are agents costing per customer, per product, per team — and who is accountable for that spend?
Generic monitoring can show traces and latency. It usually can't answer these questions, because it doesn't know the agent's declared purpose, the policy that was in effect, the customer it was acting for, or what outcome it was supposed to produce. Without that context, telemetry doesn't become intelligence.
Estate intelligence is the layer that connects identity, purpose, policy, and outcome — and produces signals that engineering managers, CTOs, and CFOs can act on.
You find out agents are degrading when customers tell you.
A background alert enrichment agent has been running in production for six weeks. It was performing well at launch. No code changes since then. No deployment events. No incidents in the queue.
Three weeks ago, its goal achievement rate started dropping — from 91% to 84% to 71%. Its average cost per run is up 40% versus the 30-day baseline. Its tool call sequence has quietly reversed: it's calling search.web before db.query when it previously went the other way.
None of these changes required a code change. They happened because an underlying model shifted, a tool response schema evolved, and the distribution of alerts sent to the agent changed. None of them triggered an alert in CloudWatch. None of them appeared in the traces your engineering team monitors.
You found out when a customer escalated.
What estate intelligence surfaces
- Tool call sequence changed: search.web is now called before db.query, where it previously went the other way

The engineering manager sees the drift 18 days before the customer escalation. They investigate the tool schema change, pin the model version, and restore the previous tool call order. The agent is back to baseline before any customer notices.
You've made a strategic bet on agent infrastructure. You need to know if it's paying off.
Twelve months ago the CTO made a decision: invest in agent infrastructure across the product. Six teams are now running agents in production. Eighteen agents are active. The infrastructure investment was significant.
The board is asking two questions: which agents are delivering value, and is the estate trending in the right direction?
The CTO can't answer either question confidently. Each team has its own metrics. Some track goal completion. Some track latency. Some track cost. None of it is comparable. There's no fleet-level view. There's no way to see which agents are underutilised, which are overloaded, or which are showing correlated degradation that might point to a failing shared dependency.
The CTO's quarterly review is in three weeks.
What estate intelligence surfaces
The CTO walks into the board meeting with a fleet view: three agents are exceeding goal achievement targets, two are underperforming with a known root cause being addressed, one is underutilised and a candidate for consolidation. The estate is trending positively. The infrastructure investment is measurable.
Agent compute and model costs are appearing on the bill. You can't attribute them to anything.
Agent infrastructure costs are growing. Model API calls, compute, storage, retrieval — the line items are real and they're increasing. The CFO is being asked to approve the next phase of investment.
The problem: there's no cost attribution model. The engineering team reports total agent infrastructure spend. There's no breakdown by agent, by product module, by customer, or by business outcome. There's no way to know whether the spend is justified by value delivered. There's no chargeback mechanism for BYOC customers or large enterprise accounts that expect cost transparency.
The CFO is being asked to approve spend on something that, from a finance perspective, is a black box.
What estate intelligence surfaces
The CFO approves the next investment phase with a cost model that makes sense: cost per agent, cost per customer, cost per outcome, and a mechanism for chargeback where required. The conversation shifts from "how much are we spending on agents?" to "what is the cost per successful investigation, per enriched alert, per resolved ticket?"
For enterprise customers who expect cost transparency as part of their contract, the per-tenant cost breakdown is also a customer-facing capability — part of what you show them in their account dashboard, not just an internal finance report.
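As a rough illustration of the cost-per-outcome idea above, the sketch below groups run records by tenant and agent and divides total spend by the number of successful outcomes. The record fields (`tenant`, `agent`, `cost`, `goal_achieved`) and the function name are assumptions made for this example, not cogward's actual schema or API.

```python
from collections import defaultdict


def cost_per_outcome(runs):
    """Return {(tenant, agent): total cost / successful runs}.

    `runs` is an iterable of dicts with hypothetical keys:
    tenant, agent, cost, goal_achieved.
    The value is None when a group has no successful runs yet.
    """
    totals = defaultdict(lambda: {"cost": 0.0, "successes": 0})
    for run in runs:
        group = totals[(run["tenant"], run["agent"])]
        group["cost"] += run["cost"]
        if run["goal_achieved"]:
            group["successes"] += 1

    return {
        key: (group["cost"] / group["successes"]) if group["successes"] else None
        for key, group in totals.items()
    }


# Illustrative data only.
runs = [
    {"tenant": "acme", "agent": "triage", "cost": 0.5, "goal_achieved": True},
    {"tenant": "acme", "agent": "triage", "cost": 0.5, "goal_achieved": False},
    {"tenant": "acme", "agent": "triage", "cost": 1.0, "goal_achieved": True},
    {"tenant": "globex", "agent": "triage", "cost": 0.25, "goal_achieved": False},
]
report = cost_per_outcome(runs)
print(report)
```

Failed runs still count towards the numerator here, so the figure reflects what a successful outcome really costs, including the spend on attempts that did not achieve the goal.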
The plane's unique vantage.
Generic monitoring produces telemetry anchored to infrastructure. Estate intelligence is anchored to declared purpose — and that's what makes it different.
Every agent registered with cogward declares its identity, its autonomy class, its declared goal, its approved tool set, its policy scope, and its expected outcome. Every run produces a record that carries all of that context alongside the execution data. Every cost, every tool call, every latency measurement, every policy decision is attributed to a specific agent acting for a specific tenant under a specific declared purpose.
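One way to picture such a run record is sketched below. The class and field names are hypothetical, chosen to mirror the attributes listed above; they are not cogward's actual data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentRegistration:
    """What an agent declares at registration (hypothetical field names)."""
    agent_id: str
    autonomy_class: str
    declared_goal: str
    approved_tools: Tuple[str, ...]
    policy_scope: str
    expected_outcome: str


@dataclass(frozen=True)
class ToolCall:
    """Structural metadata about one tool call: name, position, latency."""
    tool: str
    order: int
    latency_ms: float


@dataclass(frozen=True)
class RunRecord:
    """One run, carrying the declared context alongside execution data."""
    registration: AgentRegistration
    tenant_id: str
    tool_calls: Tuple[ToolCall, ...]
    cost: float
    goal_achieved: bool

    def unapproved_tools(self):
        """Names of tools called in this run that are outside the approved set."""
        approved = set(self.registration.approved_tools)
        return [call.tool for call in self.tool_calls if call.tool not in approved]


# Illustrative values only.
registration = AgentRegistration(
    agent_id="alert-enrichment",
    autonomy_class="background",
    declared_goal="enrich_alert",
    approved_tools=("db.query", "search.web"),
    policy_scope="tenant-scoped",
    expected_outcome="alert_enriched",
)
run = RunRecord(
    registration=registration,
    tenant_id="tenant-a",
    tool_calls=(
        ToolCall(tool="db.query", order=1, latency_ms=42.0),
        ToolCall(tool="search.web", order=2, latency_ms=310.0),
    ),
    cost=0.02,
    goal_achieved=True,
)
print(run.registration.declared_goal, run.tenant_id, run.unapproved_tools())
```

Because the registration travels with every record, any cost or tool call in the run can be attributed to an agent, a tenant, and a declared purpose without a separate tagging step.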
That combination — identity, purpose, policy, outcome, and cost together — is what makes the intelligence semantically meaningful rather than statistically aggregated.
Estate intelligence operates on metadata and behavioural signals — not on content payloads. Tool call sequences are tracked as structured metadata: which tool was called, in which order, at what latency. Goal achievement is calculated from declared goal types and structured outcome signals registered in the agent manifest — not from the content of agent responses. Cost and latency are infrastructure metrics. Policy escalation rates are counts of enforcement events. No estate intelligence signal requires inspecting or storing content payloads inline. The privacy-preserving architecture and the intelligence layer are complementary, not in tension: the plane is blind to what the content was, and fully informed about what happened structurally.
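To make the metadata-only point concrete, here is a minimal sketch of two signals computed purely from structure: the dominant tool call order and the goal achievement rate. It assumes each run is reduced to a tuple of tool names and a boolean outcome signal; the function names are illustrative, and this is not a description of how cogward implements drift detection.

```python
from collections import Counter


def dominant_sequence(sequences):
    """Most common tool call order; each sequence is a tuple of tool names."""
    return Counter(sequences).most_common(1)[0][0]


def sequence_drifted(baseline_sequences, recent_sequences):
    """True when the dominant tool call order differs from the baseline's."""
    return dominant_sequence(baseline_sequences) != dominant_sequence(recent_sequences)


def goal_achievement_rate(outcomes):
    """Share of runs whose structured outcome signal reports success."""
    if not outcomes:
        raise ValueError("no runs to evaluate")
    return sum(1 for achieved in outcomes if achieved) / len(outcomes)


# Illustrative data: no prompt, response, or tool payload appears anywhere.
baseline = [("db.query", "search.web")] * 9 + [("search.web", "db.query")]
recent = [("search.web", "db.query")] * 8 + [("db.query", "search.web")] * 2

drifted = sequence_drifted(baseline, recent)
rate = goal_achievement_rate([True] * 71 + [False] * 29)
print(drifted, rate)
```

Neither function needs to see what a tool returned or what the agent wrote, only which tools ran, in which order, and whether the declared outcome was reported.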
Per-customer baselines. No cross-tenant pooling.
Estate intelligence is per-customer and self-compounding. Every production run adds to that customer's own behavioural baseline. The intelligence improves as the customer's estate grows, without any data leaving the customer's environment and without any cross-tenant data sharing.
For customers in regulated environments — financial services, healthcare, sovereign deployment — this is not a compromise. It is the correct architecture. The customer's data never crosses a tenant boundary to produce intelligence about their estate.
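A minimal sketch of what per-customer, self-compounding baselines could look like: a running mean keyed by tenant and agent, updated on every run and never pooled across tenants. The class name and methods are assumptions for illustration only.

```python
from collections import defaultdict


class TenantBaselines:
    """Running mean of a metric per (tenant, agent); no cross-tenant pooling."""

    def __init__(self):
        # (tenant, agent) -> [run count, running mean]
        self._stats = defaultdict(lambda: [0, 0.0])

    def record(self, tenant, agent, value):
        """Fold one run's metric into that tenant's own baseline."""
        stats = self._stats[(tenant, agent)]
        stats[0] += 1
        stats[1] += (value - stats[1]) / stats[0]  # incremental mean update

    def baseline(self, tenant, agent):
        """Baseline for this tenant and agent, or None if no runs were recorded."""
        count, running_mean = self._stats.get((tenant, agent), (0, 0.0))
        return running_mean if count else None


# Illustrative data only.
baselines = TenantBaselines()
baselines.record("tenant-a", "alert-enrichment", 1.0)
baselines.record("tenant-a", "alert-enrichment", 3.0)
baselines.record("tenant-b", "alert-enrichment", 10.0)
print(baselines.baseline("tenant-a", "alert-enrichment"))
print(baselines.baseline("tenant-b", "alert-enrichment"))
```

The same agent type running for two tenants yields two independent baselines, so one tenant's runs never influence another tenant's numbers.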
Air-gapped intelligence distribution.
For Tier B and Tier C customers operating in fully on-premise or air-gapped environments, improvements derived from the broader deployment base — refined drift detection thresholds, updated anomaly patterns, cost benchmark curves for common agent types — are distributed as pre-trained behavioural models. Same delivery channel as software releases. For air-gapped deployments: signed physical media, cryptographically verified on receipt.
The CFO at a sovereign deployment gets cost benchmarks. The engineering manager at an air-gapped financial services firm gets drift detection baselines. The intelligence flows one way — from cogward's research environment to the customer's plane — with no telemetry leaving the customer perimeter.
What the plane sees that monitoring tools don't
| Signal | Generic monitoring | Estate intelligence |
|---|---|---|
| Cost per run trending up | Visible as infrastructure spend | Attributed to specific agent and tenant, against 30-day baseline |
| Goal achievement rate dropping | Not visible | Per-agent, per-programme, over time |
| Tool call sequence changed | Not visible | Detected as behavioural drift signal |
| Agent underutilised | Not visible | Fleet utilisation pattern |
| Cost per outcome | Not visible | Cost correlated with declared goal achievement |
| Tenant-level cost breakdown | Requires manual tagging | From registration metadata — no tagging required |
| Drift without code change | Not visible | Detected against per-customer baseline |
| Correlated degradation across agents | Requires manual correlation | Fleet pattern intelligence across estate |
Available in two phases.
Phase 2, Foundation: Task status and programme visibility. Cost and latency trend baselines per agent and per tenant. Goal achievement rate tracking. Per-team and per-product cost attribution. Chargeback export to SAP and Oracle. Estate activity dashboard.
Phase 3, Full: Behavioural drift detection. Fleet pattern intelligence. Goal achievement trend analysis. Cost anomaly detection. Air-gapped intelligence distribution via signed physical media.
Phase 2 establishes the baselines that make Phase 3's drift detection possible. You cannot detect drift without a baseline to drift from. The foundation phase is not a reduced version of estate intelligence — it is the prerequisite for it.
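The dependency of drift detection on a baseline can be shown with a short sketch. It compares a recent window of cost per run with a baseline window and flags drift when the relative change exceeds a threshold. The 25% threshold, the window sizes, and the function names are assumptions for this example; the 40% increase echoes the scenario earlier on this page.

```python
from statistics import mean


def relative_change(baseline_values, recent_values):
    """Relative change of the recent mean versus the baseline mean."""
    if not baseline_values:
        raise ValueError("cannot detect drift without a baseline")
    if not recent_values:
        raise ValueError("no recent runs to compare")
    baseline_mean = mean(baseline_values)
    return (mean(recent_values) - baseline_mean) / baseline_mean


def has_drifted(baseline_values, recent_values, threshold=0.25):
    """True when the recent mean moved by at least `threshold` in either direction."""
    return abs(relative_change(baseline_values, recent_values)) >= threshold


# Illustrative data: a 30-day baseline and a recent 7-day window of cost per run.
baseline_costs = [0.10] * 30
recent_costs = [0.14] * 7

change = relative_change(baseline_costs, recent_costs)
print(round(change, 2), has_drifted(baseline_costs, recent_costs))
```

With an empty baseline the function raises rather than guessing, which mirrors the point above: the Phase 3 signals have nothing to compare against until Phase 2 has collected history.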
If agents are already in production, or will be soon, this is the conversation worth having.
The briefing covers the full five-pillar plane, but we can focus specifically on cost attribution, drift detection, and fleet intelligence if that's where the most immediate pain is.
How the five pillars make estate intelligence possible →
How audit-grade evidence and estate intelligence map to DORA, SOC 2, and EU AI Act →