Estate intelligence · Phase 2–3 · For the teams that operate agent estates

Phase 2 — Foundation (cost attribution, baselines, programme visibility) ships next. Phase 3 — Full (drift detection, fleet intelligence, air-gapped distribution) ships later. The Phase 1 governed runtime — registry, identity, lifecycle authority, audit-grade evidence, and Tier A deployment — is deployable today with design partners.

Agents are in production. Now you need to know if they're working.

Getting agents to production requires solving five things: runtime control, execution identity, lifecycle authority, audit-grade evidence, and a deployment model that works inside your environment. Those are the production problems.

Once agents are in production, a different set of questions emerges. These aren't questions about whether an agent is safe to deploy. They're questions about whether the estate is working.

Which agents are degrading — and do you know before a customer tells you?
Which agents are delivering measurable value — and can you show that to leadership?
What are agents costing per customer, per product, per team — and who is accountable for that spend?

Generic monitoring can show traces and latency. It usually can't answer these questions, because it doesn't know the agent's declared purpose, the policy that was in effect, the customer it was acting for, or what outcome it was supposed to produce. Without that context, telemetry doesn't become intelligence.

Estate intelligence is the layer that connects identity, purpose, policy, and outcome — and produces signals that engineering managers, CTOs, and CFOs can act on.

For the engineering manager

You find out agents are degrading when customers tell you.

A background alert enrichment agent has been running in production for six weeks. It was performing well at launch. No code changes since then. No deployment events. No incidents in the queue.

Three weeks ago, its goal achievement rate started dropping — from 91% to 84% to 71%. Its average cost per run is up 40% versus the 30-day baseline. Its tool call sequence has quietly reversed: it's calling search.web before db.query when it previously went the other way.

None of these changes required a code change. They happened because an underlying model shifted, a tool response schema evolved, and the distribution of alerts sent to the agent changed. None of them triggered an alert in CloudWatch. None of them appeared in the traces your engineering team monitors.

You found out when a customer escalated.

What estate intelligence surfaces

Cost per run up 40% versus the 30-day baseline for this agent

Tool call sequence reversed — calling search.web before db.query where it previously went the other way

Latency per task up 60% despite no change in input complexity

Goal achievement rate dropped from 91% to 71% over 21 days with no version change

Policy escalation rate doubled — the agent is hitting edge cases its declared permissions don't cleanly cover
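As a rough illustration of how signals like these could be derived, the sketch below compares a recent window of runs against the preceding 30-day baseline. It is not cogward's implementation: the `Run` fields, the 7-day recent window, and both thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    day: int             # days since launch (hypothetical field)
    cost: float          # cost of one run, in any consistent unit
    goal_achieved: bool  # structured outcome signal, not response content


def average(values):
    values = list(values)
    return sum(values) / len(values)


def drift_signals(runs, today, baseline_days=30, recent_days=7,
                  cost_threshold=0.25, goal_drop=0.10):
    """Compare the recent window with the baseline window that precedes it."""
    split = today - recent_days
    recent = [r for r in runs if split < r.day <= today]
    base = [r for r in runs if split - baseline_days < r.day <= split]
    if not recent or not base:
        return []  # not enough history to compare

    signals = []
    base_cost = average(r.cost for r in base)
    cost_delta = (average(r.cost for r in recent) - base_cost) / base_cost
    if cost_delta > cost_threshold:
        signals.append(("cost_per_run_up", round(cost_delta, 2)))

    rate_drop = (average(r.goal_achieved for r in base)
                 - average(r.goal_achieved for r in recent))
    if rate_drop > goal_drop:
        signals.append(("goal_rate_down", round(rate_drop, 2)))
    return signals
```

Neither check depends on a deployment event or a code change. Both rely only on the agent's own run history, which is why this kind of drift can surface with no version change to point at.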

With estate intelligence, the engineering manager sees the drift 18 days before the customer escalation. They investigate the tool schema change, pin the model version, and restore the previous tool call order. The agent is back to baseline before any customer notices.

Without estate intelligence, the escalation happens, an incident is opened, a postmortem is written, and the root cause is traced back to a schema change that happened three weeks ago with no code change to blame.

For the CTO

You've made a strategic bet on agent infrastructure. You need to know if it's paying off.

Twelve months ago the CTO made a decision: invest in agent infrastructure across the product. Six teams are now running agents in production. Eighteen agents are active. The infrastructure investment was significant.

The board is asking two questions: which agents are delivering value, and is the estate trending in the right direction?

The CTO can't answer either question confidently. Each team has its own metrics. Some track goal completion. Some track latency. Some track cost. None of it is comparable. There's no fleet-level view. There's no way to see which agents are underutilised, which are overloaded, which are showing correlated degradation that might point to a shared dependency failing.

The CTO's quarterly review is in three weeks.

What estate intelligence surfaces

Goal achievement rates by agent and by programme — which agents are achieving their declared outcomes, how that rate has changed over time, and which task types each agent consistently fails on

Fleet utilisation patterns — which agents are underutilised (cost justification), which are overloaded (capacity planning), which are showing correlated degradation suggesting a shared upstream dependency is failing

Trend direction — is the estate improving or degrading over time, and where are the inflection points

Programme-level visibility — for multi-step agent programmes, which tasks are in progress, blocked, completed, or failed across all registered agents, with time-to-completion compared to baseline
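One way to picture the correlated degradation signal: if several degrading agents share a declared tool, that tool is a candidate shared dependency. The sketch below is a simplified illustration under that assumption. The function name, agent names, and the `min_agents` cut-off are hypothetical.

```python
from collections import defaultdict


def shared_dependency_suspects(agent_tools, degrading, min_agents=2):
    """Rank tools used by at least `min_agents` degrading agents.

    agent_tools: agent name -> set of declared tools
    degrading:   set of agent names currently flagged as degrading
    Returns (tool, degrading agents, share of the tool's users degrading).
    """
    users = defaultdict(set)
    for agent, tools in agent_tools.items():
        for tool in tools:
            users[tool].add(agent)

    suspects = []
    for tool, agents in users.items():
        hit = agents & degrading
        if len(hit) >= min_agents:
            suspects.append((tool, sorted(hit), len(hit) / len(agents)))
    # Highest share of degrading users first, then by tool name for stability.
    return sorted(suspects, key=lambda s: (-s[2], s[0]))
```

A tool whose every user is degrading is a stronger lead than one where a single agent among many is struggling. That is the reason for ranking by share rather than by raw count.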

With estate intelligence, the CTO walks into the board meeting with a fleet view: three agents are exceeding goal achievement targets, two are underperforming with a known root cause being addressed, one is underutilised and a candidate for consolidation. The estate is trending positively. The infrastructure investment is measurable.

Without estate intelligence, the CTO assembles the picture manually from six team updates, each with different metrics, different definitions of success, and different levels of instrumentation. The answer to "is it working?" is a narrative, not a number.

For the CFO

Agent compute and model costs are appearing on the bill. You can't attribute them to anything.

Agent infrastructure costs are growing. Model API calls, compute, storage, retrieval — the line items are real and they're increasing. The CFO is being asked to approve the next phase of investment.

The problem: there's no cost attribution model. The engineering team reports total agent infrastructure spend. There's no breakdown by agent, by product module, by customer, or by business outcome. There's no way to know whether the spend is justified by value delivered. There's no chargeback mechanism for BYOC customers or large enterprise accounts that expect cost transparency.

The CFO is being asked to approve spend on something that, from a finance perspective, is a black box.

What estate intelligence surfaces

Per-agent cost tracking — token consumption, compute, retrieval, and model API costs attributed to each registered agent, per run and over time. No manual tagging — attribution comes from registration metadata.

Per-tenant cost tracking — total agent infrastructure cost attributed to each customer tenant, broken down by agent, by model, by tool category. For BYOC deployments, cost tracking runs inside the customer environment.

Per-team and per-product attribution — costs rolled up to owning team and product area, exportable to SAP, Oracle, or internal finance tooling for chargeback.

Cost trend vs outcome — cost per run correlated with goal achievement rate over time. Not just how much an agent costs, but whether that cost is producing the declared outcome.

Cost anomaly detection — when an agent's cost per run increases significantly versus its 30-day baseline, it surfaces as a drift signal before it becomes a budget problem.
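The attribution described above can be pictured as a join between run records and registration metadata. The sketch below is an illustration only: `REGISTRY`, `RUNS`, and every field name are hypothetical stand-ins, not cogward's schema.

```python
from collections import defaultdict

# Hypothetical registration metadata: each agent declares its owning team and product.
REGISTRY = {
    "alert-enricher": {"team": "secops", "product": "detect"},
    "ticket-resolver": {"team": "support", "product": "helpdesk"},
}

# Hypothetical run records: metadata only, no content payloads.
RUNS = [
    {"agent": "alert-enricher", "tenant": "acme", "cost": 0.50, "outcome_met": True},
    {"agent": "alert-enricher", "tenant": "acme", "cost": 0.25, "outcome_met": False},
    {"agent": "alert-enricher", "tenant": "globex", "cost": 0.25, "outcome_met": True},
    {"agent": "ticket-resolver", "tenant": "acme", "cost": 1.00, "outcome_met": True},
]


def roll_up(runs, registry, dimension):
    """Total cost grouped by 'agent', 'tenant', 'team', or 'product'."""
    totals = defaultdict(float)
    for run in runs:
        if dimension in ("agent", "tenant"):
            group = run[dimension]
        else:
            # Team and product come from registration, so runs need no manual tags.
            group = registry[run["agent"]][dimension]
        totals[group] += run["cost"]
    return dict(totals)


def cost_per_successful_outcome(runs, agent):
    """All spend for an agent divided by the runs that met the declared outcome."""
    agent_runs = [r for r in runs if r["agent"] == agent]
    successes = sum(1 for r in agent_runs if r["outcome_met"])
    if successes == 0:
        return None  # no successful runs yet, so the ratio is undefined
    return sum(r["cost"] for r in agent_runs) / successes
```

Failed runs still count toward the numerator in `cost_per_successful_outcome`. That is a deliberate choice in this sketch: an agent that fails often becomes more expensive per outcome even if each individual run is cheap.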

With estate intelligence, the CFO approves the next investment phase with a cost model that makes sense: cost per agent, cost per customer, cost per outcome, and a mechanism for chargeback where required. The conversation shifts from "how much are we spending on agents?" to "what is the cost per successful investigation, per enriched alert, per resolved ticket?"

For enterprise customers who expect cost transparency as part of their contract, the per-tenant cost breakdown is also a customer-facing capability — part of what you show them in their account dashboard, not just an internal finance report.

Without estate intelligence, the CFO sees a single aggregate line item. Approval decisions are made on faith rather than evidence. BYOC customers with cost transparency requirements in their contracts are served by manual estimation.

How it works

The plane's unique vantage.

Generic monitoring produces telemetry anchored to infrastructure. Estate intelligence is anchored to declared purpose — and that's what makes it different.

Every agent registered with cogward declares its identity, its autonomy class, its declared goal, its approved tool set, its policy scope, and its expected outcome. Every run produces a record that carries all of that context alongside the execution data. Every cost, every tool call, every latency measurement, every policy decision is attributed to a specific agent acting for a specific tenant under a specific declared purpose.
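To make the idea concrete, here is a minimal sketch of a run record that carries its declared context. The class and field names are assumptions that mirror the declarations listed above. They are not cogward's actual manifest format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AgentRegistration:
    # Hypothetical field names, one per declaration named in the text.
    agent_id: str
    autonomy_class: str
    declared_goal: str
    approved_tools: tuple
    policy_scope: str
    expected_outcome: str


@dataclass(frozen=True)
class RunRecord:
    registration: AgentRegistration
    tenant: str
    cost: float
    latency_ms: int
    tool_calls: tuple         # tool names in call order, metadata only
    policy_escalations: int
    outcome_met: bool


def flatten(record: RunRecord) -> dict:
    """Merge declared context and execution data into one row for analysis."""
    row = asdict(record)
    row.update(row.pop("registration"))
    return row
```

Because each flattened row holds both what the agent was supposed to do and what it did, any later grouping by tenant, goal, or policy scope needs no lookup against a separate tagging system.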

That combination — identity, purpose, policy, outcome, and cost together — is what makes the intelligence semantically meaningful rather than statistically aggregated.

Estate intelligence operates on metadata and behavioural signals — not on content payloads. Tool call sequences are tracked as structured metadata: which tool was called, in which order, at what latency. Goal achievement is calculated from declared goal types and structured outcome signals registered in the agent manifest — not from the content of agent responses. Cost and latency are infrastructure metrics. Policy escalation rates are counts of enforcement events. No estate intelligence signal requires inspecting or storing content payloads inline. The privacy-preserving architecture and the intelligence layer are complementary, not in tension: the plane is blind to what the content was, and fully informed about what happened structurally.
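A tool call order change can be detected from tool names alone. The sketch below compares which tool first appears before which in a baseline sequence and in a current one. The function names are hypothetical, and comparing first occurrences is one simple choice among several.

```python
from itertools import combinations


def first_positions(sequence):
    """Index of the first call to each tool in a run's tool call sequence."""
    positions = {}
    for index, tool in enumerate(sequence):
        positions.setdefault(tool, index)
    return positions


def reversed_pairs(baseline, current):
    """Tool pairs whose relative order differs between baseline and current.

    Each pair is reported in the order now observed.
    """
    base, cur = first_positions(baseline), first_positions(current)
    common = sorted(set(base) & set(cur))
    changed = []
    for a, b in combinations(common, 2):
        if (base[a] < base[b]) != (cur[a] < cur[b]):
            changed.append((a, b) if cur[a] < cur[b] else (b, a))
    return changed
```

For the agent in the engineering manager example, a baseline of db.query then search.web and a current run of search.web then db.query yields the single pair (search.web, db.query). No request or response content is read at any point.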

Per-customer baselines. No cross-tenant pooling.

Estate intelligence is per-customer and self-compounding. Every production run adds to that customer's own behavioural baseline. The intelligence improves as the customer's estate grows, without any data leaving the customer's environment and without any cross-tenant data sharing.
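A self-compounding, per-customer baseline can be sketched as a running mean and variance keyed by tenant, updated on every run. The example uses Welford's online algorithm as an illustrative choice. The source does not say which statistics the plane keeps, and the key layout is an assumption.

```python
import math
from collections import defaultdict


class Baseline:
    """Running mean and sample standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


# One baseline per (tenant, agent, metric); nothing is shared across tenants.
baselines = defaultdict(Baseline)


def observe(tenant, agent, metric, value):
    """Fold one production run's measurement into that tenant's own baseline."""
    baseline = baselines[(tenant, agent, metric)]
    baseline.update(value)
    return baseline
```

Since the tenant is part of the key, a run for one customer can never move another customer's baseline. Each baseline sharpens only as that customer's own run volume grows.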

For customers in regulated environments — financial services, healthcare, sovereign deployment — this is not a compromise. It is the correct architecture. The customer's data never crosses a tenant boundary to produce intelligence about their estate.

Air-gapped intelligence distribution.

For Tier B and Tier C customers operating in fully on-premise or air-gapped environments, improvements derived from the broader deployment base — refined drift detection thresholds, updated anomaly patterns, cost benchmark curves for common agent types — are distributed as pre-trained behavioural models. Same delivery channel as software releases. For air-gapped deployments: signed physical media, cryptographically verified on receipt.
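Verification on receipt typically includes checking that the delivered bundle matches a published digest. The sketch below shows only that integrity step, using SHA-256 from the Python standard library. It assumes the expected digest comes from a manifest whose signature has already been verified by a separate step not shown here, and it does not describe cogward's actual verification procedure.

```python
import hashlib
import hmac


def bundle_digest(data: bytes) -> str:
    """SHA-256 digest of the bundle contents, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def digest_matches(data: bytes, expected_hex: str) -> bool:
    """True if the bundle's digest equals the expected value.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(bundle_digest(data), expected_hex.lower())
```

A bundle that fails this check would be rejected before any behavioural model inside it is loaded.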

The CFO at a sovereign deployment gets cost benchmarks. The engineering manager at an air-gapped financial services firm gets drift detection baselines. The intelligence flows one way — from cogward's research environment to the customer's plane — with no telemetry leaving the customer perimeter.

What the plane sees that monitoring tools don't

Signal	Generic monitoring	Estate intelligence
Cost per run trending up	Visible as infrastructure spend	Attributed to specific agent and tenant, against 30-day baseline
Goal achievement rate dropping	Not visible	Per-agent, per-programme, over time
Tool call sequence changed	Not visible	Detected as behavioural drift signal
Agent underutilised	Not visible	Fleet utilisation pattern
Cost per outcome	Not visible	Cost correlated with declared goal achievement
Tenant-level cost breakdown	Requires manual tagging	From registration metadata — no tagging required
Drift without code change	Not visible	Detected against per-customer baseline
Correlated degradation across agents	Requires manual correlation	Fleet pattern intelligence across estate

Phasing

Available in two phases.

Phase 2: Production Governance (next)

Task status and programme visibility. Cost and latency trend baselines per agent and per tenant. Goal achievement rate tracking. Per-team and per-product cost attribution. Chargeback export to SAP and Oracle. Estate activity dashboard.

Phase 3: Intelligence & Evidence Maturity (later)

Behavioural drift detection. Fleet pattern intelligence. Goal achievement trend analysis. Cost anomaly detection. Air-gapped intelligence distribution via signed physical media.

Phase 2 establishes the baselines that make Phase 3's drift detection possible. You cannot detect drift without a baseline to drift from. The foundation phase is not a reduced version of estate intelligence — it is the prerequisite for it.

Book a briefing

If agents are already in production, or will be soon, this is the conversation worth having.

The briefing covers the full five-pillar plane, but we can focus specifically on cost attribution, drift detection, and fleet intelligence if that's where the most immediate pain is.

How the five pillars make estate intelligence possible →

How audit-grade evidence and estate intelligence map to DORA, SOC 2, and EU AI Act →
