The previous posts explored what AI-native software could look like and how ambiguity becomes its vital sign. The vision is clear enough: software as a living system, not a factory pipeline. Express intent, let AI handle the translation layers, evolve continuously within safety bounds.

But a vision without a framework is just a metaphor. This post makes it concrete — the primitives, layers, and mechanisms that would let you actually build software this way.

The Stability Contract

The foundation is an immutable layer that agents cannot touch. Think of it as a constitution for the system: humans write it, agents obey it.

system "crm_system" {

  # Hard invariants — must never be violated
  invariant customer_data_never_lost
  invariant response_time < 500ms
  invariant auth_required_for_customer_data

  # Behavioural intent — what the system does
  capability track_interactions {
    accepts: rep_logged_activity, customer_event
    produces: engagement_score, churn_alerts
    property: works_offline
    property: data_stays_in_region unless compliance_allows
  }

  capability assign_rep {
    accepts: new_deal, rep_availability
    produces: deal_assignment
    property: handles_reassignment
    property: respects_territory
  }

  # Quality bounds — the system can optimise within these
  optimise cost within budget($200/month)
  optimise latency minimise
  optimise accessibility WCAG_AA
}

This is what humans care about. Everything below this is the agent’s domain.

The stability contract needs invariants that go beyond technical correctness:

invariant no_spam_or_over_contact
invariant customer_maintains_data_sovereignty
invariant contact_requires_consent
invariant data_retention_compliant
invariant explainable_recommendations  # no black-box scoring

These are harder to verify formally, but they can be checked through a combination of static analysis, simulation, and human-in-the-loop review for the subset of changes that touch these concerns.
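
As a rough Python sketch of what that routing could look like — the invariant names mirror the list above, while the strategy labels and the mapping itself are illustrative assumptions, not part of the contract:

# Hypothetical: map each soft invariant to the checks that can cover it.
SOFT_INVARIANT_CHECKS = {
    "no_spam_or_over_contact":             ["simulation", "human_review"],
    "customer_maintains_data_sovereignty": ["static_analysis"],
    "contact_requires_consent":            ["static_analysis", "simulation"],
    "data_retention_compliant":            ["static_analysis"],
    "explainable_recommendations":         ["human_review"],
}

def verification_plan(touched_invariants: list[str]) -> set[str]:
    """Collect every check a change must pass; unknown invariants
    default to human review rather than silently passing."""
    plan: set[str] = set()
    for inv in touched_invariants:
        plan.update(SOFT_INVARIANT_CHECKS.get(inv, ["human_review"]))
    return plan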

Intent as Reactive Specification

The stability contract declares what must always be true. But a living system also needs to declare what should happen when things change — a reactive specification that can answer “given everything that’s happened, what should be true right now?”

A Complete Example

system "customer_platform" {

  # ─── DATA SEMANTICS ───

  entity customer {
    has: interactions[]
    has: deals[]
    has: preferences
    rule: always_has_encryption_key
    rule: deletion_means_crypto_shred
  }

  entity interaction {
    belongs_to: customer
    immutable: true        # once recorded, never changed
    has: type, sentiment, timestamp, context
    rule: timestamp_never_in_future
  }


  # ─── REACTIVE CONSEQUENCES ───
  # Each "when" block declares: if X happens, Y must follow.
  # The conductor just checks: did Y follow?

  when interaction.created {

    # Always happens
    must: update_engagement_score(interaction.customer)
    must: acknowledge_to_rep(interaction)
      within: 5s

    # Conditional consequences
    if engagement(interaction.customer).trend
       crosses threshold.churn_warning {

      must: alert_rep(
        customer: interaction.customer,
        reason: churn_risk_rising,
        actionable: true    # alert must suggest next step
      )
        within: 1m

      must: log_alert_event
    }

    if engagement(interaction.customer).trend
       crosses threshold.churn_critical
       AND interaction.customer.has_account_manager {

      must: notify_account_manager(
        manager: interaction.customer.primary_account_manager,
        summary: engagement_summary(interaction.customer),
        urgency: high
      )
        within: 5m

      must: await_manager_acknowledgment
        within: 24h
        fallback: escalate_to_sales_director
    }
  }

  when customer.assigned_rep {
    must: share_history(
      data: customer.interactions,
      recipient: rep,
      requires: data_access_authorisation
    )
    must: setup_notification_channel(customer, rep)
  }


  # ─── TEMPORAL EXPECTATIONS ───
  # Things that should happen based on time, not events

  every day at rep.preferred_time {
    if rep.has_active_deals {
      should: prompt_activity_log(rep)
        # "should" vs "must" — soft expectation,
        # gap is low priority
    }
  }

  every week {
    for each customer with interactions.count > 3 {
      must: generate_pipeline_review(customer)
      must: recalculate_engagement_scores(customer)
    }
  }

  when alert.unacknowledged for 4h {
    must: re_alert(
      escalate: true,
      channel: most_reliable(rep.contact_preferences)
    )
  }


  # ─── INVARIANTS (always true, not event-triggered) ───

  always {
    assert: every(interaction) is reachable_through_projection
    assert: every(customer.encryption_key) exists AND valid
    assert: no(alert.critical) is unacknowledged for > 24h
    assert: every(deal_score) has explanation
    assert: every(customer_contact) has consent_record
  }
}

The Consequence Graph

The intent layer compiles down to a consequence graph — a computable structure that the conductor evaluates:

Event: interaction.created(i1, customer: c1, type: call, sentiment: negative)

Expected consequences:
├─ MUST engagement_updated(c1)              within: 5s
│   └─ status: ✓ (event e4021 satisfies this)
├─ MUST acknowledged(i1)                    within: 5s
│   └─ status: ✗ GAP — no acknowledgment event found
├─ CONDITIONAL: check churn threshold
│   └─ engagement(c1).trend = 42 → crosses churn_warning(45)
│       ├─ MUST alert_rep(c1, churn_risk_rising) within: 1m
│       │   └─ status: ✓ (event e4022)
│       └─ MUST log_alert_event
│           └─ status: ✓ (event e4023)
└─ CONDITIONAL: check critical + manager
    └─ trend = 42, critical threshold = 20 → not crossed
        └─ no consequences required ✓

Gaps found: 1
  → acknowledgment missing for interaction i1
  → severity: must-within-5s (high priority)
  → spawn agent to resolve

The conductor doesn’t need to know anything about CRM. It just evaluates the consequence graph. The intelligence is in the intent spec, not the gap detector.
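
A minimal Python sketch of that evaluation, assuming a compiled Consequence shape with a deadline and a naive event matcher (both are simplifications of what the intent compiler would actually emit):

from dataclasses import dataclass

@dataclass(frozen=True)
class Consequence:
    name: str          # e.g. "acknowledged"
    subject: str       # e.g. "i1"
    deadline_s: float  # the "within" bound, in seconds after the trigger
    priority: str      # "must" | "should" | "could"

def find_gaps(trigger_time: float, expected: list[Consequence],
              later_events: list[dict]) -> list[Consequence]:
    """Domain-agnostic check: for each expected consequence, is there a
    satisfying event inside its deadline? Whatever is left is a gap."""
    gaps = []
    for c in expected:
        satisfied = any(
            e["type"] == c.name
            and e["subject"] == c.subject
            and e["time"] - trigger_time <= c.deadline_s
            for e in later_events
        )
        if not satisfied:
            gaps.append(c)
    return gaps

Evaluated against the trace above, acknowledged(i1) would come back as the one consequence with no satisfying event — exactly the gap the report shows.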

Priority: Must, Should, Could

The must/should distinction is how the framework decides which gaps to resolve now and which to let evolve:

  • must — hard gap. Conductor spawns agent immediately. Promotion is blocked until resolved.
  • should — soft gap. Conductor logs it, spawns agent on low priority. System continues functioning.
  • could — optimisation opportunity. Conductor notes it, addresses when there’s spare budget.

Invariant violations sit above all three — fix immediately.
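
A sketch of how a conductor might turn those labels into scheduling decisions; the numeric ranking and the spare-budget flag are assumptions for illustration:

PRIORITY_RANK = {"invariant": 0, "must": 1, "should": 2, "could": 3}

def schedule(gaps: list, has_spare_budget: bool) -> list:
    """Invariant violations and musts always run, shoulds run at low
    priority, coulds only when there's spare budget."""
    runnable = [g for g in gaps
                if PRIORITY_RANK[g.priority] <= 2 or has_spare_budget]
    return sorted(runnable, key=lambda g: PRIORITY_RANK[g.priority])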

Intent Evolution Creates Gaps Automatically

When the intent evolves, the system automatically knows what work needs to happen to bring existing data into compliance. No migration scripts. No Jira tickets. The gap just appears and agents resolve it.

# Day 1: intent spec says
when interaction.created {
  must: update_engagement_score(customer)
}

# Day 30: human updates intent spec to add
when interaction.created {
  must: update_engagement_score(customer)
  must: check_competitor_mentions(customer)   # NEW
}

# The conductor immediately detects:
# "there are 5,000 existing interactions where
#  check_competitor_mentions never ran"
#
# This is a CONSEQUENCE GAP — retroactive.
# The intent layer can specify:

when intent.evolved {
  backfill_policy: sequential   # process old events against new rules
  # vs: prospective_only        # only apply to new events
  # vs: human_decision          # ask human what to do
}
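
A sketch of how retroactive gap detection could work, assuming the conductor keeps both intent versions around during a transition (the method names are hypothetical):

def retroactive_gaps(event_log, old_intent, new_intent):
    """Diff the obligations: anything the new intent expects of an old
    event, that the old intent didn't and the log doesn't show, is a gap."""
    gaps = []
    for event in event_log.all():
        added = (set(new_intent.expected_consequences(event))
                 - set(old_intent.expected_consequences(event)))
        missing = added - set(event_log.consequences_of(event))
        if missing:
            gaps.append((event, missing))
    return gaps  # e.g. 5,000 interactions missing check_competitor_mentions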

The Persistence Model

Code is stateless — you can throw it away and regenerate it. Data is the opposite: accumulated, irreplaceable, and carrying meaning that shifts as the system evolves. If agents are going to restructure the system continuously, the data has to survive every restructuring.

The Immutable Event Log

Underneath everything, the framework maintains an immutable, append-only event log. Every piece of primary data enters the system as an event and never changes.

Event log (immutable, append-only, source of truth):

  e001: { customer: c1, type: interaction_logged,
           data: {call, duration: 15m, notes: "pricing Q"}, time: ... }
  e002: { customer: c1, type: deal_updated,
           data: {stage: negotiation, value: 50k}, time: ... }
  e003: { customer: c1, type: rep_assigned,
           data: {rep: jsmith}, time: ... }
  e004: { customer: c1, type: consent_granted,
           data: {marketing_emails: true}, time: ... }

This is event sourcing elevated to a framework primitive rather than an architectural choice. Agents never modify this log; they can only append to it. Everything else — the queryable database, the API responses, the user-facing views — is a projection of this log, and projections can be rebuilt from scratch at any time.
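
Here's a minimal in-memory sketch of that surface, assuming just two operations, append and replay (a real log would be durable and partitioned, but the interface is the point):

import itertools
import time

class EventLog:
    def __init__(self):
        self._events = []
        self._ids = itertools.count(1)

    def append(self, customer: str, type: str, data: dict) -> dict:
        """The only write operation. Nothing is ever updated or deleted."""
        event = {
            "id": f"e{next(self._ids):03d}",
            "customer": customer,
            "type": type,
            "data": data,
            "time": time.time(),
        }
        self._events.append(event)
        return event

    def replay(self):
        """Read-only, in-order iteration: all a projection ever needs."""
        return iter(self._events)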

Disposable Projections

This is what makes evolution safe. When an agent restructures the system, it’s not migrating data. It’s building a new projection of the same immutable events.

Current system:
  Event log: [e001, e002, e003, ... e50000]
  Projection A: PostgreSQL with schema v3
    → customers table, interactions table, deals table

Agent creates a draft with a completely different data model:
  Projection B: Graph database
    → company nodes, contact nodes, deal nodes, relationship edges

Promotion process:
  1. Agent builds Projection B from the FULL event log
     (replays all 50,000 events into new structure)
  2. Verification: is every primary data point accessible
     through the new projection?
  3. Canary: run both projections simultaneously, compare
     results for real queries
  4. Swap: route traffic to Projection B
  5. Keep Projection A alive for rollback window
  6. Event log never changed. Nothing was "migrated."

The event log is the invariant. Projections are disposable. Agents can be as radical as they want about restructuring the queryable layer because the raw data is untouchable.
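
A sketch of steps 1–3 of that promotion process; apply, contains, and query are assumed projection methods, not a real interface:

def build_projection(projection, event_log):
    """Step 1: replay the full log into the candidate structure."""
    for event in event_log.replay():
        projection.apply(event)
    return projection

def verify_coverage(projection, event_log) -> bool:
    """Step 2: every primary event must be reachable in the new projection."""
    return all(projection.contains(e["id"]) for e in event_log.replay())

def canary_compare(old, new, sample_queries) -> list:
    """Step 3: run both projections on real queries and diff the answers."""
    return [q for q in sample_queries if old.query(q) != new.query(q)]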

Enrichment and Deletion

Schemas evolve. Early events might carry only {type: "call"}; later the intent adds duration, sentiment, and follow-up. The framework handles this through enrichment events — agents create derived events that augment old data without modifying it:

# Original event (immutable, stays forever)
e001: { type: "call" }

# Enrichment event (agent-generated, linked to original)
e001_enriched: {
  source: e001,
  type: retroactive_enrichment,
  inferred_sentiment: null,       # honestly unknown
  normalised_interaction: "call_outbound",
  enrichment_confidence: 0.95
}

The enrichment is clearly marked as inferred, not original. Projections can use enrichments or ignore them.

For genuine deletion — a customer exercises their GDPR right — the log records the intent through tombstone events, and the raw events get cryptographically shredded: the encryption key for that customer's events is deleted, rendering them unrecoverable without modifying the log structure.
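
A sketch of crypto-shredding with per-customer keys, using the Fernet primitive from the Python cryptography package; the in-memory key dict is a stand-in for a real key management service:

from cryptography.fernet import Fernet

keys: dict[str, bytes] = {}   # customer_id -> per-customer key

def encrypt_payload(customer_id: str, payload: bytes) -> bytes:
    """Every event payload is encrypted under its customer's key."""
    key = keys.setdefault(customer_id, Fernet.generate_key())
    return Fernet(key).encrypt(payload)

def crypto_shred(customer_id: str) -> None:
    """GDPR deletion: drop the key. The ciphertext stays in the log,
    permanently unreadable; the log structure is never modified."""
    keys.pop(customer_id, None)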

An agent could literally throw away the entire application layer and rebuild it from scratch, and no customer would lose a single data point. That’s what makes “living system” safe rather than terrifying.

The Conductor

The conductor is the only persistent process. It continuously compares two things: what has happened (event log) versus what should have happened (intent layer). The gap between those is work.

while True:

    events = event_log.since(last_checkpoint)

    for event in events:

        # What consequences should this event have triggered?
        expected = intent.expected_consequences(event)

        # What actually happened?
        actual = event_log.consequences_of(event)

        # The gap is work
        unresolved = expected - actual

        if unresolved:
            agent = spawn_agent(
                goal=resolve(unresolved),
                context=event,
                constraints=stability_contract,
                sandbox=new_draft(),
            )
            track(agent)

These gaps are the concrete, computable form of the ambiguity described earlier — places where the system’s declared intent and its actual behaviour haven’t converged. Instead of reading ambiguity off a mind map, the conductor detects it mechanically.

Five Types of Gaps

1. Reaction gap — event happened, expected consequence didn’t. “Rep logged a high-risk interaction, no churn alert was generated.” Spawn a reactive agent.

2. Consistency gap — system state doesn’t match what the event log implies. “Event log shows 50 interactions, projection only has 49.” Spawn a repair agent.

3. Intent gap — the intent layer changed, existing data or behaviour doesn’t match. “New invariant: all deal scores must be explainable. 300 existing scores lack explanations.” Spawn an enrichment agent.

4. Quality gap — everything works, but optimisation targets aren’t met. “Latency is 400ms, target is 200ms.” Spawn an optimisation agent.

5. Temporal gap — something should have happened by now and hasn’t. “48-hour follow-up was due yesterday.” Spawn a temporal agent.
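
As a sketch, the taxonomy is small enough to be an enum the conductor dispatches on; the agent mapping is illustrative:

from enum import Enum

class GapType(Enum):
    REACTION    = "expected consequence never happened"
    CONSISTENCY = "projection disagrees with the event log"
    INTENT      = "intent changed; existing data no longer complies"
    QUALITY     = "working, but an optimisation target is missed"
    TEMPORAL    = "a time-based obligation is overdue"

AGENT_FOR = {
    GapType.REACTION:    "reactive_agent",
    GapType.CONSISTENCY: "repair_agent",
    GapType.INTENT:      "enrichment_agent",
    GapType.QUALITY:     "optimisation_agent",
    GapType.TEMPORAL:    "temporal_agent",
}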

Ephemeral Agents

The spawned agents are ephemeral and purpose-built. They’re not long-running services. They’re born with a specific goal, given a sandbox, and they either resolve the gap or fail.

SPAWN → conductor creates agent with:
  - specific goal (resolve this gap)
  - relevant context (the events involved)
  - a sandbox/draft to work in
  - resource budget (time, compute, cost)
  - constraints from stability contract

WORK → agent operates in sandbox:
  - reads event log (read-only)
  - reads current projections (read-only)
  - builds solution in draft
  - generates proof of invariant satisfaction

RESOLVE → agent attempts to close the gap:
  - submits draft for promotion
  - OR appends new events (reactions, enrichments)
  - OR reports: "this gap requires human input"

DIE → agent terminates. Always.
  - no persistent agent state
  - no long-running background agents
  - if the gap recurs, a new agent gets spawned

Agents don’t linger. The conductor is the only persistent process. Everything else is short-lived. This prevents the system from accumulating zombie processes, stale agent states, or conflicting long-running agents stepping on each other.
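
One way to make "agents always die" structural rather than a convention is to run every agent in its own process with a hard budget. A sketch, assuming the agent is a plain function:

import multiprocessing

def run_ephemeral(agent_fn, gap, budget_seconds: float) -> str:
    """Spawn, wait, terminate: no agent state survives either outcome."""
    proc = multiprocessing.Process(target=agent_fn, args=(gap,))
    proc.start()
    proc.join(timeout=budget_seconds)
    if proc.is_alive():
        proc.terminate()   # budget exhausted: kill it and report failure
        proc.join()
        return "failed"
    return "resolved" if proc.exitcode == 0 else "failed"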

Safety

Agents don’t modify the live system. Ever. They work in sandboxed drafts — complete parallel instances of the system that they can restructure however they want. The key primitives:

  • Draft — a candidate next-state of the whole system
  • Proof — evidence that a draft satisfies all invariants
  • Promote — atomic swap from current state to draft (only if proof checks out)
  • Rollback — instant revert if runtime behaviour diverges from proof

An agent’s workflow: create draft, restructure freely, generate proof, request promotion. If the proof doesn’t verify, the draft never touches production. The agent can try again with a different approach.

The stability membrane enforces this. Before any promotion, it runs:

  • Pre-promotion verification — formal checking of invariants against the draft. Does customer data survive? Do all API contracts still hold? This isn't unit testing; it's closer to property-based verification across the entire system state space.
  • Canary materialisation — even after proof, the draft gets materialised for a fraction of real traffic first. Agents can be wrong about the real world even when they’re logically correct.
  • Continuous runtime monitoring — the live system is perpetually checked against the intent spec. If latency creeps past 500ms, the membrane flags it and agents get a synthesis task automatically.
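
A sketch of the membrane's gate, chaining the first two checks into the promotion decision; holds_for, swap_live, and watch_for_rollback are hypothetical stand-ins, and canary_compare is the function from the projection sketch:

def try_promote(draft, current, invariants, sample_traffic) -> str:
    """Proof first, canary second, atomic swap last; any failure means
    the draft never touches production."""
    if not all(inv.holds_for(draft) for inv in invariants):
        return "rejected: proof failed"
    mismatches = canary_compare(current, draft, sample_traffic)
    if mismatches:
        return f"rejected: canary diverged on {len(mismatches)} queries"
    previous = swap_live(draft)    # atomic promotion
    watch_for_rollback(previous)   # keep the old state for instant revert
    return "promoted"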

Agent resolution can cascade — resolving one gap by appending new events might itself create gaps. That’s fine; the conductor picks them up in the next loop. But circuit breakers prevent spiralling:

# Prevent infinite spawn loops
max_agent_depth: 5
  → if resolving gap A spawns gap B spawns gap C...
     stop at depth 5, flag for human review

# Prevent resource exhaustion
max_concurrent_agents: 20
max_agents_per_gap: 3
  → if 3 agents have failed to resolve the same gap,
     escalate to human

# Prevent thrashing
cooldown_per_gap: 30s
  → don't re-spawn for the same gap immediately

# Budget enforcement
total_compute_budget: $X/hour
  → conductor prioritises gaps by severity
     when budget is constrained
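
In code, the breakers reduce to a pre-spawn check; the limits mirror the config above, and the state counters are assumed in-memory bookkeeping:

LIMITS = {"max_depth": 5, "max_concurrent": 20,
          "max_per_gap": 3, "cooldown_s": 30}

def may_spawn(gap, state, now: float):
    """Return (allowed, action) for a gap the conductor wants to resolve."""
    if gap.depth >= LIMITS["max_depth"]:
        return False, "flag_for_human_review"   # cascade went too deep
    if state.concurrent >= LIMITS["max_concurrent"]:
        return False, "queue"                   # global resource cap
    if state.failures(gap.id) >= LIMITS["max_per_gap"]:
        return False, "escalate_to_human"       # same gap failed 3 times
    if now - state.last_spawn(gap.id) < LIMITS["cooldown_s"]:
        return False, "wait"                    # anti-thrash cooldown
    return True, "spawn"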

The biological analogy holds: the event log is DNA (immutable record), the intent layer is gene expression, projections are proteins (functional structures built from the record), the conductor is the immune system (gap detection), and spawned agents are antibodies (targeted, ephemeral, disposable). Draft/promote is cell division with error checking. And just like an immune system, it can sometimes get it wrong — that’s why the promotion gate and rollback mechanism exist.

Where to Start

A proof-of-concept needs five things: an event log, a parsed intent registry, a gap detector (the conductor’s core loop), an agent runtime with sandbox and tool access, and a promotion gate that verifies drafts against invariants. The agents themselves can initially just be Claude or similar models with tool access. The framework is the guardrails, not the intelligence.
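
A skeleton of those five pieces wired together, reusing the sketches above; intent, membrane, and llm_agent are hypothetical interfaces, so this is a shape to fill in, not a working build:

def proof_of_concept(log, intent, membrane, llm_agent):
    """Conductor loop over the five PoC pieces."""
    while True:
        for event in log.since_checkpoint():                # 1. event log
            expected = intent.expected_consequences(event)  # 2. intent registry
            gaps = find_gaps(event["time"], expected,       # 3. gap detector
                             log.consequences_of(event))
            for gap in gaps:
                draft = membrane.new_draft()        # 4. sandboxed agent runtime
                llm_agent(goal=gap, sandbox=draft)  # model with tool access
                membrane.try_promote(draft)         # 5. promotion gate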