CodexScribe
Formal Foundry's CodexScribe provides a foundation of verifiable safety for AI systems.
AI has advanced quickly, but most operational safeguards still rely on heuristics—prompts, pattern checks, red‑teaming, and test suites. These controls are useful yet inherently brittle: behaviour drifts, wording is interpreted inconsistently, and corner cases accumulate. For homeland security and safety work, this variability is hard to accept.
The durable alternative is formal verification: expressing requirements as precise specifications that can be checked by a proof assistant. In practice, the barrier has been formalization itself: turning expert intent and knowledge into exact, machine‑checkable definitions. The process is slow, requires a specialist, and is difficult to audit.
Formal Foundry addresses this bottleneck. The product provides a conversational formalization workflow in which experts state the rule they intend; a candidate formalization is immediately type‑checked; the verified meaning is then read back deterministically in clear, semi‑natural language. This keeps ambiguity and “hallucinations” out of the specification and makes the process traceable down to the smallest decision. The outcome is a clear, unambiguous, and expressive specification that tames complexity and can be referenced by downstream engineering and assurance tooling. It creates a practical path towards formally defined controls, without the tool itself being a runtime safety layer.
Our tool
A conversational formalization tool that turns a domain expert’s intent into machine‑checkable, auditable specifications verified in a proof assistant (Agda today; architecture designed to support additional assistants). The result is a dependable description of rules for AI‑enabled workflows in homeland security and safety domains.
Key innovations
- Closed‑loop formalization. An expert describes a requirement (e.g., separation‑of‑duties, cross‑domain transfer conditions, data‑use limits). The system proposes a formal version, checks it in a proof assistant, then reads it back deterministically for human confirmation. This prevents ambiguity, policy drift, hidden assumptions, and AI hallucinations from entering the specification.
- Proof‑checked acceptance. Only definitions that pass the type checker become part of the model, producing consistent and reliable specifications that persist across teams and over time.
- Compliance by design; accountability by default. The formalization process can be tracked down to the smallest decision—expert input, AI proposal, check outcome, and read‑back—creating a traceable record that supports AI Act compliance, broader compliance requirements, and independent audit.
- European focus and data sovereignty. Deploy on your own infrastructure (including restricted or air‑gapped environments) or as a hosted option. The workflow is compatible with European LLMs—for example Mistral and Magistral (France) and PLLuM and Bielik (Poland)—supporting data sovereignty and helping organizations reduce dependence on US‑based AI providers.
- Accessible to non‑specialists. The tool “speaks the expert’s language” on input and returns the checked meaning in clear text, enabling use by investigators, operators, and programme managers—not only logicians.
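To make the closed loop concrete, here is a rough sketch of the kind of definition that must pass the proof assistant's type checker before it is accepted into the model. A “no self‑approval” rule might be formalized in Agda roughly as follows; the module name, record fields, and rule name are illustrative assumptions, not actual CodexScribe output:

```agda
-- Illustrative sketch only; names are hypothetical, not tool output.
module ApprovalRules where

open import Data.String using (String)
open import Relation.Nullary using (¬_)
open import Relation.Binary.PropositionalEquality using (_≡_)

record Request : Set where
  field
    requester : String
    approver  : String

-- "No self‑approval": the approver must be a different person
-- from the requester.
NoSelfApproval : Request → Set
NoSelfApproval r = ¬ (Request.requester r ≡ Request.approver r)
```

Once a definition like this type‑checks, its meaning can be read back deterministically to the expert for confirmation, closing the loop between stated intent and checked specification.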
Representative application contexts
- Identity and access rules: dual control, no self‑approval, segregation of duties.
- Sensitive‑data handling: which data may be used, by which roles, under which conditions.
- Cross‑domain information movement: precisely stated conditions for transfer and use.
- Mission procedures: “always/never” events captured unambiguously and kept auditable.
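As a second hedged illustration, a cross‑domain transfer condition might be captured as a checkable predicate: the destination domain must be cleared at least to the source's classification level, and a human review must have occurred. All names below (levels, ranking, record fields) are hypothetical placeholders:

```agda
-- Illustrative sketch only; levels and field names are hypothetical.
module TransferRules where

open import Data.Nat using (ℕ; _≤_)
open import Data.Bool using (Bool; T)
open import Data.Product using (_×_)

data Level : Set where
  unclassified restricted secret : Level

-- A hypothetical ordering on classification levels.
rank : Level → ℕ
rank unclassified = 0
rank restricted   = 1
rank secret       = 2

record Transfer : Set where
  field
    source      : Level
    destination : Level
    reviewed    : Bool

-- A transfer is permitted only when the destination is cleared at
-- least to the source level AND a human review has taken place.
Permitted : Transfer → Set
Permitted t =
  (rank (Transfer.source t) ≤ rank (Transfer.destination t))
    × T (Transfer.reviewed t)
```

Stating the rule as a type means any downstream claim that a particular transfer is permitted carries a proof obligation, rather than relying on an informally worded policy.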
Deployment
Available hosted by Formal Foundry or installed on customer infrastructure. Configurable logging and versioning preserve a verifiable history of changes and reviews.
Primary users and beneficiaries
- Government, law‑enforcement, and defence: formalized operational rules for AI‑assisted analysis, collaboration, and case management, with robust auditability.
- Critical‑infrastructure operators (energy, transport, health, water): safety and data‑use requirements captured as machine‑checkable specifications that persist across tools and teams.
- Systems integrators and primes: a single, authoritative source of programme rules that reduces ambiguity and rework.
Value proposition by beneficiary and observable impact
- Government, law‑enforcement, and defence programmes
Value: a single, unambiguous source of truth for mission rules that can be reviewed by non‑specialists and trusted by technical teams.
Impact: fewer policy disputes, accelerated sign‑off before operational deployment, and a repeatable evidence trail aligned with AI Act expectations and national accreditation practices.
- Critical‑infrastructure operators (energy, transport, health, water)
Value: durable specifications of safety and data‑use requirements that remain stable across personnel changes and tooling updates.
Impact: reduced operational risk from misinterpretation, smoother vendor hand‑offs, and faster audit preparation during incident reviews or regulatory assessments.
- Systems integrators and primes
Value: precise, machine‑checkable requirements that limit scope creep and enable predictable delivery.
Impact: fewer change orders driven by ambiguity, clearer acceptance criteria, and improved reuse of validated specifications across related projects.
- Auditors, assessors, and accreditation authorities
Value: transparent artefacts linking human intent to checked formal definitions, with traceable changes.
Impact: shorter evidence reviews, clearer non‑compliance findings when they occur, and greater confidence that specifications reflect what operational leaders intended.