At FormalFoundry, we present an innovative approach to improve AI safety using formal methods and proof assistants. In this whitepaper:
- We propose an architectural framework that verifies the correctness of generative AI outputs.
- Our empirical tests with advanced AI models substantiate the feasibility of this approach.
- We also chart a collaborative roadmap for a research program with the ultimate goal of industry-wide adoption, ready to address diverse real-world applications.
AI’s New Ally: Why Formal Methods are Key to Large-Scale Correctness
The Increasing Relevance of Formal Methods: A Quiet Evolution
The field of formal methods, traditionally considered a niche within computer science, has been quietly maturing and evolving in its corner. Yet, it holds profound potential for addressing many of the recent widespread fears and challenges arising from recent advances in AI.
The nexus between these two domains is the concept of ‘large-scale correctness.’ As AI models grow in complexity, the potential for errors or unintended consequences increases correspondingly. Verifying the correctness of these models’ outputs – ensuring that they function as intended without harmful side effects – is a daunting task that is no longer feasible through conventional methods. This is where the potential of formal methods and, in particular, proof assistants comes into play.
The Interplay of Demand and Supply in Formal Methods: Powered by AI
Proof assistants serve as the conduit between the theoretical world of formal methods and the pragmatic world of AI. They provide a structured and systematic way of verifying the correctness of systems. Although proof assistants have been traditionally used in verifying hardware systems and safety-critical software, their application in AI, and more specifically generative AI, is still an emerging frontier.
This need for large-scale correctness, fuels a two-fold impact on the demand and supply for formal methods. On the demand side, as AI systems become integral to various facets of our daily lives, the requirement for these systems to be ‘correct,’ safe, and reliable increases. On the supply side, AI systems themselves can contribute to formal methods, enabling them to handle larger, more complex systems. In fact, AI can empower proof assistants to become more automated and efficient, thereby broadening their applicability
Redefining AI Safety: Incorporating Proof Assistants into the Framework
Understanding Generative AI-based Systems: Basic Scheme
Our initiative primarily targets the enhancement of safety in generative AI-based systems. To simplify, let’s consider these systems under a basic scheme. We can presume that at the heart of a product driven by generative AI is a process where a user of this product requests a specific task or output to be produced by the system, which then delivers the output. There may be several intermediary layers that help structure the input and process the output in a domain-specific way. But for the purposes of our explanation, we will stick to this simplified model.
Employing Agda: A Layer of Correctness Verification
The crux of our idea is to ensure that whenever an output is generated, it is created according to strict, predefined rules. To achieve this, we rely on the key technology of proof assistants. In our approach, we employ Agda, which is a dependently typed language. A primary feature of this technology is its extreme expressiveness, signifying that it is one of the most expressive systems developed by humans and can encode almost any arbitrarily complex argument.
It can represent both the correctness of these arguments and also be used to state specifications or theorems, and express their proofs. However, to clarify, while it can check proofs and specifications, it doesn’t generate those proofs itself.
Addressing the Implementation Challenges: Knowledge Encoding, Expressing Correctness, and Proof Generation
Our proposal is to incorporate Agda as an additional layer into generative AI systems. Each time an output is requested from the generative AI, this layer will also demand a proof of its correctness. Implementing this involves addressing several issues:
- Domain-Specific Knowledge Encoding: The domain-specific knowledge necessary for the product must be encoded. Here, the expressiveness of Agda (the dependently typed language) comes to our aid.
- Expressing Correctness: For each input, we need to express the correctness of this specific input in the context of the formalization. We plan to provide a toolkit addressing this task, which can be employed differently depending on usecase.
- Generating Proof: Once the task is passed to the generative AI, we need to generate a proof that the produced output is indeed correct.
In addition to these, we also aim to make this process inspectable and explainable. An important objective will be to create a natural language representation of a theorem stating that the output is indeed correct. This step enhances the interpretability and accountability of our architecture.
Less Speculation, More Evidence: Verifying the Promise of Large Language Models in AI Safety
(more about our architecture in separate paper : Formal Foundry Architecture)
The Evolving Feasibility of Proposed Architecture
The architecture proposed in this paper, once seen as speculative, is increasingly feasible due to rapid advancements in the field of generative AI and large language models. These technological strides not only justify the need for an architecture of this kind but also supply the tools necessary for its realization.
Our work is grounded in a series of empirical tests using large language models, specifically, OpenAI’s GPT-3.5 Turbo and GPT-4.0 models. These models have demonstrated a promising performance trend on the tasks we assigned, signifying a positive trajectory for large language models’ capabilities.
Empirical Validation Through Structured Database of Agda Problems
A significant component of our work is the creation of a structured database populated with a diverse range of Agda problems. This resource serves a dual purpose: firstly, as a performance evaluation tool for large language models in generating accurate proofs across various tasks, and secondly, as a repository of problems which can be input into our system for the iterative production of Agda proofs by the large language model.
Through performance validation of large language models, we have reason to believe that there may already be business scenarios where the models’ proof-generating performance meets the requirements for practical application. This indication suggests the potential of integrating large language models into formal methods to achieve practical results, thus extending the scope of AI applications.
Easing the Creation of Formal Specifications: Expanding the Horizon for Formal Methods
Another significant area of our experimental work involved user assistance in generating formal specifications of output. This component aimed to simplify the process of creating formal specifications for users without formal methods experience, thereby increasing accessibility and promoting broader adoption of these techniques across diverse industries.
We view this capability as essential for wide-ranging application of formal methods. By facilitating formal specification generation for users lacking in-depth expertise in formal methods, we anticipate increased adoption and reduced barriers to entry. The outcome of this would be the extension of safer, more reliable AI applications across various sectors.
Formal Foundry: A Roadmap for Industry Collaboration
No Need for Breakthroughs: Capitalizing on Existing Innovations
To actualize the architecture proposed in this paper, we do not see the need for breakthrough innovations. The crucial pieces are already present, albeit scattered across diverse domains. Existing efforts have primarily focused on small-scale experiments, lacking a systemic approach toward constructing a reusable toolkit or infrastructure. Our objective is to consolidate these advancements, focusing on the implementation of proof assistants’ recent research, novel techniques for building real-world infrastructure and equipping generative AI developers with the necessary knowledge.
Beginning with Industry Collaboration: Necessity of Early Integration
This is why our program revolves around close industry collaboration from the outset. While one might argue that these techniques should be refined to the point of robustness before venturing into real-world applications, we believe that the unfamiliarity of formal methods to most developers necessitates early integration efforts. These initial attempts may be limited and not immediately fit for production-level systems, but they represent a crucial first step.
Challenges and Importance of Integrating Methods into Existing Products
Integration of these methods into existing products will be a lengthy process that requires deep involvement from product-specific development teams. The majority of the work will lie in formalizing relevant knowledge, a potentially challenging and costly task. Given this complexity, it becomes challenging to accurately scope such a product and predict its impact on the company. Nevertheless, ensuring the safety of output is of paramount importance.
In this context, the ability to undertake limited formalization efforts and deploy this safety layer in an experimental setting can be incredibly beneficial to a company. It could provide a strategic understanding of these techniques’ applicability scope to their products, potentially sparking new product ideas or expanding product applicability to new industries.
Strategic Partnerships and Real-world Applications: The Cornerstones of Our Projects
We are structuring our research and development around collaboration with a select number of strategic partners. For these partners, we are developing POCs (proofs of concept) to evaluate our architecture across their various workflows and industries. We believe it is imperative to test and structure solutions for these and more real-world problems, data, and business settings from the start. We aim to ensure that we are targeting industry-relevant problems and that our toolkit is well-equipped for deployment atop existing generative AI-based systems
For additional information on this paper, please contact: [email protected]