The SR 11-7 Blind Spot: What Banks Discover When AI Hits Model Risk Management
Banks are deploying AI in credit underwriting, fraud detection, compliance monitoring, and customer service, but the governing framework, SR 11-7, was written in 2011 for statistical models. The OCC, Fed, and FDIC have issued preliminary guidance indicating AI is in scope. This memo covers the validation gap for third-party LLM APIs, the model version change management problem, and the questions bank examiners are beginning to ask.
Background
SR 11-7, issued jointly by the Federal Reserve and the OCC in April 2011, established the model risk management framework that governs model inventories, validation requirements, and governance processes at U.S. bank holding companies and national banks. The guidance defined a “model” as a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates. It was written for the models that defined bank risk management at the time: credit scoring algorithms, value-at-risk calculations, stress-testing models, loan prepayment models, and asset/liability management systems. It was not written for generative AI.
Between 2021 and 2024, the OCC, Federal Reserve, and FDIC each issued preliminary guidance indicating that AI and machine learning models are subject to SR 11-7 requirements. None of that guidance specified how banks should apply SR 11-7's validation requirements — which assume the ability to inspect a model, run it independently, and test it against challenger models — to large language models that are delivered as third-party APIs and whose internal architecture is not disclosed to customers. The regulators acknowledged the gap. They did not resolve it.
In the meantime, banks deployed. JPMorgan has disclosed deployments of AI in contract analysis (the COiN system that reviews legal documents), equity trading (LOXM for order execution), and customer service (Chase's virtual assistant). Wells Fargo operates Fargo, its AI-powered banking assistant. Bank of America's Erica has handled over two billion client interactions. Regional and community banks are adopting third-party AI tools for credit decisioning, BSA/AML transaction monitoring, customer service automation, and compliance workflow management — many of these products built on top of OpenAI, Anthropic, or Google Vertex AI APIs.
The governance gap is structural. SR 11-7 validation methodology requires model developers and validators to understand the model's theoretical basis, test it against real-world outcomes, run it independently, and document its limitations. For a logistic regression credit scoring model, this is straightforward: you have the coefficients, the training data lineage, the performance metrics. For a third-party LLM API, the “model” is a black box delivered over HTTPS. The bank does not have access to model weights, training data, or architecture. When GPT-4 is replaced by GPT-4o, the behavior changes materially — and the bank is not in the change management loop. Every bank using a third-party AI tool in a consequential workflow has, at minimum, a model change management gap that is visible to an examiner who knows what to look for.
Examiners are beginning to look. In 2024 and 2025, OCC examination teams began incorporating AI governance questions into standard safety and soundness exams, initially as informational inquiries and increasingly as tests of whether institutions have applied SR 11-7 rigor to their AI deployments. The pattern that has historically preceded formal Matters Requiring Attention (MRA) findings is present: regulators have issued the guidance, banks have deployed without complete governance frameworks, and examination teams are now testing whether those frameworks exist. The banks that are building their AI model inventory and validation protocol in advance of examination are solving a smaller problem. The banks that are not are building a larger one.
Decision Required
Is your AI tool a “model” under SR 11-7 — and if it is, can you actually validate it to the standard the guidance requires?
Most bank legal and compliance teams that have reviewed the question have concluded that AI tools used in credit decisioning, fraud detection, customer communication, and compliance monitoring meet the SR 11-7 model definition. That conclusion creates downstream obligations: model inventory entry, validation plan, documented performance standards, ongoing monitoring, and a governance process for model changes. For a statistical model, those obligations are operationally achievable. For a third-party LLM API, the standard validation protocol has no clear analog for several of its core requirements.
The model change management problem is the acute risk. SR 11-7 requires institutions to assess the impact of model changes before they affect consequential outputs. Third-party AI providers update models continuously: sometimes with advance notice, sometimes without, and sometimes through a version change that is invisible to API callers but materially different in behavior. When OpenAI repointed the gpt-3.5-turbo alias to the gpt-3.5-turbo-0125 snapshot, or when Google updated its Gemini Pro endpoint, banks that had deployed those models in consequential workflows had a model change event under SR 11-7 that most governance processes were not designed to detect, assess, or document.
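One way to make a silent provider update detectable is to compare the model version the bank validated against the version the provider actually served. The sketch below is illustrative only. It assumes the provider returns a resolved model identifier with each response (many APIs do, but confirm this for your vendor), and the names `ModelChangeEvent`, `detect_model_change`, and the inventory ID format are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ModelChangeEvent:
    """Record of a mismatch between the validated and the served model version."""
    inventory_id: str        # hypothetical model inventory key
    approved_version: str    # version covered by the last validation
    observed_version: str    # version reported by the provider at call time
    detected_at: datetime


def detect_model_change(
    inventory_id: str,
    approved_version: str,
    observed_version: str,
    now: Optional[datetime] = None,
) -> Optional[ModelChangeEvent]:
    """Return a change event if the served version differs from the validated one."""
    if observed_version == approved_version:
        return None
    return ModelChangeEvent(
        inventory_id=inventory_id,
        approved_version=approved_version,
        observed_version=observed_version,
        detected_at=now or datetime.now(timezone.utc),
    )
```

A call such as `detect_model_change("MRM-0042", "gpt-3.5-turbo-0613", "gpt-3.5-turbo-0125")` returns an event that can be routed to the model owner, while a matching pair returns `None`. The check only catches changes that alter the reported identifier. Behavior changes behind an unchanged identifier still need output-level testing.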
Options
Take the position that AI tools used in advisory, productivity, or workflow-assistance roles do not meet the SR 11-7 model definition because they do not generate quantitative outputs used as primary inputs to consequential decisions — a human reviews and decides. This argument is viable for narrowly scoped use cases: internal document summarization, meeting notes, code assist tools. It is increasingly difficult to sustain for AI tools that screen credit applications, flag suspicious transactions, generate customer-facing communications, or produce compliance determinations that are routinely accepted without review. Examiner risk is high if the tool is in scope and the governance record does not exist.
Add all AI tools to the model inventory. Apply the SR 11-7 model definition test to each one; document the reasoning for inclusions and exclusions. For third-party API models, build a validation protocol that addresses what is testable: output performance testing across representative scenarios, adversarial testing for failure modes, demographic parity analysis for credit and customer-facing tools, and consistency testing across model versions. Document explicitly what cannot be validated: model internals, training data lineage, architecture. Establish a model change monitoring process: subscribe to provider release notes, designate a model owner, define the threshold for formal re-validation. This is the defensible posture under current regulatory guidance — it does not require resolving the LLM validation gap, it requires documenting it honestly.
Limit AI tools to internal productivity use cases — document summarization, code assistance, meeting notes — that do not touch credit decisioning, fraud detection, compliance monitoring, or customer communications that affect account status. This posture eliminates SR 11-7 model inventory risk by keeping AI outside the model definition boundary. It also forfeits the business value in the areas where AI has demonstrated the most measurable ROI in financial services: transaction monitoring efficiency, credit application processing time, and customer service deflection rates. It is the right posture for institutions that assess their regulatory risk tolerance as low and their operational AI capability as immature. It is not a sustainable long-term position as competitive pressure and vendor integration push AI deeper into bank workflows regardless of governance posture.
Require that any AI model entering the model inventory be deployed on bank-controlled infrastructure (on-premises or private cloud) with full access to model weights, training data, and architecture for validation purposes. This eliminates third-party API black-box risk and gives the model validation team the access SR 11-7 validation methodology assumes. It is operationally realistic for JPMorgan, Bank of America, and Goldman Sachs, which employ hundreds of ML engineers. It is not realistic for the $5B community bank or the regional bank without a dedicated ML platform team. The tooling and talent required to train, deploy, and maintain production-grade LLMs internally exceed the capacity of most institutions below the top 20 by assets.
Recommendation
Build the model inventory before the examination — not during it. The defensible posture is not a perfect validation protocol for LLMs, which does not currently exist. The defensible posture is documented governance: every AI tool in scope identified, the model definition test applied and recorded, validation conducted to the extent possible with limitations documented, and a change management process that creates a record of how model updates are detected and assessed.
Start with an AI tool audit that goes beyond your IT asset management system. Business units and departments are deploying AI tools through SaaS subscriptions and browser-based access that do not appear in formal technology acquisition records. The tools your model risk committee does not know about are the ones that appear in examiner findings. Require business unit heads to disclose all AI tools in current use, including trial deployments and pilot programs. The scope of your model inventory cannot be accurate if the scope of your AI deployment is unknown.
Apply the SR 11-7 model definition test to each tool with written documentation of the reasoning. The test has two parts: does the tool produce quantitative output, and is that output used as a primary basis for consequential decisions? For tools that clearly fail both parts — document summarization for internal use, meeting transcription — the exclusion is defensible. For tools where the answer is ambiguous — an AI that generates credit denial explanations, an AI that produces BSA/AML case scores — err toward inclusion and document why. An examiner who finds an undocumented model in a consequential workflow is making a finding. An examiner who finds a documented exclusion with documented reasoning has a starting point for a conversation.
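The two-part test and its documentation requirement can be captured in a small record type. This is a minimal sketch, not a regulatory standard. The names `Answer`, `ScopeDetermination`, and `apply_definition_test` are hypothetical, and the rule that anything short of a clear "no" on both parts is treated as in scope is a conservative reading of "err toward inclusion", labeled here as an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    AMBIGUOUS = "ambiguous"


@dataclass(frozen=True)
class ScopeDetermination:
    tool_name: str
    in_scope: bool
    rationale: str  # the written reasoning, not just the conclusion


def apply_definition_test(
    tool_name: str,
    quantitative_output: Answer,
    primary_basis: Answer,
    reasoning: str,
) -> ScopeDetermination:
    """Apply the two-part test and refuse to record a result without a rationale."""
    if not reasoning.strip():
        raise ValueError("a written rationale is required for every determination")
    # Assumption: exclude only when the tool clearly fails BOTH parts;
    # every other combination is included for model risk review.
    excluded = quantitative_output is Answer.NO and primary_basis is Answer.NO
    return ScopeDetermination(tool_name, not excluded, reasoning.strip())
```

Under this sketch, a meeting transcription tool answered `NO` on both parts is recorded as excluded, while an AML case-scoring tool answered `YES` and `AMBIGUOUS` is recorded as in scope. In both cases the rationale travels with the determination.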
For third-party API models in your inventory, build the validation protocol around what is achievable: structured output testing using representative transaction samples, adversarial scenario testing for failure modes, and where the tool affects credit or customer decisions, a demographic parity analysis. Document the validation gap explicitly in the model file — what SR 11-7 validation would require, what the third-party API structure prevents, and what compensating controls are in place. Regulators are not currently expecting banks to have solved the LLM validation problem. They are expecting banks to have acknowledged it and built governance around what is possible.
Brief your primary regulator before they ask. Every OCC district office and Federal Reserve regional bank has indicated that it prefers proactive engagement from institutions on AI governance to discovering undisclosed model risk in examination. If your institution has consequential AI deployments, a pre-examination briefing that describes your model inventory, your validation methodology, and your known limitations produces a better outcome than an examination finding. It is also an opportunity to get informal guidance on whether your approach meets examiner expectations before it is formally tested.
Risks
An AI tool used in credit decisioning, fraud detection, compliance monitoring, or customer communication that is not in the model inventory is an SR 11-7 finding. The finding does not depend on whether the tool performed correctly. It depends on whether your governance process covered it. Banks that deployed third-party AI tools through business unit procurement — outside the formal model risk governance process — have created inventory gaps that are structurally identical to the gaps that generated MRA findings for statistical models a decade ago. The pattern is not new. The tools are.
Third-party AI providers update models without prior notice to customers, or with notices buried in developer changelog emails that never reach the model risk management team. GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o mini: each transition changed model behavior materially. If a bank was running GPT-4 in a credit explanation workflow and the provider moved it to GPT-4o without triggering the bank's model change management process, the bank's model inventory record is missing a model change event. Multiply that across all third-party AI tools and the cumulative change management gap is significant. Most bank governance processes were designed to catch internal model retrain events, not API provider updates.
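A version transition can be assessed by replaying a fixed set of representative inputs through both versions and measuring how often the outputs agree. The sketch below assumes the workflow produces categorical outputs (for example "escalate" or "clear") that can be compared by exact match. Free-text outputs would need a different comparison. The function names and the 0.95 threshold are illustrative placeholders, not values drawn from any guidance.

```python
from typing import Callable, Sequence


def agreement_rate(
    samples: Sequence[str],
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> float:
    """Fraction of samples on which the two model versions return the same label."""
    if not samples:
        raise ValueError("at least one sample is required")
    matches = sum(1 for s in samples if baseline(s) == candidate(s))
    return matches / len(samples)


def needs_revalidation(rate: float, threshold: float = 0.95) -> bool:
    """Flag the change for formal re-validation when agreement falls below the threshold."""
    return rate < threshold
```

Here `baseline` and `candidate` stand in for wrappers around the previously validated version and the new version. The institution sets the threshold and documents why it chose that value, which gives the change management record a defined trigger for re-validation.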
AI tools used in credit underwriting, pricing, marketing, or customer communication decisions carry ECOA, Fair Housing Act, and UDAP exposure. The use of AI in these workflows requires a disparate impact analysis — testing whether the AI's outputs produce differential outcomes across protected class proxies. Most banks that deployed AI credit or marketing tools in 2023 and 2024 have not completed this analysis for the specific model versions currently running in production. The analysis must be rerun when the model changes. The compliance gap compounds with each untracked model version update.
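A first-pass disparate impact screen compares approval rates across groups and expresses each as a ratio to a reference group. The sketch below is a screening aid under stated assumptions, not a complete fair lending analysis. Group labels are assumed to come from the bank's existing proxy methodology, the function names are hypothetical, and any cutoff applied to the ratios (0.8 is a common convention borrowed from employment-selection practice) is a choice for compliance counsel rather than a lending rule.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def approval_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the approval rate per group from (group, approved) pairs."""
    totals, approvals = Counter(), Counter()
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {g: approvals[g] / totals[g] for g in totals}


def adverse_impact_ratios(
    rates: Dict[str, float], reference_group: str
) -> Dict[str, float]:
    """Divide each group's approval rate by the reference group's rate."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group has a zero approval rate")
    return {g: r / ref for g, r in rates.items()}
```

Because the analysis must be rerun when the model changes, it helps to store the resulting ratios alongside the model version identifier that produced the decisions.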
Banks that have deployed multiple AI workflows on a single provider — OpenAI, Microsoft Azure AI, Google Vertex AI — have created a single vendor dependency that spans multiple model inventory entries simultaneously. A provider pricing change affects multiple model inventory items at once. A provider outage affects multiple critical workflows. A provider deprecating an API endpoint requires coordinated model re-validation across multiple business lines. The model inventory risk is not just the risk of any single model — it is the portfolio risk of how much of your model inventory is exposed to a single third-party dependency.
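The portfolio view described above can be approximated by computing each provider's share of model inventory entries. This is a minimal sketch that assumes the inventory can be exported as (inventory ID, provider) pairs. The function name and the placeholder provider labels are hypothetical, and a simple count ignores how critical each workflow is.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def provider_concentration(entries: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return each provider's share of inventory entries, given (inventory_id, provider) pairs."""
    counts = Counter(provider for _, provider in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {provider: n / total for provider, n in counts.items()}
```

A result such as `{"provider_a": 0.75, "provider_b": 0.25}` tells the model risk committee how much of the inventory a single deprecation notice or outage would touch at once.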
Model validation teams built and staffed for statistical model validation (the core of bank MRM for the past 15 years) do not have the training, tools, or methodological framework to validate large language models. The gap is not a failure of the teams; it is a structural mismatch between the skills SR 11-7 has historically required and the skills the current AI landscape requires. Banks that use their existing validation teams to validate LLMs using statistical model validation protocols will produce validation reports that satisfy the procedural requirement but do not address the failure modes specific to LLMs: hallucination, prompt injection, context sensitivity, and behavior drift across model versions. A finding of inadequate validation methodology remains a risk even when validation is technically performed.
Questions Your Team Should Be Answering
These are the questions that distinguish organizations that get this right from those that do not. If your team cannot answer them, that is your first deliverable.
1. Do you have a complete inventory of every AI tool currently deployed at your institution, including tools procured through business unit budgets, SaaS subscriptions, or pilot agreements that did not go through the formal model risk governance process?
2. Has your legal or compliance team applied the SR 11-7 model definition test to each AI tool in use? For each tool: which have been determined to be in scope, which have been explicitly excluded, and is the reasoning documented, not just the conclusion?
3. For third-party LLM APIs in your model inventory: what validation has been completed, what limitations are documented in the model file, who is the designated model owner, and what does ongoing performance monitoring look like in practice?
4. What is your current process for detecting and assessing model version changes from third-party AI providers? Who receives provider update notifications, who conducts the impact assessment when a model version changes, and what triggers formal re-validation?
5. Has a fair lending analysis (disparate impact testing on protected class proxies) been completed for every AI tool currently used in credit decisioning, pricing, or customer communication affecting account status? If not, which tools are unanalyzed and what is the timeline for completing the analysis?
6. When did your primary regulator last receive a briefing on your institution's AI deployment posture and model risk governance approach? If your AI inventory has grown materially since the last briefing, what is the plan to proactively update your supervisor before examination?