The AI Clinical Note Your Physician Didn't Write — and Signed Anyway
Ambient AI clinical documentation tools — Microsoft Dragon Copilot, Abridge, Ambience Healthcare, Suki — are now in production at health systems ranging from Cleveland Clinic to community hospitals. The physician receives a draft note, reviews it, and signs it. When the note contains a hallucinated medication, an omitted finding, or an incorrect procedure code, the liability does not transfer to the AI vendor. It stays with the signing physician and the health system. Most enterprise deployments have not updated their liability framework, consent workflow, or specialty performance requirements to reflect what they have actually deployed.
Background
Ambient AI clinical documentation works like this: a physician activates the tool at the start of a patient encounter. The system listens to the conversation. When the encounter ends, it produces a structured clinical note — the SOAP note, the HPI, the assessment and plan — ready for the physician to review and sign into the electronic health record. The note is drafted by an AI. The physician reads it, edits if necessary, and signs it. The signed note becomes the legal medical record.
The market has consolidated around four enterprise platforms. Microsoft Dragon Copilot — the rebrand of Nuance DAX Copilot, merged with Dragon Medical One in March 2025 — is the largest deployment, used by more than 40,000 physicians and backed by deep Epic EHR integration. Abridge, backed by UPMC and integrated directly into Epic’s native workflow, targets health systems that want an Epic-native path. Ambience Healthcare was selected by Cleveland Clinic in February 2025 after the health system piloted ambient AI tools across more than 80 specialties and subspecialties — the most rigorous enterprise evaluation published to date. Suki, athenaAmbient, and a long tail of point solutions round out the market.
The clinical evidence is genuinely positive on some dimensions. A JAMA study across five academic medical centers — using Ambience, Nuance DAX Copilot, and Abridge with Epic — found that ambient AI reduced EHR time by 13.4 minutes per session and documentation time by 16.0 minutes, and was associated with 0.49 additional patient visits per week. A separate study found physician burnout rates dropped from 51.9% to 38.8% after 30 days of use. These are real, measurable improvements. They are also the numbers in every vendor RFP response.
The clinical evidence is more complicated on productivity. A randomized controlled trial at Atrium Health involving 112 clinicians, published in NEJM AI, concluded that “widespread implementation of DAX in its current form is unlikely to generate appreciable gains for healthcare systems looking to increase productivity.” The gap between the burnout study and the productivity study is the gap between what physicians report feeling and what the visit volume data actually shows. Both are real. They measure different things. Most health system CFOs who approved ambient AI investment built their business case on the productivity story. The clinical validation literature does not consistently support it.
Specialty performance varies in ways that enterprise-wide deployments have not adequately addressed. Ambient AI performs best in structured, high-frequency encounter types: primary care, outpatient internal medicine, straightforward surgical follow-up. It performs least reliably in psychiatry, behavioral health, complex multimorbidity patients, and highly specialized subspecialties where terminology and documentation requirements diverge from the training distribution. Cleveland Clinic’s 80-specialty pilot was explicitly designed to identify these gaps before deployment. Health systems that skip that process are deploying based on primary care benchmarks across specialties that will produce materially different results.
Decision Required
When an ambient AI system drafts a clinical note that contains an error — a hallucinated medication, an omitted diagnosis, an incorrect procedure code — who is liable, and does your current framework reflect what you have actually deployed?
The legal answer is clear: the signing physician is responsible for the accuracy of the note. The AI vendor is not liable for documentation errors. The HIPAA Business Associate Agreement your health system executed with the vendor protects data handling — it does not allocate clinical liability. This has always been true for human scribes, and it is true for ambient AI scribes. The physician reviews the draft, edits it, and signs it. Whatever is in the signed note is the physician’s attestation.
The operational question is whether your physicians are actually reviewing the AI draft at the depth the liability framework assumes. The wellbeing literature shows physicians feel less burdened by documentation. The implicit risk in that finding is automation bias: the draft arrives complete and structurally correct, reducing the cognitive pressure to read it carefully. A physician who spends eight seconds reviewing a three-page AI-generated note before signing has not reviewed it. They have counter-signed it. That gap — between the review the liability framework assumes and the review that time-pressured physicians in a high-volume practice actually perform — is the governance question most health systems have not answered.
Options
Roll out the ambient AI platform to all physicians using the vendor’s standard training and onboarding workflow. Treat note review as a physician professional responsibility, covered by existing attestation policies. This is the fastest path to scale and the most common current approach. It does not address automation bias, does not include specialty-specific accuracy baselines, and does not update the liability and consent framework to reflect the new documentation workflow. It defers the governance question until the first adverse event raises it.
Pilot the platform in two or three high-volume outpatient specialties before enterprise expansion. Measure note accuracy against the department’s own documentation standards rather than the vendor’s published benchmarks. Use the pilot data to set minimum accuracy thresholds for each specialty before activation. This path is slower than enterprise-wide deployment, but it produces specialty-specific performance data that an enterprise-wide rollout cannot generate. It is the approach Cleveland Clinic took before selecting Ambience.
Combine deployment with three governance deliverables: (1) an updated patient consent workflow that explicitly addresses AI-assisted documentation and audio recording; (2) defined minimum review time and edit rate standards that constitute adequate review for liability purposes; (3) an updated malpractice carrier notification disclosing ambient AI use. This carries more governance overhead than most health systems have applied to this deployment category. It is the correct answer for regulated settings and high-risk specialty environments, and the correct target state for any enterprise deployment over 500 physicians.
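A minimum review time standard, the second deliverable above, could be made concrete by comparing the time a physician spent on a draft with the shortest time in which it could plausibly be read. The sketch below is illustrative only: the `ReviewEvent` fields, the 400 words-per-minute ceiling, and the 20-second floor are assumptions, not values from any vendor, carrier, or EHR audit log.

```python
from dataclasses import dataclass

# Hypothetical policy values. Real thresholds would come out of the
# malpractice carrier conversation and legal review, not from this sketch.
MAX_READING_WPM = 400        # assumed ceiling on careful reading speed
MIN_REVIEW_SECONDS = 20.0    # assumed floor, regardless of note length


@dataclass
class ReviewEvent:
    note_id: str
    word_count: int          # words in the AI-generated draft
    review_seconds: float    # time from opening the draft to signing it


def minimum_review_seconds(word_count: int) -> float:
    """Shortest plausible time to read the draft at the assumed top speed."""
    return max(MIN_REVIEW_SECONDS, word_count / MAX_READING_WPM * 60.0)


def is_under_reviewed(event: ReviewEvent) -> bool:
    """True when the note was signed faster than it could have been read."""
    return event.review_seconds < minimum_review_seconds(event.word_count)
```

Under these assumed values, a 1,200-word draft signed after eight seconds would be flagged, because reading it would take at least 180 seconds. A flag of this kind identifies notes worth auditing; it does not by itself show that a note contains an error.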
Recommendation
Run the specialty pilot before enterprise commitment. The productivity evidence is mixed. The wellbeing evidence is consistent. The business case that survives board scrutiny is the wellbeing and retention case — physician burnout is a material financial and operational risk at every health system. That case does not require productivity numbers. Make it honestly: this investment is about retaining physicians and reducing the documentation burden that contributes to burnout. Do not build the CFO approval on a productivity number the peer-reviewed literature will not support.
Before any enterprise deployment, complete three governance deliverables that most health systems have not touched. First, update the patient consent workflow. Audio recording of a patient encounter is not implicitly covered by a standard HIPAA notice. The BAA with the AI vendor covers data handling between the health system and vendor. Patient consent for ambient recording is a separate obligation that varies by state — and in some states, recording without explicit consent creates independent liability exposure. This is a legal review task, not a technology task. It should be completed before the first patient encounter.
Second, define what “adequate review” means in your malpractice context. Talk to your carrier. Disclose that ambient AI is drafting notes. Ask whether your current attestation policy — the physician reviews and signs — is sufficient, or whether the carrier expects a documented minimum review process. Some carriers are beginning to ask for this. Health systems that have the conversation now, before an adverse event, are in a materially different position than those that raise it after.
Third, measure accuracy by specialty before signing enterprise contracts. Ask every vendor shortlisted in your RFP for accuracy data broken out by each specialty you intend to deploy in, not aggregate accuracy scores. Psychiatry and behavioral health are the highest-risk categories. If the vendor cannot produce specialty-specific accuracy data for your use cases, your pilot needs to generate it. Deploy based on your data.
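The per-specialty gate described above can be sketched as a small tally over pilot audit results. Everything in this example is an assumption for illustration: the audit format (one pass or fail judgment per reviewed note, made against the department's own documentation standards), the `min_sample` cutoff of 30 notes, and the decision labels.

```python
from collections import defaultdict


def summarize(audits):
    """audits: iterable of (specialty, note_is_accurate) pairs.

    Returns {specialty: (accuracy, notes_audited)}.
    """
    totals, accurate = defaultdict(int), defaultdict(int)
    for specialty, ok in audits:
        totals[specialty] += 1
        accurate[specialty] += int(ok)
    return {s: (accurate[s] / totals[s], totals[s]) for s in totals}


def activation_decisions(audits, thresholds, min_sample=30):
    """Compare each specialty's audited accuracy with its own threshold."""
    decisions = {}
    for specialty, (accuracy, n) in summarize(audits).items():
        if n < min_sample:
            decisions[specialty] = "insufficient data"   # pilot too small
        elif specialty not in thresholds:
            decisions[specialty] = "no threshold set"
        elif accuracy >= thresholds[specialty]:
            decisions[specialty] = "activate"
        else:
            decisions[specialty] = "hold"
    return decisions
```

The design choice worth noting is that a specialty with too few audited notes returns "insufficient data" rather than inheriting another specialty's result, which mirrors the point that primary care performance does not transfer.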
Risks
A structurally complete AI-generated draft reduces the cognitive signal that prompts careful review. Physicians who reviewed handwritten or dictated notes for obvious errors now receive a polished, well-formatted document. The wellbeing improvement is real — but it partially reflects reduced cognitive load on documentation, including the cognitive load of careful error detection. Health systems that have measured edit rates post-deployment find them declining over time, not rising with familiarity. A declining edit rate is not evidence the AI is improving. It may be evidence that review depth is decreasing. This is the primary liability accumulation mechanism in ambient AI deployments.
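Tracking the edit rate over time requires a definition of it. A minimal sketch, assuming the health system can export each AI draft alongside the signed note: here the edit rate is defined as the word-level dissimilarity between the two, averaged by month, with a least-squares slope as the trend. The input format and this definition are assumptions; vendors and EHRs may define edit rate differently.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean


def edit_rate(draft: str, signed: str) -> float:
    """Word-level share of the note that changed (0.0 = signed unchanged)."""
    return 1.0 - SequenceMatcher(None, draft.split(), signed.split()).ratio()


def monthly_edit_rates(notes):
    """notes: iterable of (month as 'YYYY-MM', draft_text, signed_text).

    Returns a list of (month, mean edit rate), sorted by month.
    """
    by_month = defaultdict(list)
    for month, draft, signed in notes:
        by_month[month].append(edit_rate(draft, signed))
    return sorted((m, mean(rates)) for m, rates in by_month.items())


def trend_slope(series):
    """Least-squares slope of edit rate per month; negative means falling."""
    ys = [rate for _, rate in series]
    if len(ys) < 2:
        raise ValueError("need at least two months to estimate a trend")
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
```

A negative slope from `trend_slope(monthly_edit_rates(notes))` is a prompt to audit a sample of recent notes. On its own it cannot distinguish an improving model from shallower review.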
Most health systems have addressed HIPAA by executing a BAA with their ambient AI vendor. Patient consent for audio recording is a separate legal question. Illinois, California, and several other states have specific consent requirements for audio recording that standard HIPAA notice does not satisfy. Health systems operating across multiple states face a patchwork. A patient who discovers that their encounter was audio-recorded under a privacy notice that did not explicitly disclose it has a complaint that is independent of the HIPAA framework. Legal review of consent workflow by state is required before deployment, not after the first complaint.
Published accuracy benchmarks come primarily from primary care and general outpatient settings — the use cases with the highest training data density and the most structured encounter formats. Psychiatry, behavioral health, oncology subspecialties, and complex multimorbidity cases produce materially lower accuracy. Health systems that deploy enterprise-wide based on primary care pilots are applying a performance assumption that does not transfer. Cleveland Clinic’s 80-specialty pilot is the exception. Most enterprise deployments lack equivalent specialty coverage data, meaning the accuracy floor in their highest-risk specialty environments is unknown.
The NEJM AI trial at Atrium Health, the largest randomized controlled trial of ambient AI clinical documentation published to date (112 clinicians), found that widespread implementation was “unlikely to generate appreciable gains for healthcare systems looking to increase productivity.” This is not a marginal finding. It is a direct contradiction of the productivity-framed business cases in most enterprise RFP approvals. Health systems that have committed to ambient AI on a productivity basis face a board-level credibility gap when the NEJM AI result is raised at renewal. Build the case on wellbeing and retention — where the evidence is consistent.
Questions Your Team Should Be Answering
These are the questions that distinguish organizations that get this right from those that do not. If your team cannot answer them, that is your first deliverable.
1. Has your health system updated its patient consent workflow to explicitly address ambient AI audio recording, state by state, before deploying? If not, what is the legal exposure in each state where you operate?
2. Has your malpractice carrier been notified that ambient AI is drafting clinical notes? Have you received guidance on what constitutes adequate physician review for liability purposes under your current policy?
3. What is the edit rate on AI-drafted notes at your health system, and has it been trending up or down over the past six months? A declining edit rate is a governance signal, not a quality signal.
4. For which specialties have you measured accuracy against your own documentation standards rather than the vendor's published benchmarks? Psychiatry, behavioral health, and high-complexity subspecialties require specialty-specific accuracy data before enterprise activation.
5. Which vendor did you select, and did your RFP include a head-to-head specialty accuracy evaluation using your own patient encounter types, or did you rely on vendor-provided case studies and aggregate accuracy scores?
6. What is your incident response process when a physician reports that an AI-drafted note contained a clinical error? Who receives the report, how is it investigated, and does your current adverse event reporting framework classify AI documentation errors as a reportable category?