When ChatGPT Becomes Your Doctor: The Regulatory Blind Spot Harming the Most Vulnerable

Main takeaway: People use general‑purpose AI to self‑diagnose. Current rules don’t clearly say who is responsible when that goes wrong. We need proportionate accountability and simple safety guardrails.

1) Regulatory reality — in one minute

At 2 a.m., two people upload the same mole photo to an AI.
Near a hospital: the AI is a convenience. In a primary‑care desert: the AI becomes the decision.

Today’s law:

  • Medical‑device rules turn on a product’s intended use; general‑purpose chatbots that make no medical claims generally fall outside them.
  • Malpractice doctrine requires a clinician–patient relationship, which these chats lack.

The gap: People still use general AIs for medical decisions, but the product isn’t classified as a medical device. That leaves responsibility unclear when harm happens.

2) Why this matters

Clinical accuracy is uneven

  • Med‑PaLM 2: up to 86.5% on USMLE‑style multiple choice; open‑ended clinical prompts score lower (often around 65%, depending on task and scoring). [Nature 2023]
  • Symptom‑checker apps: ~57% appropriate triage (95% CI 52–61%). [BMJ Open 2019]

Access is unequal

  • 86.9 million Americans live in primary‑care shortage areas (HPSAs). [HRSA Q4 FY 2025]
  • For many, AI advice becomes de facto medical counsel.

A fatal scenario (illustrative)

A 58‑year‑old with chest tightness reads “could be anxiety or muscle strain,” waits, and dies of a heart attack. No doctor–patient relationship. No app marketed as medical. No obvious liable “manufacturer.” [1]

3) The liability landscape (plain English)

  • Malpractice? Usually no—there’s no clinician involved.
  • Product liability? Possibly—design defect or failure to warn (foreseeable misuse). [Restatement (Second) § 402A; Restatement (Third) § 2]
  • Reality check: Success is uncertain for multi‑use tools that don’t claim a medical purpose. As of Oct 17, 2025, no U.S. court has issued a final decision on injury from a general‑purpose AI used without health claims; several cases are pending.

4) Existing levers (regulation and “soft law”)

U.S.

  • Cures Act § 3060 / FDCA § 520(o): Some software sits outside device rules; FDA also uses enforcement discretion. [FDA Digital Health]
  • Q‑Submission (Pre‑Sub): Non‑binding FDA feedback on red‑flag routing and uncertainty displays. [FDA Q‑Submission]
  • Multiple‑Function Device Policy (2020): For apps mixing medical and non‑medical features. [FDA Guidance 2020]

EU AI Act (dates to know)

  • Feb 2, 2025: Prohibitions & AI‑literacy measures start.
  • Aug 2, 2025: General‑purpose AI (GPAI) provider duties begin.
  • Aug 2, 2026: Most obligations apply.
  • Aug 2, 2027: High‑risk obligations for AI embedded in regulated products phase in.
  • Tools: Art. 95 (voluntary codes of conduct) and Art. 57 (regulatory sandboxes). [Art. 95; Art. 57; Art. 113]

Other jurisdictions (very short)

  • Japan (PMDA): Treats qualifying AI as SaMD; sandbox “DASH” and change‑management pathways. [PMDA DASH]
  • China (NMPA): AI/algorithms regulated under existing medical‑device law. [NMPA]

5) A simple “Tier 2” middle ground

When usage shows meaningful health risk, attach proportionate duties—even if the AI isn’t a regulated device.

When Tier 2 turns on
Trigger | Threshold | How to measure
Health‑query volume | ≥ 1,000,000 health queries/month per jurisdiction | Internal, privacy‑safe metrics
Red‑flag prevalence | ≥ 5% of health queries include sentinel clusters (e.g., chest pain + shortness of breath) | Audited intent‑classifier logs
Escalation friction | ≥ 30% drop‑off after a red‑flag prompt | UX funnel analytics
Harm signal | Any serious adverse event (SAE) linked to advice and failed escalation | Complaint/SAE log + root‑cause analysis
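
Once those metrics exist, the check itself is mechanical. A minimal sketch, assuming a hypothetical MonthlyMetrics record and the thresholds from the table above (the field names and reporting granularity are illustrative, not a standard schema):

```python
# Hypothetical Tier 2 trigger check mirroring the table above.
# MonthlyMetrics and its field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MonthlyMetrics:
    health_queries: int        # health-intent queries this month, per jurisdiction
    red_flag_share: float      # fraction of health queries hitting sentinel clusters
    escalation_dropoff: float  # fraction who abandon after a red-flag prompt
    linked_sae_count: int      # SAEs tied to advice plus failed escalation

def tier2_applies(m: MonthlyMetrics) -> bool:
    """True if any single trigger from the table is met."""
    return (
        m.health_queries >= 1_000_000
        or m.red_flag_share >= 0.05
        or m.escalation_dropoff >= 0.30
        or m.linked_sae_count >= 1
    )

# Example: high red-flag prevalence alone switches Tier 2 duties on.
print(tier2_applies(MonthlyMetrics(400_000, 0.07, 0.10, 0)))  # True
```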

What Tier 2 requires (guardrails)

  • Detect medical intent (with cautious thresholds).
  • Communicate uncertainty (no faux certainty; link to evidence).
  • Route red flags (one‑click to nurse lines/telehealth; a minimal detect‑and‑route sketch follows this list).
  • Track equity (HPSA escalation parity, ≤ 8th‑grade reading level, multilingual support).
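
A minimal sketch of the detect‑and‑route path above, using a toy keyword score in place of a real, audited intent classifier; the sentinel clusters, the 0.5 threshold, and the message copy are illustrative assumptions:

```python
# Toy red-flag detection and routing; not a production classifier.
SENTINEL_CLUSTERS = [
    {"chest pain", "shortness of breath"},
    {"face drooping", "slurred speech", "arm weakness"},
]

def red_flag_score(message: str) -> float:
    """Highest fraction of any sentinel cluster's terms found in the message."""
    text = message.lower()
    return max(
        sum(term in text for term in cluster) / len(cluster)
        for cluster in SENTINEL_CLUSTERS
    )

def respond(message: str) -> str:
    # Conservative threshold: prefer a needless escalation to a missed red flag.
    if red_flag_score(message) >= 0.5:
        return ("These symptoms can be serious. This is not a diagnosis. "
                "Use the link below to reach a nurse line or telehealth now.")
    return ("I can share general information, not a diagnosis. "
            "If symptoms persist or worsen, please contact a clinician.")

print(respond("I've had chest pain and shortness of breath since last night"))
```

The point of the sketch is the default direction of error: when the classifier is unsure, the system escalates rather than reassures.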

6) Making safety real (how to build it)

  • Intent classifier: Use conservative thresholds to avoid missing red flags.
  • Adversarial defenses: Apply OWASP LLM Top‑10 mitigations (e.g., filter prompt‑injection and jailbreak attempts). [OWASP LLM Top 10]
  • Output handling: Always say “not a diagnosis,” present ranges, and avoid definitive labels.
  • Calibration monitoring: Post calibration cards quarterly; alarm on drift; default to escalation if uncertain (a drift‑check sketch follows this list). [Kadavath et al., 2022]
  • Privacy: Keep telemetry GDPR‑compliant—data minimization, purpose limitation, short retention, DPIA where required. [GDPR Art. 5]
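
A minimal sketch of the drift check above, assuming a quarterly, reviewer‑labelled sample of health answers; the ten bins, the baseline ECE, and the 0.05 drift threshold are illustrative assumptions:

```python
# Expected calibration error (ECE) over a labelled sample, with a drift alarm.
def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and observed accuracy, per bin."""
    ece, n = 0.0, len(confidences)
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

def calibration_alarm(confidences, correct, baseline_ece, drift_threshold=0.05):
    """Flag the quarter for review (and default to escalation) if ECE drifts."""
    return expected_calibration_error(confidences, correct) - baseline_ece > drift_threshold

# Example: model confidences vs. reviewer-judged correctness for five answers.
confs = [0.9, 0.8, 0.7, 0.95, 0.6]
labels = [1, 1, 0, 1, 0]
print(calibration_alarm(confs, labels, baseline_ece=0.02))  # True -> investigate
```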

7) Broader fixes (AI isn’t the only answer)

  • Telehealth reimbursement parity and after‑hours coverage
  • Community health‑worker and nurse‑line expansion
  • Mobile clinics and zero‑rated links to local services
  • Public health‑literacy campaigns

8) What to do next

Lawmakers & regulators

  • Clarify how “intended use” applies to general AIs.
  • Use sandboxes (EU) and pre‑subs (U.S.) to test guardrails fast.

Platforms & developers

  • Implement Tier‑2 guardrails where usage shows risk.
  • Publish calibration and equity metrics.
  • Document incident response and red‑flag routing in your quality management system (QMS).

Health systems & payers

  • Offer low‑friction escalation endpoints (nurse lines, telehealth).
  • Partner with platforms to close the loop for red‑flag users.

Conclusion

People in care deserts will keep asking general AIs health questions. Without proportionate guardrails, they carry the risk alone. We can keep innovation moving and add basic protections. The fixes are practical: detect intent, show uncertainty, route red flags, and measure equity—then prove it with data.

Transparency note: Composite scenarios are used; links point to official sources. [1] The composite myocardial‑infarction vignette reflects patterns in the rural‑care‑delay literature; no specific platform is identified.

Sources (selection)