HIPAA and Why Healthcare Code Is Different
HIPAA (Health Insurance Portability and Accountability Act) governs how Protected Health Information (PHI) is handled in software systems. PHI includes: patient names, dates of birth, Social Security numbers, medical record numbers, diagnoses, treatment records, insurance information, and any data that could identify a patient. Violations carry penalties from $100 to $50,000 per violation, up to $1.5 million per year per category.
AI-generated healthcare code must comply with HIPAA by default. The AI must not generate: a patient search endpoint without access control, a report that includes PHI without authorization checks, a log message that contains patient names or diagnoses, or a database query that returns more PHI than needed (a violation of the minimum necessary principle). Every AI rule for healthcare encodes: what data is PHI, who can access it, how it must be stored, and how access must be logged.
The Business Associate Agreement (BAA): any third-party service that handles PHI must have a BAA with the covered entity. AI rule: 'Before integrating a third-party service that will process PHI (cloud storage, email, analytics, AI/ML): verify the provider offers a BAA. AWS, GCP, Azure: offer BAAs. Most SaaS tools: do not. Never send PHI to a service without a BAA.'
PHI Data Classification and Handling
The 18 HIPAA identifiers that make data PHI: names, geographic data (smaller than state), dates (except year) directly related to an individual, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers and serial numbers (including license plates), device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number, characteristic, or code.
AI rule for data models: 'Every field in a patient-related data model must be classified as PHI or non-PHI. PHI fields require: encryption at rest (AES-256), encryption in transit (TLS 1.2+), access control (role-based), audit logging on every read and write, and minimum necessary exposure (only return fields the requester needs).' The AI should not generate a Patient model without these annotations or protections.
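A minimal sketch of this classification rule, assuming a hypothetical field-policy map; the field names and the `FieldPolicy` shape are illustrative, not from a real schema:

```typescript
// Hypothetical field-level PHI classification for a Patient model.
// Field names are illustrative, not from a real schema.
type Classification = "PHI" | "NON_PHI";

interface FieldPolicy {
  classification: Classification;
  encryptAtRest: boolean; // AES-256, enforced by the storage layer
  auditReads: boolean;    // every read emits an audit record
}

const patientFieldPolicy: Record<string, FieldPolicy> = {
  id:            { classification: "NON_PHI", encryptAtRest: false, auditReads: false }, // opaque UUID
  name:          { classification: "PHI",     encryptAtRest: true,  auditReads: true },
  dateOfBirth:   { classification: "PHI",     encryptAtRest: true,  auditReads: true },
  ssn:           { classification: "PHI",     encryptAtRest: true,  auditReads: true },
  diagnosisCode: { classification: "PHI",     encryptAtRest: true,  auditReads: true },
};

// Reject any model field that has not been explicitly classified.
function assertAllFieldsClassified(fields: string[]): void {
  const unclassified = fields.filter((f) => !(f in patientFieldPolicy));
  if (unclassified.length > 0) {
    throw new Error(`Unclassified fields (PHI review required): ${unclassified.join(", ")}`);
  }
}
```

A check like `assertAllFieldsClassified` can run in CI so that adding a field without a PHI decision fails the build rather than shipping unprotected.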
De-identification: data with all 18 identifiers removed is no longer PHI and can be used for analytics, research, and AI training. AI rule: 'For analytics and reporting endpoints: use de-identified data. Either apply the safe harbor method (remove all 18 HIPAA identifiers, with no actual knowledge that the remaining data could identify an individual) or the expert determination method (a qualified expert applies statistical methods and documents that the re-identification risk is very small).'
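A safe-harbor-style stripping pass might look like the following sketch. The identifier field list here is an illustrative placeholder; a real implementation needs a reviewed mapping of every schema field to the 18 identifier categories:

```typescript
// Sketch of safe-harbor de-identification: drop fields that map to
// HIPAA identifiers and coarsen dates to year only. Field names are
// illustrative; a real system needs a reviewed identifier map.
const IDENTIFIER_FIELDS = new Set([
  "name", "ssn", "email", "phone", "address", "mrn", "ipAddress", "deviceId",
]);

function deidentify(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (IDENTIFIER_FIELDS.has(key)) continue; // strip identifiers outright
    if (key === "dateOfBirth" && typeof value === "string") {
      out["birthYear"] = value.slice(0, 4);   // keep year only, per safe harbor
      continue;
    }
    out[key] = value;
  }
  return out;
}
```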
The most common HIPAA violation in AI-generated code: `console.log('Patient: ' + patient.name + ' diagnosed with: ' + diagnosis)`. Log messages flow to log aggregators (Datadog, Splunk, CloudWatch) that may not be HIPAA-compliant. AI rule: log patient ID (UUID) only, never names, diagnoses, or other PHI. The audit trail (HIPAA-compliant, encrypted, access-controlled) is where PHI access is recorded — not application logs.
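One way to enforce that rule is an allow-list log helper: anything not explicitly approved never reaches the aggregator. The helper and key names below are illustrative:

```typescript
// Sketch of a PHI-safe log helper: only an allow-listed set of
// non-PHI keys ever reaches application logs.
const SAFE_LOG_KEYS = new Set(["patientId", "requestId", "action", "durationMs"]);

function safeLog(message: string, context: Record<string, unknown>): string {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(context)) {
    if (SAFE_LOG_KEYS.has(key)) safe[key] = value; // drop anything not allow-listed
  }
  const line = `${message} ${JSON.stringify(safe)}`;
  console.log(line);
  return line;
}
```

An allow-list is safer than a deny-list here: a new PHI field added to the context object is dropped by default instead of leaking until someone remembers to block it.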
Access Control and Audit Trails
Minimum necessary principle: users should only access the minimum amount of PHI required to perform their job function. A nurse: sees patient vitals and medication schedules. A billing clerk: sees insurance information and procedure codes. A researcher: sees de-identified data only. AI rule: 'Every PHI endpoint: implement role-based access control. Define roles (clinician, nurse, admin, billing, researcher) and the PHI fields each role can access. The AI must not generate a generic /patients endpoint that returns all fields to all users.'
Audit trail requirements: HIPAA requires logging of every access to PHI. The log must include: who accessed the data (user ID, role), what data was accessed (patient ID, record type), when (timestamp), from where (IP address, device), and why (purpose, if available). The audit log must be: tamper-evident (immutable), retained for 6 years minimum, and available for compliance audits. AI rule: 'Every PHI read/write: emit an immutable audit log entry. No PHI access without a corresponding audit record.'
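A minimal sketch of such an audit record, covering the who/what/when/where/why fields above. Hash chaining is one common way to make the log tamper-evident; durable storage and 6-year retention are out of scope here, and all type and field names are illustrative:

```typescript
// Sketch of an append-only, hash-chained audit record.
import { createHash } from "crypto";

interface AuditEntry {
  userId: string; role: string;          // who
  patientId: string; recordType: string; // what
  timestamp: string;                     // when (ISO 8601)
  sourceIp: string;                      // from where
  purpose?: string;                      // why, if available
  prevHash: string; hash: string;        // chain for tamper evidence
}

function appendAudit(
  log: AuditEntry[],
  entry: Omit<AuditEntry, "prevHash" | "hash">
): AuditEntry {
  const prevHash = log.length ? log[log.length - 1].hash : "genesis";
  // Any later modification of an earlier entry breaks every hash after it.
  const hash = createHash("sha256")
    .update(prevHash + JSON.stringify(entry))
    .digest("hex");
  const full: AuditEntry = { ...entry, prevHash, hash };
  log.push(full);
  return full;
}
```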
Break-the-glass access: emergency situations may require access beyond normal role permissions. The system should: allow the access but flag it prominently in the audit trail, require a reason code (emergency, patient consent, legal), and trigger an alert to compliance officers. AI rule: 'Emergency access: allow but audit aggressively. Require reason code. Alert compliance team. This is better than blocking a clinician during a medical emergency.'
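A sketch of the break-the-glass flow: access is granted only with a reason code, flagged in the result, and a compliance alert is raised immediately. Function and type names are illustrative:

```typescript
// Sketch of break-the-glass access: allow, flag, require a reason
// code, and alert compliance. Names are illustrative.
type ReasonCode = "emergency" | "patient_consent" | "legal";

interface GlassBreakResult {
  granted: boolean;
  flagged: boolean;   // prominently marked in the audit trail
  alertsSent: string[];
}

function breakTheGlass(
  userId: string,
  patientId: string,
  reason: ReasonCode | undefined,
  alertCompliance: (msg: string) => void
): GlassBreakResult {
  if (!reason) {
    // No reason code: deny rather than allow a silent override.
    return { granted: false, flagged: false, alertsSent: [] };
  }
  const msg = `break-the-glass: ${userId} accessed ${patientId} (reason: ${reason})`;
  alertCompliance(msg); // compliance officers are notified immediately
  return { granted: true, flagged: true, alertsSent: [msg] };
}
```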
Instead of one /api/patients/:id endpoint returning all fields: create role-specific views. /api/patients/:id/clinical (for clinicians: vitals, diagnoses, medications). /api/patients/:id/billing (for billing: insurance, procedure codes). /api/patients/:id/demographics (for registration: name, DOB, contact). Each view returns only the minimum necessary PHI for that role. The AI should generate role-specific DTOs, not a single Patient response.
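The role-specific views above can be sketched as a projection over a single record; the field lists are illustrative, not a complete clinical schema:

```typescript
// Sketch of role-specific DTO projection for /clinical, /billing,
// and /demographics views. Field lists are illustrative.
interface PatientRecord {
  id: string; name: string; dateOfBirth: string; contact: string;
  vitals: string; diagnoses: string[]; medications: string[];
  insurance: string; procedureCodes: string[];
}

const ROLE_VIEWS = {
  clinical:     ["id", "vitals", "diagnoses", "medications"],
  billing:      ["id", "insurance", "procedureCodes"],
  demographics: ["id", "name", "dateOfBirth", "contact"],
} as const;

function projectForRole(
  patient: PatientRecord,
  view: keyof typeof ROLE_VIEWS
): Partial<PatientRecord> {
  const out: Partial<PatientRecord> = {};
  for (const field of ROLE_VIEWS[view]) {
    // Copy only the fields this view is allowed to expose.
    (out as Record<string, unknown>)[field] = patient[field as keyof PatientRecord];
  }
  return out;
}
```

Because each view copies fields out rather than deleting fields from the full record, a newly added PHI field is excluded by default until a view explicitly opts in.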
Encryption and Infrastructure Requirements
Encryption at rest: all PHI stored in databases, file systems, or backups must be encrypted using AES-256 or equivalent. Options are database-level encryption (TDE, Transparent Data Encryption) or application-level encryption (encrypt before storage). AI rule: 'PHI database columns: encrypted at rest. Use database-level TDE for broad coverage. Use application-level encryption for high-sensitivity fields (SSN, diagnosis codes) where even database admins should not see plaintext.'
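Application-level field encryption might look like the following sketch, using Node's built-in crypto module with AES-256-GCM. Key management (KMS, rotation) is out of scope, and the key handling here is illustrative only:

```typescript
// Sketch of application-level AES-256-GCM field encryption using
// Node's built-in crypto module. Key management is out of scope.
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // integrity tag, checked on decrypt
  // Store iv + auth tag + ciphertext together in one opaque value.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptField(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption fails if the data was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is a reasonable default here because it authenticates as well as encrypts: a tampered ciphertext fails to decrypt rather than silently producing garbage.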
Encryption in transit: all PHI transmission must use TLS 1.2 or higher. This includes: API calls, database connections, internal service communication, file transfers, and email (if PHI is emailed, which should be avoided). AI rule: 'All PHI endpoints: HTTPS only. Internal services: mutual TLS (mTLS) or encrypted service mesh. Database connections: require SSL/TLS. Never transmit PHI over unencrypted channels.'
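One way to encode the 'require SSL/TLS' rule is a guard over connection options that refuses plaintext or unverified settings. The `ssl` shape below mirrors common Node database clients (e.g. node-postgres), but the builder itself is a hypothetical sketch:

```typescript
// Sketch of a guard that refuses non-TLS or unverified database
// connection options before any PHI can flow over them.
interface DbSslOptions { rejectUnauthorized: boolean; }
interface DbConnectionOptions { host: string; ssl: DbSslOptions | false; }

function requireTls(options: DbConnectionOptions): DbConnectionOptions {
  if (options.ssl === false || !options.ssl.rejectUnauthorized) {
    throw new Error(
      `Refusing non-TLS / unverified connection to ${options.host}: PHI in transit must use TLS 1.2+`
    );
  }
  return options;
}
```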
Infrastructure: healthcare applications should run in HIPAA-eligible environments. AWS: use HIPAA-eligible services (listed in AWS BAA). GCP: use HIPAA-covered services. Azure: use HIPAA-compliant services. AI rule: 'Before selecting infrastructure services: verify HIPAA eligibility. Not all services from major cloud providers are HIPAA-eligible. Use only services covered under the BAA. Common gaps: managed AI/ML services, some database offerings, certain storage tiers.'
AWS has 150+ services but only ~90 are HIPAA-eligible (covered under the BAA). Amazon S3: eligible. Amazon Comprehend Medical: eligible. Amazon Lex: not eligible for PHI. Before the AI generates infrastructure code using a cloud service: verify it appears on the provider's HIPAA-eligible services list. The AI rule file should include a whitelist of approved services for the project.
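The project-level whitelist from the rule file can be sketched as a simple allow-list check. The entries below are placeholders: populate the set from the cloud provider's current BAA-covered services list, not from memory:

```typescript
// Sketch of a project allow-list for HIPAA-eligible services.
// Entries are illustrative placeholders; populate from the
// provider's published BAA-covered services list.
const HIPAA_ELIGIBLE_SERVICES = new Set([
  "s3", "rds", "comprehend-medical",
]);

function assertHipaaEligible(service: string): void {
  if (!HIPAA_ELIGIBLE_SERVICES.has(service)) {
    throw new Error(
      `${service} is not on the project's HIPAA-eligible allow-list; do not route PHI through it`
    );
  }
}
```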
Healthcare AI Governance Summary
Summary of AI governance rules for healthcare software teams building HIPAA-compliant applications.
- PHI: 18 HIPAA identifiers. Classify every field. PHI fields need encryption + access control + audit
- Minimum necessary: role-based access. Each role sees only the PHI fields needed for their function
- Audit trails: every PHI access logged. Immutable, 6-year retention, includes who/what/when/where
- Encryption at rest: AES-256 for all PHI. TDE for databases. App-level for high-sensitivity fields
- Encryption in transit: TLS 1.2+ everywhere. mTLS for internal services. No unencrypted PHI channels
- BAA: third-party services handling PHI must have a Business Associate Agreement
- De-identification: remove all 18 identifiers for analytics and research data
- Break-the-glass: allow emergency access but audit aggressively and alert compliance