CLAUDE.md for SQLAlchemy

Why SQLAlchemy Needs Version-Specific Rules

SQLAlchemy 2.0 is a fundamental API rewrite — the query() method is deprecated, Column() is replaced by mapped_column(), declarative_base() is replaced by DeclarativeBase, and the async API is first-class. AI assistants trained on a decade of SQLAlchemy 1.x content generate the old patterns: session.query(User).filter_by(email=email) instead of select(User).where(User.email == email), legacy Column types instead of Mapped annotations, and synchronous sessions in async applications.

The migration from 1.x to 2.0 is significant enough that code written in one style doesn't look like the other. Your CLAUDE.md must specify '2.0 style' or the AI will default to the patterns that dominate its training data — which are overwhelmingly 1.x.

These rules target SQLAlchemy 2.0+ with Alembic for migrations. They work with FastAPI, Flask, and standalone Python projects.

Rule 1: select() Over session.query()

The rule: 'Use the select() construct for all queries: stmt = select(User).where(User.email == email); result = await session.execute(stmt); user = result.scalar_one_or_none(). Never use session.query() — it's the 1.x API, deprecated in 2.0. select() is composable, type-safe with Mapped types, and works identically in sync and async sessions.'

For common patterns: 'Single record: result.scalar_one_or_none() (returns None or the object). Multiple records: result.scalars().all() (returns list). Count: select(func.count()).select_from(User). Exists: select(exists().where(User.email == email)). Pagination: select(User).offset(skip).limit(take).'

AI generates session.query(User).filter(User.email == email).first() — the 1.x pattern. The 2.0 equivalent is more explicit and composable: stmt = select(User).where(...); result = session.execute(stmt); user = result.scalar_one_or_none(). The extra line buys you composability and async compatibility.

select(Model).where() — never session.query(Model).filter()
session.execute(stmt) — then result.scalar_one_or_none() or .scalars().all()
Composable: stmt = select(User); if filter: stmt = stmt.where(...); session.execute(stmt)
func.count(), exists(), union(), except_ — SQL constructs in Python
select() works identically in sync and async sessions

⚠️ query() Is Deprecated

session.query(User).filter() is SQLAlchemy 1.x — deprecated in 2.0. The 2.0 way: select(User).where(). AI generates the old pattern from a decade of 1.x training data. Specify '2.0 style' in your rules.

Rule 2: Mapped Types and DeclarativeBase

The rule: 'Use DeclarativeBase and Mapped types for model definitions: class Base(DeclarativeBase): pass. class User(Base): __tablename__ = "users"; id: Mapped[int] = mapped_column(primary_key=True); name: Mapped[str] = mapped_column(String(100)); email: Mapped[str] = mapped_column(unique=True); created_at: Mapped[datetime] = mapped_column(default=func.now()). Never use legacy Column() or declarative_base() function.'

For optional fields: 'Use Mapped[str | None] for nullable columns: bio: Mapped[str | None] = mapped_column(Text, default=None). The Mapped annotation tells Python's type checker whether the field can be None — your IDE flags incorrect null handling before runtime.'

For relationships: 'Use Mapped with relationship: posts: Mapped[list["Post"]] = relationship(back_populates="author"). The string annotation ("Post") handles forward references. Use back_populates on both sides. Use lazy="selectin" for eager loading, lazy="select" (default) for lazy.'

ℹ️ Mapped = Type Safety

Mapped[str] tells Python's type checker the field is a string. Mapped[str | None] says it's nullable. Your IDE catches null handling errors before runtime. Legacy Column() provides zero type information.

Rule 3: AsyncSession for Async Applications

The rule: 'For FastAPI and other async frameworks: use AsyncSession with asyncpg or aiosqlite. Create engine: engine = create_async_engine(url). Create session factory: async_session = async_sessionmaker(engine, class_=AsyncSession). Use in endpoints: async with async_session() as session: result = await session.execute(select(User)). Never use synchronous Session in async endpoints — it blocks the event loop.'

For session management: 'Use async context manager: async with async_session() as session: async with session.begin(): ... Commit is automatic at the end of the begin() block. Use session.add() for new objects, session.merge() for detached objects. Explicitly await session.flush() if you need generated values (auto-increment IDs) before commit.'

For relationship loading in async: 'Lazy loading doesn't work with AsyncSession — accessing an unloaded relationship raises MissingGreenlet error. Always use eager loading: selectin_polymorphic, selectinload(), or joinedload() in the query. Or use await session.run_sync(lambda s: s.user.posts) for one-off lazy loads (not recommended for regular use).'

create_async_engine + async_sessionmaker + AsyncSession
async with session.begin() for auto-commit transactions
Never sync Session in async endpoints — blocks event loop
Eager loading required: selectinload(), joinedload() — no lazy in async
asyncpg for PostgreSQL, aiosqlite for SQLite in async

💡 Lazy Loading Breaks Async

Accessing an unloaded relationship in AsyncSession raises MissingGreenlet — a cryptic error. Always use selectinload() or joinedload() in async queries. Lazy loading only works with synchronous sessions.

Rule 4: Alembic for Migrations

The rule: 'Use Alembic for all schema migrations: alembic revision --autogenerate -m "description" creates a migration from model changes. alembic upgrade head applies migrations. Migration files in alembic/versions/ — committed to git. Review autogenerated migrations — Alembic detects most changes but misses some (data migrations, column renames).'

For configuration: 'Configure env.py to import your models: from app.models import Base; target_metadata = Base.metadata. For async: use run_async_migrations pattern in env.py. Set sqlalchemy.url in alembic.ini or override from environment variable in env.py.'

For data migrations: 'Add data migration logic in upgrade() and downgrade(): op.execute("UPDATE users SET role = 'member' WHERE role IS NULL"). Use op.bulk_insert for seeding. For complex data transformations, use a Session within the migration: session = Session(bind=op.get_bind()).'

Rule 5: SQLAlchemy-Specific Patterns

The rule: 'Use hybrid_property for computed attributes that work in both Python and SQL: @hybrid_property def full_name(self): return f"{self.first_name} {self.last_name}". Use association_proxy for simplified access through relationships. Use events (before_insert, after_update) for model lifecycle hooks — but keep them simple, like setting timestamps. Complex logic belongs in the service layer.'

For query optimization: 'Use joinedload() for single-row eager loading (one query with JOIN). Use selectinload() for collection eager loading (separate SELECT IN query — prevents Cartesian product). Use subqueryload() when selectinload generates too large an IN clause. Profile with echo=True on the engine or SQLAlchemy event listeners.'

For testing: 'Use a test database with Alembic migrations applied. Use session.begin_nested() (SAVEPOINT) for test isolation — roll back after each test. Use factories (factory_boy with SQLAlchemy integration) for test data. Never use an in-memory SQLite for PostgreSQL-specific tests — use Testcontainers with a real PostgreSQL.'

Complete SQLAlchemy Rules Template

Consolidated rules for SQLAlchemy 2.0+ projects.

2.0 API: select() over query() — session.execute(stmt) — never legacy patterns
Mapped types: DeclarativeBase, Mapped[type], mapped_column() — never Column()
AsyncSession for async apps: create_async_engine, asyncpg/aiosqlite — never sync in async
Eager loading in async: selectinload(), joinedload() — lazy raises MissingGreenlet
Alembic: autogenerate for migrations, review SQL, commit to git, data migrations manual
hybrid_property for computed — association_proxy for shortcuts — events for lifecycle
joinedload for single-row, selectinload for collections — profile with echo=True
factory_boy for test data — begin_nested for test isolation — Testcontainers for real DB