AI Rules for Search Implementation

AI Searches with LIKE and Hopes for the Best

AI generates search with: LIKE '%query%' on every text column (full table scan, no index usage, O(n) per query), no relevance ranking (results in insertion order, not by relevance), no typo tolerance (searching 'typscript' returns zero results for 'typescript'), no faceted filtering (users cannot narrow by category, date, or type), and synchronous blocking queries (search blocks the main thread while scanning millions of rows). At 1000 rows it works. At 100,000 rows the search takes 3 seconds. At 1 million rows the page times out.

Modern search implementation is: index-backed (tsvector in Postgres or inverted index in Elasticsearch), relevance-ranked (BM25 or TF-IDF scoring — best matches first), typo-tolerant (fuzzy matching catches misspellings within edit distance), facet-enabled (filter by category, date range, price range alongside text search), and performant (sub-100ms response at any scale with proper indexing). AI generates none of these.

These rules cover: Postgres full-text search with tsvector, Elasticsearch for advanced use cases, relevance scoring and boosting, fuzzy matching, faceted search, and search-as-you-type patterns.

Rule 1: Postgres Full-Text Search with tsvector

The rule: 'For most applications, Postgres full-text search is sufficient and avoids adding Elasticsearch infrastructure. Create a tsvector column: ALTER TABLE articles ADD COLUMN search_vector tsvector. Populate it: UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || coalesce(description, '') || ' ' || coalesce(content, '')). Create a GIN index: CREATE INDEX idx_articles_search ON articles USING GIN(search_vector). Query: SELECT * FROM articles WHERE search_vector @@ plainto_tsquery('english', 'react hooks') ORDER BY ts_rank(search_vector, query) DESC.'

For relevance ranking: 'ts_rank scores results by term frequency and document length. ts_rank_cd adds proximity scoring (terms closer together rank higher). Boost important fields: setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', description), 'B') || setweight(to_tsvector('english', content), 'C'). Title matches (weight A) rank higher than content matches (weight C). This gives results a natural relevance order without Elasticsearch.'

AI generates: WHERE title LIKE '%react%' OR description LIKE '%react%' OR content LIKE '%react%' — no index usage (leading wildcard prevents B-tree index), no word boundary awareness ('react' matches 'overreact' and 'reactionary'), and no ranking (all matches are equal). tsvector: indexed (sub-10ms), word-aware (matches 'react' as a word), and ranked (title matches beat content matches). Same query, fundamentally different implementation.

tsvector column + GIN index: sub-10ms full-text search at any table size
to_tsvector('english', ...) for language-aware stemming: 'running' matches 'run'
setweight for field boosting: title (A) > description (B) > content (C)
ts_rank for relevance scoring: term frequency, proximity, field weight
Postgres FTS handles 80% of search needs without additional infrastructure

💡 Postgres FTS Handles 80%

tsvector column + GIN index: sub-10ms full-text search at any table size. Language-aware stemming ('running' matches 'run'), field boosting (title > description > content), and relevance ranking. No Elasticsearch infrastructure needed for most applications.

Rule 2: Elasticsearch for Advanced Search

The rule: 'Use Elasticsearch (or OpenSearch, Typesense, Meilisearch) when you need: fuzzy matching with typo tolerance, faceted search with aggregations, complex relevance tuning with custom scoring, synonym support (searching 'JS' finds 'JavaScript'), autocomplete with edge n-grams, or search across millions of documents with sub-50ms latency. Elasticsearch is a dedicated search engine — it excels at search in ways a relational database cannot.'

For sync strategy: 'The database is the source of truth; Elasticsearch is a read-optimized projection. Sync patterns: (1) Dual-write: application writes to both database and Elasticsearch (simple but risks inconsistency on failure). (2) Change Data Capture: Debezium captures database changes and feeds Elasticsearch (consistent but more infrastructure). (3) Async events: on article.created/updated, publish an event that an indexer consumes (eventual consistency, decoupled). Reindex from database on schema changes.'

AI generates: Elasticsearch as the primary data store — no ACID transactions, no relational integrity, no reliable point-in-time recovery. Elasticsearch is a search index, not a database. The database stores the data; Elasticsearch indexes it for search. If the Elasticsearch index corrupts, reindex from the database. If the database corrupts, you have a real problem.

Rule 3: Fuzzy Matching and Typo Tolerance

The rule: 'Implement fuzzy matching for user-facing search: users make typos, and zero results for 'typscript' when 'typescript' exists is a terrible experience. Elasticsearch: { "match": { "title": { "query": "typscript", "fuzziness": "AUTO" } } } — AUTO allows 1 edit for 3-5 char terms, 2 edits for 6+ char terms. Postgres: pg_trgm extension with similarity() function and GIN trigram index: WHERE similarity(title, 'typscript') > 0.3.'

For the pg_trgm approach: 'CREATE EXTENSION pg_trgm; CREATE INDEX idx_articles_trgm ON articles USING GIN(title gin_trgm_ops). Query: SELECT *, similarity(title, 'typscript') as sim FROM articles WHERE title % 'typscript' ORDER BY sim DESC. The % operator uses the similarity threshold (default 0.3). Trigram matching: 'typscript' shares most trigrams with 'typescript' — the match is found despite the missing 'e'. No Elasticsearch required for basic fuzzy search.'

AI generates: exact-match search only. User searches 'recact hooks' (common typo): zero results. User assumes the content does not exist and leaves. Fuzzy matching: 'recact' matches 'react' (edit distance 1), results appear, user finds what they need. One feature prevents one of the highest-friction search failure modes.

⚠️ Zero Results = User Leaves

User searches 'recact hooks' (common typo): exact match returns zero results. User assumes the content doesn't exist. Fuzzy matching: 'recact' matches 'react' with edit distance 1, results appear. One feature prevents the highest-friction search failure mode.

Rule 4: Faceted Search and Filtering

The rule: 'Combine text search with structured filtering: search for 'react hooks' AND category = 'Best Practices' AND published_after = '2025-01-01'. Facets show available filter values with counts: Best Practices (42), Framework Guide (28), Tool Comparison (15). Clicking a facet narrows the results without a new search. Implementation: Elasticsearch aggregations for facets, or Postgres with GROUP BY on filter columns alongside the text search.'

For the user experience: 'Faceted search follows the pattern: (1) user types a query, (2) results appear with facets in a sidebar, (3) user clicks a facet to narrow results, (4) facets update to reflect the narrowed set (counts change, zero-count facets optionally hidden), (5) user can combine multiple facets (category + date range + author). Each facet click refines the results without losing the text query. The URL encodes the state: /search?q=react+hooks&category=best-practices&after=2025-01-01.'

AI generates: a text input that returns a flat list of results with no way to filter. Users with specific needs (newest articles about React testing) must: search, scroll through 100 results, give up, try a more specific query, repeat. Facets: search 'react', click 'Testing' facet, click '2025' facet — 3 clicks to the exact result set. Faceted search reduces search abandonment by 30-50%.

Text search + structured filters: query + category + date + author in one request
Facets with counts: Best Practices (42), Framework Guide (28) — shown in sidebar
Facet clicks refine results without new search — additive filtering
URL-encoded state: /search?q=react&category=best-practices — shareable, bookmarkable
Elasticsearch aggregations or Postgres GROUP BY for facet generation

Rule 5: Search-As-You-Type with Debounce

The rule: 'Implement search-as-you-type with: (1) debounce of 200-300ms (do not fire a search on every keystroke — wait for the user to pause), (2) minimum query length of 2-3 characters (single characters return too many results), (3) cancel previous request when a new keystroke arrives (AbortController to cancel in-flight fetch), (4) show results in a dropdown overlay (not a page navigation), (5) highlight matching terms in results (bold the matched portion of the result text).'

For the implementation: 'const [query, setQuery] = useState(""); const debouncedQuery = useDebounce(query, 300); useEffect(() => { if (debouncedQuery.length < 2) return; const controller = new AbortController(); fetchResults(debouncedQuery, controller.signal).then(setResults); return () => controller.abort(); }, [debouncedQuery]). The debounce prevents firing on every keystroke. The AbortController cancels stale requests. The minimum length prevents broad queries. The cleanup function prevents race conditions.'

AI generates: onInput={() => fetch('/api/search?q=' + input.value)} — a fetch on every keystroke. Typing 'react hooks' fires 11 requests. The responses arrive out of order: the result for 'react h' overwrites the result for 'react hooks'. With debounce + AbortController: 1-2 requests total, no race conditions, no wasted bandwidth. Same feature, 10x fewer requests, correct results.

ℹ️ 11 Requests to 1

onInput fires a fetch on every keystroke: typing 'react hooks' = 11 requests, responses arrive out of order. Debounce (300ms) + AbortController: 1-2 requests total, no race conditions, correct results. Same feature, 10x fewer requests.

Complete Search Implementation Rules Template

Consolidated rules for search implementation.

Postgres tsvector + GIN index for 80% of search needs — no extra infrastructure
setweight for field boosting: title (A) > description (B) > content (C)
Elasticsearch for: fuzzy matching, facets, synonyms, autocomplete, millions of docs
Database is source of truth, Elasticsearch is search index — sync via CDC or events
Fuzzy matching: pg_trgm for Postgres, fuzziness: AUTO for Elasticsearch
Faceted search: text query + structured filters + sidebar counts
Debounce 200-300ms + AbortController for search-as-you-type
Minimum query length 2-3 chars, highlight matches, dropdown overlay for results