Best Practices

AI Rules for Search Implementation

AI implements search with LIKE '%query%' on every column. Rules for full-text search indexes, Elasticsearch integration, search relevance tuning, fuzzy matching, faceted search, and search-as-you-type patterns.

8 min read·March 2, 2025

LIKE '%query%' on every column — full table scan, no ranking, no typo tolerance, 3-second response

Postgres tsvector, Elasticsearch, relevance scoring, fuzzy matching, faceted search, search-as-you-type

AI Searches with LIKE and Hopes for the Best

AI generates search with: LIKE '%query%' on every text column (full table scan, no index usage, O(n) per query), no relevance ranking (results in insertion order, not by relevance), no typo tolerance (searching 'typscript' returns zero results for 'typescript'), no faceted filtering (users cannot narrow by category, date, or type), and synchronous blocking queries (search blocks the main thread while scanning millions of rows). At 1000 rows it works. At 100,000 rows the search takes 3 seconds. At 1 million rows the page times out.

Modern search implementation is: index-backed (tsvector in Postgres or inverted index in Elasticsearch), relevance-ranked (BM25 or TF-IDF scoring — best matches first), typo-tolerant (fuzzy matching catches misspellings within edit distance), facet-enabled (filter by category, date range, price range alongside text search), and performant (sub-100ms response at any scale with proper indexing). AI generates none of these.

These rules cover: Postgres full-text search with tsvector, Elasticsearch for advanced use cases, relevance scoring and boosting, fuzzy matching, faceted search, and search-as-you-type patterns.

Rule 2: Elasticsearch for Advanced Search

The rule: 'Use Elasticsearch (or OpenSearch, Typesense, Meilisearch) when you need: fuzzy matching with typo tolerance, faceted search with aggregations, complex relevance tuning with custom scoring, synonym support (searching 'JS' finds 'JavaScript'), autocomplete with edge n-grams, or search across millions of documents with sub-50ms latency. Elasticsearch is a dedicated search engine — it excels at search in ways a relational database cannot.'

For sync strategy: 'The database is the source of truth; Elasticsearch is a read-optimized projection. Sync patterns: (1) Dual-write: application writes to both database and Elasticsearch (simple but risks inconsistency on failure). (2) Change Data Capture: Debezium captures database changes and feeds Elasticsearch (consistent but more infrastructure). (3) Async events: on article.created/updated, publish an event that an indexer consumes (eventual consistency, decoupled). Reindex from database on schema changes.'

AI generates: Elasticsearch as the primary data store — no ACID transactions, no relational integrity, no reliable point-in-time recovery. Elasticsearch is a search index, not a database. The database stores the data; Elasticsearch indexes it for search. If the Elasticsearch index corrupts, reindex from the database. If the database corrupts, you have a real problem.

Rule 3: Fuzzy Matching and Typo Tolerance

The rule: 'Implement fuzzy matching for user-facing search: users make typos, and zero results for 'typscript' when 'typescript' exists is a terrible experience. Elasticsearch: { "match": { "title": { "query": "typscript", "fuzziness": "AUTO" } } } — AUTO allows 1 edit for 3-5 char terms, 2 edits for 6+ char terms. Postgres: pg_trgm extension with similarity() function and GIN trigram index: WHERE similarity(title, 'typscript') > 0.3.'

For the pg_trgm approach: 'CREATE EXTENSION pg_trgm; CREATE INDEX idx_articles_trgm ON articles USING GIN(title gin_trgm_ops). Query: SELECT *, similarity(title, 'typscript') as sim FROM articles WHERE title % 'typscript' ORDER BY sim DESC. The % operator uses the similarity threshold (default 0.3). Trigram matching: 'typscript' shares most trigrams with 'typescript' — the match is found despite the missing 'e'. No Elasticsearch required for basic fuzzy search.'

AI generates: exact-match search only. User searches 'recact hooks' (common typo): zero results. User assumes the content does not exist and leaves. Fuzzy matching: 'recact' matches 'react' (edit distance 1), results appear, user finds what they need. One feature prevents one of the highest-friction search failure modes.

⚠️ Zero Results = User Leaves

User searches 'recact hooks' (common typo): exact match returns zero results. User assumes the content doesn't exist. Fuzzy matching: 'recact' matches 'react' with edit distance 1, results appear. One feature prevents the highest-friction search failure mode.

Rule 5: Search-As-You-Type with Debounce

The rule: 'Implement search-as-you-type with: (1) debounce of 200-300ms (do not fire a search on every keystroke — wait for the user to pause), (2) minimum query length of 2-3 characters (single characters return too many results), (3) cancel previous request when a new keystroke arrives (AbortController to cancel in-flight fetch), (4) show results in a dropdown overlay (not a page navigation), (5) highlight matching terms in results (bold the matched portion of the result text).'

For the implementation: 'const [query, setQuery] = useState(""); const debouncedQuery = useDebounce(query, 300); useEffect(() => { if (debouncedQuery.length < 2) return; const controller = new AbortController(); fetchResults(debouncedQuery, controller.signal).then(setResults); return () => controller.abort(); }, [debouncedQuery]). The debounce prevents firing on every keystroke. The AbortController cancels stale requests. The minimum length prevents broad queries. The cleanup function prevents race conditions.'

AI generates: onInput={() => fetch('/api/search?q=' + input.value)} — a fetch on every keystroke. Typing 'react hooks' fires 11 requests. The responses arrive out of order: the result for 'react h' overwrites the result for 'react hooks'. With debounce + AbortController: 1-2 requests total, no race conditions, no wasted bandwidth. Same feature, 10x fewer requests, correct results.

ℹ️ 11 Requests to 1

onInput fires a fetch on every keystroke: typing 'react hooks' = 11 requests, responses arrive out of order. Debounce (300ms) + AbortController: 1-2 requests total, no race conditions, correct results. Same feature, 10x fewer requests.

Complete Search Implementation Rules Template

Consolidated rules for search implementation.

  • Postgres tsvector + GIN index for 80% of search needs — no extra infrastructure
  • setweight for field boosting: title (A) > description (B) > content (C)
  • Elasticsearch for: fuzzy matching, facets, synonyms, autocomplete, millions of docs
  • Database is source of truth, Elasticsearch is search index — sync via CDC or events
  • Fuzzy matching: pg_trgm for Postgres, fuzziness: AUTO for Elasticsearch
  • Faceted search: text query + structured filters + sidebar counts
  • Debounce 200-300ms + AbortController for search-as-you-type
  • Minimum query length 2-3 chars, highlight matches, dropdown overlay for results