ufraan

Full-Text Search Engine

Inverted index-based search with BM25 ranking and real-time indexing.

Go · Search · Indexing · Information Retrieval
Sep 2024 · GitHub

Built a production search engine that answers full-text queries with BM25 relevance ranking. It stores compressed inverted indexes for compact storage and fast lookups, and it indexes in real time, so documents become searchable immediately after ingestion. Supported queries include phrase search, wildcards, boolean operators, and field-specific searches. The engine indexes 100k+ documents with millisecond query latencies.
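As a rough illustration of the ideas above, here is a minimal sketch of a positional inverted index with immediate visibility of new documents, a boolean AND query, and phrase search. It is not the project's actual code: the `Index` type, its methods, and the whitespace-only `tokenize` helper are all hypothetical, and compression, wildcards, and field-specific search are left out.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// Index is a hypothetical in-memory positional inverted index.
// postings maps term -> document ID -> set of token positions.
type Index struct {
	mu       sync.RWMutex
	postings map[string]map[int]map[int]bool
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{postings: make(map[string]map[int]map[int]bool)}
}

// tokenize lowercases the text and splits it on whitespace
// (an assumption for brevity: no stemming, no punctuation handling).
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// Add indexes a document. It becomes searchable as soon as the write lock is released.
func (ix *Index) Add(docID int, text string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for pos, term := range tokenize(text) {
		if ix.postings[term] == nil {
			ix.postings[term] = make(map[int]map[int]bool)
		}
		if ix.postings[term][docID] == nil {
			ix.postings[term][docID] = make(map[int]bool)
		}
		ix.postings[term][docID][pos] = true
	}
}

// And returns the sorted IDs of documents that contain every query term.
func (ix *Index) And(query string) []int {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var out []int
candidates:
	for docID := range ix.postings[terms[0]] {
		for _, t := range terms[1:] {
			if ix.postings[t][docID] == nil {
				continue candidates
			}
		}
		out = append(out, docID)
	}
	sort.Ints(out)
	return out
}

// Phrase returns the sorted IDs of documents where the query terms appear consecutively.
func (ix *Index) Phrase(query string) []int {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var out []int
	for docID, starts := range ix.postings[terms[0]] {
		for start := range starts {
			if ix.matchAt(terms, docID, start) {
				out = append(out, docID)
				break
			}
		}
	}
	sort.Ints(out)
	return out
}

// matchAt reports whether terms[i] occurs at position start+i in docID for every i.
func (ix *Index) matchAt(terms []string, docID, start int) bool {
	for i, t := range terms {
		if !ix.postings[t][docID][start+i] {
			return false
		}
	}
	return true
}
```

To try it, call `NewIndex()`, add a few documents with `Add(id, text)`, and compare `And("quick brown")` with `Phrase("quick brown")`: the first matches any document containing both words, the second only documents where they are adjacent and in order. Storing positions is what makes phrase search possible, at the cost of larger postings.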

Inverted index showing term-to-document mappings

BM25 scoring combines term frequency with document length normalization, which produces better relevance than simple TF-IDF. Index compression uses dictionary encoding and bit-level packing to reduce the memory footprint. The system also includes query parsing, tokenization, and spell-checking to handle user input robustly.
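To make the scoring concrete, the following is a minimal sketch of the standard BM25 formula, not the project's implementation. The `Corpus` type and its fields are hypothetical, the tokenizer is a plain whitespace split, and the parameters `k1 = 1.2` and `b = 0.75` are commonly used defaults assumed here rather than values taken from the project. The IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)).

```go
package main

import (
	"math"
	"strings"
)

// Assumed parameters: common defaults, not values confirmed by the project.
const (
	bm25K1 = 1.2  // controls how quickly repeated terms stop adding score
	bm25B  = 0.75 // strength of document length normalization (0 = off, 1 = full)
)

// Corpus holds the statistics BM25 needs. All names here are hypothetical.
type Corpus struct {
	termFreq []map[string]int // per document: term -> number of occurrences
	docLen   []int            // per document: number of tokens
	docFreq  map[string]int   // term -> number of documents containing it
	totalLen int              // total number of tokens across all documents
}

// NewCorpus tokenizes each document on whitespace and collects the statistics.
func NewCorpus(docs []string) *Corpus {
	c := &Corpus{docFreq: make(map[string]int)}
	for _, d := range docs {
		tf := make(map[string]int)
		tokens := strings.Fields(strings.ToLower(d))
		for _, t := range tokens {
			tf[t]++
		}
		for t := range tf {
			c.docFreq[t]++
		}
		c.termFreq = append(c.termFreq, tf)
		c.docLen = append(c.docLen, len(tokens))
		c.totalLen += len(tokens)
	}
	return c
}

// idf returns ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of
// documents and n is the number of documents containing the term.
func (c *Corpus) idf(term string) float64 {
	n := float64(c.docFreq[term])
	total := float64(len(c.docLen))
	return math.Log(1 + (total-n+0.5)/(n+0.5))
}

// Score returns the BM25 score of document doc for the query.
// It assumes the corpus contains at least one non-empty document.
func (c *Corpus) Score(query string, doc int) float64 {
	avgLen := float64(c.totalLen) / float64(len(c.docLen))
	// Longer-than-average documents get a larger norm and therefore a lower score.
	norm := bm25K1 * (1 - bm25B + bm25B*float64(c.docLen[doc])/avgLen)
	score := 0.0
	for _, t := range strings.Fields(strings.ToLower(query)) {
		f := float64(c.termFreq[doc][t])
		score += c.idf(t) * f * (bm25K1 + 1) / (f + norm)
	}
	return score
}
```

Two properties are worth checking by hand: with equal term frequency, a shorter document scores higher than a longer one, and each term's contribution is bounded by `idf * (bm25K1 + 1)` no matter how often the term repeats. Plain TF-IDF has neither the saturation nor the tunable length normalization.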