David Souther - Jeopardy! Search - 2026-03-25

A technique in semantic search is to enrich documents using LLM generation. This approach uses LLMs and prompt engineering to create “similar” documents to the source document†. The intent is that these similar documents expand the document’s footprint in the “source” embedding space (a subset of the entire embedding space) by establishing (likely nearby) related nodes, so that a cosine similarity lookup at query time is more likely to match in the source space. A similar technique works at query time, creating similar queries for semantic lookup into the source space.
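A minimal sketch of the enrichment side, using toy stand-ins: `enrich` is a hypothetical placeholder for the LLM call, and the hash-based bag-of-words `embed` stands in for a real sentence-embedding model. Only the cosine-lookup structure is the point.

```python
import hashlib
import math

def embed(text: str, dims: int = 64) -> list[float]:
    # Toy bag-of-words embedding: hash each token into a fixed-size vector.
    # A real system would call a sentence-embedding model here.
    vec = [0.0] * dims
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def enrich(doc: str) -> list[str]:
    # Hypothetical stand-in for the LLM generation step, e.g. a prompt like
    # "Rewrite this passage three different ways."
    return [f"paraphrase: {doc}", f"summary: {doc}"]

# Index the source document plus its generated neighbors; every entry
# points back at the original document.
index: list[tuple[list[float], str]] = []
doc = "quarterly revenue grew across all regions"
for variant in [doc, *enrich(doc)]:
    index.append((embed(variant), doc))

# Query-time lookup: cosine similarity against the expanded source space.
query = "revenue growth by region"
best = max(index, key=lambda entry: cosine(embed(query), entry[0]))
```

Whichever expanded entry matches best, the lookup resolves to the original source document.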

Precise Zero-Shot Dense Retrieval without Relevance Labels (the HyDE paper)

These techniques are expensive and have limited success. On the one hand, similar documents are likely to confabulate, increasing the likelihood of negative matches (matched documents with incorrect contents for the query). On the other, expanding the query space at request time is very expensive, incurring multiple real-time LLM generation calls on the response path.

Jeopardy! search is a combination of these techniques, with an extra pinch of salt in the form of user personas. In Jeopardy! search, when a document is added to the knowledge base, instead of HyDE generating additional documents in the “source” embedding space, it generates queries which the document would answer, moving the semantic expansion into the “query” embedding space. The query side then shifts its indirection: instead of semantic similarity against the source space, it does semantic similarity in the query space, which transparently returns the same original documents. (For production search cases, this would return nearest-neighbors and use additional ranking techniques; for graph retrieval augmented generation, it would expand a number of hops through the knowledge base and do the same.)
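The ingest/query flow above can be sketched as follows. Everything here is a toy stand-in: `generate_queries` returns canned strings where a real system would prompt an LLM, and Jaccard token overlap substitutes for embedding cosine similarity. The structural point is that similarity is computed query-to-query, and the index entry carries the original document id.

```python
def jaccard(a: str, b: str) -> float:
    # Toy stand-in for cosine similarity over query embeddings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical canned output standing in for the LLM generation step
# ("Generate queries this document would answer").
CANNED = {
    "doc-1": ["what was quarterly revenue", "how did revenue grow"],
    "doc-2": ["who designed the logo", "what font does the brand use"],
}

def generate_queries(doc_id: str, text: str) -> list[str]:
    return CANNED[doc_id]

# Each generated query is indexed, pointing back at its source document.
query_index: list[tuple[str, str]] = []  # (generated query, doc_id)

def ingest(doc_id: str, text: str) -> None:
    for q in generate_queries(doc_id, text):
        query_index.append((q, doc_id))

def search(user_query: str) -> str:
    # Similarity lookup happens entirely in the query space; the match
    # transparently resolves to the original document.
    _, doc_id = max(query_index, key=lambda e: jaccard(user_query, e[0]))
    return doc_id

ingest("doc-1", "quarterly revenue grew across all regions")
ingest("doc-2", "brand guidelines for the logo and typeface")
```

A production version would swap Jaccard for an ANN lookup over real query embeddings and return the top-k neighbors for re-ranking rather than a single best match.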

The pinch of salt for Jeopardy! search comes from having a history of personas and their searches. When prompting to generate the query set for a document, examples are taken from past good searches for a persona. For instance: “The Executive Persona used ABC query and chose XYZ document.”, “The Designer Persona used JKL query and chose QWE document.” For either persona, the prompt then includes the task “Generate 10 queries the persona would ask that document VBN would answer.”
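The prompt assembly can be sketched as a few-shot template. The history records and the `build_prompt` helper are hypothetical names; the examples reuse the placeholder queries and documents from the paragraph above.

```python
# Hypothetical search-history records: (persona, query, chosen document).
history = [
    ("Executive", "ABC query", "XYZ document"),
    ("Designer", "JKL query", "QWE document"),
]

def build_prompt(persona: str, doc_id: str, n: int = 10) -> str:
    # Few-shot examples are drawn from that persona's past good searches,
    # then followed by the generation task for the new document.
    examples = [
        f"The {p} Persona used {q} and chose {d}."
        for p, q, d in history
        if p == persona
    ]
    task = (
        f"Generate {n} queries the {persona} Persona would ask "
        f"that document {doc_id} would answer."
    )
    return "\n".join(examples + [task])

prompt = build_prompt("Designer", "VBN")
```

Filtering the history by persona keeps each generated query set in that persona's voice, which is what pushes the expansion toward queries real users of that type would actually issue.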

† Technically, document chunk, where a document is broken into chunks of appropriate size. Chunking is itself a bit of an art form, but typically ~512 tokens per chunk, ~64 tokens of overlap, and context-aware section boundaries (heading level, prose vs. code block) work well. [Citation needed, but I’m fairly certain I remember the gist of that right] The point being, document length is the n in the O(n) cost of these operations.
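A minimal chunking sketch under those numbers, with toy assumptions: whitespace-split words stand in for model tokens, and no context-aware splitting (headings, code blocks) is attempted.

```python
def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Toy chunker: whitespace "tokens" stand in for real tokenizer output.
    # Windows advance by (size - overlap) so consecutive chunks share
    # `overlap` tokens of context.
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
    return chunks

# A 1000-"token" document yields three overlapping chunks.
words = " ".join(f"w{i}" for i in range(1000))
chunks = chunk(words)
```

A real implementation would count tokens with the embedding model's own tokenizer and prefer splitting at section boundaries before falling back to a fixed window.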