251 changes: 249 additions & 2 deletions notebooks/02_hybrid_queries.ipynb
@@ -1335,6 +1335,252 @@
"In some cases, you may only need to know how many entities match a filter expression. A `CountQuery` returns that count, and the `count()` method performs the counting server-side without retrieving the actual documents."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## True Hybrid Search: Text + Vector Scoring\n",
"\n",
"The queries above demonstrate **filtered vector search**: they use metadata filters (tag, numeric, geo) to narrow down candidates, but scoring is based solely on vector similarity.\n",
"\n",
"**True hybrid search** combines **text relevance (BM25)** and **vector similarity** scoring in a single query, so results are ranked both by how well they match keywords and by how semantically similar they are to the query vector.\n",
"\n",
"RedisVL provides two hybrid search implementations:\n",
"- **`HybridQuery`**: uses the native `FT.HYBRID` command (Redis 8.4+) and supports RRF and LINEAR score fusion.\n",
"- **`AggregateHybridQuery`**: uses `FT.AGGREGATE` (Redis 7.4+) with a weighted linear score combination."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HybridQuery and AggregateHybridQuery imported successfully!\n"
]
}
],
"source": [
"// Import HybridQuery and AggregateHybridQuery classes\n",
"import com.redis.vl.query.HybridQuery;\n",
"import com.redis.vl.query.AggregateHybridQuery;\n",
"\n",
"System.out.println(\"HybridQuery and AggregateHybridQuery imported successfully!\");"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HybridQuery results (text='engineer', default RRF combination):\n",
"┌────────────┬───────────────┬─────────────┬──────────────┬─────────────┬─────────────┬───────┬─────────────────┬───┐\n",
"│credit_score│vector_distance│text_score │hybrid_score │job │__score │user │vector_similarity│age│\n",
"├────────────┼───────────────┼─────────────┼──────────────┼─────────────┼─────────────┼───────┼─────────────────┼───┤\n",
"│high │0 │2.32630164941│1.39789049482 │engineer │2.32630164941│john │1 │18 │\n",
"│high │0.109129190445 │2.32630164941│1.35969527817 │engineer │2.32630164941│tyler │0.945435404778 │100│\n",
"│low │0 │0 │0.7 │doctor │0 │derrick│1 │14 │\n",
"│high │0.158808887005 │0 │0.644416889548│dermatologist│0 │tim │0.920595556498 │12 │\n",
"│low │0.217881977558 │0 │0.623741307855│CEO │0 │taimur │0.891059011221 │15 │\n",
"└────────────┴───────────────┴─────────────┴──────────────┴─────────────┴─────────────┴───────┴─────────────────┴───┘\n"
]
}
],
"source": [
"// HybridQuery with default combination (RRF)\n",
"// Searches the \"job\" text field and \"user_embedding\" vector field simultaneously\n",
"HybridQuery hybridQuery = HybridQuery.builder()\n",
" .text(\"engineer\")\n",
" .textFieldName(\"job\")\n",
" .vector(new float[]{0.1f, 0.1f, 0.5f})\n",
" .vectorFieldName(\"user_embedding\")\n",
" .returnFields(List.of(\"user\", \"job\", \"age\", \"credit_score\"))\n",
" .numResults(5)\n",
" .build();\n",
"\n",
"List<Map<String, Object>> hybridResults = index.query(hybridQuery);\n",
"System.out.println(\"HybridQuery results (text='engineer', default RRF combination):\");\n",
"resultPrint(hybridResults);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### RRF vs LINEAR Combination\n",
"\n",
"**RRF (Reciprocal Rank Fusion)** is the default combination method. It merges the ranked result lists from text and vector search without requiring score normalization, and it is controlled by the `rrfWindow` and `rrfConstant` parameters.\n",
"\n",
"**LINEAR** combination directly weights the text and vector scores. The `linearAlpha` parameter controls the text weight:\n",
"- `linearAlpha = 0.3` (default) → 30% text, 70% vector\n",
"- `linearAlpha = 0.5` → 50/50 balanced\n",
"- `linearAlpha = 0.7` → 70% text, 30% vector\n",
"\n",
"> **Note:** In `HybridQuery`, `linearAlpha` is the **text** weight. This is the opposite of `AggregateHybridQuery.alpha`, which is the **vector** weight."
]
},
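{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the two fusion methods described above, the scores can be written as follows. The RRF line is the standard textbook definition; that `rrfConstant` plays the role of $k$, and that `rrfWindow` limits how many top-ranked results from each list are fused, is an assumption about how the parameters map onto it, not something this notebook confirms. The LINEAR line follows from `linearAlpha` being the text weight, assuming the vector weight is its complement, as the percentages above suggest.\n",
"\n",
"$$\n",
"\\text{RRF}(d) = \\sum_{r \\in \\{\\text{text},\\, \\text{vector}\\}} \\frac{1}{k + \\text{rank}_r(d)}\n",
"$$\n",
"\n",
"$$\n",
"\\text{LINEAR}(d) = \\alpha \\cdot s_{\\text{text}}(d) + (1 - \\alpha) \\cdot s_{\\text{vector}}(d), \\qquad \\alpha = \\texttt{linearAlpha}\n",
"$$\n",
"\n",
"Because RRF uses only rank positions, it is insensitive to the very different scales of BM25 scores and vector similarities, whereas LINEAR mixes the raw scores directly."
]
},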
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HybridQuery results (text='doctor', LINEAR with alpha=0.5):\n",
"┌────────────┬───────────────┬─────────────┬──────────────┬─────────────┬─────────────┬───────┬─────────────────┬───┐\n",
"│credit_score│vector_distance│text_score │hybrid_score │job │__score │user │vector_similarity│age│\n",
"├────────────┼───────────────┼─────────────┼──────────────┼─────────────┼─────────────┼───────┼─────────────────┼───┤\n",
"│low │0 │1.16315082471│1.08157541235 │doctor │1.16315082471│derrick│1 │14 │\n",
"│high │0 │0 │0.5 │engineer │0 │john │1 │18 │\n",
"│high │0.109129190445 │0 │0.472717702389│engineer │0 │tyler │0.945435404778 │100│\n",
"│high │0.158808887005 │0 │0.460297778249│dermatologist│0 │tim │0.920595556498 │12 │\n",
"│low │0.217881977558 │0 │0.44552950561 │CEO │0 │taimur │0.891059011221 │15 │\n",
"└────────────┴───────────────┴─────────────┴──────────────┴─────────────┴─────────────┴───────┴─────────────────┴───┘\n"
]
}
],
"source": [
"// HybridQuery with LINEAR combination and custom alpha\n",
"HybridQuery linearQuery = HybridQuery.builder()\n",
" .text(\"doctor\")\n",
" .textFieldName(\"job\")\n",
" .vector(new float[]{0.1f, 0.1f, 0.5f})\n",
" .vectorFieldName(\"user_embedding\")\n",
" .combinationMethod(HybridQuery.CombinationMethod.LINEAR)\n",
" .linearAlpha(0.5f) // 50% text weight, 50% vector weight\n",
" .returnFields(List.of(\"user\", \"job\", \"age\", \"credit_score\"))\n",
" .numResults(5)\n",
" .build();\n",
"\n",
"List<Map<String, Object>> linearResults = index.query(linearQuery);\n",
"System.out.println(\"HybridQuery results (text='doctor', LINEAR with alpha=0.5):\");\n",
"resultPrint(linearResults);"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"HybridQuery with filter (text='engineer', credit_score=high):\n",
"┌────────────┬───────────────┬──────────────┬──────────────┬─────────────┬──────────────┬─────┬─────────────────┬───┐\n",
"│credit_score│vector_distance│text_score │hybrid_score │job │__score │user │vector_similarity│age│\n",
"├────────────┼───────────────┼──────────────┼──────────────┼─────────────┼──────────────┼─────┼─────────────────┼───┤\n",
"│high │0 │2.90166580177 │1.57049974053 │engineer │2.90166580177 │john │1 │18 │\n",
"│high │0.109129190445 │2.90166580177 │1.53230452387 │engineer │2.90166580177 │tyler│0.945435404778 │100│\n",
"│high │0.158808887005 │0.575364152354│0.817026135255│dermatologist│0.575364152354│tim │0.920595556498 │12 │\n",
"│high │0.266666650772 │0.575364152354│0.779275917936│doctor │0.575364152354│nancy│0.866666674614 │94 │\n",
"└────────────┴───────────────┴──────────────┴──────────────┴─────────────┴──────────────┴─────┴─────────────────┴───┘\n"
]
}
],
"source": [
"// HybridQuery with filter expression\n",
"// Combine hybrid text+vector search with metadata filtering\n",
"HybridQuery filteredHybrid = HybridQuery.builder()\n",
" .text(\"engineer\")\n",
" .textFieldName(\"job\")\n",
" .vector(new float[]{0.1f, 0.1f, 0.5f})\n",
" .vectorFieldName(\"user_embedding\")\n",
" .filterExpression(Filter.tag(\"credit_score\", \"high\"))\n",
" .returnFields(List.of(\"user\", \"job\", \"age\", \"credit_score\"))\n",
" .numResults(5)\n",
" .build();\n",
"\n",
"List<Map<String, Object>> filteredHybridResults = index.query(filteredHybrid);\n",
"System.out.println(\"HybridQuery with filter (text='engineer', credit_score=high):\");\n",
"resultPrint(filteredHybridResults);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AggregateHybridQuery (FT.AGGREGATE)\n",
"\n",
"`AggregateHybridQuery` uses the `FT.AGGREGATE` command and works with **Redis 7.4+**. It computes a weighted linear combination of BM25 text scores and vector similarity scores:\n",
"\n",
"```\n",
"hybrid_score = (1 - alpha) * text_score + alpha * vector_similarity\n",
"```\n",
"\n",
"Here `alpha` is the **vector weight** (default 0.7, i.e. 70% vector and 30% text).\n",
"\n",
"Use `AggregateHybridQuery` when:\n",
"- Your Redis version is 7.4–8.3 (before `FT.HYBRID` was available)\n",
"- You need explicit control over the aggregation pipeline"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AggregateHybridQuery results (text='doctor', alpha=0.7):\n",
"┌────────────┬───────────────┬─────────────┬──────────────┬─────────────┬─────────────┬───────┬─────────────────┬───┐\n",
"│credit_score│vector_distance│text_score │hybrid_score │job │__score │user │vector_similarity│age│\n",
"├────────────┼───────────────┼─────────────┼──────────────┼─────────────┼─────────────┼───────┼─────────────────┼───┤\n",
"│low │0 │1.16315082471│1.04894524741 │doctor │1.16315082471│derrick│1 │14 │\n",
"│high │0 │0 │0.7 │engineer │0 │john │1 │18 │\n",
"│high │0.109129190445 │0 │0.661804783344│engineer │0 │tyler │0.945435404778 │100│\n",
"│high │0.158808887005 │0 │0.644416889548│dermatologist│0 │tim │0.920595556498 │12 │\n",
"│low │0.217881977558 │0 │0.623741307855│CEO │0 │taimur │0.891059011221 │15 │\n",
"└────────────┴───────────────┴─────────────┴──────────────┴─────────────┴─────────────┴───────┴─────────────────┴───┘\n"
]
}
],
"source": [
"// AggregateHybridQuery example\n",
"AggregateHybridQuery aggQuery = AggregateHybridQuery.builder()\n",
" .text(\"doctor\")\n",
" .textFieldName(\"job\")\n",
" .vector(new float[]{0.1f, 0.1f, 0.5f})\n",
" .vectorFieldName(\"user_embedding\")\n",
" .alpha(0.7f) // 70% vector weight, 30% text weight\n",
" .returnFields(List.of(\"user\", \"job\", \"age\", \"credit_score\"))\n",
" .numResults(5)\n",
" .build();\n",
"\n",
"List<Map<String, Object>> aggResults = index.query(aggQuery);\n",
"System.out.println(\"AggregateHybridQuery results (text='doctor', alpha=0.7):\");\n",
"resultPrint(aggResults);"
]
},
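{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rows of the output above can be reproduced by hand from the weighted formula, which is a quick sanity check on the `alpha` semantics. With `alpha = 0.7`, derrick (text score 1.16315082471, vector similarity 1) and tyler (text score 0, vector similarity 0.945435404778) work out as:\n",
"\n",
"$$\n",
"(1 - 0.7) \\times 1.16315082471 + 0.7 \\times 1 \\approx 1.048945\n",
"$$\n",
"\n",
"$$\n",
"(1 - 0.7) \\times 0 + 0.7 \\times 0.945435404778 \\approx 0.661805\n",
"$$\n",
"\n",
"Both agree with the printed `hybrid_score` values. The printed columns are also consistent with `vector_similarity` being derived from the cosine `vector_distance` $d$ as shown below; this relationship is inferred from the numbers in the table, not stated by the notebook.\n",
"\n",
"$$\n",
"\\text{similarity} = 1 - \\frac{d}{2}, \\qquad 1 - \\frac{0.109129190445}{2} \\approx 0.945435\n",
"$$"
]
},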
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HybridQuery vs AggregateHybridQuery\n",
"\n",
"| Feature | HybridQuery | AggregateHybridQuery |\n",
"|---|---|---|\n",
"| Redis command | `FT.HYBRID` | `FT.AGGREGATE` |\n",
"| Min Redis version | 8.4+ | 7.4+ |\n",
"| Combination methods | RRF (default), LINEAR | Weighted linear only |\n",
"| Alpha semantics | `linearAlpha` = text weight (default 0.3) | `alpha` = vector weight (default 0.7) |\n",
"| Score aliases | Yes (`yieldTextScoreAs`, etc.) | No |\n",
"| Fallback | Auto-falls back to AggregateHybridQuery | N/A |\n",
"\n",
"**Recommendation:** Use `HybridQuery` when possible; it automatically falls back to `AggregateHybridQuery` on older Redis versions via `toAggregateHybridQuery()`."
]
},
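{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to tell which path actually ran is to look at the scores. The following is a hedged observation from the recorded outputs, not documented behavior. In the first `HybridQuery` example above, labeled as the default RRF combination, john has a text score of 2.32630164941, a vector similarity of 1, and a hybrid score of 1.39789049482. That matches a weighted linear combination with a 0.3 text weight and a 0.7 vector weight:\n",
"\n",
"$$\n",
"0.3 \\times 2.32630164941 + 0.7 \\times 1 \\approx 1.397890\n",
"$$\n",
"\n",
"Assuming the standard rank-based form of RRF with a constant $k \\geq 1$ and ranks starting at 1, a score fused from two lists cannot exceed $2 / (k + 1) \\leq 1$. Hybrid scores above 1 therefore suggest that these recorded outputs came from the `AggregateHybridQuery` fallback rather than from native `FT.HYBRID`."
]
},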
{
"cell_type": "markdown",
"metadata": {},
@@ -1346,7 +1592,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 33,
"metadata": {
"vscode": {
"languageId": "java"
@@ -1387,8 +1633,9 @@
"5. **Filter Combinations**: AND/OR logic for complex queries\n",
"6. **Pure Filter Queries**: Traditional database-style queries without vector similarity\n",
"7. **Raw Query Strings**: Direct Redis query execution\n",
"8. **Hybrid Search**: True text + vector scoring with `HybridQuery` (FT.HYBRID, Redis 8.4+) and `AggregateHybridQuery` (FT.AGGREGATE, Redis 7.4+)\n",
"\n",
"These hybrid query capabilities make RedisVL a powerful tool for building sophisticated search applications that combine traditional database operations with vector similarity search."
"These query capabilities make RedisVL a powerful tool for building sophisticated search applications that combine traditional database operations with vector similarity search and hybrid text+vector scoring."
]
}
],