|
5 | 5 | "cell_type": "markdown", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# Complex Queries\n", |
| 8 | + "# Query\n", |
9 | 9 | "\n", |
10 | 10 | "In this notebook, we will explore more complex queries that can be performed with ``redisvl``.\n", |
11 | 11 | "\n", |
|
95 | 95 | "name": "stdout", |
96 | 96 | "output_type": "stream", |
97 | 97 | "text": [ |
98 | | - "\u001b[32m19:55:11\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n", |
99 | | - "\u001b[32m19:55:11\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_index\n" |
| 98 | + "\u001b[32m17:09:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n", |
| 99 | + "\u001b[32m17:09:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_index\n" |
100 | 100 | ] |
101 | 101 | } |
102 | 102 | ], |
|
120 | 120 | "cell_type": "markdown", |
121 | 121 | "metadata": {}, |
122 | 122 | "source": [ |
123 | | - "## Executing Hybrid Queries\n", |
| 123 | + "## Hybrid Queries\n", |
124 | 124 | "\n", |
125 | 125 | "Hybrid queries are queries that combine multiple types of filters. For example, you may want to search for a user that is a certain age, has a certain job, and is within a certain distance of a location. This is a hybrid query that combines numeric, tag, and geographic filters." |
126 | 126 | ] |
|
544 | 544 | "result_print(index.query(v))" |
545 | 545 | ] |
546 | 546 | }, |
| 547 | + { |
| 548 | + "cell_type": "markdown", |
| 549 | + "metadata": {}, |
| 550 | + "source": [ |
| 551 | + "## Filter Queries\n", |
| 552 | + "\n", |
| 553 | + "In some cases, you may not want to run a vector query, but just use a ``FilterExpression`` similar to a SQL query. The ``FilterQuery`` class enables this functionality. It is similar to the ``VectorQuery`` class but solely takes a ``FilterExpression``." |
| 554 | + ] |
| 555 | + }, |
| 556 | + { |
| 557 | + "cell_type": "code", |
| 558 | + "execution_count": 19, |
| 559 | + "metadata": {}, |
| 560 | + "outputs": [ |
| 561 | + { |
| 562 | + "data": { |
| 563 | + "text/html": [ |
| 564 | + "<table><tr><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr><tr><td>taimur</td><td>low</td><td>15</td><td>CEO</td></tr></table>" |
| 565 | + ], |
| 566 | + "text/plain": [ |
| 567 | + "<IPython.core.display.HTML object>" |
| 568 | + ] |
| 569 | + }, |
| 570 | + "metadata": {}, |
| 571 | + "output_type": "display_data" |
| 572 | + } |
| 573 | + ], |
| 574 | + "source": [ |
| 575 | + "from redisvl.query import FilterQuery\n", |
| 576 | + "\n", |
| 577 | + "has_low_credit = Tag(\"credit_score\") == \"low\"\n", |
| 578 | + "\n", |
| 579 | + "filter_query = FilterQuery(\n", |
| 580 | + " return_fields=[\"user\", \"credit_score\", \"age\", \"job\", \"location\"],\n", |
| 581 | + " filter_expression=has_low_credit\n", |
| 582 | + ")\n", |
| 583 | + "\n", |
| 584 | + "results = index.query(filter_query)\n", |
| 585 | + "\n", |
| 586 | + "result_print(results)" |
| 587 | + ] |
| 588 | + }, |
| 589 | + { |
| 590 | + "cell_type": "markdown", |
| 591 | + "metadata": {}, |
| 592 | + "source": [ |
| 593 | + "## Range Queries\n", |
| 594 | + "\n", |
| 595 | + "Range queries perform a vector search that returns only the results within a given vector ``distance_threshold``. This lets you find all records in a dataset that are similar to a query vector, where \"similar\" is defined by a quantitative value." |
| 596 | + ] |
| 597 | + }, |
| 598 | + { |
| 599 | + "cell_type": "code", |
| 600 | + "execution_count": 20, |
| 601 | + "metadata": {}, |
| 602 | + "outputs": [ |
| 603 | + { |
| 604 | + "data": { |
| 605 | + "text/html": [ |
| 606 | + "<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr><tr><td>0</td><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr><tr><td>0.109129190445</td><td>tyler</td><td>high</td><td>100</td><td>engineer</td></tr><tr><td>0.158809006214</td><td>tim</td><td>high</td><td>12</td><td>dermatologist</td></tr></table>" |
| 607 | + ], |
| 608 | + "text/plain": [ |
| 609 | + "<IPython.core.display.HTML object>" |
| 610 | + ] |
| 611 | + }, |
| 612 | + "metadata": {}, |
| 613 | + "output_type": "display_data" |
| 614 | + } |
| 615 | + ], |
| 616 | + "source": [ |
| 617 | + "from redisvl.query import RangeQuery\n", |
| 618 | + "\n", |
| 619 | + "range_query = RangeQuery(\n", |
| 620 | + " vector=[0.1, 0.1, 0.5],\n", |
| 621 | + " vector_field_name=\"user_embedding\",\n", |
| 622 | + " return_fields=[\"user\", \"credit_score\", \"age\", \"job\", \"location\"],\n", |
| 623 | + " distance_threshold=0.2\n", |
| 624 | + ")\n", |
| 625 | + "\n", |
| 626 | + "# same as the vector query or filter query\n", |
| 627 | + "results = index.query(range_query)\n", |
| 628 | + "\n", |
| 629 | + "result_print(results)" |
| 630 | + ] |
| 631 | + }, |
| 632 | + { |
| 633 | + "cell_type": "markdown", |
| 634 | + "metadata": {}, |
| 635 | + "source": [ |
| 636 | + "We can also change the distance threshold of the query object between uses. Here we set ``distance_threshold`` to ``0.1``, so the query returns only matches whose vector distance from the query vector is within 0.1. This is a smaller threshold than before, so we expect to get fewer matches." |
| 637 | + ] |
| 638 | + }, |
| 639 | + { |
| 640 | + "cell_type": "code", |
| 641 | + "execution_count": 21, |
| 642 | + "metadata": {}, |
| 643 | + "outputs": [ |
| 644 | + { |
| 645 | + "data": { |
| 646 | + "text/html": [ |
| 647 | + "<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr><tr><td>0</td><td>derrick</td><td>low</td><td>14</td><td>doctor</td></tr></table>" |
| 648 | + ], |
| 649 | + "text/plain": [ |
| 650 | + "<IPython.core.display.HTML object>" |
| 651 | + ] |
| 652 | + }, |
| 653 | + "metadata": {}, |
| 654 | + "output_type": "display_data" |
| 655 | + } |
| 656 | + ], |
| 657 | + "source": [ |
| 658 | + "range_query.set_distance_threshold(0.1)\n", |
| 659 | + "\n", |
| 660 | + "result_print(index.query(range_query))" |
| 661 | + ] |
| 662 | + }, |
| 663 | + { |
| 664 | + "cell_type": "markdown", |
| 665 | + "metadata": {}, |
| 666 | + "source": [ |
| 667 | + "Range queries can also be combined with filters like any other query type. The following limits the results to records with a ``job`` of ``engineer`` that also fall within the vector distance threshold." |
| 668 | + ] |
| 669 | + }, |
| 670 | + { |
| 671 | + "cell_type": "code", |
| 672 | + "execution_count": 22, |
| 673 | + "metadata": {}, |
| 674 | + "outputs": [ |
| 675 | + { |
| 676 | + "data": { |
| 677 | + "text/html": [ |
| 678 | + "<table><tr><th>vector_distance</th><th>user</th><th>credit_score</th><th>age</th><th>job</th></tr><tr><td>0</td><td>john</td><td>high</td><td>18</td><td>engineer</td></tr></table>" |
| 679 | + ], |
| 680 | + "text/plain": [ |
| 681 | + "<IPython.core.display.HTML object>" |
| 682 | + ] |
| 683 | + }, |
| 684 | + "metadata": {}, |
| 685 | + "output_type": "display_data" |
| 686 | + } |
| 687 | + ], |
| 688 | + "source": [ |
| 689 | + "is_engineer = Text(\"job\") == \"engineer\"\n", |
| 690 | + "\n", |
| 691 | + "range_query.set_filter(is_engineer)\n", |
| 692 | + "\n", |
| 693 | + "result_print(index.query(range_query))" |
| 694 | + ] |
| 695 | + }, |
547 | 696 | { |
548 | 697 | "cell_type": "markdown", |
549 | 698 | "metadata": {}, |
|
559 | 708 | }, |
560 | 709 | { |
561 | 710 | "cell_type": "code", |
562 | | - "execution_count": 19, |
| 711 | + "execution_count": 23, |
563 | 712 | "metadata": {}, |
564 | 713 | "outputs": [ |
565 | 714 | { |
|
598 | 747 | }, |
599 | 748 | { |
600 | 749 | "cell_type": "code", |
601 | | - "execution_count": 20, |
| 750 | + "execution_count": 24, |
602 | 751 | "metadata": {}, |
603 | 752 | "outputs": [ |
604 | 753 | { |
|
607 | 756 | "'@credit_score:{high}'" |
608 | 757 | ] |
609 | 758 | }, |
610 | | - "execution_count": 20, |
| 759 | + "execution_count": 24, |
611 | 760 | "metadata": {}, |
612 | 761 | "output_type": "execute_result" |
613 | 762 | } |
|
620 | 769 | }, |
621 | 770 | { |
622 | 771 | "cell_type": "code", |
623 | | - "execution_count": 21, |
| 772 | + "execution_count": 25, |
624 | 773 | "metadata": {}, |
625 | 774 | "outputs": [ |
626 | 775 | { |
627 | 776 | "name": "stdout", |
628 | 777 | "output_type": "stream", |
629 | 778 | "text": [ |
630 | | - "{'id': 'v1:dc45946a8bc74f47858617c91d593b43', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\\x00\\x00\\x00?'}\n", |
631 | | - "{'id': 'v1:5c628fdfbba247c6843955de04e3a00c', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\\x00\\x00\\x00?'}\n", |
632 | | - "{'id': 'v1:4f1cb6dd167149d59c9c108e09407fc9', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\\x00\\x00\\x00?'}\n", |
633 | | - "{'id': 'v1:f1720dbeb81c4316bedf21ca60357fdf', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\\x00\\x00\\x00?'}\n" |
| 779 | + "{'id': 'v1:d78adb45342c4404a9c40afd4e65f51b', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\\x00\\x00\\x00?'}\n", |
| 780 | + "{'id': 'v1:a0a202b6398840c5ab2263b1fd4e704a', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\\x00\\x00\\x00?'}\n", |
| 781 | + "{'id': 'v1:1f3b15dfb4ed490186859c1b2cb3df82', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\\x00\\x00\\x00?'}\n", |
| 782 | + "{'id': 'v1:465de540d9d54501b09b8e47a0116620', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\\x00\\x00\\x00?'}\n" |
634 | 783 | ] |
635 | 784 | } |
636 | 785 | ], |
|
653 | 802 | }, |
654 | 803 | { |
655 | 804 | "cell_type": "code", |
656 | | - "execution_count": 22, |
| 805 | + "execution_count": 26, |
657 | 806 | "metadata": {}, |
658 | 807 | "outputs": [ |
659 | 808 | { |
|
662 | 811 | "'((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])=>[KNN 10 @user_embedding $vector AS vector_distance] RETURN 6 user credit_score age job office_location vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 10'" |
663 | 812 | ] |
664 | 813 | }, |
665 | | - "execution_count": 22, |
| 814 | + "execution_count": 26, |
666 | 815 | "metadata": {}, |
667 | 816 | "output_type": "execute_result" |
668 | 817 | } |
|
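The range-query cells added in this diff return only the records whose vector distance from the query vector falls within ``distance_threshold``. The following is a minimal plain-Python sketch of that idea, not the ``redisvl`` or Redis implementation. It assumes cosine distance as the metric, treats the threshold as inclusive, and uses hypothetical embeddings for the users; ``cosine_distance`` and ``range_search`` are illustrative helper names, not library functions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def range_search(query, records, distance_threshold):
    """Return (name, distance) pairs within the threshold, nearest first.

    The inclusive comparison (<=) is an assumption made for this sketch.
    """
    hits = []
    for name, vector in records.items():
        distance = cosine_distance(query, vector)
        if distance <= distance_threshold:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical embeddings, chosen only for illustration.
records = {
    "john": [0.1, 0.1, 0.5],
    "tyler": [0.1, 0.4, 0.5],
    "tim": [0.4, 0.4, 0.5],
    "nancy": [0.7, 0.1, 0.5],
}
query = [0.1, 0.1, 0.5]

wide = range_search(query, records, distance_threshold=0.2)
narrow = range_search(query, records, distance_threshold=0.1)

print([name for name, _ in wide])
print([name for name, _ in narrow])
```

Lowering the threshold shrinks the result set without changing the query vector, which mirrors the effect of calling ``set_distance_threshold`` on the ``RangeQuery`` object between runs.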