<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>且听书吟</title>
        <link>https://stage.yufan.me</link>
        <description>Poetry and the faraway land of dreams</description>
        <lastBuildDate>Sat, 09 May 2026 12:25:32 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>WordPress 3.2.1</generator>
        <language>zh-CN</language>
        <image>
            <title>且听书吟</title>
            <url>https://stage.yufan.me/logo.svg</url>
            <link>https://stage.yufan.me</link>
        </image>
        <copyright>All rights reserved 2011, 雨帆</copyright>
        <category>Articles</category>
        <category>Musings</category>
        <category>Ramblings</category>
        <category>Programming</category>
        <category>Notes</category>
        <category>Fiction</category>
        <atom:link href="https://stage.yufan.me/tags/elasticsearch/feed" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Comparing Milvus with Elasticsearch]]></title>
            <link>https://stage.yufan.me/posts/vector-db-research</link>
            <guid isPermaLink="false">https://stage.yufan.me/posts/vector-db-research</guid>
            <pubDate>Wed, 15 Jan 2025 16:13:32 GMT</pubDate>
            <description><![CDATA[Based on technical research, this article examines whether splitting data across Elasticsearch indexes affects query results in AI search scenarios.]]></description>
            <content:encoded><![CDATA[<link rel="preload" as="image" href="https://cat.yufan.me/images/2025/01/2025011601423200.jpg"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-nodes.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/multiple-milvus-architecture.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/different-milvus-clusters.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/milvus-distributed-architecture.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/index-flow-in-elasticsearch.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-index-request-flow.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/data-writing-flow-in-milvus.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/milvus-data-writing-overview.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/data-model-elasticsearch.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/data-model-in-milvus.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/milvus-shard.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/milvus-segments.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/query-flow-in-elasticsearch.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/fetch-flow-in-elasticsearch.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-knn-flow.jpg"/><link rel="preload" as="image" 
href="https://cat.yufan.me/images/recaps/vector-db-research/query-flow-in-milvus.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/tf-idf-explain.png"/><link rel="preload" as="image" href="https://cat.yufan.me/images/recaps/vector-db-research/sparse-bm25-in-milvus.png"/><img src="https://cat.yufan.me/images/2025/01/2025011601423200.jpg" alt=""/>
<h2 id="background">Background<a href="#background"><span class="icon icon-link"></span></a></h2>
<p>In Elasticsearch application scenarios, storing large volumes of data can significantly degrade Elasticsearch&#x27;s read and write performance, so indexes need to be split by data type. Based on technical research, this article examines whether splitting data across Elasticsearch indexes affects query results in AI search scenarios. It also compares how other vector databases in the industry are implemented with our current Elasticsearch-based approach.</p>
<h2 id="goals">Goals<a href="#goals"><span class="icon icon-link"></span></a></h2>
<ol>
<li>
<p>Elasticsearch vs. Milvus: Comparison in AIC use cases</p>
<p>Investigate the data storage mechanisms and query processes of mainstream vector databases in the industry (Qdrant, Milvus). Analyze in depth how they handle data updates (such as incremental updates and deletions) and compare them with Elasticsearch.</p>
</li>
<li>
<p>The impact of single-table and multi-table design on similarity calculation in the Elasticsearch BM25 model</p>
<p>Study how single-index and multi-index structures in Elasticsearch differ in BM25 calculation, particularly in their impact on efficiency and accuracy.</p>
</li>
</ol>
<h2 id="elasticsearch-vs-milvus-comparison-in-storage-query-etc">Elasticsearch vs. Milvus: Comparison in Storage, Query, etc.<a href="#elasticsearch-vs-milvus-comparison-in-storage-query-etc"><span class="icon icon-link"></span></a></h2>
<h3 id="overall-architecture">Overall Architecture<a href="#overall-architecture"><span class="icon icon-link"></span></a></h3>
<h4 id="elasticsearch-architecture">Elasticsearch Architecture<a href="#elasticsearch-architecture"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-nodes.png" alt=""/>
<p>The Elasticsearch architecture is straightforward. Any node in a cluster can handle a request and forward it to the appropriate data nodes for searching. We scale up or down through blue-green deployments, which helps meet our stability requirements.</p>
<p><strong>Cons</strong>: We currently use only two types of Elasticsearch nodes: data nodes and master nodes. Every data node takes on all the remaining roles, so responsibilities are less clearly separated than in Milvus&#x27;s architecture.</p>
<h4 id="multiple-milvus-architecture">Multiple Milvus Architecture<a href="#multiple-milvus-architecture"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/multiple-milvus-architecture.png" alt=""/>
<p>Milvus Lite bundles the core search engine with embedded storage for local prototyping. It ships as a Python library and can be integrated into any Python AI project.</p>
<p>Milvus Standalone runs on Docker Compose with a Milvus instance, a MinIO instance, and an etcd instance. Milvus Distributed is used in the cloud and in production, with all the required modules. Unless stated otherwise, this report refers to Milvus Distributed.</p>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/different-milvus-clusters.png" alt=""/>
<h4 id="milvus-distributed-architecture">Milvus Distributed Architecture<a href="#milvus-distributed-architecture"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/milvus-distributed-architecture.png" alt=""/>
<p>Milvus has a shared storage massively parallel processing (MPP) architecture, with storage and computing resources independent of one another. The data and the control plane are disaggregated, and its architecture comprises four layers: access layer, coordinator services, worker nodes, and storage. Each layer is independent of the others for better disaster recovery and scalability.</p>
<ul>
<li><strong>Access Layer</strong>: This layer is the endpoint for users. Composed of stateless proxies, the access layer validates client requests and aggregates the results before returning them to the client. The proxies use load-balancing components such as Nginx and NodePort to provide a unified service address.</li>
<li><strong>Coordinator Service</strong>: This layer serves as the system’s brain, assigning tasks to worker nodes. The coordinator service layer performs critical operations, including data management, load balancing, data declaration, cluster topology management, and timestamp generation.</li>
<li><strong>Worker Nodes</strong>: The worker nodes follow the instructions from the coordinator service layer and execute data manipulation language (DML) commands. Due to the separation of computing and storage, these nodes are stateless in nature. When deployed on Kubernetes, the worker nodes facilitate disaster recovery and system scale-out.</li>
<li><strong>Storage</strong>: Responsible for data persistence, the storage layer consists of meta storage, a log broker, and object storage. Meta storage stores snapshots of metadata, such as message consumption checkpoints and node status. Object storage stores snapshots of index files, logs, and intermediate query results. The log broker functions as a pub-sub system that supports data playback and recovery.</li>
</ul>
<p>Even a minimal standalone Milvus deployment needs an object storage service (OSS) such as MinIO or S3, a standalone etcd instance, and a Milvus instance. The architecture is fairly complex and is mainly deployed and operated on Kubernetes.</p>
<h4 id="summary">Summary<a href="#summary"><span class="icon icon-link"></span></a></h4>
<table><tbody><tr><td></td><td>Elasticsearch</td><td>Milvus</td></tr><tr><td>Complexity</td><td>Simple; only master nodes and data nodes.</td><td><p>Complex; requires OSS, etcd, and several types of Milvus nodes.</p><br/><p>It can, however, be deployed on Amazon EKS.</p></td></tr><tr><td>Potential Bottleneck</td><td><p>As the Elasticsearch cluster grows, more replicas may be needed to balance queries and avoid hot spots.</p></td><td><p>etcd needs high-performance disks to serve metadata well and could become a bottleneck as query volume grows.</p><br/><p>Files in object storage must be pulled to local disk and then loaded into memory before they can be queried. If this loading happens frequently, performance may suffer.</p></td></tr><tr><td>Scaling</td><td>Requires a blue-green deployment to scale the online cluster.</td><td>Easy to scale on Kubernetes; the number of compute node instances can be changed on demand.</td></tr><tr><td>Storage</td><td><p>Local disks on each data node; adding storage means adding data nodes. S3 is used only for backups.</p></td><td>OSS-based. S3 can be used to store all of the data.</td></tr><tr><td>AA Switch</td><td>Requires two identical Elasticsearch clusters.</td><td>No AA switch is needed; just reload the query nodes or add more of them.</td></tr><tr><td>Upgrade</td><td>Same as scaling.</td><td>Performed with Helm commands on the Kubernetes cluster.</td></tr></tbody></table>
<h3 id="data-writing-flow">Data Writing Flow<a href="#data-writing-flow"><span class="icon icon-link"></span></a></h3>
<h4 id="index-flow-in-elasticsearch">Index Flow in Elasticsearch<a href="#index-flow-in-elasticsearch"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/index-flow-in-elasticsearch.png" alt=""/>
<p>In this diagram, we can see how a new document is stored by Elasticsearch. As soon as it “arrives”, it is committed to a transaction log called “translog” and to a memory buffer. The translog is how Elasticsearch can recover data that was only in memory in case of a crash.</p>
<p>All the documents in the memory buffer will generate a single in-memory Lucene segment when the “refresh” operation happens. This operation is used to make new documents available for search.</p>
<p>Eventually, depending on various triggers, a flush performs a Lucene commit that persists the segments to disk, after which the translog is cleared. Separately, background merges combine smaller segments into larger ones.</p>
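<p>The write path described above can be sketched as a toy Python model of a single shard. This is an illustration under simplifying assumptions, not Elasticsearch code: <code>ToyShard</code> and its methods are hypothetical, segments are plain lists of strings, and disk persistence is not modelled.</p>

```python
class ToyShard:
    """Toy model of one Elasticsearch shard's write path (a simplification, not real code)."""

    def __init__(self):
        self.translog = []   # operations since the last flush, replayed after a crash
        self.buffer = []     # in-memory indexing buffer, not yet searchable
        self.segments = []   # searchable segments, each modelled as a list of documents

    def index(self, doc):
        # A new document is written to both the translog and the memory buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        # Refresh turns the buffered documents into one new segment, making them searchable.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Flush refreshes, commits the segments to disk (not modelled) and clears the translog.
        self.refresh()
        self.translog.clear()

    def merge(self):
        # Background merging combines small segments into a larger one.
        if len(self.segments) > 1:
            self.segments = [[doc for seg in self.segments for doc in seg]]

    def search(self, word):
        # Only documents that have reached a segment are visible to search.
        return [doc for seg in self.segments for doc in seg if word in doc]
```

<p>Documents indexed before a refresh are not yet searchable, while the translog already holds them for crash recovery.</p>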
<img src="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-index-request-flow.png" alt=""/>
<p>This diagram shows the complete path of a simple index request.</p>
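<p>The first hop of an index request is choosing the primary shard. A minimal sketch, assuming the simplified routing formula <code>shard = hash(_routing) % number_of_primary_shards</code> with the document ID as the routing value. Elasticsearch uses a Murmur3 hash; <code>zlib.crc32</code> below is only a deterministic stand-in, and the helpers <code>route_to_shard</code> and <code>index_request</code> are hypothetical.</p>

```python
import zlib


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    # Simplified routing formula: shard = hash(_routing) % number_of_primary_shards.
    # Elasticsearch hashes with Murmur3; crc32 is only a deterministic stand-in here.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def index_request(doc_id: str, doc: dict, shards: list) -> int:
    """Toy index request: route to the primary shard, then copy the write to its replicas."""
    shard_num = route_to_shard(doc_id, len(shards))
    target = shards[shard_num]
    target["primary"][doc_id] = doc        # 1. the primary shard indexes the document
    for replica in target["replicas"]:
        replica[doc_id] = doc              # 2. the primary forwards the operation to each replica
    return shard_num
```

<p>Because the shard number depends on the primary shard count, changing that count would reroute existing documents, which is one reason it is fixed when the index is created.</p>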
<h4 id="data-writing-flow-in-milvus">Data Writing Flow in Milvus<a href="#data-writing-flow-in-milvus"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/data-writing-flow-in-milvus.png" alt=""/>
<p>The diagram above shows all the modules involved in writing data. Every write request originates from the SDK, which sends it through the load balancer to a proxy node; the number of proxy instances can vary. The proxy caches the data and requests segment information before writing the data into the message storage.</p>
<p>The message storage is mainly a Pulsar-based platform that persists the data, playing the same role as the translog in Elasticsearch. The main difference is that Milvus does not need an MQ service in front of it: you can write data directly through its interface, without the bulk requests used with Elasticsearch.</p>
<p>The data node consumes the data from the message storage and eventually flushes it to object storage.</p>
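<p>A toy Python model of this write path, under loud assumptions: the class <code>ToyMilvusWritePath</code> is hypothetical, each shard maps to one in-memory channel standing in for the Pulsar log broker, the primary key is hashed with <code>zlib.crc32</code>, and the data node flushes after a fixed row count (<code>flush_rows</code>) rather than following Milvus&#x27;s real flush policies.</p>

```python
import zlib
from collections import defaultdict, deque


class ToyMilvusWritePath:
    """Toy Milvus write path: proxy -> log broker channel -> data node -> object storage."""

    def __init__(self, num_shards: int = 2, flush_rows: int = 3):
        self.channels = [deque() for _ in range(num_shards)]  # stand-in for the log broker
        self.buffers = defaultdict(list)                     # data node: unflushed rows per shard
        self.object_storage = []                             # flushed (shard, rows) batches
        self.flush_rows = flush_rows

    def insert(self, pk, vector) -> int:
        # Proxy: choose the shard by hashing the primary key, then publish to its channel.
        shard = zlib.crc32(str(pk).encode("utf-8")) % len(self.channels)
        self.channels[shard].append((pk, vector))
        return shard

    def consume(self) -> None:
        # Data node: drain each channel and flush a shard's buffer once it is large enough.
        for shard, channel in enumerate(self.channels):
            while channel:
                self.buffers[shard].append(channel.popleft())
                if len(self.buffers[shard]) >= self.flush_rows:
                    self.object_storage.append((shard, self.buffers[shard]))
                    self.buffers[shard] = []
```

<p>Each <code>insert</code> call is an individual write; no client-side batching into bulk requests is needed.</p>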
<img src="https://cat.yufan.me/images/recaps/vector-db-research/milvus-data-writing-overview.png" alt=""/>
<h3 id="data-model-in-vector">Vector Data Model<a href="#data-model-in-vector"><span class="icon icon-link"></span></a></h3>
<h4 id="data-model-elasticsearch">Data Model in Elasticsearch<a href="#data-model-elasticsearch"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/data-model-elasticsearch.png" alt=""/>
<p>As the diagram shows, Elasticsearch splits each index into shards distributed across the available nodes. A shard can be a primary shard or a replica shard. Each shard is a Lucene index, each Lucene index can have multiple segments, and each segment contains its own complete HNSW graph.</p>
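<p>Because every segment holds its own HNSW graph, a kNN search on a shard has to search each segment and merge the candidates. A minimal sketch, assuming exact cosine similarity as a stand-in for the HNSW graph search; <code>search_segment</code> and <code>search_shard</code> are hypothetical helpers.</p>

```python
import heapq
import math


def cosine(a, b):
    # Cosine similarity of two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_segment(segment, query, k):
    # Stand-in for one segment's HNSW search: exact scan, keep the k most similar docs.
    return heapq.nlargest(k, ((cosine(vec, query), doc_id) for doc_id, vec in segment))


def search_shard(segments, query, k):
    # Every segment has its own graph, so search each one and merge the candidates.
    candidates = [hit for seg in segments for hit in search_segment(seg, query, k)]
    return heapq.nlargest(k, candidates)
```

<p>The more segments a shard has, the more graphs it must search, which is one reason merging segments tends to help kNN query performance.</p>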
<h4 id="data-model-in-milvus">Data Model in Milvus<a href="#data-model-in-milvus"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/data-model-in-milvus.png" alt=""/>
<p>The top-level concept Milvus offers users is the Collection, which maps to a table in a traditional database and is equivalent to an index in Elasticsearch. Each Collection is divided into multiple Shards, two by default. The number of Shards depends on how much data you need to write and across how many nodes you want to distribute the writes.</p>
<p>Each Shard contains many Partitions, each with its own data attributes. Shards are divided by the hash of the primary key, while Partitions are usually divided by fields or Partition tags that you specify. Common partitioning schemes include the date the data was ingested, user gender, or user age. A major advantage of Partitions is that adding a Partition tag to a query lets Milvus filter out much of the data up front.</p>
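<p>Sharding and partition pruning can be sketched as follows. The <code>ToyCollection</code> class is hypothetical: it keys rows by (shard, partition), hashes the primary key with <code>zlib.crc32</code> (an assumption, not Milvus&#x27;s hash function), and uses <code>_default</code> as the partition name when none is given.</p>

```python
import zlib
from collections import defaultdict


class ToyCollection:
    """Toy collection: rows are sharded by primary-key hash and grouped by partition tag."""

    def __init__(self, num_shards: int = 2):
        self.num_shards = num_shards
        self.data = defaultdict(list)   # (shard, partition) -> [(pk, row), ...]

    def insert(self, pk, row, partition: str = "_default") -> None:
        # Shards split the write load by primary-key hash; partitions follow a user tag.
        shard = zlib.crc32(str(pk).encode("utf-8")) % self.num_shards
        self.data[(shard, partition)].append((pk, row))

    def scan(self, partitions=None):
        # With partition tags, only matching partitions are read; the rest are skipped.
        return [
            (pk, row)
            for (_shard, part), rows in self.data.items()
            if partitions is None or part in partitions
            for pk, row in rows
        ]
```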
<img src="https://cat.yufan.me/images/recaps/vector-db-research/milvus-shard.png" alt=""/>
<p>Shards mainly help scale write operations, while Partitions help improve read performance. Each Partition within a Shard corresponds to many small Segments. A Segment is the smallest unit of scheduling in the whole system and is either a Growing Segment or a Sealed Segment. Query Nodes subscribe to Growing Segments, into which users keep writing data until the segment is large enough. Once it reaches the default limit of 512 MB, it stops accepting writes and becomes a Sealed Segment, and vector indexes are then built on it.</p>
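<p>The growing-to-sealed lifecycle can be sketched as a small state machine. This is an assumption-laden toy: <code>ToySegmentManager</code> is hypothetical, a row count (<code>max_rows</code>) stands in for the 512 MB size limit, and index building is reduced to an <code>indexed</code> flag.</p>

```python
class ToySegmentManager:
    """Toy growing/sealed segment lifecycle; max_rows stands in for the size limit."""

    def __init__(self, max_rows: int = 3):
        self.max_rows = max_rows
        self.growing = []   # the segment that still accepts writes
        self.sealed = []    # read-only segments

    def write(self, vector) -> None:
        self.growing.append(vector)
        if len(self.growing) >= self.max_rows:
            self.seal()

    def seal(self) -> None:
        # A full growing segment stops accepting writes and becomes sealed;
        # building its vector index is reduced to an "indexed" flag here.
        self.sealed.append({"rows": tuple(self.growing), "indexed": True})
        self.growing = []
```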
<img src="https://cat.yufan.me/images/recaps/vector-db-research/milvus-segments.png" alt=""/>
<p>Storage is organized by segment and uses a columnar layout: the primary key, each scalar column, and the vector field are stored in separate files.</p>
<h3 id="vector-query">Vector Query<a href="#vector-query"><span class="icon icon-link"></span></a></h3>
<h4 id="index-types">Index Types<a href="#index-types"><span class="icon icon-link"></span></a></h4>
<p>Both Elasticsearch and Milvus require memory to load vector files and perform queries. However, Milvus also offers a disk-based index type, DiskANN, for large datasets; it keeps most of the index on disk instead of loading all of the data into memory, which reduces memory consumption.</p>
<p>For approximate kNN search, Elasticsearch relies on HNSW over <code>dense_vector</code> fields, and the default element type is <code>float</code>. Elasticsearch also provides quantized HNSW variants that shrink the index or speed up search: to use one, set the index type to <code>int8_hnsw</code>, <code>int4_hnsw</code>, or <code>bbq_hnsw</code>.</p>
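<p>The idea behind <code>int8_hnsw</code> is scalar quantization: each float32 dimension is mapped to one signed byte, cutting vector memory roughly by four. The sketch below uses plain min-max quantization over the whole vector set; Elasticsearch chooses the quantization range differently, so treat <code>quantize_int8</code> and <code>dequantize</code> as hypothetical illustrations only.</p>

```python
from array import array


def quantize_int8(vectors):
    """Min-max scalar quantization of float vectors to signed bytes (simplified)."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    scale = (hi - lo) / 255 or 1.0   # width of one quantization step; guard constant input
    codes = [array("b", (round((x - lo) / scale) - 128 for x in v)) for v in vectors]
    return codes, lo, scale


def dequantize(code, lo, scale):
    # Map each byte back to an approximate float value.
    return [(c + 128) * scale + lo for c in code]
```

<p>The reconstruction error per dimension is at most half a quantization step, which is the small recall compromise quantized indexes accept.</p>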
<table><tbody><tr><td>Milvus index</td><td>Classification</td><td>Scenario</td></tr><tr><td>FLAT</td><td>N/A</td><td><ul><li>Relatively small dataset</li><li>Requires a 100% recall rate</li></ul></td></tr><tr><td>IVF_FLAT</td><td>N/A</td><td><ul><li>High-speed query</li><li>Requires a recall rate as high as possible</li></ul></td></tr><tr><td>IVF_SQ8</td><td>Quantization-based index</td><td><ul><li>Very high-speed query</li><li>Limited memory resources</li><li>Accepts minor compromise in recall rate</li></ul></td></tr><tr><td>IVF_PQ</td><td>Quantization-based index</td><td><ul><li>High-speed query</li><li>Limited memory resources</li><li>Accepts minor compromise in recall rate</li></ul></td></tr><tr><td>HNSW</td><td>Graph-based index</td><td><ul><li>Very high-speed query</li><li>Requires a recall rate as high as possible</li><li>Large memory resources</li></ul></td></tr><tr><td>HNSW_SQ</td><td>Quantization-based index</td><td><ul><li>Very high-speed query</li><li>Limited memory resources</li><li>Accepts minor compromise in recall rate</li></ul></td></tr><tr><td>HNSW_PQ</td><td>Quantization-based index</td><td><ul><li>Medium-speed query</li><li>Very limited memory resources</li><li>Accepts minor compromise in recall rate</li></ul></td></tr><tr><td>HNSW_PRQ</td><td>Quantization-based index</td><td><ul><li>Medium-speed query</li><li>Very limited memory resources</li><li>Accepts minor compromise in recall rate</li></ul></td></tr><tr><td>SCANN</td><td>Quantization-based index</td><td><ul><li>Very high-speed query</li><li>Requires a recall rate as high as possible</li><li>Large memory resources</li></ul></td></tr></tbody></table>
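<p>To make the IVF family in the table concrete, here is a minimal IVF_FLAT sketch. It assumes the centroids are already given (a real IVF index trains them with k-means, with <code>nlist</code> controlling how many there are) and probes the <code>nprobe</code> closest inverted lists at query time; <code>build_ivf</code> and <code>search_ivf</code> are hypothetical helpers using squared L2 distance.</p>

```python
import heapq


def l2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    # Assign every vector to its nearest centroid's inverted list (IVF_FLAT keeps raw vectors).
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        lists[nearest].append((vid, vec))
    return lists


def search_ivf(lists, centroids, query, k, nprobe):
    # Probe only the nprobe closest lists, then scan their vectors exactly.
    probe = heapq.nsmallest(nprobe, range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = ((l2(query, vec), vid) for i in probe for vid, vec in lists[i])
    return [vid for _, vid in heapq.nsmallest(k, candidates)]
```

<p>A larger <code>nprobe</code> scans more lists, trading speed for recall; FLAT corresponds to scanning every vector.</p>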
<h4 id="query-flow-in-elasticsearch">Query Flow in Elasticsearch<a href="#query-flow-in-elasticsearch"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/query-flow-in-elasticsearch.png" alt=""/>
<p>The query phase above consists of the following three steps:</p>
<ol>
<li>The client sends a <strong>search</strong> request to <strong>Node 3</strong>, which creates an empty priority queue of size <strong>from + size</strong>.</li>
<li><strong>Node 3</strong> forwards the search request to a primary or replica copy of every shard in the index. Each shard executes the query locally and adds the results into a local sorted priority queue of size <strong>from + size</strong>.</li>
<li>Each shard returns the doc IDs and sort values of all the docs in its priority queue to the coordinating node, <strong>Node 3</strong>, which merges these values into its own priority queue to produce a globally sorted list of results.</li>
</ol>
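<p>The query phase can be sketched in a few lines of Python. The helpers <code>shard_query</code> and <code>coordinate_query</code> are hypothetical; a <code>scorer</code> callback stands in for real relevance scoring, and <code>from_</code> is used because <code>from</code> is a Python keyword.</p>

```python
import heapq


def shard_query(docs, scorer, from_, size):
    # Each shard keeps a local priority queue of its top (from + size) hits.
    return heapq.nlargest(from_ + size, ((scorer(doc), doc_id) for doc_id, doc in docs.items()))


def coordinate_query(shards, scorer, from_, size):
    # The coordinating node merges every shard's queue into a global top (from + size),
    # then keeps only the requested page for the fetch phase.
    merged = [
        (score, shard_id, doc_id)
        for shard_id, docs in enumerate(shards)
        for score, doc_id in shard_query(docs, scorer, from_, size)
    ]
    return heapq.nlargest(from_ + size, merged)[from_:from_ + size]
```

<p>Each shard must return <code>from + size</code> hits because any of them could land on the requested page, which is why deep pagination gets expensive.</p>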
<img src="https://cat.yufan.me/images/recaps/vector-db-research/fetch-flow-in-elasticsearch.png" alt=""/>
<p>The distributed fetch phase consists of the following steps:</p>
<ol>
<li>The coordinating node identifies which documents need to be fetched and issues a multi <code>GET</code> request to the relevant shards.</li>
<li>Each shard loads the documents and enriches them, if required, and then returns the documents to the coordinating node.</li>
<li>Once all documents have been fetched, the coordinating node returns the results to the client.</li>
</ol>
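<p>The fetch phase can be sketched similarly. <code>fetch_phase</code> is a hypothetical helper that takes the ordered <code>(score, shard_id, doc_id)</code> hits produced by the query phase, groups them by shard (one multi-get per shard), and returns the documents in global order; enrichment such as highlighting is omitted.</p>

```python
from collections import defaultdict


def fetch_phase(page_hits, shard_stores):
    """page_hits: ordered (score, shard_id, doc_id) tuples produced by the query phase."""
    # 1. Group the requested doc IDs by shard: one multi-get style request per shard.
    by_shard = defaultdict(list)
    for _, shard_id, doc_id in page_hits:
        by_shard[shard_id].append(doc_id)
    # 2. Each shard loads its documents (enrichment such as highlighting is omitted).
    loaded = {}
    for shard_id, doc_ids in by_shard.items():
        for doc_id in doc_ids:
            loaded[(shard_id, doc_id)] = shard_stores[shard_id][doc_id]
    # 3. Return the documents in the global order decided by the query phase.
    return [loaded[(shard_id, doc_id)] for _, shard_id, doc_id in page_hits]
```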
<img src="https://cat.yufan.me/images/recaps/vector-db-research/elasticsearch-knn-flow.jpg" alt=""/>
<h4 id="query-flow-in-milvus">Query Flow in Milvus<a href="#query-flow-in-milvus"><span class="icon icon-link"></span></a></h4>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/query-flow-in-milvus.png" alt=""/>
<p>In the reading path, query requests are broadcast through DqRequestChannel, and query results are aggregated to the proxy via gRPC.</p>
<p>As a producer, the proxy writes query requests into DqRequestChannel. Query Nodes consume DqRequestChannel in an unusual way: every Query Node subscribes to this Channel, so each message in it is broadcast to all Query Nodes.</p>
<p>After receiving a request, the Query Node performs a local query and aggregates at the Segment level before sending the aggregated result back to the corresponding Proxy via gRPC. It should be noted that there is a unique ProxyID in the query request identifying its originator. Based on this, different query results are routed by Query Nodes to their respective Proxies.</p>
<p>Once the proxy determines that it has collected results from all Query Nodes, it performs a global aggregation to obtain the final query result and returns it to the client. Each query and its results carry the same unique RequestID, which the proxy uses to tell which set of results belongs to which request.</p>
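<p>The read path above can be condensed into a toy model. Everything here is hypothetical: <code>query_node_search</code> and <code>ToyProxy</code> are illustrative names, the broadcast over DqRequestChannel is replaced by a direct loop, gRPC responses are plain dicts, and squared L2 distance is computed by exact scan.</p>

```python
import heapq
import itertools


def query_node_search(segments, query_vec, k):
    # Search every local segment, then reduce at the segment level to this node's top k.
    hits = []
    for segment in segments:
        scored = ((sum((a - b) ** 2 for a, b in zip(vec, query_vec)), pk) for pk, vec in segment)
        hits.extend(heapq.nsmallest(k, scored))
    return heapq.nsmallest(k, hits)


class ToyProxy:
    """Toy proxy; request and response dicts are illustrative, not a real wire format."""

    def __init__(self, proxy_id, query_nodes):
        self.proxy_id = proxy_id
        self.query_nodes = query_nodes   # each query node is modelled as a list of segments
        self.pending = {}                # RequestID -> partial results received so far
        self.next_request_id = 0

    def receive(self, response):
        # Results are routed back by ProxyID and grouped by RequestID.
        if response["proxy_id"] != self.proxy_id:
            raise ValueError("response routed to the wrong proxy")
        self.pending[response["request_id"]].append(response["hits"])

    def search(self, query_vec, k):
        self.next_request_id += 1
        request = {"request_id": self.next_request_id, "proxy_id": self.proxy_id}
        self.pending[request["request_id"]] = []
        for segments in self.query_nodes:   # stand-in for broadcasting on DqRequestChannel
            self.receive({**request, "hits": query_node_search(segments, query_vec, k)})
        partials = self.pending.pop(request["request_id"])
        if len(partials) != len(self.query_nodes):
            raise RuntimeError("not all query nodes answered")
        # Global aggregation across query nodes.
        return [pk for _, pk in heapq.nsmallest(k, itertools.chain.from_iterable(partials))]
```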
<h2 id="compare-bm25-between-elasticsearch-and-milvus">Compare BM25 between Elasticsearch and Milvus<a href="#compare-bm25-between-elasticsearch-and-milvus"><span class="icon icon-link"></span></a></h2>
<h3 id="why-we-still-care-about-bm25-in-rag">Why we still care about BM25 in RAG<a href="#why-we-still-care-about-bm25-in-rag"><span class="icon icon-link"></span></a></h3>
<p>Hybrid Search has long been an important way to improve the quality of Retrieval-Augmented Generation (RAG) search. Dense embedding-based search has made remarkable progress in capturing deep semantic interactions between queries and documents as model scale and pre-training datasets have grown, but it still has notable limitations, such as poor interpretability and suboptimal performance on long-tail queries and rare terms.</p>
<p>For many RAG applications, pre-trained models often lack domain-specific corpus support, and in some scenarios, their performance is even inferior to BM25-based keyword matching retrieval. Against this backdrop, Hybrid Search combines the semantic understanding capabilities of dense vector search with the precision of keyword matching, offering a more efficient solution to address these challenges. It has become a key technology for enhancing search effectiveness.</p>
<h3 id="how-to-calculate-bm25">How to calculate BM25<a href="#how-to-calculate-bm25"><span class="icon icon-link"></span></a></h3>
<p>BM25 (best matching) is a ranking function used by search engines to estimate the relevance of documents to a given search query.</p>
<mjx-container class="MathJax" jax="SVG" display="true"><svg style="vertical-align:-3.98ex" xmlns="http://www.w3.org/2000/svg" width="60.083ex" height="7.515ex" role="img" focusable="false" viewBox="0 -1562.5 26556.9 3321.5" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mtext"><use data-c="73" xlink:href="#MJX-TEX-N-73"></use><use data-c="63" xlink:href="#MJX-TEX-N-63" transform="translate(394,0)"></use><use data-c="6F" xlink:href="#MJX-TEX-N-6F" transform="translate(838,0)"></use><use data-c="72" xlink:href="#MJX-TEX-N-72" transform="translate(1338,0)"></use><use data-c="65" xlink:href="#MJX-TEX-N-65" transform="translate(1730,0)"></use></g><g data-mml-node="mo" transform="translate(2174,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="mi" transform="translate(2563,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(3391,0)"><use data-c="2C" xlink:href="#MJX-TEX-N-2C"></use></g><g data-mml-node="mi" transform="translate(3835.7,0)"><use data-c="1D444" xlink:href="#MJX-TEX-I-1D444"></use></g><g data-mml-node="mo" transform="translate(4626.7,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(5293.4,0)"><use data-c="3D" xlink:href="#MJX-TEX-N-3D"></use></g><g data-mml-node="munderover" transform="translate(6349.2,0)"><g data-mml-node="mo"><use data-c="2211" xlink:href="#MJX-TEX-LO-2211"></use></g><g data-mml-node="TeXAtom" transform="translate(148.2,-1087.9) scale(0.707)" data-mjx-texclass="ORD"><g data-mml-node="mi"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g><g data-mml-node="mo" transform="translate(345,0)"><use data-c="3D" xlink:href="#MJX-TEX-N-3D"></use></g><g data-mml-node="mn" transform="translate(1123,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g></g><g data-mml-node="TeXAtom" 
transform="translate(509.9,1150) scale(0.707)" data-mjx-texclass="ORD"><g data-mml-node="mi"><use data-c="1D45B" xlink:href="#MJX-TEX-I-1D45B"></use></g></g></g><g data-mml-node="mtext" transform="translate(7959.9,0)"><use data-c="49" xlink:href="#MJX-TEX-N-49"></use><use data-c="44" xlink:href="#MJX-TEX-N-44" transform="translate(361,0)"></use><use data-c="46" xlink:href="#MJX-TEX-N-46" transform="translate(1125,0)"></use></g><g data-mml-node="mo" transform="translate(9737.9,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(10126.9,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(10899.8,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(11511.1,0)"><use data-c="22C5" xlink:href="#MJX-TEX-N-22C5"></use></g><g data-mml-node="mfrac" transform="translate(12011.3,0)"><g data-mml-node="mrow" transform="translate(3495.8,710)"><g data-mml-node="mi"><use data-c="1D453" xlink:href="#MJX-TEX-I-1D453"></use></g><g data-mml-node="mo" transform="translate(550,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(939,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(1712,0)"><use data-c="2C" xlink:href="#MJX-TEX-N-2C"></use></g><g data-mml-node="mi" transform="translate(2156.6,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(2984.6,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(3595.8,0)"><use 
data-c="22C5" xlink:href="#MJX-TEX-N-22C5"></use></g><g data-mml-node="mo" transform="translate(4096.1,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(4485.1,0)"><g data-mml-node="mi"><use data-c="1D458" xlink:href="#MJX-TEX-I-1D458"></use></g><g data-mml-node="mn" transform="translate(554,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g></g><g data-mml-node="mo" transform="translate(5664.8,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="mn" transform="translate(6665.1,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g><g data-mml-node="mo" transform="translate(7165.1,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g></g><g data-mml-node="mrow" transform="translate(220,-1109.5)"><g data-mml-node="mi"><use data-c="1D453" xlink:href="#MJX-TEX-I-1D453"></use></g><g data-mml-node="mo" transform="translate(550,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(939,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(1712,0)"><use data-c="2C" xlink:href="#MJX-TEX-N-2C"></use></g><g data-mml-node="mi" transform="translate(2156.6,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(2984.6,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(3595.8,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="msub" transform="translate(4596.1,0)"><g data-mml-node="mi"><use data-c="1D458" xlink:href="#MJX-TEX-I-1D458"></use></g><g data-mml-node="mn" transform="translate(554,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g></g><g data-mml-node="mo" 
transform="translate(5775.8,0)"><use data-c="22C5" xlink:href="#MJX-TEX-N-22C5"></use></g><g data-mml-node="mrow" transform="translate(6276.1,0)"><g data-mml-node="mo" transform="translate(0 -0.5)"><use data-c="28" xlink:href="#MJX-TEX-LO-28"></use></g><g data-mml-node="mn" transform="translate(597,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g><g data-mml-node="mo" transform="translate(1319.2,0)"><use data-c="2212" xlink:href="#MJX-TEX-N-2212"></use></g><g data-mml-node="mi" transform="translate(2319.4,0)"><use data-c="1D44F" xlink:href="#MJX-TEX-I-1D44F"></use></g><g data-mml-node="mo" transform="translate(2970.7,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="mi" transform="translate(3970.9,0)"><use data-c="1D44F" xlink:href="#MJX-TEX-I-1D44F"></use></g><g data-mml-node="mo" transform="translate(4622.1,0)"><use data-c="22C5" xlink:href="#MJX-TEX-N-22C5"></use></g><g data-mml-node="mfrac" transform="translate(5122.3,0)"><g data-mml-node="mrow" transform="translate(565.8,516.4) scale(0.707)"><g data-mml-node="mo" transform="translate(0 -0.5)"><use data-c="7C" xlink:href="#MJX-TEX-N-7C"></use></g><g data-mml-node="mi" transform="translate(278,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(1106,0) translate(0 -0.5)"><use data-c="7C" xlink:href="#MJX-TEX-N-7C"></use></g></g><g data-mml-node="mtext" transform="translate(220,-345) scale(0.707)"><use data-c="61" xlink:href="#MJX-TEX-N-61"></use><use data-c="76" xlink:href="#MJX-TEX-N-76" transform="translate(500,0)"></use><use data-c="67" xlink:href="#MJX-TEX-N-67" transform="translate(1028,0)"></use><use data-c="64" xlink:href="#MJX-TEX-N-64" transform="translate(1528,0)"></use><use data-c="6C" xlink:href="#MJX-TEX-N-6C" transform="translate(2084,0)"></use></g><rect width="1870.2" height="60" x="120" y="220"></rect></g><g data-mml-node="mo" transform="translate(7232.5,0) translate(0 -0.5)"><use data-c="29" 
xlink:href="#MJX-TEX-LO-29"></use></g></g></g><rect width="14305.6" height="60" x="120" y="220"></rect></g></g></g></svg></mjx-container>
<p>The formula above computes the BM25 score of a document <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:0" xmlns="http://www.w3.org/2000/svg" width="1.873ex" height="1.545ex" role="img" focusable="false" viewBox="0 -683 828 683" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g></g></g></svg></mjx-container> for a query <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.79ex" height="2.032ex" role="img" focusable="false" viewBox="0 -704 791 898" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D444" xlink:href="#MJX-TEX-I-1D444"></use></g></g></g></svg></mjx-container>, where <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.79ex" height="2.032ex" role="img" focusable="false" viewBox="0 -704 791 898" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D444" xlink:href="#MJX-TEX-I-1D444"></use></g></g></g></svg></mjx-container> contains the keywords <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.883ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 832.5 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mn" transform="translate(479,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g></g></g></g></svg></mjx-container>, <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.883ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 832.5 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mn" transform="translate(479,-150) scale(0.707)"><use data-c="32" xlink:href="#MJX-TEX-N-32"></use></g></g></g></g></svg></mjx-container>, …, <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="2.043ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 903.2 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D45B" xlink:href="#MJX-TEX-I-1D45B"></use></g></g></g></g></svg></mjx-container>.</p>
<ol>
<li><mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.566ex" xmlns="http://www.w3.org/2000/svg" width="7.633ex" height="2.262ex" role="img" focusable="false" viewBox="0 -750 3373.6 1000" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D453" xlink:href="#MJX-TEX-I-1D453"></use></g><g data-mml-node="mo" transform="translate(550,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(939,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(1712,0)"><use data-c="2C" xlink:href="#MJX-TEX-N-2C"></use></g><g data-mml-node="mi" transform="translate(2156.6,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(2984.6,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g></g></g></svg></mjx-container> is the term frequency, i.e. the number of times that the keyword <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.749ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 773 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g></g></g></svg></mjx-container> occurs in the document <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:0" xmlns="http://www.w3.org/2000/svg" width="1.873ex" height="1.545ex" 
role="img" focusable="false" viewBox="0 -683 828 683" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g></g></g></svg></mjx-container>.</li>
<li><mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.564ex" xmlns="http://www.w3.org/2000/svg" width="3.131ex" height="2.26ex" role="img" focusable="false" viewBox="0 -749.5 1384 999" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mo" transform="translate(0 -0.5)"><use data-c="7C" xlink:href="#MJX-TEX-N-7C"></use></g><g data-mml-node="mi" transform="translate(278,0)"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g><g data-mml-node="mo" transform="translate(1106,0) translate(0 -0.5)"><use data-c="7C" xlink:href="#MJX-TEX-N-7C"></use></g></g></g></svg></mjx-container> is the length of the document <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:0" xmlns="http://www.w3.org/2000/svg" width="1.873ex" height="1.545ex" role="img" focusable="false" viewBox="0 -683 828 683" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D437" xlink:href="#MJX-TEX-I-1D437"></use></g></g></g></svg></mjx-container> in words.</li>
<li><mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.464ex" xmlns="http://www.w3.org/2000/svg" width="5.224ex" height="2.034ex" role="img" focusable="false" viewBox="0 -694 2309 899" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D44E" xlink:href="#MJX-TEX-I-1D44E"></use></g><g data-mml-node="mi" transform="translate(529,0)"><use data-c="1D463" xlink:href="#MJX-TEX-I-1D463"></use></g><g data-mml-node="mi" transform="translate(1014,0)"><use data-c="1D454" xlink:href="#MJX-TEX-I-1D454"></use></g><g data-mml-node="mi" transform="translate(1491,0)"><use data-c="1D451" xlink:href="#MJX-TEX-I-1D451"></use></g><g data-mml-node="mi" transform="translate(2011,0)"><use data-c="1D459" xlink:href="#MJX-TEX-I-1D459"></use></g></g></g></svg></mjx-container> is the average document length, in words, across the text collection from which documents are drawn.</li>
<li><mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.339ex" xmlns="http://www.w3.org/2000/svg" width="2.166ex" height="1.91ex" role="img" focusable="false" viewBox="0 -694 957.6 844" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D458" xlink:href="#MJX-TEX-I-1D458"></use></g><g data-mml-node="mn" transform="translate(554,-150) scale(0.707)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g></g></g></g></svg></mjx-container> and <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.025ex" xmlns="http://www.w3.org/2000/svg" width="0.971ex" height="1.595ex" role="img" focusable="false" viewBox="0 -694 429 705" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D44F" xlink:href="#MJX-TEX-I-1D44F"></use></g></g></g></svg></mjx-container> are free parameters that can be tuned for a particular collection. 
Typical settings satisfy <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.312ex" xmlns="http://www.w3.org/2000/svg" width="8.218ex" height="1.882ex" role="img" focusable="false" viewBox="0 -694 3632.6 832" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D458" xlink:href="#MJX-TEX-I-1D458"></use></g><g data-mml-node="mn" transform="translate(521,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g><g data-mml-node="mo" transform="translate(1298.8,0)"><use data-c="2264" xlink:href="#MJX-TEX-N-2264"></use></g><g data-mml-node="mn" transform="translate(2354.6,0)"><use data-c="32" xlink:href="#MJX-TEX-N-32"></use><use data-c="2E" xlink:href="#MJX-TEX-N-2E" transform="translate(500,0)"></use><use data-c="30" xlink:href="#MJX-TEX-N-30" transform="translate(778,0)"></use></g></g></g></svg></mjx-container> and <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.312ex" xmlns="http://www.w3.org/2000/svg" width="8.218ex" height="1.882ex" role="img" focusable="false" viewBox="0 -694 3632.6 832" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D458" xlink:href="#MJX-TEX-I-1D458"></use></g><g data-mml-node="mn" transform="translate(521,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g><g data-mml-node="mo" transform="translate(1298.8,0)"><use data-c="2265" xlink:href="#MJX-TEX-N-2265"></use></g><g data-mml-node="mn" transform="translate(2354.6,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use><use data-c="2E" xlink:href="#MJX-TEX-N-2E" transform="translate(500,0)"></use><use data-c="32" xlink:href="#MJX-TEX-N-32" transform="translate(778,0)"></use></g></g></g></svg></mjx-container>, with <mjx-container class="MathJax" jax="SVG"><svg 
style="vertical-align:-0.186ex" xmlns="http://www.w3.org/2000/svg" width="8.01ex" height="1.756ex" role="img" focusable="false" viewBox="0 -694 3540.6 776" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D44F" xlink:href="#MJX-TEX-I-1D44F"></use></g><g data-mml-node="mo" transform="translate(706.8,0)"><use data-c="3D" xlink:href="#MJX-TEX-N-3D"></use></g><g data-mml-node="mn" transform="translate(1762.6,0)"><use data-c="30" xlink:href="#MJX-TEX-N-30"></use><use data-c="2E" xlink:href="#MJX-TEX-N-2E" transform="translate(500,0)"></use><use data-c="37" xlink:href="#MJX-TEX-N-37" transform="translate(778,0)"></use><use data-c="35" xlink:href="#MJX-TEX-N-35" transform="translate(1278,0)"></use></g></g></g></svg></mjx-container>; Elasticsearch&#x27;s built-in <code>BM25</code> similarity defaults to <code>k1</code> = 1.2 and <code>b</code> = 0.75.</li>
</ol>
<mjx-container class="MathJax" jax="SVG" display="true"><svg style="vertical-align:-2.172ex" xmlns="http://www.w3.org/2000/svg" width="36.334ex" height="5.475ex" role="img" focusable="false" viewBox="0 -1460 16059.5 2420" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mtext"><use data-c="49" xlink:href="#MJX-TEX-N-49"></use><use data-c="44" xlink:href="#MJX-TEX-N-44" transform="translate(361,0)"></use><use data-c="46" xlink:href="#MJX-TEX-N-46" transform="translate(1125,0)"></use></g><g data-mml-node="mo" transform="translate(1778,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(2167,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(2940,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(3606.7,0)"><use data-c="3D" xlink:href="#MJX-TEX-N-3D"></use></g><g data-mml-node="mi" transform="translate(4662.5,0)"><use data-c="6C" xlink:href="#MJX-TEX-N-6C"></use><use data-c="6E" xlink:href="#MJX-TEX-N-6E" transform="translate(278,0)"></use></g><g data-mml-node="mo" transform="translate(5496.5,0)"><use data-c="2061" xlink:href="#MJX-TEX-N-2061"></use></g><g data-mml-node="mrow" transform="translate(5663.2,0)"><g data-mml-node="mo" transform="translate(0 -0.5)"><use data-c="28" xlink:href="#MJX-TEX-S3-28"></use></g><g data-mml-node="mfrac" transform="translate(736,0)"><g data-mml-node="mrow" transform="translate(220,710)"><g data-mml-node="mi"><use data-c="1D441" xlink:href="#MJX-TEX-I-1D441"></use></g><g data-mml-node="mo" transform="translate(1110.2,0)"><use data-c="2212" xlink:href="#MJX-TEX-N-2212"></use></g><g data-mml-node="mi" 
transform="translate(2110.4,0)"><use data-c="1D45B" xlink:href="#MJX-TEX-I-1D45B"></use></g><g data-mml-node="mo" transform="translate(2710.4,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(3099.4,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(3872.4,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(4483.6,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="mn" transform="translate(5483.8,0)"><use data-c="30" xlink:href="#MJX-TEX-N-30"></use><use data-c="2E" xlink:href="#MJX-TEX-N-2E" transform="translate(500,0)"></use><use data-c="35" xlink:href="#MJX-TEX-N-35" transform="translate(778,0)"></use></g></g><g data-mml-node="mrow" transform="translate(1275.2,-710)"><g data-mml-node="mi"><use data-c="1D45B" xlink:href="#MJX-TEX-I-1D45B"></use></g><g data-mml-node="mo" transform="translate(600,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(989,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(1762,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g><g data-mml-node="mo" transform="translate(2373.2,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="mn" transform="translate(3373.4,0)"><use data-c="30" xlink:href="#MJX-TEX-N-30"></use><use data-c="2E" xlink:href="#MJX-TEX-N-2E" transform="translate(500,0)"></use><use data-c="35" xlink:href="#MJX-TEX-N-35" transform="translate(778,0)"></use></g></g><rect width="6961.8" height="60" x="120" 
y="220"></rect></g><g data-mml-node="mo" transform="translate(8160.1,0)"><use data-c="2B" xlink:href="#MJX-TEX-N-2B"></use></g><g data-mml-node="mn" transform="translate(9160.3,0)"><use data-c="31" xlink:href="#MJX-TEX-N-31"></use></g><g data-mml-node="mo" transform="translate(9660.3,0) translate(0 -0.5)"><use data-c="29" xlink:href="#MJX-TEX-S3-29"></use></g></g></g></g></svg></mjx-container>
<p>This is the IDF (inverse document frequency) weight of the query term <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.749ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 773 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g></g></g></svg></mjx-container>, where <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:0" xmlns="http://www.w3.org/2000/svg" width="2.009ex" height="1.545ex" role="img" focusable="false" viewBox="0 -683 888 683" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D441" xlink:href="#MJX-TEX-I-1D441"></use></g></g></g></svg></mjx-container> is the total number of documents in the collection and <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.566ex" xmlns="http://www.w3.org/2000/svg" width="4.866ex" height="2.262ex" role="img" focusable="false" viewBox="0 -750 2151 1000" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="mi"><use data-c="1D45B" xlink:href="#MJX-TEX-I-1D45B"></use></g><g data-mml-node="mo" transform="translate(600,0)"><use data-c="28" xlink:href="#MJX-TEX-N-28"></use></g><g data-mml-node="msub" transform="translate(989,0)"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g><g data-mml-node="mo" transform="translate(1762,0)"><use data-c="29" xlink:href="#MJX-TEX-N-29"></use></g></g></g></svg></mjx-container> is 
the number of documents containing <mjx-container class="MathJax" jax="SVG"><svg style="vertical-align:-0.439ex" xmlns="http://www.w3.org/2000/svg" width="1.749ex" height="1.439ex" role="img" focusable="false" viewBox="0 -442 773 636" xmlns:xlink="http://www.w3.org/1999/xlink"><g stroke="currentColor" fill="currentColor" stroke-width="0" transform="scale(1,-1)"><g data-mml-node="math"><g data-mml-node="msub"><g data-mml-node="mi"><use data-c="1D45E" xlink:href="#MJX-TEX-I-1D45E"></use></g><g data-mml-node="mi" transform="translate(479,-150) scale(0.707)"><use data-c="1D456" xlink:href="#MJX-TEX-I-1D456"></use></g></g></g></g></svg></mjx-container>.</p>
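<p>To make the formula concrete, here is a minimal Python sketch that scores one tokenized document against a query. It assumes the standard form of BM25 (with the <code>(k1 + 1)</code> factor in the numerator), the IDF definition above, pre-tokenized word lists, a tiny in-memory corpus, and the common defaults k1 = 1.2 and b = 0.75. The names <code>idf</code> and <code>bm25</code> are illustrative and do not come from any library.</p>

```python
import math
from collections import Counter


def idf(n_q: int, N: int) -> float:
    """IDF(q_i) = ln((N - n(q_i) + 0.5) / (n(q_i) + 0.5) + 1)."""
    return math.log((N - n_q + 0.5) / (n_q + 0.5) + 1)


def bm25(query: list[str], doc: list[str], corpus: list[list[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Score one tokenized document against a tokenized query (standard BM25)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length in words
    tf = Counter(doc)                         # f(q_i, D) for every term in D
    score = 0.0
    for q in query:
        n_q = sum(1 for d in corpus if q in d)  # n(q_i): documents containing q_i
        f = tf[q]
        denom = f + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(n_q, N) * f * (k1 + 1) / denom
    return score


corpus = [
    ["elasticsearch", "bm25", "score"],
    ["milvus", "sparse", "vector"],
    ["bm25", "in", "milvus"],
]
print(bm25(["bm25", "milvus"], corpus[2], corpus))
```

<p>The corpus statistics (<code>N</code>, <code>avgdl</code>, and each <code>n_q</code>) are recomputed from whatever collection is passed in, which is exactly what changes when scoring happens per shard instead of over the whole index.</p>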
<h3 id="why-tf-idf-bm25-as-the-main-calculation">Why TF-IDF (BM25) is the main scoring method<a href="#why-tf-idf-bm25-as-the-main-calculation"><span class="icon icon-link"></span></a></h3>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/tf-idf-explain.png" alt=""/>
<p>A term that appears in many documents carries less information about whether a particular document is relevant, so it should receive a lower IDF weight. Applying a logarithm to the document-frequency ratio means the IDF weight changes <strong>more slowly</strong> than the ratio itself as a term&#x27;s document frequency changes. Without the logarithm, the raw ratio would span several orders of magnitude between rare and common terms, and document frequency alone would dominate the score.</p>
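<p>The damping effect is easy to check numerically. The sketch below assumes a hypothetical collection of N = 1,000,000 documents and compares the raw ratio inside the IDF formula with the logarithmic IDF weight for a few document frequencies.</p>

```python
import math

N = 1_000_000  # hypothetical collection size


def raw_ratio(n: int) -> float:
    """The ratio inside the IDF formula, without the logarithm."""
    return (N - n + 0.5) / (n + 0.5)


def idf(n: int) -> float:
    """BM25 IDF as defined above: ln(ratio + 1)."""
    return math.log(raw_ratio(n) + 1)


for n in (1, 10, 1_000, 100_000):
    print(f"n(q)={n:>7}  raw ratio={raw_ratio(n):>12.1f}  IDF={idf(n):.2f}")
```

<p>Between n = 1 and n = 100,000, the raw ratio drops by a factor of roughly 74,000, while the IDF weight only drops from about 13.4 to about 2.3.</p>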
<h3 id="how-elasticsearch-calculate-the-bm25">How Elasticsearch calculates BM25<a href="#how-elasticsearch-calculate-the-bm25"><span class="icon icon-link"></span></a></h3>
<p>By default, Elasticsearch calculates scores on a per-shard basis using Lucene&#x27;s built-in <code>org.apache.lucene.search.similarities.BM25Similarity</code> class, which is also the default similarity of Lucene&#x27;s <code>IndexSearcher</code>. As a result, the collection statistics that BM25 depends on, such as the document count and each term&#x27;s document frequency, come from each shard alone. To score with statistics gathered across all the shards involved in a search, change the <code>search_type</code> from the default <code>query_then_fetch</code> to <code>dfs_query_then_fetch</code>.</p>
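<p>For illustration, here is a small Python sketch that builds such a request for a search spanning two indices. The index names <code>articles-2024</code> and <code>articles-2025</code> and the field <code>content</code> are hypothetical; only the <code>search_type</code> query-string parameter of the <code>_search</code> API comes from Elasticsearch. The official Python client&#x27;s <code>search()</code> method also accepts a <code>search_type</code> argument, so the same choice can be made there.</p>

```python
import json
from urllib.parse import urlencode


def build_search_request(indices: list[str], text: str,
                         field: str = "content", dfs: bool = True) -> tuple[str, str]:
    """Return (path, JSON body) for a match query across several indices.

    dfs_query_then_fetch gathers term statistics from all shards before the
    query phase; query_then_fetch (the default) scores with per-shard statistics.
    """
    search_type = "dfs_query_then_fetch" if dfs else "query_then_fetch"
    path = "/" + ",".join(indices) + "/_search?" + urlencode({"search_type": search_type})
    body = json.dumps({"query": {"match": {field: text}}})
    return path, body


# Hypothetical index names and field; send with any HTTP client as a POST request.
path, body = build_search_request(["articles-2024", "articles-2025"], "vector database")
print("POST", path)
print(body)
```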
<p>A <code>dfs_query_then_fetch</code> search adds an extra phase, <code>org.elasticsearch.search.dfs.DfsPhase</code>, before the query phase. Each shard collects its statistics, such as document counts and per-term document frequencies, into a <code>DfsSearchResult</code>. The <code>SearchPhaseController</code> then aggregates all the DFS results into an <code>AggregatedDfs</code>, which the query phase uses when computing scores. This search type therefore produces consistent BM25 scores across multiple shards and indices.</p>
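<p>The aggregation step can be sketched in plain Python. This is not Elasticsearch code: <code>ShardStats</code> and <code>aggregate</code> are hypothetical stand-ins for <code>DfsSearchResult</code> and <code>AggregatedDfs</code>, and only document counts and per-term document frequencies are modelled. The two hypothetical shards hold a skewed distribution of the term <code>milvus</code>, so local and global IDF clearly differ.</p>

```python
import math
from dataclasses import dataclass, field


@dataclass
class ShardStats:
    """Hypothetical per-shard statistics, similar in spirit to a DfsSearchResult."""
    doc_count: int                                          # N for this shard
    doc_freq: dict[str, int] = field(default_factory=dict)  # term -> n(term) in this shard


def idf(n: int, N: int) -> float:
    # BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)


def aggregate(shards: list[ShardStats]) -> ShardStats:
    """Sum statistics over all shards, playing the role of AggregatedDfs."""
    total = ShardStats(doc_count=sum(s.doc_count for s in shards))
    for s in shards:
        for term, n in s.doc_freq.items():
            total.doc_freq[term] = total.doc_freq.get(term, 0) + n
    return total


shards = [ShardStats(1_000, {"milvus": 10}), ShardStats(1_000, {"milvus": 200})]
global_stats = aggregate(shards)

# query_then_fetch: each shard scores with its own statistics.
local_idfs = [idf(s.doc_freq["milvus"], s.doc_count) for s in shards]
# dfs_query_then_fetch: every shard scores with the aggregated statistics.
global_idf = idf(global_stats.doc_freq["milvus"], global_stats.doc_count)
print(local_idfs, global_idf)
```

<p>With these numbers, the shard-local IDF values are about 4.56 and 1.61, so the same document would score very differently depending on which shard holds it; with aggregated statistics, both shards use about 2.25.</p>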
<h3 id="do-we-need-use-dfs_query_then_fetch-in-cross-indexes-query">Do we need dfs_query_then_fetch for cross-index queries?<a href="#do-we-need-use-dfs_query_then_fetch-in-cross-indexes-query"><span class="icon icon-link"></span></a></h3>
<p>Shard-level and global BM25 differ only in the collection statistics they use: mainly the inputs to <strong>IDF</strong> (the document count and the number of documents containing each term), and also the average document length. If the data is evenly distributed across all the indices and every shard holds enough documents, each shard&#x27;s statistics stay close to the global ones, and the logarithm in <strong>IDF</strong> shrinks the remaining difference further. You can see this growth trend in the second chart above. In this scenario, we don&#x27;t need <code>dfs_query_then_fetch</code> to compute a global BM25 score, since it requires an extra round trip to the shards and more resources to collect and aggregate the statistics.</p>
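<p>A quick numerical check supports this. The sketch below uses hypothetical document frequencies for one term across four shards of 100,000 documents each, first spread evenly and then concentrated in one shard, and reports the largest relative gap between a shard-local IDF and the global IDF. Only IDF is modelled; the average document length is left out for simplicity.</p>

```python
import math


def idf(n: int, N: int) -> float:
    # BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)


def max_relative_idf_gap(shard_docs: list[int], shard_df: list[int]) -> float:
    """Largest relative difference between a shard-local IDF and the global IDF."""
    global_idf = idf(sum(shard_df), sum(shard_docs))
    return max(abs(idf(n, N) - global_idf) / global_idf
               for n, N in zip(shard_df, shard_docs))


# Hypothetical statistics: four shards of 100,000 documents, 8,000 matches in total.
even = max_relative_idf_gap([100_000] * 4, [1_980, 2_010, 2_035, 1_975])
skewed = max_relative_idf_gap([100_000] * 4, [7_500, 200, 150, 150])
print(f"evenly distributed: {even:.2%}, skewed: {skewed:.2%}")
```

<p>With the even split, the largest gap stays below 0.5%; when almost all matching documents sit in one shard, it grows to about 66%, which is the case where <code>dfs_query_then_fetch</code> is worth its cost.</p>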
<h3 id="sparse-bm25-in-milvus">Sparse-BM25 in Milvus<a href="#sparse-bm25-in-milvus"><span class="icon icon-link"></span></a></h3>
<img src="https://cat.yufan.me/images/recaps/vector-db-research/sparse-bm25-in-milvus.png" alt=""/>
<p>Starting from version 2.4, Milvus supports sparse vectors, and from version 2.5, it provides BM25 retrieval capabilities based on sparse vectors. With the built-in Sparse-BM25, Milvus offers native support for lexical retrieval. The specific features include:</p>
<ol>
<li><strong>Tokenization and Data Preprocessing</strong>: Implemented based on the open-source search library Tantivy, including features such as stemming, lemmatization, and stop-word filtering.</li>
<li><strong>Distributed Vocabulary and Term Frequency Management</strong>: Efficient support for managing and calculating term frequencies in large-scale corpora.</li>
<li><strong>Sparse Vector Generation and Similarity Calculation</strong>: Document sparse vectors are built from the term frequencies in the corpus (Corpus TF), while query sparse vectors are built from the query term frequencies (Query TF) and the global inverse document frequency (IDF). Similarity is then computed with a dedicated BM25 distance function.</li>
<li><strong>Inverted Index Support</strong>: Implements an inverted index based on the WAND algorithm, with support for the Block-Max WAND algorithm and graph indexing currently under development.</li>
</ol>
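<p>The idea behind Sparse-BM25 can be illustrated by rewriting BM25 as a dot product between two sparse vectors. The Python sketch below is a conceptual model under stated assumptions, not Milvus&#x27;s implementation (in Milvus, the conversion from text to sparse vectors happens inside the database): documents are pre-tokenized, document weights hold the BM25 term-frequency component normalised with a fixed average document length, and query weights hold query TF multiplied by global IDF. All function names are hypothetical.</p>

```python
import math
from collections import Counter


def idf(n: int, N: int) -> float:
    # BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)


def doc_vectors(corpus: list[list[str]], k1: float = 1.2,
                b: float = 0.75) -> list[dict[str, float]]:
    """Encode each tokenized document as a sparse {term: weight} vector.

    Assumption: the weight is the BM25 term-frequency part, normalised with
    the average document length at encoding time.
    """
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    vectors = []
    for doc in corpus:
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        vectors.append({t: f * (k1 + 1) / (f + length_norm)
                        for t, f in Counter(doc).items()})
    return vectors


def query_vector(query: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """Encode the query as {term: query TF * global IDF}."""
    N = len(corpus)
    return {t: qtf * idf(sum(1 for d in corpus if t in d), N)
            for t, qtf in Counter(query).items()}


def sparse_dot(q: dict[str, float], d: dict[str, float]) -> float:
    # Only terms present in both vectors contribute, as with an inverted index.
    return sum(w * d[t] for t, w in q.items() if t in d)


corpus = [
    ["elasticsearch", "bm25", "score"],
    ["milvus", "sparse", "vector"],
    ["bm25", "in", "milvus"],
]
docs = doc_vectors(corpus)
q = query_vector(["bm25", "milvus"], corpus)
ranking = sorted(range(len(corpus)), key=lambda i: sparse_dot(q, docs[i]), reverse=True)
print(ranking)
```

<p>Freezing the average document length at encoding time is a simplification: in a collection that keeps growing, the statistics change as data is inserted, which is one reason a database may apply part of the BM25 formula at query time instead.</p>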
<h4 id="pros-and-cons-of-sparse-bm25-in-milvus">Pros and Cons of Sparse-BM25 in Milvus<a href="#pros-and-cons-of-sparse-bm25-in-milvus"><span class="icon icon-link"></span></a></h4>
<ul>
<li>Full-text search in Milvus is still under heavy development, and <a href="https://github.com/milvus-io/milvus/issues?q=full+text+search+" rel="nofollow" target="_blank">many related bugs are reported on GitHub</a>.</li>
<li>Full-text search requires creating an extra sparse index on the collection (the document set), so unlike Elasticsearch it doesn&#x27;t work out of the box.</li>
<li>Hybrid search that combines ANN and BM25 on one collection can be ranked in a single request to get the top K, similar to Elasticsearch&#x27;s reciprocal rank fusion (RRF), available <a href="https://github.com/elastic/elasticsearch/pull/93396" rel="nofollow" target="_blank">since 8.8.0</a>.</li>
</ul><style>
mjx-container[jax="SVG"] {
  direction: ltr;
}

mjx-container[jax="SVG"] > svg {
  overflow: visible;
  min-height: 1px;
  min-width: 1px;
}

mjx-container[jax="SVG"] > svg a {
  fill: blue;
  stroke: blue;
}

mjx-container[jax="SVG"][display="true"] {
  display: block;
  text-align: center;
  margin: 1em 0;
}

mjx-container[jax="SVG"][display="true"][width="full"] {
  display: flex;
}

mjx-container[jax="SVG"][justify="left"] {
  text-align: left;
}

mjx-container[jax="SVG"][justify="right"] {
  text-align: right;
}

g[data-mml-node="merror"] > g {
  fill: red;
  stroke: red;
}

g[data-mml-node="merror"] > rect[data-background] {
  fill: yellow;
  stroke: none;
}

g[data-mml-node="mtable"] > line[data-line], svg[data-table] > g > line[data-line] {
  stroke-width: 70px;
  fill: none;
}

g[data-mml-node="mtable"] > rect[data-frame], svg[data-table] > g > rect[data-frame] {
  stroke-width: 70px;
  fill: none;
}

g[data-mml-node="mtable"] > .mjx-dashed, svg[data-table] > g > .mjx-dashed {
  stroke-dasharray: 140;
}

g[data-mml-node="mtable"] > .mjx-dotted, svg[data-table] > g > .mjx-dotted {
  stroke-linecap: round;
  stroke-dasharray: 0,140;
}

g[data-mml-node="mtable"] > g > svg {
  overflow: visible;
}

[jax="SVG"] mjx-tool {
  display: inline-block;
  position: relative;
  width: 0;
  height: 0;
}

[jax="SVG"] mjx-tool > mjx-tip {
  position: absolute;
  top: 0;
  left: 0;
}

mjx-tool > mjx-tip {
  display: inline-block;
  padding: .2em;
  border: 1px solid #888;
  font-size: 70%;
  background-color: #F8F8F8;
  color: black;
  box-shadow: 2px 2px 5px #AAAAAA;
}

g[data-mml-node="maction"][data-toggle] {
  cursor: pointer;
}

mjx-status {
  display: block;
  position: fixed;
  left: 1em;
  bottom: 1em;
  min-width: 25%;
  padding: .2em .4em;
  border: 1px solid #888;
  font-size: 90%;
  background-color: #F8F8F8;
  color: black;
}

foreignObject[data-mjx-xml] {
  font-family: initial;
  line-height: normal;
  overflow: visible;
}

mjx-container[jax="SVG"] path[data-c], mjx-container[jax="SVG"] use[data-c] {
  stroke-width: 3;
}
</style><svg style="display:none" id="MJX-SVG-global-cache"><defs><path id="MJX-TEX-N-73" d="M295 316Q295 356 268 385T190 414Q154 414 128 401Q98 382 98 349Q97 344 98 336T114 312T157 287Q175 282 201 278T245 269T277 256Q294 248 310 236T342 195T359 133Q359 71 321 31T198 -10H190Q138 -10 94 26L86 19L77 10Q71 4 65 -1L54 -11H46H42Q39 -11 33 -5V74V132Q33 153 35 157T45 162H54Q66 162 70 158T75 146T82 119T101 77Q136 26 198 26Q295 26 295 104Q295 133 277 151Q257 175 194 187T111 210Q75 227 54 256T33 318Q33 357 50 384T93 424T143 442T187 447H198Q238 447 268 432L283 424L292 431Q302 440 314 448H322H326Q329 448 335 442V310L329 304H301Q295 310 295 316Z"></path><path id="MJX-TEX-N-63" d="M370 305T349 305T313 320T297 358Q297 381 312 396Q317 401 317 402T307 404Q281 408 258 408Q209 408 178 376Q131 329 131 219Q131 137 162 90Q203 29 272 29Q313 29 338 55T374 117Q376 125 379 127T395 129H409Q415 123 415 120Q415 116 411 104T395 71T366 33T318 2T249 -11Q163 -11 99 53T34 214Q34 318 99 383T250 448T370 421T404 357Q404 334 387 320Z"></path><path id="MJX-TEX-N-6F" d="M28 214Q28 309 93 378T250 448Q340 448 405 380T471 215Q471 120 407 55T250 -10Q153 -10 91 57T28 214ZM250 30Q372 30 372 193V225V250Q372 272 371 288T364 326T348 362T317 390T268 410Q263 411 252 411Q222 411 195 399Q152 377 139 338T126 246V226Q126 130 145 91Q177 30 250 30Z"></path><path id="MJX-TEX-N-72" d="M36 46H50Q89 46 97 60V68Q97 77 97 91T98 122T98 161T98 203Q98 234 98 269T98 328L97 351Q94 370 83 376T38 385H20V408Q20 431 22 431L32 432Q42 433 60 434T96 436Q112 437 131 438T160 441T171 442H174V373Q213 441 271 441H277Q322 441 343 419T364 373Q364 352 351 337T313 322Q288 322 276 338T263 372Q263 381 265 388T270 400T273 405Q271 407 250 401Q234 393 226 386Q179 341 179 207V154Q179 141 179 127T179 101T180 81T180 66V61Q181 59 183 57T188 54T193 51T200 49T207 48T216 47T225 47T235 46T245 46H276V0H267Q249 3 140 3Q37 3 28 0H20V46H36Z"></path><path id="MJX-TEX-N-65" d="M28 218Q28 273 48 318T98 391T163 433T229 448Q282 448 320 430T378 380T406 316T415 245Q415 
238 408 231H126V216Q126 68 226 36Q246 30 270 30Q312 30 342 62Q359 79 369 104L379 128Q382 131 395 131H398Q415 131 415 121Q415 117 412 108Q393 53 349 21T250 -11Q155 -11 92 58T28 218ZM333 275Q322 403 238 411H236Q228 411 220 410T195 402T166 381T143 340T127 274V267H333V275Z"></path><path id="MJX-TEX-N-28" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"></path><path id="MJX-TEX-I-1D437" d="M287 628Q287 635 230 637Q207 637 200 638T193 647Q193 655 197 667T204 682Q206 683 403 683Q570 682 590 682T630 676Q702 659 752 597T803 431Q803 275 696 151T444 3L430 1L236 0H125H72Q48 0 41 2T33 11Q33 13 36 25Q40 41 44 43T67 46Q94 46 127 49Q141 52 146 61Q149 65 218 339T287 628ZM703 469Q703 507 692 537T666 584T629 613T590 629T555 636Q553 636 541 636T512 636T479 637H436Q392 637 386 627Q384 623 313 339T242 52Q242 48 253 48T330 47Q335 47 349 47T373 46Q499 46 581 128Q617 164 640 212T683 339T703 469Z"></path><path id="MJX-TEX-N-2C" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"></path><path id="MJX-TEX-I-1D444" d="M399 -80Q399 -47 400 -30T402 -11V-7L387 -11Q341 -22 303 -22Q208 -22 138 35T51 201Q50 209 50 244Q50 346 98 438T227 601Q351 704 476 704Q514 704 524 703Q621 689 680 617T740 435Q740 255 592 107Q529 47 461 16L444 8V3Q444 2 449 -24T470 -66T516 -82Q551 -82 583 -60T625 -3Q631 11 638 11Q647 11 649 2Q649 -6 639 -34T611 -100T557 -165T481 -194Q399 -194 399 -87V-80ZM636 468Q636 523 621 564T580 625T530 655T477 665Q429 665 379 640Q277 591 215 464T153 216Q153 110 207 59Q231 38 236 38V46Q236 86 269 120T347 155Q372 155 390 144T417 114T429 82T435 55L448 64Q512 108 557 185T619 334T636 468ZM314 18Q362 18 404 39L403 49Q399 104 366 
115Q354 117 347 117Q344 117 341 117T337 118Q317 118 296 98T274 52Q274 18 314 18Z"></path><path id="MJX-TEX-N-29" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"></path><path id="MJX-TEX-N-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path><path id="MJX-TEX-LO-2211" d="M60 948Q63 950 665 950H1267L1325 815Q1384 677 1388 669H1348L1341 683Q1320 724 1285 761Q1235 809 1174 838T1033 881T882 898T699 902H574H543H251L259 891Q722 258 724 252Q725 250 724 246Q721 243 460 -56L196 -356Q196 -357 407 -357Q459 -357 548 -357T676 -358Q812 -358 896 -353T1063 -332T1204 -283T1307 -196Q1328 -170 1348 -124H1388Q1388 -125 1381 -145T1356 -210T1325 -294L1267 -449L666 -450Q64 -450 61 -448Q55 -446 55 -439Q55 -437 57 -433L590 177Q590 178 557 222T452 366T322 544L56 909L55 924Q55 945 60 948Z"></path><path id="MJX-TEX-I-1D456" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"></path><path id="MJX-TEX-N-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path><path id="MJX-TEX-I-1D45B" d="M21 287Q22 293 
24 303T36 341T56 388T89 425T135 442Q171 442 195 424T225 390T231 369Q231 367 232 367L243 378Q304 442 382 442Q436 442 469 415T503 336T465 179T427 52Q427 26 444 26Q450 26 453 27Q482 32 505 65T540 145Q542 153 560 153Q580 153 580 145Q580 144 576 130Q568 101 554 73T508 17T439 -10Q392 -10 371 17T350 73Q350 92 386 193T423 345Q423 404 379 404H374Q288 404 229 303L222 291L189 157Q156 26 151 16Q138 -11 108 -11Q95 -11 87 -5T76 7T74 17Q74 30 112 180T152 343Q153 348 153 366Q153 405 129 405Q91 405 66 305Q60 285 60 284Q58 278 41 278H27Q21 284 21 287Z"></path><path id="MJX-TEX-N-49" d="M328 0Q307 3 180 3T32 0H21V46H43Q92 46 106 49T126 60Q128 63 128 342Q128 620 126 623Q122 628 118 630T96 635T43 637H21V683H32Q53 680 180 680T328 683H339V637H317Q268 637 254 634T234 623Q232 620 232 342Q232 63 234 60Q238 55 242 53T264 48T317 46H339V0H328Z"></path><path id="MJX-TEX-N-44" d="M130 622Q123 629 119 631T103 634T60 637H27V683H228Q399 682 419 682T461 676Q504 667 546 641T626 573T685 470T708 336Q708 210 634 116T442 3Q429 1 228 0H27V46H60Q102 47 111 49T130 61V622ZM593 338Q593 439 571 501T493 602Q439 637 355 637H322H294Q238 637 234 628Q231 624 231 344Q231 62 232 59Q233 49 248 48T339 46H350Q456 46 515 95Q561 133 577 191T593 338Z"></path><path id="MJX-TEX-N-46" d="M128 619Q121 626 117 628T101 631T58 634H25V680H582V676Q584 670 596 560T610 444V440H570V444Q563 493 561 501Q555 538 543 563T516 601T477 622T431 631T374 633H334H286Q252 633 244 631T233 621Q232 619 232 490V363H284Q287 363 303 363T327 364T349 367T372 373T389 385Q407 403 410 459V480H450V200H410V221Q407 276 389 296Q381 303 371 307T348 313T327 316T303 317T284 317H232V189L233 61Q240 54 245 52T270 48T333 46H360V0H348Q324 3 182 3Q51 3 36 0H25V46H58Q100 47 109 49T128 61V619Z"></path><path id="MJX-TEX-I-1D45E" d="M33 157Q33 258 109 349T280 441Q340 441 372 389Q373 390 377 395T388 406T404 418Q438 442 450 442Q454 442 457 439T460 434Q460 425 391 149Q320 -135 320 -139Q320 -147 365 -148H390Q396 -156 396 -157T393 -175Q389 -188 383 -194H370Q339 -192 262 -192Q234 
-192 211 -192T174 -192T157 -193Q143 -193 143 -185Q143 -182 145 -170Q149 -154 152 -151T172 -148Q220 -148 230 -141Q238 -136 258 -53T279 32Q279 33 272 29Q224 -10 172 -10Q117 -10 75 30T33 157ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path id="MJX-TEX-N-22C5" d="M78 250Q78 274 95 292T138 310Q162 310 180 294T199 251Q199 226 182 208T139 190T96 207T78 250Z"></path><path id="MJX-TEX-I-1D453" d="M118 -162Q120 -162 124 -164T135 -167T147 -168Q160 -168 171 -155T187 -126Q197 -99 221 27T267 267T289 382V385H242Q195 385 192 387Q188 390 188 397L195 425Q197 430 203 430T250 431Q298 431 298 432Q298 434 307 482T319 540Q356 705 465 705Q502 703 526 683T550 630Q550 594 529 578T487 561Q443 561 443 603Q443 622 454 636T478 657L487 662Q471 668 457 668Q445 668 434 658T419 630Q412 601 403 552T387 469T380 433Q380 431 435 431Q480 431 487 430T498 424Q499 420 496 407T491 391Q489 386 482 386T428 385H372L349 263Q301 15 282 -47Q255 -132 212 -173Q175 -205 139 -205Q107 -205 81 -186T55 -132Q55 -95 76 -78T118 -61Q162 -61 162 -103Q162 -122 151 -136T127 -157L118 -162Z"></path><path id="MJX-TEX-I-1D458" d="M121 647Q121 657 125 670T137 683Q138 683 209 688T282 694Q294 694 294 686Q294 679 244 477Q194 279 194 272Q213 282 223 291Q247 309 292 354T362 415Q402 442 438 442Q468 442 485 423T503 369Q503 344 496 327T477 302T456 291T438 288Q418 288 406 299T394 328Q394 353 410 369T442 390L458 393Q446 405 434 405H430Q398 402 367 380T294 316T228 255Q230 254 243 252T267 246T293 238T320 224T342 206T359 180T365 147Q365 130 360 106T354 66Q354 26 381 26Q429 26 459 145Q461 153 479 153H483Q499 153 499 144Q499 139 496 130Q455 -11 378 -11Q333 -11 305 15T277 90Q277 108 280 121T283 145Q283 167 269 183T234 206T200 217T182 220H180Q168 178 159 139T145 81T136 44T129 20T122 7T111 -2Q98 -11 83 -11Q66 -11 57 -1T48 16Q48 26 85 176T158 471L195 616Q196 629 188 632T149 637H144Q134 637 131 637T124 640T121 647Z"></path><path id="MJX-TEX-N-2B" 
d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"></path><path id="MJX-TEX-LO-28" d="M180 96T180 250T205 541T266 770T353 944T444 1069T527 1150H555Q561 1144 561 1141Q561 1137 545 1120T504 1072T447 995T386 878T330 721T288 513T272 251Q272 133 280 56Q293 -87 326 -209T399 -405T475 -531T536 -609T561 -640Q561 -643 555 -649H527Q483 -612 443 -568T353 -443T266 -270T205 -41Z"></path><path id="MJX-TEX-N-2212" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"></path><path id="MJX-TEX-I-1D44F" d="M73 647Q73 657 77 670T89 683Q90 683 161 688T234 694Q246 694 246 685T212 542Q204 508 195 472T180 418L176 399Q176 396 182 402Q231 442 283 442Q345 442 383 396T422 280Q422 169 343 79T173 -11Q123 -11 82 27T40 150V159Q40 180 48 217T97 414Q147 611 147 623T109 637Q104 637 101 637H96Q86 637 83 637T76 640T73 647ZM336 325V331Q336 405 275 405Q258 405 240 397T207 376T181 352T163 330L157 322L136 236Q114 150 114 114Q114 66 138 42Q154 26 178 26Q211 26 245 58Q270 81 285 114T318 219Q336 291 336 325Z"></path><path id="MJX-TEX-N-7C" d="M139 -249H137Q125 -249 119 -235V251L120 737Q130 750 139 750Q152 750 159 735V-235Q151 -249 141 -249H139Z"></path><path id="MJX-TEX-N-61" d="M137 305T115 305T78 320T63 359Q63 394 97 421T218 448Q291 448 336 416T396 340Q401 326 401 309T402 194V124Q402 76 407 58T428 40Q443 40 448 56T453 109V145H493V106Q492 66 490 59Q481 29 455 12T400 -6T353 12T329 54V58L327 55Q325 52 322 49T314 40T302 29T287 17T269 6T247 -2T221 -8T190 -11Q130 -11 82 20T34 107Q34 128 41 147T68 188T116 225T194 253T304 268H318V290Q318 324 312 340Q290 411 215 411Q197 411 181 410T156 406T148 403Q170 388 170 359Q170 334 154 320ZM126 106Q126 75 150 51T209 26Q247 26 276 49T315 109Q317 116 318 175Q318 233 317 233Q309 233 296 232T251 223T193 203T147 166T126 106Z"></path><path id="MJX-TEX-N-76" d="M338 431Q344 429 422 429Q479 429 503 431H508V385H497Q439 381 423 345Q421 341 356 
172T288 -2Q283 -11 263 -11Q244 -11 239 -2Q99 359 98 364Q93 378 82 381T43 385H19V431H25L33 430Q41 430 53 430T79 430T104 429T122 428Q217 428 232 431H240V385H226Q187 384 184 370Q184 366 235 234L286 102L377 341V349Q377 363 367 372T349 383T335 385H331V431H338Z"></path><path id="MJX-TEX-N-67" d="M329 409Q373 453 429 453Q459 453 472 434T485 396Q485 382 476 371T449 360Q416 360 412 390Q410 404 415 411Q415 412 416 414V415Q388 412 363 393Q355 388 355 386Q355 385 359 381T368 369T379 351T388 325T392 292Q392 230 343 187T222 143Q172 143 123 171Q112 153 112 133Q112 98 138 81Q147 75 155 75T227 73Q311 72 335 67Q396 58 431 26Q470 -13 470 -72Q470 -139 392 -175Q332 -206 250 -206Q167 -206 107 -175Q29 -140 29 -75Q29 -39 50 -15T92 18L103 24Q67 55 67 108Q67 155 96 193Q52 237 52 292Q52 355 102 398T223 442Q274 442 318 416L329 409ZM299 343Q294 371 273 387T221 404Q192 404 171 388T145 343Q142 326 142 292Q142 248 149 227T179 192Q196 182 222 182Q244 182 260 189T283 207T294 227T299 242Q302 258 302 292T299 343ZM403 -75Q403 -50 389 -34T348 -11T299 -2T245 0H218Q151 0 138 -6Q118 -15 107 -34T95 -74Q95 -84 101 -97T122 -127T170 -155T250 -167Q319 -167 361 -139T403 -75Z"></path><path id="MJX-TEX-N-64" d="M376 495Q376 511 376 535T377 568Q377 613 367 624T316 637H298V660Q298 683 300 683L310 684Q320 685 339 686T376 688Q393 689 413 690T443 693T454 694H457V390Q457 84 458 81Q461 61 472 55T517 46H535V0Q533 0 459 -5T380 -11H373V44L365 37Q307 -11 235 -11Q158 -11 96 50T34 215Q34 315 97 378T244 442Q319 442 376 393V495ZM373 342Q328 405 260 405Q211 405 173 369Q146 341 139 305T131 211Q131 155 138 120T173 59Q203 26 251 26Q322 26 373 103V342Z"></path><path id="MJX-TEX-N-6C" d="M42 46H56Q95 46 103 60V68Q103 77 103 91T103 124T104 167T104 217T104 272T104 329Q104 366 104 407T104 482T104 542T103 586T103 603Q100 622 89 628T44 637H26V660Q26 683 28 683L38 684Q48 685 67 686T104 688Q121 689 141 690T171 693T182 694H185V379Q185 62 186 60Q190 52 198 49Q219 46 247 46H263V0H255L232 1Q209 2 183 2T145 3T107 3T57 1L34 
0H26V46H42Z"></path><path id="MJX-TEX-LO-29" d="M35 1138Q35 1150 51 1150H56H69Q113 1113 153 1069T243 944T330 771T391 541T416 250T391 -40T330 -270T243 -443T152 -568T69 -649H56Q43 -649 39 -647T35 -637Q65 -607 110 -548Q283 -316 316 56Q324 133 324 251Q324 368 316 445Q278 877 48 1123Q36 1137 35 1138Z"></path><path id="MJX-TEX-N-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path><path id="MJX-TEX-I-1D44E" d="M33 157Q33 258 109 349T280 441Q331 441 370 392Q386 422 416 422Q429 422 439 414T449 394Q449 381 412 234T374 68Q374 43 381 35T402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487Q506 153 506 144Q506 138 501 117T481 63T449 13Q436 0 417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157ZM351 328Q351 334 346 350T323 385T277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q217 26 254 59T298 110Q300 114 325 217T351 328Z"></path><path id="MJX-TEX-I-1D463" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path><path id="MJX-TEX-I-1D454" d="M311 43Q296 30 267 15T206 0Q143 0 105 45T66 160Q66 265 143 353T314 442Q361 
442 401 394L404 398Q406 401 409 404T418 412T431 419T447 422Q461 422 470 413T480 394Q480 379 423 152T363 -80Q345 -134 286 -169T151 -205Q10 -205 10 -137Q10 -111 28 -91T74 -71Q89 -71 102 -80T116 -111Q116 -121 114 -130T107 -144T99 -154T92 -162L90 -164H91Q101 -167 151 -167Q189 -167 211 -155Q234 -144 254 -122T282 -75Q288 -56 298 -13Q311 35 311 43ZM384 328L380 339Q377 350 375 354T369 368T359 382T346 393T328 402T306 405Q262 405 221 352Q191 313 171 233T151 117Q151 38 213 38Q269 38 323 108L331 118L384 328Z"></path><path id="MJX-TEX-I-1D451" d="M366 683Q367 683 438 688T511 694Q523 694 523 686Q523 679 450 384T375 83T374 68Q374 26 402 26Q411 27 422 35Q443 55 463 131Q469 151 473 152Q475 153 483 153H487H491Q506 153 506 145Q506 140 503 129Q490 79 473 48T445 8T417 -8Q409 -10 393 -10Q359 -10 336 5T306 36L300 51Q299 52 296 50Q294 48 292 46Q233 -10 172 -10Q117 -10 75 30T33 157Q33 205 53 255T101 341Q148 398 195 420T280 442Q336 442 364 400Q369 394 369 396Q370 400 396 505T424 616Q424 629 417 632T378 637H357Q351 643 351 645T353 664Q358 683 366 683ZM352 326Q329 405 277 405Q242 405 210 374T160 293Q131 214 119 129Q119 126 119 118T118 106Q118 61 136 44T179 26Q233 26 290 98L298 109L352 326Z"></path><path id="MJX-TEX-I-1D459" d="M117 59Q117 26 142 26Q179 26 205 131Q211 151 215 152Q217 153 225 153H229Q238 153 241 153T246 151T248 144Q247 138 245 128T234 90T214 43T183 6T137 -11Q101 -11 70 11T38 85Q38 97 39 102L104 360Q167 615 167 623Q167 626 166 628T162 632T157 634T149 635T141 636T132 637T122 637Q112 637 109 637T101 638T95 641T94 647Q94 649 96 661Q101 680 107 682T179 688Q194 689 213 690T243 693T254 694Q266 694 266 686Q266 675 193 386T118 83Q118 81 118 75T117 65V59Z"></path><path id="MJX-TEX-N-2264" d="M674 636Q682 636 688 630T694 615T687 601Q686 600 417 472L151 346L399 228Q687 92 691 87Q694 81 694 76Q694 58 676 56H670L382 192Q92 329 90 331Q83 336 83 348Q84 359 96 365Q104 369 382 500T665 634Q669 636 674 636ZM84 -118Q84 -108 99 -98H678Q694 -104 694 -118Q694 -130 679 -138H98Q84 -131 84 
-118Z"></path><path id="MJX-TEX-N-2E" d="M78 60Q78 84 95 102T138 120Q162 120 180 104T199 61Q199 36 182 18T139 0T96 17T78 60Z"></path><path id="MJX-TEX-N-30" d="M96 585Q152 666 249 666Q297 666 345 640T423 548Q460 465 460 320Q460 165 417 83Q397 41 362 16T301 -15T250 -22Q224 -22 198 -16T137 16T82 83Q39 165 39 320Q39 494 96 585ZM321 597Q291 629 250 629Q208 629 178 597Q153 571 145 525T137 333Q137 175 145 125T181 46Q209 16 250 16Q290 16 318 46Q347 76 354 130T362 333Q362 478 354 524T321 597Z"></path><path id="MJX-TEX-N-2265" d="M83 616Q83 624 89 630T99 636Q107 636 253 568T543 431T687 361Q694 356 694 346T687 331Q685 329 395 192L107 56H101Q83 58 83 76Q83 77 83 79Q82 86 98 95Q117 105 248 167Q326 204 378 228L626 346L360 472Q291 505 200 548Q112 589 98 597T83 616ZM84 -118Q84 -108 99 -98H678Q694 -104 694 -118Q694 -130 679 -138H98Q84 -131 84 -118Z"></path><path id="MJX-TEX-N-37" d="M55 458Q56 460 72 567L88 674Q88 676 108 676H128V672Q128 662 143 655T195 646T364 644H485V605L417 512Q408 500 387 472T360 435T339 403T319 367T305 330T292 284T284 230T278 162T275 80Q275 66 275 52T274 28V19Q270 2 255 -10T221 -22Q210 -22 200 -19T179 0T168 40Q168 198 265 368Q285 400 349 489L395 552H302Q128 552 119 546Q113 543 108 522T98 479L95 458V455H55V458Z"></path><path id="MJX-TEX-N-35" d="M164 157Q164 133 148 117T109 101H102Q148 22 224 22Q294 22 326 82Q345 115 345 210Q345 313 318 349Q292 382 260 382H254Q176 382 136 314Q132 307 129 306T114 304Q97 304 95 310Q93 314 93 485V614Q93 664 98 664Q100 666 102 666Q103 666 123 658T178 642T253 634Q324 634 389 662Q397 666 402 666Q410 666 410 648V635Q328 538 205 538Q174 538 149 544L139 546V374Q158 388 169 396T205 412T256 420Q337 420 393 355T449 201Q449 109 385 44T229 -22Q148 -22 99 32T50 154Q50 178 61 192T84 210T107 214Q132 214 148 197T164 157Z"></path><path id="MJX-TEX-N-6E" d="M41 46H55Q94 46 102 60V68Q102 77 102 91T102 122T103 161T103 203Q103 234 103 269T102 328V351Q99 370 88 376T43 385H25V408Q25 431 27 431L37 432Q47 433 65 434T102 436Q119 437 138 438T167 441T178 
442H181V402Q181 364 182 364T187 369T199 384T218 402T247 421T285 437Q305 442 336 442Q450 438 463 329Q464 322 464 190V104Q464 66 466 59T477 49Q498 46 526 46H542V0H534L510 1Q487 2 460 2T422 3Q319 3 310 0H302V46H318Q379 46 379 62Q380 64 380 200Q379 335 378 343Q372 371 358 385T334 402T308 404Q263 404 229 370Q202 343 195 315T187 232V168V108Q187 78 188 68T191 55T200 49Q221 46 249 46H265V0H257L234 1Q210 2 183 2T145 3Q42 3 33 0H25V46H41Z"></path><path id="MJX-TEX-N-2061" d=""></path><path id="MJX-TEX-S3-28" d="M701 -940Q701 -943 695 -949H664Q662 -947 636 -922T591 -879T537 -818T475 -737T412 -636T350 -511T295 -362T250 -186T221 17T209 251Q209 962 573 1361Q596 1386 616 1405T649 1437T664 1450H695Q701 1444 701 1441Q701 1436 681 1415T629 1356T557 1261T476 1118T400 927T340 675T308 359Q306 321 306 250Q306 -139 400 -430T690 -924Q701 -936 701 -940Z"></path><path id="MJX-TEX-I-1D441" d="M234 637Q231 637 226 637Q201 637 196 638T191 649Q191 676 202 682Q204 683 299 683Q376 683 387 683T401 677Q612 181 616 168L670 381Q723 592 723 606Q723 633 659 637Q635 637 635 648Q635 650 637 660Q641 676 643 679T653 683Q656 683 684 682T767 680Q817 680 843 681T873 682Q888 682 888 672Q888 650 880 642Q878 637 858 637Q787 633 769 597L620 7Q618 0 599 0Q585 0 582 2Q579 5 453 305L326 604L261 344Q196 88 196 79Q201 46 268 46H278Q284 41 284 38T282 19Q278 6 272 0H259Q228 2 151 2Q123 2 100 2T63 2T46 1Q31 1 31 10Q31 14 34 26T39 40Q41 46 62 46Q130 49 150 85Q154 91 221 362L289 634Q287 635 234 637Z"></path><path id="MJX-TEX-S3-29" d="M34 1438Q34 1446 37 1448T50 1450H56H71Q73 1448 99 1423T144 1380T198 1319T260 1238T323 1137T385 1013T440 864T485 688T514 485T526 251Q526 134 519 53Q472 -519 162 -860Q139 -885 119 -904T86 -936T71 -949H56Q43 -949 39 -947T34 -937Q88 -883 140 -813Q428 -430 428 251Q428 453 402 628T338 922T245 1146T145 1309T46 1425Q44 1427 42 1429T39 1433T36 1436L34 1438Z"></path></defs></svg>]]></content:encoded>
            <author>syhily@gmail.com (雨帆)</author>
            <category domain="https://stage.yufan.me/tags/vectordb">VectorDB</category>
            <category domain="https://stage.yufan.me/tags/elasticsearch">Elasticsearch</category>
            <category domain="https://stage.yufan.me/cats/notes">笔记</category>
            <enclosure url="https://stage.yufan.me/images/og/vector-db-research.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>