Add Milvus 2.6 API support #112

Open
richzw wants to merge 4 commits into milvus-io:main from richzw:main

richzw commented Apr 1, 2026

Summary

  • Update milvus-proto submodule from v2.6.1 to v2.6.13 and regenerate all proto Rust files
  • Add full support for Milvus 2.6 new data types, RPCs, search features, and schema evolution

New Data Types

| Type | DataType | Wire Format |
|---|---|---|
| Geometry | 24 | WKB (GeometryArray) / WKT (GeometryWktArray) |
| Text | 25 | String (used with BM25 function) |
| Timestamptz | 26 | i64 microseconds (TimestamptzArray) |
| Int8Vector | 105 | bytes (1 byte/dim) |
| Float16Vector | 103 | bytes (2 bytes/dim) |
| BFloat16Vector | 104 | bytes (2 bytes/dim) |
| SparseFloatVector | -- | SparseFloatArray |

  • Value enum: added Int8Vector, Float16Vector, BFloat16Vector, Geometry, GeometryWkt, Timestamptz, SparseFloat variants
  • ValueVec: added Geometry, GeometryWkt, Timestamptz, SparseFloat, StructArray, VectorArray variants
  • Zero unimplemented!() panics remaining in value.rs and data.rs
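
The per-dimension byte widths listed above can be illustrated with a small sketch. It is not code from this PR: the function names are hypothetical, little-endian byte order is an assumption, and the bfloat16 conversion simply truncates the low 16 bits of each `f32` rather than rounding.

```rust
/// Int8Vector: one byte per dimension (two's-complement).
/// Hypothetical helper for illustration, not an SDK function.
pub fn encode_int8_vector(values: &[i8]) -> Vec<u8> {
    values.iter().map(|&v| v as u8).collect()
}

/// BFloat16Vector: two bytes per dimension.
/// Assumption: bf16 is taken as the upper 16 bits of the f32 bit pattern
/// (truncation, no rounding) and written in little-endian order.
pub fn encode_bf16_vector(values: &[f32]) -> Vec<u8> {
    values
        .iter()
        .flat_map(|v| ((v.to_bits() >> 16) as u16).to_le_bytes())
        .collect()
}
```

An 8-dimensional BFloat16 vector therefore occupies 16 bytes, while an 8-dimensional Int8 vector occupies 8.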

New RPCs

  • truncate_collection() -- clear data without dropping collection
  • batch_describe_collections() -- describe multiple collections in one call
  • add_collection_field() -- schema evolution (add nullable field)
  • add/alter/drop_collection_function() -- server-side function management
  • run_analyzer() -- test text analyzers, returns tokenized results

Search & Query Enhancements

  • COSINE metric type + HNSW index support
  • BM25 full-text search via schema functions (add_function() on CollectionSchemaBuilder)
  • Highlighter support: SearchOptions::highlighter(), SearchResult::highlight_results
  • Namespace for multi-tenancy across insert/upsert/search/query/hybrid_search/iterators
  • Per-query highlight result slicing (correct behavior with nq > 1)
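
Per-query slicing with nq > 1 can be sketched as follows. This assumes results arrive as one flat, query-major list accompanied by a per-query hit count; `slice_per_query` and its parameters are hypothetical names, not the functions this PR adds.

```rust
/// Splits a flat, query-major result list into one Vec per query.
/// `topks[i]` is the number of hits returned for query `i`.
/// Hypothetical helper for illustration only.
pub fn slice_per_query<T: Clone>(flat: &[T], topks: &[usize]) -> Vec<Vec<T>> {
    let mut out = Vec::with_capacity(topks.len());
    let mut offset = 0;
    for &k in topks {
        // Clamp so a short result list cannot cause an out-of-bounds slice.
        let end = (offset + k).min(flat.len());
        out.push(flat[offset..end].to_vec());
        offset = end;
    }
    out
}
```

Without the running offset, every query would see the highlights of the first query, which is the failure mode the bullet above refers to.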

Schema Improvements

  • FieldSchema::set_nullable() -- required for schema evolution
  • FieldSchema::add_type_param() -- extra params like enable_analyzer, enable_match
  • CollectionSchemaBuilder::add_function() -- attach BM25/TextEmbedding/Rerank functions at creation
  • extra_type_params round-trip through describe/create (not dropped on describe)
  • Function output fields auto-marked with is_function_output

Index Types

Added: INVERTED, SPARSE_INVERTED_INDEX, SPARSE_WAND, RTREE, AutoIndex, DiskANN, GpuIvfFlat, GpuIvfPQ
Added metrics: COSINE, BM25

Test Plan

  • cargo build -- zero errors, deprecation warnings only
  • cargo test --lib -- 2/2 unit tests pass
  • cargo test --no-run -- all 21 test binaries compile
  • cargo run --example milvus26_features -- all 7 demos pass against live Milvus 2.6.13:
    1. COSINE search with HNSW + INVERTED scalar index
    2. Truncate collection
    3. Batch describe collections
    4. Partial upsert
    5. Int8 vector insert
    6. Timestamptz field insert + query
    7. BM25 full-text search (analyzer + schema function + text_match query)

  Update milvus-proto submodule to v2.6.13 and add comprehensive support
  for Milvus 2.6 features including new data types, RPCs, and search
  capabilities while maintaining backward compatibility with Milvus 2.5.

  Key changes:
  - Proto: update submodule to v2.6.13, handle oneof search_input breaking change
  - Data types: Geometry (WKB/WKT), Text, Timestamptz, Int8Vector, Float16Vector,
    BFloat16Vector, SparseFloatVector, ArrayOfVector, ArrayOfStruct
  - New RPCs: truncate_collection, batch_describe_collections, add_collection_field,
    add/alter/drop_collection_function, run_analyzer
  - Search: namespace, highlighter, COSINE metric, BM25 full-text search
  - Schema: add_function() builder, add_type_param(), nullable field, schema evolution
  - Index: INVERTED, SPARSE_INVERTED_INDEX, SPARSE_WAND, RTREE, DiskANN, AutoIndex
  - Mutate: UpsertOptions with partial_update, namespace support across all operations
  - Testing: docker-compose updated to v2.6.13, integration tests and examples

Signed-off-by: Wei Zang <[email protected]>
Copilot AI review requested due to automatic review settings April 1, 2026 10:04
@sre-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: richzw
To complete the pull request process, please assign yah01 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yah01 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mergify bot commented Apr 1, 2026

@richzw Please associate the related issue to the body of your Pull Request. (eg. “issue: #187”)

Copilot AI left a comment

Pull request overview

Adds Milvus 2.6 API support to the Rust SDK by updating proto bindings and extending client/schema/value/query layers for new 2.6 types and RPCs.

Changes:

  • Regenerates/extends proto types and wires new Milvus 2.6 fields (namespace, highlighter, schema version, new datatypes).
  • Adds SDK support for new datatypes (e.g., Int8/Float16/BFloat16 vectors, Geometry, Timestamptz, SparseFloat, struct/vector arrays) and related serialization/deserialization.
  • Introduces new client APIs and options for Milvus 2.6 features (truncate, batch describe, schema evolution/function management, namespace/highlighter) plus new tests/examples and an updated docker-compose for Milvus 2.6.13.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| tests/milvus26.rs | Adds Milvus 2.6-focused integration tests (new types/RPCs/search/index types). |
| src/value.rs | Extends Value/ValueVec to represent new 2.6 scalar/vector/aggregate types and conversions. |
| src/schema.rs | Adds schema support for nullable fields, extra type params, schema functions, and new field constructors. |
| src/query.rs | Adds namespace/highlighter options and updates search request encoding (search_input oneof) + highlight slicing. |
| src/proto/milvus.proto.schema.rs | Updates generated schema protos (schema version, highlight results, new DataType variants). |
| src/proto/milvus.proto.msg.rs | Updates generated message protos (namespace on insert, create-collection schema field, etc.). |
| src/proto/milvus.proto.common.rs | Updates generated common protos (highlighter types, WAL enums, misc formatting). |
| src/mutate.rs | Adds namespace to insert, introduces UpsertOptions + partial update + new upsert options plumbing. |
| src/iterator.rs | Adds namespace plumbing to iterators and adapts search request encoding (search_input). |
| src/index/mod.rs | Adds new index/metric enums (e.g., INVERTED, COSINE, BM25) and makes index param parsing more tolerant. |
| src/data.rs | Updates column encoding/decoding for new datatypes and fixes row length computation for new vector formats. |
| src/collection.rs | Adds new Milvus 2.6 RPC helpers (truncate, batch describe, schema evolution/function mgmt, analyzer) and carries highlight results in SearchResult. |
| examples/milvus26_features.rs | Adds an end-to-end Milvus 2.6 feature demo example. |
| docker-compose.yml | Updates local dev stack to Milvus 2.6.13 and adds required dependencies (etcd). |

src/schema.rs Outdated
Comment on lines 229 to 233
    chunk_size: (dim
        * match dtype {
            DataType::BinaryVector => dim / 8,
            _ => dim,
        }) as _,

Copilot AI Apr 1, 2026

chunk_size computed in From<schema::FieldSchema> is incorrect for vector fields: for FloatVector it becomes dim*dim and for BinaryVector it becomes dim*(dim/8). This will report the wrong per-row byte/element width to callers inspecting FieldSchema. Consider setting chunk_size to the actual row width (e.g., FloatVector = dim, BinaryVector = dim/8, Float16/BFloat16 = dim*2, Int8Vector = dim, scalars = 1).
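
The row widths proposed in this comment can be written down as a small lookup. This is a sketch only: the `DataType` enum below is a local stand-in with a handful of variants, and `row_width` is a hypothetical name, not the SDK's conversion code.

```rust
/// Local stand-in for the SDK's DataType, reduced to the variants discussed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int64,
    FloatVector,
    BinaryVector,
    Float16Vector,
    BFloat16Vector,
    Int8Vector,
}

/// Per-row width following the values suggested in the review comment:
/// FloatVector = dim, BinaryVector = dim / 8, Float16/BFloat16 = dim * 2,
/// Int8Vector = dim, scalars = 1.
pub fn row_width(dtype: DataType, dim: usize) -> usize {
    match dtype {
        DataType::FloatVector => dim,
        DataType::BinaryVector => dim / 8,
        DataType::Float16Vector | DataType::BFloat16Vector => dim * 2,
        DataType::Int8Vector => dim,
        DataType::Int64 => 1,
    }
}
```

The key difference from the flagged code is that `dim` is no longer multiplied by the match result, so a 128-dimensional FloatVector reports 128 rather than 128 * 128.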

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +282
    + pub async fn upsert<S, O>(
          &self,
          collection_name: S,
          fields_data: Vec<FieldColumn>,
    -     options: Option<InsertOptions>,
    +     options: O,
      ) -> Result<crate::proto::milvus::MutationResult>
      where
          S: Into<String>,
    +     O: IntoUpsertOptions,
      {
    -     let options = options.unwrap_or_default();
    +     let options = options.into_upsert_options();

Copilot AI Apr 1, 2026

Changing Client::upsert to take a generic O: IntoUpsertOptions breaks existing callers that pass Option<InsertOptions> (e.g., Some(InsertOptions::new()...)) and can also make None/option variables harder to type-infer. To preserve backwards compatibility, consider adding an IntoUpsertOptions impl for Option<InsertOptions> (or a generic impl<T: IntoUpsertOptions> IntoUpsertOptions for Option<T>) or keeping an overload that accepts Option<InsertOptions>.
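
A minimal sketch of the generic `impl<T: IntoUpsertOptions> IntoUpsertOptions for Option<T>` idea follows. The structs are simplified stand-ins whose fields are assumptions, and `upsert` here is a synchronous placeholder that only returns the resolved options, not the real client method.

```rust
/// Simplified stand-ins for the SDK option types; the fields are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct InsertOptions {
    pub partition_name: String,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct UpsertOptions {
    pub partition_name: String,
    pub partial_update: bool,
}

pub trait IntoUpsertOptions {
    fn into_upsert_options(self) -> UpsertOptions;
}

impl IntoUpsertOptions for UpsertOptions {
    fn into_upsert_options(self) -> UpsertOptions {
        self
    }
}

impl IntoUpsertOptions for InsertOptions {
    fn into_upsert_options(self) -> UpsertOptions {
        UpsertOptions {
            partition_name: self.partition_name,
            partial_update: false,
        }
    }
}

/// Blanket impl so existing `Some(InsertOptions { .. })` call sites keep compiling.
impl<T: IntoUpsertOptions> IntoUpsertOptions for Option<T> {
    fn into_upsert_options(self) -> UpsertOptions {
        self.map(T::into_upsert_options).unwrap_or_default()
    }
}

/// Placeholder for the client method: only resolves the options.
pub fn upsert<O: IntoUpsertOptions>(options: O) -> UpsertOptions {
    options.into_upsert_options()
}
```

With this shape a bare `None` still needs a type annotation such as `None::<InsertOptions>`, because the compiler cannot choose `T` on its own.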

Comment on lines +214 to +220
    let _new_field = FieldSchema::new_varchar("description", "added later", 256);
    // The field must be nullable for schema evolution
    // We set nullable via the proto conversion path
    // Currently the FieldSchema struct does not expose nullable directly
    // but the AddCollectionField RPC requires it.
    // This test verifies the RPC call works; full nullable support is a follow-up.

Copilot AI Apr 1, 2026

This test currently does not exercise schema evolution: it creates _new_field but never calls client.add_collection_field(...), and the comments claim FieldSchema can’t set nullable even though FieldSchema::set_nullable() now exists. Update the test to set the new field nullable and actually invoke add_collection_field, then assert it appears in describe_collection/batch_describe_collections results.

Suggested change
    - let _new_field = FieldSchema::new_varchar("description", "added later", 256);
    - // The field must be nullable for schema evolution
    - // We set nullable via the proto conversion path
    - // Currently the FieldSchema struct does not expose nullable directly
    - // but the AddCollectionField RPC requires it.
    - // This test verifies the RPC call works; full nullable support is a follow-up.
    + let new_field = FieldSchema::new_varchar("description", "added later", 256)
    +     .set_nullable(true);
    + // Perform schema evolution by adding the new field
    + client
    +     .add_collection_field(&collection_name, new_field.clone())
    +     .await?;
    + // Verify the new field appears in describe_collection results
    + let described_schema = client.describe_collection(&collection_name).await?;
    + assert!(described_schema.get_field("description").is_some());
    + // Verify the new field appears in batch_describe_collections results
    + let described_schemas = client
    +     .batch_describe_collections(vec![collection_name.clone()])
    +     .await?;
    + assert_eq!(1, described_schemas.len());
    + assert!(described_schemas[0].get_field("description").is_some());

Comment on lines +3 to +11
    /// Demonstrates new capabilities introduced in Milvus 2.6:
    /// - COSINE metric with HNSW index
    /// - Truncate collection
    /// - Batch describe collections
    /// - Schema evolution (add_collection_field)
    /// - Partial upsert
    /// - BM25 full-text search function
    /// - Int8 vector field
    /// - Timestamptz field

Copilot AI Apr 1, 2026

The module-level docs list “Schema evolution (add_collection_field)”, but this example doesn’t demonstrate calling add_collection_field/schema evolution anywhere. Either add a short schema-evolution demo or remove/update the bullet to avoid misleading readers.

Comment on lines 828 to 832
        name: this.name,
        description: this.description,
        enable_dynamic_field: self.enable_dynamic_field,
        functions: this.functions,
    })

Copilot AI Apr 1, 2026

In CollectionSchemaBuilder::build(), std::mem::replace(self, CollectionSchemaBuilder::new("".into(), "")) uses a String temporary where new() expects &str, which will not compile. Use string literals (e.g., CollectionSchemaBuilder::new("", "")) or otherwise pass &str references when resetting the builder.

src/schema.rs Outdated
    @@ -604,6 +828,7 @@ impl CollectionSchemaBuilder {
        name: this.name,
        description: this.description,
        enable_dynamic_field: self.enable_dynamic_field,

Copilot AI Apr 1, 2026

CollectionSchemaBuilder::build() uses enable_dynamic_field: self.enable_dynamic_field after self has been reset via std::mem::replace, so the built CollectionSchema will always have enable_dynamic_field = false even if enable_dynamic_field() was called. Use the moved-out builder state (this.enable_dynamic_field) instead.

Suggested change
    - enable_dynamic_field: self.enable_dynamic_field,
    + enable_dynamic_field: this.enable_dynamic_field,
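
The bug pattern is easy to reproduce in isolation. The sketch below uses a cut-down, hypothetical `SchemaBuilder` rather than the real `CollectionSchemaBuilder`; it shows that after `std::mem::replace`, `self` holds the fresh default state and only the moved-out value (`this`) carries what the caller configured.

```rust
/// Cut-down, hypothetical builder used only to illustrate the reset pattern.
#[derive(Debug)]
pub struct SchemaBuilder {
    pub name: String,
    pub enable_dynamic_field: bool,
}

#[derive(Debug, PartialEq)]
pub struct Schema {
    pub name: String,
    pub enable_dynamic_field: bool,
}

impl SchemaBuilder {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_owned(),
            enable_dynamic_field: false,
        }
    }

    pub fn enable_dynamic_field(&mut self) -> &mut Self {
        self.enable_dynamic_field = true;
        self
    }

    pub fn build(&mut self) -> Schema {
        // `this` receives the configured state; `self` is reset to defaults.
        let this = std::mem::replace(self, SchemaBuilder::new(""));
        Schema {
            name: this.name,
            // Reading `self.enable_dynamic_field` here would always yield false.
            enable_dynamic_field: this.enable_dynamic_field,
        }
    }
}
```

After `build()` returns, the builder itself reports `enable_dynamic_field == false`, which is exactly the value the original line would have copied into the schema.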

richzw (Author) commented Apr 1, 2026

Issue #113

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.


Comment on lines +297 to +307
    + pub async fn upsert<S, O>(
          &self,
          collection_name: S,
          fields_data: Vec<FieldColumn>,
    -     options: Option<InsertOptions>,
    +     options: O,
      ) -> Result<crate::proto::milvus::MutationResult>
      where
          S: Into<String>,
    +     O: IntoUpsertOptions,
      {
    -     let options = options.unwrap_or_default();
    +     let options = options.into_upsert_options();

Copilot AI Apr 1, 2026

Client::upsert now takes a generic options: O where O: IntoUpsertOptions. This makes common call sites like upsert(..., None) fail to compile due to type inference ambiguity (there are impls for both Option<InsertOptions> and Option<UpsertOptions>). Consider reverting the parameter to a concrete Option<UpsertOptions> (and keep From<InsertOptions> for UpsertOptions), or provide an additional overload/helper that preserves the previous Option<InsertOptions> signature while keeping None unambiguous.
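
The concrete-parameter alternative can be sketched as below. The structs are simplified stand-ins with assumed fields, and `upsert` is a synchronous placeholder that only resolves the options; the point is that `None` needs no annotation once the parameter type is fixed.

```rust
/// Simplified stand-ins for the SDK option types; the fields are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct InsertOptions {
    pub partition_name: String,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct UpsertOptions {
    pub partition_name: String,
    pub partial_update: bool,
}

/// Lets callers holding legacy InsertOptions convert with `.into()`.
impl From<InsertOptions> for UpsertOptions {
    fn from(opts: InsertOptions) -> Self {
        UpsertOptions {
            partition_name: opts.partition_name,
            partial_update: false,
        }
    }
}

/// Placeholder for the client method with a concrete options type.
pub fn upsert(options: Option<UpsertOptions>) -> UpsertOptions {
    options.unwrap_or_default()
}
```

The trade-off is that callers passing `Some(InsertOptions { .. })` must add `.into()`, so this is a small source-level change rather than a fully transparent one.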

Comment on lines +853 to +860
    pub async fn alter_collection_function<S>(
        &self,
        collection_name: S,
        function_name: S,
        function: proto::schema::FunctionSchema,
    ) -> Result<()>
    where
        S: Into<String>,

Copilot AI Apr 1, 2026

alter_collection_function uses a single generic type parameter S for both collection_name and function_name, which forces callers to pass both arguments as the same concrete type (e.g., both &str), reducing ergonomics and causing avoidable type mismatches. Use separate generics (e.g., C: Into<String>, F: Into<String>) for these two parameters.

Suggested change
    - pub async fn alter_collection_function<S>(
    -     &self,
    -     collection_name: S,
    -     function_name: S,
    -     function: proto::schema::FunctionSchema,
    - ) -> Result<()>
    - where
    -     S: Into<String>,
    + pub async fn alter_collection_function<C, F>(
    +     &self,
    +     collection_name: C,
    +     function_name: F,
    +     function: proto::schema::FunctionSchema,
    + ) -> Result<()>
    + where
    +     C: Into<String>,
    +     F: Into<String>,

Comment on lines +881 to +888
    pub async fn drop_collection_function<S>(
        &self,
        collection_name: S,
        function_name: S,
    ) -> Result<()>
    where
        S: Into<String>,
    {

Copilot AI Apr 1, 2026

drop_collection_function uses a single generic type parameter S for both collection_name and function_name, forcing both arguments to have the same concrete type. This is unnecessarily restrictive for callers (e.g., mixing String and &str). Use separate generics for the two parameters.
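
The effect of separate type parameters can be seen in a minimal sketch. `qualified_name` is a hypothetical stand-in, not an SDK function; it only demonstrates that two independent `Into<String>` parameters accept mixed argument types.

```rust
/// Hypothetical helper with one type parameter per string-like argument.
pub fn qualified_name<C, F>(collection_name: C, function_name: F) -> String
where
    C: Into<String>,
    F: Into<String>,
{
    format!("{}/{}", collection_name.into(), function_name.into())
}
```

With a single shared parameter `S`, a call such as `qualified_name(String::from("docs"), "bm25")` would be rejected because `String` and `&str` cannot both be `S`.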

Comment on lines 165 to +183
    @@ -168,24 +172,55 @@ impl FieldSchema {
                chunk_size: 0,
                dim: 0,
                max_length: 0,
                nullable: false,
                extra_type_params: HashMap::new(),
            }
        }

        #[deprecated(note = "use FieldSchema::empty() instead")]
        pub fn const_default() -> Self {
            Self::empty()
        }

Copilot AI Apr 1, 2026

FieldSchema::const_default used to be a const fn but is now a regular function (and deprecated). This is a breaking change for downstream users that relied on calling it in const contexts. If API compatibility is a goal, consider keeping a const fn constructor (even if it can only initialize a minimal/default state) or introducing a new const-safe constructor and keeping the old one as const fn for a deprecation cycle.
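
One way a constructor can stay `const` while still holding a map is sketched below. It swaps `HashMap` for `BTreeMap`, whose `new()` is a `const fn` on recent stable Rust, whereas `HashMap::new()` is not; that swap is an assumption made for illustration and not something this PR does. The struct is a reduced stand-in for `FieldSchema`.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for FieldSchema; the field set is an assumption.
#[derive(Debug)]
pub struct FieldSchema {
    pub name: String,
    pub dim: i64,
    pub nullable: bool,
    pub extra_type_params: BTreeMap<String, String>,
}

impl FieldSchema {
    /// Usable in const and static initializers because every field
    /// initializer (String::new, BTreeMap::new) is itself a const fn.
    pub const fn const_default() -> Self {
        Self {
            name: String::new(),
            dim: 0,
            nullable: false,
            extra_type_params: BTreeMap::new(),
        }
    }
}

/// Demonstrates use in a static initializer, which requires a const fn.
pub static EMPTY_FIELD: FieldSchema = FieldSchema::const_default();
```

Whether changing the map type is acceptable for the SDK's public API is a separate question; the sketch only shows that the const-context requirement and an extra-params map are not inherently in conflict.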

src/data.rs Outdated
Comment on lines +178 to +182
    // Complex aggregate types: these represent the entire field data and are
    // copied as a whole rather than pushed element by element.
    (ValueVec::SparseFloat(dst), Value::SparseFloat(src)) => *dst = src.into_owned(),
    (ValueVec::StructArray(dst), Value::StructArray(src)) => *dst = src.into_owned(),
    (ValueVec::VectorArray(dst), Value::VectorArray(src)) => *dst = src.into_owned(),

Copilot AI Apr 1, 2026

For complex column types (SparseFloat, StructArray, VectorArray), FieldColumn::push overwrites the destination value (*dst = src.into_owned()) rather than appending a single row. This breaks search/query result slicing, because result construction pushes one value per hit; overwriting will leave the full original column (or the last assignment) instead of a per-hit subset and can make column lengths inconsistent with topk. Consider implementing per-row extraction/append semantics for these types (or updating result assembly to handle these types without per-row push).

Suggested change
    - // Complex aggregate types: these represent the entire field data and are
    - // copied as a whole rather than pushed element by element.
    - (ValueVec::SparseFloat(dst), Value::SparseFloat(src)) => *dst = src.into_owned(),
    - (ValueVec::StructArray(dst), Value::StructArray(src)) => *dst = src.into_owned(),
    - (ValueVec::VectorArray(dst), Value::VectorArray(src)) => *dst = src.into_owned(),
    + // Complex aggregate types: these represent the entire field data rather than a
    + // single row. Pushing them element-by-element is not supported, because it would
    + // overwrite the stored aggregate and break result slicing.
    + (ValueVec::SparseFloat(_), Value::SparseFloat(_))
    + | (ValueVec::StructArray(_), Value::StructArray(_))
    + | (ValueVec::VectorArray(_), Value::VectorArray(_)) => {
    +     panic!(
    +         "per-row push is not supported for SparseFloat/StructArray/VectorArray; \
    +          construct these fields as full aggregates instead"
    +     )
    + }
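
The other option the comment mentions, per-row append semantics, might look roughly like the sketch below for a sparse column. `SparseColumn` and its layout (one list of index/value pairs per row) are hypothetical and deliberately simpler than the generated `SparseFloatArray` proto type.

```rust
/// Hypothetical sparse column: one entry per row, each row a list of
/// (dimension index, value) pairs. Not the SDK's actual representation.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SparseColumn {
    pub rows: Vec<Vec<(u32, f32)>>,
    /// Largest dimension index seen so far, plus one.
    pub dim: u32,
}

impl SparseColumn {
    /// Appends a single row instead of overwriting the whole column,
    /// so the column length tracks the number of pushed hits.
    pub fn push_row(&mut self, row: Vec<(u32, f32)>) {
        if let Some(max_index) = row.iter().map(|&(index, _)| index).max() {
            self.dim = self.dim.max(max_index.saturating_add(1));
        }
        self.rows.push(row);
    }
}
```

Under this shape, pushing one value per hit yields exactly one row per hit, which keeps column lengths consistent with topk.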

@xiaofan-luan (Contributor)

Thanks for the contribution.

I'll start reviewing the commits once I have time.

mergify bot commented Apr 2, 2026

@richzw Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

mergify bot added the needs-dco label and removed the dco-passed label Apr 2, 2026

4 participants