Commit f7cddbf

Added test for testing the new OpenAI api
Signed-off-by: Chaitany patel <[email protected]>
1 parent 3731f85 commit f7cddbf

18 files changed

Lines changed: 1538 additions & 417 deletions

Lines changed: 344 additions & 0 deletions
---
title: "Making Feast Speak OpenAI: Vector Search Without the Glue Code"
description: "Feast now exposes an OpenAI-compatible vector store search endpoint. Send a plain text query, get results back in the standard OpenAI format. No client-side embeddings required."
date: 2026-04-28
authors: ["Chaitanya Patel", "Nikhil Kathole"]
---

<div class="hero-image">
  <img src="/images/blog/feast-openai-compat-flow.png" alt="Sequence diagram showing a client sending a text query to Feast, which embeds and searches server-side" loading="lazy">
</div>

If you've tried to connect an AI agent to Feast's vector search, you've probably hit this wall: the agent needs to search your feature store, but Feast expects a raw embedding vector. The agent doesn't have one. It has a question in English.

Until now, the workaround was ugly. You'd call an embedding provider (OpenAI, Ollama, whatever) to turn the text into a float array, then pass that array to Feast's `retrieve-online-documents` endpoint. Every client had to know both APIs, carry both sets of credentials, and run glue code whose only job was bridging the gap.

Feast now has a new endpoint: `POST /v1/vector_stores/{feature_view}/search`. It follows the [OpenAI Vector Store Search API](https://platform.openai.com/docs/api-reference/vector-stores-search) format. You send text, Feast handles the embedding internally, and you get results back in the same JSON shape that OpenAI returns. No float arrays, no extra SDK.

## The two-API tax

Here's what searching Feast looked like before:

```python
import openai
import requests

# Step 1: Call the embedding provider yourself
embed_response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="wireless noise-cancelling headphones",
)
query_vector = embed_response.data[0].embedding  # 1536 floats

# Step 2: Call Feast's proprietary API with the raw vector
result = requests.post("http://feast-server:6566/retrieve-online-documents", json={
    "features": [
        "product_catalog:vector",
        "product_catalog:name",
        "product_catalog:description",
        "product_catalog:price",
    ],
    "query": query_vector,
    "top_k": 5,
    "api_version": 2,
})
```

This works fine. But it has costs that add up:

- Every service calling Feast needs an embedding SDK, an API key, and logic to handle the embedding call. Five microservices means five places managing embedding credentials.
- LLM agents can't use it. They discover tools through MCP or function calling, and they know how to call OpenAI-shaped endpoints. They don't know how to compute embeddings and pass raw float arrays to a custom API.
- The embedding model becomes a client-side decision. Different clients might use different models or versions, which means inconsistent search results against the same vector store.
- Feast's filter syntax is its own format. Not something an agent framework knows out of the box.

## One endpoint, standard format

With the new endpoint, that same search looks like this:

```python
import requests

result = requests.post(
    "http://feast-server:6566/v1/vector_stores/product_catalog/search",
    json={
        "query": "wireless noise-cancelling headphones",
        "max_num_results": 5,
    },
)
```

No embedding SDK. No raw vectors. The request and response match OpenAI's format, so anything that already talks to OpenAI can talk to Feast.

### What happens under the hood

When Feast receives this request, it:

1. Embeds the query server-side using the model configured in `feature_store.yaml` (via [LiteLLM](https://docs.litellm.ai/), which supports OpenAI, Ollama, Azure, Cohere, HuggingFace, and 100+ other providers).
2. Runs vector similarity search against the feature view's online store (Postgres/pgvector, Milvus, Elasticsearch, SQLite, or whatever backend you've configured).
3. Applies filters if you provided any, using string equality, numeric comparisons, or compound AND/OR conditions in the OpenAI filter format.
4. Returns results in OpenAI's `vector_store.search_results.page` format.

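Those four steps can be sketched in a few lines. This is an illustration of the flow, not Feast's actual implementation; `embed` and `knn_search` below are stand-ins for the LiteLLM call and the online store query:

```python
from typing import Callable, Sequence

def vector_store_search(
    query: str,
    embed: Callable[[str], Sequence[float]],
    knn_search: Callable[[Sequence[float], int], list],
    top_k: int = 5,
) -> dict:
    """Embed the text server-side, run similarity search, and wrap the
    hits in OpenAI's vector_store.search_results.page shape."""
    vector = embed(query)             # step 1: embedding call (LiteLLM)
    hits = knn_search(vector, top_k)  # steps 2-3: online store search (+ filters)
    return {                          # step 4: OpenAI-format response
        "object": "vector_store.search_results.page",
        "search_query": [query],
        "data": hits,
        "has_more": False,
        "next_page": None,
    }

# Toy stand-ins so the sketch runs without a server:
page = vector_store_search(
    "headphones",
    embed=lambda text: [0.1] * 384,
    knn_search=lambda vec, k: [{"file_id": "product_catalog_42", "score": 0.92}],
    top_k=1,
)
print(page["search_query"])  # ['headphones']
```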
Because the embedding model is a server-side configuration, every client gets consistent results. No more worrying about whether service A is using `text-embedding-3-small` while service B accidentally stuck with `ada-002`.

## Setting it up

### Step 1: Configure the embedding model

Add an `embedding_model` section to your `feature_store.yaml`:

```yaml
project: my_project
registry: data/registry.db
provider: local

online_store:
  type: postgres
  host: localhost
  port: 5432
  database: feast
  user: feast
  password: ${DB_PASSWORD}
  pgvector_enabled: true
  vector_len: 384
  enable_openai_compatible_store: true

embedding_model:
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
```

Feast uses [LiteLLM](https://docs.litellm.ai/) under the hood, so any provider works:

```yaml
# OpenAI
embedding_model:
  model: text-embedding-3-small
  api_key: sk-...

# Ollama (local, no API key needed)
embedding_model:
  model: ollama/nomic-embed-text
  api_base: http://localhost:11434

# Azure OpenAI
embedding_model:
  model: azure/text-embedding-ada-002
  api_key: ${AZURE_API_KEY}
  api_base: https://your-resource.openai.azure.com
  api_version: "2024-02-01"

# Any OpenAI-compatible endpoint (vLLM, LiteLLM proxy, etc.)
embedding_model:
  model: text-embedding-3-small
  api_key: ${API_KEY}
  api_base: https://your-endpoint/v1
```

### Step 2: Define a feature view with vector search

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Array, Float32, Float64, String

product = Entity(name="product_id", join_keys=["product_id"])

product_catalog = FeatureView(
    name="product_catalog",
    entities=[product],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,
            vector_search_metric="COSINE",
        ),
        Field(name="name", dtype=String),
        Field(name="description", dtype=String),
        Field(name="category", dtype=String),
        Field(name="price", dtype=Float64),
        Field(name="rating", dtype=Float64),
    ],
    source=product_source,  # your batch/push source, defined elsewhere in the repo
    ttl=timedelta(days=7),
)
```

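A quick note on `vector_search_metric="COSINE"`: results are ranked by cosine similarity between the query embedding and each stored vector. As a refresher on what that metric does:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the
    vectors' magnitudes. 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (scale doesn't matter)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Because cosine similarity ignores magnitude, it only compares direction, which is why it's a common default for text embeddings.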
### Step 3: Apply, load data, and serve

```bash
feast apply
# Load your data into the online store (e.g. `feast materialize-incremental <end_ts>`
# or Feast's write APIs), then start the feature server:
feast serve
```

### Step 4: Search

```bash
curl -X POST http://localhost:6566/v1/vector_stores/product_catalog/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "wireless noise-cancelling headphones",
    "max_num_results": 3
  }'
```

Response:

```json
{
  "object": "vector_store.search_results.page",
  "search_query": ["wireless noise-cancelling headphones"],
  "data": [
    {
      "file_id": "product_catalog_42",
      "filename": "product_catalog",
      "score": 0.92,
      "attributes": {
        "name": "Sony WH-1000XM5",
        "description": "Premium wireless noise-cancelling headphones",
        "category": "Electronics",
        "price": 349.99,
        "rating": 4.8
      },
      "content": [
        {"type": "text", "text": "Sony WH-1000XM5"},
        {"type": "text", "text": "Premium wireless noise-cancelling headphones"},
        {"type": "text", "text": "Electronics"}
      ]
    }
  ],
  "has_more": false,
  "next_page": null
}
```

The response follows OpenAI's `vector_store.search_results.page` schema. Any client that already parses OpenAI search results can parse this without changes.

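For example, pulling the top hits out of a page is a couple of lines of dict access. A small helper sketch, using field names from the schema above:

```python
def top_hits(page: dict, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (file_id, score) pairs from an OpenAI-format search page."""
    return [
        (item["file_id"], item["score"])
        for item in page.get("data", [])
        if item["score"] >= min_score
    ]

# A trimmed-down page in the shape shown above:
page = {
    "object": "vector_store.search_results.page",
    "data": [{"file_id": "product_catalog_42", "score": 0.92}],
    "has_more": False,
}
print(top_hits(page, min_score=0.5))  # [('product_catalog_42', 0.92)]
```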
## Filtering

The endpoint supports OpenAI-style filters for narrowing results beyond vector similarity. Filters work on the metadata stored alongside your vectors.

### String filters

```json
{
  "query": "running shoes",
  "max_num_results": 5,
  "filters": {
    "type": "eq",
    "key": "category",
    "value": "Footwear"
  }
}
```

### Numeric filters

```json
{
  "query": "budget laptop",
  "max_num_results": 5,
  "filters": {
    "type": "lt",
    "key": "price",
    "value": 500.0
  }
}
```

### Compound filters (AND / OR)

```json
{
  "query": "wireless earbuds",
  "max_num_results": 5,
  "filters": {
    "type": "and",
    "filters": [
      {"type": "eq", "key": "category", "value": "Electronics"},
      {"type": "gte", "key": "rating", "value": 4.5},
      {"type": "lt", "key": "price", "value": 200.0}
    ]
  }
}
```

Comparison operators: `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `in`, `nin`. Compound operators: `and`, `or`. These nest to arbitrary depth.

Numeric and boolean filters require the `enable_openai_compatible_store` flag in your online store config, plus a `feast apply` to add the `value_num` column to existing tables. String filters work on all existing schemas without migration.

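To make the semantics concrete, here's a tiny recursive evaluator for this filter format. It's an illustration of how the operators compose, not Feast's implementation:

```python
import operator

# Map filter types to comparison functions.
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "in": lambda a, b: a in b, "nin": lambda a, b: a not in b,
}

def matches(attributes: dict, f: dict) -> bool:
    """Evaluate an OpenAI-style filter tree against one row's attributes."""
    if f["type"] == "and":
        return all(matches(attributes, sub) for sub in f["filters"])
    if f["type"] == "or":
        return any(matches(attributes, sub) for sub in f["filters"])
    return OPS[f["type"]](attributes[f["key"]], f["value"])

row = {"category": "Electronics", "rating": 4.8, "price": 149.0}
compound = {"type": "and", "filters": [
    {"type": "eq", "key": "category", "value": "Electronics"},
    {"type": "gte", "key": "rating", "value": 4.5},
    {"type": "lt", "key": "price", "value": 200.0},
]}
print(matches(row, compound))  # True
```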
## What this means for AI agents

We built this with agents in mind. When Feast added [MCP support](./feast-agents-mcp) earlier this year, agents could discover and call Feast tools dynamically. But vector search still had a gap: the agent needed to produce a float array, and an LLM can't do that.

Now the search tool is just text in, structured results out. An agent calls it the same way it calls any other OpenAI-compatible service. The feature server currently exposes these tools:

| Capability | Endpoint | What it does |
|---|---|---|
| Structured feature lookup | `get-online-features` | Get customer profiles, account data, etc. |
| Vector search (proprietary) | `retrieve-online-documents` | Search with a pre-computed embedding vector |
| Vector search (OpenAI format) | `/v1/vector_stores/{id}/search` | Search with plain text, embedding handled server-side |
| Write features / memory | `write-to-online-store` | Persist agent state, update features |

The OpenAI-format search row is the new one, and it's what this post is about. Before it existed, agents could read structured features and write state back, but they couldn't search vectors without help from glue code.

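For a sense of how an agent sees the new endpoint, a function-calling tool definition wrapping it might look like this. This is a hypothetical schema for illustration, not Feast's actual MCP tool definition:

```json
{
  "name": "search_product_catalog",
  "description": "Semantic search over the product_catalog feature view. Pass a plain-text query.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "Plain-text search query"},
      "max_num_results": {"type": "integer", "default": 5}
    },
    "required": ["query"]
  }
}
```

The key point is that the parameters are all plain text and numbers, which is exactly what an LLM can produce.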
## What this is, and what it isn't

This makes Feast's vector search speak OpenAI's protocol. It doesn't turn Feast into a general-purpose OpenAI-compatible vector database.

| Works today | Not yet |
|---|---|
| `POST /v1/vector_stores/{id}/search` | Creating vector stores via the API |
| Plain text queries with server-side embedding | Client-provided embedding vectors on this endpoint |
| OpenAI-format filters (string, numeric, compound) | `ranking_options` and `rewrite_query` (accepted but ignored) |
| All Feast online store backends | Standalone `/v1/embeddings` endpoint |

Feature views are still defined in Python and managed through `feast apply`. Data is still ingested through Feast's existing write paths. The OpenAI-compatible layer is a read API that gives standard access to what's already in your feature store.

## Deploying on Kubernetes

The `deploy-openai-compat/` directory in the Feast repository has Kubernetes manifests that deploy the feature server with an Ollama sidecar for local embedding:

```yaml
# configmap.yaml (embedding model section)
embedding_model:
  model: ollama/nomic-embed-text
  api_base: http://feast-ollama:11434
```

```yaml
# deployment.yaml (feature server container)
containers:
  - name: feast-server
    command: ["feast", "serve", "-h", "0.0.0.0", "-p", "6566"]
    ports:
      - containerPort: 6566
```

With this setup, embedding happens in-cluster. Nothing leaves your network.

## Try it yourself

```bash
# Install Feast with LiteLLM support
pip install feast litellm

# If using Ollama for local embeddings (no API key needed)
ollama pull nomic-embed-text
```

Configure your `feature_store.yaml` with an `embedding_model` section, define a feature view with vector search enabled, run `feast apply`, load your data, start the server with `feast serve`, and search:

```bash
curl -s http://localhost:6566/v1/vector_stores/your_feature_view/search \
  -H "Content-Type: application/json" \
  -d '{"query": "your search query", "max_num_results": 5}' | python -m json.tool
```

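The same call from Python needs nothing beyond the standard library. A minimal sketch that builds the request without sending it (host and feature view name are placeholders from the curl example):

```python
import json
import urllib.request

def build_search_request(base_url: str, feature_view: str, query: str, k: int = 5):
    """Build a POST request for Feast's OpenAI-compatible search endpoint."""
    body = json.dumps({"query": query, "max_num_results": k}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/vector_stores/{feature_view}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("http://localhost:6566", "your_feature_view", "your search query")
print(req.get_method(), req.full_url)
# To actually send it: page = json.load(urllib.request.urlopen(req))
```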
## What's next

Next on the list: wiring up `ranking_options` and `rewrite_query` so they actually do something (right now they're accepted but ignored). We also want a standalone `/v1/embeddings` endpoint for clients that just need embeddings, and eventually the ability to create feature views through the OpenAI vector store API instead of requiring Python + `feast apply`.

## Join the conversation

If you're using this or have thoughts on what the OpenAI-compatible layer should support next, come find us on [Slack](https://slack.feast.dev/) or [GitHub](https://github.com/feast-dev/feast).

sdk/python/feast/feature_server.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -38,7 +38,7 @@
 )
 from fastapi.concurrency import run_in_threadpool
 from fastapi.logger import logger
-from fastapi.responses import JSONResponse, ORJSONResponse
+from fastapi.responses import JSONResponse
 from fastapi.staticfiles import StaticFiles
 from google.protobuf.json_format import MessageToDict
 from pydantic import BaseModel, field_validator
@@ -492,7 +492,7 @@ async def retrieve_online_documents(
 async def openai_vector_store_search(
     vector_store_id: str,
     request: OpenAISearchRequest,
-) -> ORJSONResponse:
+) -> JSONResponse:
     with feast_metrics.track_request_latency(
         "/v1/vector_stores/{vector_store_id}/search"
     ):
@@ -519,7 +519,7 @@ async def openai_vector_store_search(
             )
         )
     except FeatureViewNotFoundException:
-        return ORJSONResponse(
+        return JSONResponse(
             status_code=404,
             content={
                 "error": {
@@ -528,7 +528,7 @@ async def openai_vector_store_search(
                 }
             },
         )
-    return ORJSONResponse(content=result)
+    return JSONResponse(content=result)

 @app.post("/push", dependencies=[Depends(inject_user_details)])
 async def push(request: PushFeaturesRequest) -> Response:
```
