The Spryker Project Semantic Search Tool enables intelligent, context-aware search across your Spryker project using Large Language Model (LLM)-based embeddings and Chroma DB for efficient indexing and retrieval.
Spryker applications consist of a vast and growing number of module APIs, including Facades, Clients, and various Plugin interfaces used as extension points. Due to this complexity, identifying the right module or method for a specific task can be time-consuming and error-prone, especially for new developers or when working across multiple feature sets.
- Indexes key module APIs and configurations, including:
  - Facade interfaces
  - Client interfaces
  - Service interfaces
  - Plugin interfaces and plugin classes
  - Config classes of modules
- Semantic understanding of code:
  - Uses class and method names (module APIs)
  - Incorporates method doc blocks for deeper intent and specification analysis
  - Does not use full file code
- Security measures:
  - Prevents exposure of sensitive project data by operating in local mode.
  - Extracts only semantic information from module APIs, without analysing underlying code implementations.
- Efficient navigation and output:
  - Presents results in chunked format for readability
  - Each result includes a link to the source file and line number for quick access
  - Various AI modes for better usability
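
To make the "API surface only" idea concrete, here is a minimal sketch of reducing a PHP file to its class or interface declaration, doc blocks, and public method signatures while dropping method bodies. It is an illustration only: the function name `extract_api_surface` and the sample class are hypothetical, and the line matching shown here is an assumption, not the parser the tool actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; not part of the tool.
# Keeps declarations, doc-block lines, and public method signatures.
# Everything else (method bodies, properties, braces) is discarded.
extract_api_surface() {
  awk '
    { sub(/^[ \t]+/, "") }                                   # strip leading indentation
    /^(interface|class|abstract class|final class) / { print; next }
    /^\/?\*/ { print; next }                                 # "/**", " * ...", " */"
    /^public function / { print }
  ' "$@"
}

# Demo on a made-up class: the "return 25;" body line is not printed.
extract_api_surface <<'PHP'
class ExampleConfig
{
    /**
     * Specification:
     * - Returns the number of results shown per page.
     */
    public function getPageSize(): int
    {
        return 25;
    }
}
PHP
```

Because only names and doc blocks survive this step, the quality of the doc blocks directly affects what the search can match.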
The tool works in the following steps:
- Embeddings Generation: Creates vector embeddings for indexed elements using an LLM.
- Indexing: Stores embeddings in Chroma DB for fast semantic search.
- User Query: Accepts natural language input.
- Matching & Ranking: Finds semantically relevant matches across the project and summarises them with AI.
- Result Presentation: Displays readable chunks with direct links to source files.
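
The matching and ranking step can be pictured with a toy example. The sketch below ranks labelled vectors by cosine similarity to a query vector. Everything in it is an assumption for illustration: the function name `rank_by_cosine`, the labels, and the three-dimensional vectors are made up, real embeddings have many more dimensions, and the distance metric the tool configures in Chroma DB is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; not part of the tool.
# Usage: rank_by_cosine "<query vector>" < lines of "<label> <v1> <v2> ..."
# Prints "<similarity> <label>" sorted from most to least similar.
rank_by_cosine() {
  awk -v q="$1" '
    BEGIN {
      n = split(q, qv, " ")
      for (i = 1; i <= n; i++) qn += qv[i] * qv[i]          # squared query norm
    }
    {
      dot = 0; dn = 0
      for (i = 1; i <= n; i++) {
        dot += qv[i] * $(i + 1)                              # field 1 is the label
        dn  += $(i + 1) * $(i + 1)                           # squared document norm
      }
      if (qn > 0 && dn > 0) printf "%.3f %s\n", dot / sqrt(qn * dn), $1
    }
  ' | sort -rn
}

# Made-up vectors standing in for embeddings of indexed API methods.
rank_by_cosine "0.9 0.1 0.0" <<'EOF'
ExampleCartFacadeInterface::addItem 0.8 0.2 0.1
ExamplePriceFacadeInterface::getPrice 0.1 0.9 0.2
ExampleStoreClientInterface::getStore 0.0 0.1 0.9
EOF
```

In the real tool, the vector database performs this nearest-neighbour lookup over the stored embeddings, so no manual scoring is needed.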
The tool helps you:
- Understand large or unfamiliar Spryker codebases faster.
- Discover relevant module APIs and plugins by use case, not just name.
- Accelerate onboarding for new developers and cross-team collaboration.
💡 Tip: Works best when combined with up-to-date PHPDoc across modules for optimal semantic accuracy.
The tool supports machine-learning-based filter detection: it detects filters expressed in the query itself.
To use this feature, name the type you are looking for in your query, such as Facade, Configuration, Client, or Plugin. No additional setup is required.
The tool also understands extended filter definitions configured in \Spryker\Config::getDataTypeTrainData,
which you can override to customize detection behavior for your specific use case.
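
As a rough mental model of what filter detection does with a query, the sketch below maps a query to a type filter using plain keyword matching. This is a deliberately naive stand-in, not the tool's machine-learning detection; the function name `detect_type_filter` and the returned labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; keyword matching replaces the ML-based
# detection the tool actually uses.
# Prints the detected type filter, or nothing when no type is mentioned.
detect_type_filter() {
  local query
  # Lowercase the query so matching is case-insensitive.
  query="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$query" in
    *facade*) echo "Facade" ;;
    *client*) echo "Client" ;;
    *plugin*) echo "Plugin" ;;
    *config*) echo "Config" ;;
    *) ;;
  esac
}

detect_type_filter "Which Facade recalculates the cart?"
```

A trained detector can cope with phrasing that simple keyword matching misses, which is presumably why the training data is exposed for overriding.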
Ensure you have the following installed on your machine:
- Docker
- Docker Compose
- Composer
- Bash shell
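
Before installing, you may want to confirm these tools are on your PATH. The sketch below is not shipped with the repository; the function names `check_prerequisites` and `has_docker_compose` are hypothetical, and it assumes Docker Compose is available either as the `docker compose` plugin or as the standalone `docker-compose` binary.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the tool.

# Prints "missing: <tool>" for every argument that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
check_prerequisites() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Docker Compose comes either as a Docker plugin or as a standalone binary;
# accept whichever is present.
has_docker_compose() {
  docker compose version >/dev/null 2>&1 || command -v docker-compose >/dev/null 2>&1
}
```

For example, running `check_prerequisites docker composer bash` lists anything that is missing, and `has_docker_compose || echo "Docker Compose not found"` covers the Compose check.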
- Clone the repository into the project root directory:

      git clone git@github.com:spryker-community/project-semantic-search.git &&
      echo "/project-semantic-search/" >> .git/info/exclude &&
      cd project-semantic-search &&
      cp php/.env.example php/.env

- Configure the environment in project-semantic-search/php/.env
- Run the installer script:

      bash install

This will:
* Start Docker containers
* Install dependencies via Composer
* Pull the Ollama embedding model nomic-embed-text
* Index the project (takes 5–20 minutes)
* Launch the interactive CLI tool

The tool sits inside your Spryker project as follows:

├── spryker-project/
│   ├── src/
│   │   └── Pyz/
│   ├── ...
│   └── project-semantic-search/
│       ├── php/
│       │   └── bin/
│       │       └── sprykeye
│       ├── docker-compose.yml
│       ├── php.ini
│       ├── install
│       ├── run
│       └── readme.md

After setup, you can launch the search tool anytime by running:

      docker exec -it php bash -c "bin/sprykeye project:search"

or

      bash run

MCP server makes the tool compatible with various AI agents, extending their context with Spryker project context.
- Controller classes indexing (public methods)
- Dependency provider extension points
- Filtering by module name
- AI chat mode: discuss the results with AI or ask it for a better query
- AI search with a remote AI agent (Gemini, OpenAI, etc.)
MIT or your preferred license.





