Natural Language Search with the Search Orchestrator

The KMDS Search Orchestrator lets you ask free-form questions about a project knowledge base and receive a plain-English answer. Under the hood, an LLM acts as a router: it classifies your intent and picks the best-matching observation-query template, the orchestrator runs that template, and the LLM synthesises the raw results into a readable response. When no structured template matches, or the chosen template returns no results, the orchestrator falls back automatically to semantic vector search.

Overview 

The orchestrator follows five logical steps every time you call it:

Step	Label	What happens
1	Context Injection	A tool description — a catalogue of all available search templates and their purpose — is injected into the LLM prompt so the model always knows its options.
2	Intent Classification & Entity Extraction	The LLM analyses your query and returns a Pydantic-validated JSON payload identifying which template to invoke and any filter parameters (observation type, keyword, sequence range) it extracted.
3	Template Execution	The matching KMDS API function is called against the loaded knowledge base with the extracted filters applied.
4	LLM Synthesis	The LLM converts the raw observation records into a concise natural language answer.
5	Semantic Fallback (catch-all)	If no template matches, or if a template returns zero results, the ChromaDB semantic vector index is queried instead.
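
The five steps can be read as a small control flow. The sketch below is a minimal illustration with stand-in functions, not the KMDS implementation: `fake_router`, `run_template`, `semantic_search`, `synthesise` and `ask` are hypothetical names, the router is a keyword rule rather than an LLM call, the fallback uses word overlap rather than a vector index, and the records are invented sample data.

```python
from __future__ import annotations

# Invented sample records; real ones come from the loaded knowledge base.
RECORDS = [
    {"obs_type": "exploratory", "finding": "Column age has missing values.", "finding_seq": 1},
    {"obs_type": "model_selection", "finding": "Random forest had the best F1.", "finding_seq": 2},
]

# Step 1 stand-in: template name -> observation type it retrieves.
TEMPLATES = {
    "exploratory_search": "exploratory",
    "model_selection_search": "model_selection",
}


def fake_router(query: str) -> str | None:
    """Step 2 stand-in: a keyword rule instead of an LLM call."""
    q = query.lower()
    if "missing" in q or "quality" in q:
        return "exploratory_search"
    if "model" in q:
        return "model_selection_search"
    return None  # no template matched


def run_template(template: str) -> list[dict]:
    """Step 3 stand-in: select records by observation type."""
    wanted = TEMPLATES[template]
    return [r for r in RECORDS if r["obs_type"] == wanted]


def semantic_search(query: str) -> list[dict]:
    """Step 5 stand-in: word overlap instead of a vector index."""
    words = set(query.lower().split())
    return [r for r in RECORDS if words & set(r["finding"].lower().rstrip(".").split())]


def synthesise(records: list[dict]) -> str:
    """Step 4 stand-in: join findings instead of calling an LLM."""
    return " ".join(r["finding"] for r in records) or "No matching observations."


def ask(query: str) -> tuple[str, str]:
    template = fake_router(query)
    records = run_template(template) if template else []
    if not records:  # no template matched, or it returned nothing
        template, records = "semantic_search", semantic_search(query)
    return template, synthesise(records)


print(ask("Any data quality problems?"))
print(ask("forest performance"))
```

The second call matches no keyword rule, so it takes the catch-all path and reports `semantic_search` as the template that was ultimately executed.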

Note

Why not LlamaIndex (for now)? KMDS currently uses a custom orchestrator because it provides strict control over template routing, filter validation, and fallback behaviour with a smaller runtime dependency surface. This keeps behaviour predictable for ontology-specific queries and simplifies maintenance. We may revisit LlamaIndex later if we need broader multi-retriever orchestration, connector breadth, or rapid RAG pipeline experimentation.

Available Search Templates 

Template	Best for
`exploratory_search`	Data quality, missing values, outliers, distributions, initial data understanding
`data_representation_search`	Feature engineering, transformations, encodings, scaling, data preparation decisions
`modelling_choice_search`	Algorithm selection rationale, modelling assumptions, hyperparameter decisions
`model_selection_search`	Model comparison, evaluation metrics, benchmarking, final model recommendation
`all_observations_search`	Broad cross-phase questions
`semantic_search` (fallback)	Nuanced natural-language similarity queries
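
To make Step 1 (context injection) concrete, the sketch below builds a tool description from the table above and prepends it to a routing prompt. The prompt wording used by KMDS is not shown in this guide, so `build_tool_description`, `build_router_prompt` and the instruction text are assumptions for illustration only.

```python
# Template names and purposes copied from the table above.
TEMPLATE_CATALOGUE = {
    "exploratory_search": "Data quality, missing values, outliers, distributions, initial data understanding",
    "data_representation_search": "Feature engineering, transformations, encodings, scaling, data preparation decisions",
    "modelling_choice_search": "Algorithm selection rationale, modelling assumptions, hyperparameter decisions",
    "model_selection_search": "Model comparison, evaluation metrics, benchmarking, final model recommendation",
    "all_observations_search": "Broad cross-phase questions",
    "semantic_search": "Nuanced natural-language similarity queries (fallback)",
}


def build_tool_description(catalogue: dict) -> str:
    """Render the catalogue as one bullet line per template."""
    lines = ["Available search templates:"]
    lines += [f"- {name}: {purpose}" for name, purpose in catalogue.items()]
    return "\n".join(lines)


def build_router_prompt(query: str) -> str:
    """Hypothetical routing prompt: catalogue first, then the user query."""
    return (
        f"{build_tool_description(TEMPLATE_CATALOGUE)}\n\n"
        "Return a JSON object with intent_class, filters and explanation.\n"
        f"User query: {query}"
    )


print(build_router_prompt("Which encodings were used?"))
```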

Step-by-Step Usage 

Step A – Prerequisites

Install the package with all dependencies:

pip install kmds          # core (pydantic, chromadb, sentence-transformers included)

For LLM-powered routing and synthesis you need a Google GenAI API key (gemini-1.5-flash is used by default):

export GOOGLE_API_KEY="your-google-api-key"

Note

If you prefer a different LLM (OpenAI, Anthropic, local Ollama, etc.), pass an llm_fn callable; see Using a Custom LLM Backend below.

Step B – Build or Load a Knowledge Base

You need a .xml knowledge-base file produced by a KMDS workflow. A test knowledge base ships with the package:

from importlib.resources import files
kb_path = str(files("kmds.examples").joinpath("example_analytics_kb_app_workflow.xml"))

Or use your own project file:

kb_path = "path/to/my_project_kb.xml"

Step C – Initialise the Orchestrator

from kmds.search import SearchOrchestrator

orc = SearchOrchestrator(
    kb_path=kb_path,
    persist_dir="./my_index",   # omit for in-memory (rebuilt each session)
)

What happens here:

The knowledge base is loaded into memory once.
The semantic vector index (ChromaDB) is built from all observation findings and persisted to ./my_index for fast reload on subsequent runs.

Step D – Ask a Question

result = orc.ask("What data quality issues were found during exploration?")

print(result.answer)          # synthesised natural language answer
print(result.intent_class)    # which template was used
print(result.route_explanation)
print(result.results)         # raw observation records

The returned OrchestratorResult always has these attributes:

Attribute	Description
`answer`	Synthesised natural language answer (string).
`intent_class`	The template that was ultimately executed.
`route_explanation`	The LLM’s one-sentence reason for choosing that template.
`results`	List of raw observation dicts (`obs_type`, `finding`, `finding_seq`, optional `intent`, optional `distance`).

Step E – Inspect Raw Results

for r in result.results:
    print(r["obs_type"], "|", r["finding"])
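
Because the raw records are plain dicts, ordinary Python is enough to reshape them. The sketch below groups findings by observation type in sequence order. The `sample_results` list is invented data that follows the documented keys (`obs_type`, `finding`, `finding_seq`, optional `distance`), and `group_findings` is a hypothetical helper, not part of KMDS.

```python
from collections import defaultdict

# Invented records shaped like the entries of result.results.
sample_results = [
    {"obs_type": "Exploratory", "finding": "Income is right-skewed.", "finding_seq": 3},
    {"obs_type": "Exploratory", "finding": "Age has missing values.", "finding_seq": 1},
    {"obs_type": "Model selection", "finding": "Gradient boosting chosen.", "finding_seq": 7, "distance": 0.42},
]


def group_findings(results):
    """Group finding texts by obs_type, ordered by finding_seq within each group."""
    grouped = defaultdict(list)
    for record in sorted(results, key=lambda r: r["finding_seq"]):
        grouped[record["obs_type"]].append(record["finding"])
    return dict(grouped)


for obs_type, findings in group_findings(sample_results).items():
    print(obs_type)
    for finding in findings:
        print("  -", finding)
```

With a real orchestrator you would pass `result.results` in place of `sample_results`.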

Step F – Use the CLI

The orchestrator is also available as a command-line tool:

# Basic usage
kmds-ask --project-file my_project.xml \
          --query "What feature engineering steps were taken?"

# Persist the index for fast repeated queries
kmds-ask --project-file my_project.xml \
          --query "Which model was selected and why?" \
          --persist-dir ./my_idx

# Show routing decision and raw records
kmds-ask --project-file my_project.xml \
          --query "transformation decisions" \
          --verbose

# Machine-readable JSON output
kmds-ask --project-file my_project.xml \
          --query "model evaluation metrics" \
          --output-format json

# Use a different Google GenAI model
kmds-ask --project-file my_project.xml \
          --query "data quality" \
          --model gemini-2.0-flash

Using a Custom LLM Backend 

Pass any callable that accepts a str prompt and returns a str response:

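def my_llm(prompt: str) -> str:
    # Example using OpenAI (requires the openai package)
    import os
    import openai
    # Read the key from the environment rather than hard-coding it.
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content may be None; llm_fn must always return a str.
    return resp.choices[0].message.content or ""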

orc = SearchOrchestrator(kb_path=kb_path, llm_fn=my_llm)
result = orc.ask("What modelling assumptions were made?")
print(result.answer)
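
Since `llm_fn` is just a `str -> str` callable, you can wrap it before handing it to the orchestrator. The sketch below adds simple retries with exponential backoff. `with_retries` is a hypothetical helper written for this guide, not a KMDS function, and the `flaky` backend only simulates a transient outage.

```python
import time
from typing import Callable


def with_retries(llm_fn: Callable[[str], str], attempts: int = 3, delay_s: float = 0.0) -> Callable[[str], str]:
    """Return a callable that retries llm_fn on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(prompt: str) -> str:
        for attempt in range(attempts):
            try:
                return llm_fn(prompt)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of retries: let the caller handle it
                time.sleep(delay_s * (2 ** attempt))  # exponential backoff
        raise RuntimeError("unreachable")

    return wrapped


# Simulated backend that fails twice, then answers.
calls = {"n": 0}


def flaky(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary outage")
    return "ok: " + prompt


print(with_retries(flaky)("hello"))
```

The wrapped callable can be passed as `llm_fn=with_retries(my_llm)`. If every attempt fails, the exception still propagates, so the fallback behaviour described below should still apply.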

Fallback Behaviour 

The orchestrator is designed to return a usable result even when an individual step fails:

If the LLM is unreachable → router falls back to semantic vector search.
If the JSON response cannot be parsed → falls back to semantic search.
If the chosen template returns zero observations → falls back to semantic search.
If the synthesis LLM call fails → raw records are formatted as plain text.
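
The last rule, formatting raw records when synthesis fails, can be sketched as a try/except around the LLM call. This is an illustration under assumptions: `format_records_plain` and `synthesise_or_format` are hypothetical names, and the plain-text layout KMDS actually produces is not specified in this guide.

```python
def format_records_plain(records):
    """Render records as one line each: [seq] type: finding."""
    return "\n".join(
        f"[{r.get('finding_seq', '?')}] {r.get('obs_type', 'unknown')}: {r.get('finding', '')}"
        for r in records
    )


def synthesise_or_format(llm_fn, query, records):
    """Ask the LLM for a summary; on any failure return the records as text."""
    prompt = f"Question: {query}\nObservations:\n{format_records_plain(records)}"
    try:
        return llm_fn(prompt)
    except Exception:
        return format_records_plain(records)


# Invented record and a backend that is always unreachable.
records = [{"obs_type": "Exploratory", "finding": "Age has missing values.", "finding_seq": 1}]


def unreachable_llm(prompt):
    raise ConnectionError("LLM is down")


print(synthesise_or_format(unreachable_llm, "Any data quality issues?", records))
```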

Worked Example 

import os
from importlib.resources import files
from kmds.search import SearchOrchestrator

os.environ["GOOGLE_API_KEY"] = "your-key-here"

kb_path = str(files("kmds.examples").joinpath("example_ml_kb_exp_workflow.xml"))

orc = SearchOrchestrator(kb_path=kb_path, persist_dir="./ml_idx")

questions = [
    "What data quality issues were encountered?",
    "Which features were engineered and why?",
    "What modelling assumptions were made?",
    "Which model was finally selected and what were its evaluation metrics?",
    "Summarise the key findings across all phases.",
]

for q in questions:
    result = orc.ask(q)
    print(f"Q: {q}")
    print(f"   Template used : {result.intent_class}")
    print(f"   Answer        : {result.answer[:200]}")
    print()

Pydantic Schema Reference 

The router output is validated against:

class kmds.search.search_orchestrator.SearchFilters(*, obs_type_filter: str | None = None, finding_seq_min: int | None = None, finding_seq_max: int | None = None, keyword: str | None = None)

Optional parameters the LLM may extract from the user query.

All fields are optional. Any field left as None is ignored during the post-retrieval filtering step.

finding_seq_max: int | None: Include only observations with finding_seq <= this value.

finding_seq_min: int | None: Include only observations with finding_seq >= this value.

keyword: str | None: Additional keyword to filter the finding text (case-insensitive substring).

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

obs_type_filter: str | None: Substring to match against the observation-type label (case-insensitive).

class kmds.search.search_orchestrator.OrchestratorRoute(*, intent_class: Literal['exploratory_search', 'data_representation_search', 'modelling_choice_search', 'model_selection_search', 'all_observations_search', 'semantic_search'], filters: SearchFilters = SearchFilters(obs_type_filter=None, finding_seq_min=None, finding_seq_max=None, keyword=None), explanation: str = '')

Structured output produced by the LLM router.

The LLM is instructed to return a JSON object that conforms to this schema. Pydantic validates and coerces the payload before execution.

explanation: str: Brief explanation of why this route was chosen (surfaced to the caller).

filters: SearchFilters: Optional query parameters extracted from the query text.

intent_class: IntentClass: Which search template best matches the user query.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

API Reference 

class kmds.search.search_orchestrator.SearchOrchestrator(kb_path: str, *, persist_dir: str | None = None, llm_fn: Callable[[str], str] | None = None, model: str = 'gemini-1.5-flash', embedding_model: str = 'all-MiniLM-L6-v2', n_results: int = 5)

LLM-driven search orchestrator for a KMDS knowledge base.

The orchestrator routes natural language queries through an LLM to identify the best search template, executes it against the loaded knowledge base, and synthesises the results into a natural language answer.

Parameters:

kb_path – Path to the KMDS .xml knowledge-base file.
persist_dir – Directory to persist the semantic vector index. None keeps the index in memory (rebuilt on each interpreter session).
llm_fn – Optional callable (prompt: str) -> str for your own LLM backend. If None, the orchestrator uses Google GenAI (requires GOOGLE_API_KEY environment variable).
model – Google GenAI model name (ignored when llm_fn is supplied).
embedding_model – Sentence-transformers model used for the semantic fallback index.
n_results – Default maximum number of observation records returned per query.

Examples

Using Google GenAI (default):

import os
os.environ["GOOGLE_API_KEY"] = "your-key"

from kmds.search import SearchOrchestrator

orc = SearchOrchestrator("my_project.xml", persist_dir="./idx")
result = orc.ask("What data quality issues were found?")
print(result.answer)

Using a custom LLM backend:

def my_llm(prompt: str) -> str:
    # Call any LLM here; my_model is a placeholder for your own client object.
    return my_model.generate(prompt)

orc = SearchOrchestrator("my_project.xml", llm_fn=my_llm)
result = orc.ask("Which model was selected and why?")
print(result.answer)
print(result.results)      # raw records

__init__(kb_path: str, *, persist_dir: str | None = None, llm_fn: Callable[[str], str] | None = None, model: str = 'gemini-1.5-flash', embedding_model: str = 'all-MiniLM-L6-v2', n_results: int = 5) → None

ask(query: str) → OrchestratorResult

Route a natural language query and return a synthesised answer.

This is the single public entry point for the orchestrator. It performs all five steps internally (context injection, routing, execution, synthesis, and fallback where needed) and returns an OrchestratorResult.

Parameters:

query – Free-form natural language question about the knowledge base.

Return type: OrchestratorResult

class kmds.search.search_orchestrator.OrchestratorResult(answer: str, intent_class: str, route_explanation: str, results: list[dict[str, Any]])

Result returned by SearchOrchestrator.ask().

answer: Synthesised natural language answer.

intent_class: The search template that was ultimately executed.

route_explanation: The LLM’s own explanation for its routing choice.

results: Raw observation record dicts that informed the answer.