Natural Language Observation Ingestion

KMDS can convert a free-form natural language statement into a structured observation that matches the existing ontology-backed schema.

This feature supports two primary interaction modes:

Summary mode: classify the input text, extract entities, and return a structured summary without modifying a knowledge base.
Log mode: validate the input text, create the matching KMDS observation, and save it into a KMDS knowledge base.

The implementation relies only on open-source tooling. The KMDS parser handles classification and entity extraction, and uses spaCy for tokenization when spaCy is available.

What The Feature Produces

Given an input such as:

The model accuracy dropped by 5% after pruning on 2026-04-20.

KMDS can produce:

A classified observation family and KMDS observation type
Extracted entities such as metric, value, timestamp, and affected component
A Python snippet that logs the observation into the ontology
A JSON-LD payload using the existing KMDS RDF classes and properties
A validated logged observation saved into a KMDS knowledge-base file
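
To make the entity-extraction step concrete, here is a minimal standalone sketch that pulls a metric, a value, and a timestamp out of the example sentence above. It is an illustration under stated assumptions, not the KMDS parser: the regular expressions, the KNOWN_METRICS vocabulary, and the names SketchEntities and extract_entities_sketch are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns only; the real KMDS parser may work differently.
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
ISO_DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
# Hypothetical metric vocabulary, not taken from KMDS.
KNOWN_METRICS = ("accuracy", "precision", "recall", "auc", "f1")


@dataclass
class SketchEntities:
    metric: Optional[str]
    value: Optional[str]
    timestamp: Optional[str]


def extract_entities_sketch(text: str) -> SketchEntities:
    """Pull a metric name, a percentage, and an ISO date out of free text."""
    lowered = text.lower()
    # First known metric that appears as a whole word, if any.
    metric = next(
        (m for m in KNOWN_METRICS if re.search(rf"\b{m}\b", lowered)), None
    )
    percent = PERCENT_RE.search(text)
    date = ISO_DATE_RE.search(text)
    return SketchEntities(
        metric=metric,
        value=f"{percent.group(1)}%" if percent else None,
        timestamp=date.group(1) if date else None,
    )


entities = extract_entities_sketch(
    "The model accuracy dropped by 5% after pruning on 2026-04-20."
)
print(entities)
```

A statement with none of these elements yields None in every field of the sketch, which is the kind of input the validation step described later is meant to reject.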

Supported Observation Families

The mapper stays within the current KMDS schema and classifies into these families:

Exploratory observations
Data representation observations
Modelling choice observations
Model selection observations
Experimental observations

The default workflow phase ordering follows the documented KMDS workflow:

Exploratory
Data representation
Modelling choice
Model selection

Experimental observations are treated as a separate experimentation track.
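
The ordering above can be sketched as a small lookup, with experimental observations kept outside the main sequence. The enum, the family label strings, and the helper names below are hypothetical and chosen for illustration; KMDS may represent phases differently.

```python
from enum import IntEnum
from typing import List, Optional


class WorkflowPhase(IntEnum):
    """Main workflow phases in their documented order."""
    EXPLORATORY = 1
    DATA_REPRESENTATION = 2
    MODELLING_CHOICE = 3
    MODEL_SELECTION = 4


# Hypothetical family labels; the strings KMDS uses may differ.
PHASE_BY_FAMILY = {
    "exploratory": WorkflowPhase.EXPLORATORY,
    "data_representation": WorkflowPhase.DATA_REPRESENTATION,
    "modelling_choice": WorkflowPhase.MODELLING_CHOICE,
    "model_selection": WorkflowPhase.MODEL_SELECTION,
}


def phase_for(family: str) -> Optional[WorkflowPhase]:
    """Return the main-track phase, or None for the experimentation track."""
    return PHASE_BY_FAMILY.get(family)


def sort_by_phase(families: List[str]) -> List[str]:
    """Order families by phase; families without a phase go last."""
    return sorted(
        families,
        key=lambda f: (phase_for(f) is None, phase_for(f) or 0),
    )


ordered = sort_by_phase(["model_selection", "experimental", "exploratory"])
print(ordered)
```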

Python API

Summary Mode

Use map_text_to_observation to get the full structured mapping object:

from kmds.utils.natural_language_observation import map_text_to_observation

mapping = map_text_to_observation(
    "The model accuracy dropped by 5% after pruning on 2026-04-20."
)

print(mapping.workflow_family)
print(mapping.observation_type)
print(mapping.extracted_entities.metric)
print(mapping.extracted_entities.value)
print(mapping.validation_passed)

Use summarize_observation_text when you want a compact human-readable summary:

from kmds.utils.natural_language_observation import summarize_observation_text

summary = summarize_observation_text(
    "Missing values were observed in the customer_age field during intake validation."
)
print(summary)

Generate Python Logging Code

Use build_observation_python_code to generate a code snippet that follows the existing KMDS ontology classes and properties:

from kmds.utils.natural_language_observation import (
    build_observation_python_code,
    map_text_to_observation,
)

mapping = map_text_to_observation(
    "We engineered a rolling 7 day demand feature from timestamped order counts."
)
code = build_observation_python_code(mapping)
print(code)

Generate JSON-LD

Use build_observation_jsonld to generate a JSON-LD structure that only uses existing KMDS schema properties:

from kmds.utils.natural_language_observation import (
    build_observation_jsonld,
    map_text_to_observation,
)

mapping = map_text_to_observation(
    "We chose XGBoost after comparing several tree ensembles on validation AUC 0.91."
)
json_ld = build_observation_jsonld(mapping)
print(json_ld)

Log Mode

Use log_text_as_observation to validate and save the observation into a KMDS knowledge base:

from kmds.utils.natural_language_observation import log_text_as_observation

result = log_text_as_observation(
    text="Missing values were observed in the customer_age field during intake validation.",
    workflow_name="support_reporting_intake",
    project_file_path="./support_reporting_intake.xml",
    project_mode="create",
    workflow_type="application",
)

print(result.mapping.observation_type)
print(result.project_file)

To update an existing KMDS knowledge base, pass project_mode="update" with the same project file:

result = log_text_as_observation(
    text="We engineered a rolling 7 day demand feature from timestamped order counts.",
    workflow_name="support_reporting_intake",
    project_file_path="./support_reporting_intake.xml",
    project_mode="update",
)
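
When a script may run against either a new or an existing knowledge base, the project_mode value can be derived from whether the project file already exists. The helper below is a hypothetical convenience, not part of KMDS, and the temporary file it writes is only a placeholder for the demonstration.

```python
import tempfile
from pathlib import Path


def choose_project_mode(project_file_path: str) -> str:
    """Return "update" if the knowledge-base file exists, else "create"."""
    return "update" if Path(project_file_path).is_file() else "create"


# Demonstration with a throwaway directory instead of a real KMDS project.
with tempfile.TemporaryDirectory() as tmp:
    project_file = Path(tmp) / "support_reporting_intake.xml"
    mode_before = choose_project_mode(str(project_file))
    project_file.write_text("<placeholder/>")  # stand-in for a saved project
    mode_after = choose_project_mode(str(project_file))

print(mode_before, mode_after)

# The result would then be passed along, for example:
# log_text_as_observation(..., project_mode=choose_project_mode(path))
```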

CLI Usage

The feature is available as kmds-observe.

Summary Mode As Text

kmds-observe \
  --text "Missing values were observed in the customer_age field during intake validation." \
  --mode summary

Summary Mode As JSON

kmds-observe \
  --text "The model accuracy dropped by 5% after pruning on 2026-04-20." \
  --mode summary \
  --output-format json

Summary Mode From A File

kmds-observe \
  --text-file ./observation.txt \
  --mode summary \
  --output-format json

Log Mode: Create A New Project

kmds-observe \
  --text "Missing values were observed in the customer_age field during intake validation." \
  --mode log \
  --workflow-name "support_reporting_intake" \
  --project-file ./support_reporting_intake.xml \
  --workflow-type application \
  --create-project

Log Mode: Update An Existing Project

kmds-observe \
  --text "We engineered a rolling 7 day demand feature from timestamped order counts." \
  --mode log \
  --workflow-name "support_reporting_intake" \
  --project-file ./support_reporting_intake.xml \
  --update-project

Log Mode As JSON

kmds-observe \
  --text "We chose XGBoost after comparing several tree ensembles on validation AUC 0.91." \
  --mode log \
  --workflow-name "support_reporting_intake" \
  --project-file ./support_reporting_intake.xml \
  --update-project \
  --output-format json
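
Scripting The CLI

For batch jobs it can help to assemble the kmds-observe argument list in Python and hand it to subprocess.run. The sketch below uses only the flags shown in the examples above. The function name is hypothetical, and the rule that log mode needs both a workflow name and a project file is an assumption drawn from those examples.

```python
import shlex
from typing import List, Optional


def build_kmds_observe_args(
    text: str,
    mode: str = "summary",
    workflow_name: Optional[str] = None,
    project_file: Optional[str] = None,
    workflow_type: Optional[str] = None,
    create: bool = False,
    output_json: bool = False,
) -> List[str]:
    """Assemble an argument list from the flags documented above."""
    args = ["kmds-observe", "--text", text, "--mode", mode]
    if mode == "log":
        # Assumption: log mode needs both of these, as in the examples.
        if not workflow_name or not project_file:
            raise ValueError("log mode needs a workflow name and a project file")
        args += ["--workflow-name", workflow_name, "--project-file", project_file]
        if workflow_type:
            args += ["--workflow-type", workflow_type]
        args.append("--create-project" if create else "--update-project")
    if output_json:
        args += ["--output-format", "json"]
    return args


args = build_kmds_observe_args(
    "The model accuracy dropped by 5% after pruning on 2026-04-20.",
    output_json=True,
)
# shlex.join quotes the text so the command can be pasted into a shell.
print(shlex.join(args))
```

Passing the list directly to subprocess.run(args, check=True) avoids shell quoting issues with free-form observation text.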

Notebook Usage Pattern

You can use the mapper inside notebooks without switching to the CLI. This is useful when you want one observation to remain natural-language driven while the rest of the notebook continues to use manual ontology operations.

from kmds.ontology.kmds_ontology import ExploratoryObservation
from kmds.utils.natural_language_observation import map_text_to_observation

# Assumes earlier notebook cells already define `onto` (the loaded ontology),
# `observation_count` (the running finding sequence number), and
# `exp_obs_list` (the exploratory observations collected so far).
nl_mapping = map_text_to_observation(
    "Ticket creation and closed timestamps have inconsistent datetime formats, so they must be normalized before calculating time to resolution."
)

# Copy the mapped fields onto a manually constructed observation.
e4 = ExploratoryObservation(namespace=onto)
e4.finding = nl_mapping.finding
e4.finding_sequence = observation_count
e4.exploratory_observation_type = nl_mapping.observation_type
e4.intent = nl_mapping.intent
exp_obs_list.append(e4)

Validation Behavior

KMDS rejects vague inputs that do not provide enough structure for a valid observation. For example, a very short or underspecified statement such as the following is invalid:

Looks better now

Typical validation checks include:

The text must be long enough to be meaningful
The text must contain enough context to classify into an existing KMDS type
The text should expose at least one structured element such as a metric, value, timestamp, or affected component
Model-selection observations should contain a measurable outcome
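
Two of these checks, minimum length and the presence of a structured element, can be sketched in a few lines. This is a rough illustration rather than the KMDS validator: the word threshold, the regular expression, the error messages, and the function name are all hypothetical.

```python
import re
from typing import List

MIN_WORDS = 5  # hypothetical threshold, not the KMDS value
# A digit (value or date) or a snake_case name (field or component).
STRUCTURE_RE = re.compile(r"\d|\b[a-z]+_[a-z_]+\b")


def validate_text_sketch(text: str) -> List[str]:
    """Return a list of validation errors; an empty list means the text passed."""
    errors = []
    if len(text.split()) < MIN_WORDS:
        errors.append("text is too short to be meaningful")
    if not STRUCTURE_RE.search(text):
        errors.append("no metric, value, timestamp, or component found")
    return errors


vague_errors = validate_text_sketch("Looks better now")
clear_errors = validate_text_sketch(
    "Missing values were observed in the customer_age field during intake validation."
)
print(vague_errors)
print(clear_errors)
```

Returning a list of messages rather than a single boolean mirrors the validation status and validation errors that the mapping object is documented to carry.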

Outputs And Return Values

map_text_to_observation returns a structured mapping object with:

KMDS observation family
KMDS observation type
Ontology class and property names
Extracted entities
Validation status and validation errors
Classification confidence

log_text_as_observation returns a result object with:

The structured mapping
The project file path written to
The workflow name used
The action taken (create or update)
The JSON-LD payload
The generated Python logging code
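
As a reading aid, the two return shapes can be pictured as nested dataclasses. The attribute names workflow_family, observation_type, extracted_entities, metric, value, validation_passed, mapping, and project_file appear in the examples earlier in this document. Every other name below, including the class names, is a hypothetical placeholder, and the sample values are not real KMDS output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractedEntitiesSketch:
    metric: Optional[str] = None
    value: Optional[str] = None
    timestamp: Optional[str] = None  # hypothetical field name
    component: Optional[str] = None  # hypothetical field name


@dataclass
class MappingSketch:
    workflow_family: str
    observation_type: str
    extracted_entities: ExtractedEntitiesSketch
    validation_passed: bool
    validation_errors: List[str] = field(default_factory=list)  # hypothetical
    confidence: float = 0.0  # hypothetical field name


@dataclass
class LogResultSketch:
    mapping: MappingSketch
    project_file: str
    workflow_name: str  # hypothetical field name
    action: str  # hypothetical field name; "create" or "update"


def describe(result: LogResultSketch) -> str:
    """One-line report of what was logged and where."""
    status = "valid" if result.mapping.validation_passed else "invalid"
    return (
        f"{result.action}: {result.mapping.observation_type} "
        f"({status}) -> {result.project_file}"
    )


# Placeholder values for illustration only.
sample = LogResultSketch(
    mapping=MappingSketch(
        workflow_family="exploratory",
        observation_type="placeholder_type",
        extracted_entities=ExtractedEntitiesSketch(component="customer_age"),
        validation_passed=True,
    ),
    project_file="./support_reporting_intake.xml",
    workflow_name="support_reporting_intake",
    action="create",
)
print(describe(sample))
```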