Dataset Serialization ↗

pydantic-ai concept intermediate testing

Original Documentation

Learn how to save and load datasets in different formats, with support for custom evaluators and IDE integration.

Overview#

Pydantic Evals supports serializing datasets to files in two formats:

YAML (.yaml, .yml) - Human-readable, great for version control
JSON (.json) - Structured, machine-readable

Both formats support:

Automatic JSON schema generation for IDE autocomplete and validation
Custom evaluator serialization/deserialization
Type-safe loading with generic parameters

YAML Format#

YAML is the recommended format for most use cases due to its readability and compact syntax.

Basic Example#

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# Create a dataset with typed parameters
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(
            name='test_1',
            inputs='hello',
            expected_output='HELLO',
        ),
    ],
    evaluators=[
        IsInstance(type_name='str'),
        EqualsExpected(),
    ],
)

# Save to YAML
dataset.to_file('my_tests.yaml')

This creates two files:

my_tests.yaml - The dataset
my_tests_schema.json - JSON schema for IDE support

YAML Output#

# yaml-language-server: $schema=my_tests_schema.json
name: my_tests
cases:
- name: test_1
  inputs: hello
  expected_output: HELLO
evaluators:
- IsInstance: str
- EqualsExpected

JSON Schema for IDEs#

The first line references the schema file:

# yaml-language-server: $schema=my_tests_schema.json

This enables:

✅ Autocomplete in VS Code, PyCharm, and other editors
✅ Inline validation while editing
✅ Documentation tooltips for fields
✅ Error highlighting for invalid data

Editor Support

The yaml-language-server comment is supported by:

VS Code (with YAML extension)
JetBrains IDEs (PyCharm, IntelliJ, etc.)
Most editors with YAML language server support

See the YAML Language Server docs for more details.

Loading from YAML#

from pathlib import Path
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# First create and save the dataset
Path('my_tests.yaml').parent.mkdir(exist_ok=True)
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[IsInstance(type_name='str'), EqualsExpected()],
)
dataset.to_file('my_tests.yaml')

# Load the dataset with type parameters
dataset = Dataset[str, str, Any].from_file('my_tests.yaml')


def my_task(text: str) -> str:
    return text.upper()


# Run evaluation
report = dataset.evaluate_sync(my_task)

JSON Format#

JSON format is useful for programmatic generation or when strict structure is required.

Basic Example#

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(name='test_1', inputs='hello', expected_output='HELLO'),
    ],
    evaluators=[EqualsExpected()],
)

# Save to JSON
dataset.to_file('my_tests.json')

JSON Output#

{
  "$schema": "my_tests_schema.json",
  "name": "my_tests",
  "cases": [
    {
      "name": "test_1",
      "inputs": "hello",
      "expected_output": "HELLO"
    }
  ],
  "evaluators": [
    "EqualsExpected"
  ]
}

The $schema key at the top enables IDE support similar to YAML.

Loading from JSON#

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

# First create and save the dataset
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[EqualsExpected()],
)
dataset.to_file('my_tests.json')

# Load from JSON
dataset = Dataset[str, str, Any].from_file('my_tests.json')

Schema Generation#

Automatic Schema Creation#

By default, to_file() creates a JSON schema file alongside your dataset:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates both my_tests.yaml AND my_tests_schema.json
dataset.to_file('my_tests.yaml')

Custom Schema Location#

from pathlib import Path
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Create directories
Path('data').mkdir(exist_ok=True)

# Custom schema filename (relative to dataset file location)
dataset.to_file(
    'data/my_tests.yaml',
    schema_path='my_schema.json',
)

# No schema file
dataset.to_file('my_tests.yaml', schema_path=None)

Schema Path Templates#

Use {stem} to reference the dataset filename:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates: my_tests.yaml and my_tests.schema.json
dataset.to_file(
    'my_tests.yaml',
    schema_path='{stem}.schema.json',
)

Manual Schema Generation#

Generate a schema without saving the dataset:

import json
from typing import Any

from pydantic_evals import Dataset

# Get schema as dictionary for a specific dataset type
schema = Dataset[str, str, Any].model_json_schema_with_evaluators()

# Save manually
with open('custom_schema.json', 'w', encoding='utf-8') as f:
    json.dump(schema, f, indent=2)

Custom Evaluators#

Custom evaluators require special handling during serialization and deserialization.

Requirements#

Custom evaluators must:

Be decorated with @dataclass
Inherit from Evaluator
Be passed to both to_file() and from_file()

Complete Example#

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check if output length exceeds a threshold."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# Create dataset with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[
                CustomThreshold(min_length=5, max_length=20),
            ],
        ),
    ],
)

# Save with custom evaluator types
dataset.to_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)

Saved YAML#

# yaml-language-server: $schema=dataset_schema.json
cases:
- name: test_length
  inputs: example
  expected_output: long result
  evaluators:
  - CustomThreshold:
      min_length: 5
      max_length: 20

Loading with Custom Evaluators#

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check if output length exceeds a threshold."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# First create and save the dataset
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[CustomThreshold(min_length=5, max_length=20)],
        ),
    ],
)
dataset.to_file('dataset.yaml', custom_evaluator_types=[CustomThreshold])

# Load with custom evaluator registry
dataset = Dataset[str, str, Any].from_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)

Important

You must pass custom_evaluator_types to both to_file() and from_file().

to_file(): Includes the evaluator in the JSON schema
from_file(): Registers the evaluator for deserialization

Evaluator Serialization Formats#

Evaluators can be serialized in three forms:

1. Name Only (No Parameters)#

evaluators:
- EqualsExpected
- IsInstance: str  # Using default parameter

2. Single Parameter (Short Form)#

evaluators:
- IsInstance: str
- Contains: "required text"
- MaxDuration: 2.0

3. Multiple Parameters (Dict Form)#

evaluators:
- CustomThreshold:
    min_length: 5
    max_length: 20
- LLMJudge:
    rubric: "Response is accurate"
    model: "openai:gpt-5"
    include_input: true

Format Comparison#

Feature	YAML	JSON
Human readable	✅ Excellent	⚠️ Good
Comments	✅ Yes	❌ No
Compact	✅ Yes	⚠️ Verbose
Machine parsing	✅ Good	✅ Excellent
IDE support	✅ Yes	✅ Yes
Version control	✅ Clean diffs	⚠️ Noisy diffs

Recommendation: Use YAML for most cases, JSON for programmatic generation.

Advanced: Evaluator Serialization Name#

Customize how your evaluator appears in serialized files:

from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class VeryLongDescriptiveEvaluatorName(Evaluator):
    @classmethod
    def get_serialization_name(cls) -> str:
        return 'ShortName'

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True

In YAML:

evaluators:
- ShortName  # Instead of VeryLongDescriptiveEvaluatorName

Troubleshooting#

Schema Not Found in IDE#

Problem: YAML file doesn’t show autocomplete

Solutions:

Check the schema path in the first line of YAML:

# yaml-language-server: $schema=correct_schema_name.json

Verify schema file exists in the same directory
Restart the language server in your IDE
Install YAML extension (VS Code: “YAML” by Red Hat)

Custom Evaluator Not Found#

Problem: ValueError: Unknown evaluator name: 'CustomEvaluator'

Solution: Pass custom_evaluator_types when loading:

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomEvaluator(Evaluator):
    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# First create and save with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[Case(inputs='test', evaluators=[CustomEvaluator()])],
)
dataset.to_file('tests.yaml', custom_evaluator_types=[CustomEvaluator])

# Load with custom evaluator types
dataset = Dataset[str, str, Any].from_file(
    'tests.yaml',
    custom_evaluator_types=[CustomEvaluator],  # Required!
)

Format Inference Failed#

Problem: ValueError: Cannot infer format from extension

Solution: Specify format explicitly:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Explicit format for unusual extensions
dataset.to_file('data.txt', fmt='yaml')
dataset_loaded = Dataset[str, str, Any].from_file('data.txt', fmt='yaml')

Schema Generation Error#

Problem: Custom evaluator causes schema generation to fail

Solution: Ensure evaluator is a proper dataclass:

from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


# ✅ Correct
@dataclass
class MyEvaluator(Evaluator):
    value: int

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# ❌ Wrong: Missing @dataclass
class BadEvaluator(Evaluator):
    def __init__(self, value: int):
        self.value = value

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True

Next Steps#

Dataset Management - Creating and organizing datasets
Custom Evaluators - Write custom evaluation logic
Core Concepts - Understand the data model

Link last verified June 7, 2026. View original ↗

Source: Pydantic AI Docs

Link last verified: 2026-03-04