Distiller

Main class for LLM-backed structured extraction. It takes a query and a chunk of text and returns output that conforms to a Pydantic schema.

Constructor

Distiller(
    model_name: str = "openai/gpt-4.1-mini",
    system_prompt: str = "Extract relevant information.",
    response_schema: Type[BaseModel] = DefaultSchema,
    engine: LLMEngine | None = None,
    cache_path: str | None = None,
)

Parameters

Parameter        Type              Default                Description
model_name       str               "openai/gpt-4.1-mini"  LLM model identifier
system_prompt    str               "Extract..."           System prompt for extraction
response_schema  Type[BaseModel]   DefaultSchema          Pydantic schema for output
engine           LLMEngine | None  None                   Custom LLM engine
cache_path       str | None        None                   Path to SQLite cache
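
The cache_path parameter points at a SQLite file, but the cache layout is not documented here. The following is a minimal sketch of how such a cache could work, assuming entries are keyed on a hash of the model name, system prompt, query, and chunk text. The names SqliteCache and cache_key are hypothetical and are not part of Distiller's API, and the cached value is made-up sample data.

```python
from __future__ import annotations

import hashlib
import json
import sqlite3


def cache_key(model_name: str, system_prompt: str, query: str, chunk_text: str) -> str:
    """Hash every input that could change the model's answer (assumed key design)."""
    payload = json.dumps([model_name, system_prompt, query, chunk_text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SqliteCache:
    """Hypothetical key-value cache; not Distiller's actual implementation."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str) -> dict | None:
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, key: str, value: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()


# ":memory:" keeps the demo self-contained; a file path would persist between runs.
cache = SqliteCache(":memory:")
key = cache_key(
    "openai/gpt-4.1-mini",
    "Extract relevant information.",
    "What are the requirements?",
    "Students must complete 130 credits...",
)
miss = cache.get(key)  # nothing stored yet
cache.put(key, {"summary": "sample cached output"})  # illustrative value only
hit = cache.get(key)   # served from SQLite, no model call needed
```

Under this design, changing any of the four inputs produces a different key, so a stale answer is never returned for a new prompt or model.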

Methods

distill

Extract structured data from a single chunk.

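# Run inside an async function; `distiller` is an existing Distiller instance.
result = await distiller.distill(
    query="What are the requirements?",
    chunk_text="Students must complete 130 credits...",
)
# The result is an instance of the configured response_schema.
print(result.summary)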
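
The attributes available on the result depend on the response_schema passed to the constructor. As a rough illustration, a custom schema might look like the sketch below. RequirementSchema and its fields are hypothetical, the fields of DefaultSchema are not documented here, and the dictionary stands in for a parsed model response, so no LLM call is made.

```python
from pydantic import BaseModel


class RequirementSchema(BaseModel):
    """Hypothetical schema; the field names are illustrative only."""

    summary: str
    credits_required: int


# Stand-in for a parsed model response (made-up data, no LLM involved).
raw = {"summary": "Students must complete 130 credits.", "credits_required": 130}

# Pydantic validates the types and exposes each field as an attribute.
result = RequirementSchema(**raw)
print(result.summary)
```

Such a class would presumably be supplied as response_schema=RequirementSchema when constructing the Distiller.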

distill_many

Batch extraction with concurrency control.

# Run inside an async function; `distiller` is an existing Distiller instance.
results = await distiller.distill_many(
    query="Extract key points",
    chunks=["chunk 1...", "chunk 2...", "chunk 3..."],
    concurrency=5,
)
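
How distill_many enforces its concurrency limit is not specified here. One common way to cap concurrent work in asyncio is a semaphore combined with asyncio.gather, sketched below. The functions fake_distill and run_bounded are hypothetical stand-ins, and no LLM is contacted.

```python
import asyncio


async def fake_distill(chunk: str) -> str:
    """Stand-in for a single extraction call; no LLM is contacted."""
    await asyncio.sleep(0.01)
    return chunk.upper()


async def run_bounded(chunks: list[str], concurrency: int) -> tuple[list[str], int]:
    """Process all chunks with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    stats = {"in_flight": 0, "peak": 0}  # bookkeeping to observe the cap

    async def worker(chunk: str) -> str:
        async with semaphore:  # blocks while `concurrency` workers are active
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            try:
                return await fake_distill(chunk)
            finally:
                stats["in_flight"] -= 1

    # gather returns results in the same order as the input chunks.
    results = await asyncio.gather(*(worker(c) for c in chunks))
    return list(results), stats["peak"]


results, peak = asyncio.run(
    run_bounded(["chunk 1", "chunk 2", "chunk 3"], concurrency=2)
)
```

A semaphore keeps the code simple and preserves result order; a worker pool fed by a queue is an alternative when chunks arrive as a stream.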

create_job

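Create a job object for deferred extraction. Creating the job is a regular (non-awaited) call; the result is obtained later by awaiting the job's run() method.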

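# create_job is not awaited; it only builds the job.
job = distiller.create_job(
    query="Extract citations",
    chunk_text="The court held in Smith v. Jones...",
)
# Run inside an async function to obtain the result.
result = await job.run()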
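
The deferred pattern can be pictured as an object that stores its inputs at creation time and does the work only in run(). The sketch below illustrates that idea under stated assumptions: DeferredJob and fake_extract are hypothetical, and the job object returned by create_job may be implemented differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class DeferredJob:
    """Hypothetical job object: stores inputs now, does the work in run()."""

    query: str
    chunk_text: str
    extract: Callable[[str, str], Awaitable[dict]]

    async def run(self) -> dict:
        # The extraction callable is invoked only here, not at construction.
        return await self.extract(self.query, self.chunk_text)


calls: list[str] = []  # records when the stand-in extractor actually runs


async def fake_extract(query: str, chunk_text: str) -> dict:
    """Stand-in for the real extraction; no LLM is contacted."""
    calls.append(query)
    return {"query": query, "length": len(chunk_text)}


job = DeferredJob(
    query="Extract citations",
    chunk_text="The court held in Smith v. Jones...",
    extract=fake_extract,
)
calls_before_run = len(calls)  # creating the job triggers no work
result = asyncio.run(job.run())
```

Separating creation from execution lets jobs be collected, queued, or scheduled before any model call is made.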