Synthetic Data Generation

The sdg/ directory contains synthetic-data generation workflows used to teach Old Assyrian fundamentals and broaden training coverage. It complements source extraction from prompt-system and feeds the final data assembly in extraction-and-preparation-pipeline.

Workflows

grammar_transform.py: generates grammar transformations from seed examples and grammar/context resources.
generate_cad_drills.py: generates deliberate-practice examples from CAD/eSAD senses, examples, and a generation plan.
fill_engine.py: produces deterministic slot-filled examples from JSON templates without requiring an API call.

Template System

Template files under sdg/templates/ cover:

debts and loans;
legal and seal formulas;
letter openings and correspondence patterns;
accounting, memoranda, and trade examples.

The template engine draws from slot pools in template_pools.py, including names, commodities, amounts, places, months, eponyms, deadlines, penalties, occupations, kinship terms, and containers. Constraint helpers live in template_constraints.py.

Demo Value

This section is useful in the public brain because it shows the solution was not only model training. It combined:

mined scholarly data;
LLM-assisted extraction;
LLM-assisted transformations;
deterministic template generation;
source-specific normalization in normalization.

Provider-specific endpoint details in SDG config files should stay out of public prose. The reusable point is the data-generation architecture, not the API vendor.

Deep Past Solution Brain

Explorer

Synthetic Data Generation

Synthetic Data Generation

Workflows

Template System

Demo Value

Graph View

Table of Contents

Backlinks