Prompt System

The prompt system is one of the clearest demo surfaces in the repository. It shows how the solution turned heterogeneous scholarly sources into structured training data for the extraction-and-preparation-pipeline.

The prompts are stored in prompts/ and are selected by extraction and splitting scripts according to document layout, source language, and data type.
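A minimal sketch of how such a selection could work, assuming a lookup keyed on layout, language, and data type. Only the prompts/ directory comes from the repository; the key values, file names, and the select_prompt_path helper are hypothetical and chosen for illustration.

```python
from pathlib import Path
from typing import NamedTuple

PROMPT_DIR = Path("prompts")  # directory named in the text above


class PromptKey(NamedTuple):
    layout: str     # hypothetical values, e.g. "side_by_side", "top_bottom"
    language: str   # hypothetical values, e.g. "tr", "de", "en"
    data_type: str  # hypothetical values, e.g. "akt", "journal", "cad"


# Hypothetical registry: these file names are illustrative, not the repo's.
PROMPT_FILES = {
    PromptKey("side_by_side", "tr", "akt"): "akt_side_by_side_tr.txt",
    PromptKey("top_bottom", "de", "akt"): "akt_top_bottom_de.txt",
    PromptKey("single_column", "tr", "journal"): "journal_dergipark_tr.txt",
}


def select_prompt_path(layout, language, data_type, prompt_dir=PROMPT_DIR):
    """Return the prompt file path for one document, or fail loudly."""
    key = PromptKey(layout, language, data_type)
    try:
        return prompt_dir / PROMPT_FILES[key]
    except KeyError:
        # An unregistered combination should stop the run rather than
        # silently fall back to a prompt written for another layout.
        raise ValueError(f"no prompt registered for {key}") from None
```

Failing on an unknown combination is a deliberate choice in this sketch: a prompt written for one layout would likely produce misaligned output on another.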

Prompt Families

AKT extraction prompts handle side-by-side, top-bottom, OCR, Turkish, German, and Kouwenberg/Larsen-style alignment cases.
Journal prompts handle Turkish Dergipark papers and English Michel-style academic chapters.
Hecker prompts extract transliteration-only tablet entries from born-digital PDFs.
CAD prompts extract Old Assyrian attestations from dictionary-style OCR scans into dual raw/MT-normalized tracks.
Repair and sentence-splitting prompts turn document-level expert or synthetic outputs into sentence-aligned examples.
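To illustrate what "sentence-aligned examples" could look like downstream of a splitting prompt, here is a small sketch. It assumes the splitting output is a dict holding two parallel lists; the field names transliteration_sentences and translation_sentences, the AlignedExample type, and the to_aligned_examples helper are all hypothetical and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedExample:
    doc_id: str
    index: int            # position of the sentence within the document
    transliteration: str
    translation: str


def to_aligned_examples(doc_id, split_output):
    """Pair up sentences from a (hypothetical) splitting-prompt output."""
    src = split_output["transliteration_sentences"]
    tgt = split_output["translation_sentences"]
    # A count mismatch means the split is not one-to-one, so no pairing
    # can be trusted; reject the whole document instead of guessing.
    if len(src) != len(tgt):
        raise ValueError(
            f"{doc_id}: {len(src)} source vs {len(tgt)} target sentences"
        )
    return [
        AlignedExample(doc_id, i, s.strip(), t.strip())
        for i, (s, t) in enumerate(zip(src, tgt))
    ]
```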

Design Pattern

Most extraction prompts combine:

a domain role, usually Assyriologist plus data engineer;
layout-specific instructions;
cleaning rules;
atomic chunking rules for witnesses, seals, goods, and itemized lists;
quality or confidence fields;
strict JSON output delimiters.
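The pattern above can be sketched as a small prompt builder plus a parser for the delimited output. This is an illustration under stated assumptions, not the repository's implementation: the delimiter strings, section labels, and the build_prompt and parse_delimited_json helpers are hypothetical.

```python
import json

# Hypothetical delimiters; the actual prompts may use different markers.
JSON_START = "<<<JSON"
JSON_END = "JSON>>>"


def _bullets(rules):
    return "\n".join(f"- {rule}" for rule in rules)


def build_prompt(role, layout_rules, cleaning_rules, chunking_rules, fields):
    """Assemble one extraction prompt from the components listed above."""
    sections = [
        f"ROLE: {role}",
        "LAYOUT:\n" + _bullets(layout_rules),
        "CLEANING:\n" + _bullets(cleaning_rules),
        "CHUNKING:\n" + _bullets(chunking_rules),
        "OUTPUT: return one JSON array between "
        f"{JSON_START} and {JSON_END}; each object has the fields: "
        + ", ".join(fields) + ".",
    ]
    return "\n\n".join(sections)


def parse_delimited_json(response):
    """Extract and decode the JSON payload between the two delimiters."""
    start = response.find(JSON_START)
    if start == -1:
        raise ValueError("opening JSON delimiter not found")
    body_start = start + len(JSON_START)
    end = response.find(JSON_END, body_start)
    if end == -1:
        raise ValueError("closing JSON delimiter not found")
    return json.loads(response[body_start:end])
```

Strict delimiters let the parser ignore any commentary the model adds around the payload, and a missing delimiter surfaces as an explicit error instead of a partially parsed record.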

This is worth showing in a demo because it makes the data pipeline inspectable. The model-training code in training-stack is fairly conventional, whereas the prompt system explains much of how the non-standard data was converted into usable supervision.

Public-Wiki Handling

Do not copy full prompts into the wiki unless there is a specific reason. Summaries are preferred because the prompt files are already in the repo and can be read directly.
