# Transcript Import Workflow
Transcript Import restructures a raw conversation — between humans, between human and AI, or a monolog — into a structured intermediate document ready for scaffold mode. It does not write .mdoc files; it produces a .processed.md file.
**Trigger:** `import transcript <file>` where `<file>` ends in `.transcript.md`.
**Input format:** The transcript file has two sections:
## Instructions
(Optional processing context — scope constraints, focus areas, known decisions. May be empty. When present, treat as authoritative directives that override inferences from the transcript body.)
## Transcript
(Raw conversation content — speaker-attributed or plain prose.)

**Output:** `docs/{basename}.processed.md`, where `{basename}` is the input filename without the `.transcript.md` suffix.
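The output path derivation can be sketched as a small Python helper; this is illustrative only (the function name and the use of `pathlib` are assumptions, not part of the workflow):

```python
from pathlib import Path

def processed_path(transcript: Path) -> Path:
    """Derive the output path: strip the .transcript.md suffix, write under docs/."""
    name = transcript.name
    if not name.endswith(".transcript.md"):
        raise ValueError(f"not a transcript file: {transcript}")
    basename = name[: -len(".transcript.md")]
    return Path("docs") / f"{basename}.processed.md"
```

For example, `notes/kickoff.transcript.md` maps to `docs/kickoff.processed.md`.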
## Phase 1 — Read & Parse
Read the .transcript.md file in full. Separate the two sections:
- **Instructions** — extract all processing directives. These constrain every subsequent phase:
- Scope restrictions (“focus on the auth domain only”)
- Known decisions (“we decided on event sourcing”)
- Exclusions (“ignore the billing discussion”)
- Role assignments (“Sarah is the product owner”)
- Any other context the user considers relevant
- **Transcript** — identify the conversation structure:
- Speakers — extract all distinct participants (by name, label, or role). If speakers are not attributed, treat the transcript as a monolog.
- Conversation type — classify: requirements gathering, design discussion, brainstorm, decision meeting, technical deep-dive, user interview, or mixed.
- Topic segments — identify natural topic boundaries (subject changes, agenda items, explicit transitions).
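The section split at the start of this phase can be sketched as a minimal Python helper (hypothetical name `split_transcript`; it assumes the two `##` headings appear exactly as in the input format above):

```python
def split_transcript(text: str) -> dict:
    """Split a .transcript.md file into its Instructions and Transcript sections."""
    sections = {"Instructions": "", "Transcript": ""}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("## Instructions", "## Transcript"):
            current = stripped[3:]  # section name after "## "
            continue
        if current:
            sections[current] += line + "\n"
    return {k: v.strip() for k, v in sections.items()}
```

An empty Instructions section simply yields an empty string, matching the "may be empty" rule.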
Output a brief parse summary before proceeding:
Parsed: {n} speakers, {m} topic segments, type: {conversation-type}
Instructions: {summary of directives, or "none"}

## Phase 2 — Extract & Classify
Walk through the transcript segment by segment. For each substantive statement, extract and classify it into one of these categories:
| Category | What to look for | Maps to |
|---|---|---|
| Role / Actor | Named users, personas, system actors, “as a …” | role document |
| Domain concept | Bounded concerns, entities, data ownership, “the X subsystem” | domain document |
| Feature | User-facing capabilities, “users can …”, “the system allows …” | feature document |
| Flow / Scenario | Step sequences, “first … then …”, “the happy path is …” | flow document |
| Requirement | Constraints, “must”, “shall”, invariants, acceptance criteria | {% requirement %} |
| Policy | Reactive rules, “when X → Y”, “automatically”, triggers | {% policy %} |
| API item | Actions, events, operations, error cases | {% action %} / {% event %} / {% operation %} / {% error %} |
| Data model | Entities, attributes, relationships, “has a”, “belongs to” | {% model %} |
| NFR | Performance, security, scalability, accessibility constraints | {% requirement %} with NFR scope |
| Value / Principle | Beliefs, design philosophies, “we value …”, “our approach is …” | {% value %} / {% principle %} |
| Goal | Success metrics, launch targets, KPIs | {% goal %} |
| Architecture | Technology choices, integration patterns, infrastructure decisions | blueprint document |
| UI / Interaction | Screen descriptions, navigation, layout, “the user sees …” | surface document or {% surface %} |
| Open question | Unresolved discussions, “we need to decide”, “TBD”, “not sure” | Open Questions section |
| Decision | Resolved choices, “we agreed”, “the decision is”, “let’s go with” | Decisions section |
**Attribution:** For each extract, record the speaker (if identifiable) and approximate position in the transcript (beginning / middle / end). This is metadata for the processed file, not for the final spec.
**Instructions override:** If the Instructions section restricts scope, skip extracts outside that scope entirely. If it declares a decision, mark any contradicting transcript content as superseded.
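As an illustration of the classification pass, here is a minimal keyword-cue sketch in Python. The cue table is a toy subset of the categories above; a real pass would rely on the full table and surrounding context, not regex matching alone:

```python
import re

# Indicative phrases per category (toy subset of the full classification table).
# Categories are checked in order; the first match wins.
CATEGORY_CUES = {
    "requirement": [r"\bmust\b", r"\bshall\b"],
    "policy": [r"\bwhen\b.*\bthen\b", r"\bautomatically\b"],
    "feature": [r"\busers can\b", r"\bthe system allows\b"],
    "open-question": [r"\bTBD\b", r"\bwe need to decide\b", r"\bnot sure\b"],
    "decision": [r"\bwe agreed\b", r"\blet'?s go with\b", r"\bthe decision is\b"],
}

def classify(statement: str) -> str:
    for category, patterns in CATEGORY_CUES.items():
        if any(re.search(p, statement, re.IGNORECASE) for p in patterns):
            return category
    return "unclassified"
```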
## Phase 3 — Deduplicate & Consolidate
Conversations revisit topics. The same concept may be discussed three times with slight variations. This phase collapses redundancy.
For each category from Phase 2:
- Group extracts that refer to the same concept (same entity, same feature, same constraint — even if worded differently).
- Merge into a single canonical statement per concept:
  - Prefer the most specific and complete formulation.
  - Prefer later statements over earlier ones (conversations tend to refine).
  - Prefer statements from domain experts over general discussion (use speaker roles from Instructions if available).
- Discard pure repetitions, filler, off-topic tangents, and social exchanges.
Track the merge count — report how many raw extracts collapsed into how many canonical items.
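The grouping-and-merge step can be sketched in Python. This toy version implements only the "prefer later statements" tie-breaker and the merge count; it assumes concept keys have already been normalized so that differently worded statements about the same concept share a key:

```python
from collections import defaultdict

def consolidate(extracts):
    """Group extracts by (category, concept); keep one canonical statement each.

    Extracts are (category, concept, statement) tuples in transcript order;
    the last statement per group wins, per the refinement preference.
    """
    groups = defaultdict(list)
    for category, concept, statement in extracts:
        groups[(category, concept)].append(statement)
    canonical = {key: statements[-1] for key, statements in groups.items()}
    merge_report = f"{len(extracts)} raw extracts -> {len(canonical)} canonical items"
    return canonical, merge_report
```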
## Phase 4 — Detect Inconsistencies
Compare all canonical items for contradictions:
| Inconsistency type | Example |
|---|---|
| Contradicting requirements | “Must be real-time” vs. “batch processing is fine” |
| Scope conflict | Feature X assigned to two different domains |
| Naming collision | Two different concepts using the same term |
| Undecided alternative | Multiple options discussed, no resolution recorded |
| Decision vs. discussion | A later discussion reopens a previously recorded decision |
For each inconsistency, record:
- The conflicting statements (with speaker attribution)
- The category and concept they affect
- A suggested resolution if one is obvious from context (e.g., later statement supersedes earlier)
Do not silently resolve inconsistencies — always flag them. The processed file must make every conflict visible.
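One possible record shape for flagged inconsistencies, sketched in Python; the field names are assumptions, and the `render` format mirrors the Inconsistencies section layout used in the processed file:

```python
from dataclasses import dataclass

@dataclass
class Inconsistency:
    """One flagged conflict: both statements, both speakers, and a suggestion."""
    concept: str
    category: str
    statement_a: str
    speaker_a: str
    statement_b: str
    speaker_b: str
    suggested_resolution: str = "needs user input"

    def render(self) -> str:
        return (
            f'**{self.concept}** — "{self.statement_a}" ({self.speaker_a}) '
            f'vs. "{self.statement_b}" ({self.speaker_b}). '
            f"Suggested resolution: {self.suggested_resolution}"
        )
```

Defaulting the resolution to "needs user input" keeps the never-silently-resolve rule explicit in the data model.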
## Phase 5 — Map to StarSpec Metamodel
Organize the deduplicated, classified extracts into a document plan aligned to the StarSpec content tree:
- Roles — one entry per identified actor with goals and responsibilities.
- Domains — group related concepts, entities, policies, and API items under bounded domains. Each domain gets:
  - Glossary terms
  - API items (actions, events, operations, errors)
  - Data model entities
  - Policies
- Features — map user-facing capabilities to features, linking to domains and roles.
- Flows — map step sequences to flows, linking to features.
- Blueprints — group architecture and technology decisions.
- Manifest — collect values, principles, goals, and NFRs.
Apply naming conventions from starspec/agents/conventions/naming-standards:
- Document ids: `kebab-case`
- Actions: `kebab-case` imperative (e.g. `create-order`)
- Events: `kebab-case` past tense (e.g. `order-created`)
- Operations: `kebab-case` (e.g. `get-order`)
- Errors: `kebab-case` noun phrase (e.g. `order-not-found`)
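A minimal sketch of id normalization (the `to_kebab` helper is hypothetical; it handles spaces, underscores, and punctuation, but not camelCase splitting):

```python
import re

def to_kebab(phrase: str) -> str:
    """Normalize a phrase to a kebab-case id."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    return "-".join(w.lower() for w in words)
```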
For each planned document, estimate completeness (0–100%) using the same scale as scaffold mode.
## Phase 6 — Write Processed File
Write the output to docs/{basename}.processed.md. The file must follow this exact structure:
# Transcript Import — {title or subject}
**Source:** {relative path to .transcript.md file}
**Date:** {ISO 8601 date}
**Speakers:** {comma-separated list, or "monolog"}
**Type:** {conversation type from Phase 1}
**Instructions:** {summary of processing directives, or "none"}
---
## Document Plan
| Type | Id | Purpose | Completeness |
|------|-----|---------|-------------|
| manifest | {id} | {one-line} | {n}% |
| role | {id} | {one-line} | {n}% |
| domain | {id} | {one-line} | {n}% |
| ... | ... | ... | ... |
---
## Roles
### {role-id}
- **Goals:** {extracted goals}
- **Responsibilities:** {extracted responsibilities}
- **Source:** {speaker, position}
---
## Domains
### {domain-id}
**Scope:** {one-line scope description}
**Glossary:**
- **{term}** — {definition}

**API:**
- Action: `{signature}`
- Event: `{event-name}: { fields }`
- Operation: `{signature}`
- Error: `{error-name}` — {description}

**Model:**
{DBML or entity-relationship description}

**Policies:**
- When {source} → {reaction} ({policy-id})
---
## Features
### {feature-id}
- **Domains:** {domain-id list}
- **Roles:** {role-id list}
- **Requirements:**
  - {requirement description} (priority: {must/should/could})
- **Acceptance criteria:**
  - {criterion}
---
## Flows
### {flow-id}
- **Feature:** {feature-id}
- **Preconditions:** {list}
- **Steps:**
  1. {Actor} {action} → {outcome}
  2. ...
- **Postconditions:** {list}
---
## Blueprints
### {blueprint-id}
- **Decisions:** {architecture/technology choices}
- **Rationale:** {why}
---
## Manifest
**Values:**
- {VALUE_NAME}: {description}

**Principles:**
- {principle-name}: {description}

**Goals:**
- {goal-name}: {target} (status: pending)

**NFRs:**
- {nfr-name}: {constraint} (priority: {must/should})
---
## Decisions
Resolved choices extracted from the transcript:
1. **{decision}** — decided by {speaker}, {position in transcript}. Rationale: {why}
2. ...
---
## Open Questions
Unresolved items that require follow-up:
1. [{GAP_TAG}] **{question}** — raised by {speaker}. Context: {surrounding discussion}
2. ...
Gap tags: [SCOPE], [NAMING], [DOMAIN], [ERROR], [FLOW], [UI], [POLICY], [NFR], [ACTOR]
---
## Inconsistencies
Conflicts detected between transcript statements:
1. **{concept}** — "{statement A}" ({speaker A}) vs. "{statement B}" ({speaker B}). Suggested resolution: {suggestion, or "needs user input"}
2. ...
---
## Processing Summary
- **Raw extracts:** {n}
- **After deduplication:** {m} ({n - m} redundant statements removed)
- **Inconsistencies:** {k}
- **Open questions:** {q}
- **Decisions recorded:** {d}
- **Documents planned:** {p} (avg. completeness: {avg}%)

Omit any section that has zero items (e.g., if there are no blueprints, omit the Blueprints section). Do not leave empty sections.
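The omit-empty-sections rule can be sketched in Python (the `render_sections` helper is hypothetical; it takes (title, items) pairs and skips any section with no items):

```python
def render_sections(sections):
    """Render only non-empty sections, separated by horizontal rules."""
    parts = []
    for title, items in sections:
        if not items:
            continue  # omit empty sections entirely
        body = "\n".join(f"- {item}" for item in items)
        parts.append(f"## {title}\n{body}")
    return "\n\n---\n\n".join(parts)
```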
## Phase 7 — Report & Next Steps
Output a concise summary to the user:
Transcript imported — {m} items extracted from {n} raw statements
Documents planned: {p} {type counts: e.g. 2 roles, 3 domains, 4 features, 2 flows}
Decisions recorded: {d}
Open questions: {q}
Inconsistencies: {k}
Output: docs/{basename}.processed.md

Then suggest next steps:
- If open questions or inconsistencies exist: “Review open questions and inconsistencies in the processed file before scaffolding.”
- If the processed file is clean: “Ready to scaffold. Run: scaffold from docs/{basename}.processed.md”
- If the Instructions section requested specific focus: note what was excluded and suggest separate import runs if needed.
## Do’s and Don’ts
**Do:**
- Read the entire transcript before extracting — context from later sections often reframes earlier statements
- Honour the Instructions section as authoritative — it overrides inferences from the transcript
- Preserve the speakers’ own terminology in glossary terms and concept names
- Record attribution (speaker + position) for traceability
- Flag every inconsistency — never silently resolve conflicts
- Apply StarSpec naming conventions to all proposed ids
- Omit empty sections from the processed file
**Don’t:**
- Write `.mdoc` files — this workflow produces `.processed.md` only
- Ask clarifying questions during import — infer, document uncertainty, and proceed (same principle as scaffold mode)
- Invent requirements not supported by the transcript text or Instructions
- Discard “off-topic” remarks without checking — they sometimes contain implicit NFRs, policies, or domain boundaries
- Merge statements from different speakers without recording the merge
- Resolve inconsistencies silently — the user must see every conflict
## Definition of Done
- Both sections (Instructions + Transcript) parsed and accounted for
- All substantive statements extracted and classified
- Redundancy eliminated with merge counts reported
- Inconsistencies flagged with suggested resolutions
- Open questions gathered with gap tags
- Document plan aligned to StarSpec metamodel with completeness scores
- Processed file written to `docs/{basename}.processed.md`
- Summary with next-step guidance displayed to user