Module 5 Assignment: Metadata, lineage, and provenance

Contents

Module 5 Assignment: Metadata, lineage, and provenance#

Scenario#

You are advising a platform team designing data infrastructure for repeatable model training and monitoring. The stakeholders are: data engineer, ML engineer, security architect, data steward, and platform owner.

Task#

Answer the module question: How do we preserve the story of data transformations?

Use the module lab and course readings to produce: AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset..

Required Evidence#

Define the decision or system boundary in one paragraph.
Identify the dataset, proxy data, or evidence source you used: synthetic pipeline events with freshness, schema drift, lineage completeness, volume, and access-risk indicators.
Compare at least two alternatives, baselines, policies, or designs.
Report one quantitative result or structured scoring table.
Explain two failure modes and one mitigation for each.
State what additional evidence would be required before real deployment.

Submission#

Submit the completed notebook plus a 900-1200 word memo. The memo must include clear headings for context, method, evidence, risks, recommendation, and open questions.

# Assignment workspace for Module 5: Metadata, lineage, and provenance
module = 5
decision = "How do we preserve the story of data transformations?"
artifact = "AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset."

alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]

recommendation = {
    "decision": decision,
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}

{"module": module, "artifact": artifact, "alternatives": alternatives, "recommendation": recommendation}

{'module': 5,
 'artifact': 'AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset.',
 'alternatives': [{'option': 'baseline_or_manual_process',
   'strength': '',
   'risk': '',
   'evidence': ''},
  {'option': 'ai_assisted_or_advanced_option',
   'strength': '',
   'risk': '',
   'evidence': ''}],
 'recommendation': {'decision': 'How do we preserve the story of data transformations?',
  'recommended_option': '',
  'minimum_evidence_before_pilot': [],
  'monitoring_metric': '',
  'rollback_trigger': ''}}

Acceptance Criteria#

Your submission is complete only if another reviewer can reproduce your reasoning from the evidence you provide. You do not need production-grade data, but you must be explicit about proxy-data limits and what would change with real institutional data.