Module 5 Assignment: Metadata, lineage, and provenance

Module 5 Assignment: Metadata, lineage, and provenance#

Scenario#

You are advising a platform team designing data infrastructure for repeatable model training and monitoring. The stakeholders are: data engineer, ML engineer, security architect, data steward, and platform owner.

Task#

Answer the module question: How do we preserve the story of data transformations?

Use the module lab and course readings to produce: AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset..

Required Evidence#

  • Define the decision or system boundary in one paragraph.

  • Identify the dataset, proxy data, or evidence source you used: synthetic pipeline events with freshness, schema drift, lineage completeness, volume, and access-risk indicators.

  • Compare at least two alternatives, baselines, policies, or designs.

  • Report one quantitative result or structured scoring table.

  • Explain two failure modes and one mitigation for each.

  • State what additional evidence would be required before real deployment.

Submission#

Submit the completed notebook plus a 900-1200 word memo. The memo must include clear headings for context, method, evidence, risks, recommendation, and open questions.

# Assignment workspace for Module 5: Metadata, lineage, and provenance
module = 5
decision = "How do we preserve the story of data transformations?"
artifact = "AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset."

alternatives = [
    {"option": "baseline_or_manual_process", "strength": "", "risk": "", "evidence": ""},
    {"option": "ai_assisted_or_advanced_option", "strength": "", "risk": "", "evidence": ""},
]

recommendation = {
    "decision": decision,
    "recommended_option": "",
    "minimum_evidence_before_pilot": [],
    "monitoring_metric": "",
    "rollback_trigger": "",
}

{"module": module, "artifact": artifact, "alternatives": alternatives, "recommendation": recommendation}
{'module': 5,
 'artifact': 'AI data platform design review with lineage, quality checks, cost controls, and access model focused on metadata, lineage, and provenance: Create a lineage record for a training dataset.',
 'alternatives': [{'option': 'baseline_or_manual_process',
   'strength': '',
   'risk': '',
   'evidence': ''},
  {'option': 'ai_assisted_or_advanced_option',
   'strength': '',
   'risk': '',
   'evidence': ''}],
 'recommendation': {'decision': 'How do we preserve the story of data transformations?',
  'recommended_option': '',
  'minimum_evidence_before_pilot': [],
  'monitoring_metric': '',
  'rollback_trigger': ''}}

Acceptance Criteria#

Your submission is complete only if another reviewer can reproduce your reasoning from the evidence you provide. You do not need production-grade data, but you must be explicit about proxy-data limits and what would change with real institutional data.