# Syllabus: AINS6006 Big Data Management for AI Applications

## Catalog Description

Covers the design of data platforms and pipelines for AI workflows, including lineage, cloud integration, and security.

## Course Structure

Each week includes readings, a lecture/slide sequence, an executable lab, and an applied deliverable. Students maintain a reproducible project record and submit work through whichever channel the instructor selects: the LMS or a GitHub workflow.

## Weekly Schedule

| Week | Topic | Essential Question | Deliverable |
|------|-------|--------------------|-------------|
| 1 | Data architectures for AI | What architecture supports trustworthy AI workflows? | Lab notebook + assignment brief |
| 2 | Pipelines, orchestration, and quality | How do data pipelines fail, and how are failures detected? | Lab notebook + assignment brief |
| 3 | Storage, indexing, and retrieval | How do access patterns shape storage choices? | Lab notebook + assignment brief |
| 4 | Distributed processing and scale | When does scale require distributed computation? | Lab notebook + assignment brief |
| 5 | Metadata, lineage, and provenance | How do we preserve the story of data transformations? | Lab notebook + assignment brief |
| 6 | Cloud integration and cost control | How do cloud choices affect reliability and budget? | Lab notebook + assignment brief |
| 7 | Security and access governance | How should sensitive data be protected across AI workflows? | Lab notebook + assignment brief |
| 8 | AI data platform readiness review | What proves a data system can support production AI? | Lab notebook + assignment brief |

## Assessment

| Component | Weight |
|-----------|--------|
| Weekly labs and notebooks | 30% |
| Applied assignments | 35% |
| Participation and technical critique | 15% |
| Final synthesis portfolio | 20% |
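
As a worked illustration of how the weights above combine, the following minimal sketch computes a course percentage from four component scores. The component keys, the 0-100 score scale, and the example scores are assumptions made for illustration only; the syllabus does not specify rounding, letter-grade cutoffs, or late policies, so none are modeled here.

```python
# Weights (in percent) copied from the Assessment table.
# The dictionary keys are hypothetical labels, not official component names.
WEIGHTS = {
    "labs": 30,           # Weekly labs and notebooks
    "assignments": 35,    # Applied assignments
    "participation": 15,  # Participation and technical critique
    "portfolio": 20,      # Final synthesis portfolio
}


def weighted_grade(scores: dict[str, float]) -> float:
    """Combine component scores (assumed 0-100) into a course percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Each component contributes its score scaled by its percent weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student.
example = {"labs": 90, "assignments": 80, "participation": 100, "portfolio": 85}
print(weighted_grade(example))  # (2700 + 2800 + 1500 + 1700) / 100 = 87.0
```

Because the weights sum to 100, a student who earns the same score on every component receives that score as the course percentage.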

## Graduate Expectations

Submissions must show technical reasoning, awareness of the evidence behind each claim, clearly stated limitations, and responsible use of AI assistance. Code and analysis should be reproducible enough that the instructor can rerun them during review.
