Corporate Onsite Workshop Syllabus
neurex.dev

Databricks Agentic Data Engineering Workshop

2-Day Intensive · Onsite · 12–20 Data Engineers

This workshop teaches data engineering teams to use agentic coding assistants—Claude Code, GitHub Copilot, Codex CLI, Cursor, and others—to build, maintain, and evolve data pipelines, dashboards, and data applications on the Databricks Data Intelligence Platform. We teach the full agentic development lifecycle through the lens of data engineering: Unity Catalog context injection, Databricks AI Dev Kit skills, MCP-based tool access, event-driven backpressure (deterministic tools wired to agent events for imperative self-correction), and production-grade deployment patterns.

The curriculum is tool-agnostic by design—agents come and go, but the context-engineering and backpressure patterns are durable.

Engagement Model: 2 × 2-hour online discovery sessions + 2-day onsite intensive

Duration: 2 days (16 hours instruction + 4 hours guided lab)

Format: Instructor-led, hands-on, cohort-based

Prerequisites: Python + SQL proficiency; active Databricks workspace; Git fluency; basic CI/CD familiarity

Deliverables: Databricks agentic dev environment config, BACKPRESSURE.md spec, agentic CI/CD pipeline


Pre-Workshop Discovery Sessions (Online)

Session 1 — Scoping & Stakeholder Alignment

2 hours · 2–3 weeks before workshop

  • Audience mapping: Identify participant roles, skill levels, and learning objectives
  • Databricks environment audit: Workspace configuration, Unity Catalog setup, existing pipelines and dashboards
  • Tool audit: Current agentic tools in use (Claude Code, Copilot, Cursor, Codex, etc.) and Databricks AI Dev Kit adoption status
  • Data landscape review: Key data sources, pipelines, quality challenges, and dashboard inventory
  • Pain point identification: Where does agentic data engineering currently break down?
  • Customization brief: Identify 2–3 real-world data scenarios from your backlog to use as capstone candidates

Deliverable: Customization brief and pre-workshop preparation checklist

Session 2 — Curriculum Customization & Environment Prep

2 hours · 1 week before workshop

  • Skill alignment: Identify which modules need more depth and which need a faster or slower pace
  • Databricks environment validation: AI Dev Kit installation, MCP server configuration, Unity Catalog access, Git integration
  • BACKPRESSURE.md draft: Co-create initial data verification contract; identify deterministic tools to wire to agent events
  • Materials preview: Review custom playbooks, Databricks skills, and pipeline templates built for your team
  • (Optional) Scenario finalization: Lock in capstone project(s)

Deliverable: Finalized curriculum, customized materials, and environment readiness confirmation


Day 1 — Databricks Agentic Dev Environment & The Agentic Data Loop

Module 1: The Databricks Agentic Landscape

90 min

What agentic coding looks like on the Data Intelligence Platform

Hands-on: Install the Databricks AI Dev Kit; configure Claude Code with Databricks MCP; run a side-by-side comparison with Copilot on the same data task

Module 2: Context Engineering for Data Workloads

120 min

Unity Catalog as the agent's shared brain

Hands-on: Build a custom Databricks skill for "Create a GDPR-compliant ingestion pipeline"; configure Unity Catalog schema injection into agent prompts; compare agent output with/without lineage context
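As a rough illustration of schema injection, the sketch below renders table metadata into a text block that can be prepended to an agent prompt. The `ColumnInfo` and `TableInfo` types, the `render_schema_context` helper, and the sample table are all hypothetical; in a real workspace the metadata would be read from Unity Catalog (for example from `information_schema.columns`) rather than hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str
    data_type: str
    comment: str = ""


@dataclass
class TableInfo:
    full_name: str  # catalog.schema.table
    columns: list[ColumnInfo] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)  # lineage: source tables


def render_schema_context(tables: list[TableInfo]) -> str:
    """Render table metadata as plain text for an agent prompt."""
    lines = ["## Unity Catalog context"]
    for table in tables:
        lines.append(f"Table {table.full_name}")
        for col in table.columns:
            suffix = f" -- {col.comment}" if col.comment else ""
            lines.append(f"  {col.name}: {col.data_type}{suffix}")
        if table.upstream:
            lines.append("  upstream: " + ", ".join(table.upstream))
    return "\n".join(lines)


# Hypothetical sample metadata, hard-coded for illustration only.
orders = TableInfo(
    full_name="main.sales.orders",
    columns=[
        ColumnInfo("order_id", "BIGINT", "primary key"),
        ColumnInfo("customer_email", "STRING", "PII"),
    ],
    upstream=["main.raw.orders_landing"],
)

prompt = render_schema_context([orders]) + "\n\nTask: build an ingestion pipeline."
```

Tagging columns such as `customer_email` in the injected context is what gives the agent a chance to treat them differently; omitting the `upstream` list gives the "without lineage context" side of the comparison.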

Module 3: The Agentic Data Loop — Pipelines, Dashboards, Apps

120 min

Agents that build data systems, not just SQL queries

Hands-on: Give an agent a natural-language spec for a 3-table ETL pipeline; have it generate DLT code with expectations; auto-generate a SQL dashboard from the output tables

Module 4: Backpressure I — Data Validation & Testing

90 min

BACKPRESSURE.md defines the data contract; deterministic tools enforce it with imperative feedback

Hands-on: Write a BACKPRESSURE.md data contract; wire deterministic tools (SQL validator, schema checker, data diff) to agent events; have an agent generate a pipeline and iterate until all tools pass
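One way to picture the backpressure loop: each deterministic tool returns either nothing or an imperative instruction, and the agent is re-invoked with those instructions until every tool passes or an attempt budget runs out. Everything named here (`Check`, `run_backpressure`, the toy `fake_agent`, and the two checks) is a hypothetical stand-in, not part of any Databricks or agent API.

```python
from typing import Callable, Optional

# A check inspects a candidate artifact and returns None on success,
# or an imperative instruction the agent can act on.
Check = Callable[[str], Optional[str]]


def no_select_star(sql: str) -> Optional[str]:
    if "select *" in sql.lower():
        return "Replace SELECT * with an explicit column list."
    return None


def has_not_null_filter(sql: str) -> Optional[str]:
    if "is not null" not in sql.lower():
        return "Add a filter that excludes rows where order_id IS NULL."
    return None


def run_backpressure(
    agent: Callable[[str, list[str]], str],
    spec: str,
    checks: list[Check],
    max_attempts: int = 5,
) -> tuple[str, int]:
    """Call the agent until all checks pass; return (artifact, attempts used)."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = agent(spec, feedback)
        feedback = [msg for check in checks if (msg := check(artifact)) is not None]
        if not feedback:
            return artifact, attempt
    raise RuntimeError(f"Checks still failing after {max_attempts} attempts: {feedback}")


def fake_agent(spec: str, feedback: list[str]) -> str:
    """Toy stand-in for a coding agent: fixes exactly what the feedback names."""
    columns = "*" if not any("SELECT *" in f for f in feedback) else "order_id, amount"
    where = " WHERE order_id IS NOT NULL" if any("IS NULL" in f for f in feedback) else ""
    return f"SELECT {columns} FROM orders{where}"


sql, attempts = run_backpressure(
    fake_agent, "clean orders", [no_select_star, has_not_null_filter]
)
```

The attempt budget matters: without it, an agent that cannot satisfy a check would loop indefinitely, so the sketch raises with the outstanding feedback instead.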

Day 2 — Verification, Validation & Production DataOps

Module 5: Backpressure II — E2E, Visual & Performance Validation

120 min

From data correctness to production readiness

Hands-on: Build an E2E test for a DLT pipeline with golden-data comparison; configure Playwright screenshot tests for a SQL dashboard; have an agent optimize a slow query and validate the plan improvement
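A golden-data comparison can be as simple as a keyed diff between pipeline output and a checked-in expected result. The sketch below assumes rows are dictionaries with a unique key column; `diff_against_golden` and the sample rows are hypothetical, and a real test would read both sides from tables rather than literals.

```python
from typing import Any

Row = dict[str, Any]


def diff_against_golden(actual: list[Row], golden: list[Row], key: str) -> dict[str, list]:
    """Compare rows by key and report missing, unexpected, and changed keys."""
    actual_by_key = {row[key]: row for row in actual}
    golden_by_key = {row[key]: row for row in golden}
    return {
        "missing": sorted(golden_by_key.keys() - actual_by_key.keys()),
        "unexpected": sorted(actual_by_key.keys() - golden_by_key.keys()),
        "changed": sorted(
            k for k in golden_by_key.keys() & actual_by_key.keys()
            if golden_by_key[k] != actual_by_key[k]
        ),
    }


golden = [
    {"customer_id": 1, "churned": False},
    {"customer_id": 2, "churned": True},
    {"customer_id": 3, "churned": False},
]
actual = [
    {"customer_id": 1, "churned": False},
    {"customer_id": 2, "churned": False},  # value differs from golden
    {"customer_id": 4, "churned": True},   # not present in golden
]

report = diff_against_golden(actual, golden, key="customer_id")
passed = not any(report.values())
```

Reporting keys by category, rather than a single pass/fail flag, gives an agent something specific to act on when the comparison fails.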

Module 6: Agentic Review, PRs & DataOps

90 min

Evidence-first review for data engineering changes

Hands-on: Configure a GitHub Agentic Workflow for Databricks PRs; set up auto-generated PR descriptions with data diffs and lineage impact; build a review confidence scorecard

Module 7: DataOps, Staging & Blue/Green for Data

120 min

Production-grade deployment of agent-built data systems

Hands-on: Build a GitHub Actions pipeline for Databricks: agent PR → data tests → staging schema deploy → blue/green diff → canary gate → human approval → production swap
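The gate sequence in this exercise behaves like a short-circuiting chain: each stage runs only if every earlier stage passed. The sketch below models that control flow in plain Python; the gate functions are hypothetical placeholders with fixed outcomes, and in practice each would be a CI job that depends on the previous one.

```python
from typing import Callable

Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order; stop at the first failure. Returns (ok, log)."""
    log: list[str] = []
    for name, gate in gates:
        if not gate():
            log.append(f"{name}: FAILED")
            return False, log
        log.append(f"{name}: passed")
    return True, log


# Hypothetical placeholder gates with fixed outcomes, in the syllabus order.
gates: list[Gate] = [
    ("data tests", lambda: True),
    ("staging schema deploy", lambda: True),
    ("blue/green diff", lambda: True),
    ("canary gate", lambda: False),  # simulate a canary regression
    ("human approval", lambda: True),
    ("production swap", lambda: True),
]

ok, log = run_gates(gates)
```

Because the simulated canary fails, the human approval and production swap stages never run, which is the property the pipeline is meant to guarantee.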

Module 8: The Production Agentic Data Workflow

90 min

End-to-end: from business requirement to deployed data system

Hands-on: Teams run a full agentic data workflow: "Build a customer churn prediction pipeline with dashboard"—from spec to production-ready PR with all gates

Optional Capstone Day (Day 3 — 6 hours)

Add-on package: Core + Capstone Day ($32,500). Teams of 3–4 take a real data engineering task (from their backlog or a curated scenario) through the complete agentic data workflow, guided by an instructor. Each team must demonstrate:

  1. Context engineering: Unity Catalog injection, AGENTS.md, Databricks skills, MCP config
  2. Agentic generation: Using ≥2 tools to build a data pipeline + dashboard
  3. Backpressure compliance: BACKPRESSURE.md contract enforced by deterministic tools (schema, quality, lineage, performance)
  4. Test coverage: Unit SQL tests + DLT integration tests + E2E snapshot + Playwright dashboard screenshots
  5. Agentic review: Automated PR with data diff, lineage impact, cost estimate
  6. Deployment plan: Staging schema → blue/green → canary with automatic rollback triggers

Instructor feedback: Real-time critique using the neurex.dev Agentic Data Workflow Review framework

Post-Workshop Resources

Platform Requirements

Databricks Workspace: Unity Catalog enabled; Serverless SQL + Jobs available
AI Dev Kit: pip install databricks-ai-dev-kit or equivalent
Agent Tools: Claude Code, Copilot, and/or Codex CLI installed locally
Git: Repository connected to Databricks Git integration
CI/CD: GitHub Actions or Azure DevOps for pipeline exercises
Data: Workshop data provided; customer data may be substituted for capstone