04 - Databricks Tooling

Databricks Jobs

Creating and Managing Jobs

  • Creating Jobs: Understand how to create jobs in the Databricks UI, including configuring tasks, dependencies, and scheduling; a programmatic equivalent via the Jobs API is sketched after this list.
  • Job Clusters: Know the difference between all-purpose clusters and job clusters, and when to use each.
  • Task Types: Be familiar with different task types such as notebooks, JAR, Python scripts, and SQL tasks.
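
A minimal sketch of creating a job programmatically, assuming the Jobs API 2.1 and the requests library. The job name, notebook path, and cluster sizing below are illustrative placeholders, not values from these notes:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# Hypothetical job: one notebook task running on an ephemeral job cluster
job_spec = {
    "name": "nightly-etl",  # placeholder job name
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Jobs/ingest"},  # placeholder path
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"https://{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": 123456}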

Job Clusters

Feature            | All-Purpose Clusters                                  | Job Clusters
-------------------|-------------------------------------------------------|---------------------------------------------------------------
Primary Use Case   | Interactive analysis, development, and collaboration  | Scheduled jobs, ETL, batch processing, production tasks
Lifecycle          | Long-running                                          | Ephemeral; auto-terminates after job completion
User Interaction   | Supports multiple concurrent users                    | Typically used by a single job/task at a time
Cost Efficiency    | May incur higher costs if left running unnecessarily  | More cost-efficient due to on-demand creation and termination
Resource Isolation | Shared environment                                    | Isolated per job/task
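
In a job definition, this distinction shows up as two ways to attach compute to a task: new_cluster creates an ephemeral job cluster for the run, while existing_cluster_id points at a long-running all-purpose cluster. A sketch with placeholder task keys, paths, and IDs:

# Task on an ephemeral job cluster: created for the run, auto-terminates after
job_cluster_task = {
    "task_key": "batch_etl",
    "notebook_task": {"notebook_path": "/Jobs/etl"},
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 4,
    },
}

# Task on a long-running, shared all-purpose cluster
all_purpose_task = {
    "task_key": "ad_hoc_check",
    "notebook_task": {"notebook_path": "/Jobs/check"},
    "existing_cluster_id": "1234-567890-a12bcde3",  # placeholder cluster ID
}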

Configuring Job Parameters

  • Passing Parameters: Learn how to pass parameters to tasks using widgets, the REST API, and the Databricks CLI.
  • Base Parameters: Understand how to define base parameters when setting up a job and how to override them at run time (see the run-now sketch after this list).
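
A hedged sketch of overriding a task's base parameters at run time through the Jobs API run-now endpoint; the job_id and parameter name are placeholders. The notebook receives the value through dbutils.widgets.get, as shown in the Creating Widgets section below:

import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"https://{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # placeholder job ID
        # notebook_params override the task's base_parameters for this run
        "notebook_params": {"input_param": "2024-01-01"},
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"run_id": 456}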

Task Dependencies

  • Task Dependencies: Know how to configure task dependencies within a job to ensure tasks run in the correct order.
  • Conditional Execution: Understand how to set up tasks that only run if previous tasks succeed or fail (see the run_if sketch after this list).
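
A sketch of how dependencies and conditional execution look inside a multi-task job definition; task keys and notebook paths are placeholders. The run_if field accepts values such as ALL_SUCCESS, ALL_DONE, and AT_LEAST_ONE_FAILED:

tasks = [
    {"task_key": "extract", "notebook_task": {"notebook_path": "/Jobs/extract"}},
    {
        "task_key": "transform",
        "depends_on": [{"task_key": "extract"}],  # runs only after extract finishes
        "run_if": "ALL_SUCCESS",                  # default: all dependencies succeeded
        "notebook_task": {"notebook_path": "/Jobs/transform"},
    },
    {
        "task_key": "alert_on_failure",
        "depends_on": [{"task_key": "extract"}, {"task_key": "transform"}],
        "run_if": "AT_LEAST_ONE_FAILED",          # fires only when an upstream task fails
        "notebook_task": {"notebook_path": "/Jobs/alert"},
    },
]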

Scheduling and Automation

  • Job Scheduling: Know how to schedule jobs using the Databricks scheduler and how to create cron expressions for custom schedules (see the sketch after this list).
  • Job Orchestration: Understand how to use Databricks workflows to orchestrate complex job dependencies and multi-task jobs.
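
Note that Databricks schedules use Quartz cron syntax, which has a leading seconds field, rather than standard five-field Unix cron. A sketch of the schedule block in a job definition, with an illustrative time:

schedule = {
    "quartz_cron_expression": "0 30 2 * * ?",  # every day at 02:30 (sec min hour dom mon dow)
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}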

Advanced Job Features

  • Task Retry Policies: Configure retry policies for tasks to handle transient failures (retry and task-value sketches follow this list).
  • Task Execution Context: Understand the execution context for different task types and how to manage state between tasks.
  • Secrets Management: Be familiar with managing secrets in Databricks and how to securely pass sensitive information to jobs.
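
A sketch of the per-task retry fields from the Jobs API, plus task values, the dbutils mechanism for passing small pieces of state between tasks in a run. The task key, retry settings, and value below are illustrative:

retrying_task = {
    "task_key": "flaky_ingest",
    "notebook_task": {"notebook_path": "/Jobs/ingest"},
    "max_retries": 3,                    # retry up to 3 times on failure
    "min_retry_interval_millis": 60000,  # wait at least 1 minute between attempts
    "retry_on_timeout": True,
}

# Inside the upstream notebook: publish a value for downstream tasks
dbutils.jobs.taskValues.set(key="row_count", value=42)

# Inside a downstream notebook: read it back by producer task key
rows = dbutils.jobs.taskValues.get(taskKey="flaky_ingest", key="row_count", default=0)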

Creating Widgets

# Create a text widget with a default value and a display label
dbutils.widgets.text("input_param", "default_value", "Parameter Label")

# Read the widget's current value
input_param = dbutils.widgets.get("input_param")
print(f"The input parameter is: {input_param}")

Using the Databricks API

https://docs.databricks.com/api/workspace/introduction

curl --request GET "https://${DATABRICKS_HOST}/api/2.0/clusters/get" \
     --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
     --data '{ "cluster_id": "1234-567890-a12bcde3" }'
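
The same call sketched with Python's requests library, passing the cluster ID as a query parameter rather than a request body; the cluster ID is the placeholder from the curl example:

import os
import requests

resp = requests.get(
    f"https://{os.environ['DATABRICKS_HOST']}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    params={"cluster_id": "1234-567890-a12bcde3"},
)
resp.raise_for_status()
print(resp.json())  # cluster metadata as JSON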

Databricks Secrets

  • A secrets scope is a logical grouping for secrets that can be managed independently. It allows you to control access to secrets within that scope.
  • A secret is a key-value pair where the value is encrypted and securely stored.
  • The dbutils.secrets utility can be used to access secrets from a notebook, as shown below.
# Using the Databricks CLI (legacy syntax shown; newer CLI versions take the
# scope and key as positional arguments, e.g. "databricks secrets put-secret my-scope my-secret-key")
databricks secrets create-scope --scope my-scope
databricks secrets put --scope my-scope --key my-secret-key

# Access secrets from a notebook
secret_value = dbutils.secrets.get(scope="my-scope", key="my-secret-key")
print(f"My secret value is: {secret_value}")  # notebook output redacts the value as [REDACTED]