04 - Databricks Tooling
Databricks Jobs
Creating and Managing Jobs
- Creating Jobs: Understand how to create jobs in the Databricks UI, including configuring tasks, dependencies, and scheduling (a Jobs API sketch follows this list).
- Job Clusters: Know the difference between all-purpose clusters and job clusters, and when to use each.
- Task Types: Be familiar with the different task types, such as notebook, JAR, Python script, and SQL tasks.
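To make this concrete, here is a minimal sketch of creating a one-task notebook job through the Jobs API 2.1 (jobs/create). The job name, notebook path, and cluster settings are illustrative placeholders, and the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are assumed to hold the workspace host and a personal access token:

```python
# Minimal sketch: create a one-task notebook job via the Jobs API 2.1.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # workspace host, without the scheme
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

payload = {
    "name": "nightly-etl",  # hypothetical job name
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
            # A job cluster is created for the run and terminated afterwards
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",  # cloud-specific placeholder
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"https://{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # -> {"job_id": ...}
```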
Job Clusters
| Feature | All-Purpose Clusters | Job Clusters |
| --- | --- | --- |
| Primary Use Case | Interactive analysis, development, and collaboration | Scheduled jobs, ETL, batch processing, production tasks |
| Lifecycle | Long-running | Ephemeral; auto-terminates after job completion |
| User Interaction | Supports multiple concurrent users | Typically used by a single job/task at a time |
| Cost Efficiency | May incur higher costs if left running unnecessarily | More cost-efficient due to on-demand creation and termination |
| Resource Isolation | Shared environment | Isolated per job/task |
Configuring Job Parameters
- Passing Parameters: Learn how to pass parameters to tasks using widgets, the REST API, and the Databricks CLI.
- Base Parameters: Understand how to define base parameters when setting up a job and how to override them at run time (see the run-now sketch below).
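For instance, a run of an existing job can be triggered with jobs/run-now, where notebook_params override the task's base parameters for that run. The job ID and parameter name below are placeholders:

```python
# Minimal sketch: trigger a job run and override its base parameters.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"https://{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 123,  # hypothetical job ID
        # Overrides the task's base_parameters for this run only
        "notebook_params": {"input_param": "overridden_value"},
    },
)
resp.raise_for_status()
print(resp.json())  # -> {"run_id": ...}
```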
Task Dependencies
- Task Dependencies: Know how to configure task dependencies within a job to ensure tasks run in the correct order.
- Conditional Execution: Understand how to set up tasks that run only if previous tasks succeed or fail (see the sketch after this list).
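As a sketch, the tasks fragment below (Jobs API 2.1) uses depends_on for ordering and run_if for conditional execution; the task keys and notebook paths are placeholders:

```python
# Minimal sketch: a two-task fragment of a Jobs API 2.1 payload.
tasks = [
    {
        "task_key": "transform",
        # Runs only after the "ingest" task completes successfully
        "depends_on": [{"task_key": "ingest"}],
        "notebook_task": {"notebook_path": "/Repos/etl/transform"},
    },
    {
        "task_key": "alert_on_failure",
        "depends_on": [{"task_key": "transform"}],
        # Conditional execution: run only if an upstream task failed
        "run_if": "AT_LEAST_ONE_FAILED",
        "notebook_task": {"notebook_path": "/Repos/etl/alert"},
    },
]
```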
Scheduling and Automation
- Job Scheduling: Know how to schedule jobs using the Databricks scheduler and how to write cron expressions for custom schedules (example below).
- Job Orchestration: Understand how to use Databricks Workflows to orchestrate complex job dependencies and multi-task jobs.
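Schedules use Quartz cron syntax (seconds, minutes, hours, day-of-month, month, day-of-week). A sketch of a job's schedule block that fires daily at 07:30 UTC:

```python
# Minimal sketch: a Jobs API "schedule" block using a Quartz cron expression.
schedule = {
    "quartz_cron_expression": "0 30 7 * * ?",  # every day at 07:30
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
```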
Advanced Job Features
- Task Retry Policies: Configure retry policies for tasks to handle transient failures (see the sketch after this list).
- Task Execution Context: Understand the execution context for each task type and how to pass state between tasks (e.g., with task values).
- Secrets Management: Be familiar with managing secrets in Databricks and how to securely pass sensitive information to jobs.
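As a hedged sketch, the task fragment below shows the task-level retry fields of the Jobs API 2.1; the task key and notebook path are placeholders. It retries up to three times, waiting two minutes between attempts:

```python
# Minimal sketch: task-level retry settings in a Jobs API 2.1 payload.
task = {
    "task_key": "ingest",
    "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
    "max_retries": 3,                       # retry up to 3 times on failure
    "min_retry_interval_millis": 120_000,   # wait 2 minutes between attempts
    "retry_on_timeout": False,              # do not retry if the task timed out
}
```

Within a notebook task, parameters passed by the job surface as widgets; the snippet below defines one with a default value and reads it back: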
```python
# Define a text widget with a default value and a display label
dbutils.widgets.text("input_param", "default_value", "Parameter Label")

# Read the widget's current value; when run as a job task,
# base parameters passed to the task override the default
input_param = dbutils.widgets.get("input_param")
print(f"The input parameter is: {input_param}")
```
Using the Databricks REST API
https://docs.databricks.com/api/workspace/introduction
For example, fetching a cluster's metadata with curl, assuming DATABRICKS_HOST and DATABRICKS_TOKEN hold the workspace host and a personal access token:
```bash
curl --request GET "https://${DATABRICKS_HOST}/api/2.0/clusters/get" \
  --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --data '{ "cluster_id": "1234-567890-a12bcde3" }'
```
Databricks Secrets
- A secrets scope is a logical grouping for secrets that can be managed independently. It allows you to control access to secrets within that scope.
- A secret is a key-value pair where the value is encrypted and securely stored.
The dbutils.secrets utility can be used to access secrets from a notebook.
```bash
# Create a scope and store a secret with the (legacy) Databricks CLI;
# "put" prompts for the secret value or opens an editor
databricks secrets create-scope --scope my-scope
databricks secrets put --scope my-scope --key my-secret-key
```

```python
# Access the secret from a notebook
secret_value = dbutils.secrets.get(scope="my-scope", key="my-secret-key")
print(f"My secret value is: {secret_value}")  # notebook output redacts secrets as [REDACTED]
```