02 - Transaction Log

Transaction Log

The transaction log is a fundamental component of Delta Lake: a series of JSON files that records every change made to a Delta Lake table. Because the log keeps a chronological history of all transactions, it is the foundation for Delta Lake's ACID (Atomicity, Consistency, Isolation, Durability) guarantees and for features such as time travel and data versioning.

  • By default, Databricks automatically creates a Parquet checkpoint file every 10 commits to speed up resolving the current table state.
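To make the idea of a chronological history concrete, here is a minimal sketch of how replaying the log up to a given version yields the table state at that version, which is the basis for time travel. The commits below are hypothetical and heavily simplified (real actions carry more fields), and `active_files` is an illustrative helper, not a Delta Lake API.

```python
import json

# Hypothetical, simplified commits: version -> newline-delimited JSON actions.
COMMITS = {
    0: '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}',
    1: '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
    2: '{"add": {"path": "part-0003.parquet"}}',
}


def active_files(commits, as_of_version):
    """Replay commits 0..as_of_version and return the set of live data files."""
    live = set()
    for version in sorted(commits):
        if version > as_of_version:
            break
        for line in commits[version].splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


print(sorted(active_files(COMMITS, 0)))  # table state as of version 0
print(sorted(active_files(COMMITS, 2)))  # latest table state
```

Reading an older version only means stopping the replay earlier; no data files are rewritten.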

Structure of Transaction Logs

Delta Log Directory: The _delta_log directory, located within the table's storage location, contains all transaction log files.

JSON Files: Each transaction is recorded as a JSON file named with its zero-padded, 20-digit version number (e.g., 00000000000000000010.json for version 10). These files contain metadata about the transaction, such as the operation type, the data files added or removed, and schema changes.
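The naming scheme and file layout can be sketched as follows. Each commit file holds one JSON action per line; the sample commit here is hypothetical and trimmed to a few fields, and both helper functions are illustrative names rather than part of any library.

```python
import json


def commit_file_name(version: int) -> str:
    """Return the log file name for a table version (20 digits, zero-padded)."""
    return f"{version:020d}.json"


# Hypothetical commit content: one JSON action per line.
SAMPLE_COMMIT = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-0000.parquet", "size": 1024, "dataChange": True}}),
])


def action_types(commit_text: str) -> list:
    """List the action type (the single top-level key) of each line in a commit."""
    return [next(iter(json.loads(line))) for line in commit_text.splitlines() if line.strip()]


print(commit_file_name(10))
print(action_types(SAMPLE_COMMIT))
```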

Checkpoint Files: To improve performance, Delta Lake periodically creates Parquet checkpoint files that summarize the state of the table at a particular version. These files allow Delta Lake to quickly reconstruct the table state without reading all JSON files.
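As a rough sketch of why checkpoints help, the function below picks which log files a reader would need: the newest checkpoint plus only the JSON commits after it. It assumes single-file checkpoints named like 00000000000000000010.checkpoint.parquet and ignores multi-part checkpoints and the _last_checkpoint pointer file; `files_to_read` is a hypothetical helper, not a Delta Lake API.

```python
import re

# Matches commit files and single-file checkpoints; anything else is ignored.
LOG_FILE = re.compile(r"^(\d{20})\.(json|checkpoint\.parquet)$")


def files_to_read(listing):
    """Return the newest checkpoint (if any) followed by the later JSON commits."""
    commits, checkpoints = {}, {}
    for name in listing:
        match = LOG_FILE.match(name)
        if not match:
            continue
        version = int(match.group(1))
        if match.group(2) == "json":
            commits[version] = name
        else:
            checkpoints[version] = name
    start = max(checkpoints, default=-1)  # -1 means no checkpoint exists yet
    plan = [checkpoints[start]] if checkpoints else []
    plan += [commits[v] for v in sorted(commits) if v > start]
    return plan


# Example listing: commits 0..12 and a checkpoint at version 10.
listing = [f"{v:020d}.json" for v in range(13)]
listing += ["00000000000000000010.checkpoint.parquet", "_last_checkpoint"]
print(files_to_read(listing))
```

With the checkpoint at version 10, the reader opens three files instead of thirteen.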

  • Delta Lake captures statistics in the transaction log for each added data file, such as the record count and per-column minimum values, maximum values, and null counts.

[Figure: delta_lake_statistics]
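The per-file statistics can be read straight from an add action, where they are stored as a JSON-encoded string in the stats field. The sketch below uses a hypothetical add action and shows one way such statistics can be used: deciding from the min/max range whether a file could contain a value at all. `may_contain` is an illustrative helper, not a Delta Lake function.

```python
import json

# Hypothetical add action; "stats" is a JSON string nested inside the action.
ADD_ACTION = {
    "add": {
        "path": "part-0000.parquet",
        "stats": json.dumps({
            "numRecords": 3,
            "minValues": {"id": 1},
            "maxValues": {"id": 30},
            "nullCount": {"id": 0},
        }),
    }
}


def may_contain(add_action, column, value):
    """Return False only when the file's min/max range rules the value out."""
    stats_raw = add_action["add"].get("stats")
    if stats_raw is None:
        return True  # no statistics recorded: the file must be read
    stats = json.loads(stats_raw)
    low = stats.get("minValues", {}).get(column)
    high = stats.get("maxValues", {}).get(column)
    if low is None or high is None:
        return True  # no range for this column: cannot rule the file out
    return low <= value <= high


print(may_contain(ADD_ACTION, "id", 10))
print(may_contain(ADD_ACTION, "id", 100))
```

The check is deliberately conservative: whenever statistics are missing, the file is kept rather than skipped.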