You're reading for free via Ayush Gupta's Friend Link. Become a member to access the best of Medium.

Member-only story

Write-Ahead Log (WAL): What and How it works?

How does WAL provide resiliency to Kafka, Cassandra & Zookeeper?

Ayush Gupta
4 min readSep 25, 2024
Photo by Joshua Sortino on Unsplash

Write-ahead logging is a key principle in distributed systems. It is extensively used in databases, consensus algorithms, and event-streaming platforms.

Its role is to guarantee data durability and recovery during system crashes or failures.

It ensures that state changes in data are safely preserved by logging them as commands in an append-only format, rather than requiring the entire data structure to be saved to disc.

Not a Medium Member, read this article for free using friend link
https://ayushgupta2959.medium.com/write-ahead-log-wal-the-backbone-of-data-durability-426264ed5eb8?source=friends_link&sk=98c4a8008498b7e23aa82c6543dcb737

This article will explore what problems Write-Ahead Logging addresses, how it works, and real-world systems that leverage it, including ZooKeeper, etcd(Raft), Kafka, and Cassandra.

What Problem Does WAL Solve?

In distributed systems, multiple nodes or servers must maintain a consistent state. However, they often encounter challenges like system crashes, power outages, or network failures.

In such situations, there is a risk of losing or corrupting data if changes are applied partially or out of order.

WAL addresses this problem by logging every change before it is committed to the database.

This ensures that even in the event of a crash, the system can replay the log and apply the changes in the correct order, preserving the consistency of the system state.

Key features of WAL:

  1. Durability: Changes are first written to a persistent log, ensuring that even in the event of a crash, the changes can be recovered.
  2. Consistency: The system guarantees that changes are applied in the same order they were logged.
  3. Recovery: After a failure, the WAL is replayed to restore the system to its pre-crash state.

How WAL Works

In most systems, WAL works by following a two-step process for any data modification:

  1. Log the Change: Before a change is applied, it is written to the log. This ensures that the modification is captured before the system’s state is altered.
  2. Commit the Change: Once the change is safely logged, it can be applied to the main system (database, key-value store, or file system).

If the system crashes before committing the change, the WAL can be replayed to apply the logged changes. If a crash happens after the commit but before the system’s acknowledgment, the WAL ensures no partial writes are lost.

How ZooKeeper utilizes WAL

The Zab (ZooKeeper Atomic Broadcast) consensus algorithm in Apache ZooKeeper utilizes Write-Ahead Logging (WAL) to ensure reliable and ordered message delivery among nodes, which is essential for maintaining consistency in a distributed environment.

When a client initiates a write operation, the leader node first logs this change in the WAL before applying it to its local state.

This logging ensures that if the leader fails before the operation can be fully propagated, the system can recover by replaying the WAL, thus preserving data integrity.

Following the logging, the leader broadcasts the change to follower nodes, which must acknowledge receipt of the update.

Once a quorum of followers confirms receipt, the leader commits the change, ensuring that all nodes have a consistent view of the data.

This combination of WAL and the Zab protocol allows ZooKeeper to effectively manage leader elections and state synchronization, providing fault tolerance and reliability in distributed applications.

Check out how this log is implemented in Zookeeper here.

Kafka as a Write-Ahead Log

Kafka, a distributed event-streaming platform, uses a log-based storage system that operates similarly to WAL in databases.

Kafka’s commit log ensures that messages are written to a persistent log file before they are consumed or processed. This guarantees that messages are not lost even in the case of broker failures. Its implementation can be found here.

Kafka’s log structure also allows consumers to read messages at different offsets, ensuring high availability and durability.

Messages remain in Kafka’s log for a specified retention period, allowing recovery and reprocessing of data if needed.

Cassandra’s Commit Log

In the NoSQL database Cassandra, WAL is referred to as the Commit Log. Every write operation in Cassandra is first written to this log before being committed to the in-memory data structure (memtable).

The commit log ensures that, even if a system failure occurs immediately after a write, the data can still be recovered, preventing data loss. When the system restarts, the commit log is replayed, and any pending writes are applied to the memtable.

Cassandra’s commit log implementation.

Key Considerations for WAL

Durability vs. Performance Trade-off

Flushing each log entry immediately to disk ensures strong durability but can slow down performance. Delayed or asynchronous flushing improves performance but risks data loss if the system crashes before flushing.

One can use batching techniques to limit the negative impact of frequent flushes and improve performance. Cassandra has an option to use batching.

Corruption Detection

To avoid issues with corrupted log files, log entries are often written with CRC checks, allowing verification when logs are read.

Handling Duplicate Log Entries

In cases of client communication failures and retries, WAL may have duplicate entries. Systems must either use idempotent operations (e.g., HashMap updates) or track requests with unique identifiers to avoid applying duplicates.

If you enjoyed reading this article, don’t hesitate to show some love with those 10 claps! 👏

I’d also love to hear from you — what are the other systems you are aware of which utilize WAL? Drop a comment below, and feel free to share any suggestions on what topics you’d like to see next.

Ayush Gupta
Ayush Gupta

Written by Ayush Gupta

Generalist || Sharing what I know || Software Engineering || AI || Game Theory || Business

Responses (3)

Write a response

nice please read my stories and visit my profile

Nice one..plz hv a look to my stories

Visit my profile also