
Boosting MongoDB Availability with Arbiters

Learn about the role of arbiters in a replica set, the PSA architecture, and more

Ayush Gupta · 4 min read · Sep 26, 2024


A MongoDB Arbiter is a special type of mongod instance used in a replica set.

Its role is to provide a vote during elections to determine which node becomes the primary (i.e., the node responsible for handling writes) when there is a failure.

This feature is crucial in cases where adding another data-bearing node is not feasible due to resource constraints.
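
For a concrete picture, an arbiter is just a regular mongod process (with its own small dbpath) that you add to the set. Here's a minimal sketch, assuming a replica set named rs0 and a placeholder hostname; note that on MongoDB 5.0+ you may also need to set a cluster-wide default write concern before adding an arbiter:

```javascript
// Start the arbiter process on its own host (shell command):
//   mongod --replSet rs0 --port 27017 --dbpath /var/lib/mongodb-arbiter

// Then, from mongosh connected to the current primary, add it to the
// replica set ("arbiter.example.net" is a placeholder hostname):
rs.addArb("arbiter.example.net:27017")
```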

Why is it Required?

Arbiters are used in specific scenarios where you need high availability but cannot afford to maintain an additional replica node due to cost or hardware limitations.

In MongoDB, a majority of nodes must be available for a replica set to elect a primary.

An arbiter participates in elections for the primary but can never become the primary itself. It has exactly one election vote.

With a replica set of two data nodes, losing one server drops you below the voting minimum (strictly more than N/2 votes). An arbiter solves this.

With an even number of data-bearing nodes, adding an arbiter increases your fault tolerance by one without making it possible for two partitions to each elect a primary during a network split.

Thus, the arbiter maintains a quorum (a voting majority).
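
If you want to see how votes are distributed in your own set, rs.conf() exposes each member's votes and arbiterOnly flag. A quick check from mongosh:

```javascript
// List each member's voting status (run in mongosh on any member).
rs.conf().members.map(m => ({
  host: m.host,
  votes: m.votes,            // 0 or 1
  arbiterOnly: !!m.arbiterOnly
}))
```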

PSA Architecture


In a primary-secondary-arbiter (PSA) architecture, the primary node handles the writes, while the secondary node stores a replica of the data. The arbiter doesn’t store data but helps decide which node should become the primary during an election.

Because the arbiter does not hold any data, this architecture maintains only two complete copies of the data, and just one if a data-bearing member goes down. The arbiter itself requires few resources, but the deployment has less redundancy and fault tolerance than one with three data-bearing members.

However, this architecture ensures that the replica set can still elect a primary and remain available even if either the primary or the secondary fails.
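
Here is what initiating a PSA set might look like from mongosh, with placeholder hostnames:

```javascript
// Minimal PSA replica-set initiation (hostnames are placeholders).
// Run once, from mongosh connected to one of the data-bearing nodes.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },                    // data-bearing
    { _id: 1, host: "db2.example.net:27017" },                    // data-bearing
    { _id: 2, host: "arb.example.net:27017", arbiterOnly: true }  // arbiter, no data
  ]
})
```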

What Happens When a Node Lags?

When one of the data-bearing nodes (either the primary or secondary) is down or lagging, it can cause some serious issues:

  1. If the primary node fails, the secondary node is promoted to primary. However, if the secondary is lagging (i.e., it can't keep up with the incoming changes from the primary), replication falls further and further behind.
  2. Writes with w: 1 (acknowledged by one node) will still succeed in this state, but any write with a "majority" write concern (acknowledgment from a majority of data-bearing nodes) will start to fail, because there aren't enough data-bearing nodes available to meet the majority requirement; see the sketch below.
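
As a rough sketch of point 2, using a hypothetical orders collection in mongosh while the secondary is down:

```javascript
// Acknowledged by the primary alone: still succeeds.
db.orders.insertOne(
  { sku: "abc-123", qty: 1 },
  { writeConcern: { w: 1 } }
)

// Needs a majority of data-bearing nodes: with the secondary down,
// this blocks and eventually fails with a write-concern timeout.
db.orders.insertOne(
  { sku: "abc-123", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```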

Impact of a Lagged Commit Point

Now, when the system can't commit writes to the majority, the majority commit point lags behind: changes made on the primary haven't been fully replicated to the secondary node.

  1. Increased Storage Activity: MongoDB uses the storage engine (e.g., WiredTiger) to handle these changes. The system retains every change made after the last successful majority write. This history of changes takes up disk space and requires extra I/O (Input/Output) operations, which can significantly slow down your write operations.
  2. Cache Pressure: The increased I/O puts pressure on the system’s cache. When MongoDB’s cache gets overloaded with too much data to track, it can affect performance across the whole database.

To avoid losing critical data due to this lag, MongoDB allows its oplog (a log of changes in the replica set) to grow beyond its predefined size. The oplog is crucial because it’s used to replicate changes from the primary to the secondary node. However, expanding it adds additional strain to the system:

  • Oplog Growth: As the oplog grows, it consumes more disk space. This can cause performance degradation, especially on servers with limited resources.
  • System Strain: Continuously expanding and managing the growing oplog adds stress to both the disk and overall system performance, making it harder to keep operations running smoothly. The sketch below shows two quick ways to monitor lag and oplog size.
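
Both of these built-in mongosh helpers give a quick read on replication health:

```javascript
// How far each secondary is behind the primary:
rs.printSecondaryReplicationInfo()

// Configured oplog size and the time window it currently covers:
rs.printReplicationInfo()
```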

Mitigating this issue is beyond the scope of this article, so check it out here.

Do not use Multiple Arbiters

When setting up a MongoDB replica set, it’s crucial to use only one arbiter to avoid potential issues with data consistency.

Let’s consider a replica-set setup that uses multiple arbiters and a write concern of “majority”.

If a secondary node falls behind the primary and the cluster is reconfigured, the votes of multiple arbiters can elect the lagging node as primary. The new primary will not have the unreplicated writes, even though those writes may have been majority-committed under the old configuration. The result is data loss.

Also, consider a primary node failure: writes with a “majority” write concern still require acknowledgment from a majority calculated over the voting members. Although arbiters do not store data (and so cannot acknowledge writes), they still count toward the number of voting nodes.

When a replica set has multiple arbiters, it is less likely that a majority of data-bearing nodes will be available after a node failure.

To avoid this scenario, use at most a single arbiter.
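
If a set has somehow ended up with more than one arbiter, removing the extras is a one-liner from the primary (hostname is a placeholder):

```javascript
// Remove a surplus arbiter from the replica set (run on the primary).
rs.remove("arbiter2.example.net:27017")
```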

Important: Do not run an arbiter on systems that also host the primary or secondary members of the replica set.

If you enjoyed learning about MongoDB arbiters, don’t forget to show your appreciation with some claps! 👏

Interested in reading more fun stuff about MongoDB? Check out this article >>
