<?xml version="1.0" encoding="utf-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="/feed.xsl"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Distributed-Systems · paper&amp;ink</title><link>https://bananameatpatty.vercel.app/tags/distributed-systems/</link><description>Recent content on paper&amp;ink</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 01 Jul 2026 11:38:07 +0000</lastBuildDate><atom:link href="https://bananameatpatty.vercel.app/tags/distributed-systems/index.xml" rel="self" type="application/rss+xml"/><item><title>The Chandra–Toueg Algorithm</title><link>https://bananameatpatty.vercel.app/blog/chandra-toueg/</link><pubDate>Wed, 17 Jun 2026 00:00:00 +0000</pubDate><guid>https://bananameatpatty.vercel.app/blog/chandra-toueg/</guid><category>distributed-systems</category><category>consensus</category><category>failure-detectors</category><category>fault-tolerance</category><description>&lt;p&gt;Imagine five friends trying to pick &lt;strong&gt;one&lt;/strong&gt; restaurant over a group chat. People
drop offline mid-conversation. Messages arrive late. Nobody can tell whether
Nandana &lt;em&gt;left&lt;/em&gt; or is just &lt;em&gt;slow to type&lt;/em&gt;. And yet — somehow — the group has to end
up at &lt;strong&gt;exactly one&lt;/strong&gt; restaurant, with nobody walking into a different door.&lt;/p&gt;
&lt;p&gt;That, stripped of the food, is the &lt;strong&gt;consensus problem&lt;/strong&gt;. The Chandra–Toueg
algorithm (1996) is a famous, elegant way to actually solve it when computers can
crash. It shares its core ideas — a leader, majority quorums, &amp;ldquo;freshest value
wins&amp;rdquo; — with &lt;strong&gt;Paxos&lt;/strong&gt; and &lt;strong&gt;Raft&lt;/strong&gt;, the protocols running inside basically every
serious distributed database today. (Paxos was developed independently around the
same time, not derived from it — they&amp;rsquo;re siblings, not parent and child.)&lt;/p&gt;</description><content:encoded><![CDATA[<p>Imagine five friends trying to pick <strong>one</strong> restaurant over a group chat. People
drop offline mid-conversation. Messages arrive late. Nobody can tell whether
Nandana <em>left</em> or is just <em>slow to type</em>. And yet — somehow — the group has to end
up at <strong>exactly one</strong> restaurant, with nobody walking into a different door.</p>
<p>That, stripped of the food, is the <strong>consensus problem</strong>. The Chandra–Toueg
algorithm (1996) is a famous, elegant way to actually solve it when computers can
crash. It shares its core ideas — a leader, majority quorums, &ldquo;freshest value
wins&rdquo; — with <strong>Paxos</strong> and <strong>Raft</strong>, the protocols running inside basically every
serious distributed database today. (Paxos was developed independently around the
same time, not derived from it — they&rsquo;re siblings, not parent and child.)</p>
<div class="note"><b class="lbl">In one sentence</b><p>Chandra–Toueg reaches agreement among crash-prone computers by bolting on a "<b>maybe-they're-dead</b>" detector and a <b>rotating chairperson</b>, while a <b>majority-vote</b> rule guarantees that no two computers ever decide on different values.</p></div>
<hr>
<h2 id="1-why-agreement-is-secretly-hard">1. Why agreement is secretly hard</h2>
<p>There&rsquo;s a famous result — the <strong>FLP impossibility</strong> (Fischer, Lynch, Paterson) —
that says something brutal:</p>
<blockquote>
<p>In a purely asynchronous system where even <strong>one</strong> computer can crash, there is
<strong>no</strong> deterministic algorithm that is guaranteed to reach agreement <em>and</em> always finish.</p>
</blockquote>
<p>ELI5 version: in the group chat, if Nandana goes silent, you genuinely <strong>cannot
tell</strong> whether she crashed or is just lagging. If you wait for her, you might wait
forever (she crashed). If you give up on her, she might&rsquo;ve been about to send the
deciding vote (she was just slow). Either choice can go wrong. There&rsquo;s no perfect
move. That&rsquo;s FLP.</p>
<p>So how does anything work in real life? We <strong>cheat — legally.</strong></p>
<hr>
<h2 id="2-the-cheat-a-failure-detector-s">2. The cheat: a failure detector (◊S)</h2>
<p>Instead of trying to answer &ldquo;dead or slow?&rdquo; <em>inside</em> the agreement logic, we hand
that messy job to a separate gadget called a <strong>failure detector</strong>. Think of it as
a slightly unreliable <strong>rumor mill</strong>: each computer asks it &ldquo;who do you <em>suspect</em>
is down?&rdquo; and gets back a list of names.</p>
<p>The detector is allowed to be <strong>wrong</strong>, but it must still honor two carefully limited guarantees.
The version Chandra–Toueg needs is called <strong>◊S</strong> (&ldquo;eventually strong&rdquo;):</p>
<ul>
<li><strong>Strong completeness</strong> — <em>every truly-crashed computer is eventually suspected
by everyone, forever.</em> → A dead node can&rsquo;t hide; we&rsquo;ll never wait on a corpse
indefinitely.</li>
<li><strong>Eventual weak accuracy</strong> — <em>eventually, at least one still-alive computer
stops being wrongly suspected by anyone.</em> → Eventually there&rsquo;s someone everyone
trusts, so progress becomes possible.</li>
</ul>
<div class="note"><b class="lbl">What the ◊ ("diamond") means</b><p>"<b>Eventually, and forever after.</b>" Before some unknown moment T, the rumor mill can be a disaster — accusing healthy nodes, missing dead ones, flip-flopping. After T, the two guarantees kick in. The whole trick is: the algorithm stays <b>correct</b> during the chaos, and only needs the calm period to <b>finish</b>.</p></div>
<p>◊S is <em>equivalent</em> to <strong>◊W</strong> and to <strong>Ω</strong> (the &ldquo;eventually one stable leader&rdquo;
oracle) when channels are reliable, and ◊W/Ω is the <strong>provably weakest</strong> failure
detector that can solve consensus, <em>provided a majority of nodes stay correct</em>. In
practice you approximate ◊S with <strong>timeouts</strong>: if you haven&rsquo;t heard a heartbeat in a
while, you &ldquo;suspect.&rdquo; That only delivers the ◊S guarantees if message delays eventually
settle under some bound, which is the partial-synchrony assumption. (Raft&rsquo;s election
timeout is the same idea.)</p>
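<p>A minimal sketch of the timeout idea, assuming each node records when it last heard a heartbeat from each peer. The class name <code>HeartbeatDetector</code>, its methods, and the "double the timeout after a false suspicion" policy are illustrative assumptions, not part of the original algorithm. Growing the timeout is one common way to stop wrongly suspecting a slow peer once message delays settle down.</p>

```python
class HeartbeatDetector:
    """Timeout-based suspicion list (hypothetical helper, not from the paper)."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; a suspected peer that speaks was wrongly suspected."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)
            # We were too impatient with this peer: wait longer next time.
            self.timeout[peer] *= 2

    def suspects(self, now):
        """Return the peers that have been silent for longer than their timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


fd = HeartbeatDetector(["a", "b"])
fd.on_heartbeat("a", now=0.5)
fd.on_heartbeat("b", now=0.5)
early = fd.suspects(now=1.0)      # nobody has been silent for more than 1.0 yet
late = fd.suspects(now=2.0)       # both have been silent for 1.5, so both are suspected
fd.on_heartbeat("a", now=2.1)     # "a" was only slow; its timeout doubles to 2.0
after = fd.suspects(now=2.2)      # only "b" is still suspected
```

<p>A peer that really crashed never sends another heartbeat, so it stays on the list for good (completeness). A peer that was merely slow gets a longer leash each time it is wrongly accused, which is where the "eventually accurate" behavior would come from under bounded delays.</p>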
<hr>
<h2 id="3-test-it--why-a-majority-always-overlaps">3. Test it — why a majority always overlaps</h2>
<p>Before the algorithm, one idea you have to <em>feel</em> in your bones: any two groups
that each contain <strong>more than half</strong> the nodes <strong>must share at least one member</strong>.
That shared member is how a decision survives from one round to the next.</p>
<p>Drag the slider. Quorum A is the left group, Quorum B is the right group — both
are majorities. Watch the dark <strong>overlap</strong> node that they&rsquo;re forced to share.</p>
<p>▶ <em>Interactive demo</em> — feed readers can't run it. <a href="https://bananameatpatty.vercel.app/blog/chandra-toueg/">Open it on the site →</a></p>
<div class="note"><b class="lbl">Why this matters</b><p>If a value was locked in by a majority in some round, then <b>any</b> later majority is guaranteed to include someone who remembers it. That single overlapping memory is what makes two conflicting decisions impossible.</p></div>
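<p>The overlap claim can also be checked by brute force. The counting argument is that two groups that each hold more than half of the N nodes together hold more than N memberships, so at least one node must be counted twice. The sketch below enumerates every pair of majorities for small clusters; the function names are made up for this illustration.</p>

```python
from itertools import combinations


def majorities(n):
    """Yield every subset of nodes 0..n-1 that holds more than half of them."""
    for size in range(n // 2 + 1, n + 1):
        for group in combinations(range(n), size):
            yield set(group)


def smallest_overlap(n):
    """Size of the smallest intersection between any two majorities of n nodes."""
    return min(
        len(a.intersection(b))
        for a in majorities(n)
        for b in majorities(n)
    )


# Smallest shared membership for clusters of 1 to 7 nodes.
overlaps = {n: smallest_overlap(n) for n in range(1, 8)}
```

<p>Every entry in <code>overlaps</code> is at least 1: odd-sized clusters are forced to share one node, even-sized clusters two.</p>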
<hr>
<h2 id="4-the-plan-a-rotating-chairperson">4. The plan: a rotating chairperson</h2>
<p>The algorithm runs in numbered <strong>rounds</strong>. Each round has one pre-assigned
<strong>coordinator</strong> (the chairperson), chosen round-robin:</p>
<pre tabindex="0"><code>coordinator(round) = round mod N      // 0,1,2,...,N-1,0,1,2,...
</code></pre><p>The chairperson&rsquo;s job each round: collect everyone&rsquo;s current vote, pick one value,
and try to get a <strong>majority</strong> to lock it in. If that works, everyone agrees and
we&rsquo;re done. If the chairperson crashes — or the rumor mill <em>suspects</em> it — the
others give up on this round and rotate to the <strong>next</strong> chairperson.</p>
<p>Because the detector is <em>eventually accurate</em>, the rotation will <strong>eventually</strong>
land on a chairperson that nobody wrongly suspects. That round finishes. Done.</p>
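<p>The rotation itself is one line of arithmetic. As a small sketch (nodes numbered from 0, with the helper <code>rounds_until_turn</code> invented for this example), here is the schedule for a five-node cluster and how long any given node waits for its next turn:</p>

```python
def coordinator(round_number, n):
    """Round-robin chairperson for a round, with nodes numbered 0..n-1."""
    return round_number % n


def rounds_until_turn(current_round, node, n):
    """How many rounds from current_round until `node` next chairs (0 = now)."""
    return (node - coordinator(current_round, n)) % n


schedule = [coordinator(r, 5) for r in range(8)]          # [0, 1, 2, 3, 4, 0, 1, 2]
wait = rounds_until_turn(current_round=7, node=1, n=5)    # 4
```

<p>No node ever waits more than N-1 rounds for its turn, so once some node is trusted by everyone, the rotation reaches it within one lap.</p>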
<h3 id="the-four-phases-of-a-round-eli5">The four phases of a round (ELI5)</h3>
<ol>
<li><strong>Everyone → chairperson:</strong> &ldquo;here&rsquo;s my current vote, and the round in which I
last adopted it&rdquo; (the <em>timestamp</em>, 0 if the vote is still the node&rsquo;s own starting value).</li>
<li><strong>Chairperson decides what to propose:</strong> it waits to hear from a majority, then
picks the vote with the <strong>most recent timestamp</strong> and announces &ldquo;let&rsquo;s all go
with <em>this</em>.&rdquo;</li>
<li><strong>Everyone replies:</strong> each node either <em>adopts</em> the proposal, stamps it with the
current round number, and says <strong>ACK</strong>, <strong>or</strong>, if the rumor mill says the chairperson
looks dead, says <strong>NACK</strong> and bails out of the round.</li>
<li><strong>Chairperson tallies:</strong> if a <strong>majority</strong> said ACK, the value is locked → it
shouts <strong>DECIDE!</strong> and everyone who hears it commits and stops. Otherwise, no
decision; rotate to the next round.</li>
</ol>
<div class="note"><b class="lbl">The one rule that keeps it safe</b><p>"<b>Most recent timestamp wins.</b>" If some value was ever locked by a majority, it carries the freshest timestamp, so every future chairperson is <b>forced</b> to re-propose it. Combined with the overlap from §3, this is why the group can never split into two different answers. (Paxos calls this exact rule "pick the value of the highest-numbered accepted proposal.")</p></div>
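<p>The four phases can be sketched as a single in-memory function. This is a deliberately simplified model, not the paper's message-passing code: there is no network, the chairperson hears from every live node rather than just the first majority to answer, and the detector's output is passed in as a plain set. The names <code>Node</code>, <code>run_round</code>, and <code>suspecters</code> are assumptions made for this example, and rounds are numbered from 1 so that timestamp 0 can mean "never adopted anything".</p>

```python
from dataclasses import dataclass


@dataclass
class Node:
    vote: int           # current estimate
    ts: int = 0         # round in which the vote was last adopted (0 = never)
    alive: bool = True


def run_round(nodes, rnd, suspecters=frozenset()):
    """Run one simplified round; return the decided value, or None."""
    n = len(nodes)
    chair = rnd % n
    if not nodes[chair].alive:
        return None                       # a dead chair is eventually suspected by all
    # Phase 1: live nodes report (timestamp, vote) to the chairperson.
    reports = [(x.ts, x.vote) for x in nodes if x.alive]
    if n >= 2 * len(reports):
        return None                       # no majority to hear from
    # Phase 2: propose the vote carrying the most recent timestamp
    # (ties on the timestamp are broken arbitrarily, here by the larger vote).
    _, proposal = max(reports)
    # Phase 3: nodes adopt and ACK, unless their detector suspects the chair.
    acks = 0
    for i, x in enumerate(nodes):
        if x.alive and i not in suspecters:
            x.vote, x.ts = proposal, rnd
            acks += 1
    # Phase 4: a majority of ACKs locks the value.
    return proposal if 2 * acks > n else None


cluster = [Node(0), Node(1), Node(1), Node(0), Node(1)]
first = run_round(cluster, rnd=1, suspecters={0, 2, 3})   # only 2 ACKs: wasted round
second = run_round(cluster, rnd=2)                        # nobody suspects: decides 1
```

<p>In the wasted first round, two nodes still adopt the proposal and stamp it with round 1. In round 2 those stamps are the freshest the new chairperson sees, so it re-proposes the same value.</p>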
<hr>
<h2 id="5-test-it--run-the-consensus-yourself">5. Test it — run the consensus yourself</h2>
<p>Below is a live, simplified Chandra–Toueg cluster. Each card is a node with a
<strong>vote</strong> (0 or 1) and a timestamp (<code>ts</code>). The ★ marks the current round&rsquo;s
chairperson.</p>
<ul>
<li><strong>Run round</strong> → advance one round of the rotating chairperson.</li>
<li><strong>Click a node</strong> → crash or revive it (a corpse, or a node that came back).</li>
<li><strong>Click a node&rsquo;s value chip</strong> → flip its vote (before anyone decides).</li>
<li><strong>&ldquo;falsely suspect&rdquo;</strong> → simulate the rumor mill <em>lying</em> about a healthy
chairperson for one round, so you can watch a round get <strong>wasted</strong> without the
cluster ever deciding something wrong.</li>
</ul>
<p>Try this: crash the node about to be chairperson, hit <strong>Run round</strong>, and watch it
rotate past the corpse. Or crash a majority and see it correctly <strong>refuse</strong> to
decide.</p>
<p>▶ <em>Interactive demo</em> — feed readers can't run it. <a href="https://bananameatpatty.vercel.app/blog/chandra-toueg/">Open it on the site →</a></p>
<div class="note"><b class="lbl">Honest fine print</b><p>This widget collapses the four message-passing phases into one "Run round" click and decides for all live nodes when an un-suspected chairperson has a live majority. It's faithful to the <b>mechanics that matter</b> — rotation, suspicion, majority, timestamps — not to every wire message.</p></div>
<hr>
<h2 id="6-why-its-always-safe-even-when-the-rumor-mill-lies">6. Why it&rsquo;s always <em>safe</em> (even when the rumor mill lies)</h2>
<p>Safety = &ldquo;<strong>never two different answers, never an invented one.</strong>&rdquo; This holds
<strong>no matter how badly ◊S misbehaves.</strong></p>
<ul>
<li><strong>Agreement:</strong> once value <code>v</code> is locked by a majority in some round, §3&rsquo;s overlap
means every later chairperson sees <code>v</code> carrying the freshest timestamp, and §4&rsquo;s
rule forces it to re-propose <code>v</code>. No second value can ever win.</li>
<li><strong>Validity:</strong> the proposed value is always <em>someone&rsquo;s actual starting vote</em> —
nothing is fabricated.</li>
</ul>
<p>A lying detector can cause <strong>wasted rounds and false alarms</strong>, but it can <strong>never</strong>
make two nodes decide differently. <em>The detector affects only how fast we finish,
never whether we&rsquo;re correct.</em> (You can check this for yourself in §5: tick &ldquo;falsely
suspect&rdquo; as many times as you like. Rounds get burned, but the eventual decision
is still single and valid.)</p>
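<p>The same experiment can be run in bulk. The sketch below is a toy model under stated assumptions: no crashes, a detector that lies at random (each node wrongly suspects the chairperson 40% of the time), and a chairperson that hears from an arbitrary majority each round. It keeps running after a decision to look for a second, conflicting one. The function name and parameters are invented for this illustration.</p>

```python
import random


def simulate(n, rounds, seed):
    """Run rounds with a randomly lying detector; return the set of decided values."""
    rng = random.Random(seed)
    votes = [rng.randint(0, 1) for _ in range(n)]
    stamps = [0] * n                        # round of last adoption, 0 = never
    decided = set()
    quorum = n // 2 + 1
    for rnd in range(1, rounds + 1):
        # The chairperson hears from some arbitrary majority, not everyone.
        heard = rng.sample(range(n), quorum)
        # Freshest timestamp wins.
        proposal = max((stamps[i], votes[i]) for i in heard)[1]
        # The detector lies at random: suspecting nodes NACK, the rest adopt and ACK.
        ackers = [i for i in range(n) if rng.random() > 0.4]
        for i in ackers:
            votes[i], stamps[i] = proposal, rnd
        if len(ackers) >= quorum:
            decided.add(proposal)           # keep going to look for a conflict
    return decided


results = [simulate(n=5, rounds=30, seed=s) for s in range(200)]
worst = max(len(d) for d in results)        # never more than one decided value
```

<p>However the suspicions fall, no run ever records two different decided values. That is the overlap and freshest-timestamp rules doing their job: once a majority has adopted a value in some round, every later chairperson's majority includes at least one of those nodes.</p>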
<hr>
<h2 id="7-why-it-eventually-finishes-liveness">7. Why it eventually <em>finishes</em> (liveness)</h2>
<p>Termination needs <strong>both</strong> ◊S guarantees working together, plus a majority of nodes that stay alive:</p>
<ul>
<li><strong>Strong completeness</strong> → a crashed chairperson is eventually suspected, so no
round hangs forever waiting on a dead node. The cluster always moves on.</li>
<li><strong>Eventual weak accuracy</strong> → eventually some live node stops being wrongly
suspected. When the rotation reaches <em>that</em> node&rsquo;s turn, nobody bails, a majority
ACKs, and <strong>DECIDE</strong> fires.</li>
</ul>
<div class="note"><b class="lbl">FLP still bites, but not forever</b><p>Before the "eventually" kicks in, nodes can rotate through chairpersons for an arbitrarily long time, suspecting healthy ones, deciding nothing. <b>That stretch is FLP, still completely true.</b> The failure detector doesn't repeal the impossibility. It just guarantees the bad stretch is <b>finite</b> (though with no known bound), after which finishing is assured.</p></div>
<hr>
<h2 id="8-the-family-resemblance-paxos--raft">8. The family resemblance: Paxos &amp; Raft</h2>
<p>Chandra–Toueg is the theory; Paxos and Raft are what you actually deploy. They&rsquo;re
the same idea in different clothes:</p>
<table>
	<thead>
			<tr>
					<th>Chandra–Toueg</th>
					<th>Paxos</th>
					<th>Raft</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Round + chairperson</td>
					<td>Ballot + proposer</td>
					<td>Term + leader</td>
			</tr>
			<tr>
					<td>Rotating coordinator</td>
					<td>Proposer election</td>
					<td>Leader election</td>
			</tr>
			<tr>
					<td>◊S / Ω detector</td>
					<td>Implicit leader oracle</td>
					<td><strong>Election timeout</strong></td>
			</tr>
			<tr>
					<td>&ldquo;Freshest timestamp wins&rdquo;</td>
					<td>&ldquo;Highest accepted ballot wins&rdquo;</td>
					<td>Log/term comparison</td>
			</tr>
			<tr>
					<td>Majority ACK locks value</td>
					<td>Majority accept</td>
					<td>Majority replication</td>
			</tr>
			<tr>
					<td>Needs a correct majority</td>
					<td><code>f &lt; n/2</code></td>
					<td><code>f &lt; n/2</code></td>
			</tr>
	</tbody>
</table>
<div class="note"><b class="lbl">Raft's timeout <em>is</em> a failure detector</b><p>A Raft follower that hears no heartbeat within its randomized timeout "suspects" the leader and starts an election. That timeout <b>is</b> the ◊S detector, implemented in the crudest, most practical way possible. "Failure detectors" and "partial synchrony" turn out to be the same escape hatch wearing different outfits.</p></div>
<hr>
<h2 id="9-tldr">9. TL;DR</h2>
<ul>
<li>Solves consensus despite crashes by adding a <strong>◊S &ldquo;maybe-dead&rdquo; detector</strong> to an
otherwise asynchronous system.</li>
<li>A <strong>rotating chairperson</strong> drives rounds; nodes abandon a round the moment the
detector suspects the chairperson.</li>
<li><strong>Majority quorums</strong> + <strong>freshest-timestamp-wins</strong> make <strong>safety unconditional</strong> —
no two different answers, ever, no matter how much the detector lies.</li>
<li><strong>Finishing is only <em>eventual</em></strong> — guaranteed once the detector calms down. Until
then FLP still applies, but for a <em>finite</em> time.</li>
<li>Needs a <strong>correct majority</strong> (<code>f &lt; n/2</code>), crash-only failures, reliable channels.</li>
<li>It shares the same core machinery as <strong>Paxos</strong> and <strong>Raft</strong> (developed
independently, same era) — leader, quorums, freshest-value-wins.</li>
</ul>


]]></content:encoded></item></channel></rss>