PDC Visual Cheat Sheet

Spring 2026 · ESE · Dr. Khuram Shahzad

Scope W02 · W03 (P2P only) · W04–W07 · W10–W14

Pattern: 4 questions × 25 marks → Q1 MCQs · Q2–Q4 multi-part theory + cases + activities

W02 Distributed Architectures

Software Architecture

Logical organization — components, interfaces, how they connect & communicate.

System Architecture

Physical realization — which component runs on which machine.

4 Architectural Styles

Layered

Stacked layers, only adjacent talk. OSI, TCP/IP

Object-based

Objects + interfaces + encapsulation. Client-server

Data-centered

Shared data store; passive DB or active blackboard.

Event-based

Pub/Sub event propagation. Kafka

System Architectures

Centralized

Client-server, vertical hierarchy, request-reply. LAN=connectionless; WAN=TCP/IP.

Decentralized

P2P, horizontal, symmetric. Every node = servant.

Hybrid

Mixes C/S + P2P. Edge servers, BitTorrent

Delivery Semantics & Idempotency

Semantic      | Meaning
At-most-once  | 0 or 1 delivery — may LOSE
At-least-once | 1+ delivery — may DUPLICATE
Exactly-once  | Exactly 1 — no loss, no dup

Idempotent = safe to repeat (e.g., "return X"). Non-idempotent = "increment X", "charge card".
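A two-function illustration of why this matters under at-least-once delivery (a sketch; the balance/charge names are hypothetical): retrying a read is harmless, retrying a charge double-bills.

balance = 100

def read_balance():        # idempotent: repeating has no extra effect
    return balance

def charge(amount):        # non-idempotent: every retry charges again
    global balance
    balance -= amount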

🔗 Chord DHT (Quiz 1 favorite!)

m-bit ring. Key k stored at successor(k) = smallest node id ≥ k. Finger table has m rows.

finger[i] = successor((n + 2ⁱ⁻¹) mod 2ᵐ)

Lookup: O(log N). Doubling the ID space (m → m+1) adds one finger-table row.

Asymmetric Distance

dist(A,B) = (B − A) mod 2ᵐ

m=4, A=2, B=15: dist(A,B)=13; dist(B,A)=2+16−15=3

[Figure: Chord ring with nodes N1, N8, N14, N21, N32, N42, N48, N51; N8's fingers point to N14, N21, N32, N42]

N8 Finger Table (m=6)

i | N8 + 2ⁱ⁻¹ | successor
1 | 9         | N14
2 | 10        | N14
3 | 12        | N14
4 | 16        | N21
5 | 24        | N32
6 | 40        | N42
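A minimal Python sketch that reproduces the table above and the asymmetric-distance example (node IDs come from the figure; the function names are illustrative, not from the lecture):

M = 6                                   # bits in the identifier space
RING = 2 ** M                           # 64 identifiers
NODES = sorted([1, 8, 14, 21, 32, 42, 48, 51])

def successor(k):
    """Smallest node id >= k, wrapping around the ring."""
    k %= RING
    return next((n for n in NODES if n >= k), NODES[0])

def finger_table(n):
    """finger[i] = successor((n + 2**(i-1)) mod 2**m), i = 1..m."""
    return [(i, (n + 2 ** (i - 1)) % RING,
             successor((n + 2 ** (i - 1)) % RING))
            for i in range(1, M + 1)]

def dist(a, b, m=M):
    """Asymmetric ring distance: (b - a) mod 2**m."""
    return (b - a) % (2 ** m)

for i, start, succ in finger_table(8):
    print(f"finger[{i}]: start={start:2d} -> N{succ}")  # N14 x3, N21, N32, N42
print(dist(2, 15, m=4), dist(15, 2, m=4))               # 13, 3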

Other W02 Concepts

Multi-tier

Two-tier: thin-client vs fat-client. Three-tier: Presentation / Logic / Data.

Vertical vs Horizontal

Vertical = split layers across machines. Horizontal = replicate same layer (load balance).

Blockchain

Append-only chain, immutable blocks, massive replication. Need consensus on who appends.

Cloud Layers

Hardware → Infrastructure (VMs) → Platform (S3) → Application.

W03 P2P Systems

Common primitives: Join · Publish · Search · Fetch

The Big 5 — Compare & Contrast

System     | Architecture         | Search           | Decentralization | Key Innovation
Napster    | Centralized index    | Central server   | None             | Easy UI
Skype      | Hybrid + supernodes  | Supernode lookup | Partial          | NAT traversal
Gnutella   | Pure P2P             | Query flooding   | Full             | No central server
KaZaA      | Hybrid + supernodes  | Supernode-routed | Partial          | FastTrack supernodes
BitTorrent | Hybrid (tracker/DHT) | External/DHT     | High             | Swarming + tit-for-tat

🌊 P2P Architecture Visual

[Figure: four P2P topologies: Napster (central server + peers), Gnutella (pure peer mesh), KaZaA (supernodes SN), BitTorrent (tracker + swarm of seeders S and leechers L)]

Gnutella — 5 Message Types

Ping → probe network
Pong → reply with IP/port
Query → flood search
QueryHit → reverse-path response
Push → download from firewalled peer
TTL caps flooding (e.g., TTL=7)

⚠️ Flooding Explosion

With TTL=7 and b=5 neighbors per peer: 5 + 5·4 + 5·4² + … + 5·4⁶ messages — exponential. This is why KaZaA introduced supernodes and BitTorrent uses swarming.
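The sum is quick to sanity-check in Python (a back-of-envelope sketch assuming each peer forwards to its other b−1 = 4 links):

def flood_messages(ttl: int, b: int) -> int:
    # hop 0 sends b messages; each later hop multiplies by (b - 1)
    return sum(b * (b - 1) ** hop for hop in range(ttl))

print(flood_messages(7, 5))   # 5 + 5·4 + … + 5·4⁶ = 27305 messages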

BitTorrent Vocabulary

Term         | Meaning
Torrent file | Metadata + tracker URL
Tracker      | Coordinates peers in swarm (optional with DHT)
Seeder       | Peer with complete file
Leecher      | Peer still downloading
Swarm        | Peers sharing same file
Tit-for-tat  | Upload fairness; rewards uploaders

W04 Time & Clocks

Cristian's Algorithm (Physical, Passive Server)

  1. A sends request at T1
  2. B receives at T2, sends reply at T3
  3. A receives at T4

delay = ((T2−T1) + (T4−T3)) / 2, and A sets its clock to T3 + delay.

NTP: repeats 8 times, takes min-delay sample for best estimate.

[Figure: timeline between A (client) and B (server) with timestamps T1, T2, T3, T4]
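A two-line numeric sketch (the timestamp values are made up for illustration):

T1, T2, T3, T4 = 100.0, 105.0, 106.0, 109.0   # hypothetical timestamps

delay = ((T2 - T1) + (T4 - T3)) / 2           # estimated one-way delay
print(delay, T3 + delay)                      # 4.0 → A sets its clock to 110.0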

Berkeley Algorithm

No machine has an accurate clock. Master polls slaves → averages → sends OFFSETS (not absolute times) back. Sending offsets avoids adding fresh RTT uncertainty at the slave.

⏱️ Lamport's Logical Clocks

Happens-Before (→)

  1. Same process, a before b → a → b
  2. a = send, b = matching receive → a → b
  3. Transitive: a→b, b→c ⇒ a→c

Concurrent (a ‖ b): neither a→b nor b→a.

Update Rules

  1. Before any event: Cᵢ = Cᵢ + 1
  2. On send: ts(m) = Cᵢ
  3. On receive: Cⱼ = max(Cⱼ, ts(m)) + 1
[Figure: Lamport timeline across P1–P3. P1 sends m1 (ts=2) → P2: max(0,2)+1=3 → send (4) → P3 receive (5) → send (6) → P1 receive: max(2,6)+1=7]
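A minimal Python sketch of the three rules, replaying the timeline above (the class name is illustrative):

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                 # rule 1: increment before any event
        self.time += 1
        return self.time

    def send(self):                 # rule 2: ts(m) = C after the tick
        return self.tick()

    def receive(self, ts):          # rule 3: max(local, ts(m)) + 1
        self.time = max(self.time, ts) + 1
        return self.time

p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
p1.tick()                   # P1's first event: 1
m1 = p1.send()              # P1 sends m1 at ts=2
p2.receive(m1)              # P2: max(0,2)+1 = 3
m2 = p2.send()              # P2 sends at 4
p3.receive(m2)              # P3: max(0,4)+1 = 5
m3 = p3.send()              # P3 sends at 6
print(p1.receive(m3))       # P1: max(2,6)+1 = 7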

⚠️ Lamport Limitation

If a → b then C(a) < C(b), but the converse does NOT hold: C(a) < C(b) doesn't prove a → b — the events could be concurrent.

📍 Vector Clocks

Each process keeps VCᵢ[1..n]. VCᵢ[i] = own events; VCᵢ[j] = what i knows about j.

Rules

  1. Before event: VCᵢ[i]++
  2. Send: piggyback VCᵢ
  3. Receive m: VCⱼ[k] = max(VCⱼ[k], ts(m)[k]) ∀k, then VCⱼ[j]++
a → b ⟺ VC(a) < VC(b)
Concurrent ⟺ neither dominates
[Figure: vector-clock trace across P1–P3: (1,0,0) (2,0,0) (2,1,0) (2,2,0) (0,0,1) (2,2,2). (2,0,0) vs (0,0,1) → concurrent (neither dominates)]
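The rules and the comparison test in a few lines of Python (helper names are illustrative):

def vc_receive(local, ts, j):
    """Receive rule: element-wise max, then increment own index j."""
    merged = [max(a, b) for a, b in zip(local, ts)]
    merged[j] += 1
    return merged

def happened_before(a, b):          # a → b  ⟺  VC(a) < VC(b)
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

print(concurrent((2, 0, 0), (0, 0, 1)))       # True: neither dominates
print(happened_before((1, 0, 0), (2, 2, 0)))  # True
print(vc_receive([0, 0, 1], [2, 2, 0], 2))    # [2, 2, 2]: matches the trace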

Matrix Clocks

n×n matrix per process. M[i][*] = own vector clock. M[j][*] = what i knows about j's vector clock. Used when you need to know what others know (e.g., log GC).

W05 Synchronization Algorithms

Totally-Ordered Multicast

🏦 Bank Problem

$1000 in NY & SF. NY adds 1%, SF adds $100. Without ordering the replicas diverge: one ends at $1111, the other at $1110. Need the same order at every replica.

Algorithm

  1. Sender timestamps update with Lamport time, multicasts (incl. to self)
  2. Receiver puts it in a local queue sorted by timestamp
  3. Receiver multicasts an ACK
  4. Deliver only when: at head of queue AND all processes have ACKed

Ties broken by process ID. This guarantees same delivery order at every node.
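A compact sketch of the deliver test (the queue/ACK shapes are assumptions made for illustration):

def try_deliver(queue, acks, processes):
    """queue: list of (ts, pid, msg) — sorting breaks ties by pid.
    acks[(ts, pid)]: set of processes that ACKed that message."""
    if not queue:
        return None
    queue.sort()                                  # order by (ts, pid)
    ts, pid, msg = queue[0]
    if acks.get((ts, pid), set()) == set(processes):
        queue.pop(0)                              # head + fully ACKed
        return msg                                # safe to apply everywhere
    return None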

Causality vs Total Order

Causality

a→b means a may affect b

Partial Order

Only causally-related events ordered

Total Order

ALL events ordered (concurrent via tie-break)

Schiper-Eggli-Sandoz (SES)

Causal ordering with unicast only (no broadcast needed).

Each message carries a vector of "what sender has sent to each process".

Trade-off: fewer messages, but bigger messages + more state. Clock increments only on receive.

Matrix Clocks (recap)

Needed when application requires knowing what other processes know — e.g., garbage-collect message log entries that everyone has already seen.

W06 Mutual Exclusion & Election

3 Required Properties

Safety

≤1 process in CS at a time

Liveness

Every request eventually granted

Fairness

Requests served in arrival order (logical clock)

📊 Mutual Exclusion Algorithms — Compared

Algorithm              | Type              | Messages/CS | Notes
Centralized            | Coordinator       | 3           | SPOF
Lamport                | Permission        | 3(N−1)      | REQUEST + REPLY + RELEASE
Ricart-Agrawala        | Permission        | 2(N−1)      | No separate release
Token Ring             | Token             | 1 to ∞      | No starvation
Suzuki-Kasami          | Token + broadcast | 0 or N      | Counter-based
Decentralized (voting) | Quorum            | 2m·N        | Possible starvation

Lamport's Mutex

3 message types: REQUEST, REPLY, RELEASE. Channels must be FIFO.

Enter CS when:

  • L1: Received a msg with TS > my request TS from all other sites
  • L2: My request is at head of RQᵢ (sorted by TS)

Exit: Remove from own RQ, broadcast RELEASE → others remove it.

Total cost: 3(N−1) messages per CS

Ricart-Agrawala (improved Lamport)

2 message types: REQUEST, REPLY.

On receiving REQUEST <Tⱼ, Pⱼ> at Pᵢ:

  • If state = Held → queue it
  • If state = Wanted AND (Tᵢ,i) < (Tⱼ,j) → queue it
  • Else → send REPLY immediately

Enter CS when received REPLY from ALL N−1 others. On exit → REPLY to queued.

Total cost: 2(N−1) messages per CS
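A hedged sketch of the REQUEST handler (state names and the send_reply helper are illustrative, not from the slides):

def on_request(state, my_req, req, deferred):
    """state: 'Released' | 'Wanted' | 'Held'; requests are (ts, pid) pairs."""
    if state == "Held" or (state == "Wanted" and my_req < req):
        deferred.append(req)       # defer: REPLY only after leaving the CS
    else:
        send_reply(req[1])         # grant immediately

def send_reply(pid):               # placeholder transport
    print(f"REPLY -> P{pid}")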

Suzuki-Kasami (Token-based)

Each site has request counter RNᵢ[]; token holds LN[] (last completed per site).

Request: broadcast REQUEST(i, n) with n = RNᵢ[i]+1.

Token holder sends token to Pⱼ if RN[j] = LN[j]+1.
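The two bookkeeping rules in sketch form (the array names follow the text; everything else is illustrative):

def note_request(RN, i, n):
    RN[i] = max(RN[i], n)          # remember Pi's newest request number

def should_pass_token(RN, LN, j):
    return RN[j] == LN[j] + 1      # Pj has an outstanding, unserved request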


🗳️ Election Algorithms

Bully Algorithm

  1. P notices coordinator dead
  2. P sends ELECTION to all higher-ID processes
  3. If none reply → P wins, broadcasts COORDINATOR
  4. If a higher-ID process replies → it takes over

The biggest ID always wins.

[Figure: P1–P5 with P5 crashed; P2 sends ELECTION to higher IDs; P4 ultimately wins as the highest alive]
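A toy recursion capturing the rule (assumes a known set of alive IDs; a real run uses timeouts to detect silence):

def bully(initiator, alive):
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator               # nobody above answered: I win
    return bully(max(higher), alive)   # a higher process takes over

print(bully(2, {1, 2, 3, 4}))          # P5 dead → P4 elected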

Ring Algorithm

Processes in a logical ring. P notices failure → builds ELECTION message with own ID, sends to successor. Each node appends its ID, forwards. When message returns to originator → pick max ID → recirculate as COORDINATOR.

Chang-Roberts (uniform, N unknown)

[Figure: ring of nodes 1–8; node 8 receives its own ID back → leader]
Each node sends ID left. Forward if > own, drop if < own, win if you get your own back. O(N²) worst case.
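A small simulation of the forward/drop/win rules (the ring contents and data structures are assumptions for illustration):

def chang_roberts(ring):
    """ring: node IDs in clockwise order; returns the elected leader."""
    n = len(ring)
    in_transit = [(i, ring[i]) for i in range(n)]    # (position, uid)
    while True:
        nxt = []
        for pos, uid in in_transit:
            dest = (pos + 1) % n
            if uid == ring[dest]:
                return uid                # own ID came back: leader
            if uid > ring[dest]:
                nxt.append((dest, uid))   # forward larger IDs
            # smaller IDs are dropped
        in_transit = nxt

print(chang_roberts([3, 1, 4, 2]))        # 4 wins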

Hirschberg-Sinclair (bidirectional ring)

Phase r: probe left & right to distance 2ʳ. Survive if largest in 2ʳ-neighborhood. Messages: O(N log N).

W07 Global State & Chandy-Lamport Snapshot

Global state = state of every process + state of every channel (msgs in transit).

Uses

Checkpoint

Restart on failure

GC

Find orphaned objects

Deadlock

Detect cycles

Termination

Detect job done

Assumptions (Chandy-Lamport)

  • Channels are FIFO
  • No process or channel failures
  • Messages arrive intact, no duplication

🎯 Algorithm

Initiator Pᵢ

  1. Records own state Sᵢ
  2. Sends Marker on each outgoing channel
  3. Starts recording incoming msgs on every incoming channel

On receiving Marker at Pᵢ from channel C_ki

If first marker:

  • Record own state
  • Mark C_ki = ∅
  • Send Markers on all outgoing channels
  • Start recording on all other incoming channels

Else (already saw a marker):

  • C_ki state = msgs received since recording started
[Figure: P1 initiates → markers (dashed) fan out → each Pᵢ records state Sᵢ on its first marker and channel contents on subsequent markers; when all channels are recorded the snapshot is complete]
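A hedged sketch of the two marker rules (the process fields and the snapshot_local/deliver helpers are assumptions, not from the lecture):

def on_marker(p, ch):
    if p.state is None:                        # first marker seen
        p.state = p.snapshot_local()           # record own state
        p.channel_state[ch] = []               # C_ki recorded as empty
        for out in p.outgoing:
            out.send("MARKER")                 # propagate on every out-channel
        p.recording = {c for c in p.incoming if c is not ch}
    else:                                      # later marker on ch:
        p.recording.discard(ch)                # its state = msgs logged so far

def on_message(p, ch, msg):
    if ch in p.recording:                      # in-flight message belongs
        p.channel_state[ch] = p.channel_state.get(ch, []) + [msg]
    p.deliver(msg)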

✂️ Consistent Cut

A cut is consistent iff: for every event e in the cut, every event f with f→e is also in the cut.

Inconsistent: A receive is in the cut but its send is outside — a message from the "future"!

✅ Chandy-Lamport always produces a consistent cut.

W10 Hadoop & MapReduce

Open-source impl of Google MapReduce (Doug Cutting, 2005). Master-slave, shared-nothing, commodity hardware.

Two-Layer Architecture

[Figure: Master Node (NameNode + JobTracker) over Slaves 1…N, each running DataNode + TaskTracker; blocks B1–B5 spread across slaves, each block replicated 3× by default]

MapReduce Phases

Input (HDFS blocks) → Map ⟨k,v⟩ → Shuffle + Sort ⟨k,[v]⟩ → Reduce → Output ⟨k′,v′⟩
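The ⟨k,v⟩ contract in a few lines of plain Python (a conceptual sketch, not the Hadoop API; real jobs implement Mapper/Reducer classes):

from collections import defaultdict

def map_phase(doc):                        # emit <k, v> pairs
    for word in doc.split():
        yield word, 1

def shuffle(pairs):                        # group values by key: <k, [v]>
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):                  # aggregate to <k', v'>
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["hello world", "hello hadoop"]
pairs = [kv for d in docs for kv in map_phase(d)]
print(reduce_phase(shuffle(pairs)))        # {'hello': 2, 'world': 1, 'hadoop': 1}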

HDFS Quick Facts

NameNode

Master; metadata; file → block → datanode map

DataNode

Slaves; store blocks (64 or 128 MB), replicated

Hadoop vs RDBMS

Aspect       | RDBMS             | Hadoop
Schema       | Fixed, ACID       | Schema-on-read
Mode         | Read/Write        | Mostly read
Hardware     | Expensive servers | Commodity
Failures     | Rare              | Normal
Unit of work | Transaction       | Job

📈 PageRank

PR(p) = (1−d)/N + d · Σ [ PR(q) / L(q) ] where q ∈ in-links(p), d ≈ 0.85

Damping (d): models random surfer who jumps to random page with prob (1−d). Iterate until convergence.

Page | Init | Iter 1 | Iter 2 | Rank
A    | 1    | 1.5    | 1.5    | 🥇 1
B    | 1    | 1.25   | 1.375  | 🥈 2
D    | 1    | 0.75   | 0.625  | 🥉 3
C    | 1    | 0.5    | 0.5    | 4

W11 Paxos — Distributed Consensus

Goal & Properties

Validity

Chosen value must be one that was proposed

Majority

(N+1)/2 votes needed. N = 2m+1 tolerates m failures

3 Roles

Proposer

Receives client req, drives consensus

Acceptor

Votes on proposals

Learner

Announces outcome

In practice, all 3 roles co-exist on each node.

🎯 Two-Phase Protocol

[Figure: two-phase message flow. Phase 1: Proposer sends Prepare(N) to a majority of Acceptors, which reply Promise(N, accepted U). Phase 2: Proposer sends Accept(N, V); Acceptors reply Accepted and the Learner is informed]

Phase 1: PREPARE → PROMISE

  1. Proposer picks unique ID N (e.g., counter.pid)
  2. Sends Prepare(N) to majority
  3. Acceptor: if N > all previously seen → return Promise(N, U) where U = highest already-accepted (or none)
  4. Promise = vow to refuse anything < N

Phase 2: ACCEPT → ACCEPTED

  1. If proposer got majority promises:
     • If any promise carries an accepted U → must propose U's value (validity!)
     • Else → propose own value V
  2. Send Accept(N, V) → acceptors confirm
  3. Majority Accepted → consensus reached
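A minimal acceptor sketch (integer proposal IDs and dict state are simplifications of the counter.pid scheme above):

def on_prepare(acc, n):
    if n > acc["promised"]:
        acc["promised"] = n                       # vow to refuse < n
        return ("PROMISE", n, acc["accepted"])    # may carry a prior value
    return ("NACK", n, None)

def on_accept(acc, n, v):
    if n >= acc["promised"]:
        acc["promised"], acc["accepted"] = n, (n, v)
        return ("ACCEPTED", n, v)
    return ("NACK", n, None)

acc = {"promised": -1, "accepted": None}
print(on_prepare(acc, 10))          # PROMISE
print(on_accept(acc, 10, "X"))      # ACCEPTED
print(on_prepare(acc, 12))          # PROMISE carrying accepted (10, 'X')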

🇵🇰 Worked Example: NADRA CNIC Update

5 acceptors R1–R5. P1 (ID=15.1, "Islamabad") vs P2 (ID=15.2, "Karachi"). Higher ID wins promises.

  • P2's Prepare(15.2) arrives → all promise (15.2 > nothing)
  • P1's Prepare(15.1) → rejected (already promised 15.2)
  • P2's Accept(15.2, Karachi) → majority accepts → consensus: Karachi
  • P1 retries with ID=16.1; promises now carry accepted "Karachi" → P1 MUST use Karachi (validity)

R3 crashes: still 4 alive ≥ 3 majority → ✅ tolerated. 3 crashes: only 2 alive < 3 → halt.

W12 Fault Tolerance

Fault → Error → Failure

FAULT (root cause: bug, HW) → ERROR (incorrect state) → FAILURE (cannot deliver service)

Fault Types by Duration

Transient

Temporary, vanishes (cosmic ray, glitch)

Intermittent

Random/repeated (loose cable)

Permanent

Stays until repair (dead disk)

Fault Behavior

✅ Fail-silent (fail-stop)

Stops outputting → easy to detect

☠️ Byzantine

Wrong/malicious output → HARD to detect

4 Handling Strategies

Strategy    | What it does     | Example
Prevention  | Avoid the fault  | Better HW, careful coding
Tolerance   | Mask the fault   | Redundancy, voting
Removal     | Reduce frequency | Testing, patches
Forecasting | Estimate future  | Monitoring, ML

Redundancy Types

Information

Extra bits (parity, Hamming, ECC)

Time

Retry on failure

Physical

Backup HW/SW (RAID, replicas)

🎯 Triple Modular Redundancy (TMR)

[Figure: Modules A, B (faulty), C feeding a majority Voter]
3 identical modules → majority voter → tolerates 1 faulty module
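The voter itself is three lines (the module outputs below are made-up values):

from collections import Counter

def vote(a, b, c):
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else None   # 2-of-3 majority masks 1 fault

print(vote(1, 0, 1))                       # 1: the faulty module is outvoted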

Primary-Backup

Primary handles requests; backup gets heartbeats; on primary failure → backup takes over.

Availability

Nines          | Annual downtime
99% (two)      | ~3.65 days
99.9% (three)  | ~8.76 hours
99.99% (four)  | ~52 minutes
99.999% (five) | ~5.26 minutes
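The downtime column is just (1 − availability) × one year; a quick sanity check:

def annual_downtime_minutes(avail):
    return (1 - avail) * 365 * 24 * 60

for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.5f} -> {annual_downtime_minutes(a):8.1f} min/year")
# 5256.0 (≈3.65 days), 525.6 (≈8.76 h), 52.6, 5.3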

W13 CAP, Consistency, Eventual

ACID (single DB)

Atomic

All-or-nothing

Consistent

Preserves invariants

Isolated

No interference

Durable

Commit = permanent

2PC — Two-Phase Commit

Phase 1: Coordinator sends PREPARE → Participants reply VOTE (commit/abort). Phase 2: Coordinator sends GLOBAL COMMIT (or ABORT) → Participants ACK.

⚠️ Blocking Problem

If coordinator crashes after Phase 1 but before Phase 2 → participants stuck "in-doubt", resources locked. Fix: 3PC or Paxos/Raft.

🔺 CAP Theorem

[Figure: CAP triangle with C (Consistency), A (Availability), P (Partition tolerance); edges: CP = Banking, AP = Cassandra, CA = Single-node]

Partitions are inevitable in real networks → you MUST pick P → real choice is C vs A.

CP Banking, ZooKeeper, HBase

Refuse minority side during partition → no split-brain. Safety over uptime.

AP Cassandra, DynamoDB

Accept writes both sides → reconcile later. Uptime over safety.

BASE (alternative to ACID)

Basically Available

Uptime first

Soft state

State may shift without input (background sync)

Eventual consistency

Replicas eventually converge

Conflict Resolution (Eventual Consistency)

LWW

Last Writer Wins by timestamp → may lose data

Vector Clocks

Detect causality, surface real conflicts

CRDTs

Auto-merge data types (Counter, Set)

⚖️ Quorum Math

N = total replicas, W = write quorum, R = read quorum.

W + R > N ⟺ Strong Consistency

Why? Read set and write set must overlap on ≥1 node → reader sees latest write.

Examples

  • N=3, W=2, R=2: 4 > 3 ✓ strong
  • N=3, W=1, R=1: 2 < 3 ✗ eventual only
  • N=5, W=3, R=3: 6 > 5 ✓ tolerates 2 failures
[Figure: write set (W) and read set (R) overlap in ≥1 node, which sees the latest write]
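The rule and the three examples in code form:

def strongly_consistent(n, w, r):
    return w + r > n              # read and write sets must overlap

for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3)]:
    print(f"N={n} W={w} R={r}: {strongly_consistent(n, w, r)}")
# True, False (eventual only), True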

Summary Table

Protocol   | Strength           | Weakness       | Use Case
2PC        | Strict ACID        | Blocking       | Local Tx
Paxos/Raft | Strong consistency | Latency        | Distributed ledger
Eventual   | Massive scale      | Divergence     | Feeds, carts
Quorum     | Configurable       | Tuning complex | NoSQL

W14 Distributed AI — Billion-Parameter Training

💾 Memory Math (memorize!)

Per parameter with Adam optimizer + FP16 training:

Component                                      | Bytes/param | 30B model | 175B model
Weights (FP16)                                 | 2           | 60 GB     | 350 GB
Gradients (FP16)                               | 2           | 60 GB     | 350 GB
Adam states (FP32: master+momentum+variance)   | 12          | 360 GB    | 2.1 TB
Total                                          | 16          | ~480 GB   | ~2.8 TB
Training FLOPs ≈ 6 · N · D (2 fwd + 4 bwd; N=params, D=tokens)
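Both rules reduce to one-liners; the token count below is a hypothetical value for illustration:

def training_memory_gb(params_billion):
    return params_billion * (2 + 2 + 12)     # FP16 weights + FP16 grads + FP32 Adam

print(training_memory_gb(30), "GB")          # 480 GB
print(training_memory_gb(175) / 1000, "TB")  # 2.8 TB

def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens           # 2 forward + 4 backward per token

print(f"{train_flops(30e9, 1e12):.1e}")      # 30B params on a hypothetical 1T tokens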

Interconnect Hierarchy

Link        | Bandwidth    | Scope
NVLink      | 600 GB/s     | Intra-node (8 GPUs)
InfiniBand  | 50–100 GB/s  | Inter-node
PCIe Gen4/5 | 32–64 GB/s   | GPU ↔ CPU

🎲 3D Parallelism

Data Parallel: replicate the model, split the batch; All-Reduce grads.
Tensor Parallel: split a single layer into slices (intra-node; needs NVLink ⚡).
Pipeline Parallel: split layers into stages, e.g. L1-12 / L13-24 / L25-36 (inter-node OK; ⚠ pipeline bubbles).
3D Parallelism = DP + TP + PP (Megatron-LM, DeepSpeed): industry standard for LLMs.

⚡ Topology Rule (Activity 2)

TP must stay intra-node — communicates every layer, needs NVLink 600 GB/s.

PP can span nodes — only sends activations between stages, tolerates InfiniBand 50 GB/s.

NCCL Collective Primitives

Primitive        | What it does
All-Reduce       | Sum from all → result to all (core of DP)
Broadcast        | Rank 0 → everyone
Scatter / Gather | Split / collect
Reduce-Scatter   | Reduce, then distribute shards
All-Gather       | Collect shards from all

FSDP & ZeRO

FSDP (Fully Sharded Data Parallel)

Shards weights + grads + optimizer across GPUs. 1/N memory vs DDP.

  • Forward: All-Gather weights → compute → discard
  • Backward: All-Gather + Reduce-Scatter grads

ZeRO Stages (DeepSpeed)

Stage | Shards       | Saving
1     | Optimizer    | ~4×
2     | + Gradients  | ~8×
3     | + Parameters | ~64× on 64 GPUs (= FSDP)

ZeRO-Offload: push to CPU RAM / NVMe → 10× larger models, slower (PCIe latency).

Other Tricks

BF16 / FP16

BF16 = same range as FP32; 2× throughput on Tensor Cores; 50% memory.

Activation Checkpointing

Discard activations → recompute on backward. ~33% extra compute for ~5× memory savings.

DDP vs DataParallel

DDP (multi-process, no GIL contention) overlaps communication with compute. DataParallel is single-process and GIL-bound.
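A minimal PyTorch DDP skeleton (assumes launch via torchrun, which sets the rank environment; the model and batch are placeholders):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")          # one process per GPU
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 10).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 1024).cuda()                 # each rank sees its own shard
model(x).sum().backward()                        # grads All-Reduced, overlapped
opt.step()
dist.destroy_process_group()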

Industry Cases

Llama-2 (Meta, 2000 A100)

HW failures certain → frequent checkpoints to NVMe + DCGM monitoring + auto-drain via Slurm.

BLOOM Straggler

20% slower nodes blocked All-Reduce → NCCL_DEBUG profiling → found 5 faulty IB cables → replaced → 100% speed.

⚡ Practice MCQs (Q1 style)

30 questions across all topics; the answer follows each question.

1. Which P2P system uses a centralized index but P2P file transfer?
A) Gnutella · B) BitTorrent · C) Napster · D) Skype
→ C) Napster
2. In Lamport timestamps, if a→b then:
A) C(a) > C(b) · B) C(a) < C(b) · C) C(a) = C(b) · D) Cannot say
→ B) C(a) < C(b)
3. If C(a) < C(b), does a→b necessarily hold?
A) No · B) Yes always · C) Only same process · D) Only on receive
→ A) No — only the converse holds
4. Vector clocks (2,0,0) vs (0,0,1) are:
A) (2,0,0) → (0,0,1) · B) (0,0,1) → (2,0,0) · C) Concurrent · D) Equal
→ C) Concurrent — neither dominates
5. Cristian's delay estimate:
A) T4−T1 · B) ((T2−T1)+(T4−T3))/2 · C) T3−T2 · D) T4−T2
→ B) ((T2−T1)+(T4−T3))/2
6. Berkeley algorithm differs from Cristian's because:
A) Uses GPS · B) Master computes offsets, no accurate clock · C) No master · D) Physical clocks only
→ B) Master computes offsets
7. Ricart-Agrawala messages per CS:
A) 3(N−1) · B) 2(N−1) · C) N · D) N²
→ B) 2(N−1)
8. Lamport's mutex messages per CS:
A) 3(N−1) · B) 2(N−1) · C) N · D) N+1
→ A) 3(N−1)
9. Suzuki-Kasami is:
A) Permission-based · B) Token + broadcast · C) Coordinator-based · D) Ring-based
→ B) Token-based with broadcast requests
10. In the Bully algorithm, who wins?
A) Lowest ID · B) Random · C) Highest live ID · D) Initiator
→ C) Highest live ID
11. Chandy-Lamport requires channels to be:
A) Synchronous · B) FIFO · C) Lossy · D) Bounded
→ B) FIFO
12. The marker message in Chandy-Lamport is used to:
A) Carry application data · B) Separate pre/post-snapshot messages on channel · C) Acknowledge state · D) Elect coordinator
→ B) Separate pre/post-snapshot messages
13. A consistent cut requires:
A) Same wall-clock · B) For every receive in cut, matching send also in cut · C) No msgs in flight · D) Logical clocks match
→ B) For every receive in cut, the matching send is also in cut
14. HDFS default replication factor:
A) 1 · B) 2 · C) 3 · D) 5
→ C) 3
15. Shuffle & Sort happens:
A) Before Map · B) Between Map and Reduce · C) After Reduce · D) Only on failure
→ B) Between Map and Reduce
16. Paxos can tolerate m failures with how many nodes?
A) m · B) m+1 · C) 2m+1 · D) 2m
→ C) 2m+1
17. In Paxos, a proposer whose promises contain accepted values:
A) Picks any value · B) Must use highest-numbered accepted value · C) Aborts · D) Picks own value
→ B) Must use highest-numbered accepted value (validity)
18. CAP theorem — during a partition you must:
A) Give up P · B) Give up both C and A · C) Pick C or A · D) Have all three
→ C) Pick C or A — P is mandatory
19. BASE stands for:
A) Banking, ACID, Strong, Eventual · B) Basically Available, Soft state, Eventual consistency · C) Backup, Atomic, Stable, Exact · D) None
→ B) Basically Available, Soft state, Eventual
20. For strong consistency with quorums:
A) W=R=1 · B) W + R > N · C) W·R = N · D) W = N always
→ B) W + R > N
21. TMR can mask how many faulty modules?
A) 0 · B) 1 · C) 2 · D) 3
→ B) 1
22. Fail-silent faults are:
A) Hard to detect · B) Random outputs · C) Easy to detect — no output · D) Malicious
→ C) Easy to detect — no output
23. Byzantine faults:
A) Stop output · B) Produce arbitrary/incorrect output · C) Always transient · D) Never occur
→ B) Arbitrary/incorrect output
24. Tensor Parallelism is typically confined to:
A) Single GPU · B) Single node (NVLink) · C) Inter-node · D) CPU only
→ B) Single node — needs NVLink bandwidth
25. ZeRO Stage 3 / FSDP shards:
A) Optimizer only · B) Gradients only · C) Weights + grads + optimizer · D) Activations
→ C) All three
26. NCCL primitive used to sum gradients in DDP:
A) Broadcast · B) All-Reduce · C) Scatter · D) Gather
→ B) All-Reduce
27. In Chord with m-bit IDs, lookup complexity:
A) O(1) · B) O(N) · C) O(log N) · D) O(N log N)
→ C) O(log N)
28. Idempotent operation example:
A) Increment X · B) Return value of X · C) Append to log · D) Charge credit card
→ B) Return value of X
29. 2PC blocking problem occurs when:
A) Phase 1 slow · B) Coordinator crashes after Phase 1 → participants in-doubt · C) Participants can't vote · D) Network too fast
→ B) Coordinator crash → participants stuck in-doubt
30. Vector clock receive rule:
A) VC[j]++ · B) VC[k]=max(VC[k], ts(m)[k]) ∀k, then VC[i]++ · C) Reset to 0 · D) Copy ts(m)
→ B) Pairwise max, then increment own index

📝 Worked Cases (Q2-Q4 style)

Case A: Lamport Trace

P1 sends m1 to P2 at C1=2. P2 sends m2 to P3 at next event. P3 sends m3 back to P1.

P1: 0 → 1 → 2 (send m1, ts=2)

P2: on receive: max(0,2)+1 = 3 → 4 (send m2)

P3: on receive: max(0,4)+1 = 5 → 6 (send m3)

P1: on receive: max(2,6)+1 = 7

Case B: Ricart-Agrawala

P1 (ts=10), P2 (ts=8), P3 idle.

  1. P1 multicasts REQ<10,1>; P2 multicasts REQ<8,2>
  2. P2 receives <10,1>: P2 Wanted, (8,2)<(10,1) → queues
  3. P1 receives <8,2>: P1 Wanted, (10,1)>(8,2) → sends REPLY
  4. P3 has no request → REPLYs to both
  5. P2 has all REPLYs → enters CS
  6. P2 exits → REPLYs to queued P1 → P1 enters CS

Case C: Bully Election

5 processes {1..5}, P5 crashes, P2 notices.

P2 sends ELECTION to {3,4} → both reply OK and start own elections. P3 sends to {4} → P4 replies OK. P4 sends to nobody (P5 dead). P4 wins → broadcasts COORDINATOR.

Case D: 2PC Blocking

Coordinator C with P1, P2, P3. C → Prepare. All vote Commit, enter wait. C crashes. P1-P3 stuck "in-doubt" — must hold locks until C recovers. Throughput on affected records = 0. Fix: 3PC or Paxos/Raft.

Case E: Quorum Design (TV Voting)

N=5, want strong consistency, tolerate failures.

W=3, R=3 → W+R=6 > N=5 ✓

With 2 node failures: 3 alive ≥ 3 → ✅ still works

With 3 failures: 2 alive < 3 → halt (correctness preserved)

Case F: Paxos Contention

P1 (ID=10) and P2 (ID=12). 3 acceptors.

  1. P1 sends Prepare(10). All promise.
  2. P2 sends Prepare(12). All promise (12>10).
  3. P1's Accept(10, "X") → rejected (promised ≥12)
  4. P2's Accept(12, "Y") → accepted → consensus on Y
  5. P1 retries with Prepare(13). Promises now carry ("Y"). P1 must use "Y" (validity)

Case G: PageRank One Iteration

Pages A,B,C,D with links A→B, A→C, B→C, C→A, D→C. d=0.85. Init PR=1.

Out: |A|=2, |B|=1, |C|=1, |D|=1.

  • PR(A) = 0.0375 + 0.85·(1/1) = 0.8875
  • PR(B) = 0.0375 + 0.85·(1/2) = 0.4625
  • PR(C) = 0.0375 + 0.85·(0.5 + 1 + 1) = 2.1625
  • PR(D) = 0.0375 + 0 = 0.0375
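Case G reproduced in a few lines (the link graph is taken from the case statement):

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = {p: 1.0 for p in links}
d, N = 0.85, len(links)

new = {p: (1 - d) / N for p in pr}             # (1−d)/N = 0.0375
for q, targets in links.items():
    for p in targets:
        new[p] += d * pr[q] / len(links[q])    # d · PR(q)/L(q)

print(new)   # A: 0.8875, B: 0.4625, C: 2.1625, D: 0.0375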

Case H: 13B Llama Memory (BF16)

  • Weights: 13 × 2 = 26 GB
  • Gradients: 13 × 2 = 26 GB
  • Adam (FP32, 12 B/param): 13 × 12 = 156 GB
  • Total ≈ 208 GB

One 80 GB H100 can't hold it → need FSDP/ZeRO across multi-GPU.