### Scheduling Multithreaded Programs on Bandwidth-Limited Networks

by Bradley C. Kuszmaul
I show how to run multithreaded programs on a DRAM (Distributed Random
Access Memory) parallel computer and demonstrate that such programs
can run efficiently on a collection of machines distributed across
thousands of miles over the internet. Suppose we have a fully strict
multithreaded program has work $T_1$ and critical-path length
$T_\infty$, and we have a $P$ processor DRAM machine with $\lambda$ an
upper bound to the cost of routing any permutation. I present a
deterministic conservative DRAM scheduling algorithm that runs in time
$O((\lambda+\log P)(T_1/P+T_\infty))$ and a randomized conservative
DRAM scheduling algorithm that runs in time $T_1/P+O((\lambda{}+\log
P)T_\infty)$. Mike Bernstein and I have modified the Cilk
multithreaded runtime system to use our randomized conservative DRAM
scheduler. Surprisingly the modified system, called TreeCilk, often
achieves a performance improvement when one 2000-mile-away machine is
added to a tightly-bound cluster of machines. Also, surprisingly, the
bandwidth required across the long-haul internet links is only a few
hundred bytes per second for the knary benchmark.