SimulTrain Solution
\[ w^{(e)} \leftarrow \beta w^{(e)} + (1-\beta) w^{(c)} \]
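The reconciliation update above is a convex blend of the edge weights and the cloud weights. A minimal sketch, assuming the weights are held as NumPy arrays; the function name `reconcile` and the sample values are illustrative and not taken from the source:

```python
import numpy as np

def reconcile(w_edge, w_cloud, beta):
    """Return beta * w_edge + (1 - beta) * w_cloud (hypothetical helper)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * w_edge + (1.0 - beta) * w_cloud

# Illustrative values only.
w_edge = np.array([1.0, 2.0, 3.0])
w_cloud = np.array([0.0, 2.0, 5.0])
w_new = reconcile(w_edge, w_cloud, beta=0.9)
print(w_new)
```

With beta close to 1 the edge model changes slowly; with beta close to 0 it is almost overwritten by the cloud weights.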
SimulTrain reduces latency by 78% on 4G and 71% on 5G compared to SyncSGD. FedAvg hides latency via local steps but suffers from model drift.

| Method | Upload per step (KB) | Download per step (KB) |
|----------------|----------------------|------------------------|
| Centralized | 7,500 (video frame) | 75 (weights) |
| SyncSGD | 75 (gradients) | 75 (weights) |
| SimulTrain | 30 (activations) | 75 (delta weights) |
where \( \sigma^2 \) is the gradient noise variance. This matches the rate of synchronous SGD when \( \tau \) is bounded.
Proof sketch: The forecast term cancels first-order bias from staleness. Weight reconciliation prevents error accumulation. The pipeline yields the same effective gradient steps per unit time.

Hardware: Edge = Raspberry Pi 4 (4 GB RAM), Cloud = AWS g4dn.xlarge (NVIDIA T4).

Network: emulated 4G (50 Mbps, 30 ms RTT) and 5G (300 Mbps, 10 ms RTT).
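To get a feel for the payload sizes in the comparison table under these link settings, a rough per-step communication estimate can be computed. This is a back-of-the-envelope sketch, not the measurement method behind the reported latency reductions. It assumes 1 KB = 1000 bytes, symmetric upload and download bandwidth, one RTT per step, strictly sequential transfers, and no protocol overhead or compute time; the helper names are hypothetical.

```python
# (upload KB, download KB) per step, copied from the comparison table.
PAYLOADS_KB = {
    "Centralized": (7500, 75),
    "SyncSGD": (75, 75),
    "SimulTrain": (30, 75),
}
# (bandwidth in Mbps, RTT in ms) for the emulated networks.
NETWORKS = {"4G": (50, 30), "5G": (300, 10)}

def transfer_ms(size_kb, mbps):
    # Assumption: 1 KB = 8 kilobits, and 1 Mbps = 1 kilobit per millisecond.
    return size_kb * 8 / mbps

def naive_step_ms(upload_kb, download_kb, mbps, rtt_ms):
    # Assumption: upload, download, and one round trip happen back to back.
    return transfer_ms(upload_kb, mbps) + transfer_ms(download_kb, mbps) + rtt_ms

for net, (mbps, rtt) in NETWORKS.items():
    for method, (up, down) in PAYLOADS_KB.items():
        print(f"{net} {method}: {naive_step_ms(up, down, mbps, rtt):.1f} ms")
```

Because this estimate counts only transfer time and a single round trip, it should not be expected to reproduce the 78% and 71% figures, which depend on pipelining and compute time as well.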
In the edge-cloud setting, the data resides at the edge and the compute resides in the cloud. The sequential round-trip time is:
\[ \tilde\nabla_k = \nabla \ell(w^{(e)}_k; x_k) + \alpha \cdot (w^{(c)}_k - w^{(e)}_k) \]
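The forecast gradient above adds a correction proportional to the gap between cloud and edge weights to the locally computed gradient. A minimal sketch, assuming NumPy arrays and a toy quadratic loss \( \ell(w; x) = \tfrac{1}{2}\|w - x\|^2 \) chosen purely for illustration; the function names and values are hypothetical:

```python
import numpy as np

def quadratic_grad(w, x):
    # Gradient of the toy loss 0.5 * ||w - x||^2 (an assumption for this example).
    return w - x

def forecast_gradient(grad_edge, w_edge, w_cloud, alpha):
    """Return grad_edge + alpha * (w_cloud - w_edge), term by term as in the formula."""
    return grad_edge + alpha * (w_cloud - w_edge)

# Illustrative values only.
w_edge = np.array([1.0, 1.0])
w_cloud = np.array([0.5, 1.5])
x = np.array([0.0, 2.0])

g = quadratic_grad(w_edge, x)
g_tilde = forecast_gradient(g, w_edge, w_cloud, alpha=0.2)
print(g, g_tilde)
```

Setting alpha to 0 recovers the plain edge gradient, which is a convenient sanity check when tuning the correction strength.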