Scaffold Extensions for Client Drift Mitigation in Federated Learning: A Synthesis of Approaches, Limitations, and Future Directions


James Mburu Muthii
Dr. Stephen Kahara Wanjau
Dr. Stephen Njenga

Abstract

Client drift arising from non-independent and identically distributed (non-IID) data across participating clients remains one of the most critical obstacles to effective Federated Learning. The Scaffold algorithm, which introduces control variates to correct local gradient updates, has emerged as one of the most prominent variance reduction methods for mitigating this drift. Although numerous extensions to Scaffold have been proposed, no systematic review has exclusively examined the Scaffold algorithm and the control variate mechanism for client drift mitigation, leaving the research community without a consolidated understanding of how Scaffold has been extended, what limitations persist, and which characteristics remain underexplored. This study addresses that gap through a systematic literature review guided by PRISMA 2020 guidelines. Seven electronic databases were searched for publications from 2016 to 2026, yielding 1,847 records, from which 33 studies were included after duplicate removal, screening, and full-text eligibility assessment based on criteria requiring each study to address Scaffold or control variates for client drift in FL and cover at least two performance metrics. Data were synthesized thematically using frequency counts and tabular summaries. The review reveals nine distinct extension approaches: variance reduction via gradient estimation techniques was the most prevalent (11 studies, 34%), followed by integration with advanced optimization algorithms (8 studies, 25%), together accounting for 59% of the reviewed work. Twelve Scaffold characteristics were targeted for extension, with variance reduction the most commonly modified (37%, rising to 50% with combined categories), while communication mechanism, privacy budget allocation, and similarity-based approaches remained significantly underexplored. 
Recurring limitations across all approaches included communication and computational overhead, hyperparameter sensitivity, restrictive theoretical assumptions, performance degradation under extreme data heterogeneity, and limited large-scale empirical validation. A notable finding is that similarity-based approaches to client drift mitigation are largely absent from the literature, with only one study employing a similarity measure. The review therefore recommends future investigation of similarity-based methods as adaptive control variates within the Scaffold protocol, alongside prioritization of communication-efficient, privacy-preserving designs validated at scale. This research was self-sponsored with no external funding.
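The control-variate mechanism the abstract refers to can be made concrete. Below is a minimal NumPy sketch of the SCAFFOLD update rule (Karimireddy et al., 2020): each client takes local gradient steps corrected by the difference between its own control variate and the server's, then both model and control variates are averaged. The toy per-client quadratic objectives, the client targets b_i, the learning rate, and the round counts are illustrative assumptions, not values from the reviewed studies.

```python
import numpy as np

# SCAFFOLD sketch on a toy non-IID problem: client i minimizes
# f_i(w) = 0.5 * ||w - b_i||^2, so grad f_i(w) = w - b_i.
# The global optimum of (1/n) * sum_i f_i is the mean of the b_i.

rng = np.random.default_rng(0)
d, n_clients, K, lr, rounds = 5, 4, 10, 0.1, 50

b = rng.normal(size=(n_clients, d))   # per-client optima (simulated non-IID data)
x = np.zeros(d)                       # global model
c = np.zeros(d)                       # server control variate
ci = np.zeros((n_clients, d))         # client control variates

for _ in range(rounds):
    dy, dc = np.zeros(d), np.zeros(d)
    for i in range(n_clients):
        y = x.copy()
        for _ in range(K):
            g = y - b[i]              # local gradient on client i's data
            y -= lr * (g - ci[i] + c) # drift-corrected local step
        # Control-variate refresh ("option II" in the SCAFFOLD paper):
        ci_new = ci[i] - c + (x - y) / (K * lr)
        dy += (y - x) / n_clients
        dc += (ci_new - ci[i]) / n_clients
        ci[i] = ci_new
    x += dy                           # server aggregates model deltas
    c += dc                           # ...and control-variate deltas

print(np.allclose(x, b.mean(axis=0), atol=1e-3))  # → True: x reaches the global optimum
```

With full participation, the mean of the client control variates exactly cancels the server variate, so the averaged round behaves like gradient descent on the global objective even though each client's local trajectory is pulled toward its own b_i; this is the drift correction that plain FedAvg lacks.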



How to Cite

J. M. Muthii, S. K. Wanjau, and S. Njenga, “Scaffold Extensions for Client Drift Mitigation in Federated Learning: A Synthesis of Approaches, Limitations, and Future Directions,” International Journal of Emerging Science and Engineering (IJESE), vol. 14, no. 4, pp. 1–12, 2026. DOI: https://doi.org/10.35940/ijese.D2641.14040326

References

P. Kairouz et al., “Advances and Open Problems in Federated Learning,” Found. Trends Mach. Learn., vol. 14, no. 1–2, pp. 1–210, 2021. Accessed: Feb. 13, 2026. [Online]. Available: https://ieeexplore.ieee.org/book/9464278

Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated Machine Learning: Concept and Applications,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, 2019. Accessed: Feb. 13, 2026. [Online]. Available: https://dl.acm.org/doi/10.1145/3298981

T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated Learning: Challenges, Methods, and Future Directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, May 2020, DOI: https://doi.org/10.1109/MSP.2020.2975749

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, Apr. 2017, pp. 1273–1282. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.mlr.press/v54/mcmahan17a.html

N. Rieke et al., “The future of digital health with federated learning,” NPJ Digit. Med., vol. 3, p. 119, 2020, DOI: https://doi.org/10.1038/s41746-020-00323-1.

T. Yang et al., “Applied Federated Learning: Improving Google Keyboard Query Suggestions,” Dec. 07, 2018, arXiv: arXiv:1812.02903. DOI: https://doi.org/10.48550/arXiv.1812.02903.

S. R. Pokhrel and J. Choi, “Federated Learning With Blockchain for Autonomous Vehicles: Analysis and Design Challenges,” IEEE Trans. Commun., vol. 68, no. 8, pp. 4734–4746, Aug. 2020, DOI: https://doi.org/10.1109/TCOMM.2020.2990686.

W. Y. B. Lim et al., “Federated Learning in Mobile Edge Networks: A Comprehensive Survey,” IEEE Commun. Surv. Tutor., vol. 22, no. 3, pp. 2031–2063, 2020, DOI: https://doi.org/10.1109/COMST.2020.2986024.

L. Zhu, Z. Liu, and S. Han, “Deep Leakage from Gradients,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2019. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.neurips.cc/paper/2019/hash/60a6c4002cc7b29142def8871531281a-Abstract.html

V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, “A survey on security and privacy of federated learning,” Future Gener. Comput. Syst., vol. 115, pp. 619–640, Feb. 2021, DOI: https://doi.org/10.1016/j.future.2020.10.007.

T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated Optimization in Heterogeneous Networks,” Proc. Mach. Learn. Syst., vol. 2, pp. 429–450, Mar. 2020. [Online]. Available: https://proceedings.mlsys.org/paper_files/paper/2020/hash/1f5fe83998a09396ebe6477d9475ba0c-Abstract.html

S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic Controlled Averaging for Federated Learning,” in Proceedings of the 37th International Conference on Machine Learning, in ICML’20, vol. 119. JMLR.org, Jul. 2020, pp. 5132–5143. Accessed: Feb. 13, 2026. [Online]. Available: https://dl.acm.org/doi/10.5555/3524938.3525414

J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2020, pp. 7611–7623. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/564127c03caab942e503ee6f810f54fd-Abstract.html

X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the Convergence of FedAvg on Non-IID Data,” presented at the Eighth International Conference on Learning Representations, Apr. 2020. Accessed: Feb. 13, 2026. [Online]. Available: https://iclr.cc/virtual_2020/poster_HJxNAnVtDS.html

S. J. Reddi et al., “Adaptive Federated Optimization,” presented at the International Conference on Learning Representations, 2021. [Online]. Available: https://iclr.cc/virtual/2021/poster/2691

K. Mishchenko, G. Malinovsky, S. Stich, and P. Richtarik, “ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!,” in Proceedings of the 39th International Conference on Machine Learning, PMLR, Jun. 2022, pp. 15750–15769. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.mlr.press/v162/mishchenko22b.html

S. U. Stich, “Local SGD Converges Fast and Communicates Little,” presented at the International Conference on Learning Representations, Sep. 2018. Accessed: Feb. 13, 2026. [Online]. Available: https://openreview.net/forum?id=S1g2JnRcFX

G. Malinovskiy, D. Kovalev, E. Gasanov, L. Condat, and P. Richtarik, “From Local SGD to Local Fixed-Point Methods for Federated Learning,” in Proceedings of the 37th International Conference on Machine Learning, PMLR, Nov. 2020, pp. 6692–6701. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.mlr.press/v119/malinovskiy20a.html

S. P. Karimireddy et al., “Breaking the centralized barrier for cross-device federated learning,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2021, pp. 28663–28676. Accessed: Feb. 03, 2026. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2021/hash/f0e6be4ce76ccfa73c5a540d992d0756-Abstract.html

Y. J. Cho, J. Wang, and G. Joshi, “Towards Understanding Biased Client Selection in Federated Learning,” in Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR, May 2022, pp. 10351–10375. Accessed: Feb. 13, 2026. [Online]. Available: https://proceedings.mlr.press/v151/jee-cho22a.html

D. A. E. Acar, Y. Zhao, R. Matas, M. Mattina, P. Whatmough, and V. Saligrama, “Federated Learning Based on Dynamic Regularization,” presented at the International Conference on Learning Representations, Oct. 2020. Accessed: Feb. 13, 2026. [Online]. Available: https://openreview.net/forum?id=B7v4QMR6Z9w

J. Wang et al., “A Field Guide to Federated Optimization,” arXiv:2107.06917, 2021. Accessed: Feb. 13, 2026. [Online]. Available: https://arxiv.org/abs/2107.06917

Q. Li et al., “A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 4, pp. 3347–3366, Apr. 2023, DOI: https://doi.org/10.1109/TKDE.2021.3124599.

“Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics...” Accessed: Feb. 14, 2026. [Online]. Available: https://openreview.net/forum?id=7tSnVgHG7F

M. Ye, X. Fang, B. Du, P. C. Yuen, and D. Tao, “Heterogeneous Federated Learning: State-of-the-art and Research Challenges,” ACM Comput Surv, vol. 56, no. 3, p. 79:1-79:44, Oct. 2023, DOI: https://doi.org/10.1145/3625558.

H. Zhu, J. Xu, S. Liu, and Y. Jin, “Federated learning on non-IID data: A survey,” Neurocomputing, vol. 465, pp. 371–390, Nov. 2021, DOI: https://doi.org/10.1016/j.neucom.2021.07.098.

C. Chen, T. Liao, X. Deng, Z. Wu, S. Huang, and Z. Zheng, “Advances in Robust Federated Learning: A Survey With Heterogeneity Considerations,” IEEE Trans. Big Data, vol. 11, no. 3, pp. 1548–1567, Jun. 2025, DOI: https://doi.org/10.1109/TBDATA.2025.3527202.

M. J. Page et al., “PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews,” BMJ, vol. 372, p. n160, Mar. 2021, DOI: https://doi.org/10.1136/bmj.n160.

M. J. Page et al., “The PRISMA 2020 statement: an updated guideline for reporting systematic reviews,” BMJ, vol. 372, p. n71, Mar. 2021, DOI: https://doi.org/10.1136/bmj.n71.

H. Zhang, C. Li, W. Dai, J. Zou, and H. Xiong, “Federated Learning Based on Model Discrepancy and Variance Reduction,” IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 6, pp. 10407–10421, Jun. 2025, DOI: https://doi.org/10.1109/TNNLS.2024.3517658.

J. Xue and C. Wang, “FedSAGA: Composite Federated Learning with Inertial Douglas-Rachford Splitting and Variance Reduction Method,” in 2025 IEEE 31st International Conference on Parallel and Distributed Systems (ICPADS), Dec. 2025, pp. 1–9. DOI: https://doi.org/10.1109/ICPADS67057.2025.11322895.

T. Murata and T. Suzuki, “Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning,” in Proceedings of the 38th International Conference on Machine Learning, PMLR, Jul. 2021, pp. 7872–7881. Accessed: Feb. 07, 2026. [Online]. Available: https://proceedings.mlr.press/v139/murata21a.html

D. Jhunjhunwala, P. Sharma, A. Nagarkatti, and G. Joshi, “FedVARP: Tackling the Variance Due to Partial Client Participation in Federated Learning,” presented at the 38th Conference on Uncertainty in Artificial Intelligence, May 2022. Accessed: Feb. 07, 2026. [Online]. Available: https://openreview.net/forum?id=HlWLLdUocx5

C. T. Dinh, N. H. Tran, T. D. Nguyen, W. Bao, A. Y. Zomaya, and B. B. Zhou, “Federated Learning with Proximal Stochastic Variance Reduced Gradient Algorithms,” in Proceedings of the 49th International Conference on Parallel Processing, in ICPP ’20. New York, NY, USA: Association for Computing Machinery, Aug. 2020, pp. 1–11. DOI: https://doi.org/10.1145/3404397.3404457.

M. Crawshaw, Y. Bao, and M. Liu, “Federated Learning with Client Subsampling, Data Heterogeneity, and Unbounded Smoothness: A New Algorithm and Lower Bounds,” Adv. Neural Inf. Process. Syst., vol. 36, pp. 6467–6508, Dec. 2023. https://proceedings.neurips.cc/paper_files/paper/2023/hash/14ecbfb2216bab76195b60bfac7efb1f-Abstract-Conference.html

K. Oko, S. Akiyama, T. Murata, and T. Suzuki, “Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method,” presented at the OPT 2022: Optimization for Machine Learning (NeurIPS 2022 Workshop), Nov. 2022. Accessed: Feb. 12, 2026. [Online]. Available: https://openreview.net/forum?id=pYBZZzbJtE

P. Mangold et al., “SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning,” in Proceedings of the 38th International Conference on Neural Information Processing Systems, in NIPS ’24, vol. 37. Red Hook, NY, USA: Curran Associates Inc., Dec. 2024, pp. 13927–13981. [Online]. Available: https://dl.acm.org/doi/10.5555/3737916.3738363

K. Oko, S. Akiyama, D. Wu, T. Murata, and T. Suzuki, “SILVER: Single-loop variance reduction and application to federated learning,” in Proceedings of the 41st International Conference on Machine Learning, PMLR, Jul. 2024, pp. 38683–38739. Accessed: Feb. 12, 2026. [Online]. Available: https://proceedings.mlr.press/v235/oko24a.html

Y. Liu, X. Yang, X. Chen, H. Du, and L. Xu, “FedNCV: Optimizing Federated Learning With Networked Control Variates,” Trans. Emerg. Telecommun. Technol., vol. 36, no. 11, p. e70287, 2025, DOI: https://doi.org/10.1002/ett.70287.

O. Mashaal, S. Baadel, and H. AbouZied, “Extending Control Theory into Federated Learning Data Heterogeneity Problem,” in 2024 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), Feb. 2024, pp. 1–4. DOI: https://doi.org/10.1109/AIMS61812.2024.10512401.

R. Dai et al., “FedGAMMA: Federated Learning With Global Sharpness-Aware Minimization,” IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 12, pp. 17479–17492, Dec. 2024, DOI: https://doi.org/10.1109/TNNLS.2023.3304453.

X. Feng, M. P. Laiu, and T. Strohmer, “FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration,” presented at the Second Conference on Parsimony and Learning (Proceedings Track), Mar. 2025. Accessed: Feb. 03, 2026. [Online]. Available: https://openreview.net/forum?id=OoYcaWhfwB#discussion

Y. Xu, W. Ma, C. Dai, Y. Wu, and H. Zhou, “Generalized Federated Learning via Gradient Norm-Aware Minimization and Control Variables,” Mathematics, vol. 12, no. 17, p. 2644, Jan. 2024, DOI: https://doi.org/10.3390/math12172644.

Z. Cheng, X. Huang, P. Wu, and K. Yuan, “Momentum Benefits Non-iid Federated Learning Simply and Provably,” Int. Conf. Learn. Represent., vol. 2024, pp. 9815–9848, May 2024. https://proceedings.iclr.cc/paper_files/paper/2024/hash/291d92e9097276da23f9aba4a45a5041-Abstract-Conference.html

S. Wang, Y. Xu, Z. Wang, T.-H. Chang, T. Q. S. Quek, and D. Sun, “Beyond ADMM: A Unified Client-Variance-Reduced Adaptive Federated Learning Framework,” Proc. AAAI Conf. Artif. Intell., vol. 37, no. 8, pp. 10175–10183, Jun. 2023, DOI: https://doi.org/10.1609/aaai.v37i8.26212.

W. Yan, K. Zhang, X. Wang, and X. Cao, “Problem-Parameter-Free Federated Learning,” presented at the Thirteenth International Conference on Learning Representations, Oct. 2024. Accessed: Feb. 07, 2026. [Online]. Available: https://openreview.net/forum?id=ZuazHmXTns

Y. Yu, S. P. Karimireddy, Y. Ma, and M. I. Jordan, “Scaff-PD: Communication-Efficient Fair and Robust Federated Learning,” Jul. 2023, DOI: https://doi.org/10.48550/arXiv.2307.13381.

X. Wang, L. Tian, C. Yang, F. Lin, and M. Li, “Enhancing Federated Learning under Partial Participation via Proportional Variance Reduction,” in 2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Nov. 2025, pp. 344–351. DOI: https://doi.org/10.1109/Trustcom66490.2025.00045.

M. Crawshaw and M. Liu, “Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis,” in Advances in Neural Information Processing Systems, 2024. [Online]. Available: https://dl.acm.org/doi/10.5555/3737916.3738181

B. Li, M. N. Schmidt, T. S. Alstrøm, and S. U. Stich, “On the Effectiveness of Partial Variance Reduction in Federated Learning with Heterogeneous Data,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2023, pp. 3964–3973. DOI: https://doi.org/10.1109/CVPR52729.2023.00386.

I. Iwan, B. N. Yahya, and S.-L. Lee, “Federated Model with Contrastive Learning and Adaptive Control Variates for Human Activity Recognition,” Front. Inf. Technol. Electron. Eng., vol. 26, no. 6, pp. 896–911, Jun. 2025, DOI: https://doi.org/10.1631/FITEE.2400797.

N. Moneesh, M. S, D. G. Nair, and J. J. Nair, “FedHybrid: Unifying Aggregation Strategies to Optimise Federated Learning on Non-IID Datasets,” Procedia Comput. Sci., vol. 258, pp. 3126–3134, Jan. 2025, DOI: https://doi.org/10.1016/j.procs.2025.04.570

Q. Zhang et al., “Robust Federated Fuzzy C-Means Algorithm in Heterogeneous Scenarios,” IEEE Trans. Fuzzy Syst., vol. 33, no. 9, pp. 3168–3181, Sep. 2025, DOI: https://doi.org/10.1109/TFUZZ.2025.3584697.

W.-T. Chang, M. Seif, and R. Tandon, “Differentially Private Federated Learning with Drift Control,” in 2022 56th Annual Conference on Information Sciences and Systems (CISS), Mar. 2022, pp. 240–245. DOI: https://doi.org/10.1109/CISS53076.2022.9751200.

W. Ding, H. Guo, Z. Yan, and M. Wang, “Secure Federated Learning Schemes Based on Multi-Key Homomorphic Encryption,” in 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Dec. 2024, pp. 551–558. DOI: https://doi.org/10.1109/TrustCom63139.2024.00091.

T. Gafni, K. Cohen, and Y. C. Eldar, “Federated Learning from Heterogeneous Data via Controlled Air Aggregation with Bayesian Estimation,” IEEE Trans. Signal Process., vol. 72, pp. 1928–1943, 2024, DOI: https://doi.org/10.1109/TSP.2024.3351469.

X. Huang, P. Li, and X. Li, “Stochastic Controlled Averaging for Federated Learning with Communication Compression,” presented at the Twelfth International Conference on Learning Representations, Oct. 2023. Accessed: Feb. 07, 2026. [Online]. Available: https://openreview.net/forum?id=jj5ZjZsWJe

M. G. Rahmat and M. Khalilian, “A Novel Pearson Correlation-Based Merging Algorithm for Robust Distributed Machine Learning with Heterogeneous Data,” Jan. 24, 2025, arXiv: arXiv:2501.11112. DOI: https://doi.org/10.48550/arXiv.2501.11112.

H. Yang, Z. Liu, X. Zhang, and J. Liu, “SAGDA: Achieving O(ε⁻²) Communication Complexity in Federated Min-Max Learning,” Adv. Neural Inf. Process. Syst., vol. 35, pp. 7142–7154, Dec. 2022, Accessed: Feb. 03, 2026. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/hash/2f13806d6580db60d9d7d6f89ba529ca-Abstract-Conference.html

P. Mangold, A. O. Durmus, A. Dieuleveut, and E. Moulines, “Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up,” presented at the Forty-second International Conference on Machine Learning, Jun. 2025. Accessed: Feb. 03, 2026. [Online]. Available: https://openreview.net/forum?id=2XvOJvUlKc
