Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge
ABSTRACT :
We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of devices to transmit at each round and on how the resources should be allocated among the participating devices, based not only on their channel conditions but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides better long-term performance than scheduling policies based on either of the two metrics individually. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
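The update-aware scheduling idea above can be sketched as follows: score each device by a combination of its channel quality and the magnitude of its local model update, then schedule the top-K devices. This is a minimal illustrative sketch; the function name, the normalization, and the convex-combination scoring rule with weight `alpha` are assumptions, not the paper's exact policy.

```python
import numpy as np

def schedule_devices(channel_gains, update_norms, K, alpha=0.5):
    """Pick K devices by a convex combination of normalized channel
    quality and local-update significance (illustrative sketch only)."""
    g = np.asarray(channel_gains, dtype=float)
    u = np.asarray(update_norms, dtype=float)
    # Normalize both metrics to [0, 1] so neither dominates by scale.
    g_n = (g - g.min()) / (np.ptp(g) + 1e-12)
    u_n = (u - u.min()) / (np.ptp(u) + 1e-12)
    scores = alpha * g_n + (1 - alpha) * u_n
    # Indices of the K highest-scoring devices, best first.
    return np.argsort(scores)[-K:][::-1]
```

Setting `alpha = 1` recovers a channel-only policy and `alpha = 0` a significance-only policy, which is how the two baseline metrics in the abstract can be compared against the combined one.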
EXISTING SYSTEM :
• In addition to validating the theoretical convergence, our experiments also showed that the proposed algorithm can boost the convergence speed compared to an existing baseline approach.
• Contrary to most of these works, which make use of existing, standard FL algorithms, our work proposes a new one.
• Nevertheless, these works lack studies on unbalanced and heterogeneous data among UEs.
• We study how the computation and communication characteristics of UEs can affect their energy consumption, training time, convergence and accuracy level of FL, considering heterogeneous UEs in terms of data size, channel gain, and computational and transmission power capabilities.
DISADVANTAGE :
• In prior work, resource allocation across devices for FL over wireless channels is formulated as an optimization problem aiming to minimize the empirical learning loss function.
• We highlight that, compared to the channel conditions, scheduling based on the significance of the model updates has a greater impact on performance at the initial iterations, when the gradients are more aggressive.
• The frequency of participation of the devices has also been introduced as a device scheduling metric.
• Also, a device scheduling policy for FL over wireless channels has been studied with the goal of minimizing the training delay.
PROPOSED SYSTEM :
• In prior work, optimization over the batch size and wireless resources has been proposed to speed up FL.
• FL over a Gaussian multiple access channel (MAC) with limited bandwidth has also been studied, with novel digital and analog approaches proposed for the transmissions from the devices.
• We have proposed novel device scheduling algorithms that consider not only the channel conditions of the devices, but also the significance of their local model updates.
• Numerical results show that the proposed scheduling policy provides better long-term performance than scheduling policies based on either of the two metrics individually.
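Since the shared channel bandwidth is fixed, each scheduled device's bit budget shrinks as more devices participate, so each device must compress its update accordingly. The sketch below uses top-k sparsification (keep only the largest-magnitude entries) as one hypothetical compression scheme; the function name, the bit-accounting of one value plus one index per kept entry, and the scheme itself are assumptions, not necessarily the paper's method.

```python
import numpy as np

def sparsify_update(update, total_bits, num_scheduled, bits_per_entry=32):
    """Keep only the largest-magnitude entries of a local model update so it
    fits this device's share of the shared bit budget (illustrative sketch)."""
    budget = total_bits // num_scheduled          # bits available to this device
    k = max(1, budget // (2 * bits_per_entry))    # value + index per kept entry
    flat = np.asarray(update, dtype=float).ravel()
    keep = np.argsort(np.abs(flat))[-k:]          # indices of largest entries
    sparse = np.zeros_like(flat)
    sparse[keep] = flat[keep]
    return sparse.reshape(np.shape(update))
```

Note how `budget` decreases as `num_scheduled` grows, which is the trade-off behind the convergence result: more participating devices means coarser updates from each one.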
ADVANTAGE :
• To the best of our knowledge, this is the first convergence result evaluating the performance of FL as a function of the number of scheduled devices at each round, as well as the number of bits each participating device can transmit, which reduces with the number of participating devices.
• The goal is to identify the set of scheduled devices at each iteration that results in the best performance.
• We compare the performance of different scheduling policies for the i.i.d. data distribution scenario. The goal here is to find the value of K resulting in the best performance for each scheduling policy.
• As can be seen, for all the scheduling policies, unlike in the i.i.d. case, scheduling a single device results in instability of the learning performance, appearing as fluctuations in the accuracy levels over iterations.