Multi-Agent DRL for Task Offloading and Resource Allocation in Multi-UAV Enabled IoT Edge Network

Abstract: The Internet of Things (IoT) edge network connects a large number of heterogeneous smart devices, and unmanned aerial vehicles (UAVs) are enabling groundbreaking new applications in these networks. Limited computational capacity and energy availability are major factors hindering the performance of edge user equipment (UE) and IoT devices in IoT edge networks. Moreover, the edge base station (BS) hosting the computation server is subject to massive traffic and is vulnerable to disasters. The UAV is a promising technology that provides aerial base stations (ABSs) to assist the edge network by enhancing ground network performance, extending network coverage, and offloading computationally intensive tasks from UEs and IoT devices. In this paper, we deploy a clustered multi-UAV system to provide computation task offloading and resource allocation services to IoT devices. We propose a multi-agent deep reinforcement learning (MADRL)-based approach to minimize the overall network computation cost while ensuring the quality-of-service (QoS) requirements of IoT devices and UEs in the IoT network. We formulate the problem as a natural extension of the Markov decision process (MDP) to a stochastic game, with the objective of minimizing the long-term computation cost in terms of energy and delay. We consider stochastic, time-varying UAV channel strength and dynamic resource requests to obtain optimal computation offloading and resource allocation policies in the air-to-ground (A2G) network infrastructure. Simulation results show that our proposed MADRL method reduces the average cost by 38.643% and 55.621%, and increases the reward by 58.289% and 85.289%, compared with single-agent DRL and heuristic schemes, respectively.
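As an illustrative sketch of the computation-cost objective described above (a weighted sum of task delay and energy for local versus UAV-offloaded execution; the function names, weights, and parameter values below are assumptions for illustration, not the paper's exact model):

    # Illustrative per-task computation cost: weighted sum of delay and energy.
    # All parameters (CPU cycles, frequencies, transmit power, data rate, weights)
    # are assumed placeholders, not values from the paper.

    def local_cost(cycles, f_local, kappa=1e-27, w_delay=0.5, w_energy=0.5):
        """Cost of executing a task on the IoT device itself."""
        delay = cycles / f_local                     # seconds
        energy = kappa * f_local ** 2 * cycles       # dynamic CPU energy model
        return w_delay * delay + w_energy * energy

    def offload_cost(cycles, data_bits, rate, p_tx, f_uav, w_delay=0.5, w_energy=0.5):
        """Cost of offloading a task to a UAV-mounted edge server."""
        t_up = data_bits / rate                      # uplink transmission delay
        t_exec = cycles / f_uav                      # execution delay at the UAV
        energy = p_tx * t_up                         # device-side transmit energy
        return w_delay * (t_up + t_exec) + w_energy * energy

    # Example: a 1-Mbit task requiring 1e9 CPU cycles
    print(local_cost(1e9, f_local=1e9))
    print(offload_cost(1e9, data_bits=1e6, rate=5e6, p_tx=0.1, f_uav=5e9))

An offloading decision would then favor whichever option yields the lower cost for the device, subject to the QoS constraints mentioned above.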
EXISTING SYSTEM:
• The existing Q-learning method works only for limited environments with small action, decision, and reward spaces.
• Most existing works focus on resource allocation, while time scheduling has seldom been studied.
• Channel selection and fractional spectrum access are treated as the resources to be managed; however, existing works do not use an actual dataset and focus only on the user-service request queue.
• If an existing resource management (RM) process has not finished, the new request is appended to the end of the queue.
• It is difficult for the Q-learning method to find the optimal solution in a large environment, as illustrated by the sketch below.
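A minimal sketch of why tabular Q-learning breaks down in large environments (the state/action sizes and update rule are a generic textbook illustration, not this system's actual implementation):

    import numpy as np

    # Tabular Q-learning keeps one entry per (state, action) pair, so the table
    # grows multiplicatively with the state and action spaces -- impractical for
    # the large, continuous UAV/IoT environment. Sizes below are illustrative.
    n_states, n_actions = 10_000, 50
    Q = np.zeros((n_states, n_actions))          # 500k entries for a toy problem

    alpha, gamma = 0.1, 0.95                     # learning rate, discount factor

    def q_update(s, a, reward, s_next):
        """Standard one-step Q-learning update."""
        td_target = reward + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])

    # Example update for a single observed transition
    q_update(s=0, a=3, reward=1.0, s_next=42)

With continuous channel states and resource requests, the table cannot enumerate all states at all, which motivates the function-approximation approaches below.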
DISADVANTAGE:
• Some works have investigated computation offloading to MEC servers and resource allocation for IoT devices to maximize network performance and optimize the problem in ultra-dense heterogeneous networks.
• The need for the proposed DDPG algorithm stems from the problem's complexity: the Markov decision process (MDP) involves large-dimensional continuous state and action spaces in a dynamic environment.
• We formulate an optimization problem to minimize the computation costs (energy consumption and delay) and to allocate resources (power and computational resources) so as to satisfy the QoS of EIoT devices.
• One reliable way to address the curse of dimensionality is a strong function approximation technique; a minimal actor-critic sketch follows below.
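A minimal sketch of deep function approximation for large continuous state and action spaces, in the spirit of a DDPG actor-critic pair (PyTorch is assumed; all layer sizes, dimensions, and names are illustrative, not the paper's architecture):

    import torch
    import torch.nn as nn

    # Minimal actor-critic pair of the kind used by DDPG: the actor maps a
    # continuous state to a continuous action (e.g., offloading ratio, power,
    # CPU share), and the critic scores state-action pairs. Dimensions are
    # illustrative placeholders.

    class Actor(nn.Module):
        def __init__(self, state_dim=20, action_dim=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, action_dim), nn.Sigmoid(),  # actions in [0, 1]
            )

        def forward(self, state):
            return self.net(state)

    class Critic(nn.Module):
        def __init__(self, state_dim=20, action_dim=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, 1),                         # Q(s, a) estimate
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    # Example forward pass with a batch of 4 random states
    actor, critic = Actor(), Critic()
    s = torch.randn(4, 20)
    a = actor(s)
    q = critic(s, a)
    print(a.shape, q.shape)  # torch.Size([4, 3]) torch.Size([4, 1])

Because the actor outputs continuous actions directly, no discretization of power or CPU allocation is needed, which is what lets DDPG cope with the dimensionality that defeats tabular methods.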
PROPOSED SYSTEM:
• A policy-based REINFORCE algorithm is proposed for the task scheduling problem, and a fully connected neural network (FCN) is used to extract features; a minimal REINFORCE sketch follows this list.
• In prior work, a general online scheduling model was proposed to minimize the task response time when tasks are offloaded to edge servers.
• In prior work, a dual-scheduling framework for heterogeneous vehicular edge computing was proposed to adapt to the unstable capacity of servers and the task arrival rate.
• In prior work, the joint problem of task allocation and time scheduling was formulated as a mixed-integer program (MIP), and a logic-based Benders decomposition (LBBD) approach was proposed to maximize the number of admitted tasks.
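A minimal sketch of the policy-based REINFORCE update with a fully connected policy network mentioned in the first item above (the dimensions, hyperparameters, and trajectory are illustrative assumptions, not the proposed system's actual code):

    import torch
    import torch.nn as nn

    # Minimal REINFORCE: a fully connected policy network outputs a distribution
    # over discrete scheduling actions; after an episode, the policy gradient is
    # the return-weighted log-probability of the actions taken. Dimensions and
    # the (state, action, reward) trajectory are illustrative placeholders.

    policy = nn.Sequential(
        nn.Linear(20, 128), nn.ReLU(),
        nn.Linear(128, 5),          # logits over 5 candidate scheduling actions
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    gamma = 0.99

    def reinforce_update(states, actions, rewards):
        """One policy-gradient step from a single episode's trajectory."""
        # Discounted returns G_t computed backwards over the episode
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)

        logits = policy(states)
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        loss = -(log_probs * returns).mean()   # maximize expected return

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Example with a random 3-step trajectory
    reinforce_update(torch.randn(3, 20), torch.tensor([0, 2, 1]), [1.0, 0.5, 2.0])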
ADVANTAGE:
• We evaluate the proposed framework's performance in terms of convergence, time-delay, energy consumption, and UAV resource usage by EIoT devices.
• We compare the convergence of greedy-based, DQN-based, A3C-based, and DDPG-based computation offloading and resource allocation in terms of average rewards; a minimal smoothing sketch follows this list.
• The A3C scheme performs better than the DQN, greedy, and LC schemes but slightly worse than the DDPG scheme, which is used to handle the large-dimensional computation in our scenario.
• We evaluate the dynamic computation offloading and resource allocation performance of the proposed DDPG scheme in an emergency mIoT network for minimizing the computation cost of EIoT devices.
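A minimal sketch of the reward smoothing typically used when comparing convergence curves across such schemes (the reward trace below is a random placeholder, not a simulation result):

    import numpy as np

    # Smoothed average reward per episode, the usual basis for comparing the
    # convergence of greedy, DQN, A3C, and DDPG agents. The reward arrays
    # would come from each agent's training run; nothing here is a result.

    def moving_average(rewards, window=50):
        """Running mean over a window of episodes for a readable curve."""
        rewards = np.asarray(rewards, dtype=float)
        kernel = np.ones(window) / window
        return np.convolve(rewards, kernel, mode="valid")

    # Example usage with a placeholder trace of 1000 episode rewards
    placeholder_trace = np.random.default_rng(0).standard_normal(1000)
    smoothed = moving_average(placeholder_trace)
    print(smoothed.shape)  # (951,)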
