A Multiple Gradient Descent Design for Multitask Learning on Edge Computing: A Multiobjective Machine Learning Approach
Abstract: Multitask learning is widely used in machine learning modeling, where commonalities and differences across multiple tasks are exploited. However, multiple conflicting objectives often occur in multitask learning. Conventionally, a common compromise is to minimize the weighted sum of the objectives, which may be invalid when the objectives compete. In this paper, a novel multiobjective machine learning approach is proposed to address this challenge by formulating multitask learning as multiobjective optimization. To address the high computational cost of multiobjective evolutionary algorithms, a multi-gradient descent algorithm is introduced for the multiobjective machine learning problem, in which an innovative gradient-based optimization is leveraged to converge to an optimal solution of the Pareto set. Moreover, gradient surgery for the multi-gradient descent algorithm is proposed to obtain a stable Pareto optimal solution. As most edge computing devices have constrained computational resources, the proposed method is implemented to optimize the edge device's memory, computation, and communication demands. The proposed method is applied to the multiple license plate recognition problem. The experimental results show that the proposed method outperforms state-of-the-art learning methods and can successfully find solutions that balance the multiple objectives of the learning task over different datasets.
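As a rough illustration of the gradient-surgery idea mentioned in the abstract, the following PCGrad-style sketch projects each task gradient onto the normal plane of any gradient it conflicts with (negative dot product) before summing. The function name and the exact projection rule are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def gradient_surgery(grads):
    """PCGrad-style sketch (assumed rule, not the paper's exact method):
    when two task gradients conflict (negative dot product), remove the
    conflicting component before summing the altered gradients."""
    surgically_altered = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, h in enumerate(grads):
            if i == j:
                continue
            dot = g @ h
            if dot < 0.0:                      # conflicting gradient
                g = g - (dot / (h @ h)) * h    # project onto h's normal plane
        surgically_altered.append(g)
    return np.sum(surgically_altered, axis=0)  # combined update direction
```

For example, with `g1 = [1, 0]` and `g2 = [-1, 1]`, the conflicting components along each other are removed, yielding the combined direction `[0.5, 1.5]` instead of the naive sum `[0, 1]`.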
- We now propose an efficient method that optimizes an upper bound of the objective and requires only a single backward pass. We further show that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. The architectures we address conjoin a shared representation function with task-specific decision functions.
- We use a similar construction. For each image, a different image is chosen uniformly at random. Then one of these images is placed at the top-left and the other at the bottom-right. The resulting tasks are: classifying the digit on the top-left (task-L) and classifying the digit on the bottom-right (task-R). We use 60K examples and directly apply existing single-task MNIST models.
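The overlay construction above can be sketched as follows. The helper name, the 32x32 canvas size, and the 4-pixel offset are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def make_multimnist(images, labels, rng=None):
    """Sketch of the two-digit overlay construction (hypothetical helper).
    Each 28x28 digit is paired with a uniformly random partner; the first
    is drawn at the top-left and the partner at the bottom-right of a
    32x32 canvas (canvas size and offset are assumptions)."""
    rng = np.random.default_rng(rng)
    n = len(images)
    partners = rng.integers(0, n, size=n)           # uniform random partner
    canvas = np.zeros((n, 32, 32), dtype=images.dtype)
    canvas[:, :28, :28] = images                    # task-L digit: top-left
    canvas[:, 4:, 4:] = np.maximum(canvas[:, 4:, 4:],
                                   images[partners])  # task-R digit: bottom-right
    # returns (overlaid image, task-L label, task-R label)
    return canvas, labels, labels[partners]
```

Overlapping pixels are merged with a pointwise maximum, so both digits stay legible where they intersect.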
- In this paper, we assume that each edge server has the same limited resources to handle requests from mobile vehicles, i.e., each edge server has the same processing power, and these servers are placed at certain BS locations for mobile-vehicle access.
- We treat multiple vehicles as multiple computing tasks. The multiple-computing-task scheduling problem is analogous to the multitask learning (MTL) model.
- Even real-world tasks that appear to be unrelated have strong dependencies, due to the shared processes that produce the data.
- This makes learning multiple tasks jointly an inductive bias in the learning model.
- Algorithms proposed in the computational geometry literature address the problem of finding minimum-norm points in the convex hull of a large number of points in a low-dimensional space (typically of dimensionality 2 or 3).
- In our setting, the number of points is the number of tasks and is typically low; in contrast, the dimensionality is the number of shared parameters and can be in the millions.
- On the accuracy side, we expect both methods to perform similarly as long as the full-rank assumption is satisfied.
- As expected, the accuracy of both methods is very similar.
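For the two-task case, the minimum-norm point in the convex hull of the gradients discussed above has a closed form. The sketch below, with an assumed function name, finds the coefficient analytically rather than via an iterative solver:

```python
import numpy as np

def min_norm_two(g1, g2):
    """Closed-form minimum-norm point in the convex hull of two gradients
    (the two-task case of the min-norm subproblem; name is an assumption).
    Minimizes ||a*g1 + (1-a)*g2||^2 over a in [0, 1]."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                 # identical gradients: any a works
        return g1.copy()
    a = np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2
```

For orthogonal unit gradients `[1, 0]` and `[0, 1]`, the minimizer is the midpoint `[0.5, 0.5]`; when one gradient dominates the other along the same direction, the coefficient clips to an endpoint and the shorter gradient is returned.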
The multi-gradient descent algorithm (MGDA) takes advantage of the gradient of each task and solves the optimization problem to determine updates over the shared parameters. However, large-scale application of MGDA remains impractical due to two technical issues. (1) The underlying optimization problem does not scale well to high-dimensional gradients, which arise naturally in deep networks. (2) The algorithm requires an explicit gradient for each task, which increases the number of backward passes linearly with the number of tasks and multiplies training time accordingly.
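The second issue can be illustrated with a toy example. The quadratic losses and helper name below are assumptions chosen so the per-task gradients are analytic; in a deep network each entry of the list would cost a full backward pass.

```python
import numpy as np

def per_task_gradients(theta, targets):
    """Toy illustration (assumed losses): task t has
    loss_t(theta) = 0.5 * ||theta - target_t||^2, so its gradient is
    theta - target_t. MGDA must compute one such gradient per task --
    one backward pass each in a deep network -- before solving the
    min-norm subproblem for the combination weights."""
    return [theta - t for t in targets]

theta = np.zeros(3)
targets = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([0.0, 0.0, 1.0])]
grads = per_task_gradients(theta, targets)   # 3 tasks -> 3 gradients
```

The gradient count, and hence the backward-pass cost, grows linearly with the number of tasks, which is exactly the bottleneck the single-backward-pass upper bound mentioned earlier is designed to avoid.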
