Integrating an Ensemble Surrogate Model’s Estimation into Test Data Generation

Abstract

For path coverage testing of Message-Passing Interface (MPI) programs, test data generation based on an evolutionary optimization algorithm (EOA) is a well-known technique. However, this technique must evaluate the fitness of each evolutionary individual by executing the program, which is generally computationally expensive. To reduce this cost, this paper proposes a method that integrates an ensemble surrogate model’s estimation into the process of generating test data. The proposed method first produces a number of test inputs using an EOA and forms a training set from these inputs together with their real fitness. It then trains an ensemble surrogate model (ESM) on the training set and uses it to estimate the fitness of each individual. Finally, a small number of individuals with good estimates are selected to execute the program, which yields their real fitness for the subsequent evolution. This paper applies the proposed method to seven benchmark MPI programs and compares it with several state-of-the-art approaches. The experimental results show that the proposed method generates test data at a significantly lower computational cost.
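The steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `real_fitness` is a hypothetical stand-in for executing the MPI program, each ensemble member is a simple Gaussian-kernel weighted average swapped in for a properly trained surrogate, and a random population stands in for one EOA generation. All names and parameter values are invented for illustration.

```python
import math
import random


def real_fitness(x):
    """Hypothetical stand-in for running the MPI program under test.

    Lower is better; 0 would mean the target path is covered.
    """
    return sum((xi - 3.0) ** 2 for xi in x)


class KernelMember:
    """One ensemble member: Gaussian-kernel weighted average of known fitness.

    This is an assumed simplification, not the surrogate used in the paper.
    """

    def __init__(self, samples, width):
        self.samples = list(samples)  # pairs of (input, real fitness)
        self.width = width

    def predict(self, x):
        num = den = 0.0
        nearest = None
        for xs, fs in self.samples:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xs))
            if nearest is None or d2 < nearest[0]:
                nearest = (d2, fs)
            w = math.exp(-d2 / (2.0 * self.width ** 2))
            num += w * fs
            den += w
        # Fall back to the nearest sample if every weight underflows to zero.
        return num / den if den > 0.0 else nearest[1]


def train_ensemble(training_set, widths, rng):
    """Build one member per width, each on a bootstrap resample for diversity."""
    members = []
    for width in widths:
        resample = [rng.choice(training_set) for _ in training_set]
        members.append(KernelMember(resample, width))
    return members


def estimate(members, x):
    """Ensemble estimate: the mean of the member predictions."""
    return sum(m.predict(x) for m in members) / len(members)


def preselect(population, members, k):
    """Keep the k individuals with the best (lowest) estimated fitness."""
    return sorted(population, key=lambda x: estimate(members, x))[:k]


rng = random.Random(0)

# Step 1: initial inputs evaluated with the real (expensive) fitness.
training_set = []
for _ in range(30):
    x = [rng.uniform(0.0, 6.0) for _ in range(2)]
    training_set.append((x, real_fitness(x)))

# Step 2: train the ensemble surrogate on the training set.
members = train_ensemble(training_set, widths=[0.5, 1.0, 2.0], rng=rng)

# Step 3: estimate a new population and execute only the most promising few.
population = [[rng.uniform(0.0, 6.0) for _ in range(2)] for _ in range(50)]
chosen = preselect(population, members, k=5)
evaluated = [(x, real_fitness(x)) for x in chosen]
```

In this sketch only the five preselected inputs trigger a real execution; the other 45 are judged by the surrogate alone, which is where the cost saving would come from.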

Existing System

• We developed a simplified version of the Energy Exascale Earth System Model (E3SM) land model (ELM), called sELM, to simulate carbon cycle processes relevant for Earth system models in a computationally efficient framework.
• Surrogate modeling assisted by a neural network (NN) also suffers from high computational costs when applied to a large-scale problem with many QoIs.
• The sELM is a regional-scale terrestrial ecosystem model that simulates water, energy, and biogeochemical processes on terrestrial surfaces. Simulation with sELM is important for improving our understanding of ecosystem responses to climate change.

Disadvantages

• When executing Algorithm 4, the radial basis function networks (RBFNs) will not differ sufficiently from one another if the value of Np is too small. In that case, the ESM may over-fit or under-fit. Conversely, if Np is too large, substantial computational resources will be required to implement the ESM.
• These concurrent algorithms have not provided effective strategies for reducing the high computational cost of generating test data.
• To reduce the number of individual evaluations when solving an optimization problem, Sun et al. proposed a strategy for estimating the fitness of an individual based on the Euclidean distance.
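The idea of a distance-based fitness estimate, mentioned in the last item, can be illustrated with a nearest-neighbor rule. The source does not give the formula Sun et al. used, so the rule below is an assumption chosen for simplicity, and the function name and sample data are hypothetical.

```python
import math


def estimate_by_nearest(x, evaluated):
    """Return the real fitness of the evaluated individual closest to x.

    Assumed rule for illustration only: an unevaluated individual simply
    inherits the fitness of its Euclidean nearest neighbor.
    """
    _, nearest_fitness = min(evaluated, key=lambda pair: math.dist(x, pair[0]))
    return nearest_fitness


# Individuals whose fitness is already known from real executions.
evaluated = [([0.0, 0.0], 18.0), ([3.0, 3.0], 0.0), ([6.0, 0.0], 18.0)]

# A new individual is estimated without executing the program.
guess = estimate_by_nearest([2.5, 3.5], evaluated)
```

An estimate like this costs one pass over the evaluated individuals, which is typically far cheaper than a program execution, but its accuracy depends entirely on how densely the evaluated individuals cover the input space.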

Proposed System

• We use very few simulation model runs to build an accurate, quickly evaluated surrogate system for a large-scale problem, based on advanced machine-learning methods.
• We propose using singular value decomposition (SVD) to reduce the model output dimensions and to improve the computational efficiency of both building and evaluating the surrogates.
• We propose an SVD-enhanced, Bayesian-optimized, NN-based surrogate method that aims to build an accurate and fast-to-evaluate surrogate system of a large-scale model from few model runs, improving computational efficiency in surrogate modeling and thus advancing data–model integration.
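The output-dimension reduction in the second item can be illustrated as follows. The sketch computes the leading singular vectors of a matrix of model outputs (one row per model run) and then represents each output vector by a few coefficients, which is what a surrogate would then need to predict. It is a pure-Python illustration that uses power iteration with deflation; in practice a library routine such as `numpy.linalg.svd` would be the usual choice. The function names and the toy matrix are assumptions made for the example.

```python
import math
import random


def matvec(A, v):
    """Compute A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def rmatvec(A, u):
    """Compute A^T u."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplets(Y, rank, iters=200, seed=0):
    """Leading (sigma, u, v) triplets of Y via power iteration with deflation."""
    rng = random.Random(seed)
    R = [row[:] for row in Y]  # residual matrix, deflated after each triplet
    triplets = []
    for _ in range(rank):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(R[0]))]
        nv = norm(v)
        v = [x / nv for x in v]
        for _ in range(iters):
            w = rmatvec(R, matvec(R, v))  # one step of (R^T R) v
            nw = norm(w)
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        Rv = matvec(R, v)
        sigma = norm(Rv)
        if sigma == 0.0:
            break  # nothing left to extract
        u = [x / sigma for x in Rv]
        triplets.append((sigma, u, v))
        # Remove the component just found before searching for the next one.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets


def encode(y, triplets):
    """Project a full output vector onto the retained right singular vectors."""
    return [sum(a * b for a, b in zip(y, v)) for _, _, v in triplets]


def decode(coeffs, triplets):
    """Rebuild a full output vector from its reduced coefficients."""
    n = len(triplets[0][2])
    return [sum(c * v[j] for c, (_, _, v) in zip(coeffs, triplets))
            for j in range(n)]


# Toy data: 6 model runs, 8 outputs each, built from only two patterns.
p = [1.0] * 8
q = [(-1.0) ** j for j in range(8)]
Y = [[(10.0 + i) * p[j] + (i - 2.5) * q[j] for j in range(8)] for i in range(6)]

triplets = top_singular_triplets(Y, rank=2)
coeffs = encode(Y[0], triplets)      # 8 outputs reduced to 2 coefficients
restored = decode(coeffs, triplets)  # mapped back to the 8 outputs
```

Because the toy outputs are combinations of two patterns, two coefficients are enough to restore each row up to rounding error. Real model outputs would not be exactly low-rank, so the number of retained components trades accuracy against the number of quantities the surrogate must learn.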

Advantages

• MPI programs offer high efficiency, good portability, and simple implementation.
• We proposed an effective method to improve the efficiency of generating test data by combining the advantages of both bug-driven and coverage-guided techniques.
• The proposed method is advantageous for generating test data that cover the target paths, provided that its operations (forming the training set, training an ESM, and selecting superior individuals to execute the program) improve the efficiency of generating test data.
