Making Frequent-Pattern Mining Scalable, Efficient, and Compact on Nonvolatile Memories

Abstract

Frequent-pattern mining is a common means to reveal the hidden trends behind data. However, most frequent-pattern mining algorithms are designed for DRAM, instead of nonvolatile memories (NVMs) which are preferred by energy-limited systems.Due to the huge differences between the characteristics of NVMs and those of DRAM, existing frequent-pattern mining algorithms encounter the issues of write amplification and energy waste when they are run on NVMs.Moreover, the design complexity is exaggerated when parallel computing architecture is introduced to speedup the mining process. A scalable, time-efficient, and energy-economic solution to the frequent-pattern mining problem is thus urgently needed. By considering the NVM characteristics, Pev FP-tree accelerates the mining process and enhances the energy efficiency, as compared to a straight for-ward design of FP-trees on the parallel architecture. Moreover, Pev FP-tree offers superior scalability in terms of the degrees of parallelism of the mining algorithm and the branching factor of its tree structure.Observing that keys are often sparsely distributed in FP-trees, we also propose a compression technique to PevFP-tree, namely compressed Pev FP-tree (CpevFP-tree), which further enhances the time and energy efficiencies of PevFP-tree. The proposed PevFP-tree and CpevFP-tree are evaluated by a series of experiments based on realistic datasets from diversified application scenarios, where CpevFP-tree achieves 88.73% of performance improvements over a straightforward design of FP-trees in the parallel architecture, and 79.47% of performance improvements over PevFP-tree, on average.

Existing System

? By considering the NVM characteristics, Pev FP-tree accelerates the mining process and enhances the energy efficiency, as compared to a straight for-ward design of FP-trees on the parallel architecture. ? Moreover, Pev FP-tree offers superior scalability in terms of the degrees of parallelism of the mining algorithm and the branching factor of its tree structure. ? Observing that keys are often sparsely distributed in FP-trees, we also propose a compression technique to PevFP-tree, namely compressed Pev FP-tree (CpevFP-tree), which further enhances the time and energy efficiencies of PevFP-tree. ? The proposed PevFP-tree and CpevFP-tree are evaluated by a series of experiments based on realistic datasets from diversified application scenarios, where CpevFP-tree achieves 88.73% of performance improvements over a straightforward design of FP-trees in the parallel architecture, and 79.47% of performance improvements over PevFP-tree, on average.

Disadvantages

? This work aims to greatly enhance the scalability of in-memory frequent-pattern mining, there exist other solutions to the scalability problem, such as implementing the frequent-pattern mining methods with Map Reduce over a disk-based Hadoop data analytics framework. ? The problem could be further exacerbated with the skewed write performance and energy consumption of NVMs. ? when the to-be-mined dataset grows rapidly with time and the number of children of an FP-tree node could become unlimited. ? Based on EvFP-tree, this work proposes effectivemeans to extend the FP-tree structure so that the scalabilityissue could be solved.

Proposed System

? we propose a compact, low-overhead, and yet flexible in-memory interconnect architecture that efficiently implements routing for next-state activation, and can be applied to the existing in-memory automata processing architectures. ? We use SRAM 8Tsubarrays to evaluate our interconnect. ? We present a new place-and-route algorithm that is 1-2orders of magnitude faster than the AP compiler, and pro-vide an open-source cycle-accurate automata simulator toper form software optimization on the automata and map them to the proposed architecture. ? We develop an in-house cycle-accurate automata simulator1to perform software optimization on the automata, map them to the proposed architecture, and extract per-cycle statistics.

Advantages

? NVMs provide outstanding access performance and energy consumption as compared with DRAM or hard disk, revision of existing algorithms designed for DRAM or hard disk is often necessary to adapt to the intrinsic characteristics of NVMs, so as to fully exploit the advantages of NVMs. ? This paper proposes parallel evergreen frequent-pattern tree (PevFP-tree), a highly scalable frequent-pattern mining method for multi-core processors and a shared main memory based on byte-addressable NVMs.

Download DOC Download PPT