Top-k algorithm based on sector coding search

      

ABSTRACT :

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects (e.g., top-k restaurants) with the highest overall scores aggregated from different attributes (e.g., price and location). However, emerging applications such as query recommendation need the best combinations of attributes rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combinations, because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which finds the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms that cover different data access methods (sorted accesses and random accesses) and different query certainties (exact query processing and approximate query processing). Theoretically, we prove that our algorithms are instance optimal and analyze bounds on the depth of accesses. We further develop optimizations for query evaluation that reduce the computational and memory costs as well as the number of accesses. We provide a case study on real applications of top-k, m queries in an online biomedical search engine. Finally, we perform comprehensive experiments on multiple real-life datasets to demonstrate the scalability and efficiency of the top-k, m algorithms.
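The query semantics, and the reason the straightforward extension is expensive, can be seen in a brute-force baseline. The following is a minimal sketch under hypothetical assumptions that are not taken from the paper. Each object is a set of attributes with a score. A combination is any r-subset of attributes, and an object belongs to a combination if it contains all of the combination's attributes. The overall score is the sum of the top-m object scores, and combinations with fewer than m objects are skipped. It is not the authors' algorithm; it enumerates every r-subset, which is exactly the exponential cost the abstract describes.

```python
import heapq
from itertools import combinations


def naive_top_k_m(objects, r, k, m):
    """Brute-force top-k, m baseline (hypothetical data model).

    objects: list of (frozenset_of_attributes, score) pairs.
    r: number of attributes per combination (assumption).
    Returns up to k (overall_score, combination) pairs, best first.
    """
    # All attributes that appear in any object.
    universe = sorted(set().union(*(attrs for attrs, _ in objects)))
    scored = []
    # Exponential step: enumerate every r-subset of the attributes.
    for combo in combinations(universe, r):
        combo_set = set(combo)
        # Objects that contain every attribute of the combination.
        scores = [s for attrs, s in objects if combo_set <= attrs]
        if len(scores) < m:
            continue  # assumption: a combination needs at least m objects
        # Overall score = sum of the m best object scores (assumed aggregate).
        scored.append((sum(heapq.nlargest(m, scores)), combo))
    return heapq.nlargest(k, scored)
```

For example, with objects `{a,b}:5`, `{a,b,c}:3`, `{a,c}:4`, `{b,c}:1` and `r=2, m=2, k=2`, the pair `(a, b)` scores 5 + 3 = 8 and `(a, c)` scores 4 + 3 = 7, so these two are returned.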

EXISTING SYSTEM :

• Top-k queries have been studied extensively in many areas, including relational databases, XML data and graph data.
• Prior work identifies two types of access to the ranked lists: sorted accesses and random accesses.
• In particular, sorted accesses read the tuples of each list sequentially, while random accesses directly locate tuples whose IDs have already been seen by sorted access.
• When designing an efficient top-k, m algorithm, we observe informally that a combination cannot contribute to the final answer if there exist k distinct combinations whose lower bounds are greater than its upper bound.
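The pruning observation in the last bullet amounts to a simple check. This minimal sketch assumes that every candidate combination already has a lower and an upper bound on its overall score, with lower ≤ upper. How those bounds are derived from sorted and random accesses is not shown here. The function name and the dictionary layout are hypothetical.

```python
from bisect import bisect_right


def prune_combinations(bounds, k):
    """Drop combinations that cannot be in the top-k (hypothetical helper).

    bounds: dict mapping combination -> (lower_bound, upper_bound),
            with lower_bound <= upper_bound for every entry.
    A combination is pruned when at least k combinations have a lower
    bound strictly greater than its upper bound.
    """
    lowers = sorted(lo for lo, _ in bounds.values())
    survivors = {}
    for combo, (lo, hi) in bounds.items():
        # Number of lower bounds strictly above this upper bound. A
        # combination never counts itself, because its lower <= upper.
        beaten_by = len(lowers) - bisect_right(lowers, hi)
        if beaten_by < k:
            survivors[combo] = (lo, hi)
    return survivors
```

With `k = 2`, a combination whose upper bound is 6 is pruned as soon as two other combinations have lower bounds of, say, 7 and 8. Sorting the lower bounds once keeps each check logarithmic instead of rescanning all candidates.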

DISADVANTAGE :

• In this paper, we model such combination selection as top-k, m problems, which find the top-k combinations with the highest overall scores based on the scores of their top-m objects.
• Athlete selection can likewise be modeled as a top-k, m problem: it selects the top-k combinations of athletes according to their best top-m aggregate scores in games where they played together.
• We provide a case study on biomedical query refinement to demonstrate how to apply top-k, m algorithms to real-life problems.
• It is important to note that top-k, m problems cannot be reduced to existing top-k problems.
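In the athlete example, each combination's score comes from the aggregate of its top-m games, and a bounded min-heap can maintain that aggregate in a single pass. The sketch below is hypothetical. It assumes each game has one score shared by all athletes who played in it, considers only pairs of athletes, and sums the top-m scores. It only materializes pairs that actually played together, but it still computes the top-m objects of every pair, which is the cost the proposed algorithms are meant to avoid. `TopMAccumulator` and `top_k_pairs` are illustrative names, not from the paper.

```python
import heapq
from collections import defaultdict
from itertools import combinations


class TopMAccumulator:
    """Keeps the m highest scores seen so far for one combination."""

    def __init__(self, m):
        self.m = m
        self._heap = []  # min-heap holding at most m scores

    def add(self, score):
        if len(self._heap) < self.m:
            heapq.heappush(self._heap, score)
        elif score > self._heap[0]:
            # Replace the smallest retained score with the better one.
            heapq.heapreplace(self._heap, score)

    def aggregate(self):
        """Sum of the top-m scores, or None if fewer than m were seen."""
        if len(self._heap) < self.m:
            return None
        return sum(self._heap)


def top_k_pairs(games, k, m):
    """games: list of (set_of_athletes, game_score) pairs (assumed model)."""
    acc = defaultdict(lambda: TopMAccumulator(m))
    for players, score in games:
        # Every pair of athletes who played in this game gets its score.
        for pair in combinations(sorted(players), 2):
            acc[pair].add(score)
    scored = [(a.aggregate(), pair) for pair, a in acc.items()
              if a.aggregate() is not None]
    return heapq.nlargest(k, scored)
```

For games `{A,B,C}:10`, `{A,B}:8`, `{B,C}:9`, `{A,C}:3` with `m = 2`, the pair `(B, C)` scores 10 + 9 = 19 and ranks first.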

PROPOSED SYSTEM :

• We propose a family of efficient top-k, m algorithms with different data access methods (sorted accesses and random accesses) and different query certainties (exact query processing and approximate query processing).
• In this paper, the proposed algorithms avoid the expensive computation of the top-m objects of each combination.
• We propose a new type of top-k query, called the top-k, m query, which targets the best k attribute combinations according to the overall scores of their corresponding top-m objects.
• If exact aggregate scores are required, another version of NRA has been proposed that outputs exact scores online (like SC) and can be applied to any join predicate.
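For reference, the classic NRA (No Random Access) algorithm mentioned in the last bullet uses sorted accesses only. It keeps a lower and an upper bound for every object it has seen. The sketch below is a minimal version for ordinary top-k objects with sum aggregation, not the paper's top-k, m variant. It assumes that an object missing from a list scores 0 in that list. Like plain NRA, it may return lower bounds rather than exact scores for the winners, and that gap is what the exact-score variant addresses.

```python
def nra(lists, k):
    """No Random Access top-k with sum aggregation (sketch).

    lists: score lists, each sorted by descending score, of (object, score).
    Returns up to k (object, lower_bound_score) pairs, best first.
    """
    n = len(lists)
    seen = {}          # object -> {list index: score read by sorted access}
    last = [0.0] * n   # score at the current sorted-access depth of each list
    lower = {}
    max_depth = max((len(lst) for lst in lists), default=0)
    for depth in range(max_depth):
        # One round of sorted accesses: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                last[i] = score
            else:
                last[i] = 0.0  # exhausted list: unseen objects score 0 there
        # Lower bound: sum of the scores read so far.
        lower = {o: sum(s.values()) for o, s in seen.items()}
        # Upper bound: add the current depth score of every list where o is unseen.
        upper = {o: lower[o] + sum(last[i] for i in range(n) if i not in s)
                 for o, s in seen.items()}
        ranked = sorted(lower, key=lower.get, reverse=True)
        if len(ranked) < k:
            continue
        kth_lower = lower[ranked[k - 1]]
        # Best possible score outside the current top-k, seen or not yet seen.
        rest_upper = max([upper[o] for o in ranked[k:]] + [sum(last)])
        if kth_lower >= rest_upper:
            return [(o, lower[o]) for o in ranked[:k]]
    # All lists exhausted: lower bounds are now exact.
    ranked = sorted(lower, key=lower.get, reverse=True)
    return [(o, lower[o]) for o in ranked[:k]]
```

On two lists `[(a, 0.9), (b, 0.8), (c, 0.1)]` and `[(b, 0.9), (a, 0.7), (c, 0.2)]` with `k = 1`, the loop stops after two rounds and returns `b` with score 1.7, without ever reading `c`.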

ADVANTAGE :

• We find that our top-k, m algorithms yield order-of-magnitude performance improvements over baseline algorithms.
• This quantitative analysis reveals the average performance of the algorithm.
• This is because the YQL dataset has fewer combinations than the NBA dataset, and the number of combinations is a key factor in run-time performance.
• To evaluate the effects of the different optimizations, we plot their performance in terms of the number of accessed tuples.
• To evaluate the performance gain from different approximate ratio settings, we plot the speedup (in percentage) obtained with different approximate ratios.
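One way to see why a looser approximate ratio buys speed is the well-known θ-approximation rule for TA. With θ ≥ 1, TA may stop as soon as θ times the k-th best score reaches the threshold. Every returned object y and every non-returned object z then satisfy θ·score(y) ≥ score(z). The sketch below applies this rule to classic TA on ordinary top-k objects, not to the paper's approximate top-k, m algorithm, and the paper's definition of its approximate ratio may differ. It assumes sum aggregation and a score of 0 for an object absent from a list.

```python
import heapq


def ta_theta(lists, k, theta=1.0):
    """Threshold Algorithm with approximation ratio theta >= 1 (sketch).

    lists: score lists sorted by descending score, of (object, score) pairs.
    Returns (top-k list of (object, exact_score), sorted-access depth used).
    """
    index = [dict(lst) for lst in lists]  # random-access lookup per list
    best = {}                             # object -> exact aggregate score
    max_depth = max((len(lst) for lst in lists), default=0)
    for depth in range(max_depth):
        threshold = 0.0  # upper bound on the score of any unseen object
        for lst in lists:
            if depth < len(lst):
                obj, score = lst[depth]   # sorted access
                threshold += score
                if obj not in best:
                    # Random accesses fetch the object's score in every list.
                    best[obj] = sum(idx.get(obj, 0.0) for idx in index)
        top = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
        # Stop once theta * (k-th best score) reaches the threshold.
        if len(top) == k and theta * top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1]), max_depth
```

On lists `[(a, 0.9), (b, 0.8), (c, 0.3)]` and `[(b, 0.9), (c, 0.6), (a, 0.2)]` with `k = 1`, exact TA (θ = 1) needs two rounds of sorted access. With θ = 1.1 it stops after the first round, and both runs return `b` in this instance.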
