SEMANTICS OF DATA MINING SERVICES IN CLOUD COMPUTING

Abstract

The recent incorporation of new Data Mining and Machine Learning services into Cloud Computing providers is empowering users with extremely comprehensive data analysis tools that bring all the advantages of this type of environment. Providers of Cloud Computing services for Data Mining publish their service descriptions and definitions in many formats, often incompatible with those of other providers. From a functional point of view, being able to describe complete Data Mining services is fundamental to maintaining the usability, and especially the portability, of these services independently of the software/hardware support or of the differences between cloud platforms. The main objective of this paper is to design a Data Mining service definition that allows a complete service to be composed with a single, simple definition, so that a data mining workflow can be ported and deployed on different providers, or even in a Market Place of ready-to-consume services of this type. This article presents a semantic scheme for the definition and description of complete Data Mining services that covers both the management of the service by the provider (price, authentication, Service Level Agreement, ...) and the definition of the Data Mining workflow as a service. It represents a solid contribution to paving the way for the standardization and industrialization of Data Mining services. To assess the validity of the scheme, a list of services from Data Mining providers has been described, and a full service for a Random Forest algorithm has been defined as an example. In addition, a practical scenario has been developed, creating a deployment platform for Data Mining services that gives functional support to the scheme and illustrates the practical benefits of the proposal for the end user.
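To make the idea of one definition covering both provider management and the data mining workflow more concrete, the following Python sketch describes a single Random Forest service as RDF triples and serializes them in N-Triples format. It is a hypothetical illustration, not the scheme's actual vocabulary: the namespace http://example.org/dm-service#, every property name (hasPrice, hasSLA, runsAlgorithm, ...) and all values (price, availability, the ntree default) are placeholders chosen only for the example.

```python
"""Minimal sketch: one complete data mining service described as RDF triples.

ASSUMPTION: every IRI under EX is a hypothetical placeholder, NOT a real
term of the scheme described in this paper. Values are illustrative only.
"""

EX = "http://example.org/dm-service#"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(value, datatype):
    """Serialize a typed literal, e.g. "0.05"^^<...XMLSchema#decimal>."""
    return f'"{value}"^^<{XSD}{datatype}>'


def describe_service():
    """Return (subject, predicate, object) triples for one Random Forest service."""
    svc = iri("RandomForestService")
    return [
        # Management side: what the provider offers and under which terms.
        (svc, f"<{RDF_TYPE}>", iri("DataMiningService")),
        (svc, iri("hasPrice"), iri("RFPrice")),
        (iri("RFPrice"), iri("amountPerHour"), literal("0.05", "decimal")),
        (iri("RFPrice"), iri("currency"), literal("EUR", "string")),
        (svc, iri("hasAuthentication"), iri("APIKeyAuth")),
        (svc, iri("hasSLA"), iri("RFSLA")),
        (iri("RFSLA"), iri("availability"), literal("99.5", "decimal")),
        # Data mining side: the algorithm and its parameters as a workflow step.
        (svc, iri("runsAlgorithm"), iri("RandomForest")),
        (iri("RandomForest"), iri("hasParameter"), iri("ntree")),
        (iri("ntree"), iri("defaultValue"), literal("500", "integer")),
    ]


def to_ntriples(triples):
    """Serialize triples in N-Triples line format: subject predicate object ."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(describe_service()))
```

Because the price, SLA, authentication, and algorithm parameters live in one graph, a consumer could read all of them from the same description, which is the portability argument made in the abstract.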

Existing System

• Our proposal has been designed to reuse existing vocabularies and ontologies so that it can compose data mining services in the cloud.
• The idea is to improve the results of web mining by taking advantage of the new semantic structures on the Web, and also to use web mining to build up the semantic web by extracting similar meanings, useful patterns, structures, and semantic relations from existing web resources.
• Most business information exists as unstructured data, commonly appearing in emails, blogs, discussion forums, wikis, official memos, news, user groups, chat transcripts on social networking sites, project reports, business proposals, public surveys, and research and white papers.
• As an emerging computing model, cloud computing has many advantages that existing computing models lack, but it still has some problems.
• Existing means of organizing information and providing services have difficulty adapting to the development and change of the agricultural scientific research environment and of scientific research methods.
• Some enterprises have very strict internal management, operation, and maintenance systems and do not want to be controlled or interfered with by related parties outside the company.
• Although cloud computing can protect enterprises and users through security isolation measures, it still cannot meet the needs of all users.

Disadvantages

• The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms that converge quickly to a local optimum are commonly employed.
• The data and processing are distributed across the machines in the cluster, so that an overload on any particular machine has less impact on overall processing.
• The MEX vocabulary also addresses the problem of sharing specific information about Machine Learning processing techniques in a lightweight way.
• Our proposal addresses the problem of defining these cloud services more concisely, as it is based on a study of how different Internet providers and data mining platforms define these services and their specifications.
• For cloud data mining services, specific issues of experimentation and of the execution process should be included, among others.
• RDF is used for organizing information, and RDFa solves the data-linking problem. RDF considers everything as a resource (named things).
• The semantic web also reduces the cost and complexity of cloud computing through the use of rules. Security, one of the major roadblocks to the success of cloud computing, can be addressed by the wide range of security mechanisms that the semantic web provides.
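The first bullet above does not name the problem. Assuming it refers to k-means clustering (the usual case where an NP-hard objective is handled with fast heuristics), the following Python sketch shows Lloyd's algorithm, the standard heuristic that alternates an assignment step and an update step until the assignments stop changing. The function and parameter names are illustrative, and the result is a local optimum that depends on the randomly chosen initial centroids.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's heuristic for k-means (ASSUMED to be the problem meant above).

    Returns (centroids, labels). Converges to a local optimum that depends
    on the random initial centroids, not necessarily the global optimum.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(lloyd_kmeans(data, k=2))
```

Running the heuristic with several seeds and keeping the result with the lowest total squared distance is a common way to reduce the risk of a poor local optimum.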

Proposed System

• In this paper we propose a schema for defining data mining services in cloud computing using Linked Data, and we validate its operation by defining a complete service.
• To improve the data retrieval and mining ability of agricultural information management systems, an agricultural information management data model based on cloud computing and semantic technology is proposed.
• This study proposes a data mining technique for large information management systems based on semantic correlation feature extraction.
• It is proposed to introduce new service concepts, technical means, and other related data in order to innovate service content and functions and to improve the role and contribution of knowledge services in scientific and technological innovation.
• Although massive data contain a large amount of valuable information, the vast majority of it is semi-structured or unstructured and isolated, lacking normative control, deep disclosure, and semantic correlation, which makes it difficult for computers to process automatically.

Advantages

• The main advantage of using dmcc-schema is that it greatly simplifies the design of a cloud service focused on data mining, because it unifies two environments: the cloud computing and services aspects, and the execution of data mining algorithms.
• One advantage of using these services in the cloud is the ability to support large datasets. Another advantage of using the proposed occml specification is the simplicity of creating the complete service.
• Resource Description Framework (RDF) is a semantic web technology that can be used to build efficient and scalable systems for the cloud.
• When users search for a keyword on the web, a Relational Based Search Engine is used to obtain efficient and exact search results. The collected Web pages are stored in a Web page database so that URLs and their corresponding Web pages can be retrieved in the future.
• Using a Relational Graph between all the resources makes searching easier and more efficient.
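The bullets above describe searching over a relational graph that links resources, in the spirit of RDF treating everything as a named resource. The following Python sketch is a hypothetical illustration of that idea, not the Relational Based Search Engine itself: it stores (subject, predicate, object) links between resources and answers a keyword query by returning the matching resources plus those reachable within a fixed number of hops, using breadth-first search. The class name TripleGraph, the max_hops parameter, and the sample resources are all assumptions.

```python
from collections import defaultdict, deque


class TripleGraph:
    """Hypothetical in-memory graph of (subject, predicate, object) resource links."""

    def __init__(self):
        self.edges = defaultdict(set)  # resource -> set of (predicate, neighbour)

    def add(self, subject, predicate, obj):
        # Store the link in both directions so relations can be followed either way.
        self.edges[subject].add((predicate, obj))
        self.edges[obj].add((predicate, subject))

    def search(self, keyword, max_hops=1):
        """Return resources whose name contains keyword (case-insensitive),
        plus resources reachable from them within max_hops links (BFS)."""
        keyword = keyword.lower()
        start = [r for r in self.edges if keyword in r.lower()]
        seen = set(start)
        queue = deque((r, 0) for r in start)
        while queue:
            resource, depth = queue.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for _pred, neighbour in self.edges[resource]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
        return seen


if __name__ == "__main__":
    g = TripleGraph()
    g.add("page:RandomForest", "mentions", "topic:Classification")
    g.add("topic:Classification", "relatedTo", "page:DecisionTree")
    g.add("page:KMeans", "mentions", "topic:Clustering")
    print(sorted(g.search("randomforest", max_hops=2)))
```

Limiting the number of hops keeps a query from returning the whole graph while still surfacing resources that are related to, but do not literally contain, the search keyword.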
