
Advantages of using data mining

Anonymous

Information is generated in enormous quantities and at remarkable speed. With the arrival of information and communication technologies (ICTs), it practically rains down on the computers and servers of companies all over the world. Because so much can now be stored, useful information can get lost in a sea of data, and not all of that data is useful to organizations.

To make better use of the information that is already stored and that keeps arriving, organizations need tools that help them search it. Searching is not enough, however: they also need tools that yield clear and precise information, so that the data obtained translates into better productivity.

Data mining is a tool that helps with these tasks and makes it possible to take advantage of stored information. Not all organizations know about it or use it, however, partly because other tools, such as big data, perform similar tasks, although each has its own characteristics.

This article looks at how data mining has become a valuable tool for company productivity, how it can interact with other tools, what benefits it offers, and what its particular characteristics are.

Definitions.

According to (wikipedia.org, 2018) data mining means the following:

It uses the methods of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for later use. In addition to the raw analysis stage, it involves data and database management aspects, data processing, model and inference considerations, interest metrics, computational complexity theory considerations, post-processing of discovered structures, visualization and online updating.

Another definition is given by (Marqués, 2014):

Data mining is the process of discovering actionable information from large data sets. It uses mathematical analysis to deduce patterns and trends that exist in the data. Typically, these patterns cannot be detected by traditional data exploration because the relationships are too complex or because there is too much data.

These patterns and trends can be collected and defined as a data mining model. Data mining models can be applied in scenarios such as the following:

  • Forecast: calculation of sales and prediction of server loads or server downtime.
  • Risk and probability: choosing the best clients for direct mail distribution, determining the probable break-even point for risk scenarios, and assigning probabilities to diagnoses and other results.
  • Recommendations: determination of the products that can be sold together and generation of recommendations.
  • Sequence search: analysis of the items that customers have placed in the shopping cart and prediction of possible events.
  • Grouping: distribution of customers or events into groups of related items, and analysis and prediction of affinities. (microsoft.com, 2018)

Main stages of data mining.

It is said that data mining is the set of techniques and technologies that allow exploring large databases in order to find patterns that can provide us with valuable information in future decision-making. The data mining process typically has four main stages:

  • Determination of the objectives
  • Data processing
  • Determination of the model
  • Analysis of the results

The first stage concerns the type of information that the client wants to extract from the database. The second stage requires the most work, since the database provided by the client has to be selected, cleaned, enriched, reduced and transformed. Once the data has been processed and is ready for the artificial intelligence algorithm, the third stage is to choose which algorithm will give the best results.
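The data processing stage (select, clean, transform) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the cited source: the records, the field names and the `prepare` helper are hypothetical, and min-max scaling is just one of many possible transformations.

```python
# Hypothetical raw records, as a client might supply them.
raw_records = [
    {"customer": "A", "age": "34", "monthly_spend": "120.5"},
    {"customer": "B", "age": "", "monthly_spend": "80.0"},    # missing age
    {"customer": "C", "age": "51", "monthly_spend": "n/a"},   # unparseable spend
    {"customer": "D", "age": "28", "monthly_spend": "200.0"},
]


def to_float(text):
    """Return the value as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(records, fields):
    # Select: keep only the fields of interest, parsed as numbers.
    parsed = [{f: to_float(r[f]) for f in fields} for r in records]
    # Clean: drop rows with any missing or invalid value.
    clean = [row for row in parsed if all(v is not None for v in row.values())]
    # Transform: rescale each field to the range [0, 1] (min-max scaling).
    for f in fields:
        values = [row[f] for row in clean]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid dividing by zero for constant columns
        for row in clean:
            row[f] = (row[f] - low) / span
    return clean


prepared = prepare(raw_records, ["age", "monthly_spend"])
```

Only customers A and D survive the cleaning step here. Dropping incomplete rows is a deliberate simplification; a real project might instead fill in missing values, which is part of what the text calls enriching the data.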

Choosing the best algorithm for a specific analytical task is a great challenge and depends on the problem to be solved. There are basically five kinds of problem in data mining: classification, regression, segmentation, association, and sequence analysis.

Many algorithms are available to solve these problems. The main ones are association, clustering, decision trees, linear regression, the naive Bayes classifier, neural networks, sequence clustering, and time series.
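As an illustration of one of these problem types, segmentation, the following is a minimal sketch of k-means clustering on one-dimensional data. The spending figures and starting centroids are invented for the example, and the function name `k_means_1d` is hypothetical; production work would normally rely on an established library rather than a hand-written loop.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so the run has converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer, with two visible groups.
spend = [10, 12, 14, 100, 110, 120]
centers, groups = k_means_1d(spend, centroids=[0, 50])
```

With these inputs the centroids settle at 12.0 and 110.0, separating low spenders from high spenders. The result of k-means depends on the starting centroids, which is why they are passed in explicitly here.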

Finally, the last step is to analyze the results. This work is currently carried out in many business areas, such as data security, finance, health, marketing, fraud detection, online search, natural language processing and the new smart cars. (muyinteresante.es)

Integration of Data Mining and Big data.

Data Mining is the set of techniques for extracting information, while Big Data is the technology capable of capturing, managing and processing this data reliably and in a reasonable time. Data Mining requires Big Data to streamline its data processing and management and, at the same time, Big Data requires Data Mining for predictive data analysis and to detect trends. We could say that there is a mutual integration between technique and tool.

Big Data technology is capable of quickly and accurately capturing, storing, managing and processing large amounts of data, taking advantage of them.

Fundamentally, it focuses on predictive analysis and detecting trends, using different techniques, including data mining. Through the definition of models and the use of different technologies, the aim is to turn data into a valuable asset.

Using this technology, we are able to identify common patterns that can be used to find new market niches, define key characteristics of current or future clients, and generate parameters, metrics or processes.

It represents a transformation in the way of doing business, in many cases increasing the profitability and productivity of companies.

Data Mining is versatile and in the same way that it can be used to carry out a conventional analysis, it is a good resource to extract value from Big Data. The combination of the two makes both tools have even greater potential. (Balagueró, 2017)

Examples of application of data mining and Big Data.

Because Big Data and Data Mining have different functions and are therefore applied in different contexts, we are going to see some examples of their scope.

Walt Disney made use of Big Data to analyze the routes of its customers and improve their experience in real time, which allowed it to understand its users or consumers better.

Data Mining analyzes information to uncover suspicious patterns of behavior. It is applicable to the search for patterns of criminal behavior, to the analysis of behavior linked to banking fraud, and to microbiology studies that establish behavior patterns among bacteria. (Balagueró, 2017)
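One very simple way to surface a suspicious pattern is to flag values that lie far from the average. The sketch below uses a z-score rule on invented transaction amounts. The threshold of 2.0 and the function name `flag_outliers` are assumptions made for illustration, and the cited source does not say which technique is used in practice; real fraud detection relies on far richer models.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions for one customer.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = flag_outliers(amounts)
```

Here only the 500 transaction is flagged. A single extreme value also inflates the mean and the standard deviation, so this rule becomes less sensitive on small samples; more robust measures such as the median are a common alternative.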

Data mining cycle.

  1. Information users must identify business problems and areas where data can add value to the company. It is also important to identify the areas where information changes frequently but is essential for the competitiveness of the company. Various criteria can be used for this, and none can be singled out as the correct one. The objective is to determine the criteria, ideas, norms and questions that will serve as input to the data mining process.
  2. To analyze the historical information, the user selects the appropriate mining algorithm or algorithms. These algorithms are then translated into mining programs that perform the searches using the previously defined criteria.
  3. Incorporate the information obtained through the data mining process into decision-making, and present the findings to those responsible for operations so that the information can be integrated into the company's processes and applied to solving problems.
  4. Measure the results: measure the value of the findings provided to the decision maker with respect to the solution of the problems identified and the criteria defined in the first point. (Lagunés, 2016)

Text mining.

Text mining is an emerging field that attempts to extract meaningful information from natural language text. It can be broadly characterized as the process of analyzing text to extract information that is useful for particular purposes. Compared to the type of data stored in databases, text is unstructured, amorphous, and difficult to handle algorithmically. However, in modern culture, text is the most common vehicle for the formal exchange of information. Text mining generally deals with texts whose function is the communication of facts, information or opinions, and the motivation for trying to extract information from such text automatically is compelling, even if success is only partial.

The phrase "text mining" is generally used to refer to any system that analyzes large amounts of text and natural language and detects lexical or linguistic usage patterns in an attempt to extract likely useful information. (Ramírez, 2016)
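A first step toward detecting lexical usage patterns is simply counting terms. The sketch below is a minimal example using only the Python standard library. The review text and the tiny stop-word list are invented for illustration, and real text mining systems go well beyond word counts.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in", "was"}


def term_frequencies(text):
    """Count how often each word outside the stop-word list appears."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


reviews = (
    "The delivery was late. The product is great, "
    "and the delivery staff was friendly. Great product!"
)
top_terms = term_frequencies(reviews).most_common(3)
```

For this text the three most frequent terms are "delivery", "product" and "great", each appearing twice, which already hints at what the customers are talking about.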

Advantages of using data mining over other information management techniques.

Data mining arises from the need to manage the information contained in the databases of organizations. It has a series of advantages over other processes used for information management:

  • Data mining provides business leaders with a set of relationships and knowledge that in many cases were not known to exist within the organization.
  • Data mining helps companies choose the course they will follow and gain competitive advantages over their market rivals, since it reveals information that only the company itself knows.
  • Human beings can detect patterns and anomalies only superficially. Data mining makes it possible to perceive patterns that are difficult to spot by simple observation. (Franco, 2016)

Data mining and other disciplines.

There are certain boundaries between data mining and analogous disciplines, such as statistics, artificial intelligence, etc. Some argue that data mining is nothing more than statistics wrapped in business jargon that turns it into a salable product. Others, on the other hand, find in it a series of specific problems and methods that make it different from other disciplines.

The fact is that, in practice, all the models and algorithms commonly used in data mining - neural networks, regression and classification trees, logistic models, principal component analysis, etc. - have a relatively long tradition in other fields. (wikipedia.org, 2018)

From statistics.

Data mining certainly draws on statistics, from which it takes the following techniques:

  • Analysis of variance: evaluates whether there are significant differences between the means of one or more continuous variables in different populations.
  • Regression: defines the relationship between one or more variables and a set of variables that predict them.
  • Chi-square test: through which the hypothesis of dependence between variables is tested.
  • Clustering analysis: allows the classification of a population of individuals characterized by multiple attributes (binary, qualitative or quantitative) into a given number of groups, based on the similarities or differences between the individuals.
  • Discriminant analysis: allows the classification of individuals into previously established groups. It finds the classification rule for the elements of these groups and therefore gives a better identification of the variables that define group membership.
  • Time series: allows the study of the evolution of a variable over time in order to make predictions, based on that knowledge and under the assumption that there will be no structural changes. (wikipedia.org, 2018)
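To make one of these techniques concrete, the following sketch computes the Pearson chi-square statistic for a contingency table in plain Python. The counts are invented, and the function name `chi_square_statistic` is hypothetical. The sketch computes only the statistic and does not compute a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = customer segment, columns = bought / did not buy.
observed = [
    [30, 10],
    [20, 40],
]
chi2 = chi_square_statistic(observed)
```

For this table the statistic is about 16.67. A 2 x 2 table has one degree of freedom, for which the critical value at the 5% significance level is roughly 3.84, so these invented counts would suggest that segment and purchase are related.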

From computing.

From computing it takes the following techniques:

  • Genetic algorithms: numerical optimization methods in which the variable or variables to be optimized, together with the study variables, constitute a segment of information. The configurations of the analysis variables that obtain better values for the response variable correspond to segments with greater reproductive capacity. Through reproduction, the best segments endure and their proportion grows from generation to generation. Random elements can also be introduced to modify the variables (mutations). After a certain number of iterations, the population will be made up of good solutions to the optimization problem, since bad solutions have been discarded iteration after iteration.
  • Artificial Intelligence: the available data is analyzed using a computer system that simulates an intelligent system. Expert Systems and Neural Networks fall within Artificial Intelligence systems.
  • Expert Systems: systems created from practical rules drawn from the knowledge of experts, based mainly on inference or cause and effect.
  • Intelligent Systems: similar to expert systems, but with an advantage in new situations unknown to the expert.
  • Neural networks: generically, parallel numerical processing methods in which the variables interact through linear or non-linear transformations until outputs are obtained. These outputs are compared with the expected outputs, based on test data, giving rise to a feedback process through which the network is reconfigured until a suitable model is obtained. (wikipedia.org, 2018)
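The genetic algorithm described in the list above can be sketched as follows. This is a deliberately small illustration under stated assumptions: the fitness function, the parameter values and the function name `evolve` are hypothetical, each candidate solution is a single number, and "greater reproductive capacity" is simplified to letting only the fitter half of the population reproduce.

```python
import random


def fitness(x):
    """Hypothetical response variable to maximise; its peak is at x = 3."""
    return -(x - 3.0) ** 2


def evolve(generations=60, population_size=20, mutation_scale=0.3, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the fitter half survives and is allowed to reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            # Reproduction: each child averages two distinct parents (crossover).
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0
            # Mutation: a small random change keeps some diversity.
            child += rng.gauss(0.0, mutation_scale)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve()
```

Because the parents are carried over unchanged, the best solution found so far is never lost, and after enough generations `best` should lie close to 3. Other selection schemes, such as choosing parents with probability proportional to fitness, follow the description in the text more literally.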

With data mining, you can run far more complex queries on your data than with conventional query methods. The information that mining provides can dramatically improve the quality and reliability of business decision-making.

For example, conventional methods can tell a bank which type of bank account it provides is the most profitable. Instead, data mining allows the bank to create profiles of customers who already have that type of account. The bank can then use data mining to find other customers that match that profile, and thus be able to launch a marketing campaign specifically targeting those customers.
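The bank example can be sketched as two steps: build a profile from customers who already hold the account, then look for non-holders who resemble it. All the data, the tolerances and the helper names `build_profile` and `matches` are hypothetical, and averaging two attributes is a much cruder profile than a real mining tool would produce.

```python
from statistics import mean

# Hypothetical customers: (name, age, average balance, holds the profitable account?)
customers = [
    ("Ana", 45, 9000, True),
    ("Luis", 50, 11000, True),
    ("Marta", 47, 10000, True),
    ("Pedro", 22, 800, False),
    ("Sara", 48, 9500, False),
    ("Iván", 30, 1500, False),
]


def build_profile(rows):
    """Average age and balance of customers who already hold the account."""
    holders = [r for r in rows if r[3]]
    return mean(r[1] for r in holders), mean(r[2] for r in holders)


def matches(row, profile, age_tolerance=5, balance_tolerance=2000):
    """True when a customer is close to the profile on both attributes."""
    age, balance = profile
    return (
        abs(row[1] - age) <= age_tolerance
        and abs(row[2] - balance) <= balance_tolerance
    )


profile = build_profile(customers)
# Candidates for the campaign: non-holders who resemble the holders.
prospects = [r[0] for r in customers if not r[3] and matches(r, profile)]
```

With this invented data the only prospect is Sara, whose age and balance are close to those of the existing account holders.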

Data mining can identify patterns in business data, for example in a supermarket's purchase records. If customers buy products A and B, which product C are they most likely to buy as well? Accurately answering questions like this is an invaluable help in creating business strategies.
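The supermarket question can be answered with the two basic measures of association analysis, support and confidence. The sketch below uses invented baskets, and `best_follow_up` is a hypothetical helper. It assumes that the antecedent items appear in at least one basket, since otherwise the confidence is undefined.

```python
# Hypothetical supermarket baskets, one set of products per purchase.
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def best_follow_up(antecedent, transactions):
    """Product most often bought together with the antecedent items."""
    candidates = set().union(*transactions) - antecedent
    # sorted() makes the choice deterministic when two products tie.
    return max(
        sorted(candidates),
        key=lambda item: confidence(antecedent, {item}, transactions),
    )


likely = best_follow_up({"A", "B"}, baskets)
```

In these baskets, two of the three purchases containing both A and B also contain C, so the rule "A and B imply C" has a confidence of about 0.67 and C is returned as the most likely companion product.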

Data mining can identify the characteristics of a known group of customers, for example customers with poor credit. The company can then use these characteristics to screen new customers and predict whether they, too, will have poor credit. Data mining tools make discovering this kind of information in large databases easy and automatic. (ibm.com)

Conclusion.

Information has become an important asset for companies. Facebook was recently caught up in a very serious problem arising from the mismanagement of its users' information: in the Cambridge Analytica case, it allowed that company to process user data in order to campaign more effectively for then-candidate Donald Trump.

This problem led Mark Zuckerberg, the owner of Facebook, to testify before the United States Congress about why he had allowed such a thing. The point is that the information generated today carries a great deal of value and weight, because with the arrival of ICTs organizations are bombarded with information.

So much information is generated today that artificial intelligences capable of handling it have had to be developed. Neural networks have been built that process, in a more sophisticated and efficient way, the information generated on Google or YouTube, for example, whose users number in the billions.

Undoubtedly, data mining has come to bear part of the weight that information generates. We must nevertheless always be careful about how information is handled and whom we allow to handle it.

Thesis proposal.

Make agreements with companies specializing in data mining so that students can work within them.

Overall objective.

That students carry out projects or assignments related to data mining and become more familiar with the topic, so that they are better prepared.

Thanks.

I thank my mother, who gives me the strength to continue every day and who has brought me to where I am; my teachers, who have contributed their time and knowledge so that I could continue my studies; Dr. Fernando Aguirre y Hernández, who has given us all his experience and knowledge in the subject Fundamentals of Administrative Engineering; and CONACYT, whose support motivates us to move forward in pursuit of our master's degree.

Bibliography.

Balagueró, T. (November 1, 2017). Retrieved May 26, 2018, from https://www.deustoformacion.com

Franco, LG (April 6, 2016). https://www.gestiopolis.com. Retrieved May 26, 2018, from www.gestiopolis.com/mineria-datos-textos/

ibm.com. (n.d.). https://www.ibm.com. Retrieved May 26, 2018, from https://www.ibm.com/support/knowledgecenter/es/SSEPGG_10.5.0/com.ibm.im.overview.doc/c_dm_goals.html

Lagunés, XA (June 2, 2016). https://www.gestiopolis.com. Retrieved May 26, 2018, from www.gestiopolis.com/mineria-datos-informacion/

Marqués, P. (2014). DATA MINING THROUGH EXAMPLES. Spain: 1. Books.

microsoft.com. (May 1, 2018). https://docs.microsoft.com. Retrieved May 26, 2018, from https://docs.microsoft.com/eses/sql/analysis-services/data-mining/data-mining-concepts?view=sql-analysisservices-2017

muyinteresante.es. (n.d.). https://www.muyinteresante.es. Retrieved May 26, 2018, from www.muyinteresante.es/tecnologia/preguntas-respuestas/que-es-la-mineriade-datos-311477406441

Ramírez, AA (September 21, 2016). Retrieved May 26, 2018, from https://www.gestiopolis.com

wikipedia.org. (April 27, 2018). https://es.wikipedia.org. Retrieved May 26, 2018, from es.wikipedia.org/wiki/Miner%C3%ADa_de_datos
