
Data mining


Anonim

Having a great deal of information and not knowing how to handle it is often worse than having none at all. In recent years, humanity has developed a great capacity to generate and collect data, because machines can now process it and storage is cheap.

Nevertheless, these huge volumes of data hide a great deal of valuable information that classical information retrieval techniques cannot reach.

Discovering this information is possible thanks to data mining (also written datamining), which, among other characteristics covered in this article, uses artificial intelligence to identify patterns and relationships in the data and to build models from them.

This article covers what data mining is, its characteristics, its methodology and its main areas of application.

Keywords:

  • Datamining
  • Data mining
  • Information analysis
  • Data processing
  • Databases

DATA MINING

General

It is curious that in the era we live in, called the information age because it lets us learn quickly about events around the world, data can form a confusing, variegated wall.

The raw material for decision-making is not always within easy reach. It has to be sought out, and although that sounds simple it is not, because the data must be extracted selectively in order to yield an economic benefit.

All of this is known as data mining. A proper definition follows later, but the idea is this: just as Snow White's dwarves needed the right picks and shovels to go looking for precious stones, here the right tools are needed to get the right information.

The main objective of data mining is to extract information from a data set and to work and polish it into a structure that is understandable for later use.

Organizations that use data mining can quickly see a return on their investment because they stop making missteps. One application is detecting consumption habits in a supermarket (Durán Mena, 2014).

What is data mining?

Here are some definitions of data mining from various authors:

  • Data mining is the process of detecting additional information in large data sets by using mathematical analysis to deduce the patterns and trends that exist in the data. (SQL Server, 2014)
  • Data mining is the process whose purpose is to extract, discover and store relevant information from large databases, through search programs and other indicators that have an explanation and that can be discovered by applying these tools. (Larrieta & Santillán Gómez, 2007)
  • Data mining, also known as «data or knowledge discovery», is the process of analyzing data from different points of view and summarizing them into useful information. (Information Technology, 2009)
  • Datamining or data mining is a set of techniques and technologies for exploring large databases, automatically or semi-automatically, in order to find repetitive patterns, trends or rules that explain the behavior of the data in a given context. (Sinnexus, 2007)

Application of data mining

Data mining models can be applied in the following scenarios (SQL Server, 2014):

  • Forecast: estimating sales and predicting server loads or server downtime.
  • Risk and probability: choosing the best customers for targeted mailings and assigning probabilities to diagnoses or other outcomes.
  • Recommendations: determining which products are likely to be sold together and generating recommendations.
  • Sequence search: analyzing the items customers have placed in a shopping cart in order to predict likely next events.
  • Grouping: separating customers or events into clusters of related items in order to analyze and predict affinities.
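
The recommendation scenario can be illustrated by counting how often pairs of products appear together in the same purchase. The following is a minimal sketch using only the Python standard library; the baskets, the product names and the `pair_support` helper are hypothetical and do not come from any of the cited sources.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the product names are illustrative only.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(baskets)

# Rank pairs from most to least frequent (ties broken alphabetically).
ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
for pair, share in ranked[:3]:
    print(pair, round(share, 2))
```

Pairs with high support are candidates for being sold or recommended together. Real tools add further measures and can handle far larger volumes of data, but the underlying counting idea is the same.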

Main characteristics and objectives of Data Mining

The most important characteristics of data mining are (Vallejos, 2006):

  • It explores data located deep within databases or data warehouses, which tend to accumulate a great deal of information over time.
  • In certain cases, those databases or data warehouses are organized into data marts, or the data are kept on Internet or intranet servers.
  • The mining environment is usually a client-server architecture.
  • Datamining tools help to extract the information "ore" buried in corporate archives or public records.

Data mining produces five types of information:

  • Associations
  • Classifications
  • Sequences
  • Forecasts
  • Groupings
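
As an illustration of the groupings item, customers can be separated into clusters by a single numeric attribute. The sketch below is a deliberately tiny one-dimensional k-means, offered as one possible grouping technique rather than the method used by any tool cited here; the `spend` values and the starting centers are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data (illustrative only)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 90, 13, 88]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)
print(clusters)
```

With these toy values the customers fall into a low-spend group and a high-spend group, which is the kind of segmentation the grouping task aims for.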

In data mining, data are selected in the hope that hypotheses will emerge from them; the aim is for the data themselves to describe or indicate why they are the way they are.

Only afterwards is the hypothesis validated, so data mining should take an exploratory rather than a confirmatory approach. Using datamining to confirm hypotheses is somewhat risky, because the inference drawn in that way has limited validity.

Datamining is a technology made up of stages that integrate several fields; it should not be confused with a piece of software.

Powerful data mining applications and tools now exist that make projects easier to carry out, although they usually need to be complemented with other tools.

Stages of the Datamining Process

Although data mining projects differ from one another, they share a common process with four main stages:

Determination of objectives

In this first stage, the objectives the client wants to achieve are defined under the guidance of a datamining specialist.

Data preprocessing

The second stage covers the selection, cleaning, enrichment, reduction and transformation of the databases. It generally consumes around seventy percent of the total time of a datamining project.
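
To make the preprocessing stage more concrete, here is a minimal sketch of cleaning, reduction and transformation on a handful of records. The field names, the sample values and the specific choices (dropping rows without a spend value, filling missing ages with the mean, min-max scaling) are assumptions made for illustration; a real project would choose these steps according to its own objectives.

```python
# Hypothetical raw customer records; field names are assumptions.
raw = [
    {"id": 1, "age": "34", "city": " Madrid ", "spend": "120.5"},
    {"id": 2, "age": "", "city": "madrid", "spend": "80"},
    {"id": 2, "age": "", "city": "madrid", "spend": "80"},   # duplicate row
    {"id": 3, "age": "51", "city": "Lima", "spend": None},   # no spend value
    {"id": 4, "age": "29", "city": "LIMA", "spend": "40"},
]


def preprocess(records):
    """Clean, reduce and transform raw records (a toy illustration)."""
    seen_ids = set()
    rows = []
    for rec in records:
        # Reduction and cleaning: skip duplicates and rows with no spend value.
        if rec["id"] in seen_ids or rec["spend"] in (None, ""):
            continue
        seen_ids.add(rec["id"])
        rows.append({
            "id": rec["id"],
            "age": int(rec["age"]) if rec["age"] else None,
            "city": rec["city"].strip().title(),  # normalize spelling
            "spend": float(rec["spend"]),
        })
    if not rows:
        return rows

    # Cleaning: missing ages are filled with the mean of the known ages.
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages) if known_ages else None

    # Transformation: rescale spend to the range [0, 1].
    low = min(r["spend"] for r in rows)
    high = max(r["spend"] for r in rows)
    span = (high - low) or 1.0  # avoid dividing by zero
    for r in rows:
        if r["age"] is None:
            r["age"] = mean_age
        r["spend_scaled"] = (r["spend"] - low) / span
    return rows


rows = preprocess(raw)
for r in rows:
    print(r)
```

Even in this toy case most of the code deals with inconsistencies in the input rather than with analysis, which is consistent with the large share of project time this stage is said to consume.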

Determination of the model

The third stage begins with a statistical analysis of the data, followed by a graphical visualization to get a first approximation.

Depending on the objectives set and the task to be carried out, algorithms developed in different areas of artificial intelligence can be used.
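
The initial statistical analysis and visualization can be as simple as a few summary figures and a rough histogram. The sketch below uses Python's standard `statistics` module; the `sales` values, the bin width and the `text_histogram` helper are hypothetical, and a text histogram stands in for the graphical tools a real project would use.

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures.
sales = [20, 22, 19, 35, 21, 23, 60, 18]

# Basic descriptive statistics as a first look at the data.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
}


def text_histogram(values, bin_width):
    """Return one text line per non-empty bin, with a '#' for each value."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * bins[start]}")
    return lines


print(summary)
hist = text_histogram(sales, bin_width=20)
for line in hist:
    print(line)
```

A gap between the mean and the median, or an isolated bar in the histogram, hints at outliers worth examining before any algorithm is chosen.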

Analysis of the results

Figure: Phases of the datamining project (Vallejos, 2006)

In the last stage, the results obtained are verified and compared with those from the statistical analyses and graphical visualizations.

The client then decides whether the results are novel and whether they provide new knowledge that supports decision-making.

Applications of Use

Every year, researchers working on different applications meet at congresses and workshops. Data mining has been incorporated into the life of organizations, especially in the United States: universities, governments, hospitals and companies of all kinds are interested in exploring their databases.

In government

The FBI will analyze commercial databases in order to detect terrorists.

In business

  • Detecting credit card fraud.
  • Discovering why customers leave a mobile phone company.
  • Identifying shopping habits in supermarkets.
  • Predicting the size of television audiences.

In universities

It makes it possible to find out whether a university's recent graduates work in activities related to what they studied.

In space research

The SKYCAT project relies on clustering techniques and decision trees to classify celestial objects with high reliability.

In sports clubs

NBA teams use intelligent applications to support their coaching staff (Vallejos, 2006).

Conclusion

Datamining or data mining, as this article has shown, helps to cultivate customer loyalty because it lets an organization offer customers something they perceive as valuable. One of its strengths is identifying behavior patterns that signal a tendency to leave, based on data from customers who have already done so, so organizations can stay one step ahead and offer incentives to retain their customers.

There are also many important application areas for this type of analysis, such as medicine, fraud prevention and control, the investigation of acts linked to terrorism, engineering and genetics.

Data mining practitioners describe it as basically statistics mixed with business, and they argue that its methods and the kinds of problems it can address make it unique and highly relevant.

In summary, data mining is an emerging technology with several advantages: it is a meeting point between researchers and business people, it saves organizations large amounts of money, and it opens up new business opportunities. Working with datamining also means attending to so many details that, in the end, decisions can be made with precision.

References

  • Durán Mena, C. (August 6, 2014). Forbes Mexico. Retrieved from https://www.forbes.com.mx/mineria-de-datos-informacion-precisa-y-relevante/
  • Larrieta, MI, & Santillán Gómez, AM (2007). EJournal UNAM. Retrieved March 2016, from Data Mining: Concept, characteristics, structure and applications: http://www.ejournal.unam.mx/rca/190/RCA19007.pdf
  • RAE. (2014). Royal Spanish Academy. Retrieved from http://dle.rae.es/srv/search? M = 30 & w = variegated
  • Sinnexus. (2007). Business Intelligence Strategic computing. Retrieved March 2016, from http://www.sinnexus.com/business_intelligence/datamining.aspx
  • SQL Server. (2014). Microsoft. Retrieved from https://msdn.microsoft.com/esmx/library/ms174949(v=sql.120).aspx
  • Information Technology. (2009). Information Systems: Data processing, planning and resource management. Retrieved March 2016, from http://www.tecnologias-informacion.com/mineria-de-datos.html
  • Vallejos, SJ (2006). ExaUnne.edu. Retrieved March 2016, from

Variegated: heterogeneous, gathered without order. (RAE, 2014)
