Data mining: information as an invaluable asset

Information as a priceless asset

Neoclassical economic models commonly rest on an assumption that makes it possible to explain the behavior of agents within the market. However, this assumption is rarely met in real situations: the assumption of perfect information.

In an ideal world, all the agents that compete within the market have perfect knowledge about the preferences of their consumers and the decisions of their competitors. Furthermore, consumers are perfectly aware of all the events that occur in the market and make decisions based on this perfect information.

Perfect information refers to a situation in which all agents are fully aware of all the events that occur in the market. However, perfect information is impossible to achieve because markets are in constant and unstoppable transformation.

Companies need to develop methodologies that allow them to identify trends and preferences among their potential consumers, as well as to analyze the likely moves of their competitors. Knowing all these variables is invaluable because the survival and growth of a company depend on the decisions it makes.

However, market information is far from static. Due to social trends, political movements, religious beliefs and ideological currents, consumers constantly adapt their ideas and preferences. Therefore, knowing the market's preferences at a certain time is no guarantee that this information will remain current in the long term.

Companies need to design tools that allow them to obtain and analyze large amounts of information in order to identify trends that allow them to make the best possible choice. In a world where there are millions of different preferences, companies must select the option that allows them to satisfy as many customers as possible.

Having the correct information allows individuals to make appropriate decisions for each situation. In a scenario where sufficient information is lacking, outcomes tend towards failure or depend on luck. Because information is essential for decision making, its value is incalculable.

In the real world, perfect information does not exist. Information has a cost, and companies need to account for the expenses incurred in obtaining the relevant information. An organization that obtains information efficiently is an organization with enormous potential.

How can companies obtain information on market trends? With technological evolution, users are increasingly close to companies. Some years ago, companies obtained their data through surveys that they themselves carried out among their consumers. However, with the rise of the Internet as a means of mass communication, individuals reveal their preferences through social networks, which provide ways of identifying those preferences.

Companies can obtain market information by various means, which differ in reliability depending on their methodology and the size of the selected sample. However, these analyses usually result in a large raw database from which no trend can be discerned with the naked eye.

To analyze a database containing millions of qualitative and quantitative data points, it is necessary to use methodologies that identify the most important parts of the information, its trends and its opportunities. There is an analogy between database analysis and mining: mining consumes a large amount of resources, and using them well requires specific planning.

A mining company does not spend millions of dollars excavating without first conducting a specific analysis of the subsurface content. The company does not randomly enter a territory and start a mine. In the same way, a data analyst relies on specific methodologies to avoid analyses that waste resources.

Data mining is a process that draws on multiple sciences and disciplines, ranging from psychology through statistics, computing and mathematics to applied artificial intelligence. The objective of data mining is to identify trends that allow analysts to make sound decisions.

Data mining is not limited to market analysis; it can be used in any investigation that requires the analysis of large amounts of information. However, this brief essay examines the tool in the context of markets and organizations.

What is data mining?

Building on the mining analogy from the preceding paragraphs, a rough idea of what data mining is can be formed. Vallejos (2006, p. 11), citing Fayyad and others (1996), defines data mining as “a non-trivial process of identifying valid, novel, potentially useful and understandable patterns hidden in data.”

This technical definition offers certain fundamental concepts for understanding the use of the tool. Vallejos (2006) also cites Molina and others (2001) to explain the concept from the business point of view, defining it as: “The integration of a set of areas whose purpose is to identify a set of areas from databases that provide a bias towards decision making.”

The purpose of data mining is to analyze the available information in search of patterns that can guide the actions of organizations. Data mining is one stage in a much larger process, known as knowledge discovery in databases.

Database analysis draws on statistics, artificial intelligence, computer graphics, and massive processing power. An adequate data analysis methodology would be unthinkable without computational power capable of managing and calculating millions of results per minute.

However, according to Vallejos (2006), data mining is not a concept that arose hand in hand with the birth of modern computing. To explain data mining it is necessary to understand the concept of "knowledge discovery in databases". This concept is fundamental to carrying out sound data analysis based on sound methodologies.

Knowledge discovery in databases

With the evolution of computing in the 20th century, the cost of storing information has fallen significantly, as has the cost of processing it. With these reductions, information analysis has been transformed to the point where highly detailed studies can be carried out at low cost.

However, it is useless to have a large amount of data if you cannot analyze the hidden information that the patterns form within the raw information. The real value of the data is in the information that can be extracted from it. Successful businesses are founded on the correct exploration of patterns and decision-making based on anticipation and preparation.

According to Vallejos (2006), the capacity to produce and analyze the world's information has grown so much that it doubles every 20 months. Organizations run SQL queries in order to get basic information. However, they require more advanced techniques to identify the most important trends in the data.
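As a minimal sketch of the kind of basic SQL query mentioned above, the following Python snippet uses the standard-library sqlite3 module with an in-memory database. The table name purchases, its columns, and the sample rows are hypothetical and were invented only for illustration; they do not come from Vallejos (2006).

```python
import sqlite3


def units_per_product(rows):
    """Return (product, total_units) pairs, largest total first."""
    # Hypothetical schema: one row per purchase.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (customer TEXT, product TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    # A simple aggregate query: total units sold per product.
    result = conn.execute(
        "SELECT product, SUM(units) AS total "
        "FROM purchases GROUP BY product "
        "ORDER BY total DESC, product ASC"
    ).fetchall()
    conn.close()
    return result


# Invented sample data, for illustration only.
SAMPLE_ROWS = [
    ("ana", "coffee", 2),
    ("ben", "tea", 1),
    ("ana", "tea", 1),
    ("carla", "coffee", 3),
]

print(units_per_product(SAMPLE_ROWS))
```

A query like this answers a question that was asked in advance (how many units per product). It does not by itself reveal patterns nobody thought to ask about, which is where the more advanced techniques come in.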

Knowledge discovery in databases (KDD), according to Vallejos (2006), aims to automatically process large amounts of data to find useful knowledge in them. In this way users can put the information to their own use. Knowledge has a specific hierarchy that needs to be analyzed from the general to the particular.

Techniques based on data mining

The fundamentals of data mining are the result of a long research process. The development of its techniques began when information was first stored on computers. Data mining depends on three technologies:

- Massive data collection
- Powerful microprocessor computers
- Data mining algorithms

Vallejos (2006) mentions that commercial databases are growing at an unprecedented rate. Data mining algorithms, in turn, consistently outperform classical statistical methods.

The main characteristics and objectives of data mining are the following: (Vallejos, 2006)

- Explore data buried deep within databases stored in data warehouses.
- Data can be obtained from internet or intranet sources.
- The data mining environment maintains a client-server architecture.
- Tools help extract the "ore" of information buried in public records.
- The miner is an end user who is empowered by "data drills".
- Digging through data allows for unexpected results.
- Data mining tools are easily combined, so the mined data can be analyzed appropriately.
- Mining produces five types of information:
  - Associations
  - Sequences
  - Classifications
  - Groupings
  - Forecasts
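To make the first of these types of information more concrete, the sketch below computes simple pairwise association rules from a handful of shopping baskets. It is only an illustration under stated assumptions: the function name association_rules, the min_support parameter, and the baskets are hypothetical, and the code uses the usual textbook measures of support (share of baskets containing both items) and confidence (share of baskets with the first item that also contain the second), not any method described by Vallejos (2006).

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for pairwise rules a -> b."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates inside a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be of interest
        # confidence(a -> b) = baskets with both / baskets with a
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Invented baskets, for illustration only.
BASKETS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]

for rule, (support, confidence) in sorted(association_rules(BASKETS).items()):
    print(rule, round(support, 2), round(confidence, 2))
```

In this toy data every basket that contains butter also contains bread, so the rule butter -> bread has a confidence of 1.0, while the reverse rule is weaker.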

Data mining follows a scientific method: a hypothesis is formulated and an experiment is designed to collect the data that test it. With this system new knowledge can be obtained. Even so, data mining takes an exploratory rather than a confirmatory approach.

The scope of data mining

Data mining technology has advanced considerably in recent years. With current technology, new business opportunities can be generated by providing new capabilities. However, the costs of data mining tend to rise as the degree of specialization increases. According to Vallejos (2006), its scope includes the following:

Automated trend and behavior prediction:

Data mining allows you to automate the process of finding predictive information in large databases. Questions that normally required manual analysis can now be answered directly from the data.

An example of this kind of analysis is targeted marketing. Data mining uses the results of past campaigns to target new marketing campaigns. With this technique we can identify the behavior of certain population segments and anticipate when that behavior is likely to be repeated.
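The marketing example above can be sketched in a few lines. The snippet below is a deliberately simple illustration, not a real prediction model: it assumes a hypothetical record of a past campaign in which each contacted customer is labeled with a segment and with whether they responded, and it ranks segments by their historical response rate. The names response_rates, best_segments, and PAST_CAMPAIGN are invented for this example.

```python
from collections import defaultdict


def response_rates(past_campaign):
    """Map each segment to the share of contacted customers who responded."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in past_campaign:
        contacted[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {seg: responded[seg] / contacted[seg] for seg in contacted}


def best_segments(past_campaign, top_n=1):
    """Return the top_n segments by past response rate (ties broken by name)."""
    rates = response_rates(past_campaign)
    return sorted(rates, key=lambda seg: (-rates[seg], seg))[:top_n]


# Invented campaign results: (segment, responded?).
PAST_CAMPAIGN = [
    ("students", True), ("students", False),
    ("students", True), ("students", True),
    ("retirees", False), ("retirees", False),
    ("retirees", True), ("retirees", False),
]

print(response_rates(PAST_CAMPAIGN))
print(best_segments(PAST_CAMPAIGN))
```

A real tool would consider many attributes at once and estimate how reliable each rate is, but the underlying idea is the same: past results guide where the next campaign is aimed.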

Automated discovery of previously unknown patterns:

Data mining tools allow you to identify previously hidden patterns in one step. This approach can also be used to identify fraudulent transactions in banking systems and to find anomalies.
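As an illustration of how unusual transactions might be flagged, the sketch below applies a common rule of thumb, the modified z-score based on the median absolute deviation, to a list of transaction amounts. This is one simple statistical technique chosen for the example, not the method used by any particular banking system or tool; the function name flag_unusual, the threshold of 3.5, and the sample amounts are assumptions.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Invented transaction amounts, for illustration only.
AMOUNTS = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0]

print(flag_unusual(AMOUNTS))
```

A flagged transaction is only a candidate for review. Deciding whether it is actually fraudulent requires further analysis.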

Automated mass analysis:

When automation techniques are implemented on parallel processing systems, it is possible to analyze databases in minutes. Users can run increasingly complex automated analyses in that time. This high speed allows for better predictions.

How to solve a problem with data mining?

Having established that data mining is a process that makes it possible to detect information in large data sets, we can turn to the methodology used by the tool. We must bear in mind that the fundamental advantage of data mining is the ability to analyze complex relationships that are not visible with conventional techniques.

Microsoft's SQL Server (Microsoft, 2014) offers a methodology that is easy to understand for beginners in the field of data analysis. As mentioned before, SQL queries are very useful for simple analyses, but more advanced techniques are required to extract all the information. Here we will review the SQL Server methodology in an introductory way.

To build a data mining model, it is necessary to go through the following steps (Microsoft, 2014):

1. Define the problem
2. Prepare the data
3. Explore the data
4. Generate models
5. Explore and validate the models
6. Deploy and update the models

However, this process is not one-way but cyclical. After deploying the model, it is necessary to go through the process again to check whether new models can be developed. In this way, the data mining that SQL Server supports tends to improve over time.

Defining the problem:

The first step in the data mining process is to clearly define the problem and consider ways to use the data to provide an answer to the problem. (Microsoft, 2014)

Preparing the data:

Data cleansing involves not only removing invalid data or interpolating missing values, but also looking for hidden correlations in the data, identifying the data sources that are most accurate, and determining which columns are best suited for analysis. (Microsoft, 2014)
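The two most mechanical parts of this step, handling invalid data and interpolating missing values, can be sketched as follows. This is an illustrative example only: the function clean_series, the rule that negative readings count as invalid, and the sample values are assumptions made for the sketch, and here invalid readings are replaced by interpolated values rather than dropped.

```python
def clean_series(values, is_valid=lambda v: v >= 0):
    """Replace missing (None) or invalid readings by linear interpolation."""
    # Step 1: treat invalid readings the same way as missing ones.
    series = [v if v is not None and is_valid(v) else None for v in values]
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return series  # nothing to interpolate from

    cleaned = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        # Step 2: find the nearest known neighbors on each side.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            cleaned[i] = series[right]  # gap at the start: copy next value
        elif right is None:
            cleaned[i] = series[left]   # gap at the end: copy previous value
        else:
            frac = (i - left) / (right - left)
            cleaned[i] = series[left] + frac * (series[right] - series[left])
    return cleaned


# Invented readings: one missing value and one invalid (negative) value.
print(clean_series([10.0, None, 30.0, -5.0, 50.0]))
```

The other tasks named in the quotation, such as looking for hidden correlations or judging which sources are most accurate, depend on knowledge of the business and are not captured by a snippet like this.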

Exploring the data:

By exploring the data in light of the business problem, you can decide whether the data set contains bad data, and then you can devise a strategy to correct the problems or gain a deeper understanding of the behaviors that are typical of your business. (Microsoft, 2014)

Generating the model:

Before processing the structure and the model, a mining model is simply a container that specifies the columns to use for input, the attribute that it is predicting, and parameters that tell the algorithm how to process the data. (Microsoft, 2014)

Exploring and validating the model:

The training dataset is used to build the model and the test dataset is used to check the accuracy of the model by creating prediction queries. (Microsoft, 2014)
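The idea of building on one part of the data and checking accuracy on another can be illustrated in plain Python. The sketch below is not SQL Server's implementation. It assumes a hypothetical data set of (input value, label) pairs and a deliberately simple model that predicts, for each input value, the label seen most often in training. The names split, train, accuracy, and ROWS are invented for the example.

```python
import random
from collections import Counter, defaultdict


def split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Learn the most frequent label for each input value."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for value, label in rows:
        by_value[value][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen values
    table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return table, default


def accuracy(model, rows):
    """Share of rows whose predicted label matches the true label."""
    table, default = model
    hits = sum(1 for value, label in rows if table.get(value, default) == label)
    return hits / len(rows)


# Invented data: (customer type, outcome).
ROWS = [("frequent", "buys")] * 10 + [("occasional", "skips")] * 10

train_rows, test_rows = split(ROWS)
model = train(train_rows)
print(len(train_rows), len(test_rows), accuracy(model, test_rows))
```

Because the test rows play no part in training, the accuracy measured on them gives a more honest indication of how the model would behave on new data than accuracy measured on the training rows.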

Implementing and updating the model:

Once the mining models are in the production environment, you can perform different tasks, depending on your needs, such as using prediction models, creating statistical queries, or creating reports. (Microsoft, 2014)

The future of data analytics

With the evolution of social media, individuals with sufficient purchasing power to access the Internet have become a vast and invaluable source of information. Users themselves now share their tastes and consumption habits over the Internet, which makes obtaining data easier.

With this kind of access to information systems, companies can learn the consumption habits of each individual and generate advertising that matches the information the user makes available.

Currently, Internet advertising is based mainly on studying the preferences of individual users. It is essential that data mining develops in a way that yields systems that not only identify trends, but also identify individual behaviors.

Thesis proposal:

The proposed topic is "Data mining: a tool for marketing oriented to the individual consumer", with the aim of developing data mining tools that make it possible to manage the information available through social networks and direct it towards targeted advertising objectives.

The objectives of the thesis are:

- Development of data mining
- Development of marketing techniques
- Technical application of data analysis

Bibliography

Microsoft. (2014). SQL Server 2014. Retrieved from

Tips And Tips. (2012). Basic statements in SQL. Retrieved from

Vallejos, S. (2006). Data Mining. National University of the Northeast. Retrieved from exa.unne.edu.ar/informatica/SO/Mineria_Datos_Vallejos.pdf

Note: SQL is a declarative language for accessing databases that lets you specify various types of operations. Its statements draw on relational algebra and relational calculus in order to retrieve information. Source: (Tips And Tips, 2012).
