
Mining: data, texts, feelings


Anonim

Ever since the first human beings began to communicate with one another, they have felt the need to stay informed about everything happening around them. They also looked for ways to keep that information at hand in order to carry out certain activities, whether personal or collective, within a society, an organization or even a country.

In the past, data, information and statistics of any kind could only be obtained from books and other texts, from conversations with other people or from one's own experience. The most fortunate had access to the first computers, which could store and transmit very little information. All of this greatly hindered access to and sharing of information, since finding the data a person needed took a great deal of time and effort.

Today, the way data and information are created and distributed has changed for the better: they are within easy reach of anyone, anywhere in the world. We can find information about the economy of a particular country, the marketing of a product, new technologies that make our lives easier, and many other things. This information is stored in large databases.

Although everything may seem perfect because almost any information is just a click away, this abundance makes it harder to choose the best or most reliable information, since millions of data items are generated every day.

We often hear about data mining in various contexts. It is a very effective tool for selecting the data and information that a person or organization needs at the moment it is required. Other tools have emerged from it, namely text mining and sentiment mining, which share the foundations of data mining but are aimed at other kinds of content.

Key concepts.

To make the topic "Mining (Data, Texts, Feelings)" easier to follow, some definitions that the reader should know are cited below:

Data mining

"It is the set of techniques and technologies that allow exploring large databases, automatically or semi-automatically, with the aim of finding repetitive patterns, trends or rules that explain the behavior of the data in a given context." (Sinnexus, 2016)

Text mining

"It is the process in charge of discovering information that did not exist explicitly in any text of the collection, but that arises from relating the content of several of them." (Rochina, 2017)

Sentiment mining

"It refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information from resources." (Wikipedia, 2018)

Origin of data mining

Data mining can be considered a relatively recent topic, since it has not been in use for many years. Its key components, in other words the elements that make it work, have nevertheless been used for much longer in research in different areas, such as statistics, machine learning and artificial intelligence. Today data mining has advanced considerably, thanks to more powerful search engines and to databases far larger than those available in the past.

The first ideas about data mining emerged during the 1950s from the study of data engineering. Computing specialists at that time generated lists of information of different kinds, for example about products and processes. All of this information was stored on a central computer in the organization, and it helped managers make the best decision on a given issue.

From this, the first information-processing systems were conceived for company directors and managers. Unfortunately, these systems were cumbersome and could not store much information, and they were not easy to understand for anyone unfamiliar with computing.

In the 1960s, the first database management systems appeared, but they were still not fully "digestible" for anyone unfamiliar with these concepts.

During the 1980s, the data warehouse emerged, which addressed the shortcomings of earlier database systems. The presence of the data warehouse led experts to develop new perspectives, in which analyses became automated and made it possible to extract specific information.

History of databases and data mining, taken from (Martínez, 2010)


Data mining

Today, the digital revolution has made capturing, storing and processing data and information relatively easy, and the cost of doing so is very low compared with a few years ago.

The volume and diversity of data stored in computer systems and digital databases have grown enormously in recent years.

All the data accumulated since an organization was founded should serve as the memory of the company and should also be useful for predicting certain data or information in the near future.

Traditional procedures for managing data and information, as well as conventional statistical methods, are no longer sufficient to analyze the large volumes of data that any company generates.

When an organization needs to make a decision, it bases that decision on the information or data it holds about past events, collected in some data source. Extracting this information from the corresponding database, whether automatically or semi-automatically, has become highly relevant, so different procedures have been developed to do it efficiently. One of these tools is data mining.

The main objective of data mining is to detect the knowledge held in an organization's databases, which helps its staff when making decisions.

Data mining combines semi-automatic techniques from artificial intelligence, graphical visualization, databases and statistical analysis so that the organization can derive knowledge from all the data and information it has collected, since mining in itself would have no value for the company unless it produced such knowledge. Data mining can be regarded as one of the most advanced stages in the evolution of data analysis tools.

The term data mining comes from an analogy between a hill and the enormous amount of data stored in an organization. The data lie inside the hill, hidden among rocks and brush. If you dig deeply, you may find stones that can be classed as valuable "jewels". In other words, if you search the data in depth, you can find information of great value for building knowledge.

The data mining process

The first step in carrying out data mining correctly is to identify what kind of data is being sought. This means thinking about what data is required, where it can be found and how to obtain it.

Once the data has been obtained, it must be prepared, either by storing it in databases in the format they require or by building a data warehouse (one of the most complicated parts of data mining). When the data has been stored in the format accepted by the database, the data that is strictly necessary is selected and the data of little importance to the organization is discarded.

We must be clear about what we want to achieve or find (this must be settled before continuing with the analysis), and we must also keep in mind which tools or processes are essential to continue. After applying the chosen tool, we need to know how to interpret the results obtained, in order to decide whether they are really useful to the organization and to classify them for possible later use.

Once the data and information that are useful for the organization's current situation are available, they are discussed and analyzed in order to make the best possible decision about the matter at hand.

Once the decision has been made on the basis of the data obtained through data mining, the outcome is evaluated. To do this, the results must be observed and studied, including whether there were benefits and what the total costs were, so that the whole process can be evaluated as feedback. During this feedback period the data will tend to change and new tools or methodologies may be found, so the next data mining cycle will have to be planned again.

By way of synthesis, the data mining process should include the following steps:

  • Process the data
  • Choose the characteristics that best suit the situation
  • Choose an algorithm to extract the required data and information
  • Analyze, interpret and evaluate the results

Data mining process, taken from (Egonzales, 2008)
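The four steps listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure of any cited author: the customer records, the field names (`visits`, `basket`, `churned`) and the one-threshold rule used as the "algorithm" are all hypothetical and were chosen just to make each step visible.

```python
from statistics import mean

# Hypothetical raw records (invented for illustration only).
RAW = [
    {"visits": 12, "basket": 35.0, "churned": False},
    {"visits": 1, "basket": 10.0, "churned": True},
    {"visits": None, "basket": 22.0, "churned": False},  # incomplete record
    {"visits": 8, "basket": 40.0, "churned": False},
    {"visits": 2, "basket": 12.0, "churned": True},
    {"visits": 3, "basket": 15.0, "churned": True},
]


def preprocess(records):
    """Step 1: process the data (here: drop records with missing values)."""
    return [r for r in records if all(v is not None for v in r.values())]


def select_features(records, features, target):
    """Step 2: keep only the chosen characteristics and the target."""
    return [([r[f] for f in features], r[target]) for r in records]


def fit_threshold_rule(samples):
    """Step 3: a deliberately simple algorithm, a single threshold on the
    first feature placed halfway between the two class means."""
    pos = [x[0] for x, y in samples if y]
    neg = [x[0] for x, y in samples if not y]
    threshold = (mean(pos) + mean(neg)) / 2
    below_is_positive = mean(pos) < mean(neg)

    def predict(x):
        return (x[0] < threshold) == below_is_positive

    return threshold, predict


def evaluate(samples, predict):
    """Step 4: evaluate the result (accuracy on the same toy data)."""
    hits = sum(predict(x) == y for x, y in samples)
    return hits / len(samples)


clean = preprocess(RAW)
samples = select_features(clean, ["visits"], "churned")
threshold, predict = fit_threshold_rule(samples)
accuracy = evaluate(samples, predict)
print(threshold, accuracy)
```

Measuring accuracy on the same records used to fit the rule is done here only to keep the sketch short; a real evaluation would use data that was held back.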

Data mining techniques

According to Ahumada (2016), data mining techniques are usually classified as predictive, descriptive and auxiliary, as follows:

  • Regression, analysis of variance and covariance, time series, Bayesian methods, genetic algorithms.

Ad hoc classification:

  • Discriminant, decision trees and neural networks.

Post hoc classification:

  • Clustering
  • Segmentation
  • Dependency, association, multidimensional scaling, dimension reduction, exploratory analysis
  • SQL and query tools.
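As an illustration of one technique from this classification, clustering can be sketched with a very small one-dimensional k-means. This is a simplified teaching version, not a production algorithm: the function name `kmeans_1d`, the deterministic choice of starting centroids and the purchase amounts are all assumptions made for the example.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value
    to its nearest centroid and recomputing the centroids."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")

    # Deterministic start: spread the initial centroids over the sorted data.
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1)
    centroids = [ordered[round(i * step)] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical purchase amounts with two visible groups.
purchases = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(purchases, k=2)
print(centroids, clusters)
```

No class labels are supplied in advance; the groups emerge from the data, which is what distinguishes post hoc from ad hoc classification.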

What does data mining do?

Data mining essentially consists of fitting models to data or identifying patterns in it. These fits are usually statistical in nature, since they allow for a certain amount of error in the model.

Data mining requires algorithms, which serve to predict (based on data that are already known) and to describe (based on the patterns that were found). Some of these tasks are as follows:

  • Clustering: this task aims to identify a set of categories or groups that describe the data. The categories can be mutually exclusive and exhaustive, can be based on a hierarchical representation, or can even overlap.
  • Classification: this task maps, in other words assigns, each data item to one of several pre-established classes, which makes it possible to locate certain data much more quickly.
  • Summarization: this task consists of finding a compact description of a subset of data. More sophisticated methods include summary rules, multivariate visualization and the discovery of functional relationships between variables. These methods are often used in interactive data analysis and in automatic report generation.
  • Dependency modeling: the main objective of this task is to find a model that describes the dependencies between variables. Such models have two levels:
    • Structural level: usually represented as a graph, it specifies which variables depend locally on each other.
    • Quantitative level: it specifies the strength of these dependencies on a numerical scale.

Probabilistic dependency networks, for example, use conditional independence to specify the structure of the model and probabilities to specify the strength of the dependencies.

  • Regression: the main objective of this task is to map a data item to a real-valued prediction variable. Examples include estimating the amount of biomass in a section of forest from microwave measurements, or estimating the probability that a patient will survive based on the results of previous diagnostic tests.
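The last task in this list, predicting a real value, can be illustrated with ordinary least squares on a single variable. This is a minimal sketch: the function names `fit_line` and `predict` and the measurements are hypothetical, invented only to show how known data produce a model that maps a new observation to a number.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Map a new observation to a real-valued prediction."""
    return slope * x + intercept


# Hypothetical sensor readings (x) and measured quantities (y).
readings = [1, 2, 3, 4]
measured = [3, 5, 7, 9]

slope, intercept = fit_line(readings, measured)
estimate = predict(5, slope, intercept)
print(slope, intercept, estimate)
```

Real data would not lie exactly on a line, and the difference between the fitted line and the observations is the error that a statistical model allows for.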

Text mining

Text mining is a relatively young area of research and study in text processing. It is understood in the same way as data mining: it is a methodology for discovering interesting new patterns and producing new knowledge, but instead of data it works with large amounts of text.

We can therefore say that the main objective of text mining is to find new knowledge that is not explicitly stated in any of the texts.

Stages of text mining, taken from (Gómez, 2001)

Likewise, text mining typically performs the following tasks:

  • Retrieve data and information, that is, select the texts that best match what the organization is looking for
  • Extract valuable information that is embedded in the texts and has been overlooked, such as facts, keywords, important events and relationships between texts

Because its methodology is similar to that of data mining, text mining also seeks essential data with which to create new knowledge for the company.

According to Nuño & Machado (n.d.), some of the techniques used in text mining are the following:

  • Text classification
  • Information retrieval and extraction of key texts
  • Machine learning
  • Natural language processing
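One simple way to illustrate the extraction of keywords from a collection of texts is TF-IDF weighting, which scores a word highly when it is frequent in one text but rare in the others. TF-IDF is not named in the sources cited here; it is used as an assumed, well-known example. The three short documents and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-collection of customer comments.
DOCS = {
    "a": "the delivery was late and the delivery box was damaged",
    "b": "the price was fair and the product works",
    "c": "the support team was friendly and the price was fair",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return {doc_name: {word: score}} using tf * log(N / df)."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many texts each word appears.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        scores[name] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def top_keywords(scores, name, k=3):
    """Highest-scoring words of one text; ties are broken alphabetically."""
    ranked = sorted(scores[name].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


scores = tf_idf(DOCS)
print(top_keywords(scores, "a", k=1))
```

Words that occur in every text, such as "the", receive a score of zero, so the ranking surfaces the terms that set one text apart from the rest of the collection.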

Text mining process

As mentioned above, text mining is a relatively young technique whose process can vary and be adapted to different situations, and there is still no established methodology to follow.

Nevertheless, the following steps can be used:

Steps of text mining, own elaboration with information from (Gómez, 2001)

Sentiment mining

Sentiment mining is a set of techniques from natural language processing, computational linguistics and text mining whose main objective is to extract subjective information from content produced by employees or any other individuals, for example the comments posted every day on social networks, blogs or product review forums.

Sentiment mining spans various fields of study related to the analysis of the subjective elements implicit in user-generated content. Two kinds of tasks can be distinguished.

Polarity characterization

This task determines whether an opinion can be classified as positive or negative, and whether it will be useful to the user. It is also possible to generate a numerical value within an established range.
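A lexicon-based scorer is one of the simplest ways to sketch this task. The example below is an assumption-laden illustration: the two tiny word lists, the function names and the choice of the range from -1 to 1 are all hypothetical, and a real system would need a much larger lexicon and handling of negation.

```python
import re

# Hypothetical miniature sentiment lexicons.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def polarity(text):
    """Return a score in [-1, 1]: -1 is fully negative, 1 fully positive,
    0.0 when no opinion word from the lexicons is found."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def label(score):
    """Turn the numeric score into a class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "Terrible battery and slow screen, but good camera"
print(polarity(review), label(polarity(review)))
```

The numeric score corresponds to the value within an established range, and `label` corresponds to the positive or negative classification.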

Feature-based sentiment analysis

This task identifies the different characteristics of the product or service that are mentioned in the opinion a user wrote.
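A rough sketch of this task is to look for known feature words in each clause of a review. Attaching a score to each feature from the opinion words in the same clause is an extension assumed here for illustration, and the feature list, lexicons and clause-splitting rule are hypothetical simplifications.

```python
import re

# Hypothetical product features and miniature opinion lexicons.
ASPECTS = {"battery", "screen", "camera", "price"}
POSITIVE = {"excellent", "good", "great", "sharp"}
NEGATIVE = {"terrible", "bad", "poor", "expensive"}


def aspect_opinions(review):
    """Return {feature: score} for every known feature in the review.

    The review is split into clauses at punctuation and at the words
    "but" and "and"; each feature receives the net count of positive
    minus negative words found in its own clause.
    """
    results = {}
    clauses = re.split(r"[.,;!?]|\bbut\b|\band\b", review.lower())
    for clause in clauses:
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(
            w in NEGATIVE for w in words
        )
        for w in words:
            if w in ASPECTS:
                results[w] = results.get(w, 0) + score
    return results


review = "The camera is excellent, but the battery is terrible and the price is good."
opinions = aspect_opinions(review)
print(opinions)
```

Splitting on "and" and "but" is a crude stand-in for real syntactic analysis, but it shows how a single opinion can be positive about one feature and negative about another.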

Thesis proposal.

Implement data mining within SMEs in the Córdoba - Orizaba area for better management of the organization.

Objective.

Optimize the flow of information within the organization, separating what is useful from what is not, and thereby speeding up decision-making.

Acknowledgments.

I thank my family for giving me all their support and encouragement to continue day by day; the Orizaba Technological Institute and CONACYT for opening their doors to me and allowing me to continue my studies in the Master's Degree in Administrative Engineering; and Doctor Fernando Aguirre y Hernández for motivating me with his knowledge, in the Fundamentals of Administrative Engineering seminar, to write each of the assigned articles.

Conclusion.

For organizations and the employees who work in them, both of which deal with large volumes of information, mining of any type (data, text or sentiment) provides tools and skills that are needed to identify, select, process, study and evaluate the data collected, in order to produce information and later turn it into knowledge that can be extremely useful to them.

Mining can be very helpful when making decisions about the future of the company, since the information it produces serves to structure ideas better and to verify them, so that there is no doubt when making the best decision.

Likewise, mining of any type works as a technological strategy that enhances competitive advantage, since it optimizes various processes in organizations, especially the decision-making described above.

Bibliography.

Ahumada, A. M. (April 7, 2016). Gestiopolis. Retrieved from

Egonzales. (April 4, 2008). Monographs. Retrieved from

Gómez, M. M. (2001). Text Mining: A New Computational Challenge. National Polytechnic Institute, 2-13.

Martínez, B. B. (2010). BUAP. Retrieved from

Nuño, R. R., & Machado, E. F. (n.d.). Galeon.com. Retrieved from

Orallo, J. H., Quintana, M. J., & Ramírez, C. F. (2014). Automatic Knowledge Extraction in Databases and Software Engineering. Polytechnic University of Valencia.

Rochina, P. (April 25, 2017). INESEM digital magazine. Retrieved from

Sinnexus. (2016). Sinnexus. Retrieved from

Wikipedia. (April 18, 2018). Wikipedia, the free encyclopedia. Retrieved from
