Data mining: using technology to our advantage

INTRODUCTION

Riquelme, Ruiz, and Gilbert (2006) state that the digital revolution has made digitized information easy to capture, process, store, distribute, and transmit. With the significant progress in computing and related technologies, and the expansion of their use into different aspects of life, a large amount of information continues to be collected and stored in databases.

Organizations therefore need to use this type of technology to their advantage in order to face today's challenges, such as maintaining long-lasting relationships with customers. Braga, Valencia, and Carvajal (2009) state that the greatest challenge organizations face today is maintaining a profitable client portfolio. It is no longer just a matter of organizing production, reducing costs, or improving customer care: these are necessary conditions, but they are no longer sufficient to win in a global and highly competitive market.

The authors suggest that it is through knowledge gained about customers that organizations can interpret their goals, expectations, and wishes. This can be achieved, they argue (Braga et al., 2009), through “data mining”, or more specifically “customer-centric data mining”, a collection of techniques and methods that facilitate acquiring and retaining the part of the market that fits a company (its market share). The goals of good customer care and cost reduction also apply to non-profit organizations, governmental or not. A business that knows its customers can serve them better.

This article explains what data, text, and sentiment mining consist of and how organizations can use these technologies, not only to gain a competitive advantage over other organizations, but also to better focus their products and/or services using the information obtained from their consumers and clients.

DATA MINING

Braga et al. (2009) explain that data mining provides an automatic method for discovering patterns in data, without the bias and limitations of an analysis based solely on human intuition.

They also explain that customer-directed data mining provides knowledge of the characteristics and behavior of customers. This matters because retaining customers costs less than acquiring new ones.

The authors explain that data mining comprises a set of techniques for description and prediction from large masses of data. For this reason, it is generally associated with special-purpose databases called “data warehouses”, which allow the rapid integration of data from different sources.

Joyanes (2016) argues that data mining refers to the process of searching for valuable business information in a database, data warehouse, or data mart.

Data mining can perform two basic operations:

Predicting trends and behaviors.

Identifying previously unknown patterns.

Ordinary Business Intelligence applications typically give users insight into what has happened, whereas data mining helps explain what is happening and predicts what will happen in the future.
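The difference between reporting what has happened and predicting what will happen can be illustrated with a minimal Python sketch. It assumes a hypothetical series of monthly sales figures and uses an ordinary least-squares straight line as the predictive model; neither the data nor the choice of model comes from the cited authors, and real data mining tools rely on far richer techniques.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line beyond the last observation."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical monthly sales, invented for this example.
monthly_sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive view (what has happened): total sales so far.
total_so_far = sum(monthly_sales)

# Predictive view (what may happen next): extrapolated sales for next month.
next_month = forecast(monthly_sales)

print(total_so_far)  # 460.0
print(next_month)    # 140.0 for this perfectly linear series
```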

Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information, which becomes knowledge, from large databases, data warehouses, or data marts.

Pérez (2007) defines data mining as a process of discovering new and significant relationships, patterns, and trends by examining large amounts of data.

Riquelme et al. (2006) state that knowledge discovery in databases (KDD) is defined as the process of identifying significant patterns in data that are valid, novel, potentially useful, and understandable to a user. This process is interactive and iterative, and contains the following steps:

1. Understand the application domain: this step includes relevant prior knowledge and the goals of the application.

2. Extract the target database: collect the data, assess its quality, and use exploratory analysis to become familiar with it.

3. Prepare the data: this includes data cleansing, transformation, integration, and reduction. The aim is to improve the quality of the data while reducing the time required by the learning algorithm applied afterwards.

4. Data mining: this is the fundamental phase of the process. It is made up of one or more of the following functions: classification, regression, clustering, summarization, image retrieval, rule extraction, etc.

5. Interpretation: explain the discovered patterns and, where possible, visualize them.

6. Use the discovered knowledge: make use of the model that was created.
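The steps above can be sketched as a small Python pipeline. Everything in it is illustrative: the customer records, the function names, and the deliberately trivial "model" (an average-spend threshold) are assumptions made for this example, not part of the process as described by Riquelme et al. (2006).

```python
from statistics import mean

# Understand the application domain: the (assumed) goal is to find high spenders.
# Hypothetical raw records: (customer_id, age, monthly_spend); None marks a missing value.
RAW_RECORDS = [
    ("c1", 25, 120.0),
    ("c2", None, 80.0),
    ("c3", 41, None),
    ("c4", 33, 200.0),
    ("c1", 25, 120.0),  # exact duplicate of the first record
]


def extract(records):
    """Extract the target data and report simple quality figures."""
    missing = sum(1 for rec in records if None in rec)
    return list(records), {"rows": len(records), "rows_with_missing": missing}


def prepare(records):
    """Prepare the data: drop duplicates and incomplete rows, scale spend to 0..1."""
    seen, clean = set(), []
    for rec in records:
        if None in rec or rec in seen:
            continue
        seen.add(rec)
        clean.append(rec)
    if not clean:
        return []
    max_spend = max(spend for _, _, spend in clean)
    return [(cid, age, spend / max_spend) for cid, age, spend in clean]


def mine(prepared):
    """Data mining: a trivial stand-in model, the mean scaled spend as a threshold."""
    return {"threshold": mean(spend for _, _, spend in prepared)}


def interpret(model):
    """Interpretation: express the discovered pattern in words."""
    return f"Customers above {model['threshold']:.2f} of the maximum spend are high spenders."


def use(model, prepared):
    """Use the discovered knowledge: list the customers the model flags."""
    return [cid for cid, _, spend in prepared if spend > model["threshold"]]


records, quality = extract(RAW_RECORDS)
prepared = prepare(records)
model = mine(prepared)
high_spenders = use(model, prepared)

print(quality)
print(interpret(model))
print(high_spenders)
```

In practice the process is iterative, as the text notes: a poor result at the interpretation step would send the analyst back to extraction or preparation rather than straight on to use.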

As we have seen, the fundamental phase of KDD is data mining. Its functions are explained below:

Classification: assigns a data item to one of a set of predefined categorical classes.

Regression: the purpose of this model is to map a data item to a real value of a variable.

Clustering: groups records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another and different from the records of other clusters.

Rule generation: rules are extracted or generated from the data. These rules refer to the discovery of association relationships and functional dependencies between the different attributes.

Summarization: these models provide a compact description of a subset of data.

Sequence analysis: sequential patterns are modeled, such as in the analysis of time series, gene sequences, etc. The objective is to model the states of the process, or to extract and report deviations and trends over time.
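Of the functions listed above, rule generation lends itself to a compact illustration. The following Python sketch computes the two standard measures used for association rules, support and confidence, over a hypothetical set of shopping baskets. The transactions, thresholds, and function names are assumptions made for this example, and only rules between pairs of single items are considered.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for this example.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


rules = pair_rules(TRANSACTIONS)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With these baskets, every customer who bought butter also bought bread, so the rule "butter -> bread" has confidence 1.0, while the reverse rule is weaker because bread is often bought without butter.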

DATA MINING APPLICATIONS

Riquelme et al. (2006) mention that some of the important tasks of data mining include identifying applications for existing techniques and developing new techniques for traditional or new application domains, such as electronic commerce and bioinformatics.

Data mining can be applied in practically all human activities that generate data, such as:

Commerce and banking: customer segmentation, sales forecasting, risk analysis.

Medicine and pharmacy: diagnosis of diseases and assessment of the effectiveness of treatments.

Security and fraud detection: facial recognition, biometric identification, detection of unauthorized network access, etc.

Non-numerical information retrieval: text mining, web mining, and search and identification of images, video, voice, and text in multimedia databases.

Astronomy: identification of new stars and galaxies.

Geology, mining, agriculture, and fishing: identification of areas suitable for different crops, fishing, or mining in satellite image databases.

Environmental sciences: identification of models of how natural and/or artificial ecosystems function, to improve their observation, management, and/or control.

Social sciences: study of the flows of public opinion.

City planning: identification of conflict-prone neighborhoods based on sociodemographic values.
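As an illustration of one application listed above, customer segmentation, the following Python sketch groups customers by annual spend using one-dimensional k-means (Lloyd's algorithm). The spend figures and starting centroids are hypothetical, and k-means is only one of many clustering techniques that could be used for segmentation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm in one dimension with caller-supplied starting centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, a in zip(values, assignment) if a == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical annual spend per customer, invented for this example.
annual_spend = [100, 120, 110, 900, 950, 1000]

centroids, assignment = kmeans_1d(annual_spend, centroids=[100, 1000])
print(centroids)   # [110.0, 950.0]
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

The two resulting segments could be read as "occasional" and "high-value" customers; in a real setting the number of segments and the variables used would be chosen from the business goals.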

TEXT MINING

Matallana and Delgado (2010) consider that text mining is a particular form of data mining that allows the extraction of knowledge from large information repositories, structured or not, in the form of text. The objective is similar to that of data mining: discovering hidden patterns of behavior and new knowledge within a document collection.

Text mining applies mathematical and statistical techniques, as well as semantic analysis of the text. Text mining is the process of applying automatic methods to analyze and structure text data in order to create useful knowledge from structured and unstructured information.

Text mining, according to these authors, focuses on the discovery of interesting patterns and new knowledge in a set of texts. Its objective is to discover new trends, deviations, and associations within large volumes of textual information.

Joyanes (2016) explains that text mining searches, mines, and discovers text in documents of all kinds; it is also called text data mining. The author argues that, in a practical sense, it is the process of deriving high-quality information from a given text.

Text analysis tries to find patterns within a set of texts in order to support better decision making. It aims to take unstructured data, process it, and create structured data from it that can be used in analysis and reporting processes.
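One common way to create structured data from unstructured text is a document-term matrix, in which each row is a document and each column counts how often a term occurs. The Python sketch below is a minimal illustration under stated assumptions: the two example documents, the tiny stop-word list, and the simple tokenizer are invented here and are not taken from the cited authors.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOPWORDS = {"the", "was", "and"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def document_term_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical customer comments, invented for this example.
documents = [
    "The delivery was fast",
    "Delivery was slow and the support was slow",
]

vocabulary, rows = document_term_matrix(documents)
print(vocabulary)  # ['delivery', 'fast', 'slow', 'support']
print(rows)        # [[1, 1, 0, 0], [1, 0, 2, 1]]
```

Once text is in this tabular form, the data mining functions described earlier, such as clustering or classification, can be applied to it.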

Joyanes (2016) argues that text data has great potential impact in almost all organizations and companies, as well as across industry. Learning methods for capturing, parsing, and ultimately analyzing text is therefore critical for organizations.

SENTIMENT MINING

Joyanes (2016) argues that sentiment mining focuses on the analysis of feelings and opinions present in text messages and other media formats. It makes it possible to discover the opinion or feeling embedded in, for example, text messages or Twitter posts, which can be of tangible benefit to an organization and to the shareholders and workers that comprise it.

Opinion mining, or sentiment mining as it is also known, draws on natural language processing, computational linguistics, and text analytics to identify and extract subjective information from source materials.

Classic sentiment analysis has undergone a dramatic change since the introduction of Web 2.0 and the growing use of blogs and social networks. One web application that measures sentiment is “twitter sentiment”. Sentiment analysis is now a popular use of text analysis: it examines the general direction of opinion across a large number of people, providing information on what the market is saying, thinking, and feeling about an organization or person. Sentiment analysis uses data from social media sites.

From the perspective of an organization or company, sentiment analysis allows you to quickly and efficiently analyze what is said about a brand or product, follow the opinions or conversations of certain influential users, detect trends on the Internet, etc.

Joyanes (2016) states that sentiment analysis is a method that attempts to translate human emotions into data. With modern tools this is achievable, because the spontaneity and immediacy of opinions on social media make those feelings more authentic and preserve their emotional content. Sentiment in unstructured content can be measured along the following fundamental characteristics:

Polarity: whether the opinion is positive, negative, or neutral.

Intensity: the degree of emotion expressed.

Subjectivity: whether the source emitting the expression is objective, partial, or impartial.
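These characteristics can be approximated with a simple lexicon-based scorer, sketched below in Python. The miniature lexicon, its scores, and the way intensity and subjectivity are computed are assumptions made for this example. In particular, the share of opinion words is only a crude proxy for subjectivity, and production systems typically use much larger lexicons or trained models.

```python
import re

# Hypothetical miniature lexicon: word -> score between -1 (negative) and 1 (positive).
LEXICON = {"great": 1.0, "good": 0.5, "slow": -0.5, "terrible": -1.0}


def analyze(text):
    """Estimate polarity, intensity, and subjectivity of a short text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    net = sum(scores)
    if net > 0:
        polarity = "positive"
    elif net < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    # Intensity: average strength of the opinion words, ignoring their sign.
    intensity = sum(abs(s) for s in scores) / len(scores) if scores else 0.0
    # Subjectivity (crude proxy): share of words that carry an opinion score.
    subjectivity = len(scores) / len(words) if words else 0.0
    return {"polarity": polarity, "intensity": intensity, "subjectivity": subjectivity}


complaint = analyze("The service was terrible and slow")
statement = analyze("The package arrived on Tuesday")
print(complaint)
print(statement)
```

Here the first sentence is scored as negative with a fairly high intensity, while the second contains no opinion words and is scored as neutral with zero subjectivity.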

Sentiment mining can have different applications such as:

Measurement of employee satisfaction and the work environment.

Measurement of customer satisfaction.

Prevention of customer churn, by detecting negative opinions that signal a risk of losing a customer.

Comparison with the competition, by evaluating opinion about competing brands, companies, and products and comparing it with opinion about our own.

Detection of strengths and weaknesses in different areas of our organization, by detecting positive or negative opinions that have an impact.

Corporate reputation.

Prediction of the evolution of certain actions, product launches, etc.

Analysis of the opinion of the electorate in the case of political votes.

Sentiment analysis is framed within natural language processing (NLP), artificial intelligence and text mining, among other techniques, since it fundamentally seeks to extract subjective information from a text, such as a tweet, a blog post, etc.

ACKNOWLEDGMENTS

I thank God for all his blessings, as well as the opportunity to work through the process of improving myself. I thank my parents for supporting me at all times in this new adventure, the National Council of Science and Technology for their support in my postgraduate studies, the Orizaba Technological Institute, the Master of Administrative Engineering, as well as the subject of Fundamentals of Administrative Engineering, for providing me with the necessary bases to be better as a professional and as a human being.

REFERENCES CONSULTED

Braga, L. P. V., Valencia, I. O. L., & Carvajal, S. S. (2009). Introduction to data mining. Rio de Janeiro: Editora E-papers.

Joyanes, L. A. (2016). Big Data, Analysis of large volumes of data in organizations. Alfaomega Grupo Editor.

Matallana, F. E., & Delgado, J. M. C. (2010). Big to small: The strategies of large corporations within the reach of medium-sized companies. Netbiblo.

Pérez, C. L. (2007). Data mining: techniques and tools. Editorial Paraninfo.

Riquelme, J. C., Ruiz, R., & Gilbert, K. (2006). Data Mining: Concepts and Trends. Artificial intelligence. Ibero-American Journal of Artificial Intelligence, 10(29). Recovered from
