Data and text mining

This article describes what data mining is, briefly outlines each of its steps, and explains the purpose of using the technique.


Today, business organizations hold large amounts of information that they must handle as efficiently as possible: sales, customers, collections or, in the case of a hospital, patients and treatments, depending on the organization's line of business. To keep better track and control of this information, organizations use computer and storage equipment such as personal computers, USB drives, CDs and flash memory. These devices are widely used for two reasons: the cost of acquiring them has fallen over the years, and information systems are far more reliable than human record keeping.

All of this information resides in the databases that organizations use in their day-to-day work. These are called operational databases because organizations use them to carry out activities such as shipping goods to customers, registering students, treating patients and processing collections.

Once these operations have been carried out, the information obtained from these primary sources is cleaned and summarized. After collection, cleaning and summarization, the information is transferred to what are called data warehouses. Metaphorically, these are periodic photographs that show the state the company has been in and allow it to learn from the past.

In this way, business leaders obtain suitable indicators for steering the company, since they can investigate and explore any situation that is of interest or concern for achieving business objectives.

This is how data mining emerged, not long ago, to help the top managers of organizations make the best decisions for their companies. Data mining works through a series of "miners": previously created algorithms tasked with exhaustively searching the information the organization stores in its information systems. What these algorithms look for is a series of trends, anomalies, deviations or other situations of interest that may be unknown to the company. These algorithms, or miners, make it easier for managers to direct the organization and keep it on the right path.

The miners use “in addition to databases, artificial intelligence (procedures to find groups in similar situations, classify new events in known categories, etc.) and statistics. But unlike the latter, which takes a sample of the data and studies it, data mining studies all the data. The more data that is analyzed, the more accurate it is, and its detection and prediction power increases.” (Martínez Luna, 2011)

Everything above concerns data mining. However, there is another type of mining that can likewise help companies achieve the objectives they set at the outset.

For human beings, knowledge is one of the foundations of our existence; it shapes where we are going and what we aspire to. Most of the knowledge that humanity has generated exists in written form, that is, in natural language: newspapers, magazines, books, technical reports and so on. However, not everyone has the same ability to handle written content, even though interacting with written material in order to obtain some benefit is one of the most common tasks we face throughout our lives. The skills that a good reader and information seeker should have are:

  • Find the necessary information
  • Compare different sources of information and draw conclusions
  • Manage texts, for example, translate, edit, etc. (Montes y Gómez, 2011)

Given our limitations in managing information, computational linguistics becomes a powerful aid for text processing, since it allows the analysis of information to be carried out automatically, easing the problems that most people have.

Just as data mining looks for patterns within a data set, text mining does the same but takes as its data the texts fed to a computer system. It also seeks to detect deviations and associations among the texts analyzed.


With the revolution of the digital age, information handling has become more efficient than in the past. Information processing within digital systems basically consists of five steps:

  • Capture
  • Process
  • Store
  • Distribute
  • Transmit

Through the use of information technology, large organizations around the globe have accumulated large amounts of historical data from their experience, and the information in their computer systems keeps growing, making these volumes ever larger.

Managing such large amounts of information is complicated, and that is why data mining was born: "it arises as an attempt to make sense of the explosion of information that can currently be stored" (Mitra & Acharya, 2003).

Technology also makes it possible to store different types of data, whether images, videos, texts or numerical data, behind a relatively simple interface that facilitates multimedia management of the information.

With such a mixture of information, conventional statistical procedures are insufficient for analyzing it, since statistical techniques focus on samples, whereas data mining uses the entire universe of data to obtain a fuller picture and a better solution.
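One way to see why a sample can miss what a full scan finds is to compute the chance that a random sample contains none of a handful of anomalous records. The sketch below is illustrative only: the function name and the figures (10,000 records, 3 anomalies, a 1% sample) are assumptions made for the example, not values from the cited sources.

```python
from math import comb

def prob_sample_misses_all(total_records: int, anomalies: int, sample_size: int) -> float:
    """Probability that a simple random sample (drawn without replacement)
    contains none of the anomalous records (hypergeometric, zero hits)."""
    return comb(total_records - anomalies, sample_size) / comb(total_records, sample_size)

# Hypothetical figures chosen only for illustration.
p_sample = prob_sample_misses_all(total_records=10_000, anomalies=3, sample_size=100)
p_full = prob_sample_misses_all(total_records=10_000, anomalies=3, sample_size=10_000)

print(f"1% sample misses every anomaly with probability {p_sample:.3f}")
print(f"Full scan misses every anomaly with probability {p_full:.3f}")
```

Under these assumed figures the 1% sample misses all three anomalies roughly 97% of the time, while a scan of every record cannot miss them. This only illustrates the rare-record case; for estimating averages or proportions, a well-drawn sample can still be adequate.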

In this way we arrive at the definition of data mining, which is “the process that aims to discover, extract and store relevant information from large databases, through search programs and identification of patterns and global relationships, trends, deviations and other seemingly chaotic indicators that have an explanation that can be discovered through various techniques of this tool.” (Ángeles Larrieta & Santillán Gómez, 2001)

Companies use data mining to exploit the value of the information held in their databases and to detect, as mentioned above, patterns in it, so that top managers gain a better knowledge of the business they run and can make decisions more efficiently.


Data mining arises from the need to manage the information contained in organizations' databases. It has a number of advantages over other information management processes, such as:

  • Data mining provides senior business managers with a set of relationships and knowledge that in many cases were not known to exist within the organization.
  • Data mining helps companies choose the course they will take and gain competitive advantages over market rivals, since it uncovers information known only to the company.
  • Human beings can detect patterns and anomalies only superficially, so data mining makes it possible to perceive patterns that are difficult to spot at first glance.


As for its structure, data mining basically consists of using an algorithm or computer program to search the large amounts of information contained in a database.

The purpose of these programs and algorithms is to detect trends and patterns hidden in the historical data of organizations.

These programs are what we earlier called miners. Miners, whether programs or algorithms, are created by users and apply various data exploration techniques, including:

  • Clustering
  • Associations
  • Classification
  • Visualization
  • Neural networks
  • Genetic algorithms
  • Deviation detection

All of these methods need a very large database to work most effectively.
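As one illustration of the association technique named in the list above, the following sketch derives simple "A implies B" rules from a handful of made-up sales baskets. The transactions, the function name `pair_rules` and both thresholds are hypothetical; a real miner would run over far larger operational data and would usually rely on a dedicated algorithm rather than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) meaning 'lhs implies rhs'."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical baskets, invented for the example.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these five baskets only the bread and milk pair clears both thresholds, which also shows why such methods benefit from large databases: with so few records, a single basket changes which rules survive.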

These programs compile the information previously obtained and then select and search the historical data. If something interesting is found, it is shown to the user.
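A minimal sketch of such a miner, assuming that "interesting" means a value lying far from the historical average, might look like this. The z-score rule, the threshold of 2.0, the function name and the sales figures are all assumptions made for illustration; the source does not prescribe a particular deviation test.

```python
from statistics import mean, stdev

def find_deviations(history, threshold=2.0):
    """Return (index, value, z_score) for values far from the historical mean.

    Needs at least two values, because the sample standard deviation
    is undefined for a single observation.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return []                      # all values identical: nothing deviates
    findings = []
    for i, value in enumerate(history):
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            findings.append((i, value, round(z, 2)))
    return findings

# Hypothetical monthly sales, invented for the example.
monthly_sales = [102, 98, 101, 97, 103, 99, 100, 160, 101, 99]
findings = find_deviations(monthly_sales)
for index, value, z in findings:
    print(f"Month {index}: value {value} deviates (z = {z})")
```

Only the findings are reported back, mirroring the behavior described above: the program scans all of the history and shows the user just the records that stand out.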

The "miners" have an advantage over other information search methods: they do not need any specialized software to perform searches. The searches are run on the company servers and on the whole network of PCs used to capture data and information.


Data mining works as a cycle of four steps: the results obtained at the end of one pass can be fed back into the cycle, and so on.

  1. First, the users who will carry out the data mining process must identify the problems that the organization, company or business has. They must also locate the data that can add value to the company and the areas of the company where information changes most rapidly.
  2. Next, the user must select the best algorithm for analyzing the historical data, so that the mining programs can work efficiently.
  3. In line with the previously established search criteria, the information obtained through the data mining process must be incorporated into decision-making by presenting the findings to the committee involved in making decisions. Likewise, the areas involved must be informed of the problems detected so that a correct solution can be applied.
  4. Finally, the results obtained are measured and reported to the person or committee in charge of making decisions about the problems found, according to the previously established search criteria.


Some of the most important applications of data mining are the following:

  • Commerce and banking: customer segmentation, sales forecasting, risk analysis.
  • Medicine and pharmacy: diagnosis of diseases and the effectiveness of treatments.
  • Security and fraud detection: facial recognition, biometric identification, access to networks, etc.
  • Non-numerical information retrieval: text mining, web mining, and search and identification of image, video, voice and text in multimedia databases.
  • Astronomy: identification of new stars and galaxies.
  • Geology, mining, agriculture and fishing: identification of areas of use for different crops, for fishing or for exploration, in databases of satellite images.
  • Environmental sciences: identification of models of how natural and/or artificial ecosystems (wastewater treatment plants) operate, to improve their observation, management and/or control.
  • Social sciences: studies of the flows of public opinion.
  • City planning: identification of neighborhoods with conflict based on sociodemographic values.

(Riquelme, Ruíz, & Gilbert, 2006)


Text mining is the newest part of the research area focused on text processing. Its definition is very similar to that of data mining, since both seek the same thing but in different types of information.

Text mining is “the process of discovering interesting patterns and new knowledge in a collection of texts, that is, text mining is the process in charge of discovering knowledge that does not exist explicitly in any text in the collection, but that arises from relating the content of several of them” (Hearst, 1999).

The text mining process basically consists of two stages which are:

  • Processing stage: In the first stage, the texts to be handled are transformed into structured intermediate representations that make further analysis easier.
  • Discovery stage: In this stage the intermediate representations are analyzed in order to find interesting patterns within the texts of interest and to obtain new knowledge.
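The two stages can be sketched in a few lines. In this hedged example the processing stage reduces each text to a set of content terms, and the discovery stage looks for pairs of terms that appear together in more than one text, a pattern that no single text states on its own. The stopword list, the function names `process` and `discover`, and the three sample sentences are all invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}

def process(text):
    """Processing stage: turn raw text into a set of content terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def discover(term_sets, min_docs=2):
    """Discovery stage: term pairs that co-occur in at least min_docs texts."""
    pair_counts = Counter()
    for terms in term_sets:
        pair_counts.update(combinations(sorted(terms), 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_docs}

documents = [
    "Data mining finds patterns in sales data.",
    "Text mining finds patterns in documents.",
    "Hospitals store patient data.",
]
representations = [process(d) for d in documents]   # intermediate representation
patterns = discover(representations)                # knowledge from relating texts
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Keeping the stages separate means the discovery step never touches raw text, so a different representation can be swapped in without changing how patterns are searched for.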

The type of content representation obtained depends on the procedures used in the text processing stage. The strategies that can be used for text processing in text mining are as follows.



Processing procedure | Representation type | Type of discoveries
Categorization | Vector of themes | Thematic level
Full-text | Sequence of words | Language patterns
Information extraction | Data table | Relations between entities

Fig. 1.1 State of the art of text mining (Montes y Gómez, 2011)
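To make two of the representation types in the figure concrete, the sketch below encodes the same sentence first as a vector of theme counts and then as a sequence of words. The theme vocabulary, the sample sentence and the function names are assumptions for the example; they are not taken from Montes y Gómez.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def theme_vector(text, themes):
    """Vector representation: one count per theme word; word order is lost."""
    counts = Counter(tokens(text))
    return [counts[t] for t in themes]

def bigrams(text):
    """Sequence representation: adjacent word pairs preserve word order."""
    words = tokens(text)
    return list(zip(words, words[1:]))

themes = ["mining", "data", "text"]   # hypothetical theme vocabulary
sentence = "Text mining extends data mining to text"

vector = theme_vector(sentence, themes)
pairs = bigrams(sentence)
print(vector)
print(pairs)
```

The vector supports thematic-level comparisons between texts, while the ordered pairs are what a search for language patterns would work from, which is why the choice of representation constrains the kind of discovery that is possible.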

As Figure 1.1 shows, the three types of methods that exist for text analysis are somewhat limited in how they present results, which makes it very difficult to discover more complex things, such as:

  1. Consensus
  2. Trends
  3. Deviations

To capture these better, it is recommended to use conceptual graphs, which give a better representation of the analyzed texts.

Even so, analyzing information with conceptual graphs brings two kinds of problems, related to the syntactic analysis and the semantic analysis of texts. Some examples of texts that have been transformed into conceptual graphs are:

  • Parts of scientific articles
  • Parts of medical records
  • Parts of legal cases

However, there are no methods that allow conceptual graphs to be interpreted correctly. Text mining can therefore play a fundamental part in handling this type of information and making the best possible sense of it, according to the parameters used in the text mining process.


As has been shown, data mining is a very important tool for interpreting the direction of a company on the basis of historical data gathered over time. This type of mining can uncover trends related to a problem in the organization, or grant it some advantage through exclusive information discovered by means of the data mining cycle. Text mining, for its part, offers almost the same as data mining, but is aimed at discovering new knowledge from a large set of texts.


  1. Ángeles Larrieta, M. I., & Santillán Gómez, A. M. (2001). Data mining: concept, characteristics, structure and applications.
  2. Hearst, M. (1999). Untangling Text Data Mining. Proc. of ACL '99: The 37th Annual Meeting of the Association for Computational Linguistics. Maryland: University of Maryland.
  3. Martínez Luna, G. L. (October 2011). Data Mining: How to Find a Needle in a Haystack. (UANL, Ed.) Ingenierías, XIV (53), 63. Retrieved on March 23, 2016.
  4. Mitra, S., & Acharya, T. (2003). Data mining: multimedia, soft computing and bioinformatics. John Wiley & Sons.
  5. Montes y Gómez, M. (2011). Text Mining: A New Computational Challenge. México, DF: Instituto Politécnico Nacional.
  6. Riquelme, J. C., Ruíz, R., & Gilbert, K. (2006). Data mining: concepts and trends. Artificial Intelligence, 10 (29).


I would especially like to thank the Fundamentals of Administrative Engineering course of the Master in Administrative Engineering that I am studying at the Technological Institute of Orizaba, and above all my professor, Dr. Fernando Aguirre y Hernandez, for encouraging the desire to research and read about different interesting topics.
