How to quantitatively analyze qualitative data

ANALYSIS OF QUALITATIVE DATA

Qualitative data analysis techniques applied to market and public opinion research, or to institutional and marketing communications research, have for decades provided valuable information about the structure, dimensions, and meanings of the discourse of consumers, users, citizens, or mass media audiences with respect to social, political, or purchasing behavior, corporate brand images, or media consumption.

Qualitative analysis has been, and remains, a necessary phase in the investigation of communication, marketing, or public opinion problems that require a preliminary exploratory-qualitative study. Before describing, it is necessary to explore.

The qualitative phase of unstructured research covers the following general objectives:

- To know the spontaneous discourse of the target group regarding its knowledge and identification of the product or brand, as well as the imaginary and symbolic axes along which its members spontaneously position, from their own way of thinking and feeling, their opinions about a specific product or service.
- To know the spontaneous discourse of the target group regarding expectations, demands, and opinions in general, as well as its degree of satisfaction with, and level of information about, the object or product.

These discourses, individual or group, are qualitative data expressed in the form of verbal strings or linguistic phrases.

These materials ('verbatim') can not only be analyzed, interpreted, and modeled from qualitative theoretical frameworks (psychological, psychoanalytic, psychosocial, anthropological, cultural, linguistic, semiological, rhetorical, etc.), but can also be complemented with the use of statistical analysis in qualitative research.

STATISTICAL ANALYSIS OF QUALITATIVE DATA

Recorded data (printed, handwritten or ungraded) in the form of notes taken during an observation, free responses to open-ended questions, transcripts of individual interviews or group discussions, books, newspaper articles, etc., can be processed through the quantitative treatment of qualitative material.

This approach is not new to market research. The standard interpretive procedure, for both open-ended questions and content analysis, includes data reduction, selection of keywords, grouping of sentences into dimensions, definition of exhaustive categories, and coding of the categories. The analysis then becomes a quantification of numerical codes: the codes are counted and frequency distributions are obtained, regardless of the structure and meaning of the content of the categories.

The traditional procedure for quantifying qualitative data is categorization, coding, and tabulation. In this way the textual data are reduced to the treatment and analysis of numerical data. The frequency of the codes takes precedence over the content of the categories.
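The categorization, coding, and tabulation sequence can be sketched as follows. This is only an illustration under stated assumptions: the codebook, its keywords, the sample responses, and the function names `code_response` and `tabulate` are all invented for the example and do not come from any particular study or package.

```python
import re
from collections import Counter

# Hypothetical codebook: numeric code -> (category label, keywords).
CODEBOOK = {
    1: ("price", {"price", "cheap", "expensive"}),
    2: ("taste", {"taste", "flavor"}),
    3: ("packaging", {"package", "bottle"}),
}

def code_response(text, codebook=CODEBOOK):
    """Return the sorted list of category codes whose keywords occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(code for code, (_label, keywords) in codebook.items()
                  if words & keywords)

def tabulate(responses, codebook=CODEBOOK):
    """Frequency distribution of codes over all responses."""
    return Counter(code for text in responses
                   for code in code_response(text, codebook))

# Invented open-ended responses.
responses = [
    "The price is too expensive",
    "I like the taste",
    "Cheap, but the bottle is nice",
]
for code, freq in sorted(tabulate(responses).items()):
    print(code, CODEBOOK[code][0], freq)
```

Once the responses are coded, only the numeric codes and their frequencies remain; the wording of the answers no longer takes part in the analysis, which is the limitation the text points out.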

The lexicometric approach and textual statistics

Lexicometric or textual statistical approaches are supported by statistical techniques developed by the French School of Data Analysis (Analyse des Données) (Benzécri, J. P., 1973, 1976).

The Statistical Analysis of Textual Data (ADT) refers to procedures that involve counting the occurrences of the basic verbal units (generally words) and performing some type of statistical analysis on the results of those counts. The texts are quantified from the outset, without prior coding operations.
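Counting occurrences directly, with no prior coding, can be illustrated with a few lines of Python. This is a minimal sketch: the function name `word_frequencies`, the sample texts, and the tokenization rule (runs of letters, lowercased) are assumptions made for the example, not the procedure of any specific ADT package.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Count occurrences of each word (a run of letters) across all texts."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts

# Invented verbatim fragments.
texts = ["The brand is good", "Good price, good brand"]
for word, freq in word_frequencies(texts).most_common():
    print(word, freq)
```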

The development of textual statistical techniques has made the statistical analysis of texts an interdisciplinary tool that draws on statistics, discourse analysis, linguistics, computing, survey processing, and documentary research. It is increasingly used in various fields of the social sciences (history, politics, economics, sociology, psychology, etc.), and specifically in the analysis of social discourses in research on the consumer, the citizen and, in general, the media audience.

The data analysis techniques developed from the contributions of Jean-Paul Benzécri have made it possible to analyze large data matrices, to apply Factorial Analysis to contingency tables of n (rows) x p (columns) built from those matrices, and to display the results on a perceptual map.
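As an illustration of what such a contingency table looks like for textual data, the sketch below crosses words (rows) with groups of respondents (columns). The function name `lexical_table`, the group labels, and the sample texts are hypothetical and serve only to show the shape of the n x p table.

```python
import re
from collections import Counter

def lexical_table(documents, min_freq=1):
    """Build a words x groups contingency table.

    `documents` is an iterable of (group_label, text) pairs. Returns the row
    labels (words), the column labels (groups), and the table of counts.
    """
    per_group = {}
    for group, text in documents:
        words = re.findall(r"[^\W\d_]+", text.lower())
        per_group.setdefault(group, Counter()).update(words)
    totals = Counter()
    for counts in per_group.values():
        totals.update(counts)
    groups = sorted(per_group)
    words = sorted(w for w, f in totals.items() if f >= min_freq)
    table = [[per_group[g][w] for g in groups] for w in words]
    return words, groups, table

# Invented responses, each tagged with a sociodemographic group.
docs = [("young", "cheap and fun"),
        ("adult", "reliable and cheap"),
        ("young", "fun fun")]
words, groups, table = lexical_table(docs)
print(groups)
for word, row in zip(words, table):
    print(word, row)
```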

TEXTUAL DATA ANALYSIS METHODOLOGY

STATISTICAL ANALYSIS OF TEXTUAL DATA (ADT)

Preparation of lexicometric documents

The preparation of lexicometric documents begins with the definition of the procedures for data collection and data cleaning, so that the textual data are properly recorded. The first step is to assemble the corpus: narrations, newspaper articles, reports, recordings of interviews and groups, and free answers to open-ended questions, together with the sociodemographic, socioeconomic, and attitudinal variables that typify or segment the interviews or groups and that act as predictors (independent variables) of the criterion (dependent variable). This step includes the study of the statistical units (forms, lemmas, segments) that the textual analysis algorithms recognize in the collected data, and the identification of the statistically significant sentences.

The second step is to segment the text into units. Segmenting the textual corpus means differentiating the elementary units:
- the graphic form (a sequence of letters between two spaces);
- the lemma (all the words that share the same root and have an equivalent meaning, that is, a family of words);
- repeated segments (a sequence of two or more words that appears more than once in a corpus of textual data);
- quasi-segments (words that appear in a certain sequence but differ in gender or number).
Vocabulary richness and the frequency of repeated segments are also examined at this stage.

Once the texts are segmented, the third step is to build the vocabulary of the text. It is presented in a lexicometric table that shows, for each word, its identifying number, the word as it appears in the corpus glossary, its frequency of appearance, and its length in characters.

The fourth step is the multivariate analysis of the textual data: Correspondence Factor Analysis (AFC) is applied to the lexical tables, or Automatic Classification (ascending hierarchical classification) is applied to the lexical forms and texts.

Identification of modal answers and/or phrases. By combining the results of the textual analysis with sociodemographic and attitudinal variables, a typology of individuals or groups is obtained from the responses and texts.

Visualization of the results of the multivariate analysis. The lexical corpus is represented on perceptual (positioning) maps.

Textual discriminant analysis. The variables under study (opinions, attitudes, predispositions, image profile, etc.) are predicted from the text.
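The segmentation and vocabulary-building steps described above can be sketched in Python. This is a simplified illustration, not the SPAD-T algorithm: the function names, the tokenization rule, and the limit of four words per segment are assumptions made for the example, and lemmas and quasi-segments are left out because they require linguistic resources.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a text into graphic forms (runs of letters, lowercased)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def vocabulary_table(texts):
    """Lexicometric table: (identifier, form, frequency, length in characters),
    sorted by decreasing frequency and then alphabetically."""
    counts = Counter(word for text in texts for word in tokenize(text))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(i, word, freq, len(word))
            for i, (word, freq) in enumerate(ordered, start=1)]

def repeated_segments(texts, min_len=2, max_len=4):
    """Sequences of min_len..max_len words that occur more than once."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(min_len, max_len + 1):
            for start in range(len(tokens) - n + 1):
                counts[tuple(tokens[start:start + n])] += 1
    return {segment: freq for segment, freq in counts.items() if freq > 1}

# Invented responses to an open-ended question.
texts = ["good value for money", "not good value", "value for money matters"]
for row in vocabulary_table(texts):
    print(row)
for segment, freq in repeated_segments(texts).items():
    print(" ".join(segment), freq)
```

Segments are counted within each response, so a sequence that spans two different responses is never treated as a segment.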

The textual statistics in the SPAD.T Package

The SPAD.T (Système Portable pour l'Analyse des Données Textuelles) program package is specifically designed to perform statistical analysis of textual data. Among the software available on the market, we believe that this package is the most complete product and the most widely used among practitioners of textual statistical analysis (ADT).

The operational steps of textual data processing are as follows:

Textual data are entered on digital media (floppy disks or CD-ROMs) in .doc or .txt format, using a word processor such as MS Word. The files are edited beforehand (revised and corrected) according to the precise instructions of the ADT Project Director.

The recordings of the verbatims from group discussions or individual interviews are textual transcripts of the most significant paragraphs of the interventions of the focus group participants or the individual interviewees.

The discussion guide for the groups or interviews is drawn up so that each line of inquiry can be identified numerically, and so that each one is unambiguously identified when the main emergent themes from groups or individuals are recorded.

The Project Director coordinates with the qualitative data processing area (a) the items of the discussion guide and (b) the main criteria for organizing the discourses, in order to better conceptualize the factors produced by the Correspondence Factor Analysis and to facilitate their visualization in the positioning maps.

From the SPAD-T (or simply SPAD) outputs, both tables and positioning map graphs, the data are analyzed according to the objectives of the investigation, and all the information found is written up in the final report.

QUALITATIVE ANALYSIS OF TEXTUAL DATA

There are other programs that facilitate the task of entering, organizing, and analyzing textual data but that, unlike SPAD-T, STATISTICA Text Miner, DB2 Intelligent Miner for Text, etc., do not use multivariate statistical procedures for data mining.

One of the best known is the QSR NUD*IST (Non-numerical Unstructured Data Indexing, Searching and Theorizing) program.

QSR NUD*IST NVivo. NVivo is software for processing qualitative data, including text, images, sound, and video. It allows the analyst to code, retrieve, annotate, and search texts. It has no predefined minimum textual unit: the analyst can code a single character if desired. It accepts rich text (RTF format) with different fonts, sizes, and colors. Primary documents can be linked by hyperlinks to each other, to memos, and to Data Bites (image, audio, or video files, spreadsheets, databases, graphics, etc.), which are displayed with their respective external viewer. It generates reports in ASCII, RTF, or HTML format.

Memos are documents in their own right, so they can also be edited, coded, and linked like primary documents. Coding can be done by drag and drop, or from the quick coding bar, which holds the most recently used codes. New codes (nodes) can be created simply by clicking on a word in the primary document. The codes applied to a text can be displayed as a series of differently colored marginal brackets that scroll along with the text. Documents can also be precoded automatically on the basis of their structure in sections, subsections, and headings.

Document sets can be defined by dragging; properties can be assigned to them, and they can be treated as a unit.

It performs textual searches for character strings and character patterns, as well as searches of coded regions using a wide range of operators, and it codes the results automatically.

It incorporates a Modeler and a Model Explorer, which make it possible to create full-color graphic representations of the relationships between the data and the analyst's ideas. Models of the relationships between different models can even be built. From the graphic, the analyst can go directly to any of the objects that make it up, down to the text of the primary document or the analyst's own annotations.

Reports on all objects can be output as ASCII or RTF files, and models can be saved or printed as bitmap images. The program generates tables with different types of quantitative information that can be exported to SPSS for further statistical processing, and it can import data from SPSS or from any other program that uses tables.

A node or a group of them can be exported to Decision Explorer for further analysis.

It has tools to facilitate teamwork and networking, managing passwords and access levels.

You can generate self-executing, read-only copies to share your data securely with third parties, preventing them from being modified without authorization.

To see a PowerPoint presentation about NUD*IST you can visit:

Another program for the analysis of qualitative textual data, given here as an example, is ATLAS/ti (Qualitative Analysis of Textual Data):

ATLAS/ti belongs to the family of qualitative research or qualitative data analysis programs (which also includes NUD*IST, among several dozen others). In recent years these programs have begun to be used in different disciplines: sociology, anthropology, psychology, pedagogy.

Documentary databases (relational database management systems) such as Micro ISIS, or its later version Win ISIS, developed by UNESCO, already exist, but ATLAS/ti offers more than data storage and easy later access. In ATLAS/ti, data are located and retrieved without difficulty, and the program has the added advantage of providing a set of tools to weave relationships between the most varied data elements, to make interpretations explicit and, at any given moment, to "call up" all the elements that support a given argument or conclusion. The latter can be especially valuable when the time comes to write up and communicate the results to others.

Both the original data and the relationships created between them constitute knowledge. Here, in the context of an investigation, we take knowledge to be the sum of our data once a structure of relationships and associations, oriented toward a purpose, has been superimposed on them. That purpose may be a study to improve a company's customer service, a doctoral thesis, the management of a catalog of parts and components with complex relationships, inquiries to solve a crime, or any question on which we intend to deepen and expand what we know.

The appearance of this program resembles a word processor.

ATLAS/ti is a computer tool whose objective is to facilitate the qualitative analysis of, mainly, large volumes of textual data.

Because its focus is qualitative analysis, it is not intended to automate the analysis process, but simply to help the human interpreter by considerably speeding up many of the activities involved in analyzing and interpreting texts, such as segmenting the text into passages or quotations, coding, and writing comments or annotations.

All these activities belong to the Textual Level at which the program operates. It is complemented by a Conceptual Level, which covers activities such as establishing relationships between the elements and building models through graphic representation.

For an introductory presentation to the qualitative textual data analysis program, you can visit the website:

STATISTICAL-METHODOLOGICAL APPENDIX

The application of Factor Analysis (AF) in the field of ADT focuses mainly on the Factorial Analysis of Correspondences (AFC), a statistical algorithm developed by Jean-Paul Benzécri (1973, 1976).

It is a descriptive (non-explanatory) method that is classified among the multivariate interdependence methods. It makes it possible to visualize the data (which can be qualitative or quantitative) by representing them as a cloud of points in a space of reduced dimensions, based on the geometric distances between the points.

The analysis process is carried out following four stages:

The starting point is a set of typification characteristics (attributes or semantic items) of a product and a set of brands of that generic product. A group of individuals rates the brands on these characteristics, which are presented as semantic scales.

From the values given to each brand on the different semantic scales, an input matrix is constructed. The attributes or characteristics (criteria, C_i) are placed in the rows, the evaluated brands (objects, O_j) in the columns, and each cell contains the frequency n_ij with which characteristic i is attributed to brand j.

The calculation algorithm explains each of the two sets (brands and attributes) in relation to the other, since simple relationships exist between the factors obtained, and it yields a graphical representation in which the proximity relations can be read from the distances between points:
- each element of the column set (brand) with the other elements of the column set;
- each element of the row set (attribute) with the other elements of the row set;
- each element of the row set (attribute) with each element of the column set (brand).

The factorial correspondence analysis (AFC) is executed first on the rows (attributes) and then on the columns (brands), and the two analyses are combined, since the data are identical whether they are read by rows or by columns. There is therefore a duality between the analysis of the rows and that of the columns of the input data matrix, so that the best approximation plane is the same in both cases, and the center of gravity and, above all, the inertia of the factors obtained from the rows coincide with those obtained from the columns.

As a consequence of the whole process, a positioning map is obtained that relates all the elements of the two sets treated (row variables and column variables).

The result is, therefore, a single homogeneous set that includes all the elements of the matrix.
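The duality between rows and columns described above can be written out explicitly. What follows is the usual textbook formulation of correspondence analysis, given only as a sketch: the symbols f_ij, F_α, G_α, and λ_α are notation introduced here for illustration, with i indexing the rows and j the columns of the input matrix of frequencies n_ij.

```latex
\[
\begin{aligned}
f_{ij} &= \frac{n_{ij}}{n}, \qquad n = \sum_{i}\sum_{j} n_{ij}, \qquad
f_{i\cdot} = \sum_{j} f_{ij}, \qquad f_{\cdot j} = \sum_{i} f_{ij} \\[4pt]
d^{2}(i, i') &= \sum_{j} \frac{1}{f_{\cdot j}}
  \left( \frac{f_{ij}}{f_{i\cdot}} - \frac{f_{i'j}}{f_{i'\cdot}} \right)^{2} \\[4pt]
\sum_{\alpha} \lambda_{\alpha} &= \sum_{i}\sum_{j}
  \frac{\left( f_{ij} - f_{i\cdot}\, f_{\cdot j} \right)^{2}}{f_{i\cdot}\, f_{\cdot j}}
  = \frac{\chi^{2}}{n} \\[4pt]
F_{\alpha}(i) &= \frac{1}{\sqrt{\lambda_{\alpha}}} \sum_{j} \frac{f_{ij}}{f_{i\cdot}}\, G_{\alpha}(j),
\qquad
G_{\alpha}(j) = \frac{1}{\sqrt{\lambda_{\alpha}}} \sum_{i} \frac{f_{ij}}{f_{\cdot j}}\, F_{\alpha}(i)
\end{aligned}
\]
```

The second line is the chi-square distance between two row profiles, the third states that the total inertia shared by the row and column clouds equals the chi-square statistic divided by n, and the last line gives the transition formulas: the coordinate of a row on axis α is, up to the factor 1/√λ_α, the weighted average of the column coordinates, and vice versa. This is why rows and columns can be displayed on the same map.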

An application case of ADT with SPAD-T can be seen in Moscoloni, N and Satriano, C. (2000)

In conclusion, this procedure yields a synthetic representation of the typification attributes considered and the brands analyzed along their main axes of differentiation.

The projection on the plane of the individual points that constitute the attributes of the product will allow us to interpret the significance of the factor axes obtained.

AFC is a recently developed interdependence technique that facilitates both the dimensional reduction of a classification of objects (brands, companies, people, words, phrases, texts, etc.) on a set of attributes and the perceptual mapping of the objects in relation to these attributes.

Researchers constantly face the need to "quantify qualitative data" found in nominal variables. The AFC accommodates both non-metric data and non-linear relationships.

In its most basic form, the AFC uses a contingency table, that is, the cross tabulation of two categorical variables. It then transforms the non-metric data to a metric level and performs dimensional reduction and perceptual mapping.
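The passage from a contingency table to coordinates on a perceptual map can be sketched in plain Python. This is a minimal illustration, not the SPAD-T implementation: the function name `correspondence_analysis`, the toy table, and the use of power iteration with deflation (chosen to avoid external libraries; a numerical library's SVD routine would normally be used instead) are all assumptions made for this example.

```python
import math

def correspondence_analysis(table, n_axes=2, iters=1000):
    """Simple correspondence analysis of a two-way contingency table.

    `table` is a list of rows of counts. Returns the total inertia, the
    inertia (eigenvalue) of each axis, and row/column principal coordinates.
    Hypothetical helper; n_axes should not exceed min(rows, columns) - 1.
    """
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    P = [[x / total for x in row] for row in table]   # relative frequencies
    r = [sum(row) for row in P]                        # row masses
    c = [sum(col) for col in zip(*P)]                  # column masses

    # Standardized residuals: departure from independence in chi-square units.
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    inertia = sum(s * s for row in S for s in row)     # equals chi-square / total

    # A = S'S: its eigenvalues are the inertias of the factor axes.
    A = [[sum(S[i][j] * S[i][k] for i in range(n_rows))
          for k in range(n_cols)] for j in range(n_cols)]

    eigenvalues, row_axes, col_axes = [], [], []
    for _axis in range(n_axes):
        v = [float(j + 1) for j in range(n_cols)]      # arbitrary start vector
        for _ in range(iters):                         # power iteration
            w = [sum(A[j][k] * v[k] for k in range(n_cols)) for j in range(n_cols)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = max(0.0, sum(v[j] * A[j][k] * v[k]
                           for j in range(n_cols) for k in range(n_cols)))
        eigenvalues.append(lam)
        # Principal coordinates of rows and columns on this axis.
        row_axes.append([sum(S[i][j] * v[j] for j in range(n_cols)) / math.sqrt(r[i])
                         for i in range(n_rows)])
        col_axes.append([v[j] * math.sqrt(lam) / math.sqrt(c[j])
                         for j in range(n_cols)])
        # Deflation: remove the axis just found before extracting the next one.
        A = [[A[j][k] - lam * v[j] * v[k] for k in range(n_cols)]
             for j in range(n_cols)]

    return {"inertia": inertia, "eigenvalues": eigenvalues,
            "rows": list(zip(*row_axes)), "columns": list(zip(*col_axes))}

# Toy table with invented counts: rows are attributes, columns are brands.
table = [[30, 10, 2],
         [15, 20, 10],
         [3, 12, 28]]
result = correspondence_analysis(table)
for label, coords in zip(["attribute 1", "attribute 2", "attribute 3"], result["rows"]):
    print(label, [round(x, 3) for x in coords])
for label, coords in zip(["brand A", "brand B", "brand C"], result["columns"]):
    print(label, [round(x, 3) for x in coords])
```

Plotting the two coordinates of every row and column on the same pair of axes gives the perceptual map; rows and columns that lie close together are associated more strongly than independence would predict.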

AFC provides a multivariate representation of the interdependence of non-metric data that cannot be performed with other multivariate methods.

LINKOTECA ON SOFTWARE REFERRED TO QUALITATIVE ANALYSIS AND STATISTICAL ANALYSIS OF TEXTUAL DATA (ADT)

Sites about ADT software (list provided by Lic. Ana Feldman, Buenos Aires, Argentina):
- TALTAC: www.taltac.it
- CORDIAL: www.synapse-fre.com
- Other software: LEXICO (France), INTEX, THEME EDITOR, ALCESTE by M. Reinert, STELLA (search engine that applies the Theory of Textual Objects), SATIM, HYPERBASE, ETIENE: [email protected]; SPHINX
- Spanish Behavioral Sciences Methodology Association > Software, Publishers and Journals
- SPAD Home Page, Version 5.5
- SOLARI Publishing House, Software for Qualitative Analysis
- Sphinx Development UK, quantitative and qualitative analysis program for numerical and textual data, SphinxSurvey Version 4.0: http://www.sphinxdevelopment.co.uk/Products_sphinx.htm
- Anthropology and Qualitative Data Analysis, Pablo Gustavo Rodríguez, Home Page: http://www.analisiscualitativo.com.ar/ > Software for Analysis of Qualitative Data
- StatSoft, Inc., data mining program for textual data: STATISTICA Text Miner
- IBM, Inc.: DB2 Intelligent Miner for Text
- ATLAS/ti: Qualitative Analysis of Textual Data
- QSR (Qualitative Solutions and Research), qualitative analysis software: QSR NUD*IST NVivo Version 2.0 (latest version of NUD*IST)
- Presentation slides on QSR NUD*IST NVivo: http://www.analisiscualitativo.com.ar/n4index.htm


REFERENCE BIBLIOGRAPHY

Lebart, Ludovic, Morineau, Alain and Bécue, Mónica (1989): Système Portable pour l'Analyse des Données Textuelles. SPAD-T. Manuel de l'utilisateur. CISIA, Paris, 1989.
Lebart, Ludovic and Salem, André (1994): Statistique Textuelle. Dunod, Paris, 1994.
JADT90 (1990): Actes of 'Jornades Internacionals d'Anàlisi de Dades Textuals', JADT90, Barcelona 1990, Servei de publicacions de la UPC. Bécue, Lebart, Rajadell ed.
JADT93 (1993): Proceedings of 'Secondes Journées Internationales d'Analyse Statistique de Données Textuelles', JADT93, Montpellier 1993, Telecom, Paris. S.HJ. Anastex ed.
Benzécri, Jean Paul (1988): «Quality and quantity in the tradition of philosophers and in Data Analysis», Les Cahiers de l'Analyse des Données, XIII (I): 131-152. Translation by Nora Moscoloni, IRICE-Rosario Institute for Research in Educational Sciences, 1993.
Benzécri, Jean Paul (1973, 1976): L'Analyse des Données, Tome I: La Taxinomie, 1973; Tome II: L'Analyse des Correspondances, Paris, Dunod, 2nd ed., 1976.
Etxeberría, Juan, et al. (1995): Analysis of Data and Texts: Madrid, Editorial Ra-Ma, 1995.
Lebart, Ludovic; Salem, André and Bécue, Mónica (2000): Statistical Analysis of Texts: Editorial Milenio, Madrid, 2000.
Bécue, Mónica (1991): Analysis of Textual Data, CISIA, Paris, 1991.
Berelson, Bernard (1952): Content Analysis in Communication Research: New York, Ill., University Press, Hafner Publications & Co, 1971.
Pecheux, Michel (1969): Towards Automatic Discourse Analysis: Madrid, Editorial Gredos, 1969.
Bardin, Laurence (1977): Content Analysis: Madrid, Akal, 1986.
Kientz, Albert (1971): To Analyze the Mass Media. Content Analysis: Valencia, Fernando Torres Editor, 1976.
Delgado, Juan Manuel and Gutiérrez, Juan (1995): Methods and Qualitative Techniques of Research in Social Sciences: Madrid, Editorial Síntesis, 1998.
Galindo, Cáceres (1998): Research Techniques in Society, Culture and Communication: Mexico, Addison Wesley Longman, 1999.
Soler, Pere (1991): Motivational Research in Marketing and Advertising: Ediciones Deusto, Bilbao, 1991.
Moscoloni, Nora and Satriano, Cecilia Raquel (2000): «Importance of Textual Analysis as a Tool for Discourse Analysis. Application in an investigation about the abandonment of treatment in drug addicts», in electronic journal 'Cinta de Moebio', nº 9, November 2000, Faculty of Social Sciences, University of Chile; 24 pp.
Moscoloni, Nora (2000): «Characteristics of Multidimensional Data Analysis», paper in 'Conference on Introduction to Multidimensional Data Analysis' (August 25, 2000), UNTREF-National University of Tres de Febrero, Argentina, Booklet 1, Series: Multidimensional Data Analysis, pp. 5-19.
PIAD-Interdisciplinary Program of Data Analysis, Multidimensional Data Analysis (AMD) and Intelligent Data Analysis (AID), National University of Rosario, Argentina.