Data Warehouses and Information Systems in Microsoft SQL Server 2008

Information systems are divided into two categories (Casares, 2003):

Operational systems: Their objective is to reflect the state and operation of the company by recording its daily transactions or operations; hence they are known as Online Transaction Processing (OLTP) systems.

Decision support systems: Their objective is to measure and monitor the evolution of key business variables, seeking to identify, project and predict trends from the accumulated data.

Since the beginning of the computer age, organizations have used data from their operational systems to meet their information needs. Some give users direct access to the information held in operational applications. Others extract data from their operational databases and combine it in various unstructured ways in an attempt to meet users' information needs (Casares, 2003).

Bill Inmon was one of the first authors to write about data warehouses. He defines a data warehouse in terms of the characteristics of its data repository (Inmon, 2007):

Subject-oriented: The data is organized so that all data elements relating to the same real-world event or object are linked together.

Time-variant: Changes to the data over time are recorded, so that the reports generated can reflect those variations.

Non-volatile: Data is not modified or deleted; once stored, it becomes read-only and is kept for future reference.

Integrated: The database contains data from all of the organization's operational systems, and that data must be consistent.

Data marts are subsets of a data warehouse's data that serve specific areas of the business. From a design point of view, everything that applies to a data warehouse also applies to a data mart (Inmon, 2007).

The dimensional model is the most widely used model in data warehouse systems; it differs from the normalized relational model used in OLTP systems. It is built from dimensions, which represent categories of information; attributes, each of which represents a single level within a dimension; hierarchies of attributes, which express relationships between different attributes; and fact tables, which contain the data of interest at a given level of granularity. Granularity is the lowest level of detail that will be stored in the fact table, and determining it is the first step in designing a fact table.
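
As a minimal sketch of grain and roll-up, assume a hypothetical sales fact table stored at a grain of one row per product per day, and a hypothetical date dimension whose attributes form a day, month and year hierarchy. The Python code below aggregates the fact rows to a coarser level of that hierarchy; all names and values are illustrative.

```python
from collections import defaultdict

# Hypothetical date dimension: each key carries the attributes of a
# day -> month -> year hierarchy.
date_dim = {
    20080105: {"day": "2008-01-05", "month": "2008-01", "year": 2008},
    20080120: {"day": "2008-01-20", "month": "2008-01", "year": 2008},
    20080203: {"day": "2008-02-03", "month": "2008-02", "year": 2008},
}

# Hypothetical fact table at a grain of one row per product per day.
sales_facts = [
    {"date_key": 20080105, "product": "P1", "amount": 100.0},
    {"date_key": 20080120, "product": "P1", "amount": 50.0},
    {"date_key": 20080203, "product": "P1", "amount": 75.0},
    {"date_key": 20080203, "product": "P2", "amount": 20.0},
]

def roll_up(facts, dim, level):
    """Sum the 'amount' measure at a coarser level of the date hierarchy."""
    totals = defaultdict(float)
    for row in facts:
        totals[dim[row["date_key"]][level]] += row["amount"]
    return dict(totals)

print(roll_up(sales_facts, date_dim, "month"))  # {'2008-01': 150.0, '2008-02': 95.0}
print(roll_up(sales_facts, date_dim, "year"))   # {2008: 245.0}
```

Because the grain is daily, the same facts can be rolled up to months or years, but nothing finer than a day can ever be recovered from them, which is why the grain has to be decided first.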

Dimensional design schemas:

Star schema: A central fact table connected to a set of dimension tables.

Snowflake schema: A refinement of the star schema in which some dimensions are normalized into smaller tables.

Fact constellation: Multiple fact tables share dimension tables, so the design looks like a constellation of stars.
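
The difference between the star and snowflake schemas can be illustrated with a small sketch that uses Python's built-in sqlite3 module and an in-memory database. The tables (dim_product_star, dim_category, dim_product_snow, fact_sales) and their rows are hypothetical, and SQLite stands in for SQL Server only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the category name is stored directly in the product dimension.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Snowflake: the category is normalized into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, name TEXT,
                               category_key INTEGER REFERENCES dim_category(category_key));

-- One fact table used by both designs.
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Antivirus', 'Software');
INSERT INTO dim_category VALUES (10, 'Hardware'), (20, 'Software');
INSERT INTO dim_product_snow VALUES (1, 'Laptop', 10), (2, 'Mouse', 10), (3, 'Antivirus', 20);
INSERT INTO fact_sales VALUES (1, 900.0), (2, 25.0), (3, 60.0), (1, 850.0);
""")

# Star: a single join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: one extra join to reach the normalized category table.
snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star)               # [('Hardware', 1775.0), ('Software', 60.0)]
print(snowflake == star)  # True
```

Both designs return the same totals; the snowflake avoids repeating the category name in every product row at the cost of one more join per query.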

Management recognizes that one way to increase efficiency is to make better use of the information resources that already exist within the organization. Data warehousing is currently a focus of large institutions because it provides an environment in which organizations can make better use of the information managed by their various operational applications (Casares, 2003).

The architecture of a data warehouse consists of three levels (Casares, 2003):

- Source databases (production and historical).
- A database of summarized data extracted from the production databases (the data warehouse).
- User-oriented interfaces that extract information for decision-making. The classic ones are queries and reports, multidimensional analysis, and data mining.

Source databases: These consist of production databases as well as historical databases. They can be implemented on different types of systems: relational databases, geographic databases, text databases, files, etc. What they have in common is that they store atomic data items, which are relevant as production data but may be too fine-grained to serve as a basis for decision-making. Furthermore, data quality in these databases is judged by the consistency of the records, regardless of their relevance to the problem being analyzed.

An important component of the data warehouse is the data dictionary (metadata), which describes the stored data in order to make it accessible through the data warehouse's exploitation tools. The data dictionary maps the stored data to the concepts it represents, making it easier for end users to extract information.

User-oriented interfaces that extract information for decision-making:

Interfaces for complex queries and reports: They allow users to build charts and reports from the information contained in the data warehouse and described in the data dictionary. Typical features of these tools include dynamic grouping and ungrouping of data in reports, reordering of report fields, and display of query results in graphical form (bar, pie and scatter charts, etc.). These tools generate the query-language expressions (typically SQL) that retrieve the requested data, connect to the data store, retrieve the result and format it according to the given specification.
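
A minimal sketch of how a reporting tool might use the data dictionary to generate SQL, assuming a hypothetical dictionary that maps business terms to column expressions and a hypothetical sales table. It uses Python's sqlite3 module with an in-memory database in place of a real data warehouse connection; build_report_sql is an illustrative name, not part of any product.

```python
import sqlite3

# Hypothetical data dictionary: business terms -> physical SQL expressions.
DATA_DICTIONARY = {
    "Region": "s.region",
    "Year": "s.year",
    "Total Sales": "SUM(s.amount)",
}

def build_report_sql(group_by, measure, table="sales"):
    """Translate a report specification (business terms) into a SQL query."""
    dims = [DATA_DICTIONARY[name] for name in group_by]
    cols = ", ".join(dims + [DATA_DICTIONARY[measure]])
    order = ", ".join(dims)
    return f"SELECT {cols} FROM {table} s GROUP BY {order} ORDER BY {order}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", 2007, 10.0), ("North", 2008, 15.0),
    ("South", 2008, 7.5), ("North", 2008, 5.0),
])

sql = build_report_sql(["Region", "Year"], "Total Sales")
print(sql)
for row in conn.execute(sql):
    print(row)  # ('North', 2007, 10.0), then ('North', 2008, 20.0), then ('South', 2008, 7.5)
```

Ungrouping the report by year amounts to calling build_report_sql(["Region"], "Total Sales"), which regenerates the query with a single grouping column. End users work only with business terms; the physical column names stay inside the dictionary.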

Data analysis products (OLAP): They represent the problem's data in terms of dimensions. For example, for product sales in different zones, one dimension is the zone, another the product and another time. Queries that analyze one dimension in terms of the others can then be answered quickly.
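
To make the idea of dimensions concrete, here is a minimal in-memory sketch of a sales cube with zone, product and quarter dimensions, following the example above (quarters are an assumed time granularity). slice_cube fixes one or more dimension values and aggregate sums the measure over the remaining dimensions. Both functions and all the data are hypothetical; real OLAP servers precompute and index such aggregates rather than scanning every cell.

```python
from collections import defaultdict

DIMS = ("zone", "product", "quarter")

# Hypothetical sales cube: each cell is keyed by (zone, product, quarter).
cube = {
    ("North", "P1", "Q1"): 100,
    ("North", "P2", "Q1"): 40,
    ("South", "P1", "Q1"): 70,
    ("South", "P1", "Q2"): 80,
    ("North", "P1", "Q2"): 60,
}

def slice_cube(cube, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    idx = {DIMS.index(dim): value for dim, value in fixed.items()}
    return {key: v for key, v in cube.items()
            if all(key[i] == value for i, value in idx.items())}

def aggregate(cube, by):
    """Sum the measure over every dimension not listed in `by`."""
    positions = [DIMS.index(dim) for dim in by]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)

# Sales of product P1 by zone, across all quarters.
print(aggregate(slice_cube(cube, product="P1"), by=["zone"]))  # {('North',): 160, ('South',): 150}
# All sales by quarter, across zones and products.
print(aggregate(cube, by=["quarter"]))  # {('Q1',): 210, ('Q2',): 140}
```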

Data mining tools: They allow users to explore the data warehouse in search of unknown or unexpected relationships in the data.
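
One classic data mining technique for finding unexpected relationships is association-rule mining. The sketch below assumes a hypothetical list of market baskets and arbitrary thresholds. It counts item pairs and keeps rules of the form "customers who buy A also buy B" whose support (share of baskets containing both items) and confidence (share of A's baskets that also contain B) are high enough. It is a brute-force illustration limited to item pairs, not the Apriori algorithm or any specific product's implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets extracted from the warehouse.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def association_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (a, b, support, confidence) for rules a -> b over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):  # try the rule in both directions
            confidence = count / item_counts[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)

print(association_rules(baskets))
# [('bread', 'butter', 0.6, 0.75), ('butter', 'bread', 0.6, 1.0), ('jam', 'bread', 0.4, 1.0)]
```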

The main motivations for building a data warehouse are the following (Casares, 2003):

- Having information systems that support decision-making.
- Having databases from which knowledge can be extracted from the historical information stored in the organization.
- Designing a database that can answer queries that are not known in advance.

Microsoft SQL Server 2008 provides a platform for building and maintaining data warehouses. Some of its new features, and the best practices associated with them, are described below:

- Data Compression

Data compression reduces the space required to store tables and indexes, allowing more efficient use of storage capacity.

Compression can be applied at the row level or at the page level. Row compression stores fixed-length fields in a variable-width format. Page compression includes row compression and additionally compresses across the rows stored on the same page: common prefixes of column values are stored only once on the page (prefix compression), and a page-level dictionary stores values that are repeated on the page (dictionary compression). Both forms of compression can be applied to tables and indexes.
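
The two page-level techniques can be illustrated with a simplified sketch in Python. It is a conceptual model only, assuming string column values held in a list; SQL Server's actual on-disk format is different (for example, its prefix compression works against a per-column anchor value rather than a single prefix shared by every row). The function names and sample values are hypothetical.

```python
import os

def prefix_compress(values):
    """Prefix compression (simplified): store the column's common prefix once
    per page; each row keeps only the part of its value after that prefix."""
    prefix = os.path.commonprefix(values)
    return prefix, [v[len(prefix):] for v in values]

def prefix_decompress(prefix, suffixes):
    return [prefix + s for s in suffixes]

def dictionary_compress(values):
    """Dictionary compression: store each distinct value once in a page-level
    dictionary; rows hold small integer codes instead of the values."""
    lookup = {}
    codes = [lookup.setdefault(v, len(lookup)) for v in values]
    return list(lookup), codes

def dictionary_decompress(dictionary, codes):
    return [dictionary[c] for c in codes]

order_ids = ["ORD-2008-0001", "ORD-2008-0002", "ORD-2008-0017"]
cities = ["Madrid", "Lima", "Madrid", "Madrid", "Lima"]

print(prefix_compress(order_ids))   # ('ORD-2008-00', ['01', '02', '17'])
print(dictionary_compress(cities))  # (['Madrid', 'Lima'], [0, 1, 0, 0, 1])
```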

- Transparent Data Encryption

Transparent Data Encryption (TDE) stores data securely by encrypting the database files. SQL Server performs the encryption and decryption itself, so the process is transparent to connected applications. When data compression and TDE are used together, the data is compressed first and then encrypted; encrypted data looks random and would barely compress.
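
The reason for compressing before encrypting can be shown with a short experiment. The sketch compresses a highly repetitive, hypothetical page with zlib and uses a toy XOR keystream built from SHA-256 as a stand-in for the real cipher. The toy cipher is not secure and is not how TDE encrypts data; it only shares the property that its output looks random.

```python
import hashlib
import zlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream. Toy stand-in, NOT secure.
    Applying it twice with the same key restores the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

page = b"customer=ACME;status=ACTIVE;" * 200  # hypothetical, highly repetitive page
key = b"demo-key"

compress_then_encrypt = toy_encrypt(zlib.compress(page), key)
encrypt_then_compress = zlib.compress(toy_encrypt(page, key))

# The first result is a small fraction of the page; the second is slightly
# larger than the page itself, because encrypted bytes have no patterns to exploit.
print(len(page), len(compress_then_encrypt), len(encrypt_then_compress))
```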

- Resource Governor

Resource Governor allows administrators to control how resources such as CPU and memory are allocated, assigning more of them to the highest-priority applications.
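
Resource Governor is built around a classifier function that assigns each new session to a workload group, and workload groups belong to resource pools that carry limits such as MAX_CPU_PERCENT. The Python sketch below mimics that flow with hypothetical application names, groups and pools. It is deliberately simplified: in SQL Server the limits apply to a pool as a whole rather than to each session's request.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    max_cpu_percent: int

# Hypothetical configuration mirroring Resource Governor's concepts:
# workload groups map to resource pools, and pools carry CPU limits.
POOLS = {
    "etl": Pool("etl", 30),
    "reports": Pool("reports", 60),
    "default": Pool("default", 100),
}
GROUP_TO_POOL = {"etl_group": "etl", "report_group": "reports", "default": "default"}

def classifier(app_name: str) -> str:
    """Assign a new session to a workload group, like a classifier function."""
    if app_name.startswith("SSIS"):
        return "etl_group"
    if app_name == "ReportServer":
        return "report_group"
    return "default"

def allowed_cpu(app_name: str, requested_percent: int) -> int:
    """Simplification: cap a single request at its pool's maximum."""
    pool = POOLS[GROUP_TO_POOL[classifier(app_name)]]
    return min(requested_percent, pool.max_cpu_percent)

print(allowed_cpu("SSIS-NightlyLoad", 80))  # 30
print(allowed_cpu("ReportServer", 50))      # 50
print(allowed_cpu("sqlcmd", 90))            # 90
```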

- Hot-Adding Processors and Memory

The 64-bit Enterprise edition of SQL Server allows processors and memory to be hot-added without shutting down the server or limiting existing connections.

- MERGE Statement

The new MERGE statement simplifies loading a data warehouse from its sources. It compares the source rows with the rows already in the data warehouse, distinguishes new rows from updated ones, and performs the appropriate insert or update in a single statement.
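
In T-SQL, MERGE joins a source to a target and, for each row, runs a WHEN MATCHED branch (typically an UPDATE) or a WHEN NOT MATCHED branch (typically an INSERT). The Python sketch below reproduces that decision on in-memory dictionaries so it is explicit. The table layout, the customer_id key and the rule of updating only rows that actually changed are assumptions made for the example.

```python
def merge(target, source, key="customer_id"):
    """Upsert source rows into target the way a two-branch MERGE would.

    target: dict mapping business key -> row (the warehouse table)
    source: iterable of row dicts (the incoming data)
    Returns the number of inserted and updated rows.
    """
    inserted = updated = 0
    for row in source:
        k = row[key]
        if k not in target:        # WHEN NOT MATCHED THEN INSERT
            target[k] = dict(row)
            inserted += 1
        elif target[k] != row:     # WHEN MATCHED (and changed) THEN UPDATE
            target[k] = dict(row)
            updated += 1
    return inserted, updated

warehouse = {
    1: {"customer_id": 1, "city": "Lima"},
    2: {"customer_id": 2, "city": "Quito"},
}
staging = [
    {"customer_id": 1, "city": "Lima"},     # unchanged: no action
    {"customer_id": 2, "city": "Bogota"},   # changed: update
    {"customer_id": 3, "city": "Caracas"},  # new: insert
]

print(merge(warehouse, staging))  # (1, 1)
```

Running the same load a second time inserts and updates nothing, which is a useful property for reruns of a failed warehouse load.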

- New Spatial Data Types

The new GEOGRAPHY and GEOMETRY spatial data types allow spatial data to be stored directly in SQL Server 2008. GEOGRAPHY represents geodetic (round-earth) data, such as the latitude and longitude coordinates used by GPS applications, while GEOMETRY represents data in a flat, two-dimensional (planar) coordinate system. SQL Server 2008 also integrates with Virtual Earth, which allows physical locations to be displayed graphically.
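
The practical difference between the two types is how distance is measured. The sketch below contrasts a planar (GEOMETRY-style) straight-line distance with a geodetic (GEOGRAPHY-style) distance computed by the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km, which only approximates the ellipsoidal models a real spatial engine uses; the function names are hypothetical and not part of SQL Server's API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def planar_distance(x1, y1, x2, y2):
    """GEOMETRY-style distance: a straight line on a flat plane."""
    return math.hypot(x2 - x1, y2 - y1)

def geodetic_distance_km(lat1, lon1, lat2, lon2):
    """GEOGRAPHY-style distance on a sphere (haversine formula), in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

print(planar_distance(0, 0, 3, 4))                    # 5.0
print(round(geodetic_distance_km(0, 0, 0, 1), 1))    # 111.2
print(round(geodetic_distance_km(60, 0, 60, 1), 1))  # 55.6
```

One degree of longitude is about 111 km at the equator but only about 56 km at latitude 60°, a difference that a planar calculation on raw coordinates cannot capture.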

All of these new features make Microsoft SQL Server 2008 an advanced tool for creating and maintaining data warehouses.

Bibliography

CASARES, C. (2003) Data Warehousing.

INMON, B. (2007) Corporate Information Factory. Inmon Consulting Services.

MICROSOFT (2008) Best Practices for Data Warehousing with SQL Server 2008.

MICROSOFT (2008) What's New in SQL Server 2008.