Why you need a data warehouse: definition, architecture and benefits
- Data Science, Data Warehousing
- Data Warehousing, datasphere, DWC, DWH
- 8 min reading time
Marco Ferrari
What is a data warehouse (chapter 1), how is it structured (chapter 2) and why do I need a data warehouse as an organisation (chapter 3)? These questions are answered in the following knowledge article.
Table of contents
1. What is a data warehouse?
1.1 Definition of data warehouse
“A data warehouse is a subject-oriented, integrated, time-related, and non-volatile data stock to support the decision-making, planning, and controlling processes of management” (Müller & Lenz, 2013, p. 14).
Data warehousing can be seen as one of the most important components of business intelligence. It is understood to be the entire process that is carried out in a data warehouse. The individual steps of this process are described below:
- The process starts with data collection, which is based on external as well as internal sources of a company.
- This is followed by data preparation and the subsequent data transformation.
- Ultimately, the prepared and transformed data is stored in the data warehouse – data storage. (Müller and Lenz, 2013, p. 11)
These process steps suggest that this process basically only describes the database maintenance of collected data. However, this assumption is far from the truth: Reality shows that this broad process consists of many individual processes, which involve an enormous amount of work. An example of this is the ETL process (Extraction, Transformation and Loading), which describes how the data assigned to the various departments is transferred to the data warehouse and linked together. (Müller and Lenz, 2013, pp. 11-12)
A detailed description of the characteristics of a data warehouse as well as the term "cloud computing" can be found in this Wiki article.
1.2 Where does the idea of the data warehouse come from?
Historically, the term data warehouse first appeared in 1988. Barry A. Delvin and Paul T. Murphy were the first to describe the use of a central location for data collection. (Köppen, 2014)
In IBM's "Systems Journal", they described that due to the advancing use of IT systems in companies, the need arose to be able to access the company data stored in the respective systems. Based on this need, it was evident that an architecture or a system was necessary that would bring together the collected company data from the various IT applications and enable uniform access to it. On almost 20 pages, Delvin and Murphy (1988) presented the specially constructed EBIS. The E stands for the abbreviations of the regions Europe (E), Middle East (ME) and Africa (A) and the letters BIS stand for Business Information System. This system, developed internally by IBM, was intended to make it possible for the first time to bring together IT data from the various regions and is thus regarded as the first example of a data warehouse system. (Devlin & Murphy, 1988, p. 60)
2. How is a data warehouse structured?
There are various requirements for a data warehouse, which can vary greatly depending on the industry and company. Therefore, there is no fixed architecture for a data warehouse. In the course of time, the structures of data warehouses have therefore changed again and again and different modelling variants have emerged that describe the architecture of a data warehouse. (Hahne, 2014)
Historically, many companies have developed their data warehouse concepts through the continuous evolution of their IT system landscapes, rather than based on a proactive plan for implementing Business Intelligence. When building a data warehouse, there are various technical and business aspects to consider. Therefore, it is beneficial to plan and carefully design the architecture of a data warehouse from the ground up. This structured approach provides advantages in the further use and development of the data warehouse. (Gluchowski, 2021)
Due to the advantages of a structured architecture for a data warehouse, a basic structure of a data warehouse is presented below. Subsequently, some examples of different architecture variants are presented.
Figure 1 illustrates the basic structure and necessary components of a data warehouse. The data from the various sources (Figure 1 area 1), as well as the ETL processes, which load the various data into the 3rd area and the centrally mapped data warehouse (Core Data Warehouse) here, are added to the overall concept of a data warehouse. Likewise, the provision of data from the Core Data Warehouse for the creation of data analyses and reports is included in the basic concept of a data warehouse. (Müller and Lenz, 2013)
The following five best-known modeling variants are described in more detail:
Independent Data Marts - Stove Pipe
Data Marts with common dimensions
Dependent Data Marts - Hub and Spoke
Federated architectures
Data warehouse layer architecture
2.1 Independent Data Marts - Stove Pipe
The first modeling approach for a data warehouse architecture involves independent data marts. This happens when individual departments in a company start building their own Business Intelligence solutions. As a result, each department creates its own data mart. From a company-wide perspective, the core data warehouse consists of several department-specific data marts that operate independently. This modeling approach is seen as a negative example because the company lacks a “Single point of truth” that represents the unified data truth. This is because the data in each data mart is categorized into different dimensions and labeled differently. Consequently, the data from different departments cannot be compared, and a separate ETL process must be created for each department or data mart. (Müller and Lenz, 2013, p. 21)
2.2 Data Marts with common dimensions
In another modeling approach, the data marts are connected to each other via common dimensions. This ensures that the individual data marts are no longer independent of each other. This allows switching back and forth between data marts that are connected to each other via a specific dimension. As an example, Müller and Lenz cite the linking of the sales data mart with the personnel deployment data mart. These two cubes are connected to each other, for example, by the branch structure, which is identical for both cases, and thus the location dimension. (Müller and Lenz, 2013, p. 21)
2.3 Dependent Data Marts - Hub and Spoke
The modeling variant with dependent data marts is often referred to as a hub and spoke architecture and, in addition to the architecture with data marts with shared dimensions, is the most frequently used architecture in practice. In this case, the data marts are supplied with data from the ODS according to a predefined schema that applies to the entire company. This approach is the opposite of independent data marts and follows the so-called top-down principle. (Müller and Lenz, 2013, p. 22)
2.4 Federated architectures
The federated architecture of a data warehouse is used when the data in the source systems is to be accessed directly and not extracted by an ETL process. In this case, the data is accessed virtually during queries so that it can be included in the analyses. This approach is used when data is fast moving or managed by third parties. The main components in the federated architecture are the wrapper and the mediator. The wrapper provides access to the data in the source systems and aligns the data, whereas the mediator coordinates the queries and connects the results.
2.5 Data Warehouse Layer Architecture
In addition to the classic structures of a data warehouse, another basic architecture for a data warehouse has now become established. This is the layer architecture, in which the individual components and processes of the data warehouse are divided into different layers. (Gluchowski, 2021)
Gluchowski also points out that in practice there is no uniform specification of how the layers are structured and named. Therefore, the following is an example of an ideal-typical layer architecture by Gluchowski.
As Figure 2 illustrates, the start and end points of the layered architecture correspond to the start and end points of the basic architecture of a data warehouse shown in Figure 1. In both architectures, the external sources are the origin and the analyses and reportings are the destination of the data warehousing process.
3. Why do you need a data warehouse?
Data warehousing is an essential component of business intelligence. Often, the term data warehousing is also used to describe the process behind business intelligence. The literature shows that the boundaries between the processes behind the two terms are not clearly defined and merge into each other.
Müller and Lenz (2013) define the term Data Warehousing as follows:
“Data warehousing is the business process that includes data acquisition from internal and externally accessible sources, data transformation and preparation according to the source and target database schemas, data quality assurance, and storage in the (central) data warehouse or in (decentralized) data marts (user views), and OLAP-based data analysis.” (Müller and Lenz, 2013, p. 11)
Müller and Lenz (2013) state in their book on Business Intelligence that data warehousing accounts for 25% of the overall concept described by the term Business Intelligence. This contribution is complemented by 25% from data mining and statistics, and 50% from business administration and operations research. (Müller & Lenz, 2013, p. 15)
Therefore, the question "Why is a data warehouse needed?" can be answered with the fact that a data warehouse and the process of data warehousing are necessary to carry out business intelligence in a company. In order to clearly answer the question of "why" in terms of a data warehouse, the question of the motivation for business intelligence in a company must be answered first.
Excursus: Business Intelligence
For more than 60 years, IT-based management support has been steadily advanced in companies. The focus was usually on supporting executives by using accumulated knowledge and experience in the form of IT-based figures, data and facts for future management decisions or for determining strategic action. However, it took several years until the first functional and successfully implemented systems were developed. At the beginning, these were still task-oriented. (Kemper, Baars, & Mehanna, 2010, p. 1)
The term "Management Support Systems" was coined in the 1980s to summarize the variety of existing information and communication systems. It is still used in science today. This is because it is applied to describe how IT-based management support not only refers to the use of computers, but is equally aimed at supporting management with the help of information and communication technologies. (Kemper, Baars, & Mehanna, 2010, p. 1)
The term Business Intelligence basically describes IT-based management support and has been used constantly since the 1990s to describe processes and applications related to IT. The term can mainly be traced back to considerations of the Gardner Group, which deals with market research and the analysis of developments in IT. It should also be noted that the term Business Intelligence cannot be clearly defined or described. The variety of definitions for this collective term is great and leads, among other things, to the fact that this term has continuously ensured that the principle of IT-based management support is discussed and constantly rethought. (Kemper, Baars, & Mehanna, 2010, p. 2)
With the knowledge of how a data warehouse works and why companies decide to implement business intelligence, the question of why a data warehouse is needed can be answered as follows:
On the one hand, there is the fundamental motivation behind the word Business Intelligence. This includes the goal of enabling IT-based, strategic management support. The aim is to provide management with a sound basis for making strategic decisions based on the linking and analysis of company-wide data.
This is where the data warehouse comes in, because the functionalities of a data warehouse enable companies to unite all data and to model them in such a way that complex queries as well as analyses based on the data stocks are possible.
Bibliography
Devlin, B. A. & Murphy, P. T. (1988). An architecture for a business and information system. IBM Systems Journal, 60-80. https://www.semanticscholar.org/paper/An-Architecture-for-a-Business-and-Information-Devlin-Murphy/c22ce1eeafb01f0682e194a2a22349aa141b78f6
Gluchowski, P. (2021, February 26). Data Warehouse. Encyclopedia of Business Informatics.
Hahne, M. (2014). Modelling Business Intelligence Systems: Guide for successful projects based on flexible data warehouse architectures. Edition TDWI. TDWI Europe. http://site.ebrary.com/lib/alltitles/Doc?id=10904187
Kemper, H.-G., Baars, H., & Mehanna, W. (2010). Business Intelligence - Fundamentals and Practical Applications: An introduction to IT-based management support (3rd , revised and expanded edition). Studies in Business Informatics. Vieweg + Teubner.
Köppen, V. (2014). Data Warehouse Technologies (2nd ed.). mitp Professional. mitp. https://ebookcentral. proquest.com/lib/kxp/detail.action?docID=6166362
Müller, R. M. & Lenz, H.-J. (2013). Business Intelligence. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-35560-8
Would you like to delve deeper into this topic? Then we look forward to a personal exchange on data warehousing. Simply get in touch with us!
Published by:
Marco Ferrari
Marco Ferrari
How did you like the article?
How helpful was this post?
Click on a star to rate!
Average rating 4 / 5.
Number of ratings: 13
No votes so far! Be the first person to rate this post!







