Why you need a data warehouse: definition, architecture and benefits
- Data Science, Data Warehousing
- Data Warehousing, datasphere, DWC, DWH
- 20 June 2022
- 8 min reading time

Marco Ferrari
What is a data warehouse (chapter 1), how is it structured (chapter 2) and why do I need a data warehouse as an organisation (chapter 3)? These questions are answered in the following knowledge article.
Table of contents
1. What is a data warehouse?
1.1 Definition of data warehouse
"A data warehouse is a subject-oriented, integrated, time-period-related, and non-volatile set of data to support management's decision-making, planning, and controlling processes". (Müller & Lenz, 2013, p. 14)
Data warehousing can be seen as one of the most important components of business intelligence. It is understood to be the entire process that is carried out in a data warehouse. The individual steps of this process are described below:
- The process starts with data collection, which is based on external as well as internal sources of a company.
- This is followed by data preparation and the subsequent data transformation.
- Finally, the processed and transformed data is stored in the data warehouse - data storage. (Müller & Lenz, 2013, p. 11)
These process steps suggest that the process basically only describes the database maintenance of collected data. However, this assumption is far from the truth: reality shows that this broad process consists of many individual processes that require an enormous amount of work. An example of this is the ETL process (Extraction, Transformation and Loading), which describes how the data assigned to the various departments is transferred to the data warehouse and linked together. (Müller & Lenz, 2013, p. 11-12)
A detailed description of the characteristics of a data warehouse as well as the term "cloud computing" can be found in this Wiki article.
1.2 Where does the idea of the data warehouse come from?
Historically, the term data warehouse first appeared in 1988. Barry A. Delvin and Paul T. Murphy were the first to describe the use of a central location for data collection. (Köppen, 2014)
In IBM's "Systems Journal", they described that due to the advancing use of IT systems in companies, the need arose to be able to access the company data stored in the respective systems. Based on this need, it was evident that an architecture or a system was necessary that would bring together the collected company data from the various IT applications and enable uniform access to it. On almost 20 pages, Delvin and Murphy (1988) presented the specially constructed EBIS. The E stands for the abbreviations of the regions Europe (E), Middle East (ME) and Africa (A) and the letters BIS stand for Business Information System. This system, developed internally by IBM, was intended to make it possible for the first time to bring together IT data from the various regions and is thus regarded as the first example of a data warehouse system. (Devlin & Murphy, 1988, p. 60)
2. How is a data warehouse structured?
There are various requirements for a data warehouse, which can vary greatly depending on the industry and company. Therefore, there is no fixed architecture for a data warehouse. In the course of time, the structures of data warehouses have therefore changed again and again and different modelling variants have emerged that describe the architecture of a data warehouse. (Hahne, 2014)
In many companies, data warehouse concepts did historically not emerge based on a proactive plan towards the implementation of business intelligence. Instead, they emerged in the course of the ongoing development of IT system landscapes in the respective companies. When setting up a data warehouse, there are various technical as well as business aspects that need to be considered. It is therefore advantageous if the architecture of a data warehouse is planned and thought through from the ground up. This structured approach provides advantages in the further course, both in the use and in the further development of the data warehouse. (Gluchowski, 2021)
Due to the advantages of a structured architecture for a data warehouse, a basic structure of a data warehouse is presented below. Subsequently, some examples of different architecture variants are presented.

Figure 1 shows the basic structure and necessary components of a data warehouse. Both the data from the various sources (Figure 1, area 1) and the ETL processes that load the various data into the 3rd area and the data warehouse (core data warehouse) shown here are included in the overall concept of a data warehouse. The basic concept of a data warehouse also includes the provision of data from the core data warehouse for the creation of data analyses and reports. (Müller & Lenz, 2013)
The following five best-known modeling variants are described in more detail:
Independent Data Marts - Stove Pipe
Data Marts with common dimensions
Dependent Data Marts - Hub and Spoke
Federated architectures
Data warehouse layer architecture
2.1 Independent Data Marts - Stove Pipe
The first modeling variant of the architecture of a data warehouse is referred to as 'independent data marts'. This describes the case when individual departments in a company begin to build their own business intelligence solutions. Such an approach results in each department building its own data mart. Thus, from the company's point of view, the core data warehouse consists of several departmental data marts that are independent of each other. This modeling variant is to be seen as a negative example, since in this case there is no "single point of truth" in the company that represents the common data truth. This is due to the fact that the data in the respective data marts is divided into different dimensions and is also labeled differently. As a result, the data from the respective departments cannot be compared with each other and, in addition, a separate ETL process must be created for each department or data mart. (Müller & Lenz, 2013, p. 21)
2.2 Data Marts with common dimensions
In another modeling variant, the data marts are connected to each other via common dimensions. This ensures that the individual data marts are no longer independent of each other. This makes it possible to switch back and forth between data marts that are linked to each other via a specific dimension. As an example, Müller and Lenz list the link between the sales data mart and the personnel deployment data mart. These two cubes are linked, for example, by the identical branch structure for both cases and thus by the location dimension. (Müller & Lenz, 2013, p. 21)
2.3 Dependent Data Marts - Hub and Spoke
The modelling variant with dependent data marts is often referred to as the hub and spoke architecture and, along with the architecture with data marts with common dimensions, is the most frequently used architecture in practice. In this case, the data marts are supplied with data by the ODS according to a predefined schema that applies to the entire company. This approach represents the opposite of independent data marts and follows the so-called top-down principle. (Müller & Lenz, 2013, p. 22)
2.4 Federated architectures
The federated architecture of a data warehouse is used when the data in the source systems is to be accessed directly and not extracted by an ETL process. In this case, the data is accessed virtually during queries so that it can be included in the analyses. This approach is used when data is fast moving or managed by third parties. The main components in the federated architecture are the wrapper and the mediator. The wrapper provides access to the data in the source systems and aligns the data, whereas the mediator coordinates the queries and connects the results.
2.5 Data Warehouse Layer Architecture
In addition to the classic structures of a data warehouse, another basic architecture for a data warehouse has now become established. This is the layered architecture, in which the individual components and processes of the data warehouse are divided into different layers. (Gluchowski, 2021)
Gluchowski also points out that in practice there is no uniform specification of how the layers are structured and named. Therefore, the following is an example of an ideal-typical layer architecture by Gluchowski.
As Figure 2 illustrates, the start and end points of the layered architecture correspond to the start and end points of the basic architecture of a data warehouse shown in Figure 1. In both architectures, the external sources are the origin and the analyses and reportings are the destination of the data warehousing process.
3. Why do you need a data warehouse?
Data warehousing is an essential component of business intelligence. Often, the term data warehousing is also used to describe the process behind business intelligence. The literature shows that the boundaries between the processes behind the two terms are not clearly defined and merge into each other.
Müller and Lenz (2013) define the term data warehousing as follows:
"Data warehousing is the business process that includes data procurement from internal and externally accessible sources, data transformation and preparation according to source and target database schemas, data quality assurance and storage in the (central) data warehouse or in (decentralised) data marts (user views) and OLAP-based data analysis". (Müller & Lenz, 2013, p. 11)
Müller and Lenz (2013) state in their book on business intelligence that data warehousing accounts for 25% of the total concept described by the word business intelligence. This contribution is complemented by 25% from data mining and statistics and 50% from business administration and operations research. (Müller & Lenz, 2013, p. 15)
Therefore, the question "Why is a data warehouse needed?" can be answered with the fact that a data warehouse and the process of data warehousing are necessary to carry out business intelligence in a company. In order to clearly answer the question of "why" in terms of a data warehouse, the question of the motivation for business intelligence in a company must be answered first.
Excursus: Business Intelligence
For more than 60 years, IT-based management support has been steadily advanced in companies. The focus was usually on supporting executives by using accumulated knowledge and experience in the form of IT-based figures, data and facts for future management decisions or for determining strategic action. However, it took several years until the first functional and successfully implemented systems were developed. At the beginning, these were still task-oriented. (Kemper, Baars, & Mehanna, 2010, p. 1)
The term "Management Support Systems" was coined in the 1980s to summarize the variety of existing information and communication systems. It is still used in science today. This is because it is applied to describe how IT-based management support not only refers to the use of computers, but is equally aimed at supporting management with the help of information and communication technologies. (Kemper, Baars, & Mehanna, 2010, p. 1)
The term Business Intelligence basically describes IT-based management support and has been used constantly since the 1990s to describe processes and applications related to IT. The term can mainly be traced back to considerations of the Gardner Group, which deals with market research and the analysis of developments in IT. It should also be noted that the term Business Intelligence cannot be clearly defined or described. The variety of definitions for this collective term is great and leads, among other things, to the fact that this term has continuously ensured that the principle of IT-based management support is discussed and constantly rethought. (Kemper, Baars, & Mehanna, 2010, p. 2)
With the knowledge of how a data warehouse works and why companies decide to implement business intelligence, the question of why a data warehouse is needed can be answered as follows:
On the one hand, there is the fundamental motivation behind the word Business Intelligence. This includes the goal of enabling IT-based, strategic management support. The aim is to provide management with a sound basis for making strategic decisions based on the linking and analysis of company-wide data.
This is where the data warehouse comes in, because the functionalities of a data warehouse enable companies to unite all data and to model them in such a way that complex queries as well as analyses based on the data stocks are possible.
Bibliography
Devlin, B. A. & Murphy, P. T. (1988). An architecture for a business and information system. IBM Systems Journal, 60-80. https://www.semanticscholar.org/paper/An-Architecture-for-a-Business-and-Information-Devlin-Murphy/c22ce1eeafb01f0682e194a2a22349aa141b78f6
Gluchowski, P. (2021, 26 February). Data Warehouse. Encyclopaedia of Business Informatics. https://enzyklopaedie-der-wirtschaftsinformatik.de/wi-enzyklopaedie/lexikon/daten-wissen/Business-Intelligence/Data-Warehouse
Hahne, M. (2014). Modelling Business Intelligence Systems: Guide for successful projects based on flexible data warehouse architectures. Edition TDWI. TDWI Europe. http://site.ebrary.com/lib/alltitles/Doc?id=10904187
Kemper, H.-G., Baars, H., & Mehanna, W. (2010). Business Intelligence - Fundamentals and Practical Applications: An introduction to IT-based management support (3rd , revised and expanded edition). Studies in Business Informatics. Vieweg + Teubner.
Köppen, V. (2014). Data Warehouse Technologies (2nd ed.). mitp Professional. mitp. https://ebookcentral. proquest.com/lib/kxp/detail.action?docID=6166362
Müller, R. M. & Lenz, H.-J. (2013). Business Intelligence. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-35560-8
Would you like to delve deeper into this topic? Then we look forward to a personal exchange on data warehousing. Simply get in touch with us!

Published by:

Marco Ferrari

Marco Ferrari
How did you like the article?
How helpful was this post?
Click on a star to rate!
Average rating 3.8 / 5.
Number of ratings: 8
No votes so far! Be the first person to rate this post!