From order to added value: Metadata management with the Data Catalog
- DATA Catalog, SAP Datasphere
- 7 min reading time
Franziskus Heep
A Data Catalog is a central directory that provides information about the data assets available in a company. It helps users find, understand, and use the relevant data sources. A Data Catalog typically contains metadata, i.e. structured data that describes the actual data.
The goal? To empower everyone in the company – from data experts to business analysts – to quickly understand company data using structured metadata, without needing technical knowledge or struggling to navigate through different systems.
This article provides a brief overview of the current functionalities and application possibilities of the SAP Datasphere Data Catalog.
1. Introduction to the SAP Datasphere Data Catalog
1.1 Navigation
The Data Catalog integrated in SAP Datasphere is accessible via the left navigation bar on the main interface.
1.2 Users
In the Data Catalog, users can be divided into two user groups:
Catalog Administrators
Are responsible for connecting to the source systems and extracting the metadata, as well as enriching it with additional semantics.
Catalog Users
Primarily use the catalog to search, discover, and understand the available data.
Both roles must be combined with another Datasphere role, such as DW Viewer or DW Modeler, to grant the user access.
1.3 Monitoring
In the Monitoring area, the Catalog Administrator manages the connected remote catalog systems and the metadata extractions of the artefacts available in the systems.
During metadata extraction, information about the assets is transferred from the source systems to the catalog.
In addition to the two currently supported source systems, SAP Analytics Cloud and SAP Datasphere, further SAP applications are to become available as source systems in the future, including connection and automatic metadata extraction.
SAP has announced this for the applications from which Datasphere can already replicate data: SAP HANA, SAP BW, SAP ECC, and SAP S/4HANA.
2. Objects in the Data Catalog
The following five areas exist for managing and using metadata in the Data Catalog: Assets, Terms, KPIs, Data Products, and Data Providers.
Assets are data or analysis objects in SAP Datasphere and SAP Analytics Cloud, such as local tables, remote tables, views, analytic models, planning models, data flows, intelligent lookups, SAC stories, or predictive scenarios.
Terms serve as a dictionary for business concepts to promote a company-wide, consistent understanding and enable the description of synonyms. Terms can be linked to assets, KPIs, or other terms.
KPIs describe the key figures of the company. In addition to the description of a key figure, information on the type, threshold values, standard unit, calculation details, detailed documentation, and the relationship to assets, terms, or other KPIs can be maintained.
Data Products consist of bundled, reusable data units that are typically domain-specific and specifically designed for concrete business requirements. This concept enables data providers to design their own data products and then publish them via the SAP Data Marketplace externally or internally within the company, for example, for different departments.
Data Providers are individuals or companies that make their Data Products available via the Data Marketplace. Depending on the situation or requirements, data provision can take place internally or externally. They are responsible for developing the data products, managing their lifecycle, and updating the versions.
3. Insight into an Asset
Assets are not created manually, but are generated by extracting the metadata from the connected systems. This metadata extraction is performed by the Data Catalog Administrator and is currently available for SAP Datasphere and SAC.
Overview
When opening a specific asset, such as the “Sales Orders View”, you receive a detailed overview of the relevant metadata.
The name and type of the asset, creation and modification date as well as the DSP space in which the object is located can be viewed in the overview.
The Details section describes the key figures (e.g. "GROSSAMOUNT", "TAXAMOUNT") and attributes (e.g. "BILLINGSTATUS") contained in the asset. These are supplemented by technical information such as data types, aggregation types and other relevant information.
These two areas are automatically maintained by adopting the metadata from the system, without the need for active maintenance by the user.
Semantic Enrichment
The Semantic Enrichment area is not automatically populated by extracting the metadata. To bring this area to life, a user must maintain the metadata manually.
As a rule, a term, KPI or tag is first created in the catalog in order to then be linked to an asset.
Once a term or KPI, such as 'Sales Revenue Achievement Rate', has been assigned to an asset, there is a relationship between these objects. This allows the user to navigate from the asset to the linked term or KPI, as well as vice versa from a term or KPI to a corresponding asset.
Additionally, linking an asset with appropriate tags, which can also be structured hierarchically, can facilitate future searching, filtering, and browsing for these objects.
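The effect of hierarchical tags on filtering can be illustrated with a small sketch. It does not use the Data Catalog's API: the tag paths, the two additional asset names, and the helper assets_with_tag are assumptions made up for illustration. The idea is simply that filtering on a parent tag also finds assets tagged with one of its children.

```python
# Hypothetical tag assignments: asset name -> set of hierarchical tag paths.
# The tag paths and all assets except "Sales Orders View" are invented.
ASSET_TAGS = {
    "Sales Orders View": {"Sales/Orders"},
    "Sales Invoices View": {"Sales/Billing"},
    "Material Master View": {"Logistics/Materials"},
}


def assets_with_tag(tag: str, asset_tags: dict[str, set[str]]) -> list[str]:
    """Return assets carrying `tag` itself or any tag below it in the hierarchy."""
    prefix = tag + "/"
    return sorted(
        asset
        for asset, tags in asset_tags.items()
        if any(t == tag or t.startswith(prefix) for t in tags)
    )


# Filtering on the parent tag finds both sales assets,
# filtering on the child tag only the one tagged with it.
print(assets_with_tag("Sales", ASSET_TAGS))
print(assets_with_tag("Sales/Orders", ASSET_TAGS))
```

Storing each tag as a full path keeps the sketch short; a real hierarchy could just as well be modeled with parent references.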
In principle, it is advisable to maintain the metadata before or during the actual development or adaptation of a data object. Some information that can be entered in object maintenance in the source system, such as the business purpose, is automatically transferred to the metadata of the asset in the Data Catalog when the object is saved. This avoids redundant maintenance of this information.
Currently, there is no way to navigate directly from the interface of the source system in which the development object is being edited to the metadata maintenance of the corresponding asset in the Data Catalog. To do this, it is necessary to start the Data Catalog, manually search for the corresponding asset and open it.
Conversely, the Data Catalog offers the practical possibility of navigating directly from metadata maintenance to the DSP object maintenance of the opened asset.
3.1 Lineage and impact analysis
Lineage and impact analysis is particularly helpful for understanding data. Lineage shows the upstream objects that feed into the analyzed asset, while impact shows the downstream objects that use the asset as a source. Currently, this analysis is only available with an asset as the central element. However, SAP plans to make this functionality available for data products and other elements in the near future.
The graphical representation helps catalog users to better understand the origin and use of the data, to assess the impact of changes to the data and to make informed decisions in data modeling and management.
In the impact and lineage analysis, the examined asset (DSP Analytic Model: Sales_Order_AM) is highlighted in color at the center. The data origin is displayed to its left and the data usage to its right. Since the automatic extraction of metadata from the systems SAP HANA, SAP BW, SAP ECC and SAP S/4HANA into the Data Catalog is not yet possible, there are also no corresponding assets for these systems. Therefore, they are not included in the analysis of the data origin.
In the lineage analysis, the origin can be traced back to the Datasphere space. In the example above, these are the remote tables of the transactional S/4 tables MARA, VBAP, MAKT and VBAK in the "Showroom S/4 Living Company" space.
On the right side, the impact analysis shows how the Analytic Model is used within an SAC story. It should be noted that only objects to which the user also has access are displayed.
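The left-to-right reading of the diagram can be sketched as a walk over a directed graph: lineage follows the edges backwards from the asset, impact follows them forwards. This is only a minimal illustration and not how the Data Catalog is implemented. The object names follow the example, but the intermediate view between the remote tables and the model, the story name "SAC Sales Story", and the function reachable are assumptions.

```python
from collections import deque

# Directed edges "source -> target". The remote tables and the model come
# from the example; the intermediate view and the story name are assumed.
EDGES = [
    ("VBAK", "Sales Orders View"),
    ("VBAP", "Sales Orders View"),
    ("MARA", "Sales Orders View"),
    ("MAKT", "Sales Orders View"),
    ("Sales Orders View", "Sales_Order_AM"),
    ("Sales_Order_AM", "SAC Sales Story"),
]


def reachable(start: str, edges: list[tuple[str, str]], upstream: bool) -> set[str]:
    """Breadth-first walk against (lineage) or along (impact) the edge direction."""
    neighbours: dict[str, list[str]] = {}
    for source, target in edges:
        key, value = (target, source) if upstream else (source, target)
        neighbours.setdefault(key, []).append(value)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}  # the examined asset itself is not part of the result


lineage = reachable("Sales_Order_AM", EDGES, upstream=True)
impact = reachable("Sales_Order_AM", EDGES, upstream=False)
print("Lineage:", sorted(lineage))
print("Impact:", sorted(impact))
```

In this sketch an authorization filter, such as hiding objects the user cannot access, would simply remove nodes from the edge list before the walk.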
I find the possibility of previewing the most important metadata of each node, which the user can display as a tooltip, very helpful for understanding the data.
Currently, lineage and impact analysis in the Data Catalog is only available at the data object level. Tracking the data origin at the level of individual columns is not possible. For such detailed tracking, it is necessary to switch from the Data Catalog to the data basis in SAP Datasphere.
The disadvantage of that view is that no objects from the target systems, such as SAP Analytics Cloud, are displayed.
4. Insight into a KPI
In addition to the ability to define KPIs and add descriptions, the Data Catalog offers the option of defining thresholds and then displaying them graphically.
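How thresholds turn a measured value into a status can be shown with a short sketch. It is not based on the Data Catalog's data model: the class Kpi, its field names, and the threshold values are assumptions chosen for illustration; only the KPI name is taken from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    """Minimal stand-in for a catalog KPI; all field names are assumptions."""

    name: str
    unit: str
    warning_below: float   # values under this bound count as "warning"
    critical_below: float  # values under this bound count as "critical"

    def status(self, value: float) -> str:
        """Classify a measured value against the two thresholds."""
        if value < self.critical_below:
            return "critical"
        if value < self.warning_below:
            return "warning"
        return "on target"


# Threshold values are invented for illustration.
kpi = Kpi("Sales Revenue Achievement Rate", "%", warning_below=95.0, critical_below=80.0)
for measured in (72.5, 88.0, 101.3):
    print(f"{kpi.name}: {measured} {kpi.unit} -> {kpi.status(measured)}")
```

The sketch assumes a KPI where higher values are better; for a KPI such as a complaint rate the comparisons would have to be reversed.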
KPIs can also be related to each other. This works analogously to relations between other objects in the Data Catalog and will be explained in more detail in the next section.
5. Insight into a Term
Terms serve as a dictionary for business terms to promote a company-wide, consistent understanding. Terms can be bundled in glossaries for better organization.
Once a term has been defined and maintained, any number of relationships between the objects of assets, KPIs, and other terms can be defined under Manage Relationships.
Terms can be managed as separate objects in different languages. In this example, the term "Billing Status" and its corresponding German equivalent "Fakturastatus" were created and maintained. Subsequently, the asset "Sales Orders View" within the Semantic Enrichments was linked to the term "Billing Status".
In addition, the two terms “Billing Status” and “Fakturastatus” have been linked to each other through a relationship. In this way, the Data Catalog enables the metadata of an asset to be enriched so that users can navigate across different languages.
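The relationships from this example can be modeled as undirected links, which is what makes navigation possible in both directions and across languages. The following sketch is an illustration under assumptions, not the Data Catalog's API: the helpers link, related, and assets_for_term are hypothetical, and only the object names are taken from the example.

```python
from collections import defaultdict

# A catalog object is identified here as (kind, name), e.g. ("term", "Billing Status").
CatalogObject = tuple[str, str]

# Undirected relationships: every link is stored in both directions.
links: dict[CatalogObject, set[CatalogObject]] = defaultdict(set)


def link(a: CatalogObject, b: CatalogObject) -> None:
    """Create a relationship that can be navigated from either side."""
    links[a].add(b)
    links[b].add(a)


def related(obj: CatalogObject, kind: str) -> list[str]:
    """Names of directly related objects of the given kind."""
    return sorted(name for k, name in links.get(obj, set()) if k == kind)


def assets_for_term(term: str) -> list[str]:
    """Assets linked to the term directly or via a directly related term."""
    start = ("term", term)
    terms = {start} | {o for o in links.get(start, set()) if o[0] == "term"}
    return sorted(
        {name for t in terms for kind, name in links.get(t, set()) if kind == "asset"}
    )


# The relationships described in the example.
link(("asset", "Sales Orders View"), ("term", "Billing Status"))
link(("term", "Billing Status"), ("term", "Fakturastatus"))

print(related(("asset", "Sales Orders View"), "term"))  # asset -> term
print(related(("term", "Billing Status"), "asset"))     # term -> asset
print(assets_for_term("Fakturastatus"))                 # German term -> asset
```

Starting from the German term, the asset is reached through the linked English term, which mirrors the navigation path a user would click through in the catalog.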
6. Conclusion
The Data Catalog makes knowledge that was previously only available locally in one place centrally searchable and easily accessible. The Catalog User requires little technical knowledge.
As with other concepts, it will be important how consistently metadata management is “lived” in the company. Only with consistent maintenance and use can a tool like the Data Catalog unleash its power and generate added value.
I personally think that the Data Catalog is well integrated into the Datasphere and is a useful tool for metadata management overall.
The central documentation, as well as the use and search of metadata, helps me to better find my way around the multitude of data models and their mutual relationships and dependencies.
I find the graphical representation of the lineage and impact analysis across different systems particularly valuable, as it visualizes the dependencies and effects in a structured way.
It would be a considerable added value for me personally if other systems could also be mapped in the lineage and impact analysis.
SAP appears to be positioning the Data Catalog as a central metadata management tool integrated into Datasphere and is continuously announcing new features for future releases.
Some features are still missing today. For the 2025 release, SAP is planning the ability to import specific metadata via Excel as well as the metadata extraction mentioned in this article for S/4HANA (e.g., CDS views), HANA Cloud, and SAP ECC.
In addition, the strategic partnership that SAP entered into with the data catalog provider Collibra in 2023 gives me hope that other applications besides SAP data sources can also be integrated in the future.
Published by:
Franziskus Heep
Professional Analytics consultant