SAP Data to Databricks: A Strategic Guide to Data Integration
- Databricks
- 7 min reading time
Dr. Andreas Wagner
Crew? Ready. Rocket? Selected. Now comes the question that will determine whether the mission succeeds or is aborted: How does the fuel actually get into the system?
Historically, SAP systems have operated as closed ecosystems. Valuable business data was locked into proprietary structures, making it a resource-intensive challenge for modern analytics platforms and artificial intelligence to access. In today’s data economy, where the speed of information processing is the primary competitive advantage, this isolation is no longer sustainable.
With the launch of the SAP Business Data Cloud (BDC), SAP is making a strategic shift toward an open ecosystem. This transformation enables companies to use the Databricks Data Intelligence Platform not just as an external target system, but as an integral part of a hybrid architecture.
1. The Six Paths of Data Extraction
Choosing an integration method is a decision about scalability, latency, and preserving business semantics. In other words, it's about laying the groundwork before you can even think about getting started. Sounds logical, right? In practice, six approaches have become established.
Method 1: JDBC/ODBC (Infrastructure-based Direct Access)
This traditional approach uses the SAP HANA JDBC driver for a point-to-point connection. It is a technically direct method in which queries are executed directly against the database schema. While this works for occasional ad-hoc analyses, the model reaches its limits when dealing with massive data volumes. Since there is no native Change Data Capture (CDC), incremental logic must be implemented manually using timestamps. Additionally, serializing the data on the application server strains system resources, which often leads to performance bottlenecks.
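The manual incremental logic mentioned above usually takes the form of a "high-water mark" pattern. The following is a minimal sketch under stated assumptions: the table `ZSALES_DOC`, the column `CHANGED_AT`, and the host name are hypothetical, and the option keys follow the Spark JDBC data source (`url`, `driver`, `query`, `fetchsize`). Credentials are deliberately left out and would normally come from a secret store.

```python
from datetime import datetime


def build_incremental_query(table: str, ts_column: str, last_watermark: datetime) -> str:
    """Build a query that selects only rows changed after the last watermark."""
    literal = last_watermark.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} WHERE {ts_column} > '{literal}'"


def jdbc_options(host: str, port: int, query: str) -> dict:
    """Options for a Spark JDBC read against SAP HANA (user/password omitted)."""
    return {
        "url": f"jdbc:sap://{host}:{port}",
        "driver": "com.sap.db.jdbc.Driver",
        "query": query,
        "fetchsize": "10000",
    }


def next_watermark(rows: list, ts_key: str, previous: datetime) -> datetime:
    """Return the newest change timestamp seen, or the old watermark if no rows arrived."""
    return max((row[ts_key] for row in rows), default=previous)


# Hypothetical usage inside a Databricks notebook (requires a Spark session):
# query = build_incremental_query("ZSALES_DOC", "CHANGED_AT", last_run)
# df = spark.read.format("jdbc").options(**jdbc_options(host, 30015, query)).load()
```

Note that this pattern only sees rows whose timestamp column is reliably maintained, and it cannot detect physical deletes, which is part of the reason the approach does not scale well.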
Appropriate use cases for this approach:
- Rapid Prototyping & PoCs: When quick results are needed for a proof of concept and the infrastructure for BDC or OData has not yet been fully implemented.
- Extraction of master data: For tables with a small volume and a low change rate (e.g., company codes, plants, or material types), where a simple batch extraction is sufficient.
- Data Discovery: When analysts need to run ad hoc exploratory queries directly on HANA shadow views without setting up a permanent pipeline.
- Systems without ODP licensing: In legacy environments where the ODP framework or modern connectors are not yet technically supported.
Risks of this approach:
- Architectural Risk: According to SAP Note 2415336, this approach results in complete semantic decoupling. Since the SAP application server is bypassed, all metadata, currency conversions, and security policies (DCL) at the application level are missing.
- The TCO Trap: All business logic must be manually reconstructed in Databricks, which results in high data engineering costs and duplicate maintenance of the data model.
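To give a sense of what "reconstructing business logic" means in practice, here is a minimal sketch of a date-dependent currency conversion rebuilt outside SAP. Everything in it is an assumption for illustration: the rate table layout, the rate values, and the function name are hypothetical and do not mirror SAP's actual conversion rules, which cover far more cases (rate types, ratios, inverse rates).

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical rate history: (from, to) -> list of (valid_from, rate), sorted by date.
# The values are made up for the example.
RATES = {
    ("USD", "EUR"): [
        (date(2025, 1, 1), Decimal("0.95")),
        (date(2025, 2, 1), Decimal("0.96")),
    ],
}


def convert(amount: Decimal, from_cur: str, to_cur: str, posting_date: date) -> Decimal:
    """Convert an amount using the latest rate valid on or before the posting date."""
    if from_cur == to_cur:
        return amount
    history = RATES[(from_cur, to_cur)]
    valid_from_dates = [valid_from for valid_from, _ in history]
    idx = bisect_right(valid_from_dates, posting_date) - 1
    if idx < 0:
        raise LookupError(f"No {from_cur}->{to_cur} rate valid on {posting_date}")
    return (amount * history[idx][1]).quantize(Decimal("0.01"))
```

Even this simplified version has to be tested, documented, and kept in sync with the SAP source, which is where the duplicate maintenance effort comes from.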
Method 2: SAP SLT (Trigger-based real-time replication)
The SAP Landscape Transformation Replication Server captures data changes at the table level using database triggers in near real time.
Technical complexity: Implementing SLT is highly complex and resource-intensive. It requires a dedicated server infrastructure as well as in-depth expertise to configure the delta mechanisms and monitor the database triggers. Since replication occurs at the purely technical table level, all business relationships, joins, and semantic logic (e.g., currency conversions) must be reconstructed entirely manually in the target system, Databricks. This results in significant initial development and maintenance effort.
When is this approach worthwhile? Despite the significant effort involved, SLT is the method of choice for:
- Mission-Critical Real-Time: Processes that require latency on the order of seconds (e.g., just-in-time production, inventory management in high-speed logistics centers).
- Continuous data stream: Scenarios in which there is no batch window due to 24/7 operations and data must flow continuously.
- Legacy integration: When source systems do not support modern OData or zero-copy interfaces but still need to provide real-time data.
Risks of this approach:
- Strategic Deadline: A standalone SLT server is based on SAP NetWeaver 7.5. According to SAP Note 2881788, mainstream maintenance for this will end on December 31, 2027.
- Loss of business logic: Since raw tables are usually replicated, business relationships (e.g., hierarchies, currency conversions) must be laboriously recreated in Databricks.
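Table-level replication delivers a stream of change records that the target system has to apply in order. The sketch below shows that apply step in plain Python, assuming a hypothetical record layout with a sequence number `seq`, an operation code `op` (`I`, `U`, `D`), a primary `key`, and the `row` payload. On Databricks, this step would more typically be expressed as a `MERGE INTO` on a Delta table.

```python
def apply_changes(target: dict, changes: list) -> dict:
    """Apply insert/update/delete records to a keyed snapshot in sequence order."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        key = change["key"]
        if change["op"] == "D":
            # Delete: remove the row if it exists; ignore repeated deletes.
            target.pop(key, None)
        else:
            # Insert or update: the latest image of the row wins.
            target[key] = change["row"]
    return target
```

The result is still a copy of the raw table. Joins, hierarchies, and conversions have to be layered on top of it separately.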
Method 3: SAP Datasphere Premium Outbound (Semantic Abstraction)
Here, SAP Datasphere acts as an intelligent orchestration layer. It aggregates data from S/4HANA or BW systems and makes it available via coordinated replication flows. The key advantage lies in preserving the business context. The data is semantically processed within the SAP security envelope. When written to object storage (ADLS/S3), the logical structure is preserved, which significantly accelerates subsequent processing in Databricks.
Appropriate use cases for this approach:
- Enterprise reporting based on harmonized data: When data from multiple SAP sources (e.g., different company codes from various ERP instances) has already been consolidated in Datasphere and is to be transferred to Databricks as a unified dataset.
- Governed Data Products for Business Units: Provision of certified data models in which compliance with business rules and permissions has already been ensured at the Datasphere level.
- Avoiding logic replication: Scenarios in which deeply nested CDS view hierarchies or complex calculations (e.g., calculated metrics) should not have to be laboriously reconstructed in Databricks.
- Hybrid data integration: When Datasphere serves as a bridge to securely and reliably replicate data from on-premises systems (e.g., SAP BW/4HANA) to cloud storage for use in Databricks.
Method 4: Third Party (Compliance Risk ODP-RFC)
Specialized ETL tools (such as Theobald Software, Fivetran, or hyperscaler services) have bridged the gap between SAP and third-party platforms for years. However, this approach is facing a critical turning point that requires a deep understanding of the underlying protocols.
The Role of the RFC Interface: The Remote Function Call (RFC) is the technological backbone of the SAP world. It allows external systems to directly call function modules within the SAP ABAP stack. Third-party vendors primarily use RFC as a transport protocol for the Operational Data Provisioning (ODP) framework.
Why was this route used?
- Performance: Compared to HTTP-based interfaces, RFC offers very high data throughput.
- Delta Capability: Through ODP-RFC, third-party providers could natively access the delta queues of SAP extractors. This enabled efficient Change Data Capture (CDC) for large tables without overloading the source systems with full extractions.
- Universal access: It served as the "master key" for accessing classic SAPI extractors (Service API extractors), BW providers, and modern ABAP CDS views.
What has changed?
In SAP Notes 3255746 and 3439624, SAP declares the ODP-RFC API to be exclusively for communication between SAP applications. Technically, this means that SAP is introducing a validation mechanism: Starting in June 2026, a security patch will check the “subscriber type” for every ODP call. If the calling application is a non-SAP application, access will be blocked.
SAP is thus enforcing a transition to more modern, more controllable interfaces such as ODP-OData or the strategic zero-copy approach (Method 5) to minimize security risks and better enforce intellectual property rights at the data level. However, section 2.2.2 of the SAP API Policy also severely restricts the use of OData services for exporting data to non-SAP systems.
Method 5: Zero-Copy Delta Share
The architectural paradigm shift—or the cleanest way to get started?
This is the most efficient method of modern data integration—and, from an architectural standpoint, the most elegant approach that Systemcheck has to offer. Instead of laboriously pumping data through cables, physical data movement is eliminated entirely.
Based on the open Delta Sharing Protocol, BDC Connect generates cryptographically secured short-term links (pre-signed URLs). Databricks worker nodes access the original data in the SAP Object Store directly—without any intermediary application servers, without redundant data copies, and without bottlenecks. The result is virtually unlimited scalability with maximum data protection.
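The idea behind short-lived, cryptographically secured links can be illustrated with a generic HMAC-signed URL. This is only a conceptual sketch of how pre-signed URLs work in general, not the actual signing scheme used by BDC Connect, Delta Sharing, or any cloud object store. The secret, the path, and the parameter names `expires` and `sig` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key known only to the data provider


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path with an expiry timestamp and an HMAC signature attached."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_seconds)
    signature = hmac.new(SECRET, f"{path}|{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'sig': signature})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = hmac.new(
        SECRET, f"{parsed.path}|{expires}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature, expected)
```

Because the link itself carries the authorization and expires quickly, worker nodes can read files in parallel without holding long-lived credentials for the source system.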
Rules for data persistence:
- Temporary storage: Storing copies of data in the target system (e.g., Databricks) is permitted solely for the purpose of performance optimization. This prevents repeated queries of identical data records and improves the user experience. However, permanently storing the unmodified original data as the primary access point is prohibited.
- Permanent Storage Through Enrichment: As soon as SAP data is transformed, aggregated, or enriched using AI models (e.g., joins with third-party data), the result no longer qualifies as a pure SAP data product. This enriched data may be persisted indefinitely in the target system (e.g., Databricks Delta Lake).
- Distribution Restriction: The data products provided may be used within the connected platform but may not be redistributed to downstream systems outside the SAP or partner ecosystem (no pass-through).
Method 6: ABAP SDK for Hyperscalers
The Autonomous "Clean Core Approach" – Maximum Control Without Middleware
While other methods rely on data being retrieved (pull), the SAP system takes control here. Using the ABAP SDK (available for AWS, Azure, and GCP), the ABAP stack actively writes data with high performance directly to the cloud storage (e.g., S3, Cloud Storage, or ADLS Gen2) on which the Databricks environment operates. This approach is the architectural answer to the “Clean Core” paradigm.
Benefits of cloud-native integration:
- Elimination of middleware: No additional ETL layer is required. Communication takes place directly between SAP and the hyperscaler’s storage, using encryption.
- Performance & Scalability: Since data is pushed in the highly efficient Parquet or JSON format, there is no serial load on the application server—a common bottleneck in traditional RFC queries.
Rules for data provision:
- Storage-based ingestion: The data is first stored in a “landing zone” within the cloud storage. From there, Databricks handles the orchestration. This enables a strict separation between SAP transaction workloads and analytics computing.
- Governance: Control over the timing and scope of data extraction remains entirely within the SAP system, which simplifies internal compliance.
- Scalability vs. Effort: While implementation via the ABAP SDK offers maximum performance and independence, it requires a greater initial development effort compared to BDC Connect.
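The storage-based ingestion described above comes down to picking up only those files in the landing zone that have not been processed yet. Here is a minimal sketch in plain Python, assuming a hypothetical directory layout with Parquet files and an in-memory set as the checkpoint. On Databricks, Auto Loader (the `cloudFiles` streaming source) typically takes over this bookkeeping.

```python
from pathlib import Path


def discover_new_files(landing_zone: Path, processed: set) -> list:
    """Return Parquet files under the landing zone that are not yet checkpointed."""
    return sorted(
        path for path in landing_zone.rglob("*.parquet") if str(path) not in processed
    )


def mark_processed(files: list, processed: set) -> None:
    """Record files as ingested so the next run skips them."""
    processed.update(str(path) for path in files)
```

Keeping the checkpoint on the Databricks side preserves the separation of concerns: SAP decides what is written and when, and the analytics platform decides when to read it.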
2. Technical Requirements
Technical Comparison of Integration Parameters
Regulatory Requirement: The ODP-RFC Migration in 2026
SAP Note 3255746 defines the ODP-RFC API as being exclusively for communication between SAP applications. For all third-party tools, this marks the end of an established access method that is now classified as insecure.
Companies that continue to rely on RFC-based extractors must have implemented a new target architecture by Security Patch Day on June 9, 2026.
Universal Governance: End-to-End Security
In a hybrid data landscape, security must not stop at system boundaries—just as a control center is not responsible for only half of a space mission. Modern architecture requires seamless identity harmonization:
- Microsoft Entra ID serves as the central identity provider.
- The Identity Provisioning service (IPS) automatically synchronizes these identities with the Identity Authentication service (IAS) of SAP Cloud Identity Services.
- Unity Catalog (Databricks) uses the same identities for granular access controls at the table, row, and column levels.
The result: Role changes or deactivations in Entra ID take effect immediately and consistently, from SAP Datasphere to the Databricks notebook.
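Row-level and column-level controls in Unity Catalog are attached to tables as SQL functions. The sketch below generates such statements from Python, assuming hypothetical table, function, column, and group names. The statement shapes follow the Databricks SQL syntax for row filters and column masks as I understand it, and they should be checked against the current documentation before use. Identifiers are interpolated directly, so they must come from trusted configuration.

```python
def row_filter_sql(table: str, function_name: str, column: str,
                   group: str, allowed_value: str) -> list:
    """Statements that let group members see all rows and others only one value."""
    create_function = (
        f"CREATE OR REPLACE FUNCTION {function_name}({column} STRING) "
        f"RETURN IS_ACCOUNT_GROUP_MEMBER('{group}') OR {column} = '{allowed_value}'"
    )
    attach = f"ALTER TABLE {table} SET ROW FILTER {function_name} ON ({column})"
    return [create_function, attach]


def column_mask_sql(table: str, function_name: str, column: str, group: str) -> list:
    """Statements that show the real column value to group members only."""
    create_function = (
        f"CREATE OR REPLACE FUNCTION {function_name}({column} STRING) "
        f"RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('{group}') "
        f"THEN {column} ELSE '***' END"
    )
    attach = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function_name}"
    return [create_function, attach]


# Hypothetical usage in a notebook with a Spark session:
# for stmt in row_filter_sql("main.sap.sales", "main.sap.cc_filter",
#                            "company_code", "finance-global", "DE01"):
#     spark.sql(stmt)
```

Because the functions reference account-level groups, a group change synchronized from the identity provider changes what a user sees without any change to the table itself.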
Cost-effectiveness and TCO analysis
Traditional integrations tie up significant resources in manual pipeline management (data plumbing). Often, up to 80% of project resources are spent on maintaining these connections rather than on actual value creation. The zero-copy approach reduces this total cost of ownership (TCO) by addressing key cost drivers:
- Infrastructure fees: Costs are primarily incurred through inter-region data transfers and processed API requests. However, since there is no need to manage redundant storage copies in third-party systems, gross storage costs are reduced.
- Premium Fees: When using partner platforms such as Snowflake or Microsoft Fabric, a surcharge is applied based on the compute workload of the SAP data products. This surcharge does not apply when using native OEM solutions within the SAP ecosystem.
- Operational costs: Eliminating complex ETL maintenance significantly reduces staffing costs for data engineers.
Compared to the licensing fees for third-party tools and the massive manpower required for traditional pipeline development, the zero-copy approach offers a cleaner architectural design and a cost structure that is scalable in the long term.
3. Conclusion and Recommendations for Action
The choice of integration method is a fundamental decision for the future viability of the data strategy. While legacy interfaces face regulatory and technical pressures, the SAP Business Data Cloud offers a path to a scalable and open ecosystem.
- Immediately identify all RFC-based interfaces using the RODPS_REPL_SUBSCRIBER_ASSESS report.
- Replace high-maintenance ETL pipelines with BDC Connect to immediately leverage the innovative power of Databricks Mosaic AI on your SAP data.
- Implement SAP Databricks for in-depth SAP context analysis and use BDC Connect as a bridge for global enterprise data integration.
Ready for the next step? Our experts will guide you every step of the way, from the audit phase through to the successful integration of your data.
Your data strategy is unique—your consulting should be too.
The choice between these methods depends on countless factors: your existing system landscape, your business goals, and your data culture. There is no standard answer.
Let's talk about which path is right for you, with no obligation. Contact us for a personal consultation.
Published by:
Dr. Andreas Wagner
Customer Success Executive






