SAP, founded in 1972 by five former IBM employees, revolutionized enterprise software with its vision of integrating business processes and enabling real-time data processing. Initially focusing on financial and inventory management, SAP evolved from a small German company into a global leader.
The launch of R/3 in 1992 marked a major milestone, propelling SAP to the forefront of the global business software market. Over the years, SAP expanded its reach with acquisitions and innovations, including the introduction of SAP HANA (2011) and the move to the cloud. Today, SAP serves millions of users worldwide, offering an extensive suite of integrated business solutions across various industries.
The introduction of HANA brought improvements in speed, performance, and data architecture for SAP systems, with BW/4HANA offering a full data warehouse inside the SAP ecosystem. The shift to HANA-backed systems from recommended to ultimately mandated was an early indication of the general direction SAP was taking: increased lock-in for SAP users.
BW/4HANA, in combination with SAP Data Services (BODS), Smart Data Integration and Smart Data Access, offers substantial ETL and modelling capabilities. However, as with other traditional warehouse solutions, it has limitations when compared to a modern Lakehouse.
For instance, data democratization is limited, with much of a company’s data out of reach of the business. Data governance and integrations with third-party systems rely heavily on specialist SAP knowledge, or must be handled with additional external tools. Likewise, monitoring and scaling of storage and compute resources require significant resource and time commitments, with limited elasticity.
Databricks mitigates these issues through features such as Unity Catalog for data governance and democratization; Partner Connect, Lakehouse Federation, and drivers for broad third-party system interoperability; and cloud-based compute and storage for scalability, redundancy and performance management. All of this is supported by common languages such as Python, SQL and Scala in an intuitive UI and a simple architecture, which lowers the barrier to entry for platform users and administrators.
Data Engineers looking to achieve interoperability with SAP systems will know of the complexities this can entail. There are various connectivity protocols available to interact with SAP systems, each with unique strengths and limitations, some of which are discussed below.
HANA J/ODBC protocols enable third-party tools to execute SQL logic directly against HANA objects, via direct queries for BI use cases or within ETL processes. These integrations can include watermarking logic through bespoke queries to achieve CDC, a functionality which is harder to achieve with alternative connectivity approaches. This can be particularly useful when used with the HANA connector in Azure Data Factory to improve the performance of batch processes.
In addition, the hdbcli Python package provides cursor-based API access to HANA systems, enabling SQL queries to pull HANA data within Python code, including Databricks Notebooks. However, this method is not suited to bulk or high-concurrency operations, due to the resource implications at source and for Databricks compute.
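The watermarking approach described above can be sketched in a few lines of Python. This is a minimal illustration only: the schema, table and `CHANGED_AT` column names are hypothetical placeholders, and the hdbcli import is deferred so the query-building helper can be exercised without a reachable HANA instance.

```python
from datetime import datetime

# Hypothetical object names for illustration; a real HANA source would
# expose its own change-timestamp (or sequence) column for watermarking.
WATERMARK_QUERY = (
    'SELECT * FROM "{schema}"."{table}" '
    'WHERE "CHANGED_AT" > ? ORDER BY "CHANGED_AT"'
)

def build_watermark_query(schema: str, table: str) -> str:
    """Build a parameterised delta query that filters on a
    change-timestamp column to emulate CDC via watermarking."""
    return WATERMARK_QUERY.format(schema=schema, table=table)

def fetch_delta(host: str, port: int, user: str, password: str,
                schema: str, table: str, last_watermark: datetime):
    """Pull only rows changed since the last watermark via hdbcli.
    Requires the hdbcli package and a reachable HANA instance."""
    from hdbcli import dbapi  # deferred: keeps the helper above testable offline
    conn = dbapi.connect(address=host, port=port, user=user, password=password)
    try:
        cursor = conn.cursor()
        cursor.execute(build_watermark_query(schema, table), (last_watermark,))
        return cursor.fetchall()
    finally:
        conn.close()
```

After each successful load, the highest `CHANGED_AT` value retrieved becomes the next run’s watermark, which is the bespoke CDC logic referred to above.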
Operational Data Provisioning (ODP), released in SAP BW 7.40, has evolved into the recommended successor to traditional SAP BW extractors. ODP was developed to improve data extraction from SAP sources through increased efficiency and flexibility.
A major benefit in ODP is native Change Data Capture for ABAP CDS Views, a feature that is present in the ‘SAP CDC’ connector for Azure Data Factory. The connector utilises checkpoints within the SAP source to enable robust delta ingestion processes against Curated objects.
However, per ‘SAP Note 3255746’, published in February 2024, the ODP data replication API is not permitted for use by third-party applications. At present, SAP customers would need to consult their licensing agreements to understand to what extent this note can be enforced. However, the warning is clear that alternative connectivity approaches should be sought.
Open Data Protocol (OData) connectivity for SAP is presented as a potential alternative to the ODP approach and can be used to access SAP via REST API. The approach is suited to web apps such as Logic Apps and the Power Platform.
The method is appropriate for lightweight applications but is not suited to bulk operations or batch processing, due to poor performance and high resource costs. Integrations also become complex to create and manage when handling evolving source schemas and metadata.
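To make the REST-based nature of OData concrete, the sketch below composes an OData-style query URL with the standard `$select`, `$filter` and `$top` system query options. The service and entity names are invented for illustration and do not refer to a real SAP system.

```python
from urllib.parse import urlencode

def build_odata_url(base_url: str, service: str, entity_set: str,
                    select=None, filter_expr=None, top=None) -> str:
    """Compose an OData query URL for an SAP Gateway-style service.
    Service/entity names passed in are placeholders, not real objects."""
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    url = f"{base_url.rstrip('/')}/{service}/{entity_set}?$format=json"
    if params:
        # keep '$' and ',' literal for readability of the query options
        url += "&" + urlencode(params, safe="$,")
    return url
```

Each such call returns one page of JSON, which is precisely why the approach suits lightweight web apps but incurs heavy per-request overhead for bulk extraction.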
SAP Databricks is a new fully managed SaaS solution sold by SAP within the SAP Business Data Cloud, which enables the management and governance of SAP data as well as seamless connectivity with third-party systems.
SAP Databricks will roll out on Azure, AWS and Google Cloud over the coming months in a phased release. The solution will enable direct interaction with existing Unity Catalog-enabled Databricks workspaces via Delta Sharing, enabling near real-time data access with built-in security and compliance.
An immediate benefit of SAP Databricks is that it is native to, and fully supported by, both SAP and Databricks. This provides confidence in the longevity of data engineering processes and in the development time invested to create them. This peace of mind is particularly valuable when compared to the ODP method, for the reasons noted above.
This also comes with a substantial investment from both SAP and Databricks in supporting successful adoption of the technology: Databricks has earmarked $250 million specifically for supporting customers in adopting it.
SAP Databricks also offers significantly improved data latency, as discussed by Ali Ghodsi in a case study on a large oil & gas corporation. In this instance, the SAP Databricks implementation reduced the latency of inventory-level data from 48 hours to near real time, at around 45 minutes.
This enhanced latency is possible due to the Delta Sharing approach which enables direct access to clean, curated and context-rich data products with business semantics already incorporated. The completeness and reliability of this data also poses the opportunity to skip the Bronze Layer stage within the medallion architecture, in instances where no deletes occur at source.
Delta Sharing offers a view of the data as it exists in the source, and can remove the requirement for complex CDC processes and the need to physically copy data into the target system. This reduces processing costs and lowers the overheads of initial development and ongoing maintenance of ETL processes.
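As a sketch of how lightweight this consumption pattern is, the snippet below uses the open delta-sharing Python client to load a shared table into pandas. The share, schema and table names are hypothetical, and a real read requires a provider-issued profile file; the import is deferred so the locator helper can be checked without one.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose the '<profile>#<share>.<schema>.<table>' locator used by
    the delta-sharing client. All names here are illustrative."""
    return f"{profile_path}#{share}.{schema}.{table}"

def read_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame via the open
    delta-sharing client. Requires the delta-sharing package and a
    profile file issued by the data provider."""
    import delta_sharing  # deferred: keeps the helper above testable offline
    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )
```

Because the consumer reads directly from the share, there is no extraction pipeline to build or schedule, which is the cost and maintenance reduction described above.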
The solution also offers the potential to simplify SAP administration by removing the need for complex access controls and account administration in SAP, with access controls instead handled in Unity Catalog in the target instance.
The immediate consideration with the adoption of this new tooling is the cost associated with SAP BDC licensing. The development of this solution aligns with SAP’s general market position of prioritising the use of licensed SAP products for data engineering solutions.
However, SAP Databricks does illustrate a recognition that businesses require tools that better combine their SAP data and wider data estate. When considering the limitations of existing interoperability approaches and the clear benefits that SAP Databricks can offer, this commercial decision may be easier to make.
Early adoption can be challenging, and SAP teams will need to upskill and invest time and resources to implement and support this new approach. However, as noted above, there is substantial investment from Databricks to support clients in successfully adopting and utilising SAP Databricks.
We’re Advancing Analytics, a multi-award-winning Databricks partner with 50+ Lakehouse-certified developers, 10 Databricks Champions, and 2 Databricks MVPs!
We build, optimise, and accelerate Data Lakehouses - helping businesses unlock faster insights, modernise architectures, and adopt the latest innovations like Unity Catalog and Generative AI.
Whether you’re tackling an SAP migration or exploring how to streamline your data landscape, we’re here to guide you through it. Get in touch - we’d love to help you level up your data game and make your Lakehouse journey a smooth one!
*Disclaimer: The blog post image was generated by AI and does not depict any real person, place, or event.