SAP’s recent announcement of a strategic partnership with Databricks has generated significant excitement among SAP customers. Databricks, the data and AI experts, presents a compelling opportunity for leveraging analytics and ML/AI capabilities by integrating SAP HANA with Databricks. Given the immense interest in this collaboration, we’re thrilled to embark on a deep-dive blog series.
In many customer scenarios, an SAP HANA system serves as the primary data foundation for various source systems, including SAP CRM, SAP ERP/ECC, and SAP BW. Now, the exciting possibility arises to seamlessly integrate this robust SAP HANA analytical sidecar system with Databricks, further enhancing the organization’s data capabilities. By connecting SAP HANA with Databricks, businesses can leverage the advanced analytics and machine learning capabilities of Databricks (like MLflow, AutoML, MLOps) while harnessing the rich and consolidated data stored within SAP HANA. This integration opens up a world of possibilities for organizations to unlock valuable insights and drive data-driven decision-making across their SAP systems.
Multiple approaches are available to access SAP HANA tables, SQL views, and calculation views in Databricks. However, the quickest way is to use the SAP Federated ML Python libraries (FedML), which can be installed from the PyPI repository. The most significant advantage is that the SAP FedML package has a native implementation for Databricks, with methods like “execute_query_pyspark(‘query’)” that can execute SQL queries and return the fetched data as a PySpark DataFrame.
Let us start with the SAP HANA and Databricks integration
In order to test this integration with Databricks, SAP HANA 2.0 was installed in the Azure cloud.
Installed SAP HANA information in Azure:
|Operating System|SUSE Linux Enterprise Server 15 SP1|
Here is the high-level workflow depicting the different steps of this integration.
Please see the attached notebook for more detailed instructions on extracting data from SAP HANA’s calculation views and tables into Databricks using SAP FedML.
Once the above steps are performed, create the DbConnection using the config_json, which carries the SAP HANA connection information.
db = DbConnection(dict_obj=config_json)
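As a rough illustration of what config_json might contain, here is a minimal sketch. The key names (address, port, user, password) and the helper build_hana_config are assumptions made for this example, so check the FedML documentation for the exact keys your version expects. The host and credentials are placeholders.

```python
# Hypothetical helper: assembles the connection dictionary passed to DbConnection.
# The key names below are assumptions, not confirmed against the FedML documentation.
REQUIRED_KEYS = ("address", "port", "user", "password")


def build_hana_config(address, port, user, password):
    """Return a connection dict, rejecting empty values early."""
    config = {
        "address": address,
        "port": port,
        "user": user,
        "password": password,
    }
    # Fail fast on blank values instead of at connection time.
    missing = [key for key in REQUIRED_KEYS if not config[key]]
    if missing:
        raise ValueError(f"Missing SAP HANA connection values: {missing}")
    return config


# Placeholder values; in a notebook these would normally come from a secret store.
config_json = build_hana_config("hana.example.com", 30015, "DBUSER", "change-me")
```

Validating the values up front keeps a missing credential from surfacing later as a less obvious connection error.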
Start creating the dataframes using the execute_query_pyspark API from FedML, passing in the select query with the schema and table name, as shown below.
df_sap_ecc_hana_vbap = db.execute_query_pyspark('SELECT * FROM "ECC_DATA"."VBAP"')
To get data from the Calculation View, we have to do the following:
For example, this calculation view is created in the internal schema “_SYS_BIC”.
This code snippet creates a PySpark dataframe named “df_sap_ecc_hana_cv_vbap” and populates it from a Calculation View in the SAP HANA system (in this case, CV_VBAP).
df_sap_ecc_hana_cv_vbap = db.execute_query_pyspark('SELECT * FROM "_SYS_BIC"."ecc-data-cv/CV_VBAP"')
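Both queries in this post follow the same pattern: a double-quoted schema followed by a double-quoted object name, where a calculation view in “_SYS_BIC” is addressed as package/view. The sketch below builds those strings programmatically. The helper names (quote_identifier, select_all, calculation_view_name) are hypothetical and not part of FedML.

```python
def quote_identifier(name):
    """Wrap an SAP HANA identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_all(schema, object_name):
    """Build a SELECT * statement for a table or an activated calculation view."""
    return f"SELECT * FROM {quote_identifier(schema)}.{quote_identifier(object_name)}"


def calculation_view_name(package, view):
    """Calculation views in _SYS_BIC are addressed as package/view."""
    return f"{package}/{view}"


# Table in the ECC_DATA schema.
table_query = select_all("ECC_DATA", "VBAP")

# Calculation view CV_VBAP in the ecc-data-cv package.
view_query = select_all("_SYS_BIC", calculation_view_name("ecc-data-cv", "CV_VBAP"))
```

Either string can then be passed to db.execute_query_pyspark in place of the hand-written query.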
After generating the PySpark data frame, one can leverage Databricks’ extensive capabilities for exploratory data analysis (EDA) and machine learning/artificial intelligence (ML/AI).
Summarizing the above data frames:
The focus of this blog is SAP FedML for SAP HANA, but it’s worth noting that other methods such as sparkjdbc, hdbcli, and hana_ml are available for similar purposes.
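For comparison, a Spark JDBC read could be configured along the following lines. This is a sketch, not a tested setup: it assumes the SAP HANA JDBC driver (ngdbc) is installed on the cluster, hana_jdbc_options is a hypothetical helper, and the host, port, and credentials are placeholders.

```python
def hana_jdbc_options(host, port, user, password, query):
    """Return options for spark.read.format("jdbc") against SAP HANA."""
    return {
        "url": f"jdbc:sap://{host}:{port}",
        "driver": "com.sap.db.jdbc.Driver",
        "user": user,
        "password": password,
        # Spark's JDBC source accepts a full SELECT statement via "query".
        "query": query,
    }


options = hana_jdbc_options(
    "hana.example.com",  # placeholder host
    30015,               # placeholder SQL port
    "DBUSER",
    "change-me",
    'SELECT * FROM "ECC_DATA"."VBAP"',
)

# In a Databricks notebook, where `spark` is predefined:
# df = spark.read.format("jdbc").options(**options).load()
```

Unlike FedML’s execute_query_pyspark, this route requires managing the driver and connection options yourself.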