
We are thrilled to announce that the latest release of SAP Data Hub, version 2.3, is now available.

This release has a fresh new look and an easy-to-follow user interface. Now you can rely on SAP Data Hub to help you manage metadata assets, enable governance, and accelerate data-driven processes across the entire landscape more efficiently.

All components in this version are fully containerized. This means the underlying components, such as engines, agents, and the SAP HANA-based metadata store, all run in isolated execution environments within Kubernetes. This greatly simplifies the installation process and reduces deployment time.

Let’s go through the key features and explain how they benefit you.

1. Single Entry Point for All SAP Data Hub Applications

The SAP Data Hub launchpad, a modern user interface, provides a single point of access to all user-facing applications, including system management, monitoring, SAP Vora tools, connection management, the metadata explorer, and the pipeline modeler. This release also introduces system management with life-cycle and repository capabilities.

2. Simplified Deployment of SAP Data Hub in Cloud and On-Premise Environments

A fully containerized architecture allows you to deploy SAP Data Hub on any platform that supports Kubernetes. This includes managed cloud services such as AWS (EKS), GCP (GKE), and Azure (AKS), as well as private cloud and on-premise installations such as SUSE CaaSP.

Furthermore, we have joined forces with Cisco to provide a turn-key, enterprise-scale solution that combines powerful hardware with sophisticated software. With the Cisco Container Platform running on its hyperconverged hardware solution, Cisco HyperFlex, Cisco provides an elastically scalable container cluster with upstream Kubernetes. We complemented it with a Scality RING object store and Avi Networks load balancers to form the foundation for running SAP Data Hub on premise in enterprise-scale hybrid cloud environments.

Starting with this release, all necessary components, including SAP HANA and SAP Vora's distributed runtime engines, are delivered as containers via a Docker registry. This removes the need to install a separate SAP HANA instance for external storage or a Hadoop cluster for SAP Vora's runtime execution.

The major advantage of a fully containerized architecture is that the data processing layer can be separated from, yet ideally co-located with, the main data storage. Because SAP HANA no longer has to be installed on a separate server, the installation process becomes much leaner and easier. All major cloud storage platforms, HDFS, and on-premise file shares are fully supported.

3. Metadata Catalog to Improve Visibility into Landscape-Wide Data Assets

We are introducing the SAP Data Hub metadata explorer to help you govern and manage metadata assets that are spread across diverse systems and disparate sources.

Key capabilities include, but are not limited to:

  • Connect to data sources and automatically crawl their metadata structures
  • Create references to data, so-called data sets, and store them in the metadata catalog
  • Search and browse the metadata catalog to find relevant data assets
  • Discover and profile data within the landscape to gain insights into data quality
  • Out-of-the-box support for SAP HANA, SAP Vora, object stores (S3, GCS, etc.), HDFS, SAP BW, and Oracle
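To make the idea of a catalog of references concrete, here is a minimal sketch of crawling, registering, and searching data sets. It is not the SAP Data Hub metadata explorer API: the `DataSet` and `MetadataCatalog` classes, the connection IDs, and the `listing` mapping that stands in for a connector's crawl result are all hypothetical, and the search is a plain case-insensitive substring match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSet:
    """A reference to data in a source system, not a copy of the data."""
    connection: str  # hypothetical connection ID, e.g. "HANA_PROD"
    path: str        # location of the object within that source
    columns: list[str] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)


class MetadataCatalog:
    """Toy in-memory catalog: crawl sources, register data sets, search them."""

    def __init__(self) -> None:
        # Keyed on (connection, path) so re-crawling a source replaces entries
        self._entries: dict[tuple[str, str], DataSet] = {}

    def __len__(self) -> int:
        return len(self._entries)

    def register(self, ds: DataSet) -> None:
        self._entries[(ds.connection, ds.path)] = ds

    def crawl(self, connection: str, listing: dict[str, list[str]]) -> int:
        """Register one data set per object in a source's metadata listing.

        `listing` stands in for what a real connector would return when
        browsing a source: a plain mapping of object path -> column names.
        """
        for path, columns in listing.items():
            self.register(DataSet(connection, path, list(columns)))
        return len(listing)

    def search(self, term: str) -> list[DataSet]:
        """Case-insensitive substring match on path, column names, or tags."""
        t = term.lower()
        return [
            ds for ds in self._entries.values()
            if t in ds.path.lower()
            or any(t in c.lower() for c in ds.columns)
            or any(t in tag.lower() for tag in ds.tags)
        ]
```

For example, after crawling a SAP HANA schema with tables `SALES.CUSTOMERS` and `SALES.ORDERS`, `catalog.search("customer")` would return both, because one matches on its path and the other on its `CUSTOMER_ID` column. Only references are stored, so the data itself stays in its source system.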

With these new features available in SAP Data Hub 2.3, it is now easier for you to manage metadata and enable data-driven processes within distributed landscapes. SAP Data Hub provides an easy way for all data professionals, including data designers, scientists, engineers, architects, and modelers, to get insights regardless of where the data is stored — data warehouse, data lake, cloud storage, etc.

4. Enhanced Data Integration and Connectivity Capabilities

SAP Data Hub provides a broad spectrum of connectivity with a strong focus on Big Data components, such as Hadoop, cloud stores, machine learning services, and real-time messaging technologies. As the product continues to evolve and is adopted by a broader customer base, we understand the need for native connectivity to a wide range of databases and enterprise applications. We are therefore introducing a new common connectivity framework as the underlying infrastructure for rapidly expanding and enhancing native connectivity and integration, especially for structured data sources.

Among many others, SAP Data Hub provides native connectivity to the following sources:

  • Relational databases (Oracle, etc.) and enterprise applications, such as SAP S/4HANA, SAP BW/4HANA
  • Popular cloud storage platforms, such as WASB, S3, and GCS
  • Open protocols, such as OData and OpenAPI
  • Cleansing and enrichment of location data through integration with SAP Data Quality Management microservices (DQMm)
  • Machine learning services like SAP Leonardo Machine Learning Foundation services
  • Third-party services and technologies like Spark, Livy, and Google Pub/Sub

Below is a snapshot of the operators that provide native connectivity:

Furthermore, ingesting data streams into SAP Vora and its persistence settings have been optimized:

  • Native streaming into SAP Vora persistent storage
    • Support for DML operations (insert, update, delete, upsert) on streaming tables with the disk engine
    • Support for external cloud storage for backup checkpoints
  • Real-time data replication directly into SAP Vora tables with SAP LT Replication Server
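The four DML operations differ mainly in how they treat existing keys. The following sketch illustrates those general semantics on a toy keyed table fed by a stream of change events; it is not how SAP Vora's disk engine implements streaming tables. The `KeyedTable` class, `apply_stream` helper, and event format are hypothetical, and treating a delete of a missing key as a no-op is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any, Iterable


class KeyedTable:
    """Toy table keyed on one primary-key column, used to illustrate DML semantics."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: dict[Any, dict[str, Any]] = {}

    def apply(self, op: str, row: dict[str, Any]) -> None:
        k = row[self.key]
        if op == "insert":
            # Plain insert must not overwrite an existing row
            if k in self.rows:
                raise KeyError(f"duplicate key {k!r}")
            self.rows[k] = dict(row)
        elif op == "update":
            # Plain update requires the row to exist already
            if k not in self.rows:
                raise KeyError(f"no row with key {k!r}")
            self.rows[k].update(row)
        elif op == "delete":
            # Treated as idempotent here: deleting a missing key is a no-op
            self.rows.pop(k, None)
        elif op == "upsert":
            # Insert if the key is new, otherwise merge into the existing row
            self.rows.setdefault(k, {}).update(row)
        else:
            raise ValueError(f"unknown operation {op!r}")


def apply_stream(table: KeyedTable, events: Iterable[tuple[str, dict[str, Any]]]) -> None:
    """Apply change events in arrival order; ordering affects the final state."""
    for op, row in events:
        table.apply(op, row)
```

Upsert is the convenient choice for streams whose producer cannot know whether a key already exists; plain insert and update fail loudly instead, which surfaces out-of-order or duplicated events.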

5. Unified Modeling Interface with SAP Data Hub Modeler

Finally, we unified and improved the user experience. In this release, all existing modeling capabilities are brought together in a single interface, the SAP Data Hub modeler. The following data operations are now delivered as dedicated operators, ready to be used in any data pipeline:

  • Workflow Pipeline Operators: Data transfers (SAP HANA/SAP BW), Spark Jobs, etc.
  • Remote Sources Orchestration: SAP BW process chain, SAP Data Services job, SAP HANA flow graph
  • Structured Data Transformations: Projection, aggregation, join, union, case
  • Data Masking: Mask out, numeric generalization, pattern variance, etc.
  • Validation Rules: Basic and custom functions
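As a rough illustration of two of the masking techniques named above, here is a generic sketch of "mask out" (hide all but a few trailing characters) and numeric generalization (replace a value with the range it falls in). These are common textbook definitions, not the SAP Data Hub masking operators, whose parameters and behavior may differ; the function names, defaults, and record fields are hypothetical, and pattern variance is not sketched.

```python
def mask_out(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`."""
    keep_last = max(keep_last, 0)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def generalize_number(value: int, bucket_size: int = 10) -> str:
    """Replace an integer with the inclusive range of its bucket, e.g. 37 -> "30-39"."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    lo = (value // bucket_size) * bucket_size  # floor to the bucket's lower bound
    return f"{lo}-{lo + bucket_size - 1}"


# Example: mask a record before it leaves a pipeline (field names are hypothetical)
record = {"card_number": "4111111111111111", "age": 37}
masked = {
    "card_number": mask_out(record["card_number"]),
    "age": generalize_number(record["age"]),
}
```

Mask out removes information irreversibly but keeps enough for a human to recognize a value, while generalization keeps the value usable for aggregation, such as counting customers per age band.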

In summary, SAP Data Hub 2.3 offers more functionality with a flexible architecture, which ultimately simplifies and speeds up deployment, scaling of data pipelines, and enforcement of governance.

To learn more about the product, please refer to the official documentation site, or check out SAP Data Hub on YouTube for more tutorial videos.

For hands-on experience with SAP Data Hub, please visit the following assets:


Marc Hartz is product management lead for SAP Data Hub.