How SAP and Google Cloud Are Advancing Enterprise AI Through Open Agent Collaboration, Model Choice, and Multimodal Intelligence


AI is increasingly embedded in business operations, powering automation, insight, and decision-making across systems and workflows. As part of our ongoing partnership with Google Cloud, SAP is enabling the next wave of enterprise AI by contributing to the new Agent2Agent (A2A) interoperability protocol, which establishes a foundation for AI agents to securely interact and collaborate across platforms.


This work is complemented by two additional areas of progress: first, the expansion of Google Gemini models in SAP’s generative AI hub on SAP Business Technology Platform (SAP BTP); second, the use of Google’s video and speech intelligence capabilities to support multimodal retrieval-augmented generation (RAG) for video-based learning and knowledge discovery in SAP products.

Together, these efforts reflect a shared commitment to deliver enterprise-ready AI that is open, flexible, and deeply grounded in business context.

Bringing AI agents together: laying the groundwork for interoperability

The future of work is agentic. Businesses are increasingly deploying AI agents that assist with real tasks — resolving customer issues, managing approvals, and collaborating across business functions. This is why SAP is delivering a collaborative agent architecture with Joule to support cross-functional agentic workflows across SAP Business Suite.

But for these agents to deliver real value, they cannot be confined to a single vendor landscape. They must be able to collaborate across platforms, securely exchange information, and coordinate actions across complex enterprise workflows. This need for seamless interaction is why the A2A protocol represents a significant step beyond simple API integrations or enhanced tooling.

That’s why SAP has joined Google Cloud and other enterprise leaders as a founding contributor to the new A2A protocol. This open standard is designed to ensure agents from different vendors can interact, share context, and work together, enabling seamless automation across traditionally disconnected systems.

Consider a customer dispute resolution scenario: a representative receives a billing inquiry via Gmail. Instead of toggling between tools, they can invoke Joule directly from the email. Joule, acting as an agent orchestrator, initiates a dispute resolution process and engages a Google agent connected to Google BigQuery, where the relevant transactional warehouse data resides. Together, the agents validate the issue, retrieve insights, and recommend a resolution, all without manual system switching, data reconciliation, or context loss.

This is the kind of cross-platform collaboration the A2A protocol is designed to enable: AI agents working together to accelerate business outcomes, reduce friction, and enable people to focus on more strategic work. It also reinforces SAP’s vision for Joule as an agent orchestrator working across enterprise workflows: interoperable, proactive, and deeply connected to business context.

Expanding access to Google models in generative AI hub

Beyond agent interoperability, SAP is furthering its commitment to openness and flexibility by expanding access to Google models in the generative AI hub, a key capability of the AI Foundation on SAP BTP.

Through the generative AI hub, customers gain enterprise-grade access to a curated portfolio of leading foundation models. That portfolio now includes Google Gemini 2.0 Flash and Flash-Lite, which join the Gemini 1.5 models already available through the hub.

This expanded model choice gives customers the flexibility to build and extend AI-driven solutions using high-performance, low-latency models optimized for enterprise workloads — while staying within SAP’s secure, business context-rich environment.

By combining Google’s model innovation with SAP’s deep understanding of enterprise processes, we enable customers to apply generative AI in ways that are not only powerful, but also practical, trustworthy, and fully aligned with how businesses operate.

Unlocking multimodal understanding with Google Video Intelligence

As part of our continued collaboration with Google Cloud, SAP is also advancing multimodal RAG, a highly requested capability among SAP customers, especially for video-based learning content.

Multimodal RAG enhances information retrieval and generation by integrating multiple data modalities — text, images, audio, and video — into a single, structured process. This approach enriches knowledge sourcing and elevates how users interact with training and support materials.

To address the complexity of extracting meaningful insights from video content, SAP leverages Google Video Intelligence for on-screen text detection across video frames, and Google’s Speech-to-Text API for accurate transcription of spoken audio. During the indexing process, these outputs are stored with corresponding timestamps, creating a structured foundation for retrieving relevant video segments with precision.

By grounding audio and visual content with time-aligned metadata, SAP enables users to search and retrieve specific, contextually relevant moments within a video, making the learning experience more intuitive, accessible, and impactful.
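To make the idea of time-aligned indexing concrete, here is a minimal Python sketch. It is an illustration only, not SAP's implementation: the `Segment` type, the `build_index` and `search` helpers, and the sample transcript data are all hypothetical, and simple keyword overlap stands in for whatever retrieval method the production system uses. The inputs mimic the kind of output described above, namely spoken-audio transcripts and on-screen text, each with start and end timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One indexed piece of video content (hypothetical structure)."""
    start: float  # seconds from the start of the video
    end: float
    source: str   # "speech" or "on_screen_text"
    text: str


def build_index(speech, on_screen):
    """Merge both modalities into one list ordered by start time."""
    segments = [Segment(s, e, "speech", t) for s, e, t in speech]
    segments += [Segment(s, e, "on_screen_text", t) for s, e, t in on_screen]
    return sorted(segments, key=lambda seg: seg.start)


def search(index, query):
    """Rank segments by the number of query terms they contain.

    Keyword overlap is an assumption made for brevity; it is a
    placeholder for a real retrieval step.
    """
    terms = set(query.lower().split())
    scored = []
    for seg in index:
        overlap = len(terms & set(seg.text.lower().split()))
        if overlap:
            scored.append((overlap, seg))
    # Best match first; earlier segments win ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1].start))
    return [seg for _, seg in scored]


# Made-up sample data as (start, end, text) tuples.
speech = [
    (0.0, 6.5, "welcome to the purchase order training"),
    (6.5, 15.0, "open the approval workflow and select the pending request"),
]
on_screen = [
    (7.0, 12.0, "Approval Workflow"),
    (20.0, 26.0, "Submit Purchase Order"),
]

index = build_index(speech, on_screen)
hits = search(index, "approval workflow")
for seg in hits:
    print(f"{seg.start:.1f}s-{seg.end:.1f}s [{seg.source}] {seg.text}")
```

Because every hit carries its own start and end time, a player could jump straight to the matching moment rather than returning the whole video, which is the benefit the time-aligned metadata is meant to provide.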

“As agentic AI evolves, seamless handling of multi-modal data — text, voice, enterprise videos, and images — becomes paramount,” said Miku Jha, director of AI/ML and Generative AI at Google Cloud. “This introduces significant challenges for agent interoperability. An open protocol like A2A is therefore indispensable, providing the necessary framework and flexibility for agents to effectively communicate and collaborate across these diverse modalities. Multi-modality is not simply a capability; it is a foundational requirement driving the next generation of interconnected agentic systems.”

This is another example of how SAP is integrating Google’s AI capabilities into business-relevant scenarios, helping customers unlock more value from their unstructured content and elevate the way knowledge is delivered across the enterprise.

Shared vision for business AI

These efforts reflect a broader strategic alignment between SAP and Google Cloud: a shared belief in AI that is open, composable, and grounded in real business context. Whether it’s shaping emerging standards for agent collaboration, providing choice through best-in-class models, or making unstructured content actionable, we are focused on helping our customers innovate with confidence — today and into the future.

To learn more about how SAP and Google Cloud are shaping the future of enterprise AI, visit sap.com/ai and explore our session at Google Cloud Next to see these innovations in action.


Walter Sun is senior vice president and head of AI at SAP.
