Clustering Ensures Consistent Data

A server breakdown, a system collapse, or the loss of business-critical data is a horror story for every company. A business can continue only when its data and the applications that process that data are highly available. Planned or not, IT downtime does real and measurable damage to a company.
In the SAP environment, an application consists of a large number of services, such as messaging, enqueue, network file system (NFS), and common Internet file system (CIFS). The services are distributed across several servers, which means that every service is a potential weak link in the chain that must be protected by setting up redundancies at the application level. Redundancy adds security, but it increases the complexity of the environment and makes it more difficult to manage. Because SAP applications support important business processes, a company can experience significant losses if an application fails during use. The ability of applications to be permanently available, maintainable, and recoverable is essential for continuous business operations.
Technologies such as synchronous or asynchronous replication of the SAP environment and local or global clustering (a model that combines several individual server systems) protect against downtime and the loss of data. They enable you to switch SAP applications back and forth between data centers in different locations. Business continuity plans supplement the technical protections for an SAP environment with organizational elements such as communication and escalation paths for specific emergency scenarios. Ensuring business continuity starts at the local level, just as it does for non-SAP environments.

Synchronous and asynchronous replication

Replication writes the data of a local hard drive to several other hard drives in parallel. The local and remote drives can be addressed by various networks at different speeds and with different communications protocols, including IP encapsulation with ATM, frame relay, or X.25.
Synchronous data replication simultaneously transmits all changes to data in blocks to the drives of the primary server and all remote drives. The procedure ends when all drive systems have transmitted a positive confirmation. Only then is the next block written. Depending on the distance, speed, and the number and size of the blocks to be transmitted, slight delays can occur during the procedure. The advantage of synchronous transmission is that it creates identical data that protects against lost data if a server fails. That’s why synchronous data replication is recommended for manageable distances and applications that rely on complete data rather than on process speed.
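The synchronous procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not a vendor implementation: the Disk class, its healthy flag, and the replicate_sync function are hypothetical names, and in-memory lists stand in for real drives.

```python
class Disk:
    """In-memory stand-in for a drive (hypothetical, for illustration)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # False simulates an unreachable drive
        self.blocks = []

    def write(self, block):
        """Store the block and return True as the positive confirmation."""
        if not self.healthy:
            return False
        self.blocks.append(block)
        return True


def replicate_sync(blocks, primary, replicas):
    """Write each block to every drive; move on only after all confirm."""
    for block in blocks:
        acks = [disk.write(block) for disk in [primary, *replicas]]
        if not all(acks):
            # A real product would also roll back or resynchronize here.
            raise OSError(f"block {block!r} was not confirmed by all drives")
    return len(blocks)
```

Because the loop does not advance until every drive has confirmed, the slowest drive or link sets the pace, which is where the delays mentioned above come from.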
Companies often rely on asynchronous replication when they plan global protection against downtime. Asynchronous replication transmits data as network capacity allows. Although this approach delivers good performance, it does not offer comprehensive protection against lost data. The replication software first collects the write commands for the remote disk locally in a replication log file. As soon as the log entry is written successfully, the application is notified, and the software then transmits the data to the secondary server. You must balance the advantage of high speed against the risk of possible data loss. If data must be reconstructed, some of it might be missing because not all of it had reached the second drive when the failure occurred. Although controlling the transmission, monitoring the bandwidth, and handling errors demand a great deal of effort during asynchronous replication, it is often the only alternative when large distances are involved.
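The trade-off between speed and possible data loss becomes visible in a small sketch. The AsyncReplicator class and its method names are hypothetical, and lists stand in for the local and remote drives; the point is only that the application is acknowledged before the remote side has the data.

```python
from collections import deque


class AsyncReplicator:
    """Toy model of log-based asynchronous replication (illustrative only)."""

    def __init__(self):
        self.primary = []   # blocks on the local drive
        self.remote = []    # blocks that have reached the remote drive
        self.log = deque()  # replication log: writes not yet transmitted

    def write(self, block):
        """Write locally, record the block in the log, acknowledge at once."""
        self.primary.append(block)
        self.log.append(block)
        return True  # the application continues without waiting for the remote

    def ship(self, max_blocks):
        """Transmit up to max_blocks log entries, oldest first."""
        sent = 0
        while self.log and sent < max_blocks:
            self.remote.append(self.log.popleft())
            sent += 1
        return sent

    def at_risk(self):
        """Blocks that would be lost if the primary site failed right now."""
        return list(self.log)
```

Whatever at_risk() returns at the moment of a failure is the data that never reached the second drive.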
Only a few manufacturers offer enhanced asynchronous replication, which maintains the write sequence of the blocks on the replicated drive. The data must reach the backup drive in the exact sequence in which it was written, a property called write-order fidelity. The correct sequence ensures data integrity and consistency and guarantees that the data can be recreated at the second location. If a serious problem occurs, inconsistent or erroneous data on the second drive makes the entire process meaningless. Replication solutions that operate at the hardware level often cannot guarantee write-order fidelity in asynchronous mode because the changes aren’t transmitted comprehensively and continuously. You must take appropriate measures to ensure the correct sequence so that the mirrored data remains usable.
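One common way to preserve write order, shown here purely as an assumption about how such a product might work, is to number every write at the source and let the receiver apply blocks strictly in that order, holding back anything that arrives early. The OrderedReceiver class is a hypothetical name.

```python
class OrderedReceiver:
    """Applies replicated blocks strictly in their original write order."""

    def __init__(self):
        self.next_seq = 0   # sequence number of the next block to apply
        self.pending = {}   # blocks that arrived ahead of their turn
        self.applied = []   # blocks written to the replica, in order

    def receive(self, seq, block):
        """Buffer the block, then apply every block that is now in sequence."""
        self.pending[seq] = block
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

For example, if block 1 arrives before block 0, nothing is applied until block 0 shows up; both are then written in the original order, so the replica never holds a later write without the earlier one.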
You can configure such a system to enable switching between synchronous and asynchronous replication. If the replication software recognizes that the transmission quality is no longer sufficient for synchronous replication and that the connection is too slow for the current requirements, it automatically switches to asynchronous mode. If the required quality of the transmission is reinstated, the software switches back to synchronous transmission.
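The switching rule can be sketched as a small decision function. Everything specific here is an assumption: the article does not say how transmission quality is measured, so the sketch uses round-trip time in milliseconds, and the two thresholds (with a gap between them so the mode does not flip back and forth on small fluctuations) are illustrative values.

```python
def choose_mode(current_mode, round_trip_ms, max_sync_ms=10.0, resume_sync_ms=5.0):
    """Return 'sync' or 'async' for the next interval.

    max_sync_ms and resume_sync_ms are assumed thresholds, not product defaults.
    """
    if current_mode == "sync" and round_trip_ms > max_sync_ms:
        return "async"  # link too slow for synchronous writes
    if current_mode == "async" and round_trip_ms <= resume_sync_ms:
        return "sync"   # quality restored, resume synchronous mode
    return current_mode


mode = "sync"
history = []
for rtt in [3.0, 12.0, 8.0, 4.0]:  # simulated measurements
    mode = choose_mode(mode, rtt)
    history.append(mode)
```

With these simulated measurements the mode stays asynchronous at 8 ms, because the link has not yet recovered to the stricter resume threshold, and returns to synchronous at 4 ms.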

Local and global clustering

Automatic, application-specific monitoring and protection against downtime improve the availability of business processes. Local clustering and the virtualization of business-critical services help you consolidate servers, use existing resources more efficiently, and prevent downtime. Local clustering decouples the components required for SAP applications (such as drives, file systems, IP addresses, and NFS shares) from a dedicated computer. If one server fails, a cluster solution ensures that another server fills the gap so that all processes continue to run. If the primary server fails, the standby server continues to process the replicated data with minimal loss of time.
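The decoupling idea can be illustrated with a toy model in which the resources belong to a service group rather than to a machine, so the group as a whole can move to another node. The fail_over function, the dictionary layout, and the node names are hypothetical and do not reflect any particular cluster product.

```python
def fail_over(service_group, node_health):
    """Move a service group to the first healthy standby node.

    service_group: dict with 'resources', 'host', and 'standby' keys.
    node_health: dict mapping node name to True (up) or False (down).
    """
    if node_health.get(service_group["host"], False):
        return service_group["host"]  # current host is healthy, nothing to do
    for candidate in service_group["standby"]:
        if node_health.get(candidate, False):
            service_group["host"] = candidate  # resources follow the group
            return candidate
    raise RuntimeError("no healthy node available for the service group")


group = {
    "resources": ["virtual-ip", "filesystem", "nfs-share"],
    "host": "node-a",
    "standby": ["node-b", "node-c"],
}
new_host = fail_over(group, {"node-a": False, "node-b": True, "node-c": True})
```

Because clients address the virtual IP and the shared file system rather than a specific machine, they do not need to know which node currently hosts the group.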
Global (or WAN) clustering offers the best performance. This approach connects clusters that run SAP applications at various locations. State-of-the-art clustering solutions monitor each of the SAP applications and the dependencies among them. This monitoring guarantees that, in the event of a failure, the services of an application or the entire application are started in the correct sequence on another system in the same cluster. If the entire cluster fails, another cluster takes over. Once the application is online again, the clients are automatically redirected to the instance that was started on the other server. This approach not only ensures the availability of SAP applications in the event of a failure, it also simplifies management because you can control the entire application centrally. If you couple replication with WAN clusters, your company can switch to a different, remote data center should the entire primary data center fail. After the cluster software updates the DNS server, user access is automatically redirected to the new location.
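Starting services "in the correct sequence" amounts to ordering them by their dependencies. The following sketch uses the graphlib module from the Python standard library (Python 3.9 or later). The dependency map is invented for illustration; the actual dependencies among SAP services are not stated here.

```python
from graphlib import TopologicalSorter


def start_order(dependencies):
    """Return a start sequence in which every service follows its prerequisites.

    dependencies maps each service to the set of services it needs first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency map: each key needs the services in its set.
deps = {
    "enqueue": {"nfs"},
    "messaging": {"nfs"},
    "application": {"enqueue", "messaging"},
}
order = start_order(deps)
```

In this example the file system service comes first and the application last; enqueue and messaging may appear in either order because they do not depend on each other.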

Planning continuity correctly

Application performance management with an SAP certificate

Technical solutions like clustering and replication automate critical processes, reduce administrative costs, and prevent failures. Ideally, they are complemented by a business continuity plan that contains best practices for dealing with risks, the implementation of a disaster recovery plan, and an evaluation of technologies and measures to protect business-critical services. This kind of plan provides templates for a structured approach to architecture, processes, and procedures so that you can define and meet recovery times and objectives. To develop the strategy and the plan as comprehensively and effectively as possible, you can engage the support of trained consultants. The consultants set benchmarks, identify weaknesses, and suggest solutions. This approach best meets the requirements of auditors, stockholders, and management for the continual availability of systems.
To protect against system failures, an alternate location to guarantee redundancy and disaster recovery can make sense. Nevertheless, many companies shy away from the cost. Traditional recovery solutions for SAP software usually require setting up the identical, expensive hardware at every location. Redirecting an application to an alternate location requires moving backup tapes and personnel to the new location, setting up servers, loading operating systems, and creating current backups of the software. If operations take place at the alternate location, companies must start complicated and potentially error-prone processes to run the SAP components that depend on each other.
Your company can save storage costs if the secondary drive does not have to match the primary drive. You can also save considerable sums if you use alternate locations primarily for other, less-important business processes. You can turn off or scale back those processes if the main data center fails and the entire infrastructure is needed to secure truly critical systems. If you are not bound to one environment, it is worth comparing the prices of various hardware and software suppliers. This approach can also significantly reduce the total cost of ownership for an SAP environment.

Manuel Braun