The Perfect Plant Needs High Availability

January 23, 2009 by Ulrich Lenz, Senior Consultant, Stratus Technologies

As convincing as the concept of integrating production and commercial IT worlds may be, it also has its Achilles heel. The longer the process chain becomes, the greater the effect of any disruption. In the perfect plant, malfunctions in the ERP system can even halt production altogether.

Fewer but greater losses

Developments in hardware technology have created systems that are considerably more stable than their predecessors. Thanks to reliable operating systems, server breakdowns have become a rare exception. But when systems do break down, losses are greater today because almost all companies depend on the availability of their IT environment.

It has become virtually impossible to fall back on manual procedures. Even short server outages disrupt business. The risk of a server breakdown may be lower, but when one does happen, the damage is more serious.

In manufacturing, even very short disruptions can have an impact. Manufacturing execution systems (MES) can capture business and machine data down to the millisecond, and communication sometimes reaches all the way to the sensors and counters inside the machines. Conventional MES can handle this granularity. Servers built on standard PC technology, however, still have to meet the same demands if they are to be tied into these processes as part of integrated manufacturing management.

In Practice – An Example

REXAM BCNA, the world’s biggest maker of beverage cans, manages production using SAP ERP enhanced with solutions that include quality management and shop floor execution.

Their SAP ERP system runs centrally in Chicago and is connected to 17 plants over a wide area network (WAN). SAP MII has been implemented at each plant to provide local survivability: MES functionality that keeps working during a WAN outage. However, the local servers running SAP MII were identified as a single point of failure. If they suffer an outage, REXAM BCNA may have to revert to manual systems, which are costly and inefficient. Additional costs are also incurred for entering those transactions manually once the systems come back online.

REXAM BCNA solved the problem with a Stratus system, which provides a fault-tolerant server platform that non-technical plant staff can easily maintain.

Too much downtime in cluster systems

Many companies safeguard themselves against hardware breakdown using cluster systems. Clusters of computers can reduce server downtimes, but they cannot rule them out completely. If a breakdown occurs, shifting the processes within the cluster takes time, which means that genuine continuous operation is not possible.

Even if a cluster resumes processing after the breakdown, any data held in the failed node's memory is lost. As a result, clusters cannot guarantee full data integrity, and time-critical processes can suffer.

Clusters can increase average availability to 99.95 percent. With nonstop operation 24 hours a day, 365 days a year, that still leaves a statistical downtime of about 4.4 hours per year, which is too long for a perfect plant.
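The downtime figure follows directly from the availability percentage. The short Python sketch below reproduces the arithmetic, assuming nonstop operation over a 365-day year (8,760 hours); the function name is hypothetical and chosen only for illustration.

```python
# Assumption: nonstop operation, 365-day year (leap years ignored).
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the expected unplanned downtime per year, in hours."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * HOURS_PER_YEAR


if __name__ == "__main__":
    hours = annual_downtime_hours(99.95)
    print(f"99.95% availability -> {hours:.2f} hours of downtime per year")
```

At 99.95 percent, 0.05 percent of 8,760 hours works out to 4.38 hours, slightly more than four hours.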

Furthermore, clusters are expensive to implement and operate because they require specially trained employees. In addition, application software needs to be adapted for cluster use.

Fault-tolerant servers minimize downtime

Even so, companies that want to ensure continuous operation of their production systems do not have to revert to proprietary customized systems or mainframe solutions. Fault-tolerant servers can provide high availability using industry standards such as Intel hardware and the Windows or Linux operating systems.

The ftServers from Stratus provide an example. They use a fundamentally different technology from clusters: rather than connecting redundant servers, each ftServer is itself fully redundant. All mission-critical components are duplicated, including processors, memory chips, and I/O units, and the paired components process the same instructions at the same time. If one component malfunctions, its partner serves as an active spare and continues normal operation, with no switchover required.
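The difference from a cluster can be illustrated with a toy model. The Python sketch below is a simplified simulation, not Stratus's actual implementation: the class names, the list-based "state", and the fail() method are all hypothetical. Both halves of a redundant pair apply every instruction in the same step, so when one half fails, the other already holds the current state and simply keeps going.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: an "instruction" mutates a component's in-memory state.
Instruction = Callable[[List[int]], None]


@dataclass
class Component:
    """One half of a redundant pair (e.g. a CPU/memory module). Hypothetical."""
    name: str
    healthy: bool = True
    state: List[int] = field(default_factory=list)

    def execute(self, instruction: Instruction) -> None:
        # A failed component stops processing; a healthy one applies the step.
        if self.healthy:
            instruction(self.state)


class RedundantPair:
    """Toy model of duplicated components running the same instructions."""

    def __init__(self) -> None:
        self.a = Component("A")
        self.b = Component("B")

    def run(self, instruction: Instruction) -> None:
        # The same instruction is applied to every healthy partner in one step.
        for component in (self.a, self.b):
            component.execute(instruction)

    def fail(self, name: str) -> None:
        # No failover or state transfer: the partner is already up to date.
        component = self.a if self.a.name == name else self.b
        component.healthy = False

    def state(self) -> List[int]:
        # Any healthy partner can serve the current state.
        for component in (self.a, self.b):
            if component.healthy:
                return component.state
        raise RuntimeError("both components have failed")


if __name__ == "__main__":
    pair = RedundantPair()
    for i in range(3):
        pair.run(lambda s, i=i: s.append(i))
    pair.fail("A")                      # simulate a component malfunction
    pair.run(lambda s: s.append(99))    # processing continues on B
    print(pair.state())
```

In this model nothing held in memory is lost when A fails, because B has executed every step too. A cluster standby node, by contrast, would have to start from whatever state was saved outside the failed node's memory.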

The applications continue to run without data loss, and the administrator does not have to intervene. Maintenance activities can take place during live operation.

Integrating two IT worlds – with no added risk

Using this technology, ftServers achieve uptime of more than 99.999 percent (“five nines”), which corresponds to an average of roughly five minutes of unplanned downtime per year.
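To put the two availability levels side by side, the sketch below converts each percentage into minutes of downtime per year and compares them. It assumes the same nonstop, 365-day year as above; the function name is hypothetical.

```python
# Assumption: nonstop operation, 365-day year (leap years ignored).
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the expected unplanned downtime per year, in minutes."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


cluster = annual_downtime_minutes(99.95)      # about 262.8 minutes (~4.4 hours)
five_nines = annual_downtime_minutes(99.999)  # about 5.3 minutes

print(f"Cluster (99.95%):     {cluster:.1f} min/year")
print(f"Five nines (99.999%): {five_nines:.1f} min/year")
print(f"Five nines means roughly {cluster / five_nines:.0f}x less downtime")
```

Because the unavailable fraction drops from 0.05 percent to 0.001 percent, expected downtime shrinks by a factor of about 50.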

Because an ftServer outwardly behaves like a single system, existing Windows applications can be used without being modified. And when the reduced expenses for implementation and administration are taken into account, fault-tolerant servers offer a tangible financial advantage over cluster systems.

With their industry-leading uptime, fault-tolerant systems are the perfect enhancement to the perfect plant. They safeguard today’s highly sophisticated hardware processes and ensure that companies do not expose themselves to incalculable risks when integrating their IT worlds.

