
Cloud Platform Enterprise availability and disaster recovery

Cloud Platform Enterprise is designed for high availability, with guaranteed 99.95% uptime. This page describes how Acquia delivers high availability for Cloud Platform Enterprise.
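
To put the 99.95% figure in perspective, the short calculation below converts an uptime percentage into a downtime budget. It is arithmetic only: the function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration, and the measurement period that actually applies to a subscription is not stated on this page.

```python
# Downtime budget implied by an uptime percentage (illustrative arithmetic only).
# The period lengths below are assumptions, not terms taken from any agreement.

def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Return the maximum downtime, in minutes, that fits within the period."""
    return (1 - uptime_percent / 100) * period_hours * 60


HOURS_PER_YEAR = 365 * 24          # assumed 365-day year
HOURS_PER_30_DAY_MONTH = 30 * 24   # assumed 30-day month

yearly = allowed_downtime_minutes(99.95, HOURS_PER_YEAR)
monthly = allowed_downtime_minutes(99.95, HOURS_PER_30_DAY_MONTH)

print(f"per year: {yearly:.1f} minutes (about {yearly / 60:.2f} hours)")
print(f"per 30-day month: {monthly:.1f} minutes")
```

Under these assumptions, 99.95% uptime leaves roughly 4.38 hours of downtime per year, or about 21.6 minutes per 30-day month.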

High availability architecture

Cloud Platform is built on Amazon Web Services (AWS) infrastructure, which is physically remote from the Acquia offices. Cloud Platform subscribers may choose the geographic region for their application’s location.

Each region has several availability zones. AWS Availability Zones are separate yet interconnected data centers within the major regions. Cloud Platform Enterprise offers high availability by using several AWS Availability Zones in one AWS region with redundant servers serving each layer of the technology stack. The following are the three main components of a Drupal application hosted by Cloud Platform Enterprise:

  • Reverse proxy caching and load balancing servers (Nginx and Varnish®)
  • Web servers (Apache with PHP and Drupal code)
  • Database servers (Percona (MySQL))

At the internet-facing tier, a software-based load balancer is deployed with a hot standby in a different availability zone in the same region. The load balancer distributes load across several web servers, which are also distributed across several availability zones. Acquia’s expert operations team adds more web servers to the resource pool as needed. The load balancer continuously monitors the web servers, and if a web server becomes unavailable, removes it from the pool of hosts serving the application. Web servers use a shared network file system so all files maintain synchronization and redundancy.
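
The monitoring behavior described above can be pictured with a small model. This is a minimal sketch, not the load balancer's actual implementation: the class, the round-robin selection, the server names, and the health probe are all hypothetical, and returning a recovered server to rotation on the next check is an assumption that this page does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WebServerPool:
    """Toy model of a balancer's pool of web servers (all names hypothetical)."""

    check: Callable[[str], bool]                 # health probe: True if the server responds
    servers: List[str] = field(default_factory=list)
    in_rotation: Dict[str, bool] = field(default_factory=dict)
    _next: int = 0                               # round-robin counter (assumed strategy)

    def __post_init__(self) -> None:
        # Every configured server starts in rotation.
        for server in self.servers:
            self.in_rotation[server] = True

    def run_health_checks(self) -> None:
        """Probe every server and take unavailable ones out of rotation."""
        for server in self.servers:
            self.in_rotation[server] = self.check(server)

    def pick(self) -> str:
        """Return the next healthy server, skipping any that failed the last check."""
        healthy = [s for s in self.servers if self.in_rotation[s]]
        if not healthy:
            raise RuntimeError("no healthy web servers in the pool")
        server = healthy[self._next % len(healthy)]
        self._next += 1
        return server


# Usage: "web-2" stops responding, so requests only reach the other two servers.
down = {"web-2"}
pool = WebServerPool(check=lambda s: s not in down,
                     servers=["web-1", "web-2", "web-3"])
pool.run_health_checks()
picks = [pool.pick() for _ in range(4)]
print(picks)
```

Because the pool is re-evaluated on every check cycle, adding web servers to the resource pool only requires extending the server list.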

At the database layer, a scalable database cluster serves the application with active and passive database servers in several availability zones. The active database server continuously updates the passive database server using MySQL replication. If the active database fails, the passive database becomes the active database through a Domain Name System (DNS)-based failover.

Acquia’s policy is to restore subscriber services as quickly as possible in the event of a major disaster. If services in the current availability zone or region are severely impacted, Acquia makes its best effort to restore them in an alternate availability zone or region.

Disaster recovery - multiregion replication

Optionally, for subscribers with high availability requirements, Acquia offers Cloud Platform Enterprise subscriber environments with hot standby applications in an alternate region, providing live failover capabilities for disaster recovery.

Balancer failovers

When a load balancer is unreachable or unresponsive, Acquia will perform a load balancer failover. How the failover works depends on whether your application uses an elastic load balancer (ELB). To determine whether it does, see Managing Cloud Platform servers for information about your application’s servers, or see Configuring DNS records for your application.

For subscribers without an elastic load balancer (ELB), Acquia will reassign the elastic IP address (EIP) of the load balancer pair to the load balancer in the pair that remains responsive. During the reassignment process, Acquia ensures the websites served by this load balancer pair remain accessible while repairing the unresponsive load balancer. After the repair, the EIP won’t be reassigned (“failed back”) to the original load balancer in the pair.
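
The key point in the paragraph above is that the EIP moves once and then stays where it is. The following sketch models only that rule for a single failure; the class, method names, balancer names, and the example address (from a documentation-only IP range) are hypothetical, and no real AWS API calls are made.

```python
from typing import Dict, Tuple


class BalancerPair:
    """Toy model of a load balancer pair sharing one elastic IP (hypothetical)."""

    def __init__(self, eip: str, balancers: Tuple[str, str]) -> None:
        self.eip = eip
        self.balancers = balancers
        self.healthy: Dict[str, bool] = {name: True for name in balancers}
        self.eip_holder = balancers[0]   # the first balancer initially holds the EIP

    def _peer(self, name: str) -> str:
        first, second = self.balancers
        return second if name == first else first

    def mark_unresponsive(self, name: str) -> None:
        """Record a failure; move the EIP only if the failed balancer held it."""
        self.healthy[name] = False
        peer = self._peer(name)
        if name == self.eip_holder and self.healthy[peer]:
            self.eip_holder = peer

    def mark_repaired(self, name: str) -> None:
        """Record a repair. The EIP is deliberately NOT failed back."""
        self.healthy[name] = True


# Usage: bal-1 fails, the EIP moves to bal-2, and it stays there after the repair.
pair = BalancerPair("203.0.113.10", ("bal-1", "bal-2"))
pair.mark_unresponsive("bal-1")
holder_during_repair = pair.eip_holder
pair.mark_repaired("bal-1")
print(holder_during_repair, pair.eip_holder)
```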

For subscribers with an elastic load balancer (ELB), the ELB will stop routing traffic to a load balancer if the ELB detects a load balancer is unhealthy or unavailable.

Database failovers

The active database in a database pair is marked with a DNS pointer. If the DNS server detects the active database isn’t responding, the following steps will occur:

  1. The DNS server attempts to mark the passive database as the active database.

  2. Any queries requiring changes to the database are handled by the still functioning database.

  3. Acquia will repair the unresponsive database.

    Important

    While Acquia repairs the unresponsive database, your subscription won’t be highly available.

  4. The repaired database will be re-synchronized with the currently active database using MySQL binary logs (binlogs).

  5. The DNS pointer will be reassigned, or failed back, to the original active database.

Once both databases are responsive, data is synchronized between them, and the DNS pointer has been failed back, high availability is restored.
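
The sequence above can be summarized as a small state model. This is a sketch under stated assumptions rather than a description of Acquia's tooling: the class, its method names, and the database names are hypothetical, and the repair and binlog re-synchronization are reduced to flag changes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatabasePair:
    """Toy model of an active/passive database pair behind a DNS pointer."""

    original_active: str
    passive: str
    dns_pointer: str = ""
    responsive: Dict[str, bool] = field(default_factory=dict)
    in_sync: bool = True

    def __post_init__(self) -> None:
        self.dns_pointer = self.original_active
        self.responsive = {self.original_active: True, self.passive: True}

    @property
    def highly_available(self) -> bool:
        # All three conditions from the closing paragraph must hold.
        return (all(self.responsive.values())
                and self.in_sync
                and self.dns_pointer == self.original_active)

    def fail_over(self) -> None:
        """Steps 1-2: point DNS at the passive database, which now takes writes."""
        self.responsive[self.original_active] = False
        self.dns_pointer = self.passive
        self.in_sync = False

    def repair_and_resync(self) -> None:
        """Steps 3-4: repair the failed database, then replay changes to it."""
        self.responsive[self.original_active] = True
        self.in_sync = True

    def fail_back(self) -> None:
        """Step 5: return the DNS pointer to the original active database."""
        if not (self.responsive[self.original_active] and self.in_sync):
            raise RuntimeError("cannot fail back before repair and re-sync")
        self.dns_pointer = self.original_active


# Usage: walk through the five steps and watch the availability flag.
db = DatabasePair("db-1", "db-2")
db.fail_over()
ha_during_repair = db.highly_available      # False: only one database is serving
db.repair_and_resync()
ha_before_failback = db.highly_available    # False: the pointer has not moved back yet
db.fail_back()
print(ha_during_repair, ha_before_failback, db.highly_available)
```

The model makes the "Important" note explicit: from the failover until the fail back completes, the pair does not count as highly available.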