
Acquia Cloud Enterprise availability and disaster recovery

Acquia Cloud Enterprise is designed for high availability, with guaranteed 99.95% uptime. This page describes how Acquia delivers high availability for Acquia Cloud Enterprise.
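To put the 99.95% figure in concrete terms, the short calculation below converts an uptime percentage into a downtime budget. It is an illustrative sketch only: the function name and the 30-day and 365-day measurement periods are assumptions made for this example, not the terms of Acquia's service agreement.

```python
def downtime_budget_minutes(uptime_percent: float, period_hours: float) -> float:
    """Return the downtime, in minutes, that an uptime percentage allows over a period."""
    return (1 - uptime_percent / 100) * period_hours * 60


# Assumed measurement periods (not stated in the source): 30-day month, 365-day year.
monthly_minutes = downtime_budget_minutes(99.95, 30 * 24)
yearly_minutes = downtime_budget_minutes(99.95, 365 * 24)

print(f"{monthly_minutes:.1f} minutes per 30-day month")
print(f"{yearly_minutes / 60:.2f} hours per 365-day year")
```

Under these assumed periods, 99.95% uptime corresponds to roughly 21.6 minutes of downtime per month, or about 4.38 hours per year.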

High availability architecture

Acquia Cloud is built on Amazon Web Services (AWS) infrastructure, which is physically remote from the Acquia offices. Acquia Cloud subscribers may choose the geographic region for their application’s location.

Each region contains several availability zones. AWS Availability Zones are separate yet interconnected data centers within the major regions. Acquia Cloud Enterprise offers high availability by using several AWS Availability Zones in one AWS region with redundant servers serving each layer of the technology stack. The following are the three main components of a Drupal application hosted by Acquia Cloud Enterprise:

  • Reverse proxy caching and load balancing servers (Nginx and Varnish®)
  • Web servers (Apache with PHP and Drupal code)
  • Database servers (Percona (MySQL))

At the internet-facing tier, a software-based load balancer is deployed with a hot standby in a different availability zone in the same region. The load balancer distributes load across several web servers, which are also distributed across several availability zones. Acquia’s operations team adds more web servers to the resource pool as needed. The load balancer continuously monitors the web servers, and if a web server becomes unavailable, the load balancer removes that server from the pool of hosts serving the application. Web servers use a shared network file system (GlusterFS) so that files stay synchronized and redundant across servers.
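The health-check behavior described above can be sketched as follows. This is a simplified model, not Acquia's load balancer configuration: the `BackendPool` class, the server names, and the round-robin selection are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Set


class BackendPool:
    """Hypothetical pool of web servers sitting behind a load balancer."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.healthy: Set[str] = set(servers)

    def run_health_checks(self, is_healthy: Callable[[str], bool]) -> None:
        """Probe every server and keep only the responsive ones in the pool."""
        self.healthy = {s for s in self.servers if is_healthy(s)}

    def pick(self, request_number: int) -> str:
        """Choose a server for a request, round-robin over healthy servers only."""
        if not self.healthy:
            raise RuntimeError("no healthy web servers available")
        ordered = sorted(self.healthy)
        return ordered[request_number % len(ordered)]


# Example: web-2 stops responding, so it no longer receives traffic.
pool = BackendPool(["web-1", "web-2", "web-3"])
unavailable = {"web-2"}
pool.run_health_checks(lambda server: server not in unavailable)
routed = [pool.pick(i) for i in range(4)]
print(routed)
```

Because the probe runs against the full server list each time, a server that recovers rejoins the pool on the next health check without any extra bookkeeping.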

At the database layer, a scalable database cluster serves the application with active and passive database servers in several availability zones. The active master database server continuously updates the passive master database using MySQL replication. If the active database fails, the passive database becomes the primary through a Domain Name System (DNS)-based failover.

Acquia’s policy is to restore subscriber services as quickly as possible in the event of a major disaster. If services in the current availability zone or region are severely impacted, Acquia will make its best effort to restore them in an alternate availability zone or region.

Disaster recovery - multiregion replication

Optionally, for subscribers with high availability requirements, Acquia offers Acquia Cloud Enterprise subscriber environments with hot standby applications in an alternate region, providing live failover capabilities for disaster recovery.

Balancer failovers

When a load balancer is unreachable or unresponsive, Acquia will perform a load balancer failover. How the failover works depends on whether your application uses an Elastic Load Balancer (ELB). To determine whether your application uses an ELB, see Managing Acquia Cloud servers for information about your application’s servers, or see Pointing DNS records to your public IP addresses.

For subscribers without an elastic load balancer (ELB), Acquia will reassign the elastic IP address (EIP) of the load balancer pair to the load balancer in the pair that remains responsive. During the reassignment process, Acquia ensures the websites served by this load balancer pair remain accessible while the unresponsive load balancer is repaired. After the repair, the EIP will not be reassigned (“failed back”) to the original load balancer in the pair.

For subscribers with an elastic load balancer (ELB), the ELB will stop routing traffic to a load balancer if the ELB detects that the load balancer is unhealthy or unavailable.
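For the case without an ELB, the EIP reassignment described above can be modeled with a small sketch. The `BalancerPair` class and the balancer names are hypothetical and do not call any AWS or Acquia API. The point illustrated is that the EIP moves to the surviving balancer and stays there after the repair.

```python
from typing import Set, Tuple


class BalancerPair:
    """Hypothetical model of two load balancers sharing one elastic IP (EIP)."""

    def __init__(self, balancers: Tuple[str, str]) -> None:
        self.balancers = balancers
        self.responsive: Set[str] = set(balancers)
        self.eip_holder = balancers[0]  # assumption: the first balancer starts with the EIP

    def mark_unresponsive(self, name: str) -> None:
        """Record a failure; move the EIP if the failed balancer was holding it."""
        self.responsive.discard(name)
        if name == self.eip_holder:
            survivors = [b for b in self.balancers if b in self.responsive]
            if not survivors:
                raise RuntimeError("both load balancers are unresponsive")
            self.eip_holder = survivors[0]

    def mark_repaired(self, name: str) -> None:
        """Record a repair; the EIP is deliberately not failed back."""
        self.responsive.add(name)


pair = BalancerPair(("bal-a", "bal-b"))
pair.mark_unresponsive("bal-a")
holder_after_failover = pair.eip_holder
pair.mark_repaired("bal-a")
holder_after_repair = pair.eip_holder
print(holder_after_failover, holder_after_repair)
```

In this model the repaired balancer simply becomes the new standby, which matches the statement that the EIP is not failed back.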

Database failovers

The active database in a database pair is marked with a DNS pointer. If the DNS server detects that the active database isn’t responding, the following steps will occur:

  1. The DNS server will attempt to mark the passive database as the active database.

  2. Any queries that require changes to the database will be handled by the database that’s still functioning.

  3. Acquia will repair the unresponsive database.

    Important

    While Acquia repairs the unresponsive database, your subscription will not be highly available.

  4. The repaired database will be re-synchronized with the currently active database using MySQL binary logs (binlogs).

  5. The DNS pointer will be reassigned, or failed back, to the original active database.

High availability is restored once both databases are responsive, data is synchronized between them, and the DNS pointer has been failed back.
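The failover sequence above can be traced with a minimal state model. This is a sketch under stated assumptions: the `DatabasePair` class, its method names, and the database names are hypothetical, and the resynchronization step is reduced to a flag rather than an actual binlog replay.

```python
class DatabasePair:
    """Hypothetical model of an active/passive database pair behind a DNS pointer."""

    def __init__(self, original_active: str, passive: str) -> None:
        self.original_active = original_active
        self.passive = passive
        self.dns_pointer = original_active
        self.responsive = {original_active, passive}
        self.in_sync = True

    @property
    def highly_available(self) -> bool:
        """True only when both databases respond, are in sync, and the pointer is failed back."""
        return (
            len(self.responsive) == 2
            and self.in_sync
            and self.dns_pointer == self.original_active
        )

    def active_fails(self) -> None:
        """Steps 1 and 2: the pointer moves to the passive database, which now takes writes."""
        self.responsive.discard(self.original_active)
        self.dns_pointer = self.passive
        self.in_sync = False  # the failed database misses every write from now on

    def repair_and_resync(self) -> None:
        """Steps 3 and 4: the failed database is repaired and then caught up."""
        self.responsive.add(self.original_active)
        self.in_sync = True

    def fail_back(self) -> None:
        """Step 5: the pointer returns to the original active database."""
        if not self.in_sync:
            raise RuntimeError("cannot fail back before resynchronization")
        self.dns_pointer = self.original_active


pair = DatabasePair("db-1", "db-2")
pair.active_fails()
ha_during_outage = pair.highly_available
pair.repair_and_resync()
ha_before_failback = pair.highly_available
pair.fail_back()
ha_after_failback = pair.highly_available
print(ha_during_outage, ha_before_failback, ha_after_failback)
```

The model reports high availability only after the final step, mirroring the three conditions listed above: both databases responsive, data synchronized, and the DNS pointer failed back.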