
Acquia Cloud Enterprise availability and disaster recovery

Acquia Cloud Enterprise is designed for high availability, with guaranteed 99.95% uptime. This page describes how Acquia delivers high availability for Acquia Cloud Enterprise.
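To put the 99.95% figure in perspective, the short calculation below converts an uptime percentage into a downtime budget. It is only an arithmetic sketch: the function name and the measurement periods (a 30-day month and a 365-day year) are assumptions for illustration, and this page does not state how Acquia measures uptime or which events count against it.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, that still meets the uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Assumed measurement periods; the real measurement window is not given on this page.
for label, days in (("30-day month", 30), ("365-day year", 365)):
    print(f"{label}: {allowed_downtime_minutes(99.95, days):.1f} minutes")
# 30-day month: 21.6 minutes
# 365-day year: 262.8 minutes
```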

High availability architecture

Acquia Cloud is built on Amazon Web Services (AWS) infrastructure, which is physically remote from the Acquia offices. Acquia Cloud customers may choose the geographic region for their application’s location.

Each region contains multiple Availability Zones. AWS Availability Zones are separate yet interconnected data centers within the major regions. Acquia Cloud Enterprise offers high availability by using multiple AWS Availability Zones in one AWS region with redundant servers serving each layer of the technology stack. The following are the three main components of a Drupal application hosted by Acquia Cloud Enterprise:

  • Reverse proxy caching and load balancing servers (Nginx and Varnish®)
  • Web servers (Apache with PHP and Drupal code)
  • Database servers (Percona (MySQL))

At the Internet-facing tier, a software-based load balancer is deployed with a hot standby in a different Availability Zone in the same region. The load balancer distributes load across multiple web servers, which are also distributed across multiple Availability Zones. Acquia’s expert operations team adds web servers to the resource pool as needed. The load balancer continuously monitors the web servers and, if a server becomes unavailable, removes that server from the pool of hosts serving the application. Web servers use a shared network file system (GlusterFS) so that files stay in sync and are stored redundantly across servers.
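The pool behavior described above can be sketched as a small model. This is an illustration only, not Acquia's load balancer configuration: the class name, the round-robin selection, the health-check callback, and the server names (web-a, web-b, web-c) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WebServerPool:
    """Toy model of a load balancer that skips unavailable web servers."""
    servers: list[str]
    is_healthy: Callable[[str], bool]  # stand-in for the balancer's monitoring
    _next: int = 0

    def active(self) -> list[str]:
        # Only servers that pass the health check stay in the serving pool.
        return [s for s in self.servers if self.is_healthy(s)]

    def pick(self) -> str:
        # Round-robin over healthy servers (the real algorithm is not documented here).
        healthy = self.active()
        if not healthy:
            raise RuntimeError("no healthy web servers in the pool")
        server = healthy[self._next % len(healthy)]
        self._next += 1
        return server


down = {"web-b"}  # hypothetical server that stopped responding
pool = WebServerPool(["web-a", "web-b", "web-c"], lambda s: s not in down)
picks = [pool.pick() for _ in range(4)]
print(picks)  # ['web-a', 'web-c', 'web-a', 'web-c']
```

When the health check starts passing again for web-b, the next call to pick() sees it in the pool, so no separate step is needed to add it back in this model.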

At the database layer, a scalable database cluster serves the application with active and passive database servers in multiple Availability Zones. The active master database server continuously updates the passive master database using MySQL replication. If the active database fails, the passive database becomes the active database through a Domain Name System (DNS)-based failover.

In the event of a major disaster, it is Acquia’s policy to restore customer services as quickly as possible. If services in the current Availability Zone or region are severely impacted, Acquia makes its best effort to restore them in an alternate Availability Zone or region.

Disaster recovery - multiregion replication

Optionally, for customers with very high availability requirements, Acquia offers Acquia Cloud Enterprise customer environments with hot standby applications in an alternate region, thus providing live failover capabilities for disaster recovery.

Balancer failovers

When a load balancer is unreachable or unresponsive, Acquia will perform a load balancer failover. How the failover works depends on whether your application uses an Elastic Load Balancer (ELB). To determine whether it does, see the Managing Acquia Cloud servers documentation page for information about your application’s servers, or the Pointing DNS records to your public IP addresses documentation page.

For subscribers without an elastic load balancer (ELB), Acquia will reassign the elastic IP address (EIP) of the load balancer pair to the load balancer in the pair that remains responsive. During the reassignment process, Acquia ensures the websites served by this load balancer pair remain accessible while the unresponsive load balancer is repaired. After the repair, the EIP will not be reassigned (“failed back”) to the original load balancer in the pair.

For subscribers with an elastic load balancer (ELB), the ELB will stop routing traffic to a load balancer if the ELB detects that the load balancer is unhealthy or unavailable.
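For the case without an ELB, the EIP reassignment can be modeled as below. This is a simplified sketch under stated assumptions, not Acquia's tooling: the class, the method, and the balancer names (bal-1, bal-2) are hypothetical, and the set of responsive balancers stands in for whatever monitoring triggers the failover. The point it illustrates is that the EIP stays with the surviving balancer after the repair.

```python
class BalancerPair:
    """Toy model of a load balancer pair that shares one elastic IP address."""

    def __init__(self, primary: str, standby: str) -> None:
        self.balancers = (primary, standby)
        self.eip_holder = primary  # balancer currently holding the EIP

    def check(self, responsive: set[str]) -> str:
        # Reassign the EIP only when its current holder is unresponsive.
        if self.eip_holder not in responsive:
            for balancer in self.balancers:
                if balancer in responsive:
                    self.eip_holder = balancer
                    break
        # No fail back: a repaired balancer does not reclaim the EIP.
        return self.eip_holder


pair = BalancerPair("bal-1", "bal-2")
print(pair.check({"bal-2"}))           # bal-1 is down, so the EIP moves: bal-2
print(pair.check({"bal-1", "bal-2"}))  # bal-1 is repaired, EIP stays put: bal-2
```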

Database failovers

The active database in a database pair is marked with a DNS pointer. If the DNS server detects that the active database is not responding, the following steps will occur:

  1. The DNS server will attempt to mark the passive database as the active database.

  2. Any queries that require changes to the database will be handled by the database that is still functioning.

  3. Acquia will repair the unresponsive database.

    Important

    While Acquia repairs the unresponsive database, your subscription will not be highly available.

  4. The repaired database will be re-synchronized with the currently active database using MySQL binlogs.

  5. The DNS pointer will be reassigned, or failed back, to the original active database.

High availability is restored once both databases are responsive, data is synchronized between them, and the DNS pointer has been failed back.
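The failover steps above can be traced with a small state model. It is a sketch for following the sequence, not a description of Acquia's implementation: the class, its methods, and the database names (db-1, db-2) are hypothetical, and the in_sync flag stands in for the binlog-based re-synchronization.

```python
class DatabasePair:
    """Toy model of an active/passive database pair behind a DNS pointer."""

    def __init__(self, original_active: str, passive: str) -> None:
        self.original_active = original_active
        self.passive = passive
        self.dns_pointer = original_active  # the active database
        self.in_sync = True

    def fail_over(self) -> None:
        # Steps 1-2: the pointer moves, and the surviving database takes writes.
        self.dns_pointer = self.passive
        self.in_sync = False  # the failed database misses changes from now on

    def resync_and_fail_back(self) -> None:
        # Steps 4-5: re-synchronize the repaired database, then move the pointer back.
        self.in_sync = True
        self.dns_pointer = self.original_active

    def highly_available(self, responsive: set[str]) -> bool:
        # All three conditions from the closing paragraph must hold.
        both_up = {self.original_active, self.passive} <= responsive
        return both_up and self.in_sync and self.dns_pointer == self.original_active


pair = DatabasePair("db-1", "db-2")
pair.fail_over()
print(pair.dns_pointer, pair.highly_available({"db-2"}))          # db-2 False
pair.resync_and_fail_back()                                       # after the repair (step 3)
print(pair.dns_pointer, pair.highly_available({"db-1", "db-2"}))  # db-1 True
```

Unlike the load balancer EIP, which is not failed back, the database DNS pointer returns to the original active database at the end of the sequence.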