Through multi-region failover, Acquia provides Continuity-as-a-Service using a hot cloud recovery model. With multi-region failover, your Production application has a cloned version of its full stack in a secondary failover region. In the event of a failure or substantial impairment in your primary region, you can switch your application immediately to the clone in the secondary region. Multi-region failover is available for Cloud Platform Enterprise applications as an add-on service at an extra cost.
To use multi-region failover, you must also use a CDN service, such as Edge. This is important to avoid an interruption in service; the CDN can continue to serve cached content while your application is switching over to the secondary region.
How it works
When you choose multi-region failover for your Cloud Platform Enterprise application, Acquia duplicates your Production environment in a different region from your primary region. For example, if your application is hosted in the US-East region, your secondary application might be created in the US-West region. The secondary infrastructure cluster is configured to receive the same code deployments as the primary cluster, so that it is always running the same code. In addition, multi-region failover uses database replication to keep the database infrastructure in the primary and secondary regions in sync. This means that any change to the database in either region immediately syncs to the database infrastructure in the other region.
During normal operations, your application continuously runs a special one-way rsync process on the primary region application, which ensures that any files added to the primary region are also sent to the infrastructure in the secondary (failover) region.
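To make the "one-way" property concrete, here is a minimal Python sketch of a one-directional file sync. It is not Acquia's actual process, which uses rsync. The function name `one_way_sync` and the directory arguments are hypothetical, and the comparison rule (missing, different size, or older at the destination) is an assumption chosen for illustration.

```python
import shutil
from pathlib import Path


def one_way_sync(source: Path, destination: Path) -> list[str]:
    """Copy new or changed files from source to destination, never the reverse.

    Illustrative only: files that exist solely in the destination are left
    untouched and are never copied back to the source.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dest_file = destination / relative
        needs_copy = (
            not dest_file.exists()
            or dest_file.stat().st_size != src_file.stat().st_size
            or dest_file.stat().st_mtime < src_file.stat().st_mtime
        )
        if needs_copy:
            dest_file.parent.mkdir(parents=True, exist_ok=True)
            # copy2 preserves modification times, so unchanged files
            # are skipped on the next pass.
            shutil.copy2(src_file, dest_file)
            copied.append(relative.as_posix())
    return copied
```

Because the sync only flows from the primary to the secondary region, files written in the secondary region while it is serving traffic do not reach the primary region through this process.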
The combination of the synced code, databases, and files means that the failover region and primary region are functionally identical. The main difference, aside from the location of the infrastructure, is that each region is assigned its own distinct Elastic IP (EIP) address.
The failover process
In the event of an emergency in your application’s primary region, the Acquia multi-region failover configuration ensures that there is an alternative functional version of your live production application. Such an emergency might be an event that causes the primary hosting region to be, in part or in whole, impaired or inoperative in such a way that Acquia’s Support teams cannot restore full service in the primary region immediately or within a reasonable amount of time.
The multi-region failover configuration should not be used to reduce the impact of routine maintenance or upsizes, to mitigate the impact of high-traffic events if your primary region’s infrastructure reaches capacity, or to attempt to work around incidents where adverse code, file, or database changes have been deployed to your Production application.
In the event of an emergency, you can begin the failover process at any time; you do not need Acquia’s assistance. If your application uses Acquia Edge CDN, you can request that Acquia Support assist with the failover process. In any case, you should notify Acquia as soon as possible, so that Acquia does not take any conflicting actions in addressing the emergency.
To initiate the failover process, configure your application’s CDN settings to point to the Elastic IP address of the secondary region, instead of the primary region. You can find the Elastic IP addresses on the Domains page of the Cloud Platform interface. After the CDN changes take effect, requests to the application will be handled by infrastructure in the secondary region, instead of the primary region.
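As a rough illustration of the origin switch described above, the following Python sketch selects which Elastic IP a CDN origin setting should point to. Everything in it is hypothetical: the `Region` type, the function names, the keys of the returned dictionary, and the IP addresses (taken from the reserved documentation ranges). Your CDN's real configuration interface will differ, and the actual Elastic IP addresses come from the Domains page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    elastic_ip: str


# Placeholder addresses from documentation-only ranges (RFC 5737).
# Replace with the Elastic IPs shown on the Domains page.
PRIMARY = Region("primary", "192.0.2.10")
SECONDARY = Region("secondary", "198.51.100.10")


def select_origin(primary_impaired: bool) -> Region:
    """Return the region whose Elastic IP the CDN origin should use."""
    return SECONDARY if primary_impaired else PRIMARY


def build_origin_update(primary_impaired: bool) -> dict:
    """Build a hypothetical CDN origin setting for the chosen region."""
    region = select_origin(primary_impaired)
    return {"origin_region": region.name, "origin_address": region.elastic_ip}
```

Calling `build_origin_update(True)` yields the secondary region's address for failover, and `build_origin_update(False)` yields the primary region's address, which is the same change made in reverse during failback.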
Since the caches in the secondary region will be empty at first, performance may be slower immediately following failover until the caches rebuild.
Cron and failover
Cron jobs in Cloud Platform are set to run on infrastructure in the primary region, and they do not transfer over to infrastructure in the secondary region upon failover. After a failover, Acquia Support can edit cron jobs to run in the secondary region instead of the primary region. For more information, see Using scheduled jobs to maintain your application.
Operating while in failover
While your application is being served from the secondary region, many common Cloud Platform workflow tasks may not function properly. The secondary region includes a clone of your Production environment, but not other environments (such as Development and Staging). Workflow tasks that are designed to facilitate communications between infrastructure in the same region won’t work between environments in different regions. In other cases, such as full or partial region-wide failures, tasks may fail because Acquia’s code repository or task management infrastructure in those regions is also impaired.
The failback process
After the emergency in the primary region has been resolved, you will need to restore your application to its previous configuration so that it is again served from the primary region. This process is called failback.
Before initiating failback to the primary region, notify Acquia Support to confirm the date and time of the failback. At the time of the failback, Acquia will perform one final manual sync of the application’s files between the secondary and primary regions to ensure that there are no issues or inconsistencies. Acquia will then authorize you to proceed with the CDN failback to the primary region, pointing the CDN settings to the Elastic IP address of the primary region. If your application uses Acquia Edge CDN, you can request that Acquia Support assist with the failback.
Similar to when your application first fails over to the secondary region, caches in the primary region may be stale at the time of failback, so site performance may be reduced while the caches rebuild.
Multi-sites and multi-region failover
Multi-region failover is available to Cloud Platform Enterprise subscribers with multi-site applications. However, this functionality is not supported for Cloud Platform Professional or Site Factory applications.
In the event of a failover, subscribers with applications using multiple databases must ensure that all Production sites on that application are failed over to the secondary region to prevent any risk of data loss after the failover, or as a result of the failback process.
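One way to check this requirement is to compare each site's CDN origin against the secondary region's Elastic IP. The Python sketch below assumes you can export a mapping of site domain to current origin address from your CDN. The function name, the mapping shape, the domains, and the addresses are all hypothetical.

```python
def sites_not_failed_over(site_origins: dict[str, str], secondary_ip: str) -> list[str]:
    """Return the sites whose CDN origin does not point at the secondary region.

    site_origins maps a site domain to the origin address currently
    configured for it (a hypothetical export from your CDN).
    """
    return sorted(
        site for site, origin in site_origins.items() if origin != secondary_ip
    )


# Example with placeholder domains and documentation-range addresses.
origins = {
    "site-a.example.com": "198.51.100.10",
    "site-b.example.com": "192.0.2.10",  # still pointing at the primary region
    "site-c.example.com": "198.51.100.10",
}
remaining = sites_not_failed_over(origins, secondary_ip="198.51.100.10")
```

An empty result means every Production site on the application is being served from the secondary region. Any domain in the list still needs its origin switched.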
SSL and multi-region failover
Applications configured for multi-region failover should only utilize the standard method for SSL certificate management. The legacy installation method is not supported for this configuration.
Using the current Drupal version with multi-region failover
Acquia is providing extended support for MySQL 5.6 until Cloud Next offers the multi-region failover functionality. Meanwhile, Cloud Platform customers can use the current Drupal version with this functionality. However, certain features or modules in later versions of Drupal might require the database functionality introduced in MySQL 5.7. In this situation, such Drupal features or modules are unavailable until Acquia releases a newer version of the multi-region failover functionality in Cloud Next.
Using multi-region failover with other Acquia products
The following features and Acquia products are incompatible with multi-region failover configurations:
- Cloud Platform Professional
- Site Factory
- Shield VPC
- Secure VPN
- Elastic Load Balancers (ELBs) (Legacy SSL install method)
- Resilient Edge Clusters
- Acquia Search
- Node.js
- Digital Asset Manager
Further, no Acquia Marketing Cloud products support the multi-region failover functionality.
All applications requiring any of these features or services must be architected to ensure that sites can continue to serve critical content without these features in the event of a regional impairment and failover.
Testing the failover process
Acquia tests multi-region failovers during the setup process. After this functionality is in place, Acquia does not support any additional testing and will not provide assistance with failing infrastructure over or back in situations unrelated to an emergency in your primary region.