Scrubbing a Drupal database environment

Drupal administrators often create database backups in order to replicate the live environment for testing or development. This is extremely useful, but can pose security risks if sensitive information is not removed. In the wrong hands or context, sensitive data could cause serious damage. This especially important when development environments are desktops or laptops that are easily lost or stolen. Scrubbing additionally ensures that emails are not accidentally sent from a development environment.

One solution for handling sensitive data is to encrypt local development environments (for example, using Virtualbox in an encrypted disk image). This strategy relies on developers to maintain compliance with security best practices. A solution that works regardless of the strength of the security on development machines is to remove sensitive data when backups are created.

Scrubbing databases

Three methods for scrubbing a Drupal database on Acquia Cloud are:

Drush

The Drush command line tool offers an extensible database scrubbing command called sql-sanitize.

Here is a summary of commands that you can use with Drush to sanitize a database:

Command Description
--db-url A Drupal 6 style database URL. Only required for initial install, not reinstall.
--sanitize-password The password to assign to all accounts in the sanitization operation, or no to keep passwords unchanged. The default is password.
--sanitize-email The pattern for test email addresses in the sanitization operation, or no to keep email addresses unchanged. May contain replacement patterns %uid, %mail, or %name. The default is user+%[email protected]t.

The following code implements a Drush hook (hook_drush_sql_sync_sanitize) to execute additional scrubbing queries when you are running sql-sanitize.

<?php/*** Implementation of hook_drush_sql_sync_sanitize().*/function modulename_drush_sql_sync_sanitize($source) {    $query = "UPDATE content_type_profile SET field_primary_email_email = CONCAT(SUBSTRING_INDEX(field_primary_email_email,'@', 1), '@example.com');";    $query .= "UPDATE content_type_profile SET field_work_phone_value = SUBSTRING(FLOOR(RAND() * 1000000000000) WHERE uid NOT IN  SELECT uid FROM users_roles WHERE rid=3) AND uid > 0 FROM 1 FOR 12)  WHERE field_work_phone_value IS NOT NULL;";   drush_sql_register_post_sync_op('my-sanitize-id', dt('Sanitize email addresses and phone numbers'), $query);}?>

SQL scripts

The following script accepts a gzipped SQL dump and creates a new scrubbed version:

#!/bin/bash# Use as ./sql_scrub_dump.sh local_db_user local_db_pass dump_file.gz# Creates a scrubbed version of a database dump using the local database defined below as the intermediary.# note: the dump file will be gzip compressed, as will the resulting file.local_db=6000_temp_scrubfunction db_query() {    sql=${1}    mysql -u${local_user} -p${local_pass} -e "$sql" ${local_db}}local_user=${1}local_pass=${2}dump_file=${3}new_dump_file=scrubbed-${3}mysql -u${1} -p${2} -e "DROP DATABASE IF EXISTS $local_db"mysql -u${1} -p${2} -e "CREATE DATABASE $local_db"gunzip -c $dump_file | mysql -u${1} -p${2} $local_dbmysql -u${1} -p${2} $local_db < scrub.sqlmysql -u${1} -p${2} $local_db < live_to_dev_scrub.sql# Here we make sure that all the tables are InnoDB.mysql -u${1} -p${2} -e "SHOW TABLES" $local_db \  | grep --invert-match '^Tables' \  | sed -e 's/\(.*\)/ALTER TABLE `\1` ENGINE = innodb;/' \  | mysql -u${1} -p${2} $local_db# Dump the scrubbed version and drop the temporary DB.mysqldump -u${1} -p${2} $local_db | gzip > $new_dump_filemysql -u${1} -p${2} -e "DROP DATABASE IF EXISTS $local_db"

In the preceding script, the scrubbing is invoked with the following line:

mysql -u${1} -p${2} $local_db < live_to_dev_scrub.sql

After the scrubbing is complete, the script performs a final transformation to ensure that all of the tables are in the InnoDB format. Then it executes a new SQL dump and stores it in a gzipped file before deleting the temporary database.

The following sample code shows an example live_to_dev_scrub.sql. This script will vary depending on the site, but many of these concepts will be useful.

-- Here we strip out private information that should not be floating around-- on developer systems, disable modules that should not or need not be-- enabled, etc. admin user passwords are set for ease of testing.UPDATE users SET mail = CONCAT(name, '@localhost'), init = CONCAT(name, '@localhost'), pass = MD5(CONCAT('MILDSECRET', name)), picture = CONCAT(name, '@localhost'), signature=CONCAT(name, '@localhost') WHERE uid NOT IN  (SELECT uid FROM users_roles WHERE rid=3) AND uid > 0;UPDATE authmap SET authname = CONCAT(aid, '@localhost');-- Turn off modules which shouldn't be active in development (if this is going to production, remove this line).DELETE FROM system WHERE name IN ('twitter', 'googleanalytics', 'securepages');-- Admin user should not be same but not really well knownUPDATE users SET pass = MD5('supersecret!') WHERE uid = 1;-- Scrub url aliases for non-admins since these also reveal names-- Add the IGNORE keyword, since a user may have multiple aliases, and without-- this keyword the attempt to store duplicate dst values causes the query to fail.UPDATE IGNORE url_alias SET dst = CONCAT('users/', REPLACE(src,'/', '')) WHERE src IN (SELECT CONCAT('user/',u.uid) FROM users u WHERE u.uid NOT IN (SELECT uid FROM users_roles WHERE rid=3) AND u.uid > );-- don't leave e-mail addresses, etc in comments table.UPDATE comments SET name='SCRUBBED', mail='[email protected]', homepage='http://example.com' WHERE uid=0;--Remove sensitive data from other tablesTRUNCATE accesslog;TRUNCATE access;TRUNCATE cache;TRUNCATE cache_filter;TRUNCATE cache_menu;TRUNCATE cache_page;TRUNCATE devel_queries;TRUNCATE devel_times;TRUNCATE flood;TRUNCATE history;TRUNCATE search_dataset;TRUNCATE search_index;TRUNCATE search_total;TRUNCATE sessions;TRUNCATE watchdog;

This script starts by removing user IDs, email addresses, and passwords. It then disables modules that aren't needed in a development environment, scrubs the sensitive URL aliases, and then proceeds to execute a long list of truncations for tables that hold data that is temporary or no longer relevant.

You may also want to alter sensitive entries in the variable table. You can do this with the following SQL statement:

UPDATE variable SET value = 's:4:"fake";' WHERE name = 'smtp_password';

Because every site has unique requirements, Acquia does not recommend using the preceding code as is. For example, you should consider if a site has ecommerce or mail systems that house personal information about site users. It may also be important to consider contextual differences between environments and machines. The preceding SQL statements are not applicable in every case.

Acquia Cloud hooks

Acquia Cloud also has some hooks available through some secondary software on GitHub. Located on the Cloud Hooks page, you can download these hooks, and then use the db-scrub.sh script to clean the database. Each script contains its own usage documentation, so read the documentation carefully before execution. For more information about Cloud Hooks, see the Automating with Cloud Hooks documentation page.

Scrubbing custom and contributed module tables

To determine adjustments that may be needed, search through the text of a database export for text strings that you might expect to be associated with personal data. For example, search the database dump file for the names of email providers (for example, Yahoo and Gmail) to see if there are additional email address fields. If you find one, you'll have to determine the tables and columns in which they are used, and then run UPDATE statements to sanitize these values.

Many Drupal modules use fields that carry user-specific data, including the following modules:

Take special care with modules (for example, the Twitter module) that store account credentials (for example, Twitter usernames and passwords) in a database table. And take extra special care with modules (for example, the SMTP Authentication Support and the Mailhandler modules) that store credentials for important email account either in the variable table or other locations in the database.

As you add new modules or upgrade existing modules, you may need to update SQL scrubbing scripts. After adding and upgrading modules, make a new database dump, import it into a temporary database, and run your script to sanitize the data. Then dump the new temporary database to a file, and search for new fields that contain sensitive information (for example, email addresses, user names, and passwords).

Webform module

To clear out old webform entries that are likely to include emails, integrate the following SQL statement in the scrub script:

UPDATE webform_submitted_data SET data='scrubbed';

Project and Project issue tracking modules

If a site uses the Project and Project_issue modules, the following SQL commands may be applicable:

UPDATE temp_backup_prep.project_issue_projects SET mail_digest = '[email protected]', mail_copy = '[email protected]';UPDATE temp_backup_prep.projects SET mail = CONCAT("empty", '@localhost');

Simplenews module

If a site uses the Simplenews module, the following SQL commands may be applicable:

UPDATE simplenews_subscriptions SET mail = CONCAT(snid, '@localhost');

Contact supportStill need assistance? Contact Acquia Support