Bulk import and export of entities in Content Hub 1.x

Important
  • This page applies only to Content Hub 1.x.
  • Content Hub 1.x will reach end-of-life on September 30, 2024. Acquia recommends that you upgrade to Content Hub 3.x. For more information, see Upgrading from Content Hub 1.x to 3.x.

Acquia Content Hub enables you to export and import your entities in bulk, making it easier to transfer your initial content from one website to another.

Prerequisites

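Download the bulk export and bulk import scripts (run later on this page as ach-bulk-export.php and ach-bulk-import.php), add them to a scripts directory in your repository, and deploy them on Cloud Platform.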

These bulk operations aren’t yet available as Drush commands.

Setting up a bulk export

Bulk export is ideal when you are starting fresh and plan to import content to your websites later in a separate step. Before you try a bulk export, confirm that Content Hub has no registered webhooks.

Important

If your websites have registered webhooks, the bulk export sends webhooks to all of your websites, which can place a heavy load on your servers.

  1. To see the list of the webhooks you have on Content Hub, run this command from any of your websites:

    drush acquia-contenthub-webhooks list
    
  2. Sign in to each website listed in the webhooks section, and de-register the webhooks one at a time:

    1. Sign in to the Drupal website as a user with the Administrator role.
    2. Navigate to Configuration > Acquia Content Hub > Webhooks.
    3. Clear the checkbox.
    4. Click Save.

    You can also use the following Drush command:

    drush acquia-contenthub-webhooks unregister
    

    If you have problems removing any of the webhooks, create a Support ticket.

  3. Navigate to the Entity settings page at Configuration > Acquia Content Hub > Configuration.
  4. Select all the content types you want to export. Don’t select Publish view modes.
  5. Go to Configuration > Acquia Content Hub.
  6. Enable the Export queue.
  7. Purge the Content Hub system:
    1. From the command line, purge Content Hub using the command:

      drush ach-purge
      
    2. Truncate the tracking table with the following SQL command:

      TRUNCATE acquia_contenthub_entities_tracking;
      

      Note

      When running SQL commands, use drush -l SITEURL sqlc to access the correct database. Double-check the URL to make sure you’re using the correct one.

    3. Empty the export queue (SQL):

      DELETE FROM queue WHERE name = 'acquia_contenthub_export_queue';
      
  8. Check that the table and queue are empty. The following commands should return 0 if the table and queue are empty:

    drush -l http://example.com sql-query "select count(*) from acquia_contenthub_entities_tracking"
    drush -l http://example.com sql-query "select count(*) from queue where name = 'acquia_contenthub_export_queue'"
    
  9. Enable a Scheduled job for the export queue in your Cloud Platform environment. For example:

    flock -xn /tmp/ach-export-queue.lck -c "drush -v @mysite.env -l http://example.com queue-run acquia_contenthub_export_queue" &>> /var/log/sites/${AH_SITE_NAME}/logs/$(hostname -s)/drush-ach-export.log
    

    Important

    Never run the queue from Drupal’s administrative user interface, as it causes a race condition with cron jobs on Cloud Platform.

  10. Run the bulk export script from a command line:

    drush -l http://example.com scr ../scripts/ach-bulk-export.php > ~/ach-examplesite-bulk-export.log
    

    This command sends the output to a log file. To view the command’s progress, use the tail -f command on that file. The bulk export script typically runs for less than 15 minutes and pauses every few minutes.

    The command’s execution should end with a line like this:

    Enqueuing 'node' (webform) entities with IDs: 1121.
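
The zero-count checks in step 8 can be scripted so that the export does not start while stale rows remain. The following is a minimal sketch, not part of the Content Hub tooling: extract_count and require_empty are hypothetical helper names, and the sketch assumes that the output of drush sql-query contains the count on a line of its own (possibly after a header line). Confirm that assumption on your environment before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Content Hub) for the pre-export checks.

# Print the last line of stdin that consists only of digits.
# Assumption: the count appears on its own line in the query output,
# optionally preceded by a header line such as "count(*)".
extract_count() {
  grep -E '^[[:space:]]*[0-9]+[[:space:]]*$' | tail -n 1 | tr -d '[:space:]'
}

# Succeed only if the count read from stdin is exactly 0.
# $1 is a label used in the message.
require_empty() {
  count=$(extract_count)
  if [ "$count" = "0" ]; then
    echo "OK: $1 is empty"
  else
    echo "STOP: $1 is not empty (count: ${count:-unknown})" >&2
    return 1
  fi
}

# Example (commented out so the sketch has no side effects when sourced):
# drush -l http://example.com sql-query \
#   "select count(*) from acquia_contenthub_entities_tracking" \
#   | require_empty "tracking table"
```

Because require_empty returns a non-zero status when the count is not 0 or cannot be read, you can chain it with && in front of the bulk export command.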
    

Monitoring the bulk export

Use the following tools to monitor the bulk export process:

  • Breakdown of the tracking table contents:

    drush -l http://www.example.com sql-query "SELECT status_export, count(*) FROM acquia_contenthub_entities_tracking GROUP BY status_export"
    
  • Total number of entities in Content Hub:

    drush -l http://www.example.com ach-list|head
    

At the end of the bulk export, the number of exported entities and the total number of entities in Content Hub should match.
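To compare the tracking table against the Content Hub total, it can help to add up the per-status counts from the breakdown query. The following is a rough sketch under stated assumptions: sum_status_counts is a hypothetical helper, and it assumes each data row of the query output ends with the count as a whitespace-separated field. Rows whose last field is not a number, such as a header row, are ignored.

```shell
#!/bin/sh
# Hypothetical helper: total the per-status counts from the GROUP BY query.
# Assumption: each data row looks like "<status><whitespace><count>".
sum_status_counts() {
  awk '$NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}

# Example (commented out so the sketch has no side effects when sourced):
# drush -l http://www.example.com sql-query \
#   "SELECT status_export, count(*) FROM acquia_contenthub_entities_tracking GROUP BY status_export" \
#   | sum_status_counts
```

The helper only sums rows; it does not interpret the status values, so check the per-status breakdown itself to see how many entities are still waiting to be exported.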

Preparing your website for bulk import

To prepare your website for a bulk import:

  1. Prepare the website for import using the following commands, modifying them for your website:

    drush -l http://example.com/ en reset_entity example_entity -y &&
    drush -l http://example.com/ cr all &&
    drush -l http://example.com/ reset-node example_node_type &&
    drush -l http://example.com/ reset-node example_node_type_2 &&
    drush -l http://example.com/ reset-taxonomy example_taxonomy
    
  2. Truncate the acquia_contenthub_entities_tracking table. This table should be empty, unless you have already imported content that shouldn’t be imported in this bulk import.
  3. Ensure the import queue is empty using the command:

    drush queue-list
    
  4. Ensure the import queue is enabled at Configuration > Acquia Content Hub > Import queue.
  5. Ensure you have an import Scheduled job set up on Cloud Platform for your environment, pointing it to the website where you want to run a bulk import. Update the import queue Scheduled job to match the current website.

    This job must be changed for each new website you want to bulk import.

    Using a scheduled job instead of an external script

    When using a scheduled job, the long-running import runs in a background process. Your developers don’t need to keep connections open to your web server.

    Here is a reference command for a test environment:

    flock -xn /tmp/ach-import-queue.lck -c "drush -v @example.test -l http://example.com queue-run acquia_contenthub_import_queue" &>> /var/log/sites/${AH_SITE_NAME}/logs/$(hostname -s)/drush-ach-import.log
    
  6. Run the import script to queue all items found in Content Hub for import. The script lists all the entities present in Content Hub, adds them to the import queue on the website, and provides options in case you want to be selective about what you want to import. For example:

    drush -l http://example.com scr ../scripts/ach-bulk-import.php > ~/ach-bulk-import-examplesite.log
    

    This command sends the output to a log file. To view the command’s progress, use the tail -f command. It’s normal for the script to pause and resume.

Your scheduled job processes the import queue.
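The scheduled job commands on this page wrap drush in flock -xn so that a new run does not start while the previous one is still processing the queue. The following sketch isolates that behavior. It assumes the flock utility from util-linux is installed, and run_exclusive is a hypothetical wrapper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Content Hub) showing what `flock -xn` does.
# Usage: run_exclusive LOCK_FILE COMMAND [ARGS...]
run_exclusive() {
  lock_file=$1
  shift
  # -x takes an exclusive lock; -n makes flock fail immediately (exit status 1)
  # instead of waiting when another process already holds the lock.
  flock -xn "$lock_file" "$@"
}

# Example (commented out so the sketch has no side effects when sourced):
# run_exclusive /tmp/ach-import-queue.lck \
#   drush -l http://example.com queue-run acquia_contenthub_import_queue
```

If a scheduled run is skipped because the lock is held, the queue items stay in place and are picked up by a later run.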

Monitoring the bulk import

To monitor the bulk import process, use the following tools:

  • Breakdown of the tracking table contents:

    drush -l http://example.com sql-query "SELECT status_import, count(*) FROM acquia_contenthub_entities_tracking GROUP BY status_import"
    
  • Monitor the size of the queue table. This table can include failed items, and may not be representative of the number of items left to process:

    drush -l http://example.com queue-list | grep import
    
  • Follow the logs in real time. To find the log file path, check the scheduled job in the Cloud Platform user interface:

    tail -f /var/log/sites/example-test/logs/staging-01/drush-ach-im-example.log
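
If you want the import queue size as a single number instead of a grep match, you can parse the queue-list output. This is a sketch under a loudly stated assumption: it expects each row of drush queue-list to be whitespace separated with the queue name in the first column and the item count in the second. queue_items is a hypothetical helper name; verify the column layout on your Drush version before using it.

```shell
#!/bin/sh
# Hypothetical helper: print the item count for one queue, reading
# `drush queue-list` output from stdin.
# Assumption: rows look like "<queue name> <items> <class>".
# Exits with status 1 if the queue name is not found.
queue_items() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1 }
    END { if (!found) exit 1 }
  '
}

# Example (commented out so the sketch has no side effects when sourced):
# drush -l http://example.com queue-list \
#   | queue_items acquia_contenthub_import_queue
```

As noted above, the queue can include failed items, so treat the number as an upper bound on the work remaining.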
    

Completing the bulk import

Once the bulk import is complete, perform the following steps:

  1. Disable the bulk import scheduled job until it’s needed for another website.
  2. Sign in to your website as a user with the Administrator role.
  3. Navigate to Configuration > Acquia Content Hub > Webhooks, and enable webhooks for your website. This ensures that Content Hub knows where to send content updates to subscribers.
  4. Disable the import queue.

Your website should now receive content updates from the publishing website through Content Hub webhooks.