Information for: DEVELOPERS   PARTNERS

Unloading Query Results to Amazon S3

In some cases, it may be useful to unload query results to an Amazon S3 bucket. This method is typically used when you need to access more than 10k rows of data or if you need to setup some post-processing of query results. Achieving this workflow involves the following steps:

  1. Initial one-time setup of an Amazon S3 bucket
  2. Unloading query results to Amazon S3
  3. Accessing query results on Amazon S3

Setting up an Amazon S3 Bucket

There are two options here:

  1. Unload query results to an Amazon S3 bucket managed by you
  2. Unload query results to an Customer Data Platform (CDP)-managed Amazon S3 bucket

Bringing your own S3 bucket is the recommended option because you would have full control over how long query results are retained in the bucket. Instead, if CDP sets up the bucket for you, we give you read-only access to the files and the query results are retained in the S3 bucket for 7 days.

Option 1: Unload query results to an Amazon S3 bucket managed by you

Here are the steps for setting up your S3 bucket. Please refer to AWS documentation for any questions you may have on how to do the following steps:

  1. Create a new IAM user in your AWS account for CDP. Give the user a descriptive name so that it can identified in the future.
  2. Create a new S3 bucket with the following properties:
    • Bucket Name: Make this descriptive so that it can be differentiated from other buckets your AWS account may have.
    • Region: Buckets for US clients should be created in the US East (N. Virginia) aka us-east-1 AWS region. EU clients should create their buckets in the EU (Ireland) aka eu-west-1 AWS region.
  3. Additional properties for S3 bucket such as versioning, logging, etc.: These are not required and can be disabled.
  4. Give the IAM user created in Step 1 full READ/WRITE permissions to the newly created S3 bucket.
  5. Very Important: Ensure that the S3 bucket is not publicly accessible.
  6. Send the following details to your CDP CSM:
    • AWS Access Key ID for IAM user from Step 1
    • AWS Secret Access Key for IAM user from Step 1
    • AWS Bucket Name
    • AWS Region where your S3 bucket is located
    • Optional: Folder Name (if you want files to be dropped in a specific folder)
  7. Your CDP CSM will send you back an S3 stage name that you will need in the next step for unloading your query results. For example, the S3 stage name would look something like this: A1SF_EXTSTAGE_101_123.

Option 2: Unload query results to an CDP-managed Amazon S3 bucket

Note that while we can setup a bucket for you, you will have read-only access to the files in the bucket and the unloaded query results will be retained for 7 days only.

If you prefer to go with this option, you can make the request to your CDP CSM. After our OPS team sets up the bucket, your CSM will send you back an S3 stage name that you will need in the next step for unloading your query results. For example, the S3 stage name would look something like this: A1SF_EXTSTAGE_101_123.

Unloading query results to Amazon S3

Run the following query in the Interactive Queries Web UI:

COPY INTO @S3_stage_name[/folder_path][/file_name][.csv.gz]
-- [.csv.gz] should only be used when SINGLE = TRUE option is used
FROM
(
<your sql query goes here>
)
SINGLE = TRUE -- Specifies whether to generate a single file or multiple files
MAX_FILE_SIZE = 100000000 -- Number that specifies the maximum size (in bytes) of each file to be generated
HEADER = TRUE -- Specifies whether to include the table column headings in the output files
OVERWRITE = TRUE -- Specifies whether the COPY command overwrites existing files, if any, in the location where files are stored

Accessing query results on Amazon S3

Recommended tool: Cyberduck: https://cyberduck.io/

Follow these steps after installing Cyberduck:

Open Connection –> Amazon S3:

  1. Server: [bucket-name].s3.amazonaws.com
  2. Port: 443
  3. Access Key ID: Provided by whoever is managing the bucket from Step 1
  4. Secret Access Key: Provided by whoever is managing the bucket from Step 1