Upload Content to S3

The Upload Content to S3 transformation uploads data from your pipeline directly to Amazon S3 cloud storage. This bridges data processing and storage, enabling seamless integration with the AWS ecosystem and beyond.


Why Use Upload Content to S3?

  • Centralize Your Data - Store processed information in a reliable, highly available cloud repository

  • Enable Downstream Processes - Trigger AWS Lambda functions, data analytics, or machine learning workflows

  • Simplify Distribution - Share data with other teams, systems, or business partners

  • Create Data Archives - Maintain historical records for compliance or reference purposes

Note

Before implementing this transformation, verify you have proper AWS credentials and bucket write permissions configured in your environment.
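
For a quick sanity check before configuring the transformation, you can confirm that your environment's AWS credentials resolve to a valid identity. A minimal sketch using boto3 (purely illustrative, not part of the product):

```python
import boto3

# If this call succeeds, the AWS credentials available in your
# environment resolve to a valid IAM identity.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])
```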

Setting Up the S3 Upload Transformation

  1. Navigate to your transformation menu and select "Upload Content to S3"

  2. Choose or create an Amazon S3 connection

  3. Configure the required parameters (detailed below)

  4. Test the connection with sample data

  5. Save and apply the transformation

Configuration Parameters

  1. S3 Connection - Establishes authentication with AWS using your credentials.

    Options:
    • Select an existing connection from your saved connections

    • Create a new connection with your AWS access key, secret key, and region

    Security Best Practice: Use IAM roles with temporary credentials rather than long-term access keys when possible (see the sketch after this parameter list).

    Refer

    See the Amazon S3 connector documentation for details on creating a connection.

  2. Bucket Name - Specify the destination S3 bucket for your uploads. Examples:

    • company-data-lake

    • customer-analytics-prod

    • financial-reports-archive

  3. File Name Field - Specifies which field in your dataset contains the name to use for the uploaded file in S3. How It Works:

    • Select an existing field from your dataset that contains the desired filename

    • The value in this field will be used as the actual filename in S3

    • The field can contain just the filename or include a path structure

    Examples:
    • Field value: report.csv → Uploads to s3://bucket-name/report.csv

    • Field value: customer_123/profile.json → Uploads to s3://bucket-name/customer_123/profile.json

    • Field value: reports/2024/04/daily.parquet → Uploads to s3://bucket-name/reports/2024/04/daily.parquet

  4. Transformed Field Name - Creates a new field in your data that stores the complete S3 URL of the uploaded file. Example Value: s3://company-data-lake/reports/monthly/2024/04/data.parquet
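
As the Security Best Practice above recommends, prefer temporary credentials over long-term access keys. A minimal sketch of obtaining temporary credentials by assuming an IAM role with STS and building an S3 client from them (the role ARN, session name, and region are illustrative placeholders):

```python
import boto3

# Assume an IAM role to obtain short-lived credentials instead of
# embedding long-term access keys. The role ARN is a placeholder.
sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/pipeline-s3-writer",
    RoleSessionName="upload-content-to-s3",
)
creds = resp["Credentials"]

# Build an S3 client from the temporary, auto-expiring credentials.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
    region_name="us-east-1",  # use the region from your connection
)
```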

How It Works

This transformation follows these steps during execution (a sketch of the flow follows the list):

  1. Reads the current record's data from the pipeline

  2. Establishes a secure connection to your S3 bucket

  3. Uploads the content with the specified file name

  4. Generates the complete S3 path/URL

  5. Adds this path as a new field in your data record
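
The transformation itself is configured in the UI, but conceptually the per-record flow resembles the following boto3 sketch. The record layout and the field names file_name, content, and s3_file_url are illustrative assumptions, not the product's internals:

```python
import boto3

def upload_record_content(record, bucket, filename_field, transformed_field):
    """Illustrative per-record flow: upload content to S3 and attach the URL."""
    s3 = boto3.client("s3")                    # step 2: connect to S3
    key = record[filename_field]               # steps 1/3: key from the dataset field
    s3.put_object(Bucket=bucket, Key=key,      # step 3: upload the content
                  Body=record["content"])
    s3_url = f"s3://{bucket}/{key}"            # step 4: build the full S3 path
    record[transformed_field] = s3_url         # step 5: add it as a new field
    return record

record = {"file_name": "reports/2024/04/daily.parquet", "content": b"..."}
record = upload_record_content(record, "company-data-lake", "file_name", "s3_file_url")
print(record["s3_file_url"])  # s3://company-data-lake/reports/2024/04/daily.parquet
```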

Common Use Cases

| Scenario | File Type | Naming Strategy | Benefit |
| --- | --- | --- | --- |
| Daily reports | CSV files | reports/${DATE}/summary.csv | Automatic date organization |
| Customer data | JSON objects | customers/${CUSTOMER_ID}.json | Easy lookup by ID |
| Image processing | Binary files | images/processed/${TIMESTAMP}.jpg | Chronological tracking |
| Log archiving | Text files | logs/${APP_NAME}/${DATE}/${HOUR}.log | Hierarchical organization |
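
The ${...} tokens in the Naming Strategy column are templates, not literal filenames; resolve them before writing the value into your file name field. A small illustrative sketch of rendering a date-organized key:

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
# Render a key like the "Daily reports" row above.
key = f"reports/{now:%Y-%m-%d}/summary.csv"
print(key)  # e.g. reports/2025-04-15/summary.csv
```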

Best Practices

  • Structure Your Data - Use folder paths in your file names to create logical organization

  • Consider File Formats - Choose appropriate formats (CSV, JSON, Parquet) based on downstream needs

  • Set Up Lifecycle Rules - Configure S3 lifecycle policies to automatically archive or delete old files (see the sketch after this list)

  • Monitor Costs - Watch your S3 storage usage and implement appropriate storage classes

  • Implement Error Handling - Create fallback procedures for failed uploads
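
Lifecycle rules are configured on the bucket itself (via the console, CLI, or SDK), not in the transformation. A minimal boto3 sketch that archives objects under reports/ to Glacier after 90 days and deletes them after a year; the bucket name, prefix, and thresholds are all illustrative:

```python
import boto3

s3 = boto3.client("s3")
# Illustrative policy: transition after 90 days, expire after 365.
s3.put_bucket_lifecycle_configuration(
    Bucket="company-data-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-reports",
            "Status": "Enabled",
            "Filter": {"Prefix": "reports/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```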

Troubleshooting

Issue: Upload failures

  • Check your AWS credentials and permissions

  • Verify network connectivity to AWS

  • Ensure the bucket name is correct and accessible (see the check below)
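
A head_bucket call is a fast way to separate credential problems from bucket problems, since it fails with a specific error code. A minimal sketch (the bucket name is illustrative):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    s3.head_bucket(Bucket="company-data-lake")
    print("Bucket is reachable and accessible")
except ClientError as err:
    # "403" means the credentials lack permission;
    # "404" means the bucket does not exist or the name is wrong.
    print("Check failed:", err.response["Error"]["Code"])
```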

Issue: Files overwritten unexpectedly

  • Implement unique naming with timestamps or UUIDs (see the sketch below)

  • Enable S3 versioning on your bucket

  • Use conditional checks before uploads
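
For example, appending a timestamp and a short UUID suffix makes key collisions practically impossible. A small illustrative sketch of building such a key before upload:

```python
import uuid
from datetime import datetime, timezone

base, ext = "customer_123/profile", ".json"
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
# e.g. customer_123/profile-20250415T120301-9f1c2b.json
unique_key = f"{base}-{stamp}-{uuid.uuid4().hex[:6]}{ext}"
```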

Issue: Slow performance

  • Consider compressing large files before upload

  • Evaluate your network bandwidth limitations

  • For very large files, explore multipart uploads (see the sketch below)
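
boto3's managed transfer performs multipart uploads automatically for large files; the thresholds and parallelism can be tuned with TransferConfig. A sketch with illustrative values:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Split files above 64 MB into 16 MB parts, uploading up to
# 8 parts in parallel. All values here are illustrative.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=8,
)
s3 = boto3.client("s3")
s3.upload_file("large_export.parquet", "company-data-lake",
               "exports/large_export.parquet", Config=config)
```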