Today we’re announcing two new features for Amazon S3 Tables: support for the new Intelligent-Tiering storage class, which automatically optimizes storage costs based on access patterns, and replication support, which automatically maintains consistent replicas of Apache Iceberg tables across AWS Regions and accounts without manual synchronization.
Organizations working with tabular data face two common problems. First, they must manually manage storage costs as their data sets grow and access patterns change over time. Second, when maintaining replicas of Iceberg tables across regions or accounts, they must build and maintain complex architectures to track updates, manage object replication, and handle metadata transformations.
S3 Tables Intelligent-Tiering storage class
With the S3 Tables Intelligent-Tiering storage class, your data is automatically placed in the most economical access tier based on access patterns. Data is stored in three low-latency tiers: Frequent Access, Infrequent Access (40 percent lower cost than Frequent Access), and Archive Instant Access (68 percent lower cost than Infrequent Access). After 30 consecutive days without access, data moves to the Infrequent Access tier, and after 90 days without access, to the Archive Instant Access tier. This happens without changes to your applications and without impacting performance.
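To make the tier discounts concrete, here is a minimal sketch of the relative per-GB cost. The base price is a hypothetical placeholder, not an actual S3 Tables rate; only the 40 percent and 68 percent discounts come from the announcement.
# Hypothetical Frequent Access price in USD per GB-month (placeholder, not a real rate)
FREQUENT=0.0265
# Infrequent Access is 40% cheaper than Frequent Access
INFREQUENT=$(echo "$FREQUENT * 0.60" | bc -l)
# Archive Instant Access is 68% cheaper than Infrequent Access
ARCHIVE=$(echo "$INFREQUENT * 0.32" | bc -l)
printf "Frequent: %.4f  Infrequent: %.4f  Archive Instant: %.4f (USD/GB-month)\n" \
"$FREQUENT" "$INFREQUENT" "$ARCHIVE"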
Table maintenance operations, including compaction, snapshot expiration, and removal of unreferenced files, continue to work without affecting access to your data. Compaction automatically processes only data in the Frequent Access tier, optimizing performance for actively queried data while reducing maintenance costs by skipping colder files in the lower-cost tiers.
By default, all existing tables use the Standard storage class. When creating new tables, you can specify Intelligent-Tiering as the storage class, or you can rely on the default storage class configured at the table bucket level. You can set Intelligent-Tiering as the default storage class for your table bucket so that tables are automatically stored in Intelligent-Tiering when no storage class is specified at creation.
Let me show you how it works
You can use the AWS Command Line Interface (AWS CLI) and the put-table-bucket-storage-class and get-table-bucket-storage-class commands to change or verify the storage class of your S3 table bucket.
# Change the storage class
aws s3tables put-table-bucket-storage-class \
--table-bucket-arn $TABLE_BUCKET_ARN \
--storage-class-configuration storageClass=INTELLIGENT_TIERING
# Verify the storage class
aws s3tables get-table-bucket-storage-class \
--table-bucket-arn $TABLE_BUCKET_ARN
{
  "storageClassConfiguration": {
    "storageClass": "INTELLIGENT_TIERING"
  }
}
S3 Tables replication support
The new S3 Tables replication support helps you maintain consistent read replicas of your tables across AWS Regions and accounts. You specify a destination table bucket and the service creates read-only replica tables. It replicates all updates in order, preserving the relationships between parent and child snapshots. Table replication helps you create global datasets to minimize query latency for geographically distributed teams, meet compliance requirements, and provide data protection.
You can now easily create replicated tables that provide similar query performance to their source tables. Replica tables are updated within minutes of updates to the source tables and support independent encryption and retention policies from the source tables. Replica tables can be queried using Amazon SageMaker Unified Studio or any Iceberg-compatible engine including DuckDB, PyIceberg, Apache Spark, and Trino.
You can create and maintain replicas of your tables through the AWS Management Console or the AWS APIs and SDKs. You specify one or more destination table buckets to which your source tables are replicated. When you turn on replication, S3 Tables automatically creates read-only replica tables in the destination table buckets, populates them with the latest state of the source table, and continuously monitors for new updates to keep the replicas in sync. This helps you meet time travel and auditing requirements while maintaining multiple replicas of your data.
Let me show you how it works
To show you how it works, I’ll walk through three steps. First, I create an S3 table bucket, create an Iceberg table, and populate it with data. Second, I configure replication. Third, I connect to the replicated table and query the data to show you that the changes are replicated.
For this demo, the S3 team kindly gave me access to an already configured Amazon EMR cluster. You can follow the Amazon EMR documentation to create your own cluster. They also created two S3 table buckets, the replication source and destination. Again, the S3 Tables documentation will help you get started.
I note the Amazon Resource Name (ARN) of the source table and the ARN of the destination table bucket. In this demo, I store them in the environment variables SOURCE_TABLE_ARN and DEST_TABLE_ARN.
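As a minimal sketch, here is one way to set those variables. The destination bucket ARN is copied from the console, and the source table ARN is looked up with get-table once the table from the first step exists; the account ID, bucket, namespace, and table names are placeholders matching this demo.
# Destination table bucket (copy the ARN from the S3 console)
export DEST_TABLE_ARN="arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst"
# Source table ARN, looked up once the table exists (see the first step)
export SOURCE_TABLE_ARN=$(aws s3tables get-table \
--table-bucket-arn "arn:aws:s3tables:us-east-1:012345678901:bucket/aws-news-blog-test" \
--namespace test \
--name aws_news_blog \
--query tableARN --output text)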
First step: Prepare the source table
I fire up a terminal, connect to the EMR cluster, start a Spark session, create a table, and insert a row of data. The commands I use in this sample are documented in Accessing Tables Using the Amazon S3 Tables Iceberg REST Endpoint.
sudo spark-shell \
--packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160" \
--master "local[*]" \
--conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
--conf "spark.sql.defaultCatalog=s3tablesbucket" \
--conf "spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog" \
--conf "spark.sql.catalog.s3tablesbucket.type=rest" \
--conf "spark.sql.catalog.s3tablesbucket.uri=https://s3tables.us-east-1.amazonaws.com/iceberg" \
--conf "spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:012345678901:bucket/aws-news-blog-test" \
--conf "spark.sql.catalog.s3tablesbucket.rest.sigv4-enabled=true" \
--conf "spark.sql.catalog.s3tablesbucket.rest.signing-name=s3tables" \
--conf "spark.sql.catalog.s3tablesbucket.rest.signing-region=us-east-1" \
--conf "spark.sql.catalog.s3tablesbucket.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
--conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" \
--conf "spark.sql.catalog.s3tablesbucket.rest-metrics-reporting-enabled=false"
spark.sql("""
CREATE TABLE s3tablesbucket.test.aws_news_blog (
customer_id STRING,
address STRING
) USING iceberg
""")
spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust1', 'val1')")
spark.sql("SELECT * FROM s3tablesbucket.test.aws_news_blog LIMIT 10").show()
+-----------+-------+
|customer_id|address|
+-----------+-------+
| cust1| val1|
+-----------+-------+
So far so good.
Second step: Configure replication for S3 Tables
Now I use the AWS CLI on my laptop to configure S3 table bucket replication.
Before doing so, I create an AWS Identity and Access Management (IAM) policy that gives the replication service access to my table buckets and encryption keys. See the S3 Tables replication documentation for details. The permissions I used for this demo are:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "s3tables:*",
        "kms:DescribeKey",
        "kms:GenerateDataKey",
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}
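If you need to create the role itself, the following sketch shows the general shape: create a role whose trust policy lets the replication service assume it, then attach the permissions policy above. The file names and policy name here are illustrative assumptions; check the S3 Tables replication documentation for the exact service principal to put in the trust policy.
# Create the role; trust.json holds a trust policy for the replication
# service principal (see the documentation for the exact principal)
aws iam create-role \
--role-name S3TableReplicationManualTestingRole \
--assume-role-policy-document file://trust.json
# Attach the permissions policy shown above (saved as permissions.json)
aws iam put-role-policy \
--role-name S3TableReplicationManualTestingRole \
--policy-name S3TablesReplicationPermissions \
--policy-document file://permissions.json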
After creating this IAM policy, I can now go ahead and configure replication:
aws s3tables-replication put-table-replication \
--table-arn "${SOURCE_TABLE_ARN}" \
--configuration '{
  "role": "arn:aws:iam::012345678901:role/S3TableReplicationManualTestingRole",
  "rules": [
    {
      "destinations": [
        {
          "destinationTableBucketARN": "'"${DEST_TABLE_ARN}"'"
        }
      ]
    }
  ]
}'
Replication starts automatically. Updates typically replicate within minutes. The time required to complete depends on the amount of data in the source table.
Third step: Connect to the replicated table and query the data
Now I reconnect to the EMR cluster and start a second Spark session. This time, I use the destination table bucket.
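As a sketch, the second session uses the same spark-shell command as the first; the only flag that should need to change is the warehouse, now pointing at the destination table bucket.
# Same spark-shell flags as the first session, except the warehouse
# (the uri and signing-region would also change for a cross-Region destination)
--conf "spark.sql.catalog.s3tablesbucket.warehouse=${DEST_TABLE_ARN}"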

To verify that replication is working, I insert a second row of data into the source table.
spark.sql("INSERT INTO s3tablesbucket.test.aws_news_blog VALUES ('cust2', 'val2')")
I wait a few minutes for the replication to happen. I monitor the state of the replication using the get-table-replication-status command.
aws s3tables-replication get-table-replication-status \
--table-arn ${SOURCE_TABLE_ARN}
{
  "sourceTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test/table/e0fce724-b758-4ee6-85f7-ca8bce556b41",
  "destinations": [
    {
      "replicationStatus": "pending",
      "destinationTableBucketArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst",
      "destinationTableArn": "arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst/table/5e3fb799-10dc-470d-a380-1a16d6716db0",
      "lastSuccessfulReplicatedUpdate": {
        "metadataLocation": "s3://e0fce724-b758-4ee6-8-i9tkzok34kum8fy6jpex5jn68cwf4use1b-s3alias/e0fce724-b758-4ee6-85f7-ca8bce556b41/metadata/00001-40a15eb3-d72d-43fe-a1cf-84b4b3934e4c.metadata.json",
        "timestamp": "2025-11-14T12:58:18.140281+00:00"
      }
    }
  ]
}
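If you prefer not to re-run the command by hand, here is a minimal polling sketch built on the same command. It relies only on the replicationStatus field shown in the output above.
# Poll every 30 seconds until the first destination is no longer pending
while true; do
  STATUS=$(aws s3tables-replication get-table-replication-status \
    --table-arn ${SOURCE_TABLE_ARN} \
    --query 'destinations[0].replicationStatus' --output text)
  echo "replication status: ${STATUS}"
  [ "${STATUS}" != "pending" ] && break
  sleep 30
done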
When the replication status shows ready, I connect to the EMR cluster and query the destination table. Unsurprisingly, I see the new row of data.

Other things you should know
Here are some other points to pay attention to:
- Replication for S3 Tables supports both Apache Iceberg V2 and V3 table formats, giving you flexibility in choosing the table format.
- You can configure replication at the table bucket level, making it easy to replicate all tables in that bucket without configuring individual tables.
- Replica tables retain the storage class you choose for the destination table bucket, meaning you can optimize for your specific cost and performance needs.
- Any Iceberg-compatible catalog can query your replica tables directly without any additional coordination, as shown in the sketch after this list; it just needs to point to the location of the replica table. This gives you flexibility in your choice of query engines and tools.
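As an illustration, here is a minimal sketch of querying the replica from DuckDB through its iceberg extension's Amazon S3 Tables support. The attach syntax reflects my understanding of the DuckDB iceberg extension and may need adjusting to your DuckDB version; the bucket ARN is the destination bucket from this demo.
duckdb <<'SQL'
INSTALL iceberg; LOAD iceberg;
-- Use credentials from the default AWS credential chain
CREATE SECRET (TYPE s3, PROVIDER credential_chain);
-- Attach the destination table bucket as an Iceberg catalog
ATTACH 'arn:aws:s3tables:us-east-1:012345678901:bucket/manual-test-dst'
    AS replica (TYPE iceberg, ENDPOINT_TYPE s3_tables);
SELECT * FROM replica.test.aws_news_blog;
SQL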
Pricing and availability
You can monitor storage usage by access tier through AWS Cost and Usage Reports and Amazon CloudWatch metrics. To monitor replication, AWS CloudTrail logs provide events for each replicated object.
There are no additional fees to configure Intelligent-Tiering. You only pay the storage cost of each tier. Your tables continue to work as before, with automatic cost optimization based on your access patterns.
For S3 Tables replication, you pay S3 Tables charges for storage in the destination table bucket, replication PUT requests, table updates (commits), and object monitoring on the replicated data. When replicating a table across Regions, you also pay for inter-Region data transfer out from Amazon S3 to the destination Region, based on the Region pair.
As usual, see the Amazon S3 pricing page for details.
Both features are available today in all AWS Regions where Amazon S3 Tables is available.
To learn more about these new features, visit the Amazon S3 Tables documentation or try them out in the Amazon S3 console today. Share your feedback through AWS re:Post for Amazon S3 or through your usual AWS Support contacts.
— seb