The Data Lifecycle optimizer adds data retention and storage tiering to Iceberg tables. It operates at the partition level, automatically deleting or archiving old data to reduce storage costs and support compliance requirements. Data Lifecycle supports two complementary capabilities:
  • Retention - permanently delete old partitions based on age
  • Tiering - archive old partitions to low-cost storage
Both capabilities are optional and can be enabled independently or together.

Requirements

To enable Data Lifecycle on a table, the following requirements must be met:
  • The table must be partitioned
  • A partition key must be selected to represent partition age. Supported data types are date, time, timestamp, timestamptz.
  • Lifecycle policies apply only to entire partitions, not individual rows

Data Retention

Retention permanently deletes partitions that exceed a configured age threshold. Retention is commonly used to:
  • Enforce compliance policies (e.g. GDPR, data minimization)
  • Cap table growth
  • Remove data that is no longer operationally or analytically useful
Partitions are removed from the table once they pass the threshold. The underlying data files are physically deleted only after all referencing snapshots are expired, based on the table’s snapshot lifecycle configuration. This ensures correctness and consistency across readers and engines.
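The age check behind retention can be sketched in a few lines of Python. This is an illustration only, not the optimizer's implementation: the function name `partitions_past_threshold`, the one-partition-per-day layout, and the strict "older than" comparison are all assumptions made for the example.

```python
from datetime import date, timedelta


def partitions_past_threshold(partition_days, retention_days, today):
    """Return the partition dates whose age exceeds retention_days.

    Hypothetical helper: assumes one partition per calendar day and a
    strict "older than" comparison. The real optimizer may differ.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_days if d < cutoff)


partitions = [date(2024, 1, 1), date(2024, 6, 1), date(2024, 12, 1)]
expired = partitions_past_threshold(
    partitions, retention_days=365, today=date(2025, 3, 1)
)
print(expired)
```

With a 365-day threshold evaluated on 2025-03-01, only the 2024-01-01 partition falls before the cutoff. Whole partitions are selected, never individual rows.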

Data Tiering

Tiering archives old partitions by moving their data files to lower-cost cold storage, reducing storage costs by up to 95%. The partition’s data is not readable until it is restored. Tiering is designed for data that:
  • Must be retained for compliance or resilience reasons
  • Is rarely accessed (e.g. once or twice per year)
  • Does not need to remain immediately queryable
Tiering behavior and considerations:
  • Tiering moves data files only; Iceberg metadata remains intact. Archived partitions continue to appear in Iceberg metadata tables.
  • Queries that attempt to read archived partitions will fail until the data is restored.
  • Write operations that scan archived data (including updates, merges, and most deletes) will fail.
  • Tiering runs daily. If new data is written to a partition that is already archived, it may remain accessible briefly before being archived.
  • Restoring older snapshots does not automatically restore archived partitions; archived data remains archived until explicitly restored.
  • Shortening the archiving period (e.g. from 240 days to 180 days) does not un-archive partitions that were already archived.
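Because reads of archived partitions fail, it can help to check a query's time range against the archived partitions before running it. The Python sketch below shows one way to do that. It assumes day-granularity partitions and a set of archived partition dates that you supply yourself; the function name `archived_days_in_range` is hypothetical and not part of any product API.

```python
from datetime import date


def archived_days_in_range(archived_days, start, end):
    """Return archived partition dates that fall within [start, end], inclusive.

    Hypothetical helper: archived_days is a collection of dates the caller
    maintains; it is not fetched from any product API here.
    """
    return sorted(d for d in archived_days if start <= d <= end)


archived = {date(2024, 1, 15), date(2024, 2, 1)}
blocked = archived_days_in_range(archived, date(2024, 1, 1), date(2024, 1, 31))
if blocked:
    print("Restore these partitions before querying:", blocked)
```

An empty result means the query range touches no archived partition, so the read should not fail for that reason.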

Restoring Archived Partitions

Archived partitions can be temporarily restored to make their data readable again. All archived partitions are listed in the Storage tab under Archived Partitions. When restoring, users can select:
  • Which partitions to restore (by time range)
  • How long to restore them for (1-30 days)
  • Restoration speed
    • Standard - takes up to 12 hours to restore. Cloud provider cost: $0.10 per 1000 objects, $0.02 per GiB.
    • Bulk - takes up to 48 hours to restore. Cloud provider cost: $0.025 per 1000 objects, $0.0025 per GiB.
Before confirming the restoration, you can review the estimated cloud costs for the operation. The estimate covers both the cost of restoring the files (based on total object count and size) and the cost of temporarily storing the restored copies.
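As a rough worked example, the restore portion of that estimate can be approximated from the per-object and per-GiB rates listed above. The sketch below assumes the charge is a simple linear sum of the two rates. The function name `estimate_restore_cost` and the sample object count and size are made up for illustration, and the temporary storage cost is left out because no rate for it is given here.

```python
# Restore rates quoted in the list above, in USD:
# (cost per 1,000 objects, cost per GiB).
RESTORE_RATES = {
    "standard": (0.10, 0.02),
    "bulk": (0.025, 0.0025),
}


def estimate_restore_cost(object_count, size_gib, speed):
    """Estimate the cloud provider's restore charge in USD.

    Assumes a linear sum of the per-object and per-GiB rates. Excludes the
    cost of temporarily storing the restored files.
    """
    per_1000_objects, per_gib = RESTORE_RATES[speed]
    return object_count / 1000 * per_1000_objects + size_gib * per_gib


standard = estimate_restore_cost(200_000, 500, "standard")
bulk = estimate_restore_cost(200_000, 500, "bulk")
print(f"standard: ${standard:.2f}, bulk: ${bulk:.2f}")
```

For a hypothetical restore of 200,000 objects totalling 500 GiB, this comes to $30.00 at Standard speed and $6.25 at Bulk speed. The figure shown in the product before confirmation is the one to rely on.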
Note: restoration is temporary. Restored partitions are automatically re-archived when the restoration period expires. If restored data needs to remain permanently accessible, the recommended approach is to copy it to a new table. Example:
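CREATE TABLE raw_events_2024_01_to_2024_12
   AS SELECT *
        FROM raw_events
       -- Half-open range covering all of 2024; filters on the timestamp directly.
       WHERE event_time >= TIMESTAMP '2024-01-01 00:00:00'
         AND event_time <  TIMESTAMP '2025-01-01 00:00:00'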