Apache Iceberg - Under the Hood

In this video, Dipankar breaks down how Apache Iceberg works under the hood, from the limitations of Hive-style tables to why Iceberg was built in the first place. He covers:

✅ Why Hive-based tables break at scale (Netflix example)
✅ How object storage changes the problem (S3 behavior, listing, throttling)
✅ Iceberg architecture (catalog, metadata, snapshots, manifests, data files)
✅ How query planning works step by step
✅ Why Iceberg is a specification — not an execution engine

Join the Cloudera Community to learn more! 👉 https://community.cloudera.com
Explore the Full Series: 👉 https://www.youtube.com/playlist

Chapters:

00:00 Introduction: Cloudera Developers & Learning Journey

00:45 What to Expect: Deep Dive into Apache Iceberg Internals

03:30 The Problem: Scaling Challenges & Expensive Updates

05:20 The Netflix Origin Story: Why Iceberg was Born

11:15 From Directories to Metadata: The Big Architectural Shift

13:10 Inside the Iceberg Architecture: The Catalog

15:35 Demo: Inspecting a Metadata File

19:30 Metadata Components

24:10 Summary: Turning Object Stores into Databases

#ApacheIceberg #DataEngineering #CloudComputing #OpenLakehouse #Cloudera #DataLakehouse #DataArchitecture #SoftwareEngineering #OpenSource #ApacheHive