Ep 63 | Open Lakehouse Architecture: How to Scale AI to Production
Open lakehouse architecture is becoming the foundation for running enterprise AI in production at scale.
In this episode of The AI Forecast, Dipankar Mazumdar, Director of Developer Relations at Cloudera and co-author of the book “Engineering Lakehouses with Open Table Formats,” joins Paul Muller to explain why open lakehouse architecture is critical for moving AI from pilot to production.
They break down:
✅ How Apache Iceberg and other open table formats decouple storage from compute
✅ How schema evolution enables change without costly data rewrites
✅ How multiple engines can securely access the same data without duplication
✅ How to prevent small-file performance bottlenecks
✅ How to control AI compute costs at scale
✅ How to embed governance, metadata, and data lineage into AI workloads
Production-ready AI requires a scalable data architecture with governance built in from day one. AI and GenAI pilots may be everywhere, but your architecture decides which ones survive.
Stay in touch with Dipankar:
- Dipankar Mazumdar on LinkedIn: https://www.linkedin.com/in/dipankar-mazumdar/
- Dipankar’s website: https://dipankarmazumdar.github.io/
- Dipankar’s book on Amazon: https://www.amazon.com/Engineering-Lakehouses-Open-Table-Formats-ebook/dp/B0DKJD39X8
Links & Resources:
- 👉 Full Series Playlist: https://www.youtube.com/playlist
- 👉 Learn more: https://www.cloudera.com/resources/podcast/the-ai-forecast.html
Chapters:
00:00 Intro & Welcome to The AI Forecast
01:44 The Fast Four: Getting to Know Dipankar Mazumdar
06:51 3 Best Practices for Working With Data
09:27 Dipankar's Journey to Developer Advocacy
14:10 What Exactly Is a Data Lakehouse?
20:39 Why Write a Book on Open Table Formats?
25:35 Common Misconceptions in Lakehouse Adoption
28:43 Anti-Patterns: Streaming Workloads & The "Small Files" Problem
34:26 Balancing Cloud Costs & Context Engineering for AI
39:31 Connecting Lakehouse Architecture to Business Value
41:52 Why Governance and Lineage Are Non-Negotiable
44:22 Operational Pain Points: Moving from POC to Production
47:25 The Future: How Lakehouses Power Generative AI
51:09 Conclusion & Where to Learn More
Connect with Cloudera:
- Subscribe to stay ahead of the curve with the latest in data strategy, open architectures, and enterprise AI innovations. https://www.cloudera.com
- LinkedIn ► https://www.linkedin.com/company/cloudera
- Facebook ► https://www.facebook.com/cloudera/
- X ► https://x.com/cloudera
- Spotify ► https://open.spotify.com/show/102S8zoZR6nmZV0HxZlxZu
#DataLakehouse #ApacheIceberg #DataEngineering #Cloudera #OpenTableFormats #DataArchitecture #GenerativeAI #TechPodcast #DataScience