Data Lakes

Open Data Lakehouse powered by Apache Iceberg on Apache Ozone

With minimal setup, it is simple to get started with Iceberg on Ozone in CDP Private Cloud. This ability allows you to reap the benefits of both a powerful exabyte-scale storage system and an optimized table format for petabyte-scale analytics. In this video, I'm going to demonstrate how to create, upgrade, and use Iceberg tables on Ozone in CDP Private Cloud. Iceberg is engine agnostic and works with most analytic query engines, such as Hive, Impala, and Spark.

Educating ChatGPT on Data Lakehouse

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology. However, caution is necessary when delving deeper into a particular technology.

Snowflake Workloads Explained: Data Lakes

Snowflake’s cross-cloud platform breaks down silos by supporting a variety of data types and storage patterns. Data engineers, data scientists, analysts, and developers across organizations can access governed structured, semi-structured, and unstructured data for a variety of workloads, without resource contention or concurrency issues.

Isn't the Data Warehouse the Same Thing as the Data Lakehouse?

A data lakehouse is a data storage repository designed to store both structured data and data from unstructured sources. It allows users to access data stored in different forms, such as text files, CSV or JSON files. Data stored in a data lakehouse can be used for analysis and reporting purposes.


From Data Warehouse to Lakehouse

This is a guest post written by Bill Inmon, an American computer scientist recognized as the "father of the data warehouse." Inmon wrote the first book and first magazine column about data warehousing, held the first conference about this topic, and was the first person to teach data warehousing classes.


How to Integrate BI and Data Visualization Tools with a Data Lake

For the past 30 years, the primary data source for business intelligence (BI) and data visualization tools has generally been either a data warehouse or a data mart. But as enterprises today struggle to cope with the growing complexity, scale, and speed of data, it’s becoming clear that the data tools of 30 years ago weren’t designed to handle the enterprise data management challenges of today, especially given the growing variety and volume of data that enterprises are generating.


From Data Lake to Data Mesh: How Data Mesh Benefits Businesses

Current data architecture is going through a revolution. Enterprises are starting to shift away from the monolithic data lake towards something less centralized: data mesh. It’s a relatively new concept, first coined in 2019, that addresses potential issues with data warehouses and data lakes that can cause businesses to be slow, unresponsive, or even suffer from data silos. What is a data mesh, and how could it benefit your business?


All the Features A Robust Data Lake Should Have

From databases to data warehouses and, finally, to data lakes, the data landscape is changing rapidly as volumes and sources of data increase. With a projected compound annual growth rate of almost 30%, the data lake market is expected to grow from USD 3.74 billion in 2020 to USD 17.6 billion by 2026. Also, from the 2022 Data and AI Summit, it is clear that data lake architecture is the future of data management and governance.

Modern Data Architectures | Data Mesh, Data Fabric, & Data Lakehouse

For years, companies have viewed data the wrong way. They see it as the byproduct of a business interaction, and this data often ends up collecting dust in centralized silos governed by data teams who lack the expertise to understand its true value. Cloudera is ushering in a new era of data architecture by allowing experts to organize and manage their own data at the source. Data mesh brings all your domains together so each team can benefit from each other’s data.