
Data Lakes

Data Lake vs Data Warehouse

Data warehouses and data lakes represent two of the leading solutions for enterprise data management in 2023. While the two may share overlapping features and use cases, they differ fundamentally in data management philosophy, design characteristics, and the conditions under which each works best.

Data Lake Architecture & The Future of Log Analytics

Organizations are leveraging log analytics in the cloud for a variety of use cases, including application performance monitoring, troubleshooting cloud services, user behavior analysis, security operations and threat hunting, forensic network investigation, and supporting regulatory compliance initiatives. But with enterprise data growing at astronomical rates, organizations are finding it increasingly costly, complex, and time-consuming to capture, securely store, and efficiently analyze their log data.

10 AWS Data Lake Best Practices

A data lake is the perfect solution for storing and accessing your data and enabling data analytics at scale, but do you know how to make the most of your AWS data lake? In this week’s blog post, we’re offering 10 data lake best practices that can help you optimize your AWS S3 data lake setup and data management workflows, decrease time-to-insights, reduce costs, and get the most value from your AWS data lake deployment.
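To make the idea of an optimized S3 setup concrete, here is a minimal sketch of one widely cited convention: writing raw data under consistent, date-partitioned prefixes so downstream query engines can prune what they scan. The bucket name, prefix layout, and dataset name are hypothetical, and this is only an illustrative example of the kind of practice covered, not an excerpt from the post.

```python
# Hypothetical sketch: landing newline-delimited JSON events in an S3 data lake
# under a dt=YYYY-MM-DD partition prefix. Bucket, prefix scheme, and dataset
# names are illustrative assumptions.
import datetime
import json
import boto3

s3 = boto3.client("s3")

def write_event_batch(events, bucket="example-data-lake", dataset="clickstream"):
    """Write a batch of JSON events under a date-partitioned prefix."""
    now = datetime.datetime.utcnow()
    key = f"raw/{dataset}/dt={now:%Y-%m-%d}/events-{now:%H%M%S}.json"
    body = "\n".join(json.dumps(event) for event in events)
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key
```

Keeping partitions predictable like this is also what lets catalog and table-format layers (Glue, Athena, Iceberg, and the like) discover and prune data efficiently later on.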

The Pros and Cons of Data Mesh vs Data Lake

Data has become the lifeblood of modern businesses, and organizations are constantly looking for ways to extract more value from it. While there isn’t a one-size-fits-all solution for data management, organizations tend to take some common approaches. Two that have recently gained popularity are the data mesh and the data lake, both adopted by teams that want to avoid silos so they can make data-driven decisions.

Transforming Manufacturing Data: The Power of Qlik and Databricks Together

Manufacturing is undergoing a massive transformation, driven by technological advancements that generate vast amounts of data. The industry is moving toward becoming smarter, more sustainable, and services-driven. However, the fragmented nature of manufacturing’s data architecture has created barriers to realizing the full value of that data, with many projects stalling at the proof-of-concept stage.

Data lake vs. data mesh: Which one is right for you?

What’s the right way to manage growing volumes of enterprise data, while providing the consistency, data quality and governance required for analytics at scale? Is centralizing data management in a data lake the right approach? Or is a distributed data mesh architecture right for your organization? When it comes down to it, most organizations seeking these solutions are looking for a way to analyze data without having to move or transform it via complex extract, transform and load (ETL) pipelines.

Educating ChatGPT on Data Lakehouse

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology. However, caution is necessary when delving deeper into a particular technology.

Open Data Lakehouse powered by Apache Iceberg on Apache Ozone

With minimal setup, it is simple to get started with Iceberg on Ozone in CDP Private Cloud. This lets you reap the benefits of both a powerful exabyte-scale storage system and an optimized table format for petabyte-scale analytics. In this video, I demonstrate how to create, upgrade, and use Iceberg tables on Ozone in CDP Private Cloud. Iceberg is engine agnostic and works with most analytic query engines, such as Hive, Impala, and Spark.
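As a rough sketch of what "create and use Iceberg tables" looks like from one of those engines, here is a minimal PySpark example. The catalog name, database and table names, and the Ozone warehouse URI are assumptions for illustration; CDP Private Cloud provisions and configures its own catalogs, so the video's exact steps will differ.

```python
# Minimal PySpark sketch: creating and querying an Iceberg table whose
# warehouse lives on Ozone (ofs://). Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-on-ozone-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "ofs://ozone1/vol1/bucket1/warehouse")
    .getOrCreate()
)

# Create an Iceberg table, insert a few rows, and read them back.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, msg STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'hello'), (2, 'world')")
spark.sql("SELECT * FROM demo.db.events").show()
```

Because the table format is engine agnostic, the same table could then be queried from Hive or Impala without rewriting the data.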

Snowflake Workloads Explained: Data Lakes

Snowflake’s cross-cloud platform breaks down silos by supporting a variety of data types and storage patterns. Data engineers, data scientists, analysts, and developers across organizations can access governed structured, semi-structured, and unstructured data for a variety of workloads, without resource contention or concurrency issues.

Isn't the Data Warehouse the Same Thing as the Data Lakehouse?

A data lakehouse is a data storage repository designed to hold structured data alongside semi-structured and unstructured sources. It allows users to access data stored in different forms, such as text, CSV, and JSON files, and data stored in a data lakehouse can be used directly for analysis and reporting.
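For a sense of what "accessing data in different forms" means in practice, here is a short, hedged sketch using Spark to read the file types mentioned above from a lakehouse storage location. The paths and bucket name are hypothetical, and any real lakehouse would layer a table format and governance on top of these raw reads.

```python
# Illustrative sketch: reading text, CSV, and JSON files from object storage
# with Spark. Paths are hypothetical assumptions, not a specific product's API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-read-demo").getOrCreate()

# Structured and semi-structured files land in the same object store...
csv_df = spark.read.option("header", True).csv("s3a://example-lakehouse/raw/orders/*.csv")
json_df = spark.read.json("s3a://example-lakehouse/raw/events/*.json")
text_df = spark.read.text("s3a://example-lakehouse/raw/logs/*.log")

# ...and can then be cleaned, joined, and written to managed tables for
# analysis and reporting.
csv_df.printSchema()
json_df.printSchema()
text_df.show(5, truncate=False)
```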