
Data Lakes

Diving Deep Into a Data Lake

A data lake is a massive repository of data stored in structured, semi-structured, unstructured, or raw form. Its purpose is to consolidate data into one destination and make it usable for data science and analytics workloads. This data is used for observational, computational, and scientific purposes. Such a repository makes it easier for AI models to gather data from various sources and build systems that can make informed decisions.
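To make the idea of consolidation concrete, here is a minimal PySpark sketch that lands a structured CSV export and semi-structured JSON events in one lake location; the bucket name and paths are hypothetical placeholders, and a Spark environment with S3 connectivity is assumed.

```python
# A minimal sketch of landing heterogeneous sources in one lake location.
# The bucket name and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

# Structured source: CSV export from an operational database
orders = spark.read.option("header", True).csv("s3a://example-lake/raw/orders.csv")

# Semi-structured source: JSON event logs
events = spark.read.json("s3a://example-lake/raw/events/*.json")

# Both land in the same lake zone as columnar files, ready for
# downstream analytics and model training
orders.write.mode("append").parquet("s3a://example-lake/curated/orders/")
events.write.mode("append").parquet("s3a://example-lake/curated/events/")
```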

Data Lakes: The Achilles Heel of the Big Data Movement

Big Data started as a replacement for data warehouses. The Big Data vendors are loath to mention this fact today, but if you were around in the early days of Big Data, one of the central topics of discussion was whether you still needed a data warehouse if you had Big Data. From a marketing standpoint, Big Data was sold as a replacement for the data warehouse. With Big Data, you were free from all that messy stuff that data warehouse architects were doing.

Cloudera's Open Data Lakehouse Supercharged with dbt Core™

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD).
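As a hedged illustration of that workflow, the sketch below drives the dbt CLI from a small Python script the way a CI job might. It assumes dbt is installed, the script runs from a dbt project directory, and a target named `ci` exists in profiles.yml; those names are assumptions, not part of any specific setup.

```python
# A minimal CI-style sketch, assuming the dbt CLI is installed and the script
# runs inside a dbt project with a "ci" target defined in profiles.yml.
import subprocess
import sys

def run(cmd: list[str]) -> None:
    """Run a dbt command and fail the pipeline if it fails."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(result.returncode)

# Install any packages the project depends on
run(["dbt", "deps"])

# Build the SQL models into the warehouse/lakehouse
run(["dbt", "run", "--target", "ci"])

# Run schema and data tests so only trusted data sets are promoted
run(["dbt", "test", "--target", "ci"])
```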

What Challenges Are Hindering the Success of Your Data Lake Initiative?

Conventional databases are no longer the appropriate solution in a world where data volume grows every second. Many modern businesses are adopting big data technologies such as data lakes to cope with that volume and velocity. Data lake infrastructures such as Apache Hadoop are designed to handle data at very large scale. These infrastructures offer benefits such as data replication for enhanced protection and multi-node computing for faster data processing.
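For illustration, here is a minimal PySpark sketch against a hypothetical HDFS-backed lake: the `spark.hadoop.dfs.replication` property is forwarded to the Hadoop configuration and controls how many copies HDFS keeps of each block, while the aggregation itself runs in parallel across the cluster's nodes. The paths, field names, and replication factor are assumptions.

```python
# A minimal sketch, assuming a reachable Hadoop/HDFS cluster; the paths and
# the "timestamp" field in the events are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hdfs-lake-demo")
    # spark.hadoop.* properties are forwarded to the Hadoop configuration;
    # dfs.replication controls how many copies HDFS keeps of each block
    .config("spark.hadoop.dfs.replication", "3")
    .getOrCreate()
)

# Raw clickstream data stored on HDFS, processed in parallel across nodes
clicks = spark.read.json("hdfs:///lake/raw/clickstream/")

daily_counts = (
    clicks.groupBy(F.to_date("timestamp").alias("day"))
          .count()
)

daily_counts.write.mode("overwrite").parquet("hdfs:///lake/curated/daily_clicks/")
```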

Choose Both: Data Fabric and Data Lakehouse

A key part of business is the drive for continual improvement, to always do better. “Better” can mean different things to different organizations: it could be about offering better products, better services, or the same product or service at a better price, among any number of possibilities. Fundamentally, being “better” requires ongoing analysis of the current state and comparison with previous or future states. It sounds straightforward: you just need data and the means to analyze it.

The Modern Data Lakehouse: An Architectural Innovation

Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from structured and unstructured data working together, without having to beg for data sets to be made available.

5 Insights from Gartner's Hype Cycle for Data Management 2022 Report

As a global leader in technology research, Gartner supports enterprise organizations, non-profits, and government agencies by sharing information and in-depth analysis of emerging technological trends, tools, and products. With the continued growth of big data over the past decade, Gartner has been especially invested in helping data and analytics (D&A) leaders make the right decisions for managing and generating value from data within their organizations.

8 Reasons to Build Your Cloud Data Lake on Snowflake

You want to enable analytics, data science, or applications with data so you can answer questions, predict outcomes, discover relationships, or grow your business. But to do any of that, data must be stored in a manner that supports these outcomes. This may be a simple decision when supporting a small, well-known use case, but it quickly becomes complicated as you scale data volume, variety, workloads, and use cases.

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, that helps users avoid vendor lock-in. Today’s general availability announcement covers Iceberg running within key CDP data services, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML).
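As a rough sketch (not the CDP-specific setup), the following PySpark snippet creates and queries an Iceberg table through a Hadoop catalog; the catalog name, warehouse path, and table schema are assumptions, and the matching iceberg-spark runtime jar must be on the Spark classpath.

```python
# A minimal sketch of creating and querying an Apache Iceberg table with
# Spark SQL. Catalog name, warehouse path, and schema are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "hdfs:///lake/iceberg")
    .getOrCreate()
)

# Create an Iceberg table; the open table format keeps schema, snapshots,
# and partition metadata alongside the data files
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.db.orders (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE,
        order_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(order_ts))
""")

spark.sql(
    "INSERT INTO lake.db.orders "
    "VALUES (1, 42, 19.99, TIMESTAMP '2022-07-01 10:00:00')"
)

# Any engine that understands Iceberg metadata can query the same table
spark.sql(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM lake.db.orders GROUP BY customer_id"
).show()
```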