Data Warehouses

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions. This benchmark is run on EMR version 6.0 as we couldn’t get queries to run successfully on version 6.1.0.

How to Migrate Your Enterprise Data Warehouse to a Cloud Data Warehouse

Migrating a data warehouse from a legacy environment requires a massive upfront investment in resources and time. There is a lot to consider before and during migration. You may need to replan your data model, use a separate platform for task scheduling, and handle changes in the application’s database driver. Therefore, organizations must take a strategic approach to streamline the process. This article presents a step-by-step approach for migrating a data warehouse to the cloud.

How a Discovery Data Warehouse, the next evolution of augmented analytics, accelerates treatments and delivers medicines safely to patients in need

I met Matthew in New York City about a year ago. We sat in a private conference room and he told me the story of his pharma startup. A small group of researchers set out to solve the black-box enigma of certain kinds of vicious cancers. Because there are so many cancers, their vision was to focus on especially heinous ones. Fast forward to the recent FDA approval of their “Hail Mary” procedure and treatment methodology for stage-four patients of a particular cancer.

Data Exploration & Reporting with Cloudera Data Warehouse

In this video, we’ll go over how you can use Cloudera Public Cloud both to ingest data through Cloudera Data Engineering and to explore it through Hue and Impala within Cloudera Data Warehouse. You'll see how easy it is to run queries that give you insight into your data and how you can use a built-in data visualization tool to create a dashboard to share your results.

Data Lakes vs. Data Warehouses vs. Data Marts

Let’s precisely define the different kinds of data repositories to understand which ones meet your business needs. (October 29, 2020) A data repository serves as a centralized location to combine data from a variety of sources and provides users with a platform to perform analytical tasks. There are several kinds of data repositories, each with distinct characteristics and intended use cases. Let’s discuss the peculiarities and uses of data warehouses, data marts and data lakes.

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Performance is a key criterion, if not the most important one, in choosing a cloud data warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark.

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Some of the most powerful results come from combining complementary superpowers, and the “dynamic duo” of Apache Hive LLAP and Apache Impala, both included in Cloudera Data Warehouse, is further evidence of this. Both Impala and Hive can operate at massive scale, with many petabytes of data. Both are 100% open source, so you can avoid vendor lock-in while you use your favorite BI tools and benefit from community-driven innovation.