Latest Posts

Enabling The Full ML Lifecycle For Scaling AI Use Cases

When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to deploy machine learning models effectively and scale AI use cases. When businesses start their journey into ML and AI, they commonly focus much of their energy on the coding and data science algorithms themselves.

Cloudera Replication Plugin enables cross-platform replication for Apache HBase

The Cloudera Data Platform (CDP) is the latest Big Data offering from Cloudera. It includes Apache HBase and Phoenix as part of the platform, and these two components are provided in three form factors. Cloudera's Apache HBase customers typically run mission-critical applications that cannot afford any downtime. They need a way to migrate to a new deployment either without a production outage or, at most, with a very brief one.

The role of data in COVID-19 vaccination record keeping

Now that the Pfizer vaccine has been approved by the FDA for use in the US, and the Moderna vaccine likely isn't far behind, we are on the verge of emerging from the social distancing world that began earlier in 2020. Recent news reports have discussed distributing a vaccination record card to everyone who receives a COVID-19 vaccine.

Bringing transaction support to Cloudera Operational Database

We’re excited to share that after adding ANSI SQL, secondary indices, star schema, and view capabilities to Cloudera’s Operational Database, we will be introducing distributed transaction support in the coming months. The ACID model of database design is one of the most important concepts in databases. ACID stands for atomicity, consistency, isolation, and durability. For a very long time, strict adherence to these four properties was required for a commercially successful database.

How does Apache Spark 3.0 increase the performance of your SQL workloads?

Across nearly every sector working with complex data, Spark has quickly become the de facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution (AQE) framework, which addresses issues that have plagued many Spark SQL workloads. These issues were documented in early 2018 in a blog post by a joint Intel and Baidu team.

Top 4 Reasons Why You Should Upgrade Your Stream Processing Workloads To CDP

If there's one thing enterprises learned in 2020, it's how to navigate uncertain times, and in 2021 organizations will likely have to keep navigating a shifting landscape. One trend we've seen this year is that enterprises are using streaming data to get through unplanned disruptions and make the best business decisions for their stakeholders.

COVID Data: An anomalous blip, or the new normal?

COVID-19 has forced virtually every industry to accelerate its digital capabilities. While it can be argued that digital transformation was already underway, it's hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN, shows that worldwide, 58 percent of customer interactions were digital as of July 2020.

3x better performance with CDP Data Warehouse compared to EMR in the TPC-DS benchmark

In a previous blog post on Cloudera Data Warehouse (CDW) performance, we compared Azure HDInsight to CDW. In this blog post, we compare CDW on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Amazon EMR 6.0 (also powered by Apache Hive-LLAP), using the TPC-DS 2.9 benchmark. Amazon recently announced its latest EMR version, 6.1.0, with support for ACID transactions. We ran this benchmark on EMR 6.0 because we could not get queries to run successfully on version 6.1.0.