
Latest News

Challenges of running a big data distro in the cloud

There are many reasons to run a big data distribution, such as Cloudera's Distribution including Apache Hadoop (CDH) or the Hortonworks Data Platform (HDP), in the cloud on Infrastructure-as-a-Service (IaaS). The main reason is agility: when the business needs to onboard a new use case, a data admin can add virtual infrastructure to a cloud cluster in minutes or hours. With an on-prem cluster, adding the capacity for the same use case may take weeks or months.

Evolving Insurance with Data and Analytics

Insurance companies around the world are forging ahead with innovative offerings that are fundamentally changing the insurance landscape, creating personalized products tailored to the specific needs of their customers. For example, they are offering usage-based insurance (UBI) priced on driving habits, miles driven, and driving history, as well as discounts on health insurance based on data from health trackers.

The U.S. Census Enters the Digital Age with Cloudera

2020 brings a new decade, and for the U.S. Census Bureau, a new challenge. As the federal government’s—and the nation’s—leading provider of demographic and economic data, its largest initiative is the U.S. Census, which is conducted every 10 years and counts every resident in the United States. For the first time in U.S. history, the census will be conducted primarily online instead of by mail.

Now Is the Time to Take Stock of Your DataOps Readiness: Are Your Systems Ready?

As the global business climate undergoes rapid change due to the health crisis, the role of data in providing much-needed solutions to urgent problems is being highlighted throughout the world. Having helped customers manage critical modern data systems for years, Unravel is seeing heightened interest in fortifying the reliability of business operations in healthcare, logistics, financial services, and telecommunications.

How do I move data from MySQL to BigQuery?

In a market where streaming analytics is growing in popularity, it’s critical to optimize data processing so you can reduce costs and ensure data quality and integrity. One approach is to work only with the data that has changed instead of all available data. This is where change data capture (CDC) comes in handy: it is the technique that enables this optimized approach.
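To make the idea concrete, here is a minimal, self-contained sketch of the simplest flavor of CDC: diffing two table snapshots and emitting only the changed rows. This is only an illustration of the concept — production MySQL-to-BigQuery pipelines typically read changes from the MySQL binlog (e.g. via a connector such as Debezium) rather than diffing snapshots, and the table rows and key name below are hypothetical.

```python
# Snapshot-diff CDC sketch (illustrative only; real pipelines usually
# tail the MySQL binlog instead of comparing full snapshots).

def capture_changes(old_rows, new_rows, key="id"):
    """Diff two table snapshots and return (inserts, updates, deletes)."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    inserts = [r for k, r in new.items() if k not in old]
    updates = [r for k, r in new.items() if k in old and r != old[k]]
    deletes = [r for k, r in old.items() if k not in new]
    return inserts, updates, deletes

# Hypothetical snapshots of the same table on two consecutive days.
yesterday = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
today = [{"id": 1, "qty": 7}, {"id": 3, "qty": 1}]

ins, upd, dele = capture_changes(yesterday, today)
```

Only the three changed rows (one insert, one update, one delete) would travel downstream to BigQuery, rather than the full table on every run — which is exactly the cost and integrity win the optimized approach is after.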

Supercharge ML models with Distributed XGBoost on CML

Since childhood, we’ve been taught about the power of coalitions: working together to achieve a shared objective. In nature, we see this repeated frequently – swarms of bees, ant colonies, prides of lions – well, you get the idea. It is no different with machine learning models. Research and practical experience show that groups, or ensembles, of models do much better than a single, silver-bullet model. Intuitively, this makes sense.
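A toy sketch can show that intuition in action. XGBoost actually builds its ensemble of trees sequentially via gradient boosting; the snippet below illustrates only the simpler voting intuition, with three entirely hypothetical weak classifiers, each wrong on a different input, that are collectively right:

```python
# Toy ensemble illustration: majority voting over weak classifiers.
# (XGBoost uses boosting, not voting; this only shows why combining
# models beats any single one. All models here are hypothetical.)

def majority_vote(models, x):
    """Return the class label predicted by the most models."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

# Suppose the true (unknown) rule is: label = 1 when x >= 3.
m1 = lambda x: 1 if x > 2 else 0    # too eager: errs on (2, 3)
m2 = lambda x: 1 if x > 3 else 0    # too strict: errs at exactly 3
m3 = lambda x: 1 if x >= 3 else 0   # happens to match the true rule

models = [m1, m2, m3]

# At x = 3 only m2 errs; at x = 2.5 only m1 errs.
# The vote is correct in both cases even though two of the three
# individual models each make a mistake somewhere.
```

Because each model errs in a different region, the ensemble's majority is right where any single model alone might not be — the same principle, in miniature, behind tree ensembles.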

Operational Database Administration

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities; start from the beginning of the series with Operational Database in CDP. This post gives you an overview of the OpDB administration tools and features in the Cloudera Data Platform.

Benchmarking NiFi Performance and Scalability

Ever wonder how fast Apache NiFi is? Ever wonder how well NiFi scales? When a customer is looking to use NiFi in a production environment, these are usually among the first questions asked. They want to know how much hardware they will need, and whether or not NiFi can accommodate their data rates. This isn’t surprising. Today’s world consists of ever-increasing data volumes. Users need tools that make it easy to handle these data rates.

Enabling Olympic-level performance and productivity for Delta Lake on Databricks

Recently, Databricks introduced Delta Lake, an open-source storage layer that combines the best elements of data lakes and data warehouses in a paradigm it calls a “lakehouse.” Delta Lake expands the breadth and depth of use cases that Databricks customers can take on. Databricks offers a unified analytics platform with robust support for use cases ranging from simple reporting to complex data warehousing to real-time machine learning.