Analytics

Operational Database Integrity

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This post provides an overview of the OpDB data integrity capabilities that help you achieve ACID transactions and data consistency. OpDB guarantees certain properties to ensure atomicity, durability, consistency, and visibility.

Are you using the right data strategy based on the hierarchy of data needs?

Being data-driven is the holy grail of modern business. It allows you to grow 8x faster than your competition, boosts your company’s net earnings by 30% and will have VCs throwing money at you if your organization relies on AI. So, what strategy does one use to become data-driven? Well, it’s actually quite simple: If you follow this recipe to the T, you can have your data cake and eat it.

Your Next Decision Could Change Lives: Why We Need Data Skills and Analytics

The year was 1993. The place, a little town in Sweden. A serial killer was on the loose. He randomly shot at people standing at bus stops or sitting in their cars, killing one and wounding many others. The residents of Malmö lived in fear. Window blinds were shut, playgrounds were deserted. The police didn’t know where to start.

The challenges you'll face deploying machine learning models (and how to solve them)

In 2019, organizations invested $28.5 billion into machine learning application development (Statistica). Yet, only 35% of organizations report having analytical models fully deployed in production (IDC). When you connect those two statistics, it’s clear that there is a breadth of challenges that must be overcome to get your models deployed and running.

Pentaho 9.0 Teaser: Multicluster Enhancements

Many organizations want to run any workload from any location without the burden of rearchitecting or refactoring applications. Often, they’ll want to leverage their existing on-premises Hadoop investments and provide a seamless experience to data consumers when they migrate to the cloud to take advantage of the usability, scalability and elasticity of cloud-native solutions. Watch this video to learn more about the Pentaho 9.0 multicluster enhancements.

One billion files in Ozone

Apache Hadoop Ozone is a distributed key-value store that can manage both small and large files. Ozone was designed to address the scale limitations of HDFS with respect to small files. HDFS is designed to store large files; the recommended number of files on HDFS is 300 million per NameNode, and it doesn’t scale well beyond this limit.

Operational Database Availability

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This post gives you an overview of the high availability configuration capabilities of Cloudera’s OpDB. OpDB is cluster-based software that comes configured for High Availability (HA) out of the box.

Augment EMR Workloads with CDP

The first thing that comes to mind when talking about synergy is how 2+2=5. Being the writer that he is, Mark Twain described it a lot more eloquently as “the bonus that is achieved when things work together harmoniously”. There is a multitude of product and business examples to illustrate the point and I particularly like how car manufacturers can bring together relatively small engines to do big things.

Create custom functionalities in Keboola's Developer Portal

Every time you write another piece of code that picks up data from an FTP server, a small piece of you dies. As a developer in the data space, you know what we’re talking about. 80% of your time can be taken by building and improving the environment and tools, maintenance tasks, and pieces of functionality. That's simply too much time diverted from tackling more important issues.