Cloudera

Metadata Management & Data Governance with Cloudera SDX

In this article, we will walk you through implementing fine-grained access control for the data governance framework within the Cloudera platform. This allows a data office to apply access policies to metadata management assets such as tags or classifications, business glossaries, and data catalog entities, laying the foundation for comprehensive data access control.

Using Streams Replication Manager Prefixless Replication for Kafka Topic Aggregation

Businesses often need to aggregate Kafka topics to organize, simplify, and optimize the processing of streaming data. Aggregation enables efficient analysis, facilitates modular development, and makes streaming applications more effective overall. For example, when separate clusters each hold a topic that serves the same purpose, it is useful to aggregate their content into a single topic.

What's new in 2.6 | Cost Savings and Developer Improvement

Data engineers and analysts need a self-service way to build data movement flows that get critical data to where it needs to be. Cloudera DataFlow enables self-service by introducing fine-grained access control with projects. Projects allow users to group flow drafts and deployments and to give team members access as needed.

Introduction to Ozone on Cloudera Data Platform

When considering whether Ozone is the right fit for your company, view it from several different angles. You can look at it from the perspective of lowering TCO, or of reducing the carbon footprint of your data center. Other things to consider are how much data you hold, how fast it is growing, and whether you have enough hardware to cover that growth.

Optimization Strategies for Iceberg Tables

Apache Iceberg has recently grown in popularity because it adds data warehouse-like capabilities to your data lake, making it easier to analyze all your data, both structured and unstructured. It offers benefits such as schema evolution, hidden partitioning, and time travel that improve the productivity of data engineers and data analysts. However, you need to maintain Iceberg tables regularly to keep them in a healthy state so that read queries run faster.

High Availability (Multi-AZ) for Cloudera Operational Database

In the previous blog post we covered the high availability feature of Cloudera Operational Database (COD) on AWS. Cloudera recently released a new version of COD, which adds HA support for databases running on Microsoft Azure. In this post, we’ll perform a similar test to validate that the feature works as expected in Azure, too.