Latest Posts

From Hive Tables to Iceberg Tables: Hassle-Free

For more than a decade, the Hive table format has been a ubiquitous presence in the big data ecosystem, managing petabytes of data with remarkable efficiency and scale. But as data volumes, data variety, and data usage grow, users face many challenges with Hive tables because of their antiquated directory-based table format. Common issues include constrained schema evolution, static partitioning of data, and long planning times caused by S3 directory listings.

12 Times Faster Query Planning With Iceberg Manifest Caching in Impala

Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues to develop an implementation of the Iceberg specification in the form of a Java library. Several compute engines, such as Impala, Hive, Spark, and Trino, support querying data in the Iceberg table format by adopting this Java library.

Integrating Cloudera Data Warehouse with Kudu Clusters

Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data, particularly for time series and real-time data warehousing use cases. Over the last decade, more than 200 Cloudera customers have successfully implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases, with thousands of nodes running Apache Kudu.

How to Manage Risk with Modern Data Architectures

The recent failures of regional banks in the US, such as Silicon Valley Bank (SVB), Silvergate, Signature, and First Republic, were caused by multiple factors. To ensure the stability of the US financial system, the implementation of advanced liquidity risk models and stress testing using ML/AI could potentially serve as a protective measure.

Five Ways A Modern Data Architecture Can Reduce Costs in Telco

During the COVID-19 pandemic, telcos made unprecedented use of data and data-driven automation to optimize their network operations, improve customer support, and identify opportunities to expand into new markets. This is no less crucial today, as telcos balance the need to cut costs and improve efficiency with the need to deliver innovative products and services.

Do You Know Where All Your Data Is?

In spite of diligent digital transformation efforts, most financial services institutions still support a loose patchwork of siloed systems and repositories. These dis-integrated resources are “data platforms” in name only: in addition to their high maintenance costs, their lack of interoperability with other critical systems makes it difficult to respond to business change.

One Big Cluster Stuck: Platform Health

Clearly, environmental health and high performance depend on the proper implementation, tuning, and use of CDP, hardware, and microservices. Ideally, you have visibility and transparency into existing high-priority problems in your environment. The links below will take you to areas of the Cloudera Community where you will find best practices for properly implementing and tuning hardware and services.

Beyond Monitoring: Introducing Cloudera Observability

Increased costs and wasted resources are on the rise as software systems have moved from monolithic applications to distributed, service-oriented architectures. As a result, interest in observability has risen markedly over the past few years. Observability, a term borrowed from control theory, has found a real sweet spot with organizations looking to answer the question of “why,” which monitoring alone cannot answer.