
Latest Posts

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint, and is designed to work seamlessly with enterprise-scale data warehousing, machine learning, and streaming workloads. The object store is readily available alongside HDFS in CDP (Cloudera Data Platform) Private Cloud Base 7.1.3+.

The Ethics of Data Exchange

COVID-19 vaccines were developed in record time. One of the main reasons for the accelerated development was the quick exchange of data between academia, healthcare institutions, government agencies, and nonprofit entities. “COVID research is a great example of where sharing data and having large quantities of data to analyze would be beneficial to us all,” said Renee Dvir, solutions engineering manager at Cloudera.

Data-driven competitive advantage in the financial services industry

There is an urgent need for banks to be nimble and adaptable amid a multitude of industry challenges, ranging from the maze of regulatory compliance and sophisticated criminal activities to rising customer expectations and competition from traditional banks and new digital entrants. As banks find their bearings in this landscape, what appear to be insurmountable odds are in fact opportunities for growth and competitive differentiation.

How Cloudera Helps Realize and Accelerate Successful Data Product Strategies

In the first part of this series, I outlined the prerequisites for a modern Enterprise Data Platform to enable complex data product strategies that address the needs of multiple target segments and deliver strong profit margins as the data product portfolio expands in scope and complexity. In this article, I will dive into the specific capabilities of the Cloudera Data Platform (CDP) that have helped organizations meet those prerequisites and fulfill a successful data prod

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key part of our vision was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs with ease. In addition, we allowed users to automate their jobs on a time-based schedule.

Announcing the GA of Cloudera DataFlow for the Public Cloud

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), a brand new experience on the Cloudera Data Platform (CDP) that addresses some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performance flows.

Cloudera DataFlow for the Public Cloud: A technical deep dive

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard, making it easier than ever to operate NiFi data flows at scale in the public cloud.

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Every enterprise is trying to collect and analyze data to get better insights into its business. Whether they are consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications such as ETL tools, search engines, and databases for analysis. This whole architecture made a lot of sense when there was a consistent and predictable flow of data to process.

The Foundations of a Modern Data-Driven Organisation: Change from Within (part 2 of 2)

In my previous blog post, I shared examples of how data provides the foundation for a modern organization to understand and exceed customers’ expectations. However, the important role data occupies extends beyond customer experience and revenue, as it becomes increasingly central in optimizing internal processes for the long-term growth of an organization.