
Cloudera

Cloudera Streaming Analytics 1.4: the unification of SQL batch and streaming

In October 2020, Cloudera acquired Eventador, and Cloudera Streaming Analytics (CSA) 1.3.0 was released in early 2021. It was the first release to incorporate SQL Stream Builder (SSB) from the acquisition, and it brought rich SQL processing to the already robust Apache Flink offering. With that completed, the team's focus turned to bringing Flink Data Definition Language (DDL) and the batch interface into SSB.

Validations - Cloudera Support's Predictive Alerting Program

Cloudera Support’s cluster validations proactively identify known problem signatures contained in customers’ diagnostic data with the goal of increasing cluster health, performance, and overall stability. Cluster validations are included in a customer’s enterprise subscription at no additional cost. All customers with access to the Support case portal will also be able to take advantage of cluster validations.

Fast Forward Live: Session-based Recommender Systems

Join us live with Fast Forward Labs to discuss the recently possible in Machine Learning and AI. Being able to recommend an item of interest to a user (based on their past preferences) is a highly relevant problem in practice. A key trend over the past few years has been session-based recommendation algorithms that provide recommendations solely based on a user’s interactions in an ongoing session, and which do not require the existence of user profiles or their entire historical preferences. This report explores a simple, yet powerful, NLP-based approach (word2vec) to recommend a next item to a user. While NLP-based approaches are generally employed for linguistic tasks, here we exploit them to learn the structure induced by a user’s behavior or an item’s nature.
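The core idea, treating each session as a "sentence" and each item as a "word", can be illustrated with a minimal sketch. Note that this is not word2vec itself: it swaps the learned embeddings for plain co-occurrence counts compared by cosine similarity, so that it runs with only the Python standard library. The function names, the window size, and the example sessions are all hypothetical and are not taken from the report.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sessions, window=2):
    """Build a sparse context-count vector for every item.

    Each session plays the role of a sentence and each item the role of a
    word. This count-based vector is a stand-in (an assumption of this
    sketch) for the dense embedding word2vec would learn.
    """
    vectors = defaultdict(Counter)
    for session in sessions:
        for i, item in enumerate(session):
            start = max(0, i - window)
            stop = min(len(session), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[item][session[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_next(vectors, last_item, k=3):
    """Return up to k items whose contexts most resemble last_item's."""
    if last_item not in vectors:
        return []  # unseen item: nothing to compare against
    target = vectors[last_item]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != last_item
    ]
    # Highest similarity first; ties are broken alphabetically for stability.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


# Hypothetical browsing sessions, for illustration only.
example_sessions = [
    ["shoes", "socks", "laces"],
    ["shoes", "socks", "polish"],
    ["tent", "stove", "lantern"],
]
example_vectors = cooccurrence_vectors(example_sessions)
```

Calling `recommend_next(example_vectors, "shoes")` ranks items by how similar their session contexts are to those of `shoes`, using only the current interaction and no user profile. A real word2vec model would replace the count vectors with trained embeddings, but the session-as-sentence framing stays the same.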

Future of Data Meetup: The Power of "Yes" or: How I learned to Stop Worrying and Love Governance

Full data lifecycle projects hold tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing device data capture, data enrichment, data science, and analytics at scale to enterprises. This promise also comes with challenges for developers, admins, and consumers to continuously access new data and collaborate.

Modernizing Data Pipelines using Cloudera Data Platform - Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques.

Apache Ozone Metadata Explained

Apache Ozone is a distributed object store built on top of the Hadoop Distributed Data Store service. It can manage billions of small and large files that are difficult for other distributed file systems to handle. As an important part of achieving better scalability, Ozone separates metadata management among different services: the Ozone Manager (OM) service manages the metadata of the namespace, such as volumes, buckets, and keys.
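To make the volume, bucket, and key hierarchy concrete, here is a toy in-memory model of such a namespace. It is only an illustration of the three-level layout described above; the class and method names are hypothetical and do not correspond to Ozone's actual API or to how the Ozone Manager stores metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    # Maps key name -> metadata for that key (hypothetical shape).
    keys: dict = field(default_factory=dict)


@dataclass
class Volume:
    # Maps bucket name -> Bucket.
    buckets: dict = field(default_factory=dict)


class ToyNamespace:
    """Illustrative volume -> bucket -> key namespace (not Ozone's API)."""

    def __init__(self):
        self.volumes = {}

    def put_key(self, volume, bucket, key, size):
        """Record metadata for a key, creating the volume and bucket if needed."""
        vol = self.volumes.setdefault(volume, Volume())
        bkt = vol.buckets.setdefault(bucket, Bucket())
        bkt.keys[key] = {"size": size}

    def lookup(self, path):
        """Resolve '/volume/bucket/key' to the stored key metadata.

        Only the first two separators split the path, so the key name
        itself may contain slashes.
        """
        volume, bucket, key = path.strip("/").split("/", 2)
        return self.volumes[volume].buckets[bucket].keys[key]
```

For example, after `ns = ToyNamespace()` and `ns.put_key("vol1", "bucket1", "logs/app.log", 1024)`, the call `ns.lookup("/vol1/bucket1/logs/app.log")` returns the metadata recorded for that key.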

The Ethics of AI Comes Down to Conscious Decisions

This blog post was written by Pedro Pereira as a guest author for Cloudera. Right now, someone somewhere is writing the next fake news story or editing a deepfake video. An authoritarian regime is manipulating an artificial intelligence (AI) system to spy on technology users. No matter how good the intentions behind the development of a technology, someone is bound to corrupt and manipulate it. Big data and AI amplify the problem. “If you have good intentions, you can make it very good.