
Universal Data Distribution with Cloudera DataFlow for the Public Cloud

The speed at which you move data throughout your organization can be your next competitive advantage. Cloudera DataFlow greatly simplifies your data flow infrastructure, facilitating complex data collection and movement through a unified process that seamlessly transfers data throughout your organization, even as you scale. With Cloudera DataFlow for the Public Cloud you can collect and move any data (structured, unstructured, and semi-structured) from any source to any destination at any frequency (real-time streaming, batch, and micro-batch).

AI at Scale isn't Magic, it's Data

A recent VentureBeat article, “4 AI trends: It’s all about scale in 2022 (so far),” highlighted the importance of scalability. I recommend you read the entire piece, but to me the key takeaway – AI at scale isn’t magic, it’s data – is reminiscent of the 1992 presidential election, when political consultant James Carville succinctly summarized the key to winning: “it’s the economy, stupid.”

Cloudera's Open Data Lakehouse Supercharged with dbt Core™

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous deployment (CI/CD). A rough sketch of that workflow appears below.
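
As a minimal illustration of that CI/CD-friendly workflow, the Python script below chains the standard dbt Core CLI commands the way a simple pipeline job might. The "staging" selector is a hypothetical model group, not something from the post.

    import subprocess

    def run(cmd):
        """Run a dbt CLI command, failing the pipeline if it fails."""
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["dbt", "deps"])                         # install package dependencies
    run(["dbt", "run", "--select", "staging"])   # build the staging models
    run(["dbt", "test", "--select", "staging"])  # test the models just built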

Scaling Kafka Brokers in Cloudera Data Hub

This blog post provides guidance for administrators who use, or are interested in using, Kafka in Cloudera Data Hub and who need to scale clusters up or down to balance performance and cloud costs in production deployments. Because Kafka brokers are contained within host groups, administrators can add and remove nodes more easily, creating the flexibility to handle fluctuating real-time data feed volumes.
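
As a minimal sketch of verifying such a change, assuming the confluent-kafka Python client and a placeholder bootstrap address, the snippet below checks how many brokers are serving the cluster and how partition leadership is spread across them after scaling:

    from confluent_kafka.admin import AdminClient

    # The bootstrap address is a hypothetical placeholder for a Data Hub cluster.
    admin = AdminClient({"bootstrap.servers": "kafka-broker0.example.com:9092"})

    # Cluster metadata lists the live brokers and every topic's partitions.
    metadata = admin.list_topics(timeout=10)
    print(f"Brokers in cluster: {len(metadata.brokers)}")

    for broker_id, broker in sorted(metadata.brokers.items()):
        # Count the partitions this broker currently leads.
        leaders = sum(
            1
            for topic in metadata.topics.values()
            for partition in topic.partitions.values()
            if partition.leader == broker_id
        )
        print(f"broker {broker_id} ({broker.host}:{broker.port}) leads {leaders} partitions")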

How to Distribute Machine Learning Workloads with Dask

Tell us if this sounds familiar. You’ve found an awesome data set that you think will allow you to train a machine learning (ML) model that will accomplish the project goals; the only problem is that the data is too big to fit in the compute environment you’re using. In this day and age of “big data,” most might think this issue is trivial, but like anything in the world of data science, things are hardly ever as straightforward as they seem.
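
As a minimal sketch of that idea, assuming a running Dask cluster and the dask-ml package (the scheduler address and data path below are hypothetical placeholders), a model can be trained on a data set that never fits in memory all at once:

    import dask.dataframe as dd
    from dask.distributed import Client
    from dask_ml.linear_model import LogisticRegression

    # Connect to a Dask scheduler; Client() with no arguments would start
    # a local cluster instead. The address is a placeholder.
    client = Client("tcp://dask-scheduler.example.com:8786")

    # Lazily read a dataset larger than any single worker's memory.
    df = dd.read_parquet("s3://example-bucket/training-data/")

    # Convert features and labels into chunked Dask arrays.
    X = df[["feature_a", "feature_b", "feature_c"]].to_dask_array(lengths=True)
    y = df["label"].to_dask_array(lengths=True)

    # dask-ml's estimator trains across the cluster, block by block.
    model = LogisticRegression()
    model.fit(X, y)
    print(model.coef_)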

Cloudera DataFlow Functions for Public Cloud powered by Apache NiFi

Since its initial release in 2021, Cloudera DataFlow for Public Cloud (CDF-PC) has been helping customers solve data distribution use cases that need high throughput and low latency, which require always-running clusters. CDF-PC’s DataFlow Deployments provide a cloud-native runtime for your Apache NiFi flows on auto-scaling Kubernetes clusters, along with centralized monitoring and alerting and an improved SDLC for developers.
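
For example, if a deployed flow begins with NiFi's ListenHTTP processor, clients can push events to it over plain HTTPS. The endpoint URL and payload below are hypothetical placeholders:

    import json
    import requests

    # Hypothetical HTTPS endpoint exposed by a deployed NiFi flow whose
    # entry point is a ListenHTTP processor.
    ENDPOINT = "https://dataflow.example.cloudera.site/events"

    payload = {"sensor_id": "s-42", "temperature": 21.7}
    response = requests.post(
        ENDPOINT,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    print("Event accepted:", response.status_code)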

Data Governance and Strategy for the Global Enterprise

While the word “data” has been common since the 1940s, managing data’s growth, current use, and regulation is a relatively new frontier. Governments and enterprises are working hard today to figure out the structures and regulations needed around data collection and use. According to Gartner, by 2023 65% of the world’s population will have their personal data covered under modern privacy regulations.

Serverless NiFi Flows with DataFlow Functions: The Next Step in the DataFlow Service Evolution

Cloudera DataFlow for the Public Cloud (CDF-PC) is a cloud-native service for Apache NiFi within the Cloudera Data Platform (CDP). CDF-PC enables organizations to take control of their data flows and eliminate ingestion silos by allowing developers to connect to any data source anywhere, with any structure, process it, and deliver it to any destination using a low-code authoring experience.
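
Since DataFlow Functions packages a NiFi flow as a serverless function (on AWS Lambda, among other runtimes), triggering a flow can be as simple as a standard function invocation. The function name and payload shape below are hypothetical:

    import json
    import boto3

    # Hypothetical NiFi flow deployed as an AWS Lambda via DataFlow Functions.
    client = boto3.client("lambda", region_name="us-east-1")

    response = client.invoke(
        FunctionName="cdf-fn-process-records",
        InvocationType="RequestResponse",
        Payload=json.dumps({"object_key": "incoming/record.json"}),
    )
    print(json.loads(response["Payload"].read()))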