Analytics

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs with ease. In addition, we allowed users to run their jobs automatically on a time-based schedule.

Dining with data: A Q&A with OpenTable's Senior Vice President of Data and Analytics Grant Parsamyan

For more than 20 years, OpenTable has connected foodies and novice diners with the restaurants they love. But how does its technology work on the back end? To make a long story short: data. Beyond the app and website, OpenTable provides restaurants with software that manages their floor plans, phone reservations, walk-ins, shift scheduling, turn times, and more.

Transforming Customer Data for Salesforce

CRM (customer relationship management) software is the lifeblood of any modern B2C company. By monitoring and storing all of your interactions with prospects and customers, from their first visit to your website to their most recent purchase, CRM software makes it dramatically easier to segment your customer base, identify hidden trends in the data, make smarter predictions and forecasts, and much more.

Announcing the GA of Cloudera DataFlow for the Public Cloud

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), a brand-new experience on the Cloudera Data Platform (CDP) that addresses some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performance flows.

Cloudera DataFlow for the Public Cloud: A technical deep dive

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard, making it easier than ever to operate NiFi data flows at scale in the public cloud.

Cloudera DataFlow for the Public Cloud

Cloudera DataFlow for the Public Cloud takes away operational and monitoring challenges by providing cloud-native flow management capabilities powered by Apache NiFi. It is a purpose-built framework that modernizes the data flow user experience so that NiFi developers and administrators can easily handle sophisticated data flows in production.

Building an ETL Pipeline in Python

Thanks to its user-friendliness and popularity in the field of data science, Python is one of the best programming languages for ETL. Still, coding an ETL pipeline from scratch isn’t for the faint of heart — you’ll need to handle concerns such as database connections, parallelism, job scheduling, and logging yourself. The good news is that Python makes it easier to deal with these issues by offering dozens of ETL tools and packages.
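To make the concerns above concrete, here is a minimal sketch of an extract-transform-load run using only the Python standard library. It is an illustration under stated assumptions, not a recommended design: the sample CSV, the `payments` table, and the function names are all hypothetical, and it covers only the database connection and logging concerns (parallelism and job scheduling are left out).

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

# Hypothetical sample input; a real pipeline would read a file or call an API.
RAW_CSV = "name,amount\nalice,10.5\nbob,not_a_number\ncarol,4\n"


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize names and cast amounts, skipping rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((row["name"].strip().title(), float(row["amount"])))
        except ValueError:
            log.warning("skipping bad row: %r", row)
    return clean


def load(conn, records):
    """Insert the cleaned records into a SQLite table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    log.info("loaded %d rows", len(records))


def run_pipeline(conn, text=RAW_CSV):
    """Run the three stages in order against an open connection."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(connection)
    print(connection.execute("SELECT name, amount FROM payments").fetchall())
    connection.close()
```

Keeping each stage as its own function means a bad row is logged and skipped in `transform` without touching the database, and the `with conn:` block keeps a failed load from leaving partial data behind.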

What Is Homomorphic Encryption?

Data encryption is one of the smartest things any organization can do to protect the privacy and security of confidential and sensitive data. Using a unique encryption key, data is converted to an intermediate representation known as “ciphertext,” which usually appears as a jumbled mixture of letters and numbers to the human eye. This encrypted data will be meaningless to anyone without the corresponding decryption key—even malicious actors who breach an organization’s defenses.
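The key-and-ciphertext idea can be seen in a toy sketch. The example below uses a one-time-pad style XOR, chosen only because it fits in a few lines. It is not homomorphic encryption and not something to use in production, where a vetted cryptography library is the right tool. The message and the helper name `xor_bytes` are made up for illustration.

```python
import secrets


def xor_bytes(data, key):
    """XOR each data byte with the matching key byte (key must match data length)."""
    if len(key) != len(data):
        raise ValueError("key and data must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


message = "table for two at 7pm".encode("utf-8")
key = secrets.token_bytes(len(message))  # random key, used once

ciphertext = xor_bytes(message, key)     # encrypt
recovered = xor_bytes(ciphertext, key)   # decrypt with the same key

print(ciphertext.hex())                  # hex digits that reveal nothing without the key
print(recovered.decode("utf-8"))         # the original message
```

Without `key`, the hex string carries no usable information about the message; with it, applying the same XOR again restores the plaintext.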