
Latest Videos

Future of Data Meetup: Future of data and analytics in the Hybrid & Multi Cloud

The most valuable and transformative business use cases require multiple analytics workloads, data science tools, and machine learning algorithms to run against the same diverse data sets. That is how the most innovative enterprises unlock value from their data. Turning data into useful insights, however, is not easy. Workloads need to be optimized for hybrid and multi-cloud environments, delivering the same data management capabilities across bare metal, private clouds, and public clouds. In this session, we will discuss how the combination of best-in-class software and the public cloud can help businesses turn raw data into actionable insights, without the overhead and without compromising performance, security, or governance.

Introducing Cloudera DataFlow for the Public Cloud

With the rise of streaming data (or data-in-motion), companies must figure out how to deliver high-scale data ingestion, transformation, and management. In this session, you'll see how the new DataFlow service in Cloudera Data Platform (CDP) provides real-time data movement capabilities to address hybrid cloud use cases.

Developing a Basic Web Application using an Operational DB on CDP

In this video, you'll see a simple demo of how you can build a web application on top of a Cloudera Operational Database. We'll leverage the Apache Phoenix integration to easily write SQL statements against our database and use the Python Flask library to power the back-end API calls. The web application will be hosted within Cloudera Machine Learning, showcasing some of the benefits of having your data within a hybrid data platform.
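The shape of such a back end can be sketched roughly as below. This is a minimal illustration, not the code from the video: the Phoenix Query Server URL, the `ITEMS` table, and the route name are all assumptions, and the `phoenixdb` driver (a PEP 249 interface to Phoenix) is imported lazily so the sketch reads without the dependency installed.

```python
# Hedged sketch: a Flask API over Cloudera Operational Database via Apache
# Phoenix. Table name, endpoint URL, and route are illustrative assumptions.
from flask import Flask, jsonify

try:
    import phoenixdb  # third-party PEP 249 driver; only needed to run queries
except ImportError:
    phoenixdb = None

PQS_URL = "http://localhost:8765/"  # hypothetical Phoenix Query Server endpoint

app = Flask(__name__)


def build_select(table, columns, limit=None):
    """Assemble a simple SQL SELECT statement for Phoenix."""
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if limit is not None:
        sql += " LIMIT {}".format(int(limit))
    return sql


@app.route("/api/items")
def list_items():
    # Open a connection per request; Phoenix speaks standard SQL via PEP 249.
    conn = phoenixdb.connect(PQS_URL, autocommit=True)
    try:
        cur = conn.cursor()
        cur.execute(build_select("ITEMS", ["ID", "NAME"], limit=50))
        rows = cur.fetchall()
        cur.close()
    finally:
        conn.close()
    return jsonify([{"id": r[0], "name": r[1]} for r in rows])
```

Keeping the SQL assembly in a small helper like `build_select` keeps the route handlers thin and makes the query logic easy to test without a live database.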

Processing DICOM Files With Spark on CDP Hybrid Cloud

In this video, you will see how you can use PySpark to process medical images from an MRI and convert them from DICOM format to PNG. The data is read from and written to AWS S3, and we leverage the NumPy and pydicom libraries to do the data transformation. We are using data from the "RSNA-MICCAI Brain Tumor Radiogenomic Classification" Kaggle competition, but this approach can be used for general-purpose DICOM processing.
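A rough sketch of that pipeline, under stated assumptions, is below: a pure normalization step (DICOM pixel data is often 12- or 16-bit and must be rescaled for 8-bit PNG), a per-file conversion using pydicom and Pillow, and a driver that distributes the conversion with Spark's `binaryFile` source. The S3 paths and the `write_png` output helper are hypothetical placeholders, not from the video.

```python
# Hedged sketch of a DICOM -> PNG conversion pipeline. Paths and the
# write_png helper are illustrative; only the normalization is concrete.
import io

import numpy as np


def normalize_to_uint8(pixels):
    """Scale a raw DICOM pixel array into the 0-255 range for PNG output."""
    pixels = pixels.astype(np.float32)
    lo, hi = pixels.min(), pixels.max()
    if hi > lo:
        pixels = (pixels - lo) / (hi - lo)
    else:
        pixels = np.zeros_like(pixels)  # constant image maps to black
    return (pixels * 255).astype(np.uint8)


def dicom_bytes_to_png(raw):
    """Convert one DICOM file (as bytes) into PNG bytes."""
    import pydicom         # third-party: DICOM parsing
    from PIL import Image  # third-party: PNG encoding

    ds = pydicom.dcmread(io.BytesIO(raw))
    img = Image.fromarray(normalize_to_uint8(ds.pixel_array))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def convert_all(spark, src, dst):
    """Distribute the conversion: read DICOMs from S3, write PNGs back out."""
    files = spark.read.format("binaryFile").load(src + "/*.dcm")
    (files.rdd
          .map(lambda row: (row.path, dicom_bytes_to_png(row.content)))
          .foreach(lambda pair: write_png(dst, *pair)))  # write_png: hypothetical S3 writer
```

Spark's `binaryFile` data source (Spark 3.0+) yields each file's path and raw content as a row, which makes whole-file formats like DICOM straightforward to parallelize.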

Future of Data Meetup: CDP on Azure - Industrial Strength Data Engineering

Data engineering is undergoing a major evolution, requiring faster and more reliable data pipelines. Apache Spark and Python are core foundational components of this new architecture, enabling data engineers to quickly develop these pipelines. They also introduce challenges when moving to production. Come join us to ask questions and learn; we will also hold a raffle of Cloudera swag.