Analytics

Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Apache Spark is now widely used in many enterprises for building high-performance ETL and machine learning pipelines. For users who are already familiar with Python, PySpark provides a Python API for Apache Spark. PySpark users often rely on existing and/or custom Python packages in their programs to extend and complement Apache Spark’s functionality. Apache Spark provides several options to manage these dependencies.

Data Science vs. Big Data Marketing

Data science and big data are essential in today’s world of marketing. You’ve probably already seen multiple instances of both being used for advertising and sales purposes, but you may not realize just how useful they are. If you own a business, you need to know how to use data for your own marketing programs.

Neither Cloud nor SaaS Will Deliver Your Data's Full Potential

Your data now resides in the cloud, and you’ve chosen SaaS providers that use their own products (or drink their own champagne, as I like to say). Does that mean you’re getting the full value from your data? No. Chances are high your data is still siloed. This time, the culprits are your SaaS providers who collect and store your data, thus limiting the analytics you can perform on it.

Contextual analytics vs dashboards: What's the difference?

For 20 years, standalone BI tools have failed to penetrate more than 25% of the average organization, with most workers using them once a week, according to Eckerson Group. While many modern dashboards are sophisticated and user-friendly, they are still often accessed as standalone tools outside of line-of-business applications. This separation means it isn’t guaranteed that users will adopt BI, or gain insight from their data.

Future of Data Meetup: Exploring Data and Creating Interactive Dashboards in the Cloud

In this meetup, we’re going to once again put ourselves in the shoes of an electric car manufacturer that is deploying a recently developed electric motor in its new cars. We’re going to show how to explore data that was previously collected from various sources and stored in Apache Hive within a data warehouse, with the goal of tracking down a specific set of potentially defective parts. We’ll then take the results of this data exploration and create an interactive dashboard that presents them in a visually appealing way, using a BI tool that’s integrated right into the same data warehouse.

Fast Forward Live: Few-Shot Text Classification

Join us for this month's machine learning research discussion with Cloudera Fast Forward Labs. We will discuss few-shot text classification, including a live demo and Q&A. This is an applied research report by Cloudera Fast Forward. We write reports about emerging technologies. Accompanying each report are working prototypes or code that exhibit the capabilities of the algorithm and offer detailed technical advice on its practical application.

The New Releases of Apache NiFi in Public Cloud and Private Cloud

Cloudera has recently released a lot around Apache NiFi! We just released Cloudera Flow Management (CFM) 2.1.1, which provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi: it includes Apache NiFi 1.13.2 along with additional improvements, bug fixes, and components. Cloudera also released CDP 7.2.9 on all three major cloud platforms, which brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.