Analytics

How to compete with analytics-first software vendors

Analytics-first vendors such as Gainsight and C3 are a new class of vendor building applications on the idea that data will drive a transaction, rather than transactions driving the data. The challenge for every enterprise software vendor is how to respond to this threat, and responding will be difficult. Big vendors will have vested interests internally who don't see the challenge coming or don't know how to respond to it. Some may even underestimate the threat.

Converting HBase ACLs to Ranger policies

CDP uses Apache Ranger for data security management. To use Ranger for centralized security administration, HBase ACLs need to be migrated to Ranger policies. This can be done via the Ranger web UI, accessible from Cloudera Manager. But first, let’s take a quick look at HBase’s method for access control.

10 Tips to Help You Write a Flat File Database

Originally developed by IBM, flat file databases have been around since the 1970s. Because these files store data in plain text format, most people use MS Excel to create them. It’s an easy-to-use system that allows for quick sorting of results, because each line of plain text holds just one record. Tabs, commas, or other delimiters separate the fields within each record. In this article, you’ll learn some tips for optimizing your flat file.
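
The one-record-per-line layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own method: the sample data, the field names (`id`, `name`, `city`), and the helper names `read_flat_file` and `sort_records` are all hypothetical, and it assumes the first line of the file is a header row.

```python
import csv
import io

# Hypothetical sample data: one record per line, comma-delimited,
# with a header row naming the fields.
SAMPLE = """id,name,city
3,Carol,Denver
1,Alice,Austin
2,Bob,Boston
"""


def read_flat_file(text, delimiter=","):
    """Parse delimited text into a list of dicts, one dict per line."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def sort_records(records, field):
    """Return the records sorted by one field (plain string comparison)."""
    return sorted(records, key=lambda record: record[field])


records = read_flat_file(SAMPLE)
by_name = sort_records(records, "name")
print([record["name"] for record in by_name])
```

Passing `delimiter="\t"` reads a tab-separated file the same way. Note that every value is read as a string, so sorting a numeric column correctly would need an explicit conversion such as `int(record[field])`.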

Announcing Iguazio Version 3.0: Breaking the Silos for Faster Deployment

We’re delighted to announce the release of the Iguazio Data Science Platform version 3.0. Data Engineers and Data Scientists can now deploy their data pipelines and models to production faster than ever, with features that break down silos between Data Scientists, Data Engineers and ML Engineers and give you more deployment options. The development experience has been improved, offering better visibility of the artifacts and the freedom to develop with your IDE of choice.

Cloudera Data Platform (CDP) Private Cloud on Red Hat OpenShift

Learn how Cloudera and Red Hat help enterprise companies securely manage the complete data lifecycle, putting data to work faster and reducing time to value. Cloudera Data Platform (CDP) Private Cloud on Red Hat® OpenShift® aggregates and visualizes data to derive actionable insights in a secure, hybrid, and open-source environment.

HDFS Data Encryption at Rest on Cloudera Data Platform

Encryption of data at rest is a highly desirable, and sometimes mandatory, requirement for data platforms in a range of industry verticals, including healthcare, financial, and government organizations. The capability increases security and protects sensitive data from various kinds of attack, whether internal or external to the platform.

AI/ML without DataOps is just a pipe dream!

Let’s start with a real-world example from one of my past machine learning (ML) projects. We were building a customer churn model when a request came in: “We urgently need an additional feature related to sentiment analysis of the customer support calls.” Creating the data pipeline to extract this dataset took about 4 months! Preparing, building, and scaling the Spark MLlib code took about 1.5 to 2 months!