Systems | Development | Analytics | API | Testing

Cloudera

Get Your Analytics Insights Instantly - Without Abandoning Central IT

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

In this last installment, we’ll discuss a demo application that uses PySpark.ML to make a classification model based off of training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. Afterwards, this model is then scored and served through a simple Web Application. For more context, this demo is based on concepts discussed in this blog post How to deploy ML models to production.

Digital Transformation is a Data Journey From Edge to Insight

Digital transformation is a hot topic for all markets and industries as it’s delivering value with explosive growth rates. Consider that Manufacturing’s Industry Internet of Things (IIOT) was valued at $161b with an impressive 25% growth rate, the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020, retailers realized ten years of digital sales penetration in just three months.

How to configure clients to connect to Apache Kafka Clusters securely - Part 3: PAM authentication

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to facilitate an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Finding digital transformation in high places - how a ski resort improved operational agility and customer experiences

Most blogs in my history are very focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate greater than $11 trillion in economic value as connected manufacturing processes, operations and their supply chains become more streamlined, efficient, agile and realize improved productivity, improved uptime and product quality.

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. With the ability to quickly provision on-demand and the lower fixed and administrative costs, the costs of operating a cloud data warehouse are driven mostly by the price-performance of the specific data warehouse platform.

Brick and Mortar Stores are Now Built Brick by Brick with Digital Insights

In my last three blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance; Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising; and Maximizing Supply Chain Agility through the “Last Mile” Commitment) I painted a picture that showed an ever-changing landscape in retail, considering that consumers are more in control than ever, mobile (at least somewhat digitally mobile considering the pandemic) and socially connected.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 2: Querying/ Loading Data

In this installment, we’ll discuss how to do Get/Scan Operations and utilize PySpark SQL. Afterward, we’ll talk about Bulk Operations and then some troubleshooting errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations In this example, let’s load the table ‘tblEmployee’ that we made in the “Put Operations” in Part 1. I used the same exact catalog in order to load the table. Executing table.show() will give you:

Apache NiFi - the data movement enabler in a hybrid cloud environment

Cloudera provides its customers with a set of consistent solutions running on-premises and in the cloud to ensure customers are successful in their data journey for all of their use cases, regardless of where they are deployed. Cloudera DataFlow provides Apache NiFi in both the Cloudera Data Platform Private Cloud Base (on-premises) and Public Cloud (AWS, Azure, and Google Cloud) products in this hybrid cloud strategy.