Analytics

5 Ways to Process Small Data with Hadoop

From system logs to web scraping, there are many good reasons why you might have extremely large numbers of small data files at hand. But how can you efficiently process and analyze these files to uncover the hidden insights that they contain? You might think that you could process these small data files using a solution like Apache Hadoop, which has been specifically designed for handling large datasets.
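
Hadoop's storage and processing layers are tuned for a modest number of large files rather than millions of tiny ones, so most remedies start by consolidating the small files. As a minimal sketch (an assumption of a common first step, not one of the article's five approaches), a PySpark job can read a directory of small text files and rewrite them as a handful of larger ones; the paths and partition count below are placeholders.

```python
# A minimal sketch, assuming PySpark is available and the small files are plain text;
# the input/output paths and the target partition count are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# Read every small file under the input directory into a single DataFrame.
logs = spark.read.text("hdfs:///data/raw/small-logs/*")

# Rewrite the same records as a few larger files so downstream
# Hadoop/Spark jobs no longer pay a per-file overhead.
logs.coalesce(8).write.mode("overwrite").text("hdfs:///data/compacted/logs")

spark.stop()
```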

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

In this final installment, we’ll walk through a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. The model is then scored and served through a simple web application. For more context, this demo builds on concepts discussed in the blog post How to deploy ML models to production.
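
As a rough sketch of the modeling step only (an illustration under assumptions, not the demo's own code), the PySpark.ML portion might look like the following; the HBase/HDFS loading is handled separately in the series, and the feature and label column names here are placeholders.

```python
# A minimal sketch of a PySpark.ML classification step, assuming the training data
# has already been loaded into a DataFrame (e.g. via the HBase/HDFS connectors
# described in the series); paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("classification-demo").getOrCreate()

# Placeholder: in the demo, training data comes from HBase and HDFS.
train_df = spark.read.parquet("hdfs:///data/training/records")

assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b", "feature_c"],  # illustrative feature columns
    outputCol="features",
)
classifier = LogisticRegression(featuresCol="features", labelCol="label")

# Fit the assembler + classifier as one pipeline.
model = Pipeline(stages=[assembler, classifier]).fit(train_df)

# Persist the fitted pipeline so a web application can load it later for scoring.
model.write().overwrite().save("hdfs:///models/classification_demo")
```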

Digital Transformation is a Data Journey From Edge to Insight

Digital transformation is a hot topic across markets and industries because it is delivering value at explosive growth rates. Consider that manufacturing’s Industrial Internet of Things (IIoT) was valued at $161B and is growing at an impressive 25%, that the connected car market is expected to reach $225B by 2027 with a 17% growth rate, and that in the first three months of 2020 retailers realized ten years’ worth of digital sales penetration.

How to configure clients to connect to Apache Kafka Clusters securely - Part 3: PAM authentication

In the previous posts in this series, we discussed Kerberos and LDAP authentication for Kafka. In this post, we will look at how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here highlight the authentication-related properties in bold font to differentiate them from the other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.
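
The post's own examples are broker and client configuration properties; as a rough Python analogue (an assumption for illustration, not an excerpt from the post), a client of a PAM-backed cluster typically authenticates with SASL/PLAIN over TLS, since the PAM check happens on the broker side. Host names, file paths, and credentials below are placeholders.

```python
# A minimal client sketch, assuming the brokers expose SASL_SSL with the PLAIN
# mechanism backed by PAM, and that the confluent-kafka package is installed;
# all host names, paths, and credentials are placeholders.
from confluent_kafka import Producer

conf = {
    "bootstrap.servers": "broker1.example.com:9093",
    "security.protocol": "SASL_SSL",            # TLS is assumed to be enabled on the cluster
    "ssl.ca.location": "/path/to/cluster-ca.pem",
    "sasl.mechanism": "PLAIN",                   # PAM validation happens on the broker
    "sasl.username": "alice",                    # an OS account known to the PAM service
    "sasl.password": "alice-password",
}

producer = Producer(conf)
producer.produce("test-topic", value=b"hello from a PAM-authenticated client")
producer.flush()
```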

Goodbye 2020 - Hello 2021 Magic Quadrant for Analytics and BI Platforms

The wait is nearly over, and soon we’ll all be privy to this year’s Gartner Magic Quadrant for Analytics and BI Platforms. Qlik is proud of its 15-year history and ranking as a leader for the last decade in this signature research, and we are enthusiastic about sharing a complimentary copy of the full report when it publishes at this location: https://www.qlik.com/us/gartner-magic-quadrant-2021

Work at warp-speed in the BigQuery UI

Data analysts can spend hours writing SQL each day to get the right insights. So it’s crucial that the tools in the Google Cloud Console make that job as easy and as fast as possible. Now, we’re excited to show you how BigQuery’s Cloud Console UI has been updated with radical usability improvements for more efficient work, making it easier to find the data you need and write the right SQL quickly.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to provide an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

How Infutor Uses the Placekey External Function to Extend the Power of Snowflake

The Snowflake Data Cloud provides the unique ability for anyone to join their own data sets with thousands of live third-party data sets near-instantly, securely, and without moving data. Businesses operating in the Data Cloud gain a huge advantage over their competitors who are stuck in data silos and struggling with stale data sets downloaded from their legacy data providers weeks, months, or years ago.