Latest News

Scalable Python on BigQuery using Dask and NVIDIA GPUs

BigQuery is Google Cloud’s fully managed serverless data platform that supports querying using ANSI SQL. BigQuery also has a data lake storage engine that unifies SQL queries with other open source processing frameworks such as Apache Spark, TensorFlow, and Dask. BigQuery storage provides an API layer for OSS engines to process data. This API enables mixing and matching programming in languages like Python with structured SQL in the same data platform.

Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion.

Kafka best practices: Monitoring and optimizing the performance of Kafka applications

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Administrators, developers, and data engineers who use Kafka clusters struggle to understand what is happening in their Kafka implementations.

Performance considerations for loading data into BigQuery

It is not unusual for customers to load very large data sets into their enterprise data warehouse. Whether you are doing an initial data ingestion with hundreds of TB of data or incrementally loading from your systems of record, performance of bulk inserts is key to quicker insights from the data. The most common architecture for batch data loads uses Google Cloud Storage (object storage) as the staging area for all bulk loads.

Introductory Guide to Business Cash Flow Planning

We all want better business cash flow and we want it yesterday. You can’t plan for emergencies, geopolitics, or sudden problems that you have no control over. But you can mitigate risks of business cash flow problems by having the right tools at your side. Business cash flow planning can get you out of a jam and save your company. Take a look at our ultimate guide to business cash flow planning highlighting.

How to stop failing at data

Innovate or die. It’s one of the few universal rules of business, and it’s one of the main reasons we continue to invest so heavily in data. Only through data can we get the key insights we need to innovate faster, smarter, better and keep ahead of the market. And yet, the vast majority of data initiatives are doomed to fail. Nearly nine out of 10 data science projects never make it to production.

Accelerating IT to the Speed of Business

Businesses today adapt to change with breathtaking speed. Lines of business now follow their products out the door digitally and continue tracking usage throughout the lifecycle. This gives companies valuable insights into how customers use and interact with their products, enabling them to analyze and evolve solutions while nurturing customer relationships.

Why Replicating HBase Data Using Replication Manager is the Best Choice

In this article we discuss the various methods to replicate HBase data and explore why Replication Manager is the best choice for the job with the help of a use case. Cloudera Replication Manager is a key Cloudera Data Platform (CDP) service, designed to copy and migrate data between environments and infrastructures across hybrid clouds.