Blog

6 Data Cleansing Strategies For Your Organization

The success of data-driven initiatives for enterprise organizations depends largely on the quality of data available for analysis. This axiom can be summarized simply as garbage in, garbage out: low-quality data that is inaccurate, inconsistent, or incomplete often results in low-validity data analytics that can lead to poor business decision-making.

Cloudera Data Engineering - Integration Steps to Leverage Spark on Kubernetes

Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform (CDP) that allows you to submit jobs to auto-scaling virtual clusters. CDE lets you spend more time on your applications and less time on infrastructure: you can create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters.

No Data Loss and No Service Interruption - HDF to CFM Rolling Migration

The blog “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime” detailed how many common NiFi dataflows can be easily migrated when the Hortonworks DataFlow and Cloudera Flow Management clusters are running side-by-side. But what if you lack the resources to run multiple NiFi clusters concurrently? Not a problem.

In the event-driven galaxy, which metadata matters most?

As a developer, you're no stranger to your vast and varied data environment… Or are you? The tremendous amount of data your organization collects is stored in various sources and formats. You need a way to understand what data you have and where it lives so that you can do what you need to do: build amazing event-driven applications.

Empowering Founding Engineers

Massive tomes have been written on engineering management, but I thought it might be helpful to take a minute to discuss setting up your Founding Engineers (FEs) for success. For this post, I define FEs as the first wave of engineers hired after the founding team. This round of hiring usually takes place after seed funding has been secured and some semblance of initial product/market fit has been achieved.

BI Tool Integrations for Heroku Postgres

Heroku is a powerful platform for application development. You can build and deploy in the cloud and effortlessly scale up once your app takes off. And behind every app, you'll find an equally powerful database: Heroku Postgres. If you're building Heroku apps, you'll find them to be a rich source of operational and customer data. Add in the right Business Intelligence (BI) tools, and you'll be able to derive insights about the inner workings of your organization.