Systems | Development | Analytics | API | Testing

Data dump to data catalog for Apache Kafka

From data stagnating in warehouses to a growing number of real-time applications, in this article we explain why we need a new class of Data Catalogs: this time for real-time data. The 2010s brought us organizations “doing big data”. Teams were encouraged to dump it into a data lake and leave it for others to harvest. But data lakes soon became data swamps.

The reinvention of the Telco: From Pipe to Processor

The next generation of 5G networks are unlocking a mind-bending array of new use cases. Blistering speed, super low latency, and access to more powerful mobile hardware bring VR, AR and ultra high-definition experiences into sharp focus for the near future. But there’s a bigger shift being driven by 5G, and it’s not actually about speed at all. It’s about re-thinking the modern telco business model.

Building a Scalable Process Using NiFi, Kafka and HBase on CDP

Navistar is a leading global manufacturer of commercial trucks. With a fleet of 350,000 vehicles, unscheduled maintenance and vehicle breakdowns created ongoing disruption to their business. Navistar required a diagnostics platform that would help them predict when a vehicle needed maintenance to minimize downtime.

How to Create a JavaScript Library. 7 Tips to Create a Library That Every Developer Loves Using

Have you ever found yourself copy-pasting the same bits of JavaScript code between different projects? Well, when this situation happens two or three times in a row, it’s usually a good indicator that you have a piece of code that is useful and reusable. So, if you are already reusing your code across several different projects, why not go the extra mile and convert it into a library that allows you to optimize this code-sharing?

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables which provide ACID guarantees. There has been a significant amount of work that has gone into hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access hive external tables but it cannot access hive managed tables. To access hive managed tables from spark Hive Warehouse Connector needs to be used.

Data Modeling in a Post-COVID-19 World

As a result of the COVID-19 pandemic, organizations around the world have had to transform overnight. Businesses that had been delaying digital transformation, or that hadn’t been thinking about it at all, have suddenly realized that moving their data analytics to the cloud is the key to coping with and surviving the COVID-19 disruption. The next phase is about rebounding and thriving in a post-COVID-19 world.

What is data modeling and how can you model data for higher analytical outputs?

Being data-driven helps businesses to cut costs and produce higher returns on investments, increasing their financial viability in the fight for a piece of the market pie. But *becoming* data-driven is a more labor-intensive process. In the same way that companies must align themselves around business objectives, data professionals must align their data around data models. In other words: if you want to run a successful data-driven operation, you need to model your data first.