Analytics

Why SQL is your key to querying Kafka

If you’re an engineer exploring a streaming platform like Kafka, chances are you’ve spent some time trying to work out what’s going on with the data in there. But if you’re introducing Kafka to a team of data scientists or developers unfamiliar with its idiosyncrasies, you might have spent days, weeks, or even months trying to tack on self-service capabilities. We’ve been there.

Data dump to data catalog for Apache Kafka

With data stagnating in warehouses and the number of real-time applications growing, this article explains why we need a new class of data catalogs: this time for real-time data. The 2010s brought us organizations “doing big data”. Teams were encouraged to dump their data into a data lake and leave it for others to harvest. But data lakes soon became data swamps.

CDO Sessions: Getting Real with Data Analytics

Big data leaders are no doubt being challenged by market uncertainty. Data-driven insights can help organizations assess and uncover the market risks and opportunities that may arise during uncertain times. As businesses around the world adapt to digitization initiatives, modern data systems have become more mission-critical to continuity and competitive differentiation.

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables, which provide ACID guarantees. A significant amount of work has gone into Hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access Hive external tables, but it cannot access Hive managed tables. To access Hive managed tables from Spark, the Hive Warehouse Connector needs to be used.

Use AI To Quickly Handle Sensitive Data Management

The growing waves of data that you’re pulling in include sensitive, personal or confidential data. This can become a compliance nightmare, especially with rules around PII such as GDPR and CCPA, and it takes too much time to manually decide what should be protected. In this session, we will show how AI-driven data catalogs can identify sensitive data and share that identification with your data security platforms to automate its discovery and protection. You'll see how this dramatically reduces your time to onboard data and makes it safely available to your business communities.

What is data modeling and how can you model data for higher analytical outputs?

Being data-driven helps businesses cut costs and produce higher returns on investment, increasing their financial viability in the fight for a piece of the market pie. But *becoming* data-driven is a more labor-intensive process. In the same way that companies must align themselves around business objectives, data professionals must align their data around data models. In other words: if you want to run a successful data-driven operation, you need to model your data first.