
ETL

The Future of the Modern Data Stack

The Modern Data Stack is quickly gaining traction in tech circles as the go-to cloud data architecture, yet despite its rising popularity, the term is often ambiguously defined. In this blog post we’ll discuss what it is, how it came to be, and where we see it going. Whether you’re new to the modern data stack or were an early adopter, there should be something of interest here.

When to Use Change Data Capture

Automated ETL (extract, transform, load) and data integration workflows are essential for the modern data-driven organization, and they can swiftly and efficiently migrate data from sources to a target data warehouse or data lake. But ETL must run at regular intervals, or even in real time, so how can you know which information is fresh and which information you’ve already ingested? Solving this problem is the goal of change data capture (CDC) techniques.
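As a rough illustration of the "fresh versus already ingested" question, here is a minimal Python sketch of one simple CDC approach, a timestamp watermark. It assumes every source row carries an `updated_at` column; the `extract_changes` function and the sample rows are hypothetical, and this approach does not detect deleted rows, which log-based CDC handles better.

```python
from datetime import datetime


def extract_changes(rows, last_sync):
    """Return the rows modified after last_sync and the new watermark.

    Assumes each row is a dict with an 'updated_at' datetime (hypothetical schema).
    """
    changed = [r for r in rows if r["updated_at"] > last_sync]
    # If nothing changed, keep the old watermark.
    new_watermark = max((r["updated_at"] for r in changed), default=last_sync)
    return changed, new_watermark


# Made-up source table for illustration only.
source_rows = [
    {"id": 1, "name": "Ada", "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "name": "Grace", "updated_at": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "name": "Linus", "updated_at": datetime(2024, 1, 3, 8, 15)},
]

# Pretend the previous ETL run finished at midnight on 2 January.
last_sync = datetime(2024, 1, 2, 0, 0)
changed, last_sync = extract_changes(source_rows, last_sync)
changed_ids = [r["id"] for r in changed]

# A second run with the updated watermark finds nothing new.
changed_again, _ = extract_changes(source_rows, last_sync)
```

Each run only picks up rows newer than the stored watermark, so rows ingested earlier are skipped on later runs.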

What is Customer Data Ingestion?

The verdict is in: The more you analyze your customer data, the better chance you have of outperforming your business rivals, attracting new prospects and providing excellent service. For example, a report by McKinsey & Company has good news for companies that are "intensive users of customer analytics": they are 23 times more likely than their competitors to excel at new customer acquisition and 19 times more likely to be highly profitable.

What is ETL?

The ETL process involves extracting data from a source, transforming the data in some way, and loading the result into the same system or a different one. You may feel a little confused the first time you encounter an ETL process. With the right platform, though, you can adjust quickly and learn how to manipulate data to make it more valuable.
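The three steps can be sketched in a few lines of Python using only the standard library. This is a toy example under stated assumptions: the CSV text, the `orders` table, and the `extract`, `transform`, and `load` helpers are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Made-up raw export with inconsistent whitespace and casing.
RAW_CSV = "id,email,amount\n1, Ada@Example.com ,10.50\n2,grace@example.com,4.25\n"


def extract(text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize email addresses."""
    return [
        (int(r["id"]), r["email"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT id, email, amount FROM orders ORDER BY id").fetchall()
```

Keeping each step in its own function makes it easy to swap the source or the target without touching the transformation logic.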

Looking for an ETL tool? Stop. Right. Here.

You have started your data journey. You know you need to somehow collect data from various sources and land it in a data warehouse or data lake of some sort. Right now you’re browsing tools and calculating costs: one tool for extraction, another for transformations, and perhaps a dedicated ETL tool on top. What if we told you there’s a better way?

3 Reasons Extract, Load & Transform is a Bad Idea

Extract, Load, Transform (ELT) technology makes it easy for organizations to pull data from databases, applications, and other sources, and move it into a data lake. But companies pay for this convenience in many ways. ELT solutions can have a negative impact on data privacy, data quality, and data management.

ETL with Apache Airflow

Written in Python, Apache Airflow is an open-source workflow manager used to develop, schedule, and monitor workflows. Created by Airbnb, it has since been adopted by many large companies, including Google and Slack. As a workflow management framework, Airflow differs from other frameworks in that it does not require exact parent-child relationships. Instead, you only need to define the parents of each data flow, and Airflow automatically organizes them into a DAG (directed acyclic graph).
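To show the idea of declaring only parents and letting the DAG fall out, here is a small sketch using Python's standard-library `graphlib` module (Python 3.9+). It is not Airflow's API; the task names are hypothetical, and the point is only that a valid run order can be derived from parent declarations alone.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of its parents (the tasks that must run first).
# Task names are made up for illustration.
parents = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join"},
}

# static_order() yields the tasks in an order that respects every dependency;
# it raises graphlib.CycleError if the declarations contain a cycle.
run_order = list(TopologicalSorter(parents).static_order())
print(run_order)
```

The two extract tasks have no dependency on each other, so either may come first; a scheduler is free to run them in parallel.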