When developing or debugging a stream processing pipeline with Flink SQL, it’s common to inspect each processing step's output to ensure data is being transformed properly. However, understanding the structure, distribution, and characteristics of the resulting data stream usually means running multiple ad hoc SQL queries, which is time-consuming and tedious. Isolating specific subsets of the stream for analysis or debugging often takes even more queries, adding to the complexity and time required.
A well-known debate: tabs or spaces? Sure, we could set up a Google Form to collect this data, but where’s the fun in that? Let’s settle the debate, Kafka-style. We’ll use the new confluent-kafka-javascript client (not in general availability yet) to build an app that produces the current state of the vote counts to a Kafka topic and consumes from that same topic to surface them to a JavaScript frontend.
Learn how to leverage the native monitoring capabilities of the Python Kafka producer along with Confluent Cloud’s Metrics API while exploring how linger.ms affects latency and batch sizes.
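To build some intuition for the linger.ms trade-off before digging into real metrics, here is a toy model in plain Python. It is not the producer's actual batching algorithm: it assumes a batch is sent either when it has been open for `linger_ms` or when it reaches `max_batch` messages, and the names `simulate_linger`, `summarize`, and `max_batch` are made up for this sketch.

```python
def simulate_linger(arrivals_ms, linger_ms, max_batch):
    """Toy model: return a list of (send_time_ms, [arrival times]) batches."""
    batches = []
    current = []  # arrival times of messages in the currently open batch
    for t in sorted(arrivals_ms):
        # If the open batch's linger window expired before this message
        # arrived, it was already sent at (first arrival + linger_ms).
        if current and t >= current[0] + linger_ms:
            batches.append((current[0] + linger_ms, current))
            current = []
        current.append(t)
        # A full batch is sent immediately, without waiting out the linger.
        if len(current) >= max_batch:
            batches.append((t, current))
            current = []
    if current:
        batches.append((current[0] + linger_ms, current))
    return batches


def summarize(batches):
    """Average batch size and average time a message waited before sending."""
    sizes = [len(msgs) for _, msgs in batches]
    waits = [send - t for send, msgs in batches for t in msgs]
    return {
        "batches": len(batches),
        "avg_batch_size": sum(sizes) / len(sizes),
        "avg_wait_ms": sum(waits) / len(waits),
    }


if __name__ == "__main__":
    arrivals = list(range(0, 100, 10))  # one message every 10 ms
    for linger in (0, 50):
        print(linger, summarize(simulate_linger(arrivals, linger, max_batch=100)))
```

In this model, a larger linger produces fewer, bigger batches at the cost of each message waiting longer before it is sent, which is the trade-off to look for in the real producer metrics.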
Modern edge computing is transforming industries including manufacturing, healthcare, transportation, defense, retail, and energy, pushing data management out to far-reaching data sources to enable connected, low-latency operations and better decision making. These new use cases shift workloads to the left, requiring real-time data streaming and processing at the edge, right where the data is generated.
When building a distributed system, developers are often faced with something known as the dual-write problem. It occurs whenever the system needs to perform individual writes to separate systems that can't be transactionally linked. This situation creates the potential for data loss if the developer isn't careful. However, techniques such as the Transactional Outbox Pattern and Event Sourcing can be used to guard against the potential for data loss while also providing added resilience to the system.
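As a rough illustration of the Transactional Outbox Pattern, here is a minimal sketch using Python's built-in `sqlite3`. The table names (`orders`, `outbox`), the event shape, and the functions `place_order` and `relay_outbox` are all hypothetical, and `publish` stands in for whatever actually sends the event (for example, a Kafka producer). The idea is that the state change and the event are written in one local transaction, and a separate relay publishes the event afterward.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one business table plus an outbox table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: int, item: str) -> None:
    # `with conn` commits both inserts together, or rolls both back on error,
    # so the order and its event can never get out of sync.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        payload = json.dumps({"type": "OrderPlaced", "order_id": order_id, "item": item})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish unpublished outbox rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        # If publish raises, the row stays unpublished and is retried next run.
        publish(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    place_order(conn, 1, "book")
    relay_outbox(conn, print)  # print stands in for a real producer call
```

Note that if the relay crashes after publishing but before marking the row, the event is sent again on the next run, so this sketch gives at-least-once delivery and consumers would need to tolerate duplicates.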
Tabs or spaces? Merge vs. rebase? Flink SQL vs. KStreams? Let’s Settle This is powered by a new Kafka JavaScript client from Confluent: confluent-kafka-javascript (early access). Find out how Lucia used it to make the website in the video above.
The headless data architecture. Is it a fad? Some marketecture? Or something real? In this video, Adam Bellemare takes you through the basics of the headless data architecture and why it’s beginning to emerge as a pattern in its own right. Driven by the decoupling of data computation from storage, the headless data architecture provides the basis for a modular data ecosystem. Stream your data for near-real-time, low-latency use cases, or convert it to an Iceberg table for analytical use cases.
Many API paradigms exist for consuming up-to-date data (say, events or state available within an hour of occurring). There are file- or object-based paradigms, e.g., S3 access. There’s database access, e.g., direct Snowflake access. Finally, there are decoupled client-server APIs, e.g., REST APIs, gRPC, webhooks, and streaming APIs.
In this video, you will see an example of how Tributary bank uses asynchronous events to enrich its domain and protect its fraud detection system from failures.