June 2024

How to Scale RAG and Build More Accurate LLMs

This article was originally published on The New Stack on June 10, 2024. Retrieval augmented generation (RAG) has emerged as a leading pattern to combat hallucinations and other inaccuracies that affect large language model content generation. However, RAG needs the right data architecture around it to scale effectively and efficiently.

Unlocking the Edge: Data Streaming Goes Where You Go with Confluent

While cloud computing adoption continues to accelerate due to its tremendous value, it has also become clear that edge computing is better suited to a variety of use cases. Organizations are realizing the benefits of processing data closer to its source: reduced latency, security and compliance benefits, more efficient bandwidth utilization, and support for scenarios where networking is constrained.

Running Apache Kafka at the Edge Requires Confluent's Enterprise-Grade Data Streaming Platform

Modern edge computing is transforming industries including manufacturing, healthcare, transportation, defense, retail, energy, and much more—pushing data management to far-reaching data sources to enable connected, low latency operations and enhanced decision making. These new use cases shift workloads to the left—requiring real-time data streaming and processing at the edge, right where the data is generated.

Confluent Is Named Microsoft's 2024 OSS on Azure Global Partner of the Year

Confluent is thrilled to be named Microsoft’s 2024 OSS on Azure Global Partner of the Year. As a three-time Partner of the Year award winner, this recognition reflects our commitment to delivering outstanding open source-based applications and infrastructure solutions on Microsoft Azure.

Amazon OpenSearch Ingestion Adds Support for Confluent Cloud as Source

Until recently, customers didn't have an easy way to send data from Confluent’s data streaming platform to Amazon OpenSearch. They had to either write custom code using AWS Lambda as an intermediary, refactor the HTTP Sink connector, or self-manage an old Elasticsearch connector version. Earlier this year, we announced the fully managed OpenSearch Sink connector, providing a seamless way to sink data from Confluent to Amazon OpenSearch.

Microservice Pitfalls: Solving the Dual-Write Problem | Designing Event-Driven Microservices

When building a distributed system, developers are often faced with something known as the dual-write problem. It occurs whenever the system needs to perform individual writes to separate systems that can't be transactionally linked. This situation creates the potential for data loss if the developer isn't careful. However, techniques such as the Transactional Outbox Pattern and Event Sourcing can be used to guard against the potential for data loss while also providing added resilience to the system.
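As a rough illustration of the Transactional Outbox Pattern mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (orders, outbox), the function names, and the publish callback are all hypothetical and chosen only for this example. In a real system the callback would stand in for a producer writing to a message broker such as Kafka.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: business state plus an outbox of pending events.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: int, item: str) -> None:
    # The state change and the event are written in ONE local transaction,
    # so either both are stored or neither is. No dual write occurs here.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        payload = json.dumps({"type": "OrderPlaced", "order_id": order_id, "item": item})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    # A separate relay step forwards unpublished events to the other system.
    # If publish() raises, the row stays unpublished and is retried later,
    # which gives at-least-once delivery (consumers should tolerate duplicates).
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The design choice this sketch highlights is that the application never writes to the database and the broker in the same step. It writes only to the database, and the relay moves events onward afterwards, so a crash between the two systems delays an event rather than losing it.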

Enhanced Cybersecurity with Real-Time Log Aggregation and Analysis

In today’s hyper-connected world, systems are more intertwined and complex than ever. Myriad data sources including applications, databases, network and IoT devices continuously generate vast amounts of data, capturing every event and interaction. Imagine harnessing this data (login logs, firewall logs, IPS logs, web logins), aggregating it, and analyzing it to create a holistic view of your entire infrastructure.

Tabs or spaces? Merge vs. rebase? Let's settle it with confluent-kafka-javascript

Tabs or spaces? Merge vs. rebase? Flink SQL vs. KStreams? Let’s Settle This is powered by a new Kafka JavaScript client from Confluent: confluent-kafka-javascript (early access). Find out how Lucia used it to make the website in the video above.

Build a scalable and up-to-date generative AI chatbot with Amazon Bedrock and Confluent Cloud for business loan specialists

In this post, we demonstrate how a robust and scalable generative artificial intelligence (GenAI) chatbot is built using Amazon Bedrock and Confluent Cloud. We walk through the architecture and implementation of this generative AI chatbot, and see how it uses Confluent's real-time event streaming capabilities along with Amazon's infrastructure to continually stay up to date with the latest advances from the AI landscape.

What is a Headless Data Architecture?

The headless data architecture. Is it a fad? Some marketecture? Or something real? In this video, Adam Bellemare takes you through the basics of the headless data architecture and why it’s beginning to emerge as a pattern in its own right. Driven by the decoupling of data computation from storage, the headless data architecture provides the basis for a modular data ecosystem. Stream your data for near real-time, low-latency use cases, or convert it to an Iceberg table for analytical use cases.

How to Turn a REST API Into a Data Stream with Kafka and Flink

In the space of APIs for consuming up-to-date data (say, events or state available within an hour of occurring), many API paradigms exist. There are file- or object-based paradigms, e.g., S3 access. There’s database access, e.g., direct Snowflake access. Last, there are decoupled client-server APIs, e.g., REST APIs, gRPC, webhooks, and streaming APIs.

AWS and Confluent: Meeting the Requirements of Real-Time Operations

As government agencies work to improve both customer experience and operational efficiency, two tools have become critical: cloud services and data. Confluent and Amazon Web Services (AWS) have collaborated to make the move to and management of cloud easier while also enabling data streaming for real-time insights and action. We’ll be at the AWS Public Sector Summit in Washington, DC on June 26-27 to talk about and demo how our solutions work together.

Next-Gen Customer Loyalty Programs with Data Streaming

Buy 10 sandwiches, get 1 free. Classic punch cards (and fishing for them in your wallet or occasionally misplacing one) have become a thing of the past, as today's digital landscape demands more innovative solutions. Today’s customer loyalty programs are increasingly sophisticated—evolving, proliferating, and diversifying across every industry from retail, travel, and hospitality to healthcare (e.g., a discount for paying within 30 days of a hospital visit).

How to Use Flink SQL, Streamlit, and Kafka: Part 2

In part one of this series, we walked through how to use Streamlit, Apache Kafka, and Apache Flink to create a live data-driven user interface for a market data application to select a stock (e.g., SPY) and discussed the structure of the app at a high level. First, stock bid price data arrives via an Alpaca websocket. It is then produced to a Kafka topic in Confluent Cloud, where it is also processed with Flink SQL.

86% of IT leaders say data streaming is a priority for IT investment in 2024

Confluent survey: 90% of respondents say data streaming platforms can lead to more product and service innovation in AI and ML development. 86% of respondents cite data streaming as a strategic or important priority for IT investments in 2024. For 91% of respondents, data streaming platforms are critical or important for achieving data-related goals.

How to Analyze Data from a REST API with Flink SQL

Join Lucia Cerchie in a coding walkthrough, bridging the gap between REST APIs and data streaming. Together we’ll transform the OpenSky Network's live API into a data stream using Kafka and Flink SQL. Not only do we change the REST API into a data stream in this walkthrough, but we clean up the data on the way! We use Flink SQL to make it more readable and clean, and in that way we keep more of the business logic away from the client code.

How to Use Flink SQL, Streamlit, and Kafka: Part 1

Market data analytics has always been a classic use case for Apache Kafka. However, new technologies have been developed since Kafka was born. Apache Flink has grown in popularity for stateful processing with low latency output. Streamlit, a popular open source component library and deployment platform, has emerged, providing a familiar Python framework for crafting powerful and interactive data visualizations. Acquired by Snowflake in 2022, Streamlit remains agnostic with respect to data sources.

Capital One Shares Insights on Cloud-Native Streams and Governance

Businesses that are best able to leverage data have a significant competitive advantage. This is especially true in financial services, an industry in which leading organizations are in constant competition to develop the most responsive, personalized customer experiences. Often, however, legacy infrastructure, data silos, and batch systems introduce significant technical hurdles.