May 2024

Best LLM Inference Engines and Servers to Deploy LLMs in Production

AI applications that produce human-like text, such as chatbots, virtual assistants, translation tools, and text generators, are built on top of Large Language Models (LLMs). If you are deploying LLMs in production-grade applications, you have probably run into some of the performance challenges of running these models. You might also have considered optimizing your deployment with an LLM inference engine or server.

A Software Engineer's Tips and Tricks #4: Collaborating on Visual Studio Code with Live Share

Hey there! We're back for our fourth edition of Tips and Tricks, our new mini series where we share some helpful insights and cool tech that we've stumbled upon while working on technical stuff. Feel free to catch up on the previous posts. All of our posts are super short reads, just a couple of minutes tops. If you don’t like one of the posts, no problem! Just skip it and check out the next one. If you enjoy any of the topics, I encourage you to check out the "further reading" links.

The engineering behind autoscaling with HashiCorp's Nomad on a global serverless platform

There are several ways to handle load spikes on a service. However, these methods are not cost-effective: you either pay for resources you don't use, or you risk not having enough resources to handle the load. Fortunately, there is a third way: horizontal autoscaling. Horizontal autoscaling is the process of dynamically adjusting the number of instances of a service based on the current load. This way, you only pay for the resources you use, and you can handle load spikes without any manual intervention.
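As a rough illustration of the idea, here is a minimal sketch of a horizontal autoscaling decision. It is not how Nomad or any particular platform implements autoscaling: the function name `desired_instances`, the per-instance target, and the min/max bounds are all hypothetical, chosen only to show how an instance count can follow the current load.

```python
import math


def desired_instances(
    current_load: float,
    target_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return how many instances are needed to keep each one near its target load.

    All parameters are illustrative assumptions, e.g. load measured in
    requests per second and a target of 100 requests per second per instance.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Round up so that no instance is pushed above its target.
    needed = math.ceil(current_load / target_per_instance)
    # Clamp to the allowed range so we never scale to zero or without bound.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (50, 450, 5000):
        print(load, desired_instances(load, target_per_instance=100))
```

With these assumed values, a load of 50 keeps a single instance, a load of 450 scales out to 5, and a load of 5000 is capped at the maximum of 10. A real autoscaler would re-evaluate this on a schedule and usually add a cooldown to avoid flapping.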

A Software Engineer's Tips and Tricks #3: CPU Utilization Is Not Always What It Seems

Hey there! We're back for our third edition of Tips and Tricks. As we said in our first posts on Drizzle ORM and Template Databases in PostgreSQL, our new Tips and Tricks mini blog series shares some helpful insights and cool tech that we've stumbled upon while working on technical stuff. Today's topic is short and sweet: CPU utilization and what that metric actually indicates. If you enjoy it and want to learn more, I encourage you to check out the "further reading" links.