Serverless

AWS Regions: Build, Run, Scale on AWS with Koyeb

Today, we are announcing AWS regions on Koyeb for businesses: the fastest way to build, run, and scale your apps on AWS infrastructure. Over the last months, we've received more and more requests from businesses established on AWS for a way to deploy Koyeb services on AWS infrastructure. Our platform's core technology is cloud-agnostic and can be operated on top of anything, from high-performance bare metal servers to IaaS providers.

Volumes Technical Preview: Blazing-fast NVMe SSD for Your Data

Ready? Day three of Koyeb launch week is on! When you deploy your apps on Koyeb, your data is on ephemeral disks. While this works great for stateless applications, this is challenging for stateful workloads like databases. Just in time to save the day, we are launching the technical preview of Volumes! You can now use Volumes to persist data between deployments, restarts, and even when services are paused. We're gradually onboarding users to ensure the best experience for everyone.

GPUs Public Preview: Run AI workloads on H100, A100, L40S, and more

Welcome to day two of Koyeb launch week. Today we're announcing not one, but two major pieces of news. Our lineup ranges from 20GB to 80GB of VRAM with A100 and H100 cards. You can now run high-precision calculations with FP64 instruction support and a gigantic 2TB/s of bandwidth on the H100. With prices ranging from $0.50/hr to $3.30/hr and always billed by the second, you'll be able to run training, fine-tuning, and inference workloads with a card adapted to your needs.
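To make per-second billing concrete, here is a minimal sketch of the arithmetic. The function name `per_second_cost` is hypothetical and not part of any Koyeb API; the rates are the two ends of the price range quoted above, and the sketch assumes the hourly rate is simply prorated by the second with no minimum charge.

```python
def per_second_cost(hourly_rate_usd: float, seconds: int) -> float:
    """Prorate an hourly rate over a number of seconds of usage.

    Assumes straightforward proration (rate / 3600 per second) with
    no minimum billing increment; real billing rules may differ.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return hourly_rate_usd / 3600 * seconds


# A 10-minute job at each end of the quoted price range.
low = per_second_cost(0.50, 10 * 60)
high = per_second_cost(3.30, 10 * 60)
print(f"10 minutes at $0.50/hr: ${low:.4f}")
print(f"10 minutes at $3.30/hr: ${high:.4f}")
```

Under this assumption, a job that runs for ten minutes is charged for ten minutes rather than for a full hour.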

Autoscaling GA: Scale Fast, Sleep Well, Don't Break the Bank

We are thrilled to kickstart this first launch week with autoscaling - now generally available! Our goal is to offer a global and serverless experience for your deployments. Autoscaling makes this vision a reality. Say goodbye to overpaying for unused resources and late-night alerts for unhealthy instances or underprovisioned resources! During the autoscaling public preview, we received key feedback around scaling factors.

Develop a Serverless TypeScript API on AWS ECS with Fargate

AWS Fargate is a serverless compute engine that allows you to run containers without managing servers. With Fargate, you no longer have to provision clusters of virtual machines to run ECS containers: this is all done for you. Fargate has an Amazon ECS construct that can host an API. In this tutorial, we will build a Fargate service using the AWS CDK, put the API in a Docker image, and then host it inside Amazon ECS. The API will be a pizza API and we'll store the data in a DynamoDB table.

Best LLM Inference Engines and Servers to Deploy LLMs in Production

AI applications that produce human-like text, such as chatbots, virtual assistants, language translation, text generation, and more, are built on top of Large Language Models (LLMs). If you are deploying LLMs in production-grade applications, you might have faced some of the performance challenges with running these models. You might have also considered optimizing your deployment with an LLM inference engine or server.

A Software Engineer's Tips and Tricks #4: Collaborating on Visual Studio Code with Live Share

Hey there! We're back for our fourth edition of Tips and Tricks, our new mini series where we share some helpful insights and cool tech that we've stumbled upon while working on technical stuff. Catch up on the previous posts. All of our posts are super short reads, just a couple of minutes tops. If you don’t like one of the posts, no problem! Just skip it and check out the next one. If you enjoy any of the topics, I encourage you to check out the "further reading" links.

The engineering behind autoscaling with HashiCorp's Nomad on a global serverless platform

The usual ways to handle load spikes on a service are not cost-effective: you either pay for resources you don't use, or you risk not having enough resources to handle the load. Fortunately, there is another way: horizontal autoscaling. Horizontal autoscaling is the process of dynamically adjusting the number of instances of a service based on the current load. This way, you only pay for the resources you use, and you can handle load spikes without any manual intervention.
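The core idea of horizontal autoscaling can be sketched in a few lines. This is an illustrative target-tracking rule, not the algorithm used by Koyeb or Nomad; the function name, the parameters, and the default bounds are all assumptions made for the example.

```python
import math


def desired_instances(
    total_load: float,
    target_per_instance: float,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return how many instances are needed to serve `total_load`.

    `total_load` is the load across the whole service (for example,
    requests per second) and `target_per_instance` is the load one
    instance should handle. Both are hypothetical inputs.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Round up so the service is never short of capacity.
    needed = math.ceil(total_load / target_per_instance)
    # Clamp to the configured bounds to cap cost and keep a floor.
    return max(min_instances, min(max_instances, needed))


# 450 requests/s with a target of 100 requests/s per instance.
print(desired_instances(450, 100))
```

Rounding up favors availability over cost, and the minimum and maximum bounds keep the service from scaling to zero or growing without limit during a spike.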

A Software Engineer's Tips and Tricks #3: CPU Utilization Is Not Always What It Seems

Hey there! We're back for our third edition of Tips and Tricks. As we said in our first posts on Drizzle ORM and Template Databases in PostgreSQL, our new Tips and Tricks mini blog series is going to share some helpful insights and cool tech that we've stumbled upon while working on technical stuff. Today's topic is short and sweet. It'll be on CPU utilization and what that metric indicates. If you enjoy it and want to learn more, I encourage you to check out the "further reading" links.