Deploying models is becoming easier every day, especially thanks to excellent tutorials like Transformers-Deploy. It talks about how to convert and optimize a Hugging face model and deploy it on the Nvidia Triton inference server. Nvidia Triton is an exceptionally fast and solid tool and should be very high on the list when searching for ways to deploy a model. If you haven’t read the blogpost yet, do it now first, I will be referencing it quite a bit in this blogpost.
ClearML is now officially integrated into the NVIDIA TAO Toolkit 🎉. For those of you that don’t know yet, the NVIDIA TAO Toolkit, built on TensorFlow and PyTorch, is a low-code version of the NVIDIA TAO framework that accelerates the model training process by abstracting away the AI/deep learning framework complexity.