ODSC West MLOps Keynote: Scaling NLP Pipelines at IHS Markit
The data science team at IHS Markit has been building sophisticated NLP pipelines that run at scale on the Iguazio MLOps platform and the open-source MLRun framework. Today they will share their journey and provide advice for other data science teams looking to:
- Ingest, prepare, classify and index structured and unstructured data (in this case, PDFs and images)
- Handle terabytes of data in hours, not months
- Work in one unified research and production environment to make deployment seamless
- Enable CI/CD for ML
- Run complex models that make unstructured data searchable (including computer vision)
- Allow for sharing and reuse of components across projects and teams
- Utilize auto-scaling serverless functions to abstract away infrastructure complexities
- Build rapidly, iterate faster and focus on the business logic rather than the underlying infrastructure
Nick (IHS Markit) and Yaron (Iguazio) will share their approach to automating the NLP pipeline end to end. They’ll also explain how capabilities such as Spot integration and Serving Graphs can reduce costs and improve the data science process.