
Machine Learning

5 Incredible Data Science Solutions For Real-World Problems

Data science has come a long way, and it has profoundly changed organizations across industries. Over the last few years, it has been applied not merely to gather and analyze data but to solve some of the most pressing business problems facing commercial enterprises.

Iguazio Releases Version 2.8 Including Enterprise-Grade Automated Pipeline Management, Model Monitoring & Drift Detection

We’re delighted to announce the release of the Iguazio Data Science Platform version 2.8. The new version takes another leap forward in solving the operational challenge of deploying machine and deep learning applications in real business environments. It provides a robust set of tools to streamline MLOps and a new set of features that address diverse MLOps challenges.

7 Rules for Bulletproof, Reproducible Machine Learning R&D

So, if you’re a nose-to-the-keyboard developer, there’s a good chance that this analogy is outside your comfort zone … bear with me. Imagine two Olympic-level figure skaters working together on the ice, day in and day out, to develop and perfect a medal-winning performance. Each has his or her role, and they work in sync to merge their actions and fine-tune the results.

Elevating Data Science Practices for the Media, Entertainment & Advertising Industries

As more and more companies embed AI projects into their systems, attracted by the promise of efficiencies and competitive advantages, data science teams are feeling the growing pains of a relatively immature practice that lacks widely established, repeatable norms.

Building ML Pipelines Over Federated Data & Compute Environments

A Forbes survey shows that data scientists spend 19% of their time collecting data sets and 60% of their time cleaning and organizing data. All told, data scientists spend around 80% of their time preparing and managing data for analysis. One of the greatest obstacles to bringing data science initiatives to life is the lack of robust data management tools.

How to Run Spark Over Kubernetes to Power Your Data Science Lifecycle

Spark is known for its powerful engine, which enables distributed data processing. It can handle petabytes of data across multiple servers, and its capabilities and performance unseated other technologies in the Hadoop world. Although Spark provides great power, it also comes with a high maintenance cost. In recent years, innovations have emerged that simplify the Spark infrastructure while still supporting these large data processing tasks.

The Machine Learning Collaboration Tool You'll Want to Ride Solo - User Story

I’ll admit it. I am a gushing fan of this new product from Allegro AI called Allegro Trains. I’m not sure what to call it — what noun I should attach to this creature. “Framework” and “Platform” have become, to my ears, rather meaningless jargon designed to detach suit-wearing types from their money. “Harness” is close.

MLOps for Python: Real-Time Feature Analysis

Data scientists today have to choose from a massive toolbox in which every item has its pros and cons. We love the simplicity of Python tools like pandas and Scikit-learn, the operational readiness of Kubernetes, and the scalability of Spark and Hadoop, so we just use all of them. What happens? Data scientists explore data using pandas, then data engineers use Spark to recode the same logic so that it scales or works with live streams and operational databases.