Systems | Development | Analytics | API | Testing

Unravel

Managing Cost & Resources Usage for Spark

Spark jobs require resources - and those resources? They can be pricey. If you're looking to speed up completion times, optimize costs, and reduce resource usage for your Spark jobs, this is the webinar for you.For Spark jobs running on-premises, optimizing resource usage is key. For Spark jobs running in the cloud, for example on Amazon EMR or Databricks, adding resources is a click away - but it’s an expensive click, so cost management is critical.

Troubleshooting Databricks

The popularity of Databricks is rocketing skyward, and it is now the leading multi-cloud platform for Spark and analytics workloads, offering fully managed Spark clusters in the cloud. Databricks is fast and organizations generally refactor their applications when moving them to Databricks. The result is strong performance. However, as usage of Databricks grows, so does the importance of reliability for Databricks jobs - especially big data jobs such as Spark workloads. But information you need for troubleshooting is scattered across multiple, voluminous log files.

Spark Troubleshooting Solutions - DataOps, Spark UI or logs, Platform or APM Tools

Spark is known for being extremely difficult to debug. But this is not all Spark’s fault. Problems in running a Spark job can be the result of problems with the infrastructure Spark is running on, inappropriate configuration of Spark, Spark issues, the currently running Spark job, other Spark jobs running at the same time – or interactions among these layers.

Migrating Data Pipelines from Enterprise Schedulers to Airflow

At Airflow Summit 2021, Unravel’s co-founder and CTO, Shivnath Babu and Hari Nyer, Senior Software Engineer, delivered a talk titled Lessons Learned while Migrating Data Pipelines from Enterprise Schedulers to Airflow. This story, along with the slides and videos included in it, comes from the presentation.

Driving Data Governance and Data Products at ING Bank France

In this episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Samir Boualla, CDO at ING Bank France, one of the largest banks in the world. They cover his battlescars in Driving Data Governance Across Business Teams and Building Data Products. At ING Bank France, Samir is the Chief Data Officer. He’s responsible for several teams that govern, develop, and manage data infrastructure and data assets to deliver value to the business.

Spark Troubleshooting, Part 1 - Ten Challenges

“The most difficult thing is finding out why your job is failing, which parameters to change. Most of the time, it’s OOM errors…” Jagat Singh, Quora Spark has become one of the most important tools for processing data – especially non-relational data – and deriving value from it. And Spark serves as a platform for the creation and delivery of analytics, AI, and machine learning applications, among others.