The appearance of Hadoop and its related ecosystem was like a Cambrian explosion of open source tools and frameworks to process big amounts of data. But companies who invested early in big data found some challenges. For example, they needed engineers with expert knowledge not only on distributed systems and data processing but also on Java and the related JVM-based languages and tools.
“Be Prepared, because we are about to Unleash the Beast” is how I finished the Qlik Research demo at Qonnections 2018. I refer to Qlik’s new Cognitive Engine, as the “Beast” and here is why...
Are you familiar with Apache Beam? If not, don’t be ashamed, as one of the latest projects developed by the Apache Software Foundation and first released in June 2016, Apache Beam is still relatively new in the data processing world. As a matter of fact, it wasn’t until recently when I started to work closely with Apache Beam, that I loved to learn and learned to love everything about it.
In this blog, we are going to take a look at Apache Spark performance and tuning. This a common discussion among almost everyone that uses Apache Spark, even outside of Talend. When developing and running your first Spark jobs there are always the following questions that come to mind.
Formed in 2016, Paddy Power Betfair (PPB) is the world’s largest publicly quoted sports betting and gaming company, bringing “excitement to life” for five million customers worldwide. The merger of Paddy Power and Betfair in 2016 created an additional data challenge for an already highly data-driven organization. The merged company had to bring together 70TB of data, from dozens of sources, into an integrated platform.
Recently I found myself in a predicament that many of you can relate to, trying to update an aging application that has become too difficult to manage and too costly to continue operating. As we started to talk about what to do, we concluded it was time to start decomposing that application into smaller more manageable pieces.