Analytics

PII Substitution May Be the Future of Data Privacy

Unfortunately, most of us have had our sensitive data or personal information compromised at one point or another. Whether the leaked data involves credit cards, a bank account number, a social security number, or an email address, nearly everyone has been a victim of a third-party data breach. In 2020, over 155 million people in the U.S. — nearly half the country's population — experienced unauthorized data exposure.

Modernizing Data Pipelines using Cloudera Data Platform - Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, data pipelines are becoming indispensable. To keep up, they are being vigorously reshaped with modern tools and techniques.

Apache Ozone Metadata Explained

Apache Ozone is a distributed object store built on top of the Hadoop Distributed Data Store service. It can manage billions of small and large files that other distributed file systems struggle to handle. As an important part of achieving better scalability, Ozone splits metadata management across different services: the Ozone Manager (OM) service manages namespace metadata such as volumes, buckets, and keys.

Fast Forward Live: Session-based Recommender Systems

Join us live with Fast Forward Labs to discuss the recently possible in Machine Learning and AI. Being able to recommend an item of interest to a user (based on their past preferences) is a highly relevant problem in practice. A key trend over the past few years has been session-based recommendation algorithms that provide recommendations solely based on a user’s interactions in an ongoing session, and which do not require the existence of user profiles or their entire historical preferences. This report explores a simple, yet powerful, NLP-based approach (word2vec) to recommend a next item to a user. While NLP-based approaches are generally employed for linguistic tasks, here we exploit them to learn the structure induced by a user’s behavior or an item’s nature.

Future of Data Meetup: The Power of "Yes" or: How I Learned to Stop Worrying and Love Governance

Full data lifecycle projects hold tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought enterprises the promise of device data capture, data enrichment, data science, and analytics at scale. This promise also comes with challenges for developers, admins, and consumers who need to continuously access new data and collaborate.

Data Transformation | Snowflake & Matillion | Rise of The Data Cloud

Data transformation, 2021 data trends, how Matillion is pushing the world of software forward, and how Matillion's partnership with Snowflake is advancing the industry are some of the topics covered in this episode of Rise of the Data Cloud, featuring Matthew Scullion, Founder and CEO of Matillion.

Realtime data replication into BigQuery with Datastream and Dataflow

How can you replicate data from a relational database in real time? In this video, we’ll show you how to combine Datastream with Dataflow templates to do exactly that. Watch to learn how to use Dataflow, a streaming analytics service, together with Datastream to easily replicate data from Oracle to BigQuery in real time!