Analytics

Prepare Your Data - The Self-Service Data Roadmap, Session 2 of 4

In this webinar, Unravel CDO and VP Engineering Sandeep Uttamchandani describes the second step for any large, data-driven project: the Prep phase. Having found the data you need in the Discover phase, it's time to get your data ready. You must structure, clean, enrich, and validate static data, and ensure that "live," updated or streamed data events are continually ready for processing.

Our Shared Responsibility Model

There’s a common misconception that as soon as a business signs up for a solution from a cloud service provider (CSP), the CSP will automatically ensure all its dealings in that cloud environment are safe and secure. As dedicated as CSPs are to cybersecurity, that’s simply not possible. Your cloud provider has no control over the customer data you share, the aptitude of your employees, or how you optimize your own on-premises security and firewalls.

CDP Endpoint Gateway provides Secure Access to CDP Public Cloud Services running in private networks

Cloudera Data Platform (CDP) Public Cloud allows users to deploy analytic workloads into their cloud accounts. These workloads cover the entire data lifecycle and are managed from a central multi-cloud Cloudera Control Plane. CDP provides the flexibility to deploy these resources into public or private subnets. Nearly all of the customers we’ve seen deploy their workloads to private subnets.

Powering Algorithmic Trading via Correlation Analysis

Finding relationships between disparate events and patterns can reveal a common thread, an underlying cause of occurrences that, on a surface level, may appear unrelated and unexplainable. The process of discovering the relationships among data metrics is known as correlation analysis. For data scientists and those tasked with monitoring data, correlation analysis is incredibly valuable when used for root cause analysis and reducing time to remediation.
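
As a rough illustration of the idea, the sketch below computes the Pearson correlation coefficient between pairs of metrics. The metric names and sample values are hypothetical and chosen only for demonstration; Pearson is one common choice of correlation measure, not necessarily the one used in the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical metrics sampled at the same six points in time.
latency_ms = [120, 135, 150, 180, 240, 310]
queue_depth = [10, 12, 15, 19, 27, 36]
error_rate = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6]

if __name__ == "__main__":
    print("latency vs queue depth:", round(pearson(latency_ms, queue_depth), 3))
    print("latency vs error rate: ", round(pearson(latency_ms, error_rate), 3))
```

A coefficient near 1 or -1 suggests two metrics move together and may share a cause worth investigating; a value near 0 suggests no linear relationship. Correlation alone does not establish causation.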

Analyzing Python package downloads in BigQuery

The Google Cloud Public Datasets program recently published the Python Package Index (PyPI) dataset into the marketplace. PyPI is the standard repository for Python packages. If you’ve written code in Python before, you’ve probably downloaded packages from PyPI using pip or pipenv. This dataset provides statistics for all package downloads, along with metadata for each distribution. You can learn more about the underlying data and table schemas in the dataset documentation.