Analytics

Ozone Write Pipeline V2 with Ratis Streaming

Cloudera has been working on Apache Ozone, an open-source project to develop a highly scalable, highly available, strongly consistent distributed object store. Ozone can scale to billions of objects and hundreds of petabytes of data. It enables cloud-native applications to store and process massive amounts of data in hybrid multi-cloud environments and on premises.

What is a data catalog?

Metadata is data about data. Think of names, creation dates, and any other contextual information that describes the data in your data lake or data warehouse. All this metadata adds meaningful information to your datasets. This improves the data’s usability and makes data a real asset for your organization. A catalog of all the metadata makes search and retrieval of any data possible.

Create Beautiful Business Insights With Yellowfin Using Data from APILayer

Yellowfin analytics has a broad range of capabilities to help enterprise organizations and product owners meet their most pressing analytical dashboard and reporting needs. If you've been using Yellowfin for a while, you know how great it is for telling stories with data, working together, and making beautiful, easy-to-use dashboards that let more people see, understand, and act on their data.

Is self-service BI attainable? Benefits and historical concerns of self-service BI

Whether you call it self-service analytics or self-service business intelligence (BI), there has been much discussion about the perils, myths, promises, and prospects of successfully building self-service capability. Going forward, I’ll use the phrase “self-service BI”, but you are welcome to substitute the words “self-service analytics”. So, is self-service BI actually attainable or just snake oil?

Using Snowpark For Python And XGBoost To Run 200 Forecasts In 10 Minutes

Snowpark for Python, now generally available, empowers the growing Python community of data scientists, data engineers, and developers to build secure and scalable data pipelines and machine learning (ML) workflows directly within Snowflake—taking advantage of Snowflake’s performance, elasticity, and security benefits, which are critical for production workloads. Using user-defined table functions (UDTFs) and the new Snowpark-optimized warehouse with higher memory, users can run large-scale model training workloads using popular open-source libraries available through Anaconda integration.