Systems | Development | Analytics | API | Testing

Latest Posts

Revolutionize Your Data Experience With Cloudera on Private Cloud

In the age of the AI revolution, where chatbots, generative AI, and large language models (LLMs) are taking the business world by storm, enterprises are fast realizing the need for strong data control and privacy to protect their confidential and commercially sensitive data, while still providing access to this data for context-specific AI insights.

How Financial Services and Insurance Streamline AI Initiatives with a Hybrid Data Platform

With the emergence of new creative AI algorithms like large language models (LLM) fromOpenAI’s ChatGPT, Google’s Bard, Meta’s LLaMa, and Bloomberg’s BloombergGPT—awareness, interest and adoption of AI use cases across industries is at an all time high. But in highly regulated industries where these technologies may be prohibited, the focus is less on off the shelf generative AI, and more on the relationship between their data and how AI can transform their business.

Expanding Possibilities: Cloudera's Teen Accelerator Program Completes Its Second Year

At Cloudera, we’re known for making innovative technological solutions that drive change and impact the world. Our mission is to make data and analytics easy and accessible to everyone. And that doesn’t end with our customer base. We also aim to provide equitable access to career opportunities within data and analytics to the workforce of tomorrow.

Deploying an LLM ChatBot Augmented with Enterprise Data

The release of ChatGPT pushed the interest in and expectations of Large Language Model based use cases to record heights. Every company is looking to experiment, qualify and eventually release LLM based services to improve their internal operations and to level up their interactions with their users and customers. At Cloudera, we have been working with our customers to help them benefit from this new wave of innovation.

How to Ensure Supply Chain Security for AI Applications

Machine Learning (ML) is at the heart of the boom in AI Applications, revolutionizing various domains. From powering intelligent Large Language Model (LLM) based chatbots like ChatGPT and Bard, to enabling text-to-AI image generators like Stable Diffusion, ML continues to drive innovation. Its transformative impact advances multiple fields from genetics to medicine to finance. Without exaggeration, ML has the potential to profoundly change lives, if it hasn’t already.

HDFS Snapshot Best Practices

The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption, user-, or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP).

Why Reinvent the Wheel? The Challenges of DIY Open Source Analytics Platforms

In their effort to reduce their technology spend, some organizations that leverage open source projects for advanced analytics often consider either building and maintaining their own runtime with the required data processing engines or retaining older, now obsolete, versions of legacy Cloudera runtimes (CDH or HDP).

Boosting Object Storage Performance with Ozone Manager

Ozone is an Apache Software Foundation project to build a distributed storage platform that caters to the demanding performance needs of analytical workloads, content distribution, and object storage use cases. The Ozone Manager is a critical component of Ozone. It is a replicated, highly-available service that is responsible for managing the metadata for all objects stored in Ozone. As Ozone scales to exabytes of data, it is important to ensure that Ozone Manager can perform at scale.

Unlock the Full Potential of Hive

In the realm of big data analytics, Hive has been a trusted companion for summarizing, querying, and analyzing huge and disparate datasets. But let’s face it, navigating the world of any SQL engine is a daunting task, and Hive is no exception. As a Hive user, you will find yourself wanting to go beyond surface-level analysis, and deep dive into the intricacies of how a Hive query is executed.

One Big Cluster Stuck: Environment Health Scorecard

Throughout the One Big Cluster Stuck series we’ve explored impactful best practices to gain control of your Cloudera Data platform (CDP) environment and significantly improve its health and performance. We’ve shared code, dashboards, and tools to help you on your health improvement journey. We’d like to provide one last tool.