Systems | Development | Analytics | API | Testing

Ozone Write Pipeline V2 with Ratis Streaming

Cloudera has been working on Apache Ozone, an open-source project to develop a highly scalable, highly available, strongly consistent distributed object store. Ozone is able to scale to billions of objects and hundreds petabytes of data. It enables cloud-native applications to store and process mass amounts of data in a hybrid multi-cloud environment and on premises.

Modern Data Architectures | Data Mesh, Data Fabric, & Data Lakehouse

For years, companies have viewed data the wrong way. They see it as the byproduct of a business interaction and this data often ends up collecting dust in centralized silos governed by data teams who lack the expertize to understand its true value. Cloudera is ushering in a new era of data architecture by allowing experts to organize and manage their own data at the source. Data mesh brings all your domains together so each team can benefit from each other’s data.

When Private Cloud is the Right Fit for Public Sector Missions

It’s no secret that IT modernization is a top priority for the US federal government. A quick trip in the congressional time machine to revisit 2017’s Modernizing Government Technology Act surfaces some of the most salient points regarding agencies’ challenges: In the private sector, excluding highly regulated industries like financial services, the migration to the public cloud was the answer to most IT modernization woes, especially those around data, analytics, and storage.

Protect Your Assets and Your Reputation in the Cloud

A recent headline in Wired magazine read “Uber Hack’s Devastation Is Just Starting to Reveal Itself.” There is no corporation that wants that headline and the reputational damage and financial loss it may cause. In the case of Uber it was a relatively simple attack using an approach called Multi Factor Authentication (MFA) fatigue. This is when an attacker takes advantage of authentication systems that require account owners to approve a log in.

Using Apache Solr REST API in CDP Public Cloud

The Apache Solr cluster is available in CDP Public Cloud, using the “Data exploration and analytics” data hub template. In this article we will investigate how to connect to the Solr REST API running in the Public Cloud, and highlight the performance impact of session cookie configurations when Apache Knox Gateway is used to proxy the traffic to Solr servers. Information in this blog post can be useful for engineers developing Apache Solr client applications.

Future of Data Meetup: Enrich Your Data Inline with Apache NiFi

In this meetup, we’ll look at the different options for enriching your data using Apache NiFi. When and why would we prefer using NiFi for enrichment over a potentially more holistic solution, like Flink or Spark? What are the limitations? And how can we get the best of both worlds, performing data enrichment with NiFi when it makes sense and using our CEP engine when that makes the most sense? Join John Kuchmek and Mark Payne to find out!

Accelerating Projects in Machine Learning with Applied ML Prototypes

It’s no secret that advancements like AI and machine learning (ML) can have a major impact on business operations. In Cloudera’s recent report Limitless: The Positive Power of AI, we found that 87% of business decision makers are achieving success through existing ML programs. Among the top benefits of ML, 59% of decision makers cite time savings, 54% cite cost savings, and 42% believe ML enables employees to focus on innovation as opposed to manual tasks.

10 Keys to a Secure Cloud Data Lakehouse

Enabling data and analytics in the cloud allows you to have infinite scale and unlimited possibilities to gain faster insights and make better decisions with data. The data lakehouse is gaining in popularity because it enables a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Reskilling Against the Risk of Automation

Demand for both entry-level and highly skilled tech talent is at an all-time high, and companies across industries and geographies are struggling to find qualified employees. And, with 1.1 billion jobs liable to be radically transformed by technology in the next decade, a “reskilling revolution” is reaching a critical mass.