CDP Operational Database (COD) is an autonomous transactional database powered by Apache HBase and Apache Phoenix. It is one of the main Data Services that run on Cloudera Data Platform (CDP) Public Cloud, and you can access it directly from your CDP console. With COD, application developers can leverage the power of HBase and Phoenix without the overhead usually associated with deployment and management.
I’ve had the pleasure of participating in a few Commercial Lines insurance industry events recently, and as a former Commercial Lines insurer myself, I am thrilled with the progress the industry is making using data and analytics. However, I do not think Commercial Lines insurance gets the credit it deserves for the industry-leading role it has played in analytics. Commercial Lines truly is an “uber industry” with respect to data.
As Halloween night quickly approaches, there is only one question on every kid’s mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science approaches that enable quick and intuitive analysis of data across multiple sources.
There are many good uses of data. With data, we can monitor the business as a whole or specific business units. We can segment customers by vertical or by whether they run in the public or private cloud. We can understand customers better and see usage patterns and the main consumption drivers. We can find customer pain points, see where they get stuck, and understand how different bugs affect them.
With the launch of Cloudera Public Cloud 7.2.12, Streams Messaging for Data Hub deployments have gained some interesting new features! Starting with this release, Streams Messaging templates support scaling with automatic rebalancing, allowing you to grow or shrink your Apache Kafka cluster based on demand.
With the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC), we added new CLI capabilities that allow you to automate data flow deployments, making it easier than ever to incorporate Apache NiFi flow deployments into your CI/CD pipelines. This blog post walks you through the data flow development lifecycle and how you can use APIs in CDP Public Cloud to fully automate your flow deployments.
We are just over one week away from the UN Climate Change Conference of the Parties (COP26) convening in Glasgow. As governments gather to push forward climate and renewable energy initiatives aligned with the Paris Agreement and the UN Framework Convention on Climate Change, financial institutions and asset managers will monitor the event with keen interest.
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Today, customers have deployed hundreds of Airflow DAGs in production, performing data transformation and preparation tasks of varying complexity.
As organizations wrangle with explosive growth in data volume, the efficiency and scalability of storage become pivotal to operating a successful data platform that drives business insight and value. Apache Ozone is a distributed, scalable, high-performance object store, available with Cloudera Data Platform Private Cloud.
We’re excited to announce CDP Public Cloud Regional Control Plane in Australia and Europe. This addition will extend CDP Hybrid capabilities to customers in industries with strict data protection requirements by allowing them to govern their data entirely in-region.
Quite often, the digital natives of the family — you — have to explain to the analog fans of the family what PDFs are, how to use a hashtag, a phone camera, or a remote. Imagine if you had to explain what machine learning is and how to use it. There’s no need to panic. Cloudera produced a series of ebooks — Production Machine Learning For Dummies, Apache NiFi For Dummies, and Apache Flink For Dummies (coming soon) — to help simplify even the most complex tech topics.
According to Domo, on average, every human created at least 1.7 MB of data per second in 2020. That’s a lot of data. For enterprises, the net result is an intricate data management challenge that is not going to get simpler anytime soon. Enterprises need to find a way of getting insights from this vast treasure trove of data into the hands of the people who need them. For relatively small amounts of data, public cloud is a possible path for some organizations.
At the end of May, we released the second version of Cloudera SQL Stream Builder (SSB) as part of Cloudera Streaming Analytics (CSA). Among other features, the 1.4 version of CSA surfaced the expressivity of Flink SQL in SQL Stream Builder by adding DDL and Catalog support, and it greatly improved integration with other Cloudera Data Platform components, for example by enabling stream enrichment from Hive and Kudu.
Modak, a leading provider of modern data engineering solutions, is now a certified solution partner with Cloudera. Customers can seamlessly automate migration to Cloudera’s cloud-based enterprise platform CDP from on-prem deployments and dynamically auto-scale cloud services with Cloudera Data Engineering (CDE)’s integration with Modak Nabu™.
Apache Impala is a massively parallel in-memory SQL engine supported by Cloudera, designed for analytics and ad hoc queries against data stored in Apache Hive, Apache HBase, and Apache Kudu tables. Because it supports powerful queries and high levels of concurrency, Impala can use significant amounts of cluster resources. In multi-tenant environments, this can inadvertently impact adjacent services such as YARN, HBase, and even HDFS.
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Within the context of a data mesh architecture, I will present industry settings and use cases where this architecture is relevant and highlight the business value it delivers across business and technology areas.
If your organization is using multi-tenant big data clusters (and everyone should be), do you know the usage and cost efficiency of cluster resources by tenant? A chargeback or showback model allows IT to attribute costs and resource usage to the actual analytic users in the multi-tenant cluster, instead of attributing them to the platform (“overhead”) or the IT department. This allows you to know the individual costs per tenant and set limits in order to control overall costs.
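To make the showback idea concrete, here is a minimal sketch in Python. It assumes a simple proportional model in which the total cluster cost is split across tenants according to their share of metered usage. The function names (`showback`, `over_limit`), the core-hours metric, and the sample figures are all hypothetical and chosen for illustration; they are not taken from any Cloudera tool or API.

```python
from collections import defaultdict


def showback(usage_records, total_cluster_cost):
    """Split total_cluster_cost across tenants in proportion to usage.

    usage_records is an iterable of (tenant, core_hours) pairs; a tenant
    may appear more than once. Core-hours is an assumed usage metric.
    """
    usage_by_tenant = defaultdict(float)
    for tenant, core_hours in usage_records:
        usage_by_tenant[tenant] += core_hours

    total_usage = sum(usage_by_tenant.values())
    if total_usage == 0:
        # Nothing was metered, so no cost can be attributed to any tenant.
        return {tenant: 0.0 for tenant in usage_by_tenant}

    return {
        tenant: total_cluster_cost * used / total_usage
        for tenant, used in usage_by_tenant.items()
    }


def over_limit(costs, limits):
    """Return the sorted names of tenants whose cost exceeds their limit."""
    return sorted(
        tenant
        for tenant, cost in costs.items()
        if tenant in limits and cost > limits[tenant]
    )


# Hypothetical sample data: two tenants sharing one cluster.
records = [("marketing", 30.0), ("finance", 40.0), ("marketing", 30.0)]
costs = showback(records, total_cluster_cost=1000.0)
flagged = over_limit(costs, limits={"marketing": 500.0, "finance": 450.0})
```

A real chargeback model would likely weight several resources (CPU, memory, storage) and decide how to treat idle capacity; the proportional split above is only the simplest possible starting point.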
Cloudera Data Platform (CDP) has supported access controls on tables and columns, as well as on files and directories, via Apache Ranger since its first release. It is common to have different workloads using the same data: some require authorization at the table level (Apache Hive queries) and others at the level of the underlying files (Apache Spark jobs). Unfortunately, in such instances you would have to create and maintain separate, corresponding Ranger policies for both Hive and HDFS.