If your organization is using multi-tenant big data clusters (and everyone should be), do you know how much of the cluster's resources each tenant uses, and how cost-efficiently? A chargeback or showback model allows IT to attribute costs and resource usage to the actual analytic users in the multi-tenant cluster, instead of to the platform (“overhead”) or the IT department. This lets you see the individual cost per tenant and set limits to control overall costs.
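To make the idea concrete, here is a minimal sketch of a chargeback calculation in Python. Everything in it is an assumption for illustration: the `TenantUsage` record, the two metered resources (CPU core-hours and storage GB-months), the flat per-unit rates, and the per-tenant limits are hypothetical and not taken from any particular platform's metering API.

```python
from dataclasses import dataclass


@dataclass
class TenantUsage:
    """Hypothetical metered usage for one tenant over a billing period."""
    tenant: str
    cpu_core_hours: float
    storage_gb_months: float


def chargeback(usages, cpu_rate, storage_rate):
    """Return cost per tenant, assuming flat per-unit rates (an assumption)."""
    return {
        u.tenant: u.cpu_core_hours * cpu_rate + u.storage_gb_months * storage_rate
        for u in usages
    }


def over_limit(costs, limits):
    """Return the sorted names of tenants whose cost exceeds their limit."""
    return sorted(
        tenant for tenant, cost in costs.items()
        if tenant in limits and cost > limits[tenant]
    )
```

In a showback model the same per-tenant figures would simply be reported rather than billed; the limit check is where cost control would hook in.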
Cloudera Data Platform (CDP) has supported access controls on tables and columns, as well as on files and directories, via Apache Ranger since its first release. It is common for different workloads to use the same data: some require authorization at the table level (Apache Hive queries) and others at the level of the underlying files (Apache Spark jobs). Unfortunately, in such cases you would have to create and maintain separate, corresponding Ranger policies for both Hive and HDFS.
Success for any business starts with data that is easily discoverable, understandable, and of value to the people who need it. We call this type of data “healthy data.” You should look at a wide set of measures and metrics to determine whether data is healthy or not, but at the core of all healthy data is a high level of quality.
Forbes Insights defines the modernized data center as being built to change just as much as it is built to last. One of the key pillars for a modernized data center is an agile data infrastructure. The Forbes Insights briefing explains, “This means it’s not wedded to any specific deployment method or solution set.
A favorite moment of mine is when I get to share Qlik’s vision for Active Intelligence with a customer for the first time. It usually goes like this: genuine excitement about the possibility of taking informed action in the moment from real-time data, invariably followed by many questions. Where do I begin? What do I need? What about the tech stack I have already acquired?