Latest Posts

Benchmarking Ozone: Cloudera's next-generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitations of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance of Ozone with HDFS, the de facto big data file system.

Searcher Seismic is utilizing seismic data for the oil and gas industry providing a map to de-risk exploration

In today’s age of technology, the processing of seismic data requires powerful computers, talented researchers, software, and skills. For the oil and gas industry, it’s paramount to making strategic business decisions. Seismic data helps companies plan wells accurately, reduces the need for further exploration, and minimizes the impact on the environment.

Disk and Datanode Size in HDFS

This blog discusses questions such as what the right disk size is for a datanode and what the right capacity is for a datanode. A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage because IO bandwidth requirements are usually lower for cold data. However, the decision to use denser nodes for hot data must be evaluated carefully, as it can have an impact on the performance of the cluster.

How Florida State University is Boosting Student Success and Addressing Data Challenges

For public universities, metrics such as retention rate and graduation rate are important indicators for standing out in the competitive landscape. These success metrics are paramount to bringing in more students, making them successful, and continuing to grow a strong alumni network.

How Cloudera Enables R Users to Optimize Their Data Science and Machine Learning Workflows

This week, R users from around the world convene in San Francisco for rstudio::conf 2020. With a packed agenda of new package announcements and case studies highlighting successful applications of R across different industries, it’s evident that R and the ecosystem of tools around it make up a vital part of the data science and machine learning landscape.

Deep Learning for Anomaly Detection

We are excited to release Deep Learning for Anomaly Detection, the latest applied machine learning research report from Cloudera Fast Forward Labs. Anomalies, often referred to as outliers, are data points or patterns in data that do not conform to a notion of normal behavior. Anomaly detection, then, is the task of finding those patterns in data that do not adhere to expected norms. The capability to recognize or detect anomalous behavior can provide highly useful insights across industries.

Understanding Healthcare's New Industry Imperative: Data Chain of Custody

One of the first recorded medical devices was the stethoscope in 1816. Fast forward more than two centuries to 2019, when the world witnessed the creation of an award-winning multi-sensor, implantable cardiac device able to predict potential heart failure weeks in advance. The data and analytics streamed and analyzed from new connected devices are transforming healthcare as we know it. However, a real challenge in this environment is the sheer volume and scope of data that must be managed and protected.

Insurance in 2020 & Beyond - Learning from the past decade to plan for the next

Like many other people, I used time over the recent holidays to clean out and organize my digital files. In that process, I finally trashed the speaking notes for a panel I participated in at SMA’s (Strategy Meets Action) first summit in 2012 when I worked at a large global insurer. During that session, a gentleman in the audience asked me what I thought about “big data” and its implications for Insurance.

Real-time log aggregation with Flink Part 1

Many of us have experienced the feeling of hopelessly digging through log files on multiple servers to fix a critical production issue. We can probably all agree that this is far from ideal. Locating and searching log files is even more challenging when dealing with real-time processing applications where the debugging process itself can be extremely time-sensitive.

These Two Trends Will Put an End to Business as Usual in 2020

Where did the last decade go? It seems like it was just 2010 and I was writing about the future of business in 2020. Well, it is now here! I’ve spent much of my career in finance/accounting and management consulting, and the last decade-plus helping companies link their business and technology strategies with a focus on data and analytics. Where will we head in 2020 and this next decade?