Latest Blogs

How to Create a Python Stack

All programming languages provide efficient data structures that allow you to logically or mathematically organize and model your data. Most of us are familiar with simpler data structures like lists (or arrays) and dictionaries (or associative arrays), but these basic array-based data structures act more as generic solutions to your programming needs and aren’t really optimized for performance on custom implementations. There’s much more that programming languages bring to the table.
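As a minimal sketch of what a purpose-built structure can look like, the example below wraps `collections.deque` from the Python standard library in a small stack class. The class name `Stack` and its `push`, `pop`, and `peek` methods are illustrative assumptions, not taken from the linked post.

```python
from collections import deque


class Stack:
    """A last-in, first-out (LIFO) container backed by a deque."""

    def __init__(self):
        # deque supports fast appends and pops at its right end.
        self._items = deque()

    def push(self, item):
        """Place an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push("first")
stack.push("second")
top = stack.pop()  # the most recently pushed item comes off first
```

A plain list with `append` and `pop` would also behave as a stack; the wrapper only restricts the interface to stack operations.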

A pivotal paradox: 6 lessons learned managing a fully remote team

A mere few months ago the majority of the world was forced to change drastically, including the move into a ‘fully remote’ mode of office work. As reality was bearing down upon us, tech managers and CEOs everywhere were huddled together trying to figure out how to not only make it work, but work well.

Overview of the Operational Database performance in CDP

This article gives you an overview of Cloudera’s Operational Database (OpDB) performance optimization techniques. Cloudera’s Operational Database can support high-speed transactions of up to 185K/second per table and a high of 440K/second per table. On average, the recorded transaction speed is about 100K-300K/second per node. This article provides an overview of how you can optimize your OpDB deployment in either Cloudera Data Platform (CDP) Public Cloud or Data Center.

Tech Tip: Pointing Your Automated Tests to Sauce

So you’ve realized the benefits of test automation. Through your own research, or perhaps a small proof of concept, you’ve realized removing once-manual quality processes can accelerate release cycles and improve your user experience. You’ve built a small suite of tests, and the benefits are real. The next step in your journey, you realize, is to achieve the real value of automation, which means running it continuously and at scale.

Analyze Your Load Tests

OctoPerf’s report engine provides many graphs to sort and present test metrics in a comprehensive way. We’ve tried to improve it over the years so that you can access critical information very quickly. But requirements vary from one project to another. In this post we will look at how you can configure the report to show your preferred metrics, and also all the shortcuts you can take to achieve this goal.

Top 10 API Security Threats Every API Team Should Know

As more and more data is exposed via APIs, whether by API-first companies or through the explosion of single page apps/JAMStack, API security can no longer be an afterthought. The hard part about APIs is that they provide direct access to large amounts of data while bypassing browser precautions. Instead of worrying about SQL injection and XSS issues, you should be concerned about the bad actor who was able to paginate through all your customer records and their data.

Ask questions to BigQuery and get instant answers through Data QnA

Today, we’re announcing Data QnA, a natural language interface for analytics on BigQuery data, now in private alpha. Data QnA helps enable your business users to get answers to their analytical queries through natural language questions, without burdening business intelligence (BI) teams. This means that a business user like a sales manager can simply ask a question on their company’s dataset, and get results back that same way.

Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they are now able to take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to cloud. The main attraction of cloud is its separation of compute and storage, a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).