
June 10, 2021 | Matt Pacheco

The Path to Maximum Data Center Uptime

Of the many ways IT organizations are measured, maximizing uptime has to be one of the most important. After all, if employees and customers can't access mission-critical systems, nothing else IT does really matters.

In the latest episode of our ‘What’s the Point?’ podcast, Matt Watts, NetApp’s Chief Technology Evangelist, shares what he sees as the greatest threats to system availability in 2021 and whether migrating to the cloud can help mitigate the risks. In this post, I’ll summarize some of what Matt told us as well as provide more of TierPoint’s perspective.

Ransomware heats up

Power outages aren’t the only major threat to uptime. Cybercrime has been one of the greatest threats to data center facility uptime for years. Not too long ago, our attention was solely focused on Distributed Denial of Service (DDoS) and related attacks. Ransomware existed, but it wasn’t terribly sophisticated, and the demands amounted to pocket change for many of the victims.

The ransomware threat landscape has intensified in the last 12 to 24 months. Ransomware attacks have become much more sophisticated and are often combined with other forms of attack. For example, cybercriminals might distract a company with a DDoS attack while they plant ransomware deep within its systems. As Matt notes, the latest ransomware is often designed to lie dormant, so even if you've upgraded your security posture recently, the groundwork for a future attack may already be in place.

Ransomware demands have also risen, and they are likely to keep rising as more companies give in and pay attackers what they ask. When ransomware was first recognized as a problem, the FBI strongly urged organizations not to pay the ransom because paying only encouraged more attacks. Then ransomware attackers started targeting vulnerable organizations with a lot to lose: schools, government agencies, hospitals and the like. Understandably, the FBI softened its advice against paying.

The situation came to a head this year when attackers shut down Colonial Pipeline's systems and reportedly demanded five million dollars (USD). The shutdown led to gas shortages along the Eastern Seaboard on a scale not seen since the 1970s. Colonial paid the ransom, and services soon returned to normal. Unfortunately, many security experts believe attacks on our national infrastructure will escalate, especially given the size of the payday and the apparent lack of a coordinated response between the company and national security experts.

On a much less sinister note, Matt says human error continues to be a primary cause of unplanned downtime. People make mistakes, and they always will. Removing manual processes can help eliminate errors, but most organizations are only beginning to use automation in any meaningful way.

From our perspective, as IT environments become increasingly complex, skills gaps grow wider. Under tight budget constraints, most IT organizations prefer to develop skills from within rather than add headcount.

That’s understandable, but it also adds stress to an already overburdened IT department. With only so many hours in the day, IT professionals rush from task to task. Mistakes get made, and important details are easily overlooked.

Can cloud computing help maximize uptime?

Matt believes it can, with a caveat. As he explains, what cloud vendors spend on security far exceeds what most organizations can afford to spend. So, if you want to improve the security of your workloads quickly, the fastest route may be to migrate them to the cloud.

However, IT leaders need to understand that they still have skin in the game, because most cloud providers operate under a shared responsibility model. The provider is typically responsible for the security of the cloud itself: the physical facilities, hardware, network and virtualization layer that customer resources run on. The customer is typically responsible for security in the cloud, including the applications and data that reside in the provider's environment and how they are configured and accessed. Before migrating workloads to any cloud, IT leaders must make sure they understand who owns the responsibility for each security element.

Because they are off-premises, cloud-based resources can also eliminate some manual errors, such as someone accidentally turning off a server, power or cooling. However, today's hybrid environments are more complex than ever, with organizations managing a mix of on-premises data center infrastructure and cloud-based resources across several types of clouds: private hosted, AWS, Azure, GCP and others. Each of these environments requires its own skill set, so the potential for human error may actually increase as more workloads move to the cloud.

As Matt notes, automation can help by taking execution out of human hands. However, even with AI, automated systems do only what they are told to do. Many IT leaders still prefer to have the system suggest actions while an IT professional manages the execution.

How to maximize uptime with cloud expertise

At TierPoint, we partner with NetApp to offer managed cloud solutions that can support your digital transformation and unleash the full potential of your data. Learn more about our cloud, security and disaster recovery solutions to see how you can achieve optimal uptime for your infrastructure.

