Which IT security roles fall to the ops team?

What priority should an IT operations team give to security tasks?

To help secure data and applications, an IT ops team needs to do much more than put up firewalls and apply other traditional security measures. Here’s where to begin.

Chris Moyer

VP of Technology – ACI Information Group – SearchCloudApplications

Spectre, Meltdown and similar zero-day vulnerabilities are the scary sorts of things that keep operations teams — especially those with IT security roles — awake at night. Fortunately for most cloud-based companies, those vulnerabilities can be addressed with the latest software updates or an adjustment to your Amazon Elastic Compute Cloud machine images. Organizations that run on serverless platforms have it even easier, needing only to wait for Amazon, Microsoft or Google to apply the patches to the underlying hardware.

Still, these vulnerabilities account for only a small fraction of the attack surface that modern-day operations teams must watch over.

To take their IT security roles seriously, these staffers need to be concerned with stolen credentials and corrupted code repositories, among numerous other threats. Custom alerts can help ops teams detect abnormal conditions, and software-testing procedures can be adjusted to include security risk detection. There’s plenty for ops to do.

Consider a Node.js application launched using the serverless platform on AWS Lambda. Every dependency included in an application could potentially become compromised and lead to malicious code being installed on your site. Such a calamity could result in the loss of your users’ data or your intellectual property.

Systems for continuous integration (CI) and continuous delivery (CD) allow developers to iterate much faster. This is generally a good thing, since it produces much smaller deployments and typically results in fewer bugs. Unfortunately, these CI/CD tools tend to rely on third-party systems to gather packages and requirements, and those repositories can become compromised.

One recent incident, for example, exposed a critical vulnerability in the NPM code repository. Supposedly safe packages were replaced with nearly identical ones containing attack code. Since NPM packages can include build and deployment hooks as well, this could do anything from stealing AWS credentials used to deploy your application to harvesting credit card numbers and passwords. Even packages you’ve completely validated as safe and have been using for years could have been compromised during a new deployment.

Previously, operations teams could mitigate some of this risk simply by controlling the hardware. Also, they could put in place specialized firewalls to prevent suspicious network traffic from causing issues, such as a site trying to upload credit card numbers to a known malicious IP address. With the move to cloud serverless technologies, much of this control has been taken away from ops, even while their IT security roles remain.

Adding detection to the CI/CD process

For teams with well-defined CI/CD practices, the build process should already have automated unit testing in place for bugs. It’s a natural progression to also require that build step to add in tests for security vulnerabilities. Many tools and organizations can help with this sort of thing, including Snyk and the Open Web Application Security Project, or OWASP. Ops teams are typically responsible for setting up these types of tools, and many of them can be set to run one-time scans before a build, as well as perform ongoing checks of production systems.
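To make that concrete, here is a minimal sketch, not taken from any particular pipeline, of a pre-build security gate written in Python. It assumes an npm project with a package-lock.json and a recent npm release, where npm audit --audit-level=high exits with a nonzero code when high-severity issues are found; Snyk's CLI or OWASP Dependency-Check could be wired into the same step.

```python
#!/usr/bin/env python3
"""Minimal pre-build security gate: fail the CI job if the dependency
audit reports high-severity issues. Assumes npm is on the PATH."""
import subprocess
import sys

def dependency_audit_passes() -> bool:
    # `npm audit --audit-level=high` exits nonzero when vulnerabilities
    # at or above the given severity are found in package-lock.json.
    result = subprocess.run(
        ["npm", "audit", "--audit-level=high"],
        capture_output=True, text=True
    )
    print(result.stdout)
    return result.returncode == 0

if __name__ == "__main__":
    if not dependency_audit_passes():
        print("High-severity dependency vulnerabilities found; aborting build.")
        sys.exit(1)
    print("Dependency audit clean; continuing build.")
```

The same script can run on a schedule against production branches, which covers the ongoing checks mentioned above as well as the one-time pre-build scan.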

Additionally, ops teams with IT security roles or concerns may choose to create a custom in-house repository. For example, NPM Enterprise allows companies to run a feature-compatible version of the NPM registry internally. This can be maintained by an internal team, behind a firewall, and it prevents the installation of third-party packages that aren’t pre-approved. This can lead to faster, more secure and more reliable deployments.

Some attacks result from things that cannot be identified before a system is in production. For example, users’ accounts can be breached. Or, even worse, a developer’s account can be compromised.

With AWS, it’s critically important that each service has strict identity permissions. For example, a user-facing API probably shouldn’t have the ability to create new Elastic Compute Cloud instances or to delete users. Developers should be brought along slowly and not granted write access until after they’ve proven they aren’t going to accidentally wipe out the entire database. And no one should have root AWS credentials, except maybe the CTO.
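As a hedged illustration of what strict identity permissions can look like in practice, the following Python (Boto3) sketch attaches an inline policy that lets one service role read a single DynamoDB table and nothing else. The role name, table and account number are placeholders.

```python
"""A sketch of scoping a service role down to the few actions it
actually needs. Role, table and account identifiers are placeholders."""
import json
import boto3

iam = boto3.client("iam")

# Allow the user-facing API's role to read one DynamoDB table and nothing
# else -- no ec2:RunInstances, no iam:DeleteUser, no wildcard resources.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
    }],
}

iam.put_role_policy(
    RoleName="user-api-lambda-role",        # placeholder role name
    PolicyName="user-api-read-only",
    PolicyDocument=json.dumps(read_only_policy),
)
```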

It’s also important to make sure all identity and access management (IAM) users are required to have multifactor authentication (MFA) tokens set up, and it may be useful to turn on S3 versioning as well as require an MFA token to delete S3 objects.
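A hedged Boto3 sketch of that S3 configuration follows. The bucket name and MFA device serial are placeholders, and AWS requires the bucket owner's root credentials plus a current MFA code to enable MFA delete.

```python
"""A sketch of turning on S3 versioning with MFA delete. The bucket name
and MFA device serial are placeholders."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="critical-app-data",                       # placeholder bucket
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={
        "Status": "Enabled",       # keep every object version
        "MFADelete": "Enabled",    # require an MFA token to delete versions
    },
)
```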

It’s always a good idea to back up critical data in another location — and encrypt it, if it’s sensitive. It’s important to note, however, that when you store backups in different locations, you’re increasing the exposure of that data to attackers. More backups are not always better.

Most cloud providers offer managed backup options. Those should always be the first choice.
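For teams that keep an extra copy of truly critical objects on top of a managed service, here is a hedged Boto3 sketch that writes an encrypted copy into a bucket in a second region. The bucket and key names are placeholders; S3 replication or AWS Backup would normally handle this automatically.

```python
"""A sketch of keeping an encrypted secondary copy of a critical object
in a bucket in another region. Bucket and key names are placeholders."""
import boto3

# Client in the destination bucket's region; copy_object pulls from the
# source bucket and writes an encrypted copy.
s3 = boto3.client("s3", region_name="us-west-2")

s3.copy_object(
    Bucket="critical-app-data-backup",                  # destination bucket
    Key="exports/users-2018-05-01.json",                # placeholder key
    CopySource={"Bucket": "critical-app-data",
                "Key": "exports/users-2018-05-01.json"},
    ServerSideEncryption="aws:kms",                      # encrypt the copy
)
```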

Monitor for unusual activity

Even with strict policies in place and personnel focused on IT security roles, it’s inevitable that something will go wrong. Credentials will either be leaked accidentally or be exposed through some malicious code installed on someone’s computer.

It’s important for operations teams to monitor cloud activity. For AWS users, this is typically done via CloudWatch. In Azure, consider Operational Insights, Application Insights or other monitoring tools.

It’s also worth setting up custom alarms. These help you spot abnormalities, such as when an IAM account performs operations that deviate from normal patterns; that unusual behavior could indicate a system compromise.
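As one hedged example of such an alarm, the Boto3 sketch below assumes a CloudWatch Logs metric filter already publishes a count of unauthorized API calls from CloudTrail events; the namespace, metric name and SNS topic are placeholders.

```python
"""A sketch of a custom alarm on unusual IAM activity. It assumes a
CloudWatch Logs metric filter already publishes a metric (here called
'UnauthorizedApiCalls' in the 'Custom/Security' namespace)."""
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="iam-unauthorized-api-calls",
    Namespace="Custom/Security",              # assumed custom namespace
    MetricName="UnauthorizedApiCalls",        # assumed metric from a filter
    Statistic="Sum",
    Period=300,                               # five-minute windows
    EvaluationPeriods=1,
    Threshold=5,                              # alert after five denied calls
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
    TreatMissingData="notBreaching",
)
```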

It can be trickier to identify issues with end users. While some problems can be obvious — such as when a user tries to log into their account from China an hour after logging in from California — other situations aren’t as readily apparent. Anomaly detection and manual approval of suspicious requests can be useful in preventing unwanted activity. Several services can help manage these types of rules, and most authentication services, such as Auth0, already provide built-in anomaly detection.
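For illustration only, and not drawn from any particular service, the sketch below codifies that "impossible travel" intuition: flag a login that would require traveling faster than a commercial jet since the previous one.

```python
"""A purely illustrative 'impossible travel' check; all names and numbers
here are hypothetical."""
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class Login:
    when: datetime
    lat: float
    lon: float

def km_between(a: Login, b: Login) -> float:
    # Haversine great-circle distance in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def looks_suspicious(prev: Login, curr: Login, max_kmh: float = 900.0) -> bool:
    # Anything implying travel faster than a commercial jet gets flagged
    # for manual approval or step-up authentication.
    hours = (curr.when - prev.when).total_seconds() / 3600
    return hours > 0 and km_between(prev, curr) / hours > max_kmh

# Example: a California login followed an hour later by one from Shanghai.
prev = Login(datetime(2018, 5, 1, 9, 0), 37.77, -122.42)
curr = Login(datetime(2018, 5, 1, 10, 0), 31.23, 121.47)
print(looks_suspicious(prev, curr))  # True
```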

Web application firewalls can also provide added protection to your web-based access points by blocking traffic using pre-defined rules from communities, as well as custom logic based on patterns your operations team identifies.

For example, if someone is trying to access a wp-admin URL on your custom in-house application, chances are they’re trying to hack into something. Many of the targeted vulnerabilities are for WordPress and applications in the PHP scripting language, so operations teams should be on the lookout for requests for suspicious URLs and be ready to block all traffic from offending IP addresses.
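The sketch below is a simplified stand-in for that kind of custom rule, not any specific WAF product's API: it flags requests for WordPress- and PHP-style URLs on a non-PHP application and builds a list of repeat-offender IP addresses that could feed a block rule.

```python
"""An illustrative request filter; patterns, thresholds and addresses are
hypothetical examples."""
from collections import Counter

SUSPICIOUS_PATTERNS = ("/wp-admin", "/wp-login.php", "/xmlrpc.php", ".php")
offences = Counter()
blocked_ips = set()

def inspect(request_path: str, source_ip: str, block_after: int = 3) -> bool:
    """Return True if the request should be blocked."""
    if source_ip in blocked_ips:
        return True
    if any(pattern in request_path for pattern in SUSPICIOUS_PATTERNS):
        offences[source_ip] += 1
        if offences[source_ip] >= block_after:
            blocked_ips.add(source_ip)   # candidate for a WAF deny rule
        return True
    return False

print(inspect("/wp-admin/setup-config.php", "198.51.100.7"))  # True
print(inspect("/api/users/42", "203.0.113.9"))                # False
```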

This was last published in May 2018

Application performance tuning tips and tricks

What common causes lead to slow application performance?

Tune hardware, resource allocations and prioritizations to make applications perform their best. Just don’t do it without tracking the results.

Brian Kirsch

IT Architect, Instructor – Milwaukee Area Technical College – SearchServerVirtualization

A slow application is a challenge for IT operations teams, as the problem generally comes with about as much background information as a warning light on a car dashboard that just says Service.

Application performance tuning covers a wide range of possible options. You might hear calls to fix the code, but often, the issue slowing down an application isn’t as deep as faults in the source code. Similarly, routine server and application maintenance — clean up the application install points and remove old logs and temporary files — helps but won’t correct larger issues. Application performance tuning comes down to the environment that surrounds the application and to matching the right IT resources to that application.

Not all applications run the same on the same hardware, virtual or physical. It’s a myth that slow applications simply need more resources. In fact, overly liberal resource allocation can damage the overall environment, causing unnecessary contention. For example, a single-threaded application won’t run faster with multiple CPUs, no matter how many it gets, and other workloads in the virtual environment might end up starved for processing power. Overallocation is similar to overeating — not good for your health — and it’s expensive and time-consuming as well. Sometimes, extra spending is worth it: Choices about network bandwidth or storage tiers become much more than a cost question if the application fails to function properly because of resource constraints.

Performance tuning starts with resources

Approach application performance tuning through the idea of proper-sizing resources. Proper-sizing also sets applications up well for a move to the cloud, where the organization pays for however many resources the application consumes monthly, rather than in cyclical data center capacity refreshes.

To properly size and scale a deployment, start with the application requirements as a guideline — but don’t treat them as laws. Every application designer has optimal settings for the application’s resources. Optimal might cover only the top of the range of acceptable application performance, however. And IT teams should query what the optimal resources are for the expected user base: 100 or 10,000? Balance the app designer’s settings with the available environment, and use this information as a starting point for reliability and performance from day one.

To start application performance tuning, dig into the resource usage profile. Does the application have a highly transaction-driven process or a more intense lookup process? The answer can quickly shift attention from a CPU-driven process to a memory-driven one. In virtual environments, set priority for CPU usage for high transactional loads and memory reservations for intense lookups. Use the same types of settings with both networking and I/O traffic priorities.
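As a purely illustrative sketch, with hypothetical numbers, the snippet below shows the kind of first-pass classification an ops team might run against sampled utilization data to decide whether CPU priority or a memory reservation is the right lever.

```python
"""An illustrative classifier, not tied to any hypervisor API: decide from
sampled utilization whether a workload looks CPU-bound or memory-bound.
All numbers and thresholds are hypothetical."""
from statistics import mean

def classify(cpu_samples: list[float], mem_samples: list[float]) -> str:
    """Samples are utilization percentages gathered over a busy period."""
    cpu, mem = mean(cpu_samples), mean(mem_samples)
    if cpu > 75 and cpu > mem:
        return "cpu-bound: favor CPU shares/priority for transactional load"
    if mem > 75 and mem > cpu:
        return "memory-bound: favor a memory reservation for lookup-heavy load"
    return "balanced: no special priority needed yet"

# Hypothetical samples from a transaction-heavy service.
print(classify([82, 91, 77, 88], [40, 42, 39, 44]))
```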

We often focus on restrictions and limitations in virtualized resources, but setting priority on workloads can be just as critical to the overall operations of the application. The actual delivery of the application from a VM, container or cloud resource must be configured correctly and tuned in each case, so take a narrow case-by-case approach rather than a single approach for the entire IT deployment.

In cloud, sky’s the limit

Cloud resources are essentially infinite, so resource availability is not a challenge for application performance. However, the budget is limited, so strongly consider where and how your company wants to spend its money. Careless resource management wastes money when it occurs on internal IT systems, but when the same problems move to the cloud, the cost is more apparent. When it’s in black and white on a bill, you tend to notice skyrocketing consumption quickly.

Relationships between the resources are critical. A change in one affects the others, and one change, in isolation, could fail to help performance. The move from a spinning disk I/O platform to solid-state drive to tune performance in a storage-heavy application might require additional memory for caching before the improvement is noticeable to the user, for example.

Observe the effects of application performance tuning

Let monitoring tools be your guide. As you make a change in the deployment or how it is managed, follow its effect on application response times and resource utilization. Compare the pre- and post-change metrics to see what impact allocation changes and prioritizations have.
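A minimal sketch of that comparison, with made-up response-time numbers, might look like this:

```python
"""An illustrative before-and-after comparison: compare the same
response-time metric across the pre-change baseline and the post-change
window to see whether a tuning change actually helped. Numbers are made up."""
from statistics import mean

def percent_change(baseline: list[float], post_change: list[float]) -> float:
    before, after = mean(baseline), mean(post_change)
    return (after - before) / before * 100

# Hypothetical p95 response times in milliseconds from the monitoring tool.
baseline_ms = [410, 395, 430, 420]
post_change_ms = [310, 305, 335, 320]

delta = percent_change(baseline_ms, post_change_ms)
print(f"p95 response time changed by {delta:+.1f}% after the change")
```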

Monitoring data also provides insight into what changes to make next. Application performance tuning can move the hotspot or pinch point from one resource type to another.

Ideally, tuning eliminates any possible slow points in application operation — along with the complaints from users. At this point, apply the acquired performance information to the deployment setup. None of the effort to get the application to work correctly will matter if the changes are not reflected back in the initial deployment stage. Correct the base deployment, and avoid making the same mistakes repeatedly.

Adjust deployment best practices and expected ranges carefully. These changes have wide-reaching effects. If you didn’t do enough monitoring and adjustments beforehand, mistakes will propagate across all new deployments.

This was last published in August 2018

Dodge a data center outage with proper power design, commissioning

Data center outages continue to plague IT. Perform data center commissioning or an audit and have a solid power design to protect your organization from a crash.

United, Delta and Southwest Airlines — on top of a host of other well-known companies — have each recently suffered a major data center outage. And their highly publicized shutdowns have added yet another worry to the IT executive’s list.

Many of these data center crashes were reportedly caused by electrical failures, which doesn’t come as a big surprise. According to the Uptime Institute, engine generator systems are the primary data center power source, with local utility power being an economic alternative. Utility power disruptions, however, “are not considered a failure, but rather an expected operational condition for which the site must be prepared.” In other words, these power disruptions are likely to happen in the majority of enterprise data centers. For CIOs who worry about this kind of thing their whole careers, this might be an opportunity to fund some needed improvements. But, be aware: Simply adding redundancy is not, in itself, the answer.

The challenge of mission-critical data center power design

The greatest vulnerabilities in enterprise data centers are hidden flaws and installation errors. There is a world of difference between simply duplicating equipment and true mission-critical design. However, it’s a painstaking process to examine data center power design for potential points of failure. Consider hiring a highly qualified, independent specialist to do this task for your organization.

You can continuously review new or renovated facilities through design and installation, but it’s another matter to remedy vulnerabilities in an existing facility while it is in service. When you correct vulnerabilities, you can expose the operation to failures. But even if you don’t undertake risky corrections, know where the potential for failure lies to minimize the risk of a data center outage.

The false security of backup power

One of the most well-documented power failure outages in history happened at 365 Main in San Francisco. The company had redundant uninterruptible power supply (UPS) systems and generators to meet its customers’ expectations of constant availability. But, on July 24, 2007, Murphy’s Law paid an unwelcome visit.

First, there was a power failure. The data center’s UPS maintained power until the generators started. But, soon after, the generators shut down one by one, causing a data center outage that affected a litany of the company’s high-profile customers for hours.

Although the data center had a solid power system design, data center operators hadn’t exposed the issue — firmware in the generator control — through commissioning tests. Rather than test repeated failures and generator restarts under load, administrators relied on the false security of backup power and redundancy.

Many modern UPS systems can signal servers to start a controlled shutdown when battery life has dropped below a preset threshold. While not ideal, it’s far better to implement this capability than to experience a hard crash when restarts begin.
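A minimal sketch of that idea follows; the battery-reading callable is hypothetical and would come from the UPS vendor's management card or monitoring agent in practice.

```python
"""An illustrative controlled-shutdown watcher. The battery-reading
callable is a hypothetical stand-in for the UPS vendor's interface."""
import subprocess
import time

SHUTDOWN_THRESHOLD = 20  # percent of battery remaining

def watch_ups(read_battery_percent, poll_seconds: int = 30) -> None:
    """read_battery_percent is whatever callable the UPS tooling provides;
    it should return the remaining charge as a percentage."""
    while True:
        if read_battery_percent() <= SHUTDOWN_THRESHOLD:
            # Begin an orderly shutdown before the batteries are exhausted.
            subprocess.run(["/sbin/shutdown", "-h", "+1", "UPS battery low"])
            return
        time.sleep(poll_seconds)
```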

If you can fix a vulnerability, make a detailed plan for how you can do it, as well as how you would handle the potential failures that the remediation process could cause. For example, if an admin sets off a fire alarm, there should be someone with them who can deal with the condition and avoid the dump of a gas fire protection system and an automatic shutdown. And, if the plan is to turn off the fire alarm during the work, notify the facility, security and fire departments, and make sure someone stands by with a portable extinguisher. If there is potential for a cooling failure, plan to initiate selective shutdowns to reduce the heat load and place portable air conditioners as a precaution.

Minimize data center outage risks with commissioning

Even if a data center power design is perfect, there could still be errors that admins can only identify through commissioning. The commissioning agent not only looks at the correctness of the installation and verifies the proper settings and adjustments, but also attempts to break your system. To complete a test, an agent uses a set of scripts, runs infrastructure systems under simulated conditions and shuts down various elements as if they had failed.

The commissioning process also includes a total power shutdown under load, and might introduce additional failures in individual pieces of equipment, depending on the level of availability used for design intent. The process should also identify unclear markings and unprotected or hard-to-reach critical controls, such as an emergency power off button without a protective cover and alarm.

For a new facility, begin commissioning in the design development stage. If you use an independent commissioning agent, make sure the agent identifies and remedies the majority of the potential flaws before you complete the project design. This not only reduces the chance of a data center outage, but avoids the potential for massive change order costs.

In existing data centers, it is too risky to do multiple shutdowns to look for problems, which means that full commissioning is impractical. In this case, consider a data center audit, which involves a combination of design review and on-site measurement, testing and inspection of critical systems. While it won’t expose every potential condition, it can uncover the vast majority of vulnerabilities and provide a path to remediation when practical.

Avoid these common IT blunders that lead to outages
Follow this checklist for data center ops best practices
Boost IT resiliency with a distributed architecture

This was last published in July 2017

Can vendors go unescorted in a secure colocation center?

You’ve sentenced your production servers to five-to-life lockdown in a secure colocation cage. Who’s allowed to visit them?

Unescorted access to colocation facilities is a big NO.

That engineer coming in from ABC123 Computer Equipment Inc. to patch your cables is just like any other employee of any company. Can you personally vouch for him? Has ABC123 or its partners vetted the person’s background to the extent that the colocation provider would have — or to the same degree to which you vet employees?

You should have chosen a colocation partner after carrying out due diligence on every possible area of security. A secure colocation center vets all of its own employees; it only allows named personnel from your company to have access to your cage or rack; all people entering the facility are logged.

When looking for security vulnerabilities, think like a ‘black hat’ who wants access to one or more companies’ information: The criminal could try to corrupt an employee of each company, or they could target just one relatively poorly paid engineer working for a third-party vendor. That support technician knows any weakness in the system, has all the “master keys” to certain systems, and understands where users might leave an area open by default or by mistake — without any particular loyalty to the company that owns this IT infrastructure. An unhappy vendor employee is not only easier to corrupt, but they are more valuable, too, as they have access to multiple systems.

To ensure a secure colocation center, always send an escort with a vendor’s employee and verify that the technician has a proper job sheet stating what systems they should touch and what actions to take. Only the company that owns the IT equipment can permit the vendor to do anything not on the job sheet, such as log onto a different system or reboot a related system. If only the colocation center’s escort communicates with the vendor, allow no changes.

Next Steps

Find out how to make sure your third-party data center is secure.

This was last published in December 2014

How does data center colocation affect outage response?

Navigating data center malfunctions when hardware is off premises can be tricky. Organizations must have strong SLAs with their colo provider to ensure appropriate response times.

Data center colocation providers offer housing, power, cooling and security, but colocation also poses the drawback of slower response times during a data center outage. Organizations that use colocation must carefully plan where to store important data and pay attention to service-level agreements to minimize the effects of data center outages at colocation facilities.

Consider an on-premises data center. The company owns the facility and equipment, builds and maintains the infrastructure, employs and allocates the staff, implements policies and procedures, and sets the priorities needed to remediate any outages. When trouble strikes, business leaders know who to call, there is ample notification and the staff can focus on the business’s interests.

For data center colocation contracts, this direct control is ceded to the service provider, who is responsible for troubleshooting and maintaining contact with the organization.

Troubleshooting both on and off premises

The first step following any data center outage is to identify the nature of the failure. It could be a fault in the server, storage or other business-owned infrastructure. If that is the case, the business must remediate the problem — not the provider. If the business has monitoring and reporting tools, the staff can receive word of the issue in a timely manner.

The challenge is more pronounced in colocation situations in which the provider supplies the facility as well as the infrastructure equipment and maintenance staff. This means the organization never touches the facility; providers handle outage notification, troubleshooting and remediation all within the terms delineated in a service-level agreement.

In some cases, it might take many hours for the staff to reach the remote colocation site for any necessary physical repairs. At best, the loss of direct control over the environment can be nerve-wracking. At worst, businesses using service providers can experience extended and costly downtime.

Mitigating uncertainty in data center colocation setups

Data center colocation providers may introduce additional uncertainty and complexities for organizations. Facilities in remote areas may be subject to geopolitical uncertainty and greater security issues. The provider’s desire to manage costs may trim the support staff, potentially lowering its response capabilities. Mergers and acquisitions can also disrupt the provider’s management staff, possibly affecting the provider’s day-to-day operations and responses to support requests.

Admins can mitigate these concerns by making conscientious decisions about workloads and architecting a backup and disaster recovery framework to blunt the effect of any outages. For example, a mission-critical workload might best be kept in the local data center, while other important workloads running remotely may employ clustering, snapshots and documentation tools.

This was last published in August 2018

IT still needs the tried-and-true on-premises data center

Despite a rapidly changing IT industry driven mostly by cloud, the on-premises data center is here to stay, according to a survey by the Uptime Institute.

continue reading…

Public cloud and colocation may drive the industry’s growth, but for many companies, their on-premises data center matters more than ever.

The rapid pace of cloud adoption hasn’t eaten into the two-thirds of workloads that enterprises still run in house, according to the latest Uptime Institute survey. The on-premises data center is still important because a large number of applications aren’t a good fit in the cloud, and it would be too expensive or technically infeasible for IT teams to rearchitect or rewrite them.

“A lot of folks have workloads that they need to keep in house for whatever reason — whether it’s compliance issues or security … they can’t outsource stuff at this point; they don’t feel safe enough to do that,” said Kelly Quinn, an analyst at IDC, which also recently published a survey on data center trends. “They want that close to the vest; they want that really tight, under their control.”

Thirty percent of respondents to the Uptime Institute’s annual survey plan to build a new traditional data center in the next year to meet capacity demands. This reflects organizations’ massive global build-out of data centers, particularly in Africa, the Middle East and Asia, said Matt Stansberry, senior director of content at Uptime Institute.

While some organizations expand with data center facilities, others seek to better manage the on-premises capacity they already have. The survey also found that 60% of enterprise IT server footprints are flattening or shrinking due to ongoing server virtualization. With that consolidation, traditional IT teams are finding “data centers with some headroom,” Stansberry said. Five years ago, data center administrators struggled to keep pace with demand, but now they must deal with excess and aging capacity.

“Sites that might have been brought on 15 years ago are now doing things to extend the life of existing data centers,” said Stansberry.

Some organizations conduct server refreshes to squeeze new life out of existing data centers, for example. Another way to extend data center life is to add an uninterruptible power supply (UPS) system and computer room air conditioning units.

Colocation still attractive

As regions such as Africa and Asia focus on new data centers, more established data center regions such as the United States are seeing stagnation in new data center builds and more interest in colocation facilities. Over half of enterprise respondents to IDC’s annual survey use colocation services. “At this point, a lot of the nonearly adopters are testing the waters of the colocation market,” Quinn said.

Among the other half of respondents, there’s still a lot of uncertainty around colocation adoption. IT leaders still ask a variety of questions, said Quinn: How do we engage colocation providers? Which workloads should we migrate? And how secure is this going to be?

Colocation providers must help customers answer these questions prior to the initial engagement, which can be a bit of a handholding process.

Downtime still matters

Whether you use a colocation provider or have an on-premises data center, downtime is still a consideration. Recent high-profile data center outages ranging from Amazon Web Services to airlines still occur with surprising frequency and underscore the significance of downtime for enterprise data centers. How concerned, and prepared, are data center admins for unplanned downtime?

The Uptime survey found that nearly 90% of organizations conduct root-cause analysis of an outage, and over 60% measure the cost of downtime. Many IT organizations are testing IT-based resiliency, relying on multiple geographically distributed data centers for application failover, but they’re not ready to throw out their UPS systems just yet.

“When we talk about IT-based resiliency,” said Stansberry, “it has a long way to go.”

Next Steps

Discover data center trends of 2017

Cloud-style pricing comes to the on-premises data center

How to move servers to a colocation facility

This was last published in May 2017

How to successfully perform a data center upgrade

When data center hardware becomes outdated, there are a few options. Here are ways that admins can ensure that a migration is efficient and downtime is minimized.

A data center upgrade is like a home remodeling project; it requires planning, a budget and a certain finesse to ensure that existing infrastructure isn’t damaged in the process. There’s also the option to skip the remodel altogether and head to a condo with homeowners’ association fees.

Admins must make a few important decisions before they decide to tackle a data center upgrade. It’s an expensive, time-consuming and risky project — and in the age of the cloud, it’s sometimes unnecessary. If admins decide that upgrading the data center is the best route, they should know how to make the process a smooth one.

When to move to the cloud or colocation

The public cloud versus on-premises debate is never-ending, but the pro-cloud argument is growing stronger. A few years ago, on-premises data centers were necessary to maintain a level of security that cloud couldn’t provide, but now public cloud providers offer stronger security controls to help comply with stringent data protection laws.

Admins should also consider the price of upgrading. Expensive hardware might not be worth it if the organization is willing to sacrifice a level of control to a cloud provider. Plus, a cloud provider can offer more IT support and maintenance than a small to midsize organization can afford to keep in-house.

Admins shouldn’t simply migrate to the cloud, however, without determining the necessary network bandwidth. Cloud providers need to share internet traffic with other tenants, which can put a strain on individual network resources.

Organizations should also consider colocation facilities to cut the costs of cooling equipment, staffing, hardware and maintenance. The issue of control, however, once again comes into play. IT has a support staff to call when downtime occurs, but they don’t have control over how long it takes to remediate the issue. And if an organization didn’t properly evaluate the service-level agreement with a colocation provider, then IT can stumble upon hidden fees or caveats when a problem occurs.

To avoid these problems, admins should determine which workloads — if any — belong in a colocation facility. They should also make a disaster recovery plan in case of an outage.

How to prep for a data center upgrade

If admins decide to keep workloads on premises and upgrade the hardware, they must ensure that downtime doesn’t occur. Or, if it does, they must ensure that it doesn’t affect mission-critical workloads.

Traditionally, admins could let end users know in advance about an upgrade and take workloads offline during non-work hours. But this method isn’t possible for organizations with remote employees or global organizations that span various time zones. In these cases, admins should move the affected workloads to a public cloud and reroute the traffic before starting the upgrade.

Admins should also perform pre-upgrade testing on new versions of software that they plan to use. Admins can rehearse the data center upgrade process in a lab environment to flush out bugs and compatibility issues. Service providers can help with this process for organizations that lack the necessary in-house resources.

Make upgrades carefully

Admins should replace data center hardware with redundancy to prevent data loss. For example, admins that need to replace a network switch should have a secondary switch in place before the replacement occurs to prevent connectivity loss.

Microsoft designed Windows Server failover clusters to support rolling upgrades for exactly this situation. Admins can perform the upgrade process so that only one server node at a time is offline. However, admins should first ensure that the cluster they are trying to upgrade can run its workloads with one node down.

Admins should also establish a workflow for the server upgrade process. The most mission-critical workloads aren’t necessarily the first ones they should migrate. It often makes more sense to migrate the least important workloads first to ensure that the migration techniques work properly.

This was last published in November 2018

What you need to know before buying hyper-converged technology

Editor’s note

Hyper-converged technology offers organizations a way to make their IT infrastructure more agile and efficient. By integrating compute, core storage and storage networking into a single, scalable platform, an HCI system can help reduce storage costs and simplify data center management.

A hyper-converged architecture is suitable for any workload that might have typically been hosted in a virtualized environment. This includes virtual desktop infrastructure (VDI) deployments as well as use cases involving applications that have been shifted to the cloud. Hyper-convergence can also support highly distributed environments, such as companies with hundreds or thousands of small branch locations. When companies need additional computing capacity or storage capacity, they simply add more nodes to their HCI cluster.

Before selecting an HCI vendor, it’s crucial to carefully evaluate your HCI requirements so you can make a compelling business case to management. For example, do you need to provide the same operating environment for both on-premises and public cloud workloads? Which deployment option would be most beneficial — an integrated HCI appliance, HCI software that is sold as part of a reference architecture, or HCI that operates both on-premises and in the public cloud? By examining the key features and functions of leading HCI products, and by matching those to your organization’s requirements, you can choose the hyper-converged storage product that will best suit your needs.

To help you create a shortlist of hyper-converged products, we’ve provided a photo story that highlights 10 leading HCI vendors. This roundup illustrates how each vendor addresses deployment options, standard storage management features and advanced secondary storage features, such as data protection, backup, archiving and data recovery.

This hyper-converged technology buyer’s guide focuses on 10 market-leading vendors that offer software-defined storage and appliances. Our research is based on TechTarget surveys and reports from other well-respected research firms, including Gartner and Forrester.

1. Key criteria to selecting the right hyper-converged product

Each organization has individual business needs and requirements, so each will have varying demands of its HCI product. Learn what leading hyper-converged vendors provide to help organizations align with those needs.

2. Examine what products leading hyper-converged vendors offer

Learn what differentiates offerings from 10 leading HCI vendors so you can choose the hyper-converged technology that will best address your organization’s workload needs.

Public cloud workload success requires IT leadership

IT must change with the times and adapt to the reality that others within the organization can now procure and provision cloud resources without their input.

Few innovations have redefined IT as much as the public cloud. According to Enterprise Strategy Group research, 85% of IT organizations now use public cloud services, whether IaaS or software as a service, and 81% of public cloud infrastructure users work with more than one cloud service provider. Multi-cloud is our modern-day IT reality. Amid this mass adoption of public cloud services, an interesting phenomenon is occurring: Of IaaS users, 41% have brought at least one public cloud workload back to run on premises. While this may seem like an indictment of public cloud services, it isn’t; quite the contrary.

At ESG, we recently conducted an extensive investigation into the decisions that led to migrating workloads back on premises. When looking for insights into the factors driving and influencing these migration activities, one theme stuck out. Often in the enthusiasm to benefit from public cloud infrastructure, companies commit workloads en masse without applying necessary due diligence. Only later do they identify that some simply don’t fit. Additionally, the cost of moving the workloads and the data back and forth proved significant in most cases.

What can we learn from the organizations that have prematurely shifted workloads to the cloud, only to be forced to move them back at a later date? Well, incongruities between cloud expectations and actual performance arise for a variety of reasons:

  • Cloud decision-makers are often not IT decision-makers. While IT plays a role in a majority of cloud provider selections, there are still a significant number of businesses where IT is left out of the decision process. This is an issue.
  • Factors that influence public cloud workload success often differ from those historically used to drive on-premises decisions. Historically, IT has been a game of managing aggregates — ensuring storage, compute and network deliver the performance, capacity and bandwidth necessary at a data center level. Workload analysis has often been done on an additive basis: Does the current infrastructure have enough performance or capacity headroom, or should we add more? With public cloud infrastructure services offering far more granularity in resource deployment than on-premises gear, decisions should be made on a workload-by-workload basis. Individual workload characteristics, such as performance and capacity, play a role in determining cloud cost-effectiveness.
  • The cloud introduces new rules and interfaces. One issue is the ease with which cloud infrastructure users can procure and provision resources. This ease has accelerated adoption, sure, but it has also opened up cloud services to divisions of the organization that lack the necessary expertise in workload requirements, such as performance and data sensitivity.

What to do?

IT needs to take the lead on cloud. For most companies this is already the case, but the perception of IT as business inhibitor persists. Line-of-business teams and developers still bypass IT to use cloud resources in a sizable percentage of companies. While the IT community often refers to these as shadow IT activities, many cloud users see this as a feature rather than a bug. They believe they’re serving their company’s best interests by bypassing the slow, outdated IT infrastructure team and processes for a leaner, faster, more agile process.

The reaction from IT decision-makers is often that business must change, and the IT team should make technology decisions. The second part of that statement is accurate, but the first portion simply doesn’t work anymore. IT needs to change.

Here’s how:

  • Understand and manage IT at a workload level. Specifically, IT needs to index heavily on application performance and data sensitivity requirements. These two points represent the most common issues with public cloud workloads, and often the culprit is that the necessary analysis wasn’t done upfront.
  • Architect processes that don’t impede cloud access. This may seem counterintuitive, but if IT impedes or delays access to the cloud, the rest of the business will continue to bypass IT on the way to the cloud.
  • Focus on what matters. It will vary by public cloud workload and organization, but data sensitivity, compliance and performance requirements should take precedence.
  • Use tools from cloud providers to assist with gaps. Cloud providers acknowledge the hurdles businesses encounter with cloud services and offer tools to assist.

As an example of this last point, earlier this year, AWS launched a service called Zelkova, which employs automated reasoning to analyze cloud policies, understand them and then inform on their future consequences. Amazon may have noticed some organizations struggling with proper public cloud workload adoption and built a technology to reduce the complexity and guesswork involved.

One of Zelkova’s goals is to improve confidence in security configurations through its standout Public/Non-Public identifier. AWS S3 uses Zelkova technology to check each bucket policy and then identify whether an unauthorized user can read or write to the bucket. A bucket is flagged as Public when Zelkova identifies public requests that can access the bucket. Non-Public, meanwhile, means Zelkova has verified that all public requests are denied.
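The same public-or-not question can be asked of the S3 API directly. The hedged Boto3 sketch below uses the standard get_bucket_policy_status call, which surfaces the Public/Non-Public determination described above; the bucket name is a placeholder, and a bucket with no policy at all is treated here as not public.

```python
"""A sketch of checking whether an S3 bucket's policy makes it public.
The bucket name is a placeholder."""
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_is_public(bucket: str) -> bool:
    try:
        status = s3.get_bucket_policy_status(Bucket=bucket)
        return status["PolicyStatus"]["IsPublic"]
    except ClientError as err:
        # No bucket policy at all means there is no policy making it public.
        if err.response["Error"]["Code"] == "NoSuchBucketPolicy":
            return False
        raise

print(bucket_is_public("critical-app-data"))   # placeholder bucket name
```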

This tool offers an incredibly valuable service, given the significant shortage of skilled cybersecurity professionals. The cloud introduces new paradigms when it comes to data security, and any tool that simplifies that process reduces the risk involved in using a hybrid cloud ecosystem and its burden on the business.

This is your company’s data, however, and tools like Zelkova are just that: tools. They provide a valuable protection layer when using cloud resources for a public cloud workload, and they should expedite IT processes and help ensure cloud adoption is done quickly and securely. These tools don’t replace internal due diligence. Ultimately, IT must lead the business to the cloud, and not let the business pass it by.

This was last published in November 2018

Automated data management is a crucial part of IT’s future

Automating data management will play an important role in helping us cope with the coming zettabyte apocalypse, a time when the amount of data generated will overwhelm storage resources.

Readers of this column know I’m preoccupied with the idea of automated data management. Data management is where the proverbial rubber meets the road when it comes to the future of IT.

You can software-define to your heart’s content, hyper-converge until it hurts, but the simple truth is the coming data deluge, combined with eagerness to divine nonintuitive truth from all that data and the government’s desire to ensure the privacy of those who produce the data, spells trouble for the not-too-distant future. There will be too much data, too much storage infrastructure and too many storage services for human administrators to manage. Some sort of automated technology will be required. Time is running out, however: The zettabyte (ZB) apocalypse is on the horizon.

The coming zettabyte apocalypse

IDC projected more than 160 ZB — a zettabyte is a 1 followed by 21 zeros — of new data by 2024. Microsoft Azure analysts reported a year or so back the entire annualized manufacturing capacity of disk and non-volatile RAM storage vendors totaled close to 1.5 ZB. Simple math says we cannot hope to host the volume of data that’s amassing with the equipment we have available: Roughly 1.5 ZB of annual capacity against 160 ZB of projected new data means less than 1% of it could land on newly manufactured media.

I’m delighted to hear from industry friends that the deficits will be made up for by new data reduction and data compression technologies. However, I can’t help remembering no one ever got the 70-1 reduction ratios deduplication vendors promised back in the first decade of the millennium. So I’m not as enthusiastic as others about the deduplication dike blunting the data tsunami.

Improvements in capacity utilization efficiency from innovations such as fractal storage, using fractal rather than binary algorithms to store data, are more promising. This could get us close to storing massively more content in the same footprint we have now. Even simpler would be using optimized storage mechanisms that eliminate all data copies and enable the use of all types of storage — file, block and object — concurrently. (StorOne is important here, given its core patents in this area.)

Ultimately, however, storage will go hyperscale, and there will simply be too much data to manage. Unsurprisingly, backup vendors have been among the first to blow wise to this reality.

A possible starting point

I recently chatted with Dave Russell, formerly a Gartner analyst, now Veeam Software’s vice president of enterprise strategy. Russell showed me a diagram of what he called the journey to intelligent or automated data management. Coming from Veeam, it wasn’t surprising the originating point of the diagram was backup.

Backup is a core data management function. The only way to protect data is to make a copy — a backup — that can be stored safely out of harm’s way and restored if the original data is damaged or deleted. That’s Data Protection 101.

To Russell, backup is one place to begin the journey to intelligent data management. I agree, though possibly for different reasons. To effectively perform a backup, you need to define, based on an evaluation of business processes the data serves, what data to back up and, based on the access and update frequency of the data, how often.

This kind of data classification exercise produces a data protection policy for certain data that admins can ultimately expand to become a policy for data lifecycle management. Russell says the initial focus on backing up data associated with a certain server-storage kit will expand over time to a more aggregated approach covering many servers and storage units or even multiple clouds. This evolution will require a better tool set for managing lots of data protection processes across diverse infrastructure in a way that affords greater administrative efficacy and visibility. At that point, automation can start to become more proactive by allowing the automated data management system to apply policies to data and perform necessary protection tasks.

On Russell’s diagram, an interesting transition occurs. This visibility stage marks a change from policy-based data management to behavior-based data management. The behavior of the data provides the basis for classifying it and delivering necessary storage resources for its storage and appropriate services for its protection, preservation (archive) and privacy (security) until all the handling of data is completely automated — the final stage in Russell’s diagram.

Lost in translation?

I like the idea of changing from policy-based to behavior-based management, but it may be difficult to communicate to customers. One vendor that went to market with a cognitive data management conceptualization of automated data management at the beginning of this year found that the “cognitive” part impeded adoption. Folks aren’t quite sure what to make of computers automatically evaluating data behavior and taking actions.

A friend at the cognitive data management company told me he tried to use the metaphor of a driverless car to get the concept across. That didn’t work out well when tests of driverless vehicles produced multiple wrecks and injuries because of flawed programming and slow network updates.

He’s since retooled his marketing message to suggest his product automates data migration and is gaining quite a bit of traction as a result, because data migration is the bane of most storage admins’ existence, a thankless task that consumes the bulk of their workday. Like Veeam, which is trying to get to data management nirvana from the starting point of backup, my friend’s company is trying to get there from the starting point of improving the efficiency of data migration.

Both approaches to building real automated data management could get your data where it needs to go safely and securely. The question is whether the industry can get to automation in time to avoid catastrophic losses of data.

This was last published in November 2018