The Hidden Cost of Band-Aiding Network Problems

Introduction

Network problems cost money. This is true for several reasons: not only are end-users and business functions impacted, but network engineers can spend hours, days, or even weeks analyzing data in hopes of finding the root cause. At times, when high-level applications are impacted, several departments within IT may come together in war-room situations to find the root cause. These meetings consume the valuable time of several IT departments and do not guarantee that the problem will be solved.

If the clear cause of the problem is not found, many organizations invest in guesswork upgrades and blind changes, which still may not resolve the root cause. These band-aid methods may address symptoms of the overall problem, but they do not fix the underlying issue. This approach to network troubleshooting is a reality in many IT organizations, costing thousands, if not hundreds of thousands, of dollars while the problems continue to plague the network.

How much money is spent? A recent study from SpiceWorks.com showed that IT budgets for small to medium businesses average just over $300,000. Of that, 75% goes toward hardware and software upgrades, which means the average SMB spends over $225,000 annually on IT upgrades. This figure does not include enterprise-level spending, which runs well into the millions of dollars. With all of this money spent to improve the network, are IT departments seeing problems disappear, performance increase, and business efficiency improve? Not really.

In this article, we look at some common actions taken by IT departments to address network problems, which often result in only marginal, if any, improvement. We will then show how to avoid these costs and save budget dollars by clearly visualizing and isolating the true root cause of network problems prior to making guesswork changes.

The Top 5 Band-Aids for Network Problems

1. Replacing Network Infrastructure Hardware
There is certainly nothing wrong with giving the network a new pair of shoes; it is recommended, and even mandatory, under some vendor support contracts. However, when troubleshooting network problems, attempting to gently upgrade the problem away one switch or router at a time rarely works, or at least not until hundreds of thousands of dollars have been spent.

Too often, we hear the claim that a certain switch, router, load balancer, or other device is “eating” packets, or slowing them down on the way to the client/server. These claims are almost never proven and can lead down the path of guesswork upgrades. Without understanding the issue, a company may replace a device only to find that the original problem lingers. There is a time and place for upgrading network hardware, but this should not be the first step when troubleshooting problems.
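
Before replacing a device that is suspected of "eating" packets, the claim can often be tested with data the device already exposes. The following sketch, written in Python with hypothetical counter values, simply compares interface error and discard counters against total packets handled and flags anything that deserves investigation; real counters would come from SNMP or the device CLI.

# Minimal sketch: flag interfaces whose error/discard rate suggests real packet loss.
# Counter values are assumed to have been collected elsewhere (SNMP, CLI scrape, etc.).

def error_rate(errors: int, discards: int, total_packets: int) -> float:
    """Return combined error+discard rate as a fraction of packets handled."""
    if total_packets == 0:
        return 0.0
    return (errors + discards) / total_packets

# Hypothetical sample data: one row per interface on the suspect switch.
interfaces = [
    {"name": "Gi1/0/1", "in_errors": 12, "in_discards": 4_300, "in_packets": 9_800_000},
    {"name": "Gi1/0/2", "in_errors": 0, "in_discards": 2, "in_packets": 7_200_000},
]

THRESHOLD = 0.0001  # 0.01% of packets; tune to the environment

for intf in interfaces:
    rate = error_rate(intf["in_errors"], intf["in_discards"], intf["in_packets"])
    verdict = "INVESTIGATE" if rate > THRESHOLD else "ok"
    print(f'{intf["name"]}: {rate:.6%} errors/discards -> {verdict}')

If every interface in the path comes back clean, the "eating packets" theory can be set aside before any purchase order is written.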

2. Upgrading Cabling Infrastructure
Again, upgrading cabling is not wrong in all cases. As a response to network problems, though, it is rarely a wise move. We may hear it from the cabling vendor trying to sell an upgrade or from a well-intentioned network engineer: "The cabling is too old/limited/legacy to support voice/video/1Gbps. Next-generation cabling will make performance go up." Is this claim true? Before issuing an order for new cabling, have we validated the copper or fiber that is already in the walls? What is its true potential, and what rates does it support as installed? Before taking a jackhammer to the cable infrastructure, these questions should have clear answers to confirm whether Cat "X" would really have the desired impact on the network problem.

3. Increasing WAN Bandwidth
Adding more bandwidth to a link may not do much more than drain more of the budget. Depending on the root issue, a bandwidth upgrade may appear to give users at the remote office improved overall performance, but the underlying issues that commonly cause problems at these sites, such as chatty applications, small TCP windows, and unexpected utilization bursts, may continue to plague application delivery.

Before upgrading WAN bandwidth in response to network problems, perform a traffic and utilization analysis on the link in question. If the link reaches 100% utilization during the time of the problem, identify the traffic and ensure that only normal and expected communications are involved; the load could be Windows Updates, backups, or even peer-to-peer downloads. Instead of throwing more bandwidth at the link, a usage or QoS policy could be used to ensure that business applications have priority, and backups and updates could be rescheduled to run at off-peak hours. Whether the WAN links support QoS as configured must also be tested to ensure proper prioritization end to end.
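
As a rough illustration of the utilization analysis described above, the sketch below derives link utilization from two samples of an interface octet counter, the way an SNMP poller would. The counter values, polling interval, and link speed are hypothetical placeholders.

# Minimal sketch: derive WAN link utilization from two octet-counter samples,
# as an SNMP poller would (e.g., from ifInOctets / ifOutOctets deltas).

def utilization_pct(octets_t1: int, octets_t2: int,
                    interval_s: float, link_bps: float) -> float:
    """Percent utilization over the polling interval (ignores counter wrap)."""
    bits_sent = (octets_t2 - octets_t1) * 8
    return 100.0 * bits_sent / (link_bps * interval_s)

# Hypothetical samples: a 10 Mbps WAN link polled 60 seconds apart.
sample_1 = 1_250_000_000      # ifOutOctets at t1
sample_2 = 1_322_500_000      # ifOutOctets at t2
print(f"{utilization_pct(sample_1, sample_2, 60, 10_000_000):.1f}% utilized")

A result near 100% is the cue to identify who and what is on the wire, not an automatic justification for a bigger circuit.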

Replacing hardware can be costly. Here are some typical ranges for commercial-grade equipment to consider:
  • Core network switch: $8,000 – $120,000
  • Copper cabling upgrade (100 drops): approx. $35,000
  • New server: approx. $8,000
*Costs are ranges based on hardware/software requirements and do not include installation and maintenance.

4. Upgrading Links from 1 Gbps to 10 Gbps and 10 Gbps to 40 Gbps
As with upgrading WAN links, there is a time and place for increasing bandwidth on the LAN. A careful capacity management study should be conducted before making these upgrades in reaction to network problems. Remember, increasing bandwidth by a factor of 10 will not improve network and application performance by a factor of 10. If bandwidth is not the root cause of the problem, the faster links may only mask the real underlying problem.
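
One simple form of capacity study is to look at 95th-percentile and peak utilization over a busy period rather than a single average. The sketch below assumes percent-utilization samples have already been collected by a poller; the sample values are hypothetical.

# Minimal sketch: summarize link utilization samples before deciding on a 10x upgrade.
# Assumes percent-utilization samples (e.g., 5-minute averages) were already collected.
import statistics

def capacity_summary(samples_pct: list[float]) -> dict:
    """Average, 95th-percentile, and peak utilization for a set of samples."""
    ranked = sorted(samples_pct)
    p95_index = max(0, int(round(0.95 * len(ranked))) - 1)
    return {
        "avg": statistics.mean(ranked),
        "p95": ranked[p95_index],
        "peak": ranked[-1],
    }

# Hypothetical busy-hour samples for a 1 Gbps uplink (percent utilization).
samples = [22, 25, 31, 28, 40, 95, 97, 33, 27, 24, 26, 30]
summary = capacity_summary(samples)
print(f"avg {summary['avg']:.0f}%  p95 {summary['p95']:.0f}%  peak {summary['peak']:.0f}%")
# A low average with brief peaks points to microbursts or scheduled jobs,
# not a link that needs ten times the capacity.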

5. Throwing More APs Out There to Support BYOD
The BYOD boom has put a tremendous strain on the wireless environment. Due to a lack of experience with these devices, or when a wireless expert is not in-house, well-intentioned engineers may simply purchase and connect more APs to support them. More is not always better when it comes to APs. To save money on the infrastructure, a careful signal quality assessment should be conducted before purchasing anything; otherwise, performance could continue to suffer as the air becomes more congested with competing channels and devices.

Downtime and poor performance cost money. IT managers and directors may pressure engineers to take action before the problem is clearly understood, which can lead to poor decisions about a resolution before a root cause is found. These "hurry-up solutions" can have long-term implications for both IT resources and organizational effectiveness.
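
To illustrate the kind of signal quality assessment described above, the following sketch uses hypothetical scan data to count how many APs are heard per 2.4 GHz channel and flag co-channel congestion. A real assessment would use a wireless survey tool and also weigh signal strength, noise, and airtime utilization.

# Minimal sketch: flag 2.4 GHz co-channel congestion from a hypothetical AP scan.
# A real site survey tool would also report RSSI, noise floor, and airtime utilization.
from collections import Counter

# Hypothetical scan results: (SSID, channel) pairs heard at one location.
scan = [
    ("corp-wifi", 1), ("corp-wifi", 1), ("corp-wifi", 6),
    ("guest", 6), ("corp-wifi", 6), ("neighbor-net", 6),
    ("corp-wifi", 11),
]

aps_per_channel = Counter(channel for _, channel in scan)

for channel, count in sorted(aps_per_channel.items()):
    note = "congested - adding APs here may hurt" if count >= 3 else "ok"
    print(f"channel {channel}: {count} APs ({note})")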

How to Find the True Root Cause

In a word – Visibility.

We cannot resolve what we cannot see. If the network is experiencing problems, the wrong thing to do is to try to gently upgrade the problem away without clear data to direct those decisions. The complete environment from client to server must be monitored, analyzed, and understood before network changes are made. This must also include the Wi-Fi environment, due to the explosive increase in client-side wireless access. All too often, companies spend mountains of money to resolve a problem, only to continue to suffer with it after the changes are made.

One reason that IT departments suffer from the hidden costs of band-aid troubleshooting is that they lack the visibility tools necessary to get to the root cause. As we saw earlier, when IT budgets are planned for the year, many environments reserve space for upgrades, improvements, and new applications and services, with little room left over for visibility tools. Engineers may be expected to rely on open-source freeware or on existing monitoring systems that have gaps in visibility from client to server. These tools may end up costing IT more money than they save because they cannot give the complete picture of the problem.

A solid monitoring system must be able to make use of the three pillars of network analysis and monitoring: SNMP, flow metrics (NetFlow, sFlow, jFlow, etc.), and packet data. The problem is that most network management systems use sampling rates that are too coarse to isolate short bursts of high utilization, and they have no "on the wire" packet visibility. To offset these gaps in visibility, many network engineers use freeware packet capture and analysis software, but they often lack the skills to truly understand application behavior and the causes of latency from examining packet decodes.
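
As an example of the packet-data pillar, the sketch below counts suspected TCP retransmissions per flow in a capture file, one of the basic signals an engineer looks for before blaming hardware or bandwidth. It assumes the Scapy library is installed and that "capture.pcap" (a placeholder name) holds a capture of the problem traffic; commercial analysis tools apply far more sophisticated heuristics.

# Minimal sketch: count suspected TCP retransmissions per flow in a capture file.
# Assumes Scapy is installed and "capture.pcap" is a placeholder capture of the
# problem traffic.
from collections import Counter, defaultdict
from scapy.all import rdpcap, IP, TCP

packets = rdpcap("capture.pcap")

seen = defaultdict(set)        # flow -> (seq, length) segments already observed
retransmissions = Counter()    # flow -> count of repeated data segments

for pkt in packets:
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        continue
    ip, tcp = pkt[IP], pkt[TCP]
    flow = (ip.src, tcp.sport, ip.dst, tcp.dport)
    payload_len = len(bytes(tcp.payload))
    if payload_len == 0:
        continue               # ignore pure ACKs and handshake segments
    key = (tcp.seq, payload_len)
    if key in seen[flow]:
        retransmissions[flow] += 1
    else:
        seen[flow].add(key)

for flow, count in retransmissions.most_common(5):
    print(f"{flow[0]}:{flow[1]} -> {flow[2]}:{flow[3]} : {count} suspected retransmissions")

Even a rough count like this helps separate a lossy path from a slow application before any money is spent.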

When visibility tools have the right granularity of detail coupled with automation of the analysis, engineers can be correctly guided to the true root cause. Only then can decisions be made that will resolve the issue without wasting more money from the IT budget.

Conclusion

Blind troubleshooting is expensive. Upgrades and expensive changes may only soak up the IT budget, while the original problem goes unresolved. Resist the urge to troubleshoot symptoms rather than root cause. Make educated decisions that are based on facts from visibility tools. This will direct precious dollars to the real solution, rather than a band-aid fix.

© 2014 Fluke Corporation.