Detecting issues early, before they impact your systems, is the key to performance tuning, but it's impossible to achieve without constant monitoring and insight into your operating environment. However, finding and implementing a monitoring tool that provides the right information can be a difficult task.
Most issues are buried deep in complex interactions between applications, networks, servers, and other infrastructure components, and it is difficult to find a monitoring tool that is compatible with all of those devices without adding further complexity to the environment. Many available tools are too narrowly focused on the type of infrastructure they can monitor, or the data they gather is too shallow. To optimize performance you need a monitoring tool that provides broad insight into multiple systems and correlates the data in a single interface. Viewing only part of the system gives an incomplete picture; you need the different points of view, but you also need to know how they all tie together.
This issue becomes even more complex when IT responsibilities are dispersed throughout the organization. In some companies, for example, the database, network, and storage functions may each be managed by a separate group, sometimes in different locations, and each function may run its own monitoring system. The metrics in any single system may then fail to reveal the root cause of a problem. Without cross-team communication, issues are investigated in silos, with each team focused on its own problem rather than on what matters to the business.
A monitoring tool should give you in-depth insight into all of your systems in one place. Without this transparency you can waste significant resources trying to determine the root cause of an issue. Say you are running a database server on top of VMware, and the server suddenly runs at 100 percent CPU for no apparent reason. Your application or database team may spend an hour trying to figure out what has changed. In reality, nothing has changed on the database; another team may be stress testing a virtual machine that shares the same host resources, stealing CPU from the database. Each business unit sees its own point of view but lacks a complete view of the entire system.
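The "stolen CPU" in this scenario is directly measurable on a Linux guest: the kernel reports steal time as one of the CPU counters in `/proc/stat` (see the proc(5) man page). A minimal sketch, using a hypothetical sample line rather than a live reading:

```python
# Minimal sketch: parse the aggregate "cpu" line from /proc/stat on a
# Linux guest. Per proc(5), the fields after "cpu" are jiffies spent in
# user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice.
# A sustained rise in steal relative to total time suggests another VM on
# the same host is consuming the CPU -- as in the scenario above.

def cpu_steal_fraction(stat_line: str) -> float:
    """Return steal time as a fraction of total CPU jiffies."""
    fields = stat_line.split()
    assert fields[0] == "cpu", "expected the aggregate cpu line"
    values = [int(v) for v in fields[1:]]
    steal = values[7] if len(values) > 7 else 0  # 8th counter is steal
    return steal / sum(values)

# Hypothetical sample; on a real system, read the first line of /proc/stat.
sample = "cpu 4705 150 1120 16250 520 30 45 900 0 0"
print(f"steal fraction: {cpu_steal_fraction(sample):.3f}")
```

Monitoring this fraction over time, rather than CPU load alone, distinguishes "the database got busier" from "the hypervisor gave the database less CPU."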
In today's business environment there is a renewed focus on big data and metrics. When it comes to monitoring, however, raw performance metrics may not tell the whole story. A slowdown may be normal during peak usage, or it may indicate the start of a problem; performance metrics that give you data without correlating it to other factors offer no insight into which is occurring. A graph of database queries processed per percent of CPU consumed, on the other hand, would immediately show whether changes in CPU load are due to changes in workload, in available virtual resources, or in code.
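The derived metric described above is just a ratio of two raw metrics. A minimal sketch with hypothetical sample values shows why it is more informative than either metric alone:

```python
# Hypothetical samples of (queries per second, CPU percent) for one server.
# The derived metric qps/cpu stays flat when CPU tracks workload, and
# drops sharply when CPU rises for some other reason.
samples = [
    (1200, 30),   # baseline
    (2400, 60),   # CPU doubled, but so did workload: normal scaling
    (1150, 85),   # CPU spiked with flat workload: investigate
]

for qps, cpu in samples:
    efficiency = qps / cpu  # queries processed per percent of CPU
    print(f"{qps:>5} qps at {cpu:>3}% CPU -> {efficiency:5.1f} queries/%CPU")
```

The first two samples yield the same 40 queries per percent of CPU, so the doubled load is explained by doubled workload; the third drops to roughly 13.5, flagging a change in resources or code rather than in demand.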
The right network monitoring system can drive transparency and efficiency. Organizations should look for a system that offers dynamic device discovery, automated configuration, continuous scanning, self-correction, custom alerts, and powerful analytics. One of the most important features, however, is automation. While there is a broad push to automate processes generally, automation in monitoring yields not only efficiency but also access to insights that can drive business decisions.
Many organizations do not realize that a monitoring system can automate the roll-up and presentation of data. For example, you may have multiple storage arrays used by multiple departments that are themselves geographically distributed. In this environment it is not easy to report how much storage the organization uses in total and how much is attributable to each department. A monitoring system that automates customized, flexible graphing can generate this report easily, and if you add another storage array in any location, the report updates automatically.
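At its core, the roll-up described above is an aggregation over tagged usage records. A minimal sketch with hypothetical array names, locations, and figures:

```python
# Minimal sketch of an automated storage roll-up. Each record tags an
# array's usage with a location and department; the per-department report
# is a simple aggregation, so adding a new array anywhere just means
# adding records -- the report needs no changes.
from collections import defaultdict

usage_records = [
    # (array, location, department, used_tb) -- hypothetical data
    ("array-01", "nyc", "finance", 12.5),
    ("array-02", "nyc", "engineering", 30.0),
    ("array-03", "london", "finance", 8.0),
    ("array-04", "london", "engineering", 22.5),
]

by_department = defaultdict(float)
for _array, _location, dept, used_tb in usage_records:
    by_department[dept] += used_tb

for dept, total in sorted(by_department.items()):
    print(f"{dept}: {total} TB")
```

A real monitoring system discovers the arrays and collects the usage figures itself; the point is that once records carry the right tags, any cross-cut (by department, by location, by array) is the same cheap aggregation.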
IT infrastructures are becoming increasingly complex, heightening the need for transparency and communication. Access to the right intelligence can drive optimal performance by fine-tuning operations, troubleshooting issues, and resolving problems proactively, before they impact the business. The focus must be on solving business problems rather than individual issues. If you can plot business and infrastructure metrics together, you become much more efficient at troubleshooting and at deploying resources. The right monitoring tool is essential to achieving these objectives.
About the Author: Steve Francis is the founder and Chief Product Officer of LogicMonitor, a SaaS-based IT infrastructure monitoring company.