Harnessing AIOps for IT Operations and Management
Gartner's Padraig Byrne on Data Challenges and Operational Excellence
The deluge of network and telemetry data impedes problem triage and resolution. Support teams often miss critical alerts because of alert fatigue, and task prioritization is difficult because IT teams are dispersed. Artificial intelligence for IT operations, or AIOps, can potentially address these issues.
Telemetry data is collected from different parts of the IT setup like logs, servers, network devices, endpoints, client agents, sensors, network monitoring tools, and user applications. It comprises past performance records, real-time operations updates, system logs and network information.
Because telemetry data plays a critical role in monitoring IT infrastructure, lapses in collection can lead to significant challenges such as frequent outages, customer frustration, and revenue loss. For instance, two of Australia's largest supermarket chains experienced major technical issues last year.
"IT operations is challenged by the rapid growth in data volumes generated by IT infrastructure and applications that must be captured, analyzed and acted on," said Padraig Byrne, senior director analyst at Gartner.
Businesses are addressing these challenges by turning to AIOps. AIOps - a term coined by Gartner - is a set of enabling technologies allowing businesses to quickly prevent, identify and resolve high-severity outages and other IT operations problems. Gartner predicts that large enterprises' exclusive use of AIOps and digital experience monitoring tools to monitor applications and infrastructure will rise from 5% in 2018 to 30% in 2023.
According to Byrne, the long-term impact of AIOps on IT operations will be transformative. By combining big data and ML functionality, these platforms can enhance or partially replace all primary IT operations functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation.
AI leverages "sophisticated algorithms to navigate daily IT complexities, freeing resources from mundane tasks," said Bharani Kumar Kulasekaran, product manager, ManageEngine.
"This strategic AI integration is not just for efficiency but an imperative for organizations aiming to thrive amid evolving technological challenges and hybrid infrastructures," Kulasekaran said.
AIOps has "a bright future" as it combines development (DevOps) and traditional application maintenance services such as monitoring, said Kumar Venkatesan, delivery director, data and AI, iLink Digital. "It will certainly help with continuous integration, data/system stability and increased user experience," he said.
"More than 70% of medium and large customers are opting for AIOps technologies. I am confident about its great demand in the near future," said Navin Singh, global head - AIOps automation, IP solution development, industrialization, prototype and support services.
Singh identifies use cases such as causal analytics, anomaly detection, auto heal and prediction of outages.
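As a rough illustration of the anomaly detection use case, the sketch below flags metric samples that deviate sharply from a trailing window. It is a minimal example under stated assumptions, not how any particular AIOps product works: the function name `find_anomalies`, the window size, the threshold and the sample latency values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes of samples that deviate from the trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples.
    (Hypothetical helper; window and threshold are illustrative defaults.)
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: flag any change at all.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response-time samples (milliseconds) with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]
print(find_anomalies(latency_ms))
```

In this made-up series only the 250 ms sample is flagged. A simple rule like this is easily skewed by the spike itself once it enters the window, which is one reason production tools tend to rely on more robust statistical or ML models.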
AIOps Use Cases
Here are some areas where AIOps could be applied in IT management.
- Data Aggregation: AIOps platforms aggregate data from a diverse range of sources within an IT environment. This data can then be organized, labeled and classified. Correlating log data from these sources helps in triage and root cause analysis, shortening the mean time to repair or resolve.
- Noise Reduction: The sheer volume of alerts can overwhelm network administrators, potentially causing them to overlook critical alerts due to alert fatigue. AIOps can filter and prioritize alerts, enabling the team to focus on the most crucial issues that affect reliability. This also reduces false positives.
- Intelligent Alerts and Escalation: Once problems are pinpointed through root cause alerts, ITOps teams can use AI to promptly alert the right experts or response teams for a speedy resolution. In some cases, AI will help resolve issues without human intervention.
- Continuous Improvement: Using ML, AIOps models can be trained on current and past incidents and the actions taken to resolve them. Historical logs act as a knowledge base for learning, triaging and auditing. This knowledge base can be updated with every ticket resolution, making AIOps more intelligent and effective over time. This also helps with application maintenance services.
- Capacity Optimization: AIOps can enable IT managers to use resources optimally, reduce the wastage of underutilized resources and save costs. With the right AI tools and statistical analyses, one can predict future resource requirements and do just-in-time provisioning. AIOps can monitor the usage of resources such as computing, storage, bandwidth, memory and applications to ensure optimum and efficient usage. This will lead to fewer problems and disruptions.
- Enhanced Security: AI can improve application security by analyzing large amounts of data to detect and respond to security threats. AI helps in forming event correlations to identify patterns, which is crucial for identifying security threats.
- Full-Stack Observability: AIOps enables rapid root cause analysis from telemetry, optimized resource usage and proactive performance monitoring. For example, in a healthcare institution's cloud-based infrastructure, AIOps conducts telemetry-driven root cause analysis, swiftly pinpointing performance issues across various layers. By continuously monitoring resource utilization patterns and dynamically optimizing allocation based on real-time demand, AIOps ensures operational efficiency, cost savings, and proactive handling of potential performance challenges.
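To make the noise reduction item above more concrete, here is a minimal sketch of how repeated alerts might be collapsed and ranked before they reach an on-call engineer. The `Alert` fields, the severity names and the `reduce_noise` function are assumptions made for illustration; real AIOps tools typically use richer correlation and learned models rather than a fixed rule like this.

```python
from dataclasses import dataclass

# Assumed severity scale for this sketch; lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Alert:
    host: str
    check: str
    severity: str


def reduce_noise(alerts):
    """Collapse repeated alerts and return them most severe first.

    Alerts sharing the same (host, check) pair are treated as duplicates.
    Each group keeps its most severe instance plus a repeat count.
    """
    grouped = {}
    for alert in alerts:
        key = (alert.host, alert.check)
        if key not in grouped:
            grouped[key] = {"alert": alert, "count": 0}
        entry = grouped[key]
        entry["count"] += 1
        # Keep the most severe instance seen for this key.
        if SEVERITY_RANK[alert.severity] < SEVERITY_RANK[entry["alert"].severity]:
            entry["alert"] = alert
    # Sort by severity first, then by how often the alert repeated.
    return sorted(
        grouped.values(),
        key=lambda e: (SEVERITY_RANK[e["alert"].severity], -e["count"]),
    )


# Made-up alert stream: five raw alerts, three distinct problems.
alerts = [
    Alert("db-1", "disk", "warning"),
    Alert("web-1", "cpu", "info"),
    Alert("db-1", "disk", "critical"),
    Alert("db-1", "disk", "warning"),
    Alert("web-2", "latency", "warning"),
]

for entry in reduce_noise(alerts):
    a = entry["alert"]
    print(a.host, a.check, a.severity, entry["count"])
```

With this sample input, the five raw alerts shrink to three entries, and the repeated disk alert on `db-1` rises to the top because one of its instances was critical.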
AIOps is in a nascent stage of adoption, but it promises to transform IT operations and management by employing automation, AI/ML, analytics and big data.
It will help the IT organization proactively manage incidents, streamline alerts, optimize capacity, and achieve operational excellence - delivering reliability, cost savings, and uninterrupted service.
"IT leaders are enthusiastic about the promise of applying AI to IT operations, but as with moving a large object, it will be necessary to overcome inertia to build velocity," Byrne said.
He remained optimistic about AIOps usage as "AI capabilities are advancing, and more real solutions are becoming available daily."
"There is no future of IT operations without AIOps," Byrne said.