Incident Response Machine Learning with Chris Riley
Media Type: audio
Categories (via RSS): Technology
Publication Date: Nov 12, 2019
Episode Duration: 00:45:50

The post Incident Response Machine Learning with Chris Riley appeared first on Software Engineering Daily.

Software bugs cause unexpected problems at every company. 

Some problems are small. A website goes down in the middle of the night, and the outage triggers a phone call to an engineer who has to wake up and fix the problem. Other problems can be significantly larger. A major problem can cause millions of dollars in losses and require hours of work to fix.

When software unexpectedly breaks, it is called an incident. To triage these incidents, an engineer uses a combination of tools, including Slack, GitHub, cloud providers, and continuous deployment systems. These tools emit updates that can be received by an incident response platform, which centralizes the information the on-call engineer needs so they can work through the incident more easily.

On-call rotation means that different people are responsible for dealing with the different incidents that occur. When an incident happens, the engineer who is currently on call may not be aware that a similar incident happened last week. It might be easier for that engineer to triage the issue if they had insight into how the incident was managed the first time.
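As a rough illustration of surfacing a similar earlier incident, the sketch below compares the wording of a new incident summary against past summaries using Jaccard word overlap. This is only a minimal stand-in, not a method described in the episode: the `Incident` record, its field names, and the `most_similar` helper are all hypothetical, and a real system would likely use richer signals than summary text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative assumptions.
    incident_id: str
    summary: str
    resolution: str


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(new_summary: str, history: list[Incident]) -> Incident | None:
    """Return the past incident whose summary overlaps most, if any overlap exists."""
    new_tokens = tokens(new_summary)
    best, best_score = None, 0.0
    for past in history:
        score = jaccard(new_tokens, tokens(past.summary))
        if score > best_score:
            best, best_score = past, score
    return best


if __name__ == "__main__":
    history = [
        Incident("INC-1", "checkout api returning 500 errors after deploy", "rolled back deploy"),
        Incident("INC-2", "disk full on logging host", "rotated logs"),
    ]
    match = most_similar("checkout api 500 errors", history)
    if match is not None:
        print(match.incident_id, "was resolved by:", match.resolution)
```

Word overlap is used here only because it is easy to follow; it ignores word order and synonyms, so treat it as a starting point rather than a recommendation.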

Chris Riley is a DevOps advocate with Splunk. He joins the show to discuss the application of machine learning to incident response. We discuss the different data points that are created during an incident, and how that data can be used to build models for different types of incidents, which can generate information to help the engineer respond appropriately to an incident. Full disclosure: Splunk is a sponsor of Software Engineering Daily.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com
