Everywhere one turns, at least at the movie theater or on the small screen, there is a new superhero with some new power or focus. Given how important application and site uptime is, it’s a wonder that there isn’t yet a superhero for troubleshooting software problems. In a way, though, such superheroes have existed for some time and have been the primary means of keeping applications running smoothly.
These Site Reliability Engineers (SREs), or support superheroes, are highly skilled at finding a solution to an issue, if not its actual root cause, by drawing on deep experience, knowledge, and intuition. But these software superheroes cannot scale to meet the new reality created by ever-increasing complexity combined with an accelerating rate of application change. Together, these forces are driving steady growth in the number of incidents, and observability tools are not the problem. The limitation is the amount of information the human mind must quickly absorb and analyze to resolve these incidents.