Site reliability engineering (SRE) is a discipline in which automated software systems are built to manage the development operations (DevOps) of a product or service. In other words, SRE automates the functions of an operations team via software systems.
The main purpose of SRE is to encourage the deployment and proper maintenance of large-scale systems. In particular, site reliability engineers are responsible for ensuring that a given system’s behavior consistently meets business requirements for performance and availability.