Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems.[1] The main goals are to create scalable and highly reliable software systems.[1] Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.[1][2]
The field of site reliability engineering originated at Google with Ben Treynor Sloss,[3][4] who founded a site reliability team after joining the company in 2003.[5] In 2016, Google employed more than 1,000 site reliability engineers.[6] The concept subsequently spread into the broader software development industry, and other companies began to employ site reliability engineers.[7] The position is more common at larger web companies, as small companies often do not operate at a scale that would require dedicated SREs.[7] Companies that have adopted the concept include Dropbox, Airbnb, and Netflix.[6] According to a 2021 report by the DevOps Institute, 22% of organizations in a survey of 2,000 respondents had adopted the SRE model.[8][9]
Site reliability engineering is the application of software engineering to IT subjects including infrastructure and operations, with the goal of creating and maintaining scalable and reliable systems.[1][4] Site reliability engineers often have backgrounds in software engineering, system engineering, or system administration.[10] Focuses of site reliability engineering include automation, system design, and improvements to system resilience.[10] SRE teams are responsible for system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.[11]
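As a rough illustration of two of the responsibilities listed above, availability and latency, the following Python sketch computes a request-based availability ratio and a nearest-rank latency percentile. The function names, the request-based definition of availability, and the sample figures are assumptions made for this example; they are not taken from the cited sources, and teams may define these measures differently.

```python
import math


def availability(successful: int, total: int) -> float:
    """Fraction of requests served successfully.

    Assumption: availability is measured per request rather than by uptime.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical measurements, for illustration only.
latencies_ms = [120, 80, 95, 300, 110]
print(availability(successful=999, total=1000))
print(percentile(latencies_ms, 95))
```

With the hypothetical figures shown, 999 successful requests out of 1,000 give an availability of 0.999, and the 95th-percentile latency is the slowest of the five samples.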
Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and has also been described as a specific implementation of DevOps.[1][2] Site reliability engineering focuses specifically on building reliable systems, whereas DevOps is more broadly focused on infrastructure.[1] The definition varies somewhat by company, and Stephen Gossett wrote in Built In that some companies have rebranded their operations teams to SRE teams with little meaningful change.[7]
The USENIX organization has held an annual SREcon conference since 2014 for site reliability engineers in industry, and also holds regional conferences with similar themes.[12]
By: Wikipedia.org
Edited: 2021-06-18 19:18:45
Source: Wikipedia.org