
Datapoints motivating introduction of SRE in organisation

Submitted by: @import:stackexchange-devops

Problem

As there is no Stack Exchange site dedicated to Site Reliability Engineering, I found this one to be the closest.

There are multiple great resources to use as inspiration for a slide deck about SRE principles [SRE slides].

Still, I can't find examples that are:

  • short
  • concise
  • persuasive enough to justify spending resources on implementing SRE in an organisation.

Most of what I have experienced in my professional life involved highly confidential cases and numbers. I am concerned that most of the numbers SREs know must remain internal, to be presented only within their corporations.

However, maybe you know of a study, or (preferably a set of) good examples of postmortems (even individual ones are fine), from which we could build a strong argument such as "after introducing the SRE model into the organisation, the velocity of changes grew from n to m release pushes per x, with availability increasing by y and costs decreasing by z" (just brainstorming), or other hard data points?

[SRE slides] - some examples:

  • Site Reliability Engineering: An Enterprise Adoption Story (an ITSM Academy Webinar) by ITSM Academy, Inc.
  • SRE From Scratch by Grier Johnson, Platform Engineer at Square
  • GOTO 2017 • Site Reliability Engineering at Google • Christof Leng

P.S. If this question could be rephrased to better fit this site's guidelines, please suggest how in a comment and give me a chance to improve it. Otherwise, I would appreciate pointers to better platforms (though reddit.com/r/sre, for example, did not make a great impression on me).

Solution

The types of numbers you're looking for might be hard to come by, because they're highly variable (in my experience, they vary from service to service and team to team even within one organization). The SRE Workbook is now available for free and includes two case studies (chapter 3) that might be helpful. Also, New Relic's SRE eBook does a really good job of summarizing SRE concisely.

Another way to approach this would be to use what you know about your service today to create a risk assessment, then estimate the downtime you could prevent if you had SRE and dev support to eliminate those risks.
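To make the risk-assessment idea concrete, here is a minimal sketch of how such an estimate could be computed. Everything in it is an assumption for illustration: the `Risk` fields, the example risks, and all of the numbers are hypothetical placeholders you would replace with your own service's incident history. It weights each risk by how often it is expected to occur, how long detection and repair take, and what fraction of users it affects, then compares the total against the yearly error budget implied by an availability target.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


@dataclass
class Risk:
    """One entry in a (hypothetical) risk catalogue for a service."""
    name: str
    incidents_per_year: float  # expected frequency of this failure
    minutes_to_detect: float   # typical time until someone notices
    minutes_to_repair: float   # typical time from detection to recovery
    fraction_affected: float   # share of users impacted, 0.0 to 1.0
    mitigable: bool            # could dedicated SRE/dev work plausibly remove it?

    def expected_bad_minutes(self) -> float:
        """Expected user-weighted downtime per year caused by this risk."""
        outage_minutes = self.minutes_to_detect + self.minutes_to_repair
        return self.incidents_per_year * outage_minutes * self.fraction_affected


def error_budget_minutes(slo: float) -> float:
    """Yearly downtime allowed by an availability target such as 0.999."""
    return (1.0 - slo) * MINUTES_PER_YEAR


def total_minutes(risks: list[Risk]) -> float:
    """Expected downtime per year across all listed risks."""
    return sum(r.expected_bad_minutes() for r in risks)


def preventable_minutes(risks: list[Risk]) -> float:
    """Expected downtime per year from risks marked as mitigable."""
    return sum(r.expected_bad_minutes() for r in risks if r.mitigable)


# Hypothetical example data; replace with estimates for your own service.
RISKS = [
    Risk("bad deploy without rollback", 12, 10, 50, 0.5, True),
    Risk("expired TLS certificate", 1, 30, 90, 1.0, True),
    Risk("cloud region outage", 0.5, 5, 235, 1.0, False),
]

if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    print(f"Error budget at 99.9%: {budget:.1f} min/year")
    print(f"Expected downtime:     {total_minutes(RISKS):.1f} min/year")
    print(f"Preventable downtime:  {preventable_minutes(RISKS):.1f} min/year")
```

The preventable figure is the part you can put on a slide: multiplied by your own estimate of what a minute of downtime costs, it gives a rough upper bound on what the reliability work is worth, without relying on anyone else's confidential numbers.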

Context

StackExchange DevOps Q#5025, answer score: 3
