1
00:00:12,080 --> 00:00:15,440
welcome back everyone to the devops

2
00:00:13,679 --> 00:00:19,039
track here at pycon

3
00:00:15,440 --> 00:00:21,439
pyconline 2021 it's uh it's wonderful

4
00:00:19,039 --> 00:00:24,160
to have you all here and uh we welcome

5
00:00:21,439 --> 00:00:25,760
our next speaker with a tremendous talk

6
00:00:24,160 --> 00:00:27,599
it's molly rowe who is here to talk

7
00:00:25,760 --> 00:00:29,039
about metrics for good

8
00:00:27,599 --> 00:00:32,239
not evil

9
00:00:29,039 --> 00:00:33,680
molly over to you

10
00:00:32,239 --> 00:00:36,399
thank you

11
00:00:33,680 --> 00:00:38,000
hey there everyone my name is molly i'm

12
00:00:36,399 --> 00:00:39,680
here today to talk to you about metrics

13
00:00:38,000 --> 00:00:40,960
for good not evil thank you for coming

14
00:00:39,680 --> 00:00:43,520
to my talk

15
00:00:40,960 --> 00:00:46,000
um so i'm the head of people and culture

16
00:00:43,520 --> 00:00:47,280
for record point i have a pretty weird

17
00:00:46,000 --> 00:00:48,640
background

18
00:00:47,280 --> 00:00:50,160
particularly compared to a lot of the

19
00:00:48,640 --> 00:00:51,440
other people who are talking at pycon

20
00:00:50,160 --> 00:00:53,920
this weekend

21
00:00:51,440 --> 00:00:56,239
um i've been a scientist a project

22
00:00:53,920 --> 00:00:59,280
manager a scrum lead a hybrid cloud

23
00:00:56,239 --> 00:01:02,000
consultant a recruiter a manager and now

24
00:00:59,280 --> 00:01:02,960
this head of people and culture role

25
00:01:02,000 --> 00:01:04,960
um

26
00:01:02,960 --> 00:01:06,560
i've spent the last seven or so years

27
00:01:04,960 --> 00:01:09,040
trying to solve the pain points that

28
00:01:06,560 --> 00:01:11,360
plague so many tech companies

29
00:01:09,040 --> 00:01:13,840
using people-centric solutions

30
00:01:11,360 --> 00:01:16,799
so a combination of tooling technology

31
00:01:13,840 --> 00:01:19,360
metrics and empathy

32
00:01:16,799 --> 00:01:21,759
over the last seven years i've kind of

33
00:01:19,360 --> 00:01:24,840
run about almost a thousand interviews

34
00:01:21,759 --> 00:01:27,360
globally for engineering sre and devops

35
00:01:24,840 --> 00:01:29,119
roles um and

36
00:01:27,360 --> 00:01:30,720
in every single one of those i've been

37
00:01:29,119 --> 00:01:33,040
asking the candidates

38
00:01:30,720 --> 00:01:35,119
what kind of company do you want to work

39
00:01:33,040 --> 00:01:37,759
for and then taking that data back and

40
00:01:35,119 --> 00:01:40,000
trying to make that a reality

41
00:01:37,759 --> 00:01:42,159
so i kind of use this constant influx of

42
00:01:40,000 --> 00:01:43,759
user feedback to iterate on my idea of

43
00:01:42,159 --> 00:01:45,759
what engineers want

44
00:01:43,759 --> 00:01:47,840
and what businesses need

45
00:01:45,759 --> 00:01:51,280
and the complexity of the problem that

46
00:01:47,840 --> 00:01:51,280
separates those two things

47
00:01:52,159 --> 00:01:56,079
and kind of off the back of that i've

48
00:01:54,000 --> 00:01:58,000
built this role at record point

49
00:01:56,079 --> 00:02:00,079
for any of us who any of those who

50
00:01:58,000 --> 00:02:01,119
haven't heard of us we're a software as

51
00:02:00,079 --> 00:02:03,040
a service

52
00:02:01,119 --> 00:02:05,680
software engineering company

53
00:02:03,040 --> 00:02:07,439
um that produces a federated information

54
00:02:05,680 --> 00:02:09,599
management product

55
00:02:07,439 --> 00:02:11,360
we try to reduce the cost and the risk

56
00:02:09,599 --> 00:02:12,560
and the manual effort associated with

57
00:02:11,360 --> 00:02:16,319
going through

58
00:02:12,560 --> 00:02:18,640
data and records management audits

59
00:02:16,319 --> 00:02:21,599
we're a scale-up of about 75 people

60
00:02:18,640 --> 00:02:24,000
globally and 40 to 45 of those are

61
00:02:21,599 --> 00:02:26,319
software engineers and sres

62
00:02:24,000 --> 00:02:27,760
this means that a big chunk of my staff

63
00:02:26,319 --> 00:02:30,959
are engineers

64
00:02:27,760 --> 00:02:32,319
and that plays a big role in my employee

65
00:02:30,959 --> 00:02:34,879
engagement

66
00:02:32,319 --> 00:02:37,360
the overall culture of the business and

67
00:02:34,879 --> 00:02:39,760
our retention rates so everything that i

68
00:02:37,360 --> 00:02:41,840
can do to improve the processes and

69
00:02:39,760 --> 00:02:43,599
remove the pain points and improve the

70
00:02:41,840 --> 00:02:45,599
developer experience

71
00:02:43,599 --> 00:02:47,599
uh is a benefit to

72
00:02:45,599 --> 00:02:51,360
the whole business

73
00:02:47,599 --> 00:02:52,160
which is why i care about slos

74
00:02:51,360 --> 00:02:54,239
so

75
00:02:52,160 --> 00:02:56,239
record point has

76
00:02:54,239 --> 00:02:58,000
gone through over the past several years

77
00:02:56,239 --> 00:02:59,760
a pretty common experience that i've

78
00:02:58,000 --> 00:03:01,440
seen across multiple different software

79
00:02:59,760 --> 00:03:03,840
engineering companies

80
00:03:01,440 --> 00:03:05,840
where we've had a shifting power balance

81
00:03:03,840 --> 00:03:07,920
between our engineering department and

82
00:03:05,840 --> 00:03:09,840
our product department

83
00:03:07,920 --> 00:03:11,040
we've had leadership transitions that

84
00:03:09,840 --> 00:03:13,599
have meant that there's been kind of

85
00:03:11,040 --> 00:03:14,560
rapid changes in technical direction as

86
00:03:13,599 --> 00:03:17,200
well as

87
00:03:14,560 --> 00:03:19,200
changes in overall delivery focus

88
00:03:17,200 --> 00:03:21,440
and even though both sides of that coin

89
00:03:19,200 --> 00:03:24,080
both product and engineering have had

90
00:03:21,440 --> 00:03:26,400
their time in the sun had their control

91
00:03:24,080 --> 00:03:26,400
um

92
00:03:26,879 --> 00:03:30,879
it hasn't really benefited anyone

93
00:03:29,120 --> 00:03:32,959
so until

94
00:03:30,879 --> 00:03:35,040
you find a balanced approach where those

95
00:03:32,959 --> 00:03:38,080
two factions are working if not in

96
00:03:35,040 --> 00:03:40,400
harmony at least in concert

97
00:03:38,080 --> 00:03:41,440
you're going to have struggles

98
00:03:40,400 --> 00:03:43,120
so

99
00:03:41,440 --> 00:03:44,640
i kind of went out to the market and

100
00:03:43,120 --> 00:03:47,040
looked for

101
00:03:44,640 --> 00:03:48,879
a magic wand for how to solve this

102
00:03:47,040 --> 00:03:50,400
oppositional nature that you see in so

103
00:03:48,879 --> 00:03:52,560
many software engineering businesses

104
00:03:50,400 --> 00:03:53,920
between product and engineering

105
00:03:52,560 --> 00:03:55,200
uh and i wasn't sure where i could find

106
00:03:53,920 --> 00:03:57,840
one

107
00:03:55,200 --> 00:04:00,560
and like most things in life

108
00:03:57,840 --> 00:04:02,239
i found that on google in this case

109
00:04:00,560 --> 00:04:04,239
actually in google

110
00:04:02,239 --> 00:04:06,159
so google's development of the sre

111
00:04:04,239 --> 00:04:09,920
handbook and the incredible

112
00:04:06,159 --> 00:04:12,319
documentation around slos slas

113
00:04:09,920 --> 00:04:14,080
and slis has kind of led me down this

114
00:04:12,319 --> 00:04:16,400
rabbit hole of telemetry and

115
00:04:14,080 --> 00:04:18,079
visualization and metrics

116
00:04:16,400 --> 00:04:20,160
that have taught me about finding a

117
00:04:18,079 --> 00:04:21,440
common language of care

118
00:04:20,160 --> 00:04:22,960
within

119
00:04:21,440 --> 00:04:26,240
that dynamic

120
00:04:22,960 --> 00:04:26,240
and the care for the user

121
00:04:27,040 --> 00:04:29,360
so

122
00:04:27,759 --> 00:04:31,280
a common complaint that i come across

123
00:04:29,360 --> 00:04:32,720
from engineers is that product

124
00:04:31,280 --> 00:04:35,280
management are changing the priorities

125
00:04:32,720 --> 00:04:37,840
too often and work never gets fully

126
00:04:35,280 --> 00:04:39,759
completed or the work that gets time of

127
00:04:37,840 --> 00:04:41,600
day is always feature development work

128
00:04:39,759 --> 00:04:43,600
and never platform or systems

129
00:04:41,600 --> 00:04:45,360
improvements

130
00:04:43,600 --> 00:04:47,600
and the reciprocal complaint comes out

131
00:04:45,360 --> 00:04:50,240
of product you know we're not delivering

132
00:04:47,600 --> 00:04:53,360
client value service is being interrupted

133
00:04:50,240 --> 00:04:55,600
by issues and bugs and crappy code and

134
00:04:53,360 --> 00:04:57,680
unreliable platforms

135
00:04:55,600 --> 00:05:00,160
and this rhetoric

136
00:04:57,680 --> 00:05:01,919
justified or not from both sides is kind

137
00:05:00,160 --> 00:05:04,320
of founded in the fact that those teams

138
00:05:01,919 --> 00:05:06,400
are incentivized differently

139
00:05:04,320 --> 00:05:08,639
product is often measured on

140
00:05:06,400 --> 00:05:10,639
delivery of skus

141
00:05:08,639 --> 00:05:13,039
rather than overall customer experience

142
00:05:10,639 --> 00:05:14,880
and engineering is measured on so many things

143
00:05:13,039 --> 00:05:17,280
right features shipped deployment

144
00:05:14,880 --> 00:05:19,440
frequency velocity

145
00:05:17,280 --> 00:05:21,120
and this really often leaves sre

146
00:05:19,440 --> 00:05:22,720
holding the bag

147
00:05:21,120 --> 00:05:24,479
they're responsible for the platform

148
00:05:22,720 --> 00:05:26,240
availability and the latency and the

149
00:05:24,479 --> 00:05:27,919
mean time to resolve whenever something

150
00:05:26,240 --> 00:05:29,919
goes wrong

151
00:05:27,919 --> 00:05:32,400
but all of those metrics are heavily

152
00:05:29,919 --> 00:05:33,919
influenced by not only

153
00:05:32,400 --> 00:05:37,280
the priorities that come out of

154
00:05:33,919 --> 00:05:40,800
engineering and and product but also the

155
00:05:37,280 --> 00:05:42,240
work that's done by those two teams

156
00:05:40,800 --> 00:05:44,240
and this is where the power of data

157
00:05:42,240 --> 00:05:46,720
comes in and this is where the role of

158
00:05:44,240 --> 00:05:48,560
sre into the future comes in

159
00:05:46,720 --> 00:05:51,280
because sre are holding the keys to the

160
00:05:48,560 --> 00:05:53,840
kingdom when it comes to

161
00:05:51,280 --> 00:05:57,360
data and being able to create a common

162
00:05:53,840 --> 00:05:58,160
language between those two factions

163
00:05:57,360 --> 00:05:59,759
um

164
00:05:58,160 --> 00:06:01,199
for some organizations

165
00:05:59,759 --> 00:06:03,199
uh those keys are still under

166
00:06:01,199 --> 00:06:05,280
construction and that's okay because all

167
00:06:03,199 --> 00:06:06,880
the raw materials are still there

168
00:06:05,280 --> 00:06:09,520
all of the raw materials and all of the

169
00:06:06,880 --> 00:06:10,560
raw data exists within your services

170
00:06:09,520 --> 00:06:14,560
today

171
00:06:10,560 --> 00:06:14,560
and the future is about how you use it

172
00:06:15,120 --> 00:06:19,039
i personally have

173
00:06:17,280 --> 00:06:21,199
quite an obsession with metrics and data

174
00:06:19,039 --> 00:06:22,960
in general and a belief that until

175
00:06:21,199 --> 00:06:24,319
something's measured it can't really be

176
00:06:22,960 --> 00:06:26,639
improved

177
00:06:24,319 --> 00:06:27,600
at least not on purpose

178
00:06:26,639 --> 00:06:28,560
um

179
00:06:27,600 --> 00:06:30,800
generally

180
00:06:28,560 --> 00:06:33,199
and look i'll admit this talking about

181
00:06:30,800 --> 00:06:34,880
metrics will lose the attention of the

182
00:06:33,199 --> 00:06:37,280
audience and cause people's eyes to

183
00:06:34,880 --> 00:06:38,240
glaze over i know that

184
00:06:37,280 --> 00:06:40,240
but

185
00:06:38,240 --> 00:06:42,880
it's actually because most people have a

186
00:06:40,240 --> 00:06:44,720
pretty crappy experience with metrics

187
00:06:42,880 --> 00:06:47,199
almost all of us have sat down in a

188
00:06:44,720 --> 00:06:48,960
meeting or a retro and gone through the

189
00:06:47,199 --> 00:06:50,160
data and gone

190
00:06:48,960 --> 00:06:52,479
yeah

191
00:06:50,160 --> 00:06:54,160
but that's not right because

192
00:06:52,479 --> 00:06:55,120
x

193
00:06:54,160 --> 00:06:56,720
and

194
00:06:55,120 --> 00:06:59,680
going we all end up going down this

195
00:06:56,720 --> 00:07:02,080
justification or explanation pathway of

196
00:06:59,680 --> 00:07:05,120
why that data is incorrect that even to

197
00:07:02,080 --> 00:07:07,440
our own ears feels like excuses

198
00:07:05,120 --> 00:07:09,360
and that's not our fault

199
00:07:07,440 --> 00:07:12,160
this is because in general people are

200
00:07:09,360 --> 00:07:14,319
really terrible at setting metrics and

201
00:07:12,160 --> 00:07:16,400
we often default to setting metrics that

202
00:07:14,319 --> 00:07:17,599
address the symptom of a problem rather

203
00:07:16,400 --> 00:07:20,400
than a root

204
00:07:17,599 --> 00:07:20,400
a root cause

205
00:07:20,479 --> 00:07:24,960
um so let me tell you a quick story

206
00:07:22,319 --> 00:07:27,039
about metrics gone wrong

207
00:07:24,960 --> 00:07:29,280
back in the days of colonially occupied

208
00:07:27,039 --> 00:07:30,720
india the british governor was concerned

209
00:07:29,280 --> 00:07:32,160
with the number of venomous cobras in

210
00:07:30,720 --> 00:07:33,919
the streets of delhi

211
00:07:32,160 --> 00:07:36,800
he decided to implement a scheme

212
00:07:33,919 --> 00:07:39,680
offering cash rewards for cobra heads so

213
00:07:36,800 --> 00:07:41,280
basically he equated dead snakes equals

214
00:07:39,680 --> 00:07:42,319
less snakes in the street

215
00:07:41,280 --> 00:07:45,440
seems

216
00:07:42,319 --> 00:07:47,199
okay on the surface um

217
00:07:45,440 --> 00:07:49,919
initially the scheme seemed to be

218
00:07:47,199 --> 00:07:51,599
successful they were redeeming

219
00:07:49,919 --> 00:07:52,800
or lots of people were redeeming the

220
00:07:51,599 --> 00:07:55,199
cash prize

221
00:07:52,800 --> 00:07:57,280
um and you know assuming that this

222
00:07:55,199 --> 00:07:58,560
continues on track the number of snakes

223
00:07:57,280 --> 00:08:00,560
in the street was

224
00:07:58,560 --> 00:08:03,840
you know surely going to decrease

225
00:08:00,560 --> 00:08:06,879
however over time that didn't happen

226
00:08:03,840 --> 00:08:09,120
and they were trying to figure out why

227
00:08:06,879 --> 00:08:12,080
so upon investigation

228
00:08:09,120 --> 00:08:14,560
what they found was that some very

229
00:08:12,080 --> 00:08:16,560
innovative individuals and then all of

230
00:08:14,560 --> 00:08:20,319
their enterprising neighbors

231
00:08:16,560 --> 00:08:22,000
had started farming cobras

232
00:08:20,319 --> 00:08:22,879
yeah

233
00:08:22,000 --> 00:08:24,960
they

234
00:08:22,879 --> 00:08:26,400
basically started farming these cobras

235
00:08:24,960 --> 00:08:29,360
with the full intent to kill them and

236
00:08:26,400 --> 00:08:31,680
then redeem them for the cash prize

237
00:08:29,360 --> 00:08:33,120
in a rage the governor canceled the

238
00:08:31,680 --> 00:08:34,880
scheme and said this is not in the

239
00:08:33,120 --> 00:08:36,479
spirit of what we intended i was trying

240
00:08:34,880 --> 00:08:39,039
to protect you

241
00:08:36,479 --> 00:08:41,760
no more no more money for snakes

242
00:08:39,039 --> 00:08:43,919
and so what you then have is a populace

243
00:08:41,760 --> 00:08:45,200
who have snake farms in their houses and

244
00:08:43,919 --> 00:08:47,440
backyards

245
00:08:45,200 --> 00:08:49,360
who no longer have an incentive to do so

246
00:08:47,440 --> 00:08:51,600
to keep those snakes

247
00:08:49,360 --> 00:08:53,040
and so they did the easiest and most

248
00:08:51,600 --> 00:08:56,720
predictable thing

249
00:08:53,040 --> 00:08:56,720
and let the snakes go into the streets

250
00:08:56,959 --> 00:09:00,399
so

251
00:08:58,720 --> 00:09:02,399
with a well-meaning well-intentioned

252
00:09:00,399 --> 00:09:04,959
scheme to reduce the number of venomous

253
00:09:02,399 --> 00:09:07,279
cobras and protect the population

254
00:09:04,959 --> 00:09:09,440
the actual result was that by several

255
00:09:07,279 --> 00:09:13,120
orders of magnitude they increased the

256
00:09:09,440 --> 00:09:13,120
number of snakes in the streets in delhi

257
00:09:13,200 --> 00:09:17,839
this law of unintended consequences has

258
00:09:16,080 --> 00:09:20,480
come to be known as the cobra effect

259
00:09:17,839 --> 00:09:22,959
it's so eponymous with this scenario

260
00:09:20,480 --> 00:09:24,880
that i've just um described

261
00:09:22,959 --> 00:09:27,279
and you know you might be tempted to say

262
00:09:24,880 --> 00:09:29,680
look molly that was 200 years ago in

263
00:09:27,279 --> 00:09:31,519
colonial india this is not a great

264
00:09:29,680 --> 00:09:35,399
example people wouldn't do anything

265
00:09:31,519 --> 00:09:35,399
quite so stupid anymore

266
00:09:36,240 --> 00:09:39,200
well

267
00:09:37,040 --> 00:09:42,640
the most recent entry into the cobra

268
00:09:39,200 --> 00:09:45,360
effect hall of fame is uh it goes to the

269
00:09:42,640 --> 00:09:47,680
university in the usa who recently had

270
00:09:45,360 --> 00:09:50,800
to release a statement that said

271
00:09:47,680 --> 00:09:53,600
to any students who have voluntarily

272
00:09:50,800 --> 00:09:54,880
exposed themselves to covid-19 if you

273
00:09:53,600 --> 00:09:57,279
continue

274
00:09:54,880 --> 00:09:59,040
to any students who have done that you

275
00:09:57,279 --> 00:10:00,880
run the risk of being suspended or

276
00:09:59,040 --> 00:10:02,480
expelled

277
00:10:00,880 --> 00:10:04,880
they had to release this statement

278
00:10:02,480 --> 00:10:07,040
because the local plasma donation center

279
00:10:04,880 --> 00:10:09,519
had increased the money reward for

280
00:10:07,040 --> 00:10:11,760
donating plasma for anyone who had

281
00:10:09,519 --> 00:10:13,680
active covid-19 antibodies so someone

282
00:10:11,760 --> 00:10:16,880
who had had covid to more than a

283
00:10:13,680 --> 00:10:17,920
hundred dollars per donation

284
00:10:16,880 --> 00:10:19,360
the scheme was completely

285
00:10:17,920 --> 00:10:21,200
well-intentioned they were trying to

286
00:10:19,360 --> 00:10:25,279
treat people who were in critical care

287
00:10:21,200 --> 00:10:27,839
with covid which is a a use of plasma

288
00:10:25,279 --> 00:10:30,800
and instead what they created was

289
00:10:27,839 --> 00:10:33,279
more people in critical care with covid

290
00:10:30,800 --> 00:10:34,880
in that area

291
00:10:33,279 --> 00:10:37,519
so

292
00:10:34,880 --> 00:10:39,360
this is me circling back to why most

293
00:10:37,519 --> 00:10:41,279
people have a terrible traditional

294
00:10:39,360 --> 00:10:43,519
experience with metrics

295
00:10:41,279 --> 00:10:46,000
humans are pretty bad at using critical

296
00:10:43,519 --> 00:10:47,920
analysis when it comes to incentives and

297
00:10:46,000 --> 00:10:49,839
are rarely just looking for an empirical

298
00:10:47,920 --> 00:10:51,920
outcome like you should rarely be

299
00:10:49,839 --> 00:10:53,519
looking for an empirical outcome you're

300
00:10:51,920 --> 00:10:56,720
often trying to actually influence the

301
00:10:53,519 --> 00:10:59,519
behaviors that drive that outcome

302
00:10:56,720 --> 00:11:01,120
you're not really looking for snake heads

303
00:10:59,519 --> 00:11:03,760
i hope

304
00:11:01,120 --> 00:11:05,519
you're actually looking for less snakes

305
00:11:03,760 --> 00:11:08,079
you're not really looking for more test

306
00:11:05,519 --> 00:11:10,079
coverage you're looking for better code

307
00:11:08,079 --> 00:11:12,160
so when you're proxying the behavior

308
00:11:10,079 --> 00:11:13,760
that you want with an easy to measure

309
00:11:12,160 --> 00:11:15,760
symptom

310
00:11:13,760 --> 00:11:18,320
this is a common downfall of general

311
00:11:15,760 --> 00:11:18,320
metrics

312
00:11:19,760 --> 00:11:23,040
it's pretty much just a case of be

313
00:11:21,360 --> 00:11:24,959
careful what you wish for

314
00:11:23,040 --> 00:11:26,720
um there's a couple of easy ways to

315
00:11:24,959 --> 00:11:29,279
counteract a lot of these pitfalls that

316
00:11:26,720 --> 00:11:30,399
go beyond imagine how you would game the

317
00:11:29,279 --> 00:11:32,079
system

318
00:11:30,399 --> 00:11:33,680
because people are infinitely

319
00:11:32,079 --> 00:11:37,040
resourceful particularly when you

320
00:11:33,680 --> 00:11:37,040
incentivize them to be so

321
00:11:37,120 --> 00:11:40,800
pretty much everyone should be familiar

322
00:11:38,560 --> 00:11:43,040
with this cost speed quality trade-off

323
00:11:40,800 --> 00:11:45,120
triangle and this is actually a great

324
00:11:43,040 --> 00:11:46,399
way to build metrics

325
00:11:45,120 --> 00:11:49,120
because you should have metrics that

326
00:11:46,399 --> 00:11:51,200
contradict each other

327
00:11:49,120 --> 00:11:53,440
what that allows you to do is create a

328
00:11:51,200 --> 00:11:56,639
balance where

329
00:11:53,440 --> 00:11:58,959
to exceed or to game a metric or or to

330
00:11:56,639 --> 00:12:01,040
push one metric really hard you're going

331
00:11:58,959 --> 00:12:02,959
to influence those other metrics to

332
00:12:01,040 --> 00:12:05,440
their detriment and therefore have an

333
00:12:02,959 --> 00:12:08,240
overall worse outcome than if you hadn't

334
00:12:05,440 --> 00:12:10,320
like tried to game that singular metric

335
00:12:08,240 --> 00:12:12,800
a really great example of this is nicole

336
00:12:10,320 --> 00:12:14,639
forsgren's metrics for

337
00:12:12,800 --> 00:12:15,920
high performing teams

338
00:12:14,639 --> 00:12:17,680
she outlines these in her book

339
00:12:15,920 --> 00:12:19,200
accelerate which i highly recommend you

340
00:12:17,680 --> 00:12:21,920
read

341
00:12:19,200 --> 00:12:24,160
but basically the premise is that

342
00:12:21,920 --> 00:12:26,720
lead time deployment frequency change

343
00:12:24,160 --> 00:12:28,720
fail ratio and mean time to resolve are

344
00:12:26,720 --> 00:12:31,120
metrics that are indicators of high

345
00:12:28,720 --> 00:12:32,800
performing teams

346
00:12:31,120 --> 00:12:35,279
lead time and deployment frequency

347
00:12:32,800 --> 00:12:37,440
obviously correlate to speed and change

348
00:12:35,279 --> 00:12:41,839
fail ratio and mean time to resolve

349
00:12:37,440 --> 00:12:41,839
obviously correlate to quality or um

350
00:12:42,959 --> 00:12:47,440
reliability sorry thank you um

351
00:12:46,079 --> 00:12:48,880
and so

352
00:12:47,440 --> 00:12:51,120
you know if you're trying to optimize

353
00:12:48,880 --> 00:12:52,959
for your change fail ratio and you know

354
00:12:51,120 --> 00:12:55,200
you're trying to push out perfect code

355
00:12:52,959 --> 00:12:56,560
that has no bugs you're going to impact

356
00:12:55,200 --> 00:12:59,680
your lead time and your deployment

357
00:12:56,560 --> 00:13:01,680
frequency negatively
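
[editor's sketch, not part of the talk] The four metrics named above can be computed from a simple log of deployments. The following is a minimal sketch under stated assumptions: the `Deployment` record, its field names, and the `four_key_metrics` function are all hypothetical, and a real pipeline would likely define "lead time" and "failure" more carefully. It is only meant to show how the speed metrics and the reliability metrics come out of the same data, so pushing one side is visible in the other.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one change going to production."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # did the change cause an incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deploys: List[Deployment], window_days: int) -> dict:
    """Summarise speed (lead time, frequency) and reliability (fail ratio, time to resolve)."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restore_hours = [
        _hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(_hours(d.deployed_at - d.committed_at) for d in deploys),
        "deploys_per_day": len(deploys) / window_days,
        "change_fail_ratio": len(failures) / len(deploys),
        # None when nothing failed (or nothing has been restored yet)
        "mean_time_to_resolve_hours": mean(restore_hours) if restore_hours else None,
    }


# Usage with made-up data: two deployments over a two-day window, one of which failed.
t0 = datetime(2021, 9, 1, 9, 0)
example = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_at=t0 + timedelta(hours=10)),
]
metrics = four_key_metrics(example, window_days=2)
```

Reporting all four numbers together is the point: a team that drives `change_fail_ratio` toward zero by shipping rarely will see `lead_time_hours` and `deploys_per_day` get worse in the same summary.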

358
00:12:59,680 --> 00:13:03,839
these are the kind of principles that

359
00:13:01,680 --> 00:13:05,440
led me to the investigation of slos

360
00:13:03,839 --> 00:13:07,600
right the new problem is how do we

361
00:13:05,440 --> 00:13:10,000
define metrics that encompass the whole

362
00:13:07,600 --> 00:13:11,920
product and incentivize good behaviors

363
00:13:10,000 --> 00:13:13,360
between competing elements

364
00:13:11,920 --> 00:13:16,000
it starts with kind of finding that

365
00:13:13,360 --> 00:13:17,920
balance of power between two factions

366
00:13:16,000 --> 00:13:20,320
so no longer talking about speed and

367
00:13:17,920 --> 00:13:22,880
quality or reliability but now talking

368
00:13:20,320 --> 00:13:24,240
about product and engineering as the

369
00:13:22,880 --> 00:13:28,000
business

370
00:13:24,240 --> 00:13:28,000
kind of equivalents of those things

371
00:13:28,480 --> 00:13:31,920
the thing that's often

372
00:13:30,079 --> 00:13:34,480
kind of missing when you do have two

373
00:13:31,920 --> 00:13:36,800
powerful factions vying for control

374
00:13:34,480 --> 00:13:38,720
is an impartial arbiter particularly

375
00:13:36,800 --> 00:13:40,240
when those factions have different

376
00:13:38,720 --> 00:13:41,519
objectives and they're being measured

377
00:13:40,240 --> 00:13:44,800
differently

378
00:13:41,519 --> 00:13:45,519
so the role of sre that's where we come

379
00:13:44,800 --> 00:13:47,279
in

380
00:13:45,519 --> 00:13:50,399
so sre

381
00:13:47,279 --> 00:13:52,880
owning the data across the business and

382
00:13:50,399 --> 00:13:54,399
being able to provide a common language

383
00:13:52,880 --> 00:13:56,560
and a common

384
00:13:54,399 --> 00:13:57,760
thread of discussion between those two

385
00:13:56,560 --> 00:14:00,240
factions

386
00:13:57,760 --> 00:14:02,560
with data to back it up becomes a new

387
00:14:00,240 --> 00:14:05,440
way forward

388
00:14:02,560 --> 00:14:07,519
the role of sre as defined by google

389
00:14:05,440 --> 00:14:09,519
is responsible for so many pieces of

390
00:14:07,519 --> 00:14:11,680
data so they're responsible for the

391
00:14:09,519 --> 00:14:13,680
availability the latency the performance

392
00:14:11,680 --> 00:14:16,000
the efficiency the change management

393
00:14:13,680 --> 00:14:17,120
monitoring emergency response capacity

394
00:14:16,000 --> 00:14:19,120
planning

395
00:14:17,120 --> 00:14:21,600
of all of their services

396
00:14:19,120 --> 00:14:23,680
and this places sre in this pivotal area

397
00:14:21,600 --> 00:14:26,720
of control as the provider and the

398
00:14:23,680 --> 00:14:26,720
purveyor of data

399
00:14:26,959 --> 00:14:32,720
if we rephrase this balance again but

400
00:14:30,000 --> 00:14:34,880
now in terms of sre as the middleman

401
00:14:32,720 --> 00:14:37,279
providing a common language that impacts

402
00:14:34,880 --> 00:14:39,440
both product and engineering we're now

403
00:14:37,279 --> 00:14:40,560
talking about speed and reliability

404
00:14:39,440 --> 00:14:41,519
again

405
00:14:40,560 --> 00:14:43,760
but

406
00:14:41,519 --> 00:14:45,680
where we have slos

407
00:14:43,760 --> 00:14:47,360
or any kind of metric to come in and

408
00:14:45,680 --> 00:14:49,920
bridge the gap and provide a

409
00:14:47,360 --> 00:14:52,320
quantifiable answer to what is important

410
00:14:49,920 --> 00:14:54,720
at any given time and why

411
00:14:52,320 --> 00:14:58,079
as well as that common language to how

412
00:14:54,720 --> 00:14:59,279
to productively discuss it in a way that

413
00:14:58,079 --> 00:15:02,160
we're bringing

414
00:14:59,279 --> 00:15:05,440
logic and data into what is often a very

415
00:15:02,160 --> 00:15:05,440
emotional exchange

416
00:15:07,040 --> 00:15:10,720
so

417
00:15:08,560 --> 00:15:13,600
expectations from users around features

418
00:15:10,720 --> 00:15:16,480
reliability availability security and

419
00:15:13,600 --> 00:15:17,760
quality are all increasing exponentially

420
00:15:16,480 --> 00:15:19,920
right

421
00:15:17,760 --> 00:15:21,760
but most organizations are making

422
00:15:19,920 --> 00:15:24,160
trade-offs and are not really set up to

423
00:15:21,760 --> 00:15:26,639
deliver on all of these vectors which is

424
00:15:24,160 --> 00:15:28,399
the speed and reliability compromise

425
00:15:26,639 --> 00:15:31,040
that we've been talking about because it

426
00:15:28,399 --> 00:15:32,560
really is fundamentally underlying the

427
00:15:31,040 --> 00:15:33,839
the conflict between engineering and

428
00:15:32,560 --> 00:15:36,000
product

429
00:15:33,839 --> 00:15:38,320
on one axis there's this desire and need

430
00:15:36,000 --> 00:15:40,959
to have rock solid stability

431
00:15:38,320 --> 00:15:43,360
um and reliability but the challenge is

432
00:15:40,959 --> 00:15:44,880
if that's what you're optimizing for

433
00:15:43,360 --> 00:15:46,160
you're not innovating to your full

434
00:15:44,880 --> 00:15:47,920
potential

435
00:15:46,160 --> 00:15:50,399
and you're going to have issues within

436
00:15:47,920 --> 00:15:52,000
the market if not now in the future

437
00:15:50,399 --> 00:15:53,920
and on the other hand you can't spend

438
00:15:52,000 --> 00:15:56,639
all of your time pushing for feature

439
00:15:53,920 --> 00:15:58,720
delivery without regard to stability or

440
00:15:56,639 --> 00:16:00,800
you'll rapidly accrue risk and technical

441
00:15:58,720 --> 00:16:03,440
debt and potentially churn your existing

442
00:16:00,800 --> 00:16:03,440
customers

443
00:16:04,800 --> 00:16:07,920
so

444
00:16:06,240 --> 00:16:09,920
what i haven't defined for you and i

445
00:16:07,920 --> 00:16:12,880
want to touch on really briefly is what

446
00:16:09,920 --> 00:16:14,560
are slas slos and slis

447
00:16:12,880 --> 00:16:17,600
this all comes straight from the google

448
00:16:14,560 --> 00:16:18,880
handbook sre handbook please i encourage

449
00:16:17,600 --> 00:16:21,759
you to read it

450
00:16:18,880 --> 00:16:24,720
it is much more thrilling than it sounds

451
00:16:21,759 --> 00:16:27,120
but slos slis and slas are exclusively

452
00:16:24,720 --> 00:16:29,600
used as metrics that capture parts of

453
00:16:27,120 --> 00:16:31,920
your user journey such as availability

454
00:16:29,600 --> 00:16:34,240
or request latency or throughput or

455
00:16:31,920 --> 00:16:36,399
error rate they give you a metric for

456
00:16:34,240 --> 00:16:38,639
both sides of that coin

457
00:16:36,399 --> 00:16:40,959
um as like the coin of product versus

458
00:16:38,639 --> 00:16:43,279
engineering if you assume that both

459
00:16:40,959 --> 00:16:44,240
engineering and product care about your

460
00:16:43,279 --> 00:16:47,199
user

461
00:16:44,240 --> 00:16:47,199
which i hope they do

462
00:16:47,440 --> 00:16:51,360
so your sla is your service level

463
00:16:49,279 --> 00:16:53,600
agreement it's your external metric to

464
00:16:51,360 --> 00:16:56,000
which your business has committed to

465
00:16:53,600 --> 00:16:57,839
legally and with generally a monetary

466
00:16:56,000 --> 00:17:00,079
obligation to meet

467
00:16:57,839 --> 00:17:02,720
this is often something

468
00:17:00,079 --> 00:17:04,640
reasonable but after which it's the kind

469
00:17:02,720 --> 00:17:05,760
of the baseline after which your clients

470
00:17:04,640 --> 00:17:08,480
are

471
00:17:05,760 --> 00:17:09,360
not happy with their service

472
00:17:08,480 --> 00:17:12,240
um

473
00:17:09,360 --> 00:17:14,959
your slo is your service level objective

474
00:17:12,240 --> 00:17:16,640
it is the internal target for the metric

475
00:17:14,959 --> 00:17:18,400
that you're measuring that should

476
00:17:16,640 --> 00:17:20,880
represent

477
00:17:18,400 --> 00:17:23,760
where your client starts to feel pain it

478
00:17:20,880 --> 00:17:26,480
it represents your optimal point where

479
00:17:23,760 --> 00:17:27,839
you want to hit from a reliability

480
00:17:26,480 --> 00:17:31,039
versus speed perspective or a

481
00:17:27,839 --> 00:17:32,720
reliability versus risk perspective

482
00:17:31,039 --> 00:17:35,120
and your sli is your service level

483
00:17:32,720 --> 00:17:36,559
indicator it's the it's the measure of

484
00:17:35,120 --> 00:17:37,600
service reliability it's what you're

485
00:17:36,559 --> 00:17:39,120
measuring

486
00:17:37,600 --> 00:17:40,960
um

487
00:17:39,120 --> 00:17:42,240
slis will tell you that something is

488
00:17:40,960 --> 00:17:46,919
wrong and you need to use all of your

489
00:17:42,240 --> 00:17:46,919
other metrics to figure out what that is

490
00:17:48,000 --> 00:17:51,039
so

491
00:17:49,360 --> 00:17:52,320
this is a good depiction of how this

492
00:17:51,039 --> 00:17:54,160
works right

493
00:17:52,320 --> 00:17:56,320
so

494
00:17:54,160 --> 00:17:58,320
your agreement is just enough to stop

495
00:17:56,320 --> 00:18:00,320
your customer being unhappy or leaving

496
00:17:58,320 --> 00:18:02,799
or churning that's what they've agreed

497
00:18:00,320 --> 00:18:05,039
to as an acceptable level

498
00:18:02,799 --> 00:18:07,039
your objective your slo has to be

499
00:18:05,039 --> 00:18:08,799
tighter than that agreement and it

500
00:18:07,039 --> 00:18:10,559
should represent your desired user

501
00:18:08,799 --> 00:18:12,480
experience

502
00:18:10,559 --> 00:18:14,480
breaching your objective has to have

503
00:18:12,480 --> 00:18:16,799
consequences as well you shouldn't just

504
00:18:14,480 --> 00:18:18,880
be only if we breach the sla is there a

505
00:18:16,799 --> 00:18:21,120
problem breaching your objective is

506
00:18:18,880 --> 00:18:22,240
where you need to be able to leverage

507
00:18:21,120 --> 00:18:24,240
that data

508
00:18:22,240 --> 00:18:25,919
within the business to change priorities

509
00:18:24,240 --> 00:18:27,120
of what's being worked on because your

510
00:18:25,919 --> 00:18:28,400
you're heading in the wrong

511
00:18:27,120 --> 00:18:30,160
direction

512
00:18:28,400 --> 00:18:33,200
and it allows you to be proactive before

513
00:18:30,160 --> 00:18:36,799
there is a monetary problem

514
00:18:33,200 --> 00:18:39,600
where you're impacting your sla

515
00:18:36,799 --> 00:18:40,960
anything above your slo means that

516
00:18:39,600 --> 00:18:43,039
you're spending

517
00:18:40,960 --> 00:18:45,039
too much time on reliability and you're

518
00:18:43,039 --> 00:18:46,880
wasting effort that could be used to

519
00:18:45,039 --> 00:18:50,080
deliver features

520
00:18:46,880 --> 00:18:52,799
so anything better than your slo is

521
00:18:50,080 --> 00:18:52,799
wasted effort
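
A minimal sketch of the zones just described, assuming all three values are availability percentages. The function name `classify_availability` and the 99.9 SLA figure in the usage line are hypothetical illustrations, not something stated in the talk.

```python
def classify_availability(sli: float, slo: float, sla: float) -> str:
    """Place a measured availability (the SLI) relative to the SLO and SLA.

    All values are percentages. The SLO is the tighter internal target,
    the SLA is the looser external commitment.
    """
    if slo <= sla:
        raise ValueError("the SLO should be tighter (higher) than the SLA")
    if sli < sla:
        # Below the external commitment: the contractual problem.
        return "sla breached: contractual and monetary consequences"
    if sli < slo:
        # Between SLA and SLO: the early warning to change priorities.
        return "slo breached: reprioritise toward reliability"
    # At or above the internal target: extra reliability work is optional.
    return "at or above slo: room to spend effort on features"


# Hypothetical numbers: measured 99.93, SLO 99.95, SLA 99.9.
print(classify_availability(99.93, 99.95, 99.9))
```

The point of the middle zone is that it triggers a conversation about priorities before the SLA, and the money attached to it, is at risk.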

522
00:18:54,320 --> 00:18:58,720
very quickly about error budgets error

523
00:18:56,160 --> 00:19:00,559
budgets are the next stage in this

524
00:18:58,720 --> 00:19:03,280
scenario where an error budget is

525
00:19:00,559 --> 00:19:04,799
monitoring your slo over time

526
00:19:03,280 --> 00:19:08,400
if your slo

527
00:19:04,799 --> 00:19:10,400
as in the previous slide is 99.95 percent

528
00:19:08,400 --> 00:19:12,160
availability then

529
00:19:10,400 --> 00:19:15,520
your error budget would be 1 minus your

530
00:19:12,160 --> 00:19:18,160
slo 0.05 percent

531
00:19:15,520 --> 00:19:19,120
this means that if you map that out over

532
00:19:18,160 --> 00:19:21,600
the month

533
00:19:19,120 --> 00:19:24,480
0.05 percent of the minutes in the month gives

534
00:19:21,600 --> 00:19:26,080
you about 22 minutes
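
The arithmetic behind that figure can be sketched as follows, assuming a 30-day month. The function name `error_budget_minutes` is a hypothetical illustration, not something from the talk.

```python
def error_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over `days` days."""
    total_minutes = days * 24 * 60               # 43,200 minutes in 30 days
    budget_fraction = (100 - slo_percent) / 100  # 99.95 percent leaves 0.05 percent
    return total_minutes * budget_fraction


print(round(error_budget_minutes(99.95), 1))  # 21.6, roughly the 22 minutes quoted
```

A 31-day month gives a slightly larger budget, so the exact number depends on the window you choose to measure over.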

535
00:19:24,480 --> 00:19:28,080
sre practices encourage you to

536
00:19:26,080 --> 00:19:30,960
strategically burn that budget to zero

537
00:19:28,080 --> 00:19:33,200
on purpose to do things like

538
00:19:30,960 --> 00:19:34,799
deliver new features run expected

539
00:19:33,200 --> 00:19:37,120
systems changes

540
00:19:34,799 --> 00:19:39,840
use planned downtime or just do a

541
00:19:37,120 --> 00:19:41,760
slightly risky experiment it means that

542
00:19:39,840 --> 00:19:43,840
if you're hitting or using your error

543
00:19:41,760 --> 00:19:46,000
budget you're running as fast as you

544
00:19:43,840 --> 00:19:47,760
possibly can without impacting your

545
00:19:46,000 --> 00:19:49,840
availability and without impacting your

546
00:19:47,760 --> 00:19:51,280
client
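
One way to picture burning the budget on purpose is to track what is left before taking a planned risk. This is a rough sketch under assumptions: a 30-day window, downtime tracked in minutes, and hypothetical function names `remaining_budget_minutes` and `can_afford`.

```python
def remaining_budget_minutes(slo_percent: float,
                             downtime_minutes: float,
                             days: int = 30) -> float:
    """Error budget left in the window after the downtime already incurred."""
    budget = days * 24 * 60 * (100 - slo_percent) / 100
    return budget - downtime_minutes


def can_afford(slo_percent: float,
               downtime_minutes: float,
               planned_minutes: float,
               days: int = 30) -> bool:
    """True if a planned change, experiment, or maintenance fits in the budget."""
    return remaining_budget_minutes(slo_percent, downtime_minutes, days) >= planned_minutes


# Hypothetical month: 10 minutes of downtime so far against a 99.95 percent SLO.
print(can_afford(99.95, downtime_minutes=10, planned_minutes=5))
```

When the remaining budget is close to zero, the same check becomes the signal to pause risky releases and work on stability instead.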

547
00:19:49,840 --> 00:19:53,360
error budgets are not something you need

548
00:19:51,280 --> 00:19:55,600
to do right away if you're investigating

549
00:19:53,360 --> 00:19:58,559
slos for your own business start at the

550
00:19:55,600 --> 00:20:00,080
start start with slos and slis and

551
00:19:58,559 --> 00:20:01,280
setting those metrics and starting to

552
00:20:00,080 --> 00:20:03,360
measure them and make sure they're the

553
00:20:01,280 --> 00:20:04,799
right thing error budgets are things for

554
00:20:03,360 --> 00:20:06,640
down the track they don't have to be

555
00:20:04,799 --> 00:20:10,159
something that you start with you don't

556
00:20:06,640 --> 00:20:10,159
have to go all out right away

557
00:20:10,960 --> 00:20:16,720
so martin fowler very famously said in 2018

558
00:20:14,320 --> 00:20:18,720
evidence refutes the bimodal i.t

559
00:20:16,720 --> 00:20:21,120
notion that you have to choose between

560
00:20:18,720 --> 00:20:23,520
speed and stability instead speed

561
00:20:21,120 --> 00:20:25,520
depends on stability so good i.t practice

562
00:20:23,520 --> 00:20:27,440
gives you both

563
00:20:25,520 --> 00:20:28,960
so

564
00:20:27,440 --> 00:20:31,919
what does good i.t practice actually

565
00:20:28,960 --> 00:20:33,200
look like and how does data and slos and

566
00:20:31,919 --> 00:20:35,200
all of the things that i've talked about

567
00:20:33,200 --> 00:20:38,240
implementation of sre within a business

568
00:20:35,200 --> 00:20:40,960
how does that get you there

569
00:20:38,240 --> 00:20:44,080
using your metrics for good starts with

570
00:20:40,960 --> 00:20:46,960
what are you trying to get out of them

571
00:20:44,080 --> 00:20:48,960
so common language slos provide you with

572
00:20:46,960 --> 00:20:50,880
that common language between product and

573
00:20:48,960 --> 00:20:53,440
engineering of how do you talk about

574
00:20:50,880 --> 00:20:56,960
what is important and what is important

575
00:20:53,440 --> 00:20:56,960
should almost always be your customer

576
00:20:58,080 --> 00:21:02,559
you also now have hard data to influence

577
00:21:00,400 --> 00:21:04,559
that as i said previously like quite

578
00:21:02,559 --> 00:21:05,840
emotional decision product and

579
00:21:04,559 --> 00:21:09,039
engineering are

580
00:21:05,840 --> 00:21:10,480
often rewarded and incentivized on the

581
00:21:09,039 --> 00:21:12,320
metrics that they as individual

582
00:21:10,480 --> 00:21:15,200
departments care about rather than that

583
00:21:12,320 --> 00:21:17,919
centralized core component of the client

584
00:21:15,200 --> 00:21:19,919
so having hard data that backs up that

585
00:21:17,919 --> 00:21:22,480
work needs to happen to stabilize the

586
00:21:19,919 --> 00:21:24,480
product is really important

587
00:21:22,480 --> 00:21:26,000
it also allows you to be proactive you

588
00:21:24,480 --> 00:21:27,760
know when things are trending in a bad

589
00:21:26,000 --> 00:21:30,640
direction where you're going to breach

590
00:21:27,760 --> 00:21:32,640
an slo or an sla you also know that you

591
00:21:30,640 --> 00:21:34,880
can be proactively releasing innovative

592
00:21:32,640 --> 00:21:37,520
pieces of work because you're consuming

593
00:21:34,880 --> 00:21:38,559
your error budget to do so

594
00:21:37,520 --> 00:21:40,320
and

595
00:21:38,559 --> 00:21:42,840
the justification piece is really about

596
00:21:40,320 --> 00:21:45,600
how you have that productive

597
00:21:42,840 --> 00:21:47,360
discussion so having slos in place for

598
00:21:45,600 --> 00:21:50,000
your production services allows you to

599
00:21:47,360 --> 00:21:51,600
remove all of that emotional ambiguity

600
00:21:50,000 --> 00:21:54,320
when it comes to figuring out the impact

601
00:21:51,600 --> 00:21:56,720
of an unplanned change or outage

602
00:21:54,320 --> 00:21:59,039
um businesses often refuse to invest in

603
00:21:56,720 --> 00:22:00,720
availability or reliability until the

604
00:21:59,039 --> 00:22:02,799
bottom line's impacted

605
00:22:00,720 --> 00:22:04,480
so it's really important to have that

606
00:22:02,799 --> 00:22:06,720
data to back up what you're trying to

607
00:22:04,480 --> 00:22:06,720
say

608
00:22:07,280 --> 00:22:11,280
this is where sre comes in as a whole

609
00:22:09,280 --> 00:22:13,200
and the role of sre may change going

610
00:22:11,280 --> 00:22:16,320
forward in the future at least

611
00:22:13,200 --> 00:22:17,200
from kind of what i understand behind it

612
00:22:16,320 --> 00:22:19,280
um

613
00:22:17,200 --> 00:22:21,600
initiatives like slos can be really

614
00:22:19,280 --> 00:22:24,080
difficult to get buy-in for and the

615
00:22:21,600 --> 00:22:26,000
owner to kick off really has to be the

616
00:22:24,080 --> 00:22:28,320
sre team

617
00:22:26,000 --> 00:22:30,960
[Music]

618
00:22:28,320 --> 00:22:32,799
but it can't be them in isolation you

619
00:22:30,960 --> 00:22:34,480
know you're talking about metrics of

620
00:22:32,799 --> 00:22:36,559
what impacts the clients and what the

621
00:22:34,480 --> 00:22:38,400
clients or the customers care about

622
00:22:36,559 --> 00:22:40,080
which means that you need to be involved

623
00:22:38,400 --> 00:22:42,159
with your product owners with your

624
00:22:40,080 --> 00:22:43,919
customers themselves or with your

625
00:22:42,159 --> 00:22:45,760
customer success teams as well as your

626
00:22:43,919 --> 00:22:47,440
engineers who have to do

627
00:22:45,760 --> 00:22:49,200
the work to make sure that we're hitting

628
00:22:47,440 --> 00:22:52,000
those slos because your slos shouldn't

629
00:22:49,200 --> 00:22:55,799
be aspirational they need to be

630
00:22:52,000 --> 00:22:55,799
something that you can achieve

631
00:22:56,400 --> 00:23:01,280
there are lots of selling points behind

632
00:22:58,880 --> 00:23:03,440
slos as a program

633
00:23:01,280 --> 00:23:05,039
but in general

634
00:23:03,440 --> 00:23:06,240
the ability to surface your technical

635
00:23:05,039 --> 00:23:09,280
debt

636
00:23:06,240 --> 00:23:11,760
related to reliability or lack thereof

637
00:23:09,280 --> 00:23:14,000
means that you know you can advocate for

638
00:23:11,760 --> 00:23:15,760
what you need in terms of allocation of

639
00:23:14,000 --> 00:23:17,360
engineering resources

640
00:23:15,760 --> 00:23:19,600
you're reducing your manual effort from

641
00:23:17,360 --> 00:23:21,200
an sre perspective of generating this

642
00:23:19,600 --> 00:23:22,240
data you know what you're going to talk

643
00:23:21,200 --> 00:23:24,080
about you know what you're going to

644
00:23:22,240 --> 00:23:26,159
review you know what the metrics are and

645
00:23:24,080 --> 00:23:28,400
what we care about

646
00:23:26,159 --> 00:23:31,200
and as i said you're also reducing your

647
00:23:28,400 --> 00:23:33,120
risk around monetary risk to the

648
00:23:31,200 --> 00:23:35,280
business so this is how you sell it into

649
00:23:33,120 --> 00:23:37,520
the business is reducing that

650
00:23:35,280 --> 00:23:39,280
monetary risk or risk of churn by

651
00:23:37,520 --> 00:23:42,320
improving your customer

652
00:23:39,280 --> 00:23:42,320
satisfaction rates

653
00:23:42,480 --> 00:23:46,559
um

654
00:23:44,720 --> 00:23:49,520
these are not for everyone i'm not

655
00:23:46,559 --> 00:23:52,159
trying to say that slos or slis or slas

656
00:23:49,520 --> 00:23:53,200
any of those things are for every

657
00:23:52,159 --> 00:23:55,120
business

658
00:23:53,200 --> 00:23:56,720
uh there is a level of maturity

659
00:23:55,120 --> 00:23:58,400
particularly in the businesses that have

660
00:23:56,720 --> 00:24:00,880
actually already implemented this to

661
00:23:58,400 --> 00:24:02,960
great success the googles the

662
00:24:00,880 --> 00:24:04,480
evernotes the twitters

663
00:24:02,960 --> 00:24:06,240
those guys are huge

664
00:24:04,480 --> 00:24:07,840
you don't have to start there you don't

665
00:24:06,240 --> 00:24:09,919
have to look at them and go this is this

666
00:24:07,840 --> 00:24:12,159
huge unachievable mountain

667
00:24:09,919 --> 00:24:14,480
um you know you can start with two to

668
00:24:12,159 --> 00:24:16,720
three slos and iterate and work your way

669
00:24:14,480 --> 00:24:18,400
up you can start to build out that team

670
00:24:16,720 --> 00:24:20,480
or you can build it internally as a

671
00:24:18,400 --> 00:24:22,320
proof of concept within your sre team

672
00:24:20,480 --> 00:24:24,000
and then start to use

673
00:24:22,320 --> 00:24:25,840
the quality and the relevance of the

674
00:24:24,000 --> 00:24:28,080
data that you're producing

675
00:24:25,840 --> 00:24:29,760
to get buy-in from other pieces of the

676
00:24:28,080 --> 00:24:33,360
business as well

677
00:24:29,760 --> 00:24:35,760
and drive that adoption

678
00:24:33,360 --> 00:24:37,279
failure is going to happen like this is

679
00:24:35,760 --> 00:24:39,600
the devops track you guys have been

680
00:24:37,279 --> 00:24:42,400
listening to failure and learnings and

681
00:24:39,600 --> 00:24:44,320
you know amazing triumphs through that

682
00:24:42,400 --> 00:24:46,240
all day

683
00:24:44,320 --> 00:24:47,919
as you implement these kinds of things

684
00:24:46,240 --> 00:24:49,679
failures are going to occur slos are

685
00:24:47,919 --> 00:24:52,480
going to be breached systems are made by

686
00:24:49,679 --> 00:24:54,320
humans and we've already discussed

687
00:24:52,480 --> 00:24:55,520
in quite a bit of detail how humans are

688
00:24:54,320 --> 00:24:57,760
imperfect

689
00:24:55,520 --> 00:24:59,919
so what's important is learning from

690
00:24:57,760 --> 00:25:01,840
these and continuing to iterate on your

691
00:24:59,919 --> 00:25:03,840
slos you should be going back and

692
00:25:01,840 --> 00:25:05,039
looking at them on a regular cadence to

693
00:25:03,840 --> 00:25:06,799
make sure that they're reflecting the

694
00:25:05,039 --> 00:25:09,679
things that you want or the things that

695
00:25:06,799 --> 00:25:11,919
your clients still want

696
00:25:09,679 --> 00:25:11,919
um

697
00:25:12,000 --> 00:25:16,400
and like you have to be collaborative in

698
00:25:14,240 --> 00:25:17,679
this iteration as well you have to

699
00:25:16,400 --> 00:25:19,200
incorporate those other parts of the

700
00:25:17,679 --> 00:25:21,520
business that have those touch points

701
00:25:19,200 --> 00:25:22,880
with the clients as well as the ones

702
00:25:21,520 --> 00:25:25,200
that pay the bills

703
00:25:22,880 --> 00:25:26,480
uh to make sure that you're delivering

704
00:25:25,200 --> 00:25:27,840
not just for yourself and your

705
00:25:26,480 --> 00:25:30,240
department but also for the whole

706
00:25:27,840 --> 00:25:30,240
business

707
00:25:32,159 --> 00:25:36,880
the other piece to all of this right is

708
00:25:35,360 --> 00:25:38,080
the blameless mentality and how

709
00:25:36,880 --> 00:25:39,600
important it is

710
00:25:38,080 --> 00:25:41,760
because you're not going to get it right

711
00:25:39,600 --> 00:25:43,039
straight away or even

712
00:25:41,760 --> 00:25:45,520
continuously

713
00:25:43,039 --> 00:25:47,679
what the clients want is going to change

714
00:25:45,520 --> 00:25:49,679
um but it's not about who tripped over

715
00:25:47,679 --> 00:25:51,520
the power cord but how do we stop people

716
00:25:49,679 --> 00:25:52,799
from tripping over the power cord next

717
00:25:51,520 --> 00:25:56,080
time

718
00:25:52,799 --> 00:25:58,400
slos should never ever ever be tied to

719
00:25:56,080 --> 00:26:01,279
individual performance metrics

720
00:25:58,400 --> 00:26:03,440
they need to be the goal around slos

721
00:26:01,279 --> 00:26:05,600
the goal should always be defining more

722
00:26:03,440 --> 00:26:07,520
slos to get greater visibility and

723
00:26:05,600 --> 00:26:10,000
understanding rather than blaming teams

724
00:26:07,520 --> 00:26:12,320
for not meeting slos your slos are meant

725
00:26:10,000 --> 00:26:14,240
to be a way to empower the discussion of

726
00:26:12,320 --> 00:26:17,760
what needs to happen next and what went

727
00:26:14,240 --> 00:26:17,760
wrong and how we fix it in the future

728
00:26:19,440 --> 00:26:23,200
i just want to circle quickly back to

729
00:26:21,520 --> 00:26:25,360
martin fowler's idea of good i.t

730
00:26:23,200 --> 00:26:27,279
practices and bridging the gap between

731
00:26:25,360 --> 00:26:30,080
speed and quality

732
00:26:27,279 --> 00:26:31,600
so record point my company is still

733
00:26:30,080 --> 00:26:35,760
in its infancy when it comes to

734
00:26:31,600 --> 00:26:38,000
implementing any of these slos slis slas

735
00:26:35,760 --> 00:26:39,679
and i came to talk to you today about

736
00:26:38,000 --> 00:26:41,200
the research that i'd done in the hopes

737
00:26:39,679 --> 00:26:43,760
that other people had seen these

738
00:26:41,200 --> 00:26:45,919
problems in their own organizations and

739
00:26:43,760 --> 00:26:48,799
found this something interesting

740
00:26:45,919 --> 00:26:52,960
um as you know a way to

741
00:26:48,799 --> 00:26:55,520
bridge that gap and go forward

742
00:26:52,960 --> 00:26:57,440
um slos are a great way to start

743
00:26:55,520 --> 00:26:58,960
leveraging your existing data and

744
00:26:57,440 --> 00:27:00,720
setting metrics that really mean

745
00:26:58,960 --> 00:27:03,120
something both to your customer and to

746
00:27:00,720 --> 00:27:04,840
your product and addressing a root cause

747
00:27:03,120 --> 00:27:06,880
rather than just

748
00:27:04,840 --> 00:27:09,760
symptoms um

749
00:27:06,880 --> 00:27:11,840
and the more research that i dive into

750
00:27:09,760 --> 00:27:13,919
of this side of sre

751
00:27:11,840 --> 00:27:16,559
and its potential role as the collector

752
00:27:13,919 --> 00:27:18,640
and the curator of data and acting as a

753
00:27:16,559 --> 00:27:21,360
mediator between the two factions within

754
00:27:18,640 --> 00:27:23,279
the business means that they get to be

755
00:27:21,360 --> 00:27:25,039
an impartial driver for the great

756
00:27:23,279 --> 00:27:27,600
customer experience

757
00:27:25,039 --> 00:27:29,600
and the more i believe that even if this

758
00:27:27,600 --> 00:27:31,520
isn't the final answer

759
00:27:29,600 --> 00:27:33,279
i think it's a really good step forward

760
00:27:31,520 --> 00:27:35,520
for technology in general and the way

761
00:27:33,279 --> 00:27:38,080
that our businesses need to continue to

762
00:27:35,520 --> 00:27:39,360
move forward

763
00:27:38,080 --> 00:27:41,919
i'm certainly not trying to say that

764
00:27:39,360 --> 00:27:43,760
slos will solve every problem or even that

765
00:27:41,919 --> 00:27:45,600
they're right for every business

766
00:27:43,760 --> 00:27:47,279
but i am trying to say that this

767
00:27:45,600 --> 00:27:49,279
framework is something that i see as a

768
00:27:47,279 --> 00:27:53,039
potential balm for the pain that i have

769
00:27:49,279 --> 00:27:57,440
seen across so many businesses globally

770
00:27:53,039 --> 00:27:57,440
um and every day in my own business

771
00:27:58,240 --> 00:28:02,720
it's highly likely that i'll be back

772
00:28:00,000 --> 00:28:04,799
here next year for the we tried this and

773
00:28:02,720 --> 00:28:06,720
this is what we learnt uh metrics for

774
00:28:04,799 --> 00:28:08,000
good not evil part two

775
00:28:06,720 --> 00:28:09,520
and we can definitely go over the

776
00:28:08,000 --> 00:28:11,039
blooper reel then

777
00:28:09,520 --> 00:28:12,399
um but i hope that this has been a bit

778
00:28:11,039 --> 00:28:14,559
of an inspiration to do your own

779
00:28:12,399 --> 00:28:16,480
research and explore if this might be a

780
00:28:14,559 --> 00:28:18,159
good solution to your problems

781
00:28:16,480 --> 00:28:20,640
thank you so much for your time i know

782
00:28:18,159 --> 00:28:21,679
this is not a normal talk even for this

783
00:28:20,640 --> 00:28:23,039
track

784
00:28:21,679 --> 00:28:25,120
but it's something that i really wanted

785
00:28:23,039 --> 00:28:27,120
to share and if you've got your own

786
00:28:25,120 --> 00:28:29,039
horror stories or success stories for

787
00:28:27,120 --> 00:28:30,880
implementing slos in your business i'd

788
00:28:29,039 --> 00:28:32,480
love to hear them please hit me up on

789
00:28:30,880 --> 00:28:33,679
linkedin or twitter

790
00:28:32,480 --> 00:28:35,200
and uh

791
00:28:33,679 --> 00:28:37,200
if you've got nothing else out of this

792
00:28:35,200 --> 00:28:41,360
talk please remember that whatever else

793
00:28:37,200 --> 00:28:41,360
you do don't put bounties on snake heads

794
00:28:42,559 --> 00:28:47,039
thank you dawn that has that has been a

795
00:28:44,320 --> 00:28:48,159
tremendous talk thank you um

796
00:28:47,039 --> 00:28:50,640
it's

797
00:28:48,159 --> 00:28:52,320
yes sorry molly what am i saying dawn i

798
00:28:50,640 --> 00:28:54,000
have my next talk already queued up in

799
00:28:52,320 --> 00:28:55,600
my head excellent more failure for the

800
00:28:54,000 --> 00:28:58,960
for the devops track

801
00:28:55,600 --> 00:29:00,559
um my apologies molly uh metric is close

802
00:28:58,960 --> 00:29:02,559
to my heart though i loved everything

803
00:29:00,559 --> 00:29:04,159
that you were talking about there um

804
00:29:02,559 --> 00:29:05,840
people in the chat also really really

805
00:29:04,159 --> 00:29:08,559
got a lot out of that we have a couple

806
00:29:05,840 --> 00:29:10,480
of questions already queued up um

807
00:29:08,559 --> 00:29:12,080
so if you could jump into the chat and

808
00:29:10,480 --> 00:29:13,840
uh and talk to people about those

809
00:29:12,080 --> 00:29:16,080
questions either just in the chat for

810
00:29:13,840 --> 00:29:18,080
the the room itself or into the hallway

811
00:29:16,080 --> 00:29:20,320
track that would be superb um people

812
00:29:18,080 --> 00:29:21,919
would love to talk to you some more

813
00:29:20,320 --> 00:29:23,600
all right no worries i'll jump in now

814
00:29:21,919 --> 00:29:25,840
thank you great

815
00:29:23,600 --> 00:29:28,000
and we have another short break now um

816
00:29:25,840 --> 00:29:30,399
we'll be back in 15 minutes for our

817
00:29:28,000 --> 00:29:32,559
final talk of the day um which is by

818
00:29:30,399 --> 00:29:34,240
dawn this time getting people's names

819
00:29:32,559 --> 00:29:37,200
right is important uh which is about

820
00:29:34,240 --> 00:29:40,000
accessibility uh accessibility overlays

821
00:29:37,200 --> 00:29:41,840
a cautionary tale hmm i wonder what the

822
00:29:40,000 --> 00:29:45,039
cautionary tale will be

823
00:29:41,840 --> 00:29:46,880
and uh yes so grab a quick drink ask

824
00:29:45,039 --> 00:29:49,039
some questions about molly's talk there

825
00:29:46,880 --> 00:29:53,159
about metrics and we will see you back

826
00:29:49,039 --> 00:29:53,159
here in about 15 minutes

827
00:29:58,559 --> 00:30:00,640
you