#learningfromincidents — Public Fediverse posts on home.social

Rich Lafferty @[email protected] · 2024-10-19 · 16:12 UTC

Well this thing seems to be getting popular. Intro post(s) time! Professionally, I'm a staff reliability engineer (#SRE) at PagerDuty, where my interests lie in #LearningFromIncidents, #HumanFactors and safety. Personally... (1/2 - cat pictures to follow)

#learningfromincidents #humanfactors

Phil @[email protected] · 2024-04-16 · 07:24 UTC

Completed an 8 page incident report at work today without using or suggesting the phrase "root cause" once.
#ResilienceEngineering #LearningFromIncidents

#resilienceengineering #learningfromincidents

Fred Hebert @[email protected] · 2024-02-10 · 18:21 UTC

New #paper from Hutchinson, Dekker, Rae: How audits fail according to incident investigations: a counterfactual logic analysis

https://aiche.onlinelibrary.wiley.com/doi/10.1002/prs.12579

They take the counterfactual reasoning that incident investigations use when criticizing audits, use it to extract what the investigators think auditing should do, and figure out how audits fall short of expectations.

Notes at https://ferd.ca/notes/paper-how-audits-fail-according-to-incident-investigations.html & https://cohost.org/mononcqc/post/4435633-paper-how-audits-fa

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-08-20 · 01:16 UTC

This week's #paper is Sidney Dekker's "The psychology of incident investigations"

(https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=651a8dd8660178b9e987c2a32fb8b888f8323fa4)

Which covers 4 motives to incident investigations: epistemological, preventative, moral, and existential (what happened, how to prevent it, which boundaries were transgressed, what's the meaning of the suffering?) and how they all fit together (or don't).

Notes at https://ferd.ca/notes/paper-the-psychology-of-incident-investigations.html & https://cohost.org/mononcqc/post/2551362-paper-the-psycholog

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-07-29 · 00:49 UTC

Cool #paper for this week, Ben Lupton and Richard Warren's "Managing Without Blame? Insights from the Philosophy of Blame" at https://link.springer.com/article/10.1007/s10551-016-3276-6

They look at no-blame approaches, then contrast them with at least 4 broad philosophical conceptualizations of blame, and then try to suggest a better alternative to blamelessness, which builds upon more careful blame within communities of practice.

Notes at https://ferd.ca/notes/paper-managing-without-blame.html & https://cohost.org/mononcqc/post/2256727-paper-managing-with

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-07-25 · 22:59 UTC

Dug out my older notes on Gary Klein's Anticipatory Thinking #paper — https://www.researchgate.net/publication/228953044_Anticipatory_Thinking

The paper looks at what is described as "gambling with your attention" with multiple variants: pattern matching, trajectory tracking, and convergence. It then covers problems and blockers to these functioning well, with suggested work-arounds for individuals and organizations.

Notes at https://ferd.ca/notes/paper-anticipatory-thinking.html & https://cohost.org/mononcqc/post/2186632-paper-anticipatory

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-07-23 · 03:08 UTC

@deliverator I’ve expanded on that latter quote in a blog post for the #LearningFromIncidents community at https://www.learningfromincidents.io/posts/carrots-sticks-and-making-whings-worse — you may find that one interesting

#learningfromincidents

juno suárez @[email protected] · 2023-07-21 · 04:59 UTC

Found this in an old technology and society book, in a footnote by Madeleine Akrich:

#lfi #LearningFromIncidents

#lfi #learningfromincidents

Fred Hebert @[email protected] · 2023-07-03 · 13:27 UTC

A discussion in the #LearningFromIncidents slack had me quickly pull up my notes from Unruly Bodies of Code in Time by Marisa Leavitt Cohn:

https://www.jstor.org/stable/j.ctv1xcxr3n.14#metadata_info_tab_contents

The chapter covers sample stories from the ethnographic work, done by embedding in the software development teams at the JPL labs (NASA) responsible for the Cassini mission. She reviews what maintainability means to them.

Notes at https://ferd.ca/notes/paper-unruly-bodies-of-code-in-time.html & https://cohost.org/mononcqc/post/1840750-paper-unruly-bodies

#learningfromincidents

Fred Hebert @[email protected] · 2023-06-19 · 15:33 UTC

Huh, #LearningFromIncidents folks just shared a link to https://metrist.io/blog/the-data-behind-delayed-status-page-updates/

I, for one, believe each of our pageable alerts should also page all of our customers so they have the freshest information available at all times.

What do you mean that's a terrible idea? It's the most automated of all solutions! Could it be that time-to-customer-notification isn't that useful of a signal?

#learningfromincidents

Fred Hebert @[email protected] · 2023-05-31 · 17:03 UTC

Today's #paper was Accident Report Interpretation by Derek Heraghty: https://www.mdpi.com/2313-576X/4/4/46/htm

He takes a linear, fact-centric accident report from a construction site and uses its investigation data to write two other reports: one based on a systems analysis, and one that publishes the stories told by workers.

He then compares the resulting suggested fixes by various test groups, to show the impact of framing.

Notes at https://cohost.org/mononcqc/post/1591376-paper-accident-repo

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-04-22 · 16:56 UTC

Today's #paper was long overdue: Lisanne Bainbridge's Ironies of Automation (https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf)

The core thesis is that automated systems always end up being human-machine systems, and even as you automate more and more, human factors keep being of critical importance.

Two requirements clash at a fundamental level with automation: the need for someone to monitor if it behaves correctly, and to take over when it does not.

Notes at https://cohost.org/mononcqc/post/1376487-paper-ironies-of-au

#LearningFromIncidents

#paper #learningfromincidents

Fred Hebert @[email protected] · 2023-04-12 · 00:51 UTC

Fetched and transferred my old notes on Richard Cook & David Woods' "Distancing Through Differencing" #paper https://www.researchgate.net/publication/292504703_Distancing_Through_Differencing_An_Obstacle_to_Organizational_Learning_Following_Accidents

In this one, they point out that very local incident investigation reports, and audiences who over-emphasize the differences between worksites, can lead people to ignore useful potential learnings that could apply to them, even in organizations with strong safety cultures.

Notes at: https://cohost.org/mononcqc/post/1321331-paper-distancing-th

#LearningFromIncidents #ResilienceEngineering

#paper #learningfromincidents #resilienceengineering

Fred Hebert @[email protected] · 2023-03-26 · 14:46 UTC

This week's #paper is "Nine Steps to Move Forward from Error" by Woods and Cook. It states 9 steps and 8 maxims (with 8 corollaries) to provide ways in which organizations and systems can constructively respond to failure, rather than getting stuck around concepts such as "human error."

https://www.researchgate.net/publication/226450254_Nine_Steps_to_Move_Forward_from_Error

It's a sort of quick overview of a lot of the content from both authors.

Notes at: https://cohost.org/mononcqc/post/1235221-paper-nine-steps-to

#ResilienceEngineering #LearningFromIncidents

#paper #resilienceengineering #learningfromincidents

Fred Hebert @[email protected] · 2023-03-10 · 15:24 UTC

A work discussion had me dig up my notes on one of my favorite texts: "On People and Computers in JCSs at Work", Chapter 11 of the book Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David Woods.

https://www.researchgate.net/publication/284173496_Chapter_11_On_People_and_Computers_in_JCSs_at_Work

It explains the concept of the "context gap" from #cybernetics and why humans and computers do balancing work in a joint alliance, rather than a strict separation of concerns.

Notes at https://cohost.org/mononcqc/post/1157774-paper-on-people-and

#LearningFromIncidents #ResilienceEngineering

#cybernetics #learningfromincidents #resilienceengineering

Fred Hebert @[email protected] · 2023-03-04 · 21:50 UTC

I decided to revisit Richard Cook's paper titled "Those found responsible have been sacked: Some observations on the usefulness of error".

https://www.researchgate.net/publication/220579378_Those_found_responsible_have_been_sacked_Some_observations_on_the_usefulness_of_error

The paper classifies human error as not useful in investigations, but instead as useful for organizations as a whole to limit liability, provide an illusion of control, distance yourself from incidents, and as a sign for observers of failed investigations.

Notes at https://cohost.org/mononcqc/post/1127828-paper-those-found-r

#LearningFromIncidents

#learningfromincidents

Fred Hebert @[email protected] · 2023-02-26 · 23:51 UTC

Digging up some older notes for a #paper this week:

When mental models go wrong. Co-occurrences in dynamic, critical systems by Denis Besnard: https://hal.archives-ouvertes.fr/docs/00/69/18/13/PDF/Besnard-Greathead-Baxter-2004--Mental-models-go-wrong.pdf

This paper looks at a pattern seen in many incidents, where someone's mental model and understanding of a situation is wrong and they end up repeatedly ignoring cues and events that contradict it, and at what causes this when they are actually trying to do a good job.

Notes at https://cohost.org/mononcqc/post/1097297-paper-when-mental-m

#LearningFromIncidents

#paper #learningfromincidents

Pirmin @[email protected] · 2023-02-24 · 20:05 UTC

Last week I was able to participate in the first ever Learning-From-Incident conference in Denver. It was amazing.
I also ticked off a big goal: Giving a public conference talk. https://youtu.be/LrK_1ePmz54 Nerve-wracking, but I'm glad I could share what we're doing.
#lficonf23 #LearningFromIncidents #CommunityOfPractice

#lficonf23 #learningfromincidents #communityofpractice

Tim Nicholas @[email protected] · 2023-02-11 · 01:41 UTC

Well, the journey begins. 48 hours earlier than initially planned (in anticipation of Cyclone Gabrielle travel disruption) but I’m on my way to Denver for #LFIcon23 #LearningFromIncidents

#lficon23 #learningfromincidents

Fred Hebert @[email protected] · 2023-02-10 · 21:36 UTC

Re-posting some old notes I had on a #paper by Sidney Dekker: Failure to adapt or adaptations that fail: contrasting models on procedure and safety

https://lean-construction-gcs.storage.googleapis.com/wp-content/uploads/2022/11/18015945/D_DekkeronProcedures.pdf

The paper mentions that deviating from procedures can be a source of errors, but also of success; preventing all deviations can be as risky as tolerating them all. It's a skill worth training in people, and a procedural gap to monitor.

Notes at https://cohost.org/mononcqc/post/1002599-paper-failure-to-ad

#LearningFromIncidents #ResilienceEngineering

#paper #learningfromincidents #resilienceengineering

Fred Hebert @[email protected] · 2023-02-04 · 03:29 UTC

This week I decided to revisit Sidney Dekker's #paper titled "MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination", which discusses something called "the substitution myth", a misguided attempt at replacing human weaknesses with automation.

Instead, the suggestion is to focus on cooperation and team work, rather than substitution:

https://www.researchgate.net/publication/226605532_MABA-MABA_or_abracadabra_Progress_on_human-automation_co-ordination

My notes are at: https://cohost.org/mononcqc/post/960352-paper-maba-maba-or

#LearningFromIncidents #HumanFactors

#paper #learningfromincidents #humanfactors

Fred Hebert @[email protected] · 2023-01-31 · 17:45 UTC

Ended up writing about how we (@honeycombio) run incident response: dealing with the unknown, limited cognitive bandwidth, coordination patterns, psychological safety, and feeding information back into the organization.

https://thenewstack.io/how-we-manage-incident-response-at-honeycomb/

#SRE #ResilienceEngineering #LearningFromIncidents

#sre #resilienceengineering #learningfromincidents

Fred Hebert @[email protected] · 2023-01-22 · 16:02 UTC

This week's #paper: Richard Cook and Jens Rasmussen's "Going Solid": https://qualitysafety.bmj.com/content/qhc/14/2/130.full.pdf

The paper highlights properties of loosely-coupled systems saturating, then going tightly-coupled, and situates this within Rasmussen's Drift Model for accidents to frame the risks of hitting these points. It also suggests that better understanding of what your operating point is can help improve safety.

Notes at https://cohost.org/mononcqc/post/888958-paper-going-solid

#ResilienceEngineering #LearningFromIncidents

#paper #resilienceengineering #learningfromincidents

Fred Hebert @[email protected] · 2023-01-18 · 18:54 UTC

Post I wrote on @honeycombio, on why counting incidents is not a useful target (though a possibly useful signal).

Your objectives should be things you can do, not events you wish would not happen.

You hope that forest fires don’t happen, but there’s only so much that prevention can do. Likewise with incidents. You want to know that your response is adequate. And you want to have a systemic perspective that's actually useful in guiding work.

#SRE #LearningFromIncidents

https://www.honeycomb.io/blog/counting-forest-fires

#sre #learningfromincidents

Fred Hebert @[email protected] · 2023-01-14 · 15:26 UTC

Good #paper: Crista Vesel's Agentive Language in Accident Investigation: Why Language Matters in Learning from Events: https://web.archive.org/web/20200310145622/https://pubs.acs.org/doi/pdf/10.1021/acs.chas.0c00002

The paper states that inadvertent ways to structure your sentences in a text or a report may carry implications of blame and convey more deliberate actions from participants than they actually intended, and harm your ability to learn from events.

My notes at: https://cohost.org/mononcqc/post/843393-paper-agentive-lang

#LearningFromIncidents

#paper #learningfromincidents

Rich Lafferty 🐀 @[email protected] · 2022-12-14 · 16:29 UTC

pleased with this slide of mine from our monthly major incident meta-review, encouraging us towards #LearningFromIncidents and away from focusing on incident statistics

the first half says: "The insights generated from reviewing incidents are primarily qualitative, because incidents are emergent behavior"

the second half says "There is no relationship between the impact of an incident and the quality of insights generated through the review process"

#Postmortems #SRE #IncidentResponse

#learningfromincidents #postmortems #sre #incidentresponse