home.social

#learningfromincidents — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #learningfromincidents, aggregated by home.social.

  1. Well this thing seems to be getting popular. Intro post(s) time! Professionally, I'm a staff reliability engineer (#SRE) at PagerDuty, where my interests lie in #LearningFromIncidents, #HumanFactors and safety. Personally... (1/2 - cat pictures to follow)

  2. Completed an 8 page incident report at work today without using or suggesting the phrase "root cause" once.
    #ResilienceEngineering #LearningFromIncidents

  3. New #paper from Hutchinson, Dekker, Rae: How audits fail according to incident investigations: a counterfactual logic analysis

    aiche.onlinelibrary.wiley.com/

    They use the counterfactual reasoning from incident investigations criticizing audits to extract what the investigators think auditing should do, and to figure out how audits fall short of expectations.

    Notes at ferd.ca/notes/paper-how-audits & cohost.org/mononcqc/post/44356

    #LearningFromIncidents

  4. This week's #paper is Sidney Dekker's "The psychology of incident investigations"

    (citeseerx.ist.psu.edu/document)

    Which covers 4 motives to incident investigations: epistemological, preventative, moral, and existential (what happened, how to prevent it, which boundaries were transgressed, what's the meaning of the suffering?) and how they all fit together (or don't).

    Notes at ferd.ca/notes/paper-the-psycho & cohost.org/mononcqc/post/25513

    #LearningFromIncidents

  5. Cool #paper for this week, Ben Lupton and Richard Warren's "Managing Without Blame? Insights from the Philosophy of Blame" at link.springer.com/article/10.1

    They look at no-blame approaches, then contrast them with at least 4 broad philosophical conceptualizations of blame, and then try to suggest a better alternative to blamelessness, which builds upon more careful blame within communities of practice.

    Notes at ferd.ca/notes/paper-managing-w & cohost.org/mononcqc/post/22567

    #LearningFromIncidents

  6. Dug out my older notes on Gary Klein's Anticipatory Thinking #paper: researchgate.net/publication/2

    The paper looks at what is described as "gambling with your attention" with multiple variants: pattern matching, trajectory tracking, and convergence. It then covers problems and blockers to these functioning well, with suggested work-arounds for individuals and organizations.

    Notes at ferd.ca/notes/paper-anticipato & cohost.org/mononcqc/post/21866

    #LearningFromIncidents

  7. Found this in an old technology and society book, in a footnote by Madeleine Akrich:

    #lfi #LearningFromIncidents

  8. A discussion in the #LearningFromIncidents slack had me quickly pull up my notes from Unruly Bodies of Code in Time by Marisa Leavitt Cohn:

    jstor.org/stable/j.ctv1xcxr3n.

    The chapter covers sample stories from the ethnographic work, done by embedding in the software development teams at the JPL labs (NASA) responsible for the Cassini mission. She reviews what maintainability means to them.

    Notes at ferd.ca/notes/paper-unruly-bod & cohost.org/mononcqc/post/18407

  9. Huh, #LearningFromIncidents folks just shared a link to metrist.io/blog/the-data-behin

    I, for one, believe each of our pageable alerts should also page all of our customers so they have the freshest information available at all times.

    What do you mean that's a terrible idea? It's the most automated of all solutions! Could it be that time-to-customer-notification isn't that useful of a signal?

  10. Today's #paper was Accident Report Interpretation by Derek Heraghty: mdpi.com/2313-576X/4/4/46/htm

    He takes a linear fact-centric accident report from a construction site, uses its investigation data to write two other reports, one based on a systems analysis, and one that publishes the stories told by workers.

    He then compares the resulting suggested fixes by various test groups, to show the impact of framing.

    Notes at cohost.org/mononcqc/post/15913

    #LearningFromIncidents

  11. Today's #paper was long overdue: Lisanne Bainbridge's Ironies of Automation (ckrybus.com/static/papers/Bain)

    The core thesis is that automated systems always end up being human-machine systems, and even as you automate more and more, human factors keep being of critical importance.

    Two requirements clash at a fundamental level with automation: the need for someone to monitor if it behaves correctly, and to take over when it does not.

    Notes at cohost.org/mononcqc/post/13764

    #LearningFromIncidents

  12. Fetched and transferred my old notes on Richard Cook & David Woods' "Distancing Through Differencing" #paper: researchgate.net/publication/2

    In this one, they point out that very local incident investigation reports, and audiences who over-emphasize the differences between worksites, can end up ignoring useful potential learnings that could apply to them, even in organizations with strong safety cultures.

    Notes at: cohost.org/mononcqc/post/13213

    #LearningFromIncidents #ResilienceEngineering

  13. This week's #paper is "Nine Steps to Move Forward from Error" by Woods and Cook. It states 9 steps and 8 maxims (with 8 corollaries) to provide ways in which organizations and systems can constructively respond to failure, rather than getting stuck around concepts such as "human error."

    researchgate.net/publication/2

    It's a sort of quick overview of a lot of the content from both authors.

    Notes at: cohost.org/mononcqc/post/12352

    #ResilienceEngineering #LearningFromIncidents

  14. A work discussion had me dig up my notes on one of my favorite texts, "On People and Computers in JCSs at Work", Chapter 11 of the book Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David Woods.

    researchgate.net/publication/2

    It explains the concept of the "context gap" from #cybernetics and why humans and computers do balancing work in a joint alliance, rather than a strict separation of concerns.

    Notes at cohost.org/mononcqc/post/11577

    #LearningFromIncidents #ResilienceEngineering

  15. I decided to revisit Richard Cook's paper titled "Those found responsible have been sacked: Some observations on the usefulness of error".

    researchgate.net/publication/2

    The paper classifies human error as not useful in investigations, but instead as useful for organizations as a whole to limit liability, provide an illusion of control, distance yourself from incidents, and as a sign for observers of failed investigations.

    Notes at cohost.org/mononcqc/post/11278

    #LearningFromIncidents

  16. Digging up some older notes for a #paper this week:

    When mental models go wrong. Co-occurrences in dynamic, critical systems by Denis Besnard: hal.archives-ouvertes.fr/docs/

    This paper looks at a pattern seen in many incidents, where someone's mental model and understanding of a situation are wrong and they end up repeatedly ignoring cues and events that contradict it, and at what causes this even when they are trying to do a good job.

    Notes at cohost.org/mononcqc/post/10972

    #LearningFromIncidents

  17. Last week I was able to participate in the first ever Learning-From-Incident conference in Denver. It was amazing.
    I also ticked off a big goal: Giving a public conference talk. youtu.be/LrK_1ePmz54 Nerve-wracking, but I'm glad I could share what we're doing.
    #lficonf23 #LearningFromIncidents #CommunityOfPractice

  18. Well, the journey begins. 48 hours earlier than initially planned (in anticipation of Cyclone Gabrielle travel disruption) but I’m on my way to Denver for #LFIcon23 #LearningFromIncidents

  19. Re-posting some old notes I had on a #paper by Sidney Dekker: Failure to adapt or adaptations that fail: contrasting models on procedure and safety

    lean-construction-gcs.storage.

    The paper mentions that deviating from procedures can be a source of both errors and success; preventing all deviance can be as risky as tolerating all of it. Adapting procedures is a skill worth training in people, and the gap between procedure and practice is worth monitoring.

    Notes at cohost.org/mononcqc/post/10025

    #LearningFromIncidents #ResilienceEngineering

  20. This week I decided to revisit Sidney Dekker's #paper titled "MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination", which discusses something called "the substitution myth", a misguided attempt at replacing human weaknesses with automation.

    Instead, the suggestion is to focus on cooperation and team work, rather than substitution:

    researchgate.net/publication/2

    My notes are at: cohost.org/mononcqc/post/96035

    #LearningFromIncidents #HumanFactors

  21. Ended up writing about how we (@honeycombio) run incident response: dealing with the unknown, limited cognitive bandwidth, coordination patterns, psychological safety, and feeding information back into the organization.

    thenewstack.io/how-we-manage-i

    #SRE #ResilienceEngineering #LearningFromIncidents

  22. This week's #paper: Richard Cook and Jens Rasmussen's "Going Solid": qualitysafety.bmj.com/content/

    The paper highlights how loosely-coupled systems can saturate and then become tightly-coupled, and situates this within Rasmussen's Drift Model for accidents to frame the risks of hitting these points. It also suggests that a better understanding of what your operating point is can help improve safety.

    Notes at cohost.org/mononcqc/post/88895

    #ResilienceEngineering #LearningFromIncidents

  23. Post I wrote on @honeycombio, on why counting incidents is not a useful target (though a possibly useful signal).

    Your objectives should be things you can do, not events you wish would not happen.

    You hope that forest fires don’t happen, but there’s only so much that prevention can do. Likewise with incidents. You want to know that your response is adequate. And you want to have a systemic perspective that's actually useful in guiding work.

    #SRE #LearningFromIncidents

    honeycomb.io/blog/counting-for

  24. Good #paper: Crista Vesel's Agentive Language in Accident Investigation: Why Language Matters in Learning from Events: web.archive.org/web/2020031014

    The paper states that inadvertent ways to structure your sentences in a text or a report may carry implications of blame and convey more deliberate actions from participants than they actually intended, and harm your ability to learn from events.

    My notes at: cohost.org/mononcqc/post/84339

    #LearningFromIncidents

  25. pleased with this slide of mine from our monthly major incident meta-review, encouraging us towards #LearningFromIncidents and away from focusing on incident statistics

    the first half says: "The insights generated from reviewing incidents are primarily qualitative, because incidents are emergent behavior"

    the second half says "There is no relationship between the impact of an incident and the quality of insights generated through the review process"

    #Postmortems #SRE #IncidentResponse