home.social

#assessment-integrity — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #assessment-integrity, aggregated by home.social.

  1. We need structural changes to assessment rather than discursive changes

    This is the slightly overstated thesis of this paper. It rests on what I think is a genuinely useful distinction between discursive and structural changes to assessment:

    Modifications that rely solely on the communication of instructions, rules, or guidelines to students, such that their success depends entirely on student awareness, understanding, and voluntary compliance with these communications. These changes leave the underlying structure and mechanics of the assessment task unchanged, focusing instead on specifying how students should approach or complete the task.

    (1091)

    Modifications that directly alter the nature, format, or mechanics of how a task must be completed, such that the success of these changes is not reliant on the student’s understanding, interpretation, or compliance with instructions. Instead, these changes reshape the underlying framework of the task, constraining or opening the student’s approach in ways that are built into the assessment itself.

    (1092-1093)

    The traffic light systems, the four- or five-point AI Assessment Scale (AIAS) and declarations all constitute discursive approaches, in that what they fundamentally change is how we communicate about assessment. The authors identify three problems with these approaches:

    • They assume student understanding, when applying abstract categories to real-world practice will always be ambiguous, particularly when those categories are formulated at the level of abstraction necessary for a large multidisciplinary university.
    • They assume students will voluntarily comply, in spite of significant incentives not to and the aforementioned ambiguity about what constitutes compliance.
    • They assume compliance can be meaningfully assessed, when there is no real mechanism for doing so.

    In contrast, structural changes actually modify the assessment “by creating conditions where inappropriate AI use becomes difficult or impossible” (1093). These changes vary, but effective ones involve a move from product to process, as well as designing interconnections between assessments such that “the validity of assessment comes not from any single component but from the coherent demonstration of learning across multiple appropriately designed touch points” (1095).

    The obvious problem, which I’m abundantly familiar with as someone who ran a large PGT programme, is that processual assessment is extremely hard to scale. In large cohorts you have to resort to digital platforms to do it, which undermines precisely the assessment security that processual assessment is supposed to provide. In a perfect environment this is clearly the way to go: a processual assessment strategy with a healthy dose of authentic tasks and well-designed group work would go some way to solving the problems we are now encountering. But I remain unconvinced you can do this reliably in any environment other than, say, the Oxbridge system. Class sizes have to be small, and the teacher/student ratio has to be healthy, with stable relationships between teachers and students. Otherwise it breaks down.

    I say the thesis is overstated because it’s not clear to me that discursive changes are necessarily toothless. Firstly, if we assume that the majority of students start from the position of wanting to learn and wanting to follow the rules (two different things), then clarifying expectations is inherently valuable. It gives students guidance on how to make sure they are learning and how to make sure they are not engaging in malpractice. The fact that the sector has been crap at doing this doesn’t license the paper’s weird dismissiveness towards clarifying expectations. Secondly, once we have clarified those expectations, it becomes possible to have malpractice processes which are more targeted and fine-grained. This doesn’t solve the problem, but it seems to me inherently better than not having the discursive shift in the first place.

    I think their assumption is that a structural shift in assessment has to happen anyway, so why not start now? As they put it on page 1096:

    The time invested in developing and implementing these discursive approaches is time that could otherwise be used to consider structural changes that will actually work to ensure assessment validity as well as the veracity and reputation of our degrees. When assessment validity hinges on student compliance with unenforceable rules rather than on inherent assessment design, we build educational systems on foundations of sand. Long term solutions require fundamentally rethinking how assessments are structured rather than how they are explained.

    I’m somewhat sympathetic to this view, but I also think it’s such a long-term process, in such a resource-constrained environment, that we do seriously risk a complete collapse of trust in credentials before then. So how do we undertake discursive approaches (adapting to AI, in my terms) while still working towards structural changes (integrating AI, in my terms)? How do we stop the former crowding out the space for the latter? The way they describe the two-lane approach opens up a framework for thinking institutionally about how that might be possible. From page 1095:

    One immediate benefit of adopting this structural perspective is that it provides a clearer lens for evaluating emerging institutional frameworks, such as the University of Sydney’s ‘two-lane approach’ (Liu and Bridgeman 2023). This framework distinguishes between ‘secure’ (Lane 1) assessments which are conducted in-person with controlled conditions, and ‘Open’ (Lane 2) assessments where AI use is uncontrolled (Tertiary Education Quality and Standards Agency 2024, p. 51). The structural/discursive distinction we propose offers a potentially useful lens for understanding and extending the efficacy of such approaches. While Lane 1 assessments incorporate structural elements by creating environments where inappropriate AI use is physically restricted, the effectiveness of Lane 2 assessments depends on how they are designed structurally, as simply designating an assessment as ‘Open’ without reconsidering its structural mechanics perpetuates the enforcement illusion we have identified. The most effective implementations of dual-track approaches such as these will therefore be those that recognise the need for structural reconsideration of assessment design in both lanes, albeit in different ways.

    #AI #assessment #assessmentIntegrity #higherEducation #malpractice
  2. How are students using Generative AI in UK universities?

    Honestly, I’m not sure how worried we should be about these findings from HEPI (n=1,041), given that the sector seems to have got past its initial inclination to try to prohibit AI use. If we’re in a situation where only 12% of students are not using LLMs in their assessment, then what matters is steering use towards epistemic agency* and away from LLMs supporting a turbo-charged transactional engagement with knowledge.

    It’s interesting to contrast these findings with Anthropic’s study of how university students use Claude, which classifies their interactions in terms of Bloom’s taxonomy.

    The dynamics of cognitive outsourcing (and potential lock-in) differ as students move from lower- to higher-order thinking skills. I struggle to see a problem with students using LLMs to help them understand materials, much as I struggle to see a problem with academics using LLMs to produce materials which are easier to understand. Sure, we might rapidly end up in a situation where this learning interaction is mediated by LLMs by default, but I don’t see a fundamental difference in kind from its being mediated by other kinds of digital platforms (e.g. the LMS) or outputs (e.g. PowerPoint). It’s a case of better or worse design rather than of something human being lost through the introduction of a technological element.

    I think applying and analysing by definition lend themselves to agentive engagement with knowledge. You can’t get the LLM to do something useful unless you’re thinking about what you’re asking, which means that an epistemic capacity is being exercised, at least to some extent. Certainly, students could try and fail to do this, but that’s a different kind of problem, to be addressed through the register of AI literacy. The pedagogical challenge lies in recognising how students are doing this, in order to design learning processes which support increasingly purposive applications, rather than just assuming they will learn in the same way we did.

    It’s with evaluating and creating that things get more concerning. If you’ve already developed these capabilities, LLMs can be used to speed up the process (though a soft lock-in might result over time) or to enhance it through the activity I describe as rubber ducking. The problem arises if you haven’t learned how to do this without the LLM, such that the composite capacity (e.g. writing a report) develops with the LLM baked into it from the outset. For example, relying on LLMs for an outline only concerns me if students haven’t first learned to do this without the LLM. Relying on it to critically evaluate your work and suggest room for improvement carries a similar risk of cognitive outsourcing, one which most students are unlikely to address after university.

    This is a long-winded way of saying that we urgently need to get beyond the category of ‘AI’ in how we think about these pedagogical challenges. The relationality of the student’s interaction with the LLM becomes more important to recognise the further up the taxonomy we go. Exactly what ‘creating’ means can now vary immensely depending on the pattern of interaction the student has with the LLM.

    It’s also interesting to see that:

    • The main factors putting students off using AI are being accused of cheating (said by 53% of respondents) and getting false results or ‘hallucinations’ (51%). Just 15% are put off by the environmental impact of AI tools.
    • Students still generally believe their institutions have responded effectively to concerns over academic integrity, with 80% saying their institution’s policy is ‘clear’ and three-quarters (76%) saying their institution would spot the use of AI in assessments.
    • The proportion saying university staff are ‘well-equipped’ to work with AI has jumped from 18% in 2024 to 42% in 2025.

    I think students are overestimating both how effectively institutions can identify (and act on!) problematic LLM use and the AI literacy of academic staff. If I’m right, and student perception catches up with that reality, could ‘cheating’ as an inhibiting factor start to collapse from that figure of 53%?

    *Thanks to my collaborator Peter Kahn for introducing me to this notion.

    #assessmentIntegrity #BloomSTaxonomy #cheating #higherEducation #learning #LLMs #pedagogy