The Deceptive Simplicity of Percentages: Why Our Grading Systems Need a Revolution

For decades, percentages have been the cornerstone of academic assessment, a seemingly objective and straightforward way to quantify student performance. We assign a numerical score, average it out, and present it as a clear indicator of learning. Yet, beneath this veneer of simplicity lies a system riddled with flaws, one that actively hinders effective assessment, distorts learning, and perpetuates unfairness. It’s time we critically re-examined our reliance on percentages in grading and recognized their unscholarly nature, as highlighted by Chris Rust in “The Unscholarly Use of Numbers in Our Assessment Practices: What Will Make Us Change?”

Rust’s article, published in the International Journal for the Scholarship of Teaching and Learning, serves as a powerful indictment of current assessment practices, particularly the ubiquitous use of numbers. He argues that academics, despite their intelligence, often behave “uncleverly” when it comes to numbers in assessment, exhibiting a “number blindness” that stops them from truly thinking about what these numbers represent. This isn’t just about minor inaccuracies; it’s about a fundamental misunderstanding of what percentages can, and cannot, tell us about a student’s learning journey.

One of the most glaring issues with percentages is their inherent lack of meaningfulness. What does a 55% score truly signify? As Rust points out, two students achieving the same percentage mark can possess vastly different strengths and weaknesses. The single number obscures these crucial details, providing a superficial snapshot rather than a nuanced understanding of a student’s grasp of the material. Imagine a student who excels in critical analysis but struggles with grammar, versus another who has perfect grammar but offers shallow analysis. Both might receive a 70%, but their learning needs and proficiencies are entirely distinct. The percentage flattens this complexity, leaving both student and instructor in the dark about specific areas for improvement or commendation.

Furthermore, the illusion of precision created by percentages is deeply problematic. When a marker awards a 73% rather than a 72%, the mark implies an ability to distinguish between qualities of work to within a single percentage point, one part in a hundred. That level of fine-grained judgment is rarely possible, especially when the assessment criteria are subjective or when several criteria must be weighed at once. In the UK, for instance, the full range of percentages is rarely used in the arts and humanities, with marks typically falling between 35% and 75%; even within that compressed band, distinguishing adjacent marks demands judgments accurate to one part in forty. In the US, the convention of rarely awarding marks below 70% compresses the scale further still, yet the false precision remains. This manufactured exactitude lends an unearned air of scientific rigor to what are, at their core, subjective judgments.
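To see how fragile that precision is, consider a minimal simulation sketch. Every number here is invented for illustration: four equally weighted criteria, each of which a marker can honestly judge only to within about five percentage points.

```python
import random
import statistics

# Hypothetical scenario: four equally weighted criteria, each judged
# with roughly +/- 5 points of honest marker uncertainty.
TRUE_SCORES = [72, 68, 75, 70]   # invented "true" quality per criterion
NOISE = 5                        # assumed judgment error per criterion

def one_marking_pass():
    """Aggregate mark from one pass, with bounded error on each criterion."""
    judged = [s + random.uniform(-NOISE, NOISE) for s in TRUE_SCORES]
    return sum(judged) / len(judged)

marks = [one_marking_pass() for _ in range(10_000)]
print(f"spread of aggregate marks: ~{statistics.stdev(marks):.1f} points (sd)")
# The 1-point gap between a 72 and a 73 sits well inside this noise band.
```

Under these assumptions, marking noise alone dwarfs the one-point distinction the scale claims to record.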

The aggregation of percentages introduces another layer of distortion. When individual scores from different assessment criteria are combined, a critical question arises: can a deficiency in one area be genuinely compensated for by excellence in another? Rust illustrates this with the chilling possibility of a student consistently failing to meet a crucial criterion, yet always managing to pass due to an aggregated mark. This phenomenon is supported by research, which reveals that students can “often show serious lack of understanding of fundamental concepts despite the ability to pass examinations”. Percentages, when aggregated, obscure these fundamental gaps in understanding, creating a false sense of competency.
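A toy marking scheme makes the compensation problem concrete. The criteria, weights, and pass threshold below are all hypothetical, not taken from Rust’s article:

```python
# Hypothetical scheme: four equally weighted criteria, pass mark at 40%.
criteria = {
    "critical analysis": 85,
    "use of sources":    80,
    "structure":         75,
    "core concepts":     25,   # far below any plausible pass threshold
}

aggregate = sum(criteria.values()) / len(criteria)
print(f"aggregate mark: {aggregate:.0f}%")                          # 66%
print("core concepts mastered?", criteria["core concepts"] >= 40)   # False
# The student passes comfortably while failing the one criterion that
# tests understanding of the fundamental concepts.
```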

Beyond their inherent imprecision, percentages are also fundamentally “not scalable”. A piece of work scoring 90% is not, mathematically speaking, one and a half times better than a piece of work scoring 60%. Marks are ordinal labels: they rank work, but the intervals between them carry no fixed meaning, so standard arithmetic operations, such as averaging, are illegitimate when applied to these grades. Yet university systems routinely treat percentages as if they were true quantitative data, adding and combining them without regard for their symbolic nature. This practice is statistically unsound, particularly when differences in range and mean deviation across assessments are ignored. A mark for a lab report and a mark for an exam represent different types of learning outcomes, and treating them as interchangeable numbers obscures these underlying differences.
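Because marks are ordinal, any order-preserving relabeling of the scale is just as defensible as the original, yet averages are not stable under such relabelings. A small sketch with invented marks shows two students whose averages are tied on one scale and not on another:

```python
# Two students, two assessments (invented marks).
alice = [50, 90]
bob   = [70, 70]

def avg(marks):
    return sum(marks) / len(marks)

print(avg(alice), avg(bob))   # 70.0 vs 70.0 -- a dead heat

# Squaring (rescaled back to 0-100) preserves the ranking of every
# individual mark, so as a relabeling of an ordinal scale it is
# equally "valid"...
def relabel(marks):
    return [m * m / 100 for m in marks]

print(avg(relabel(alice)), avg(relabel(bob)))  # 53.0 vs 49.0 -- Alice ahead
# The verdict depends on an arbitrary choice of scale, which is exactly
# why arithmetic on these numbers is illegitimate.
```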

The pressure to conform to a “normal” distribution curve is another insidious consequence of numerical grading. Some institutions even arithmetically force marks to fit a bell-shaped curve, a practice that Bloom et al. (1971) rightly condemned as evidence of “our failure to teach”: education, unlike a random process, is purposeful, aiming for all students to learn what is taught, so there is no reason its outcomes should fall on a bell curve. Forcing a normal curve also fosters a competitive environment in which students may be disincentivized from helping one another, fearing they will jeopardize their own chance at the “few higher grades”. This undermines the very notion of collaborative learning and promotes a cut-throat mentality that has no place in genuine education.
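The mechanics of curve-forcing are usually some variant of rescaling marks to a target mean and spread. Here is a minimal sketch of one common normalization approach, with an invented class and hypothetical institutional targets:

```python
import statistics

raw = [58, 62, 64, 65, 66, 68, 71, 74]   # invented class marks
TARGET_MEAN, TARGET_SD = 60, 12          # hypothetical institutional "curve"

mu, sd = statistics.mean(raw), statistics.pstdev(raw)
curved = [TARGET_MEAN + (m - mu) * TARGET_SD / sd for m in raw]
print([round(c) for c in curved])   # [40, 50, 55, 57, 60, 65, 73, 80]
# Every mark is arithmetically reshaped to fit the desired distribution,
# regardless of how much the class actually learned.
```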

Assessment judgments can also be distorted by “erroneous other factors”. As Sadler (2010) points out, “transactional and bestowed credit” can infiltrate marks, meaning some grades are not related to actual learning at all. When marks for attendance, or penalties for late submission, are combined with marks for academic performance, the resulting composite is still less meaningful. Rust shares a poignant anecdote of a friend whose US degree result was permanently dented by late submissions in his early years, despite strong performance later on. This illustrates how grading can become a “game” rather than a scholarly reflection of learning, a practice Rust argues strongly against.
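The distortion is easy to reproduce in miniature. In this hypothetical, two students submit work of identical quality, but one incurs an invented ten-point lateness deduction:

```python
# Hypothetical: identical essays, one submitted late.
students = {
    "punctual": {"essay": 72, "late_penalty": 0},
    "tardy":    {"essay": 72, "late_penalty": -10},  # invented deduction
}

for name, s in students.items():
    recorded = s["essay"] + s["late_penalty"]
    print(name, recorded)   # punctual 72, tardy 62
# The recorded marks now measure a blend of achievement and administrative
# compliance, so neither learning nor behaviour can be read back out of them.
```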

Systemic unfairness also permeates percentage-based grading, stemming from variations across assessment tasks, subject disciplines, and institutional rules. In the UK, coursework typically attracts higher marks than examinations, and the more numerate disciplines tend to mark higher than the humanities. Despite these well-documented differences, university systems treat all marks as equivalent, aggregating and processing them as if they carried the same weight. The result is disparities such as mathematics graduates being four times more likely to achieve a “first” (the top degree classification) than history graduates. This systemic unfairness has very real consequences for students’ career prospects, especially now that many graduate employers no longer require a specific discipline but do screen applicants on the class of degree achieved.
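A small simulation shows how this disparity can arise even before any difference in ability. The distributions below are invented; the point is the mechanism, not the exact ratio:

```python
import random

random.seed(1)

def share_of_firsts(mean, sd, threshold=70, n=100_000):
    """Fraction of simulated marks at or above the 'first' boundary."""
    return sum(random.gauss(mean, sd) >= threshold for _ in range(n)) / n

# Invented distributions: same average mark, different spread.
maths   = share_of_firsts(mean=62, sd=14)   # numerate subject, wide spread
history = share_of_firsts(mean=62, sd=6)    # humanities, compressed band
print(f"maths firsts: {maths:.1%}, history firsts: {history:.1%}")
# Roughly 28% vs 9%: a severalfold gap in top degrees produced purely by
# how each discipline happens to use the percentage scale.
```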

Finally, and perhaps most critically, the reliance on numerical grading can have a profoundly negative effect on learning itself. Research consistently shows a decline in deep and contextual approaches to study as students progress through their degrees: students become increasingly focused on the mark rather than the subject matter. Compelling evidence from UK schools reveals that when work is returned with feedback but without a mark, students are far more likely to engage with and act upon that feedback, leading to improved future work. Conversely, when a mark is attached, students are less likely even to read the feedback and consequently less likely to improve. The focus shifts from learning to a mere numerical outcome, transforming education into a quest for points rather than understanding. Even introducing a grading system at all (as opposed to pass/fail) can lead students to adopt a surface approach and to view assessment less as a learning opportunity.

The arguments against the unscholarly use of numbers in assessment are compelling and well supported by evidence. Rust’s call to extend the Scholarship of Teaching and Learning into “SoTLA”, a Scholarship of Teaching, Learning, and Assessment, is not merely a semantic change but a crucial step towards fostering a critical mass of faculty who truly understand the implications of their assessment practices. It’s time to move beyond the deceptive simplicity of percentages and embrace assessment practices that genuinely reflect and promote deep learning, fairness, and scholarly integrity. Our students deserve nothing less.

References

Bloom, B., Hastings, J., & Madaus, G. (1971). Handbook on Formative and Summative Evaluation of Student Learning. McGraw-Hill.

Rust, C. (2011). The unscholarly use of numbers in our assessment practices: What will make us change? International Journal for the Scholarship of Teaching and Learning, 5(1). https://doi.org/10.20429/ijsotl.2011.050104

Sadler, D. R. (2010). Fidelity as a precondition for integrity in grading academic achievement. Assessment & Evaluation in Higher Education, 35(6), 727-743.