Education Really Matters: Assessment

Frank O’Hagan

A complex and controversial topic

One of the many contentious areas in education relates to the purpose and nature of how evaluations are made, and reports are written, about the abilities and talents of children and older students. The main issues which appear in the media and are of concern to teachers and parents may seem initially to be relatively simple. However, further inspection shows them to be both elaborate and multifaceted. For a start, the focal point of an appraisal might be related to everyday knowledge, practical skills, analytical capability, dexterity, originality, inventiveness, problem-solving, creativity or a mixture of these features. Furthermore, assessment practices cover a wide variety of age and social groups and can take many forms – written examinations with open or multiple-choice questions; one-to-one interviews and participation in group discussions; hands-on and experimental tasks; project work; research dissertations; and so forth. In all of these procedures, judgments can be made along a broad spectrum ranging from the use of narrow, strict criteria to very vague and subjective guidance. Predictably, the opinions and conclusions made by students, guardians, schools, universities and employers with regard to the significance and value of the information contained in assessment reports are frequently open to large discrepancies.

Educationalists have to consider the reason for an assessment as well as its modus operandi. For example, initial or base-line assessment is used to establish the abilities of pupils prior to starting a programme of work whereas summative assessment is intended to measure competences at the end of a course. By contrast, formative assessment has an important role to perform in monitoring advancement and highlighting improvements which are being implemented while coursework is still on-going. Two other approaches, worthy of note as they can be particularly helpful to students, are: (1) self-referenced assessment, which enables them to measure advances against their previous standards during the interval from a designated starting point, and (2) goal-based assessment, which records the achievement of targets which previously had been set for, and understood by, individuals or groups.

Some cautionary notes

In practice, there is no single ideal means of gauging learners’ exact knowledge and understanding in common curricular areas, for instance, in language and literature, mathematics, scientific studies, the arts and technology. The same is true for notable human characteristics and qualities such as personality, general intelligence and employability. Even though nurseries, schools and further and higher educational institutions may place great significance on the outcomes of appraisals, caution is applicable in relation to their management at all ages and stages. A case in point would be excessive use of quantitative and psychometric tests which are often administered inappropriately. Probably some serve the interests of their publishers and professional test users much better than pupils or parents who can be confused or misled by what these methods purport to demonstrate. An on-going problem with standardised measurements arising from personality profiles or details about intelligence is that they can reinforce the spurious notion that personal attributes and ability are fixed entities.

Other factors – such as the context in which examinations take place, the emotional stress levels of those being appraised, and the criteria for grades to be decided by assessors – regularly feature in civic deliberations. Moreover, there is the possibility of inherent bias being concealed within administrative processes as regards gender, social class or ethnicity. Undeniably, time and again, there is a strong case to be made for having very clearly-stated ‘health warnings’ issued along with formal assessment reports. Substantial caveats also apply to cumulative data collections which are analysed to make comparisons of results among schools as well as those gathered for the circulation of national statistics.

Validity, reliability and usefulness

Despite there being recognisable difficulties and limitations, it seems to be generally agreed that to compute in a reasonably objective manner how students – or, for that matter, schools, education authorities and nations – are performing is a desirable goal. In spite of the urgency, it is unsurprising that this aim, with its various stumbling blocks and obstacles to overcome, continues to be mired in uncertainties and disagreements. Students from all backgrounds are the blameless victims of these predicaments. They deserve clarification and elucidation as assessment and its subsequent effects are matters of extreme importance to them. If satisfactory solutions are to be found, it is absolutely necessary for educationalists to be confident that approved procedures possess validity, reliability and usefulness. These three concepts are intricate and only a brief outline of them is provided in what follows.

Validity relates to a calculation of any kind actually measuring what it claims to measure. Questions about how well everyday assessments really do judge targeted features need to be raised more often than is currently happening. Frequently, they are well wide of the mark in terms of accuracy or, in worst-case scenarios, they measure something else. In such circumstances, there is a pressing requirement to re-evaluate whatever approach is being undertaken. Currently, many assessments are largely, if not entirely, paper-based, which raises questions regarding validity in relation to practical and life skills beyond educational establishments. Fixed, restricted conventions should be challenged if they are viewed as falling far short of determining competences appropriately. Public debate, including through the use of social media, should be encouraged to examine issues about how best to develop well-founded and justifiable arrangements for appraisals.

In general, reliability is largely concerned with the extent to which an analysis provides consistent results in what it is measuring. One form of reliability, referred to as stability, is present when scores remain consistent if an assessment is repeated at different junctures. Features to be taken into consideration include the methods, frequency and organisation of assessments. An evaluation can be consistent but invalid through giving a constant result when repeated but, in reality, not measuring what is intended. Indeed, some tests are consistently invalid! At times, snags relating to validity and reliability may appear to be present simultaneously. For instance, coursework for national examinations with input largely completed by parents and tutors, or purchased over the internet as occasionally happens, would most likely not be of the same standard as work completed without any assistance. Likewise, tests undergone after a holiday period can indicate poorer academic performances than would have been the case if they had taken place at the end of term prior to vacation. In particular, it would be of no surprise to class teachers if they found this feature to be more marked for pupils from deprived backgrounds who did not have the same level of academic support as others while away from school.

What is often overlooked when debates rumble on about assessment is consideration of the usefulness of current practices. To meet the ‘utility’ criterion, assessors need to be able to show conclusively that the processes are genuinely worthwhile in terms of duration, costs and realistic gains. If they are a disservice to pupils’ and teachers’ efforts, too bureaucratic or of little value to stakeholders, why have them? All undertakings ought to guarantee trustworthy purposes which are clearly understood by recipients, including those who use the results when they are making decisions about students’ futures. Assessors – whether in educational establishments or industry – are in very influential positions. They have the power to arrive at conclusions which will have life-long consequences for individuals. With such dominance comes great responsibility.

Prioritising the advantages of those teaching and taught

From the perspective of learners, there are occasions when little or no thought seems to have been given to the suitability of common assessment practices. As indicated previously, the question which needs to be addressed is ‘What are the benefits for both those being taught and their teachers?’ For instance, the arrangements for, and frequency of, gauging practical skills should sometimes be more akin to driving tests for motor vehicles. Students could be assessed when they rate themselves ready and, if they do not reach the required standard, have further opportunities to re-sit their examinations.

Additionally, pupils experiencing difficulties may achieve targets within their individualised educational programmes but have had unsatisfactory learning experiences while working towards them. As a result, they may be much less motivated to participate in forthcoming work or to proceed to the next stages. In this situation, what appears in a report to have been a success may actually have been detrimental to their further development.

For a comprehensive review of progress, a blend of tactics may be necessary to obtain greater accuracy than is achieved when, as often happens, results are derived from a nondescript, written and timed examination. Merely bestowing a number or a rating on levels of attainment can reveal very little about further intellectual growth or applied expertise. Personalised profiles covering important features of potential, attainments and achievements can convey much more relevant and detailed information.

The introduction of new procedures should be designed with the key principle of enabling scholars to understand how to move forward in a positive fashion. Categorisation arising from judgments and decisions can so easily be the forerunner of an unintended form of stigmatisation. As previously indicated, a recurrent hazard – widely acknowledged – is that assigning grades brings with it the possibility of dispiriting students who either perceive themselves as failures or are labelled as such by others. Too much emphasis on testing, especially when students are ill-prepared, can lead to unnecessary pressure and anxiety. When supplementary forms of monitoring are planned, their efficiency ought to be substantiated beforehand rather than their being introduced as a fad or political gesture. If, as is sometimes claimed by politicians, regular national tests of young pupils are helpful in ensuring that standards are being raised, then this assertion should be supported by well-documented research.

Prevailing pressures on tutors can coerce them towards giving too much attention to quantitative methods of reporting at the expense of qualitative approaches. Discerning teachers realise that formative and dynamic assessment techniques are very advantageous in many ways to students of all ages. When evaluating achievements, there is much to be gained from objectively observing learners’ awareness and responsiveness, investigating their contributions, and listening to their explanations of what they feel they are accomplishing. Such courses of action can identify: superior learning strategies; productive work habits; successful incentives; the most effective forms of instruction; and the levels of intervention and support to fulfil potential abilities and giftedness.

One aspect in which traditional techniques fail significantly relates to the appraisal of complex competencies which are relevant – at times essential – with regard to inter-personal relationships and professional proficiency. For example, in both formal and informal assessment across age groups, know-how concerning decision-making, problem-solving, self-evaluation and cooperative work with others is often neglected. Nonetheless, such aspects of performance are highly valued by students themselves, educationalists and employers. Current practices require to be upgraded to address this significant weakness.

Conclusions

Ascertaining features about learners’ abilities, dexterity and personal traits can be highly functional and profitable in the enhancement of their educational experiences and progress. However, careful scrutiny and reflection are necessary in the formulation of guidelines. In turn, these always should be implemented in an appropriate, well-designed and purposeful manner.

High quality assessment has the following characteristics: (1) it has proven validity, reliability and usefulness; (2) its administration is undertaken by skilled and committed personnel who have received suitable training; (3) it provides substantial information, feedback and guidance which will augment the quality of learning and teaching; (4) its execution and outcomes are of benefit to all relevant stakeholders, particularly the students involved; (5) it has an apposite health warning, especially when it forms the basis of vital decisions about a student’s future.

Unfortunately, there are those with responsibilities for assessment within education who are fully aware of the failures and shortcomings of current practices but negligently continue to promote the status quo. While the obstacles and challenges which they face must be acknowledged, their report card perhaps should begin mischievously with that familiar, if unwanted, adage ‘Not good enough! Must do much better!’ – followed, of course, by positive and constructive suggestions on how matters could be considerably improved!

(Frank O’Hagan previously was the Adviser of Studies to Bachelor of Education students at the University of Strathclyde. Later, he was a member of Her Majesty’s Inspectorate of Education.)
