This paper reviews the use of tests of Australian school students’ abilities, concentrating on literacy and numeracy. Since 2008, nationwide annual tests held in May have required the vast majority of students in years 3, 5, 7 and 9 to participate, as summarised below.
This article is dated early October 2015. Towards the end of the same month, President Barack Obama wrote an open letter to American parents and teachers urging them to go easier on tests. The letter is essential reading, and listening, as it ends with an audio-visual presentation by the president. Please read and listen: Australia is in a similar predicament!
The idea of using high-stakes tests goes back at least to 1957 in America, when apprehension over US education standards escalated after the Soviet Union’s launch of Sputnik. This concern over a country’s competitiveness spread internationally, especially since 2000 when the triennial OECD Program for International Student Assessment (PISA) was launched to test 15-year-old school pupils' scholastic performance on mathematics, science and reading. 
America in the new century has led other nations into “an era of strong support for public policies that use high-stakes tests to change the behaviour of teachers and students in desirable ways. But the use of high-stakes tests is not new, and their effects are not always desirable. ‘Stakes’, or the consequences associated with test results, have long been a part of the American scene,” and many states introduced schemes to develop minimum competency standards to reform schools and “ensure, in theory, that all students would learn at least the minimum needed to be a productive citizen.” During the same period, Australia became apprehensive that it might fall behind in the international competition from China, South Korea, and European nations such as Finland and Estonia.
The above quote on high-stakes testing indicates that these tests have controversial aspects, as will indeed become obvious despite the fact that some of the NAPLAN information is valuable and unique. We rely on several main sources in the analysis that follows:
Since 2008, the Australian Curriculum, Assessment and Reporting Authority (ACARA), an independent statutory authority, has conducted annual tests in May of over one million students in years 3, 5, 7 and 9. The National Assessment Program—Literacy and Numeracy (NAPLAN) covers reading, writing, language conventions (spelling, grammar and punctuation), and numeracy. The 2014 NAPLAN report states that NAPLAN provides “important information about whether young Australians are reaching important educational goals.”
NAPLAN reports that the tests are adjusted statistically so that the 2014 results can be compared with previous years, and across geographic, demographic and educational groups. All students at the same year-level are assessed on the same test items. The tests were developed collaboratively by ACARA, the state and territory governments, the non-government school sectors and the Australian Government.
NAPLAN also reports that tests are designed to broadly reflect aspects of literacy and numeracy within the curriculum in all school jurisdictions. The test questions and test formats are chosen so that they are familiar to students and teachers across Australia. National Protocols for Test Administration ensure consistency in the administration of NAPLAN tests by all test administration authorities and schools across Australia. Statistical analysis is used throughout; the national report focuses on the standard deviation to indicate the variability in student performances.
The introduction to the 2014 report concludes (p iv): “NAPLAN tests are the only Australian assessments that provide nationally comparable data on the performance of students in the vital areas of literacy and numeracy. This gives NAPLAN a unique role in providing robust data to inform and support improvements to teaching and learning practices in Australian schools.”
This conclusion is disputed because the statistical NAPLAN results are based on a very small part of the curriculum which bears little relation to the wide range of literacy and numeracy (and by extension, other subjects including the arts) which comprises the educational significance of school teaching. Any similarity between statistical and educational significance may be accidental, because the statistical significance is based on mathematical formulae associated with an isolated test, whereas educational significance is a value based on the whole wide area taught in school and difficult or impossible to express in numbers.
This paper covers two main areas. It shows selected NAPLAN results, concentrating on the two main classes of literacy and numeracy, and it critically assesses their educational significance. Appendix 1 provides a set of statistical background tables on reading and numeracy offered to those who want further insights into the time series that have been built from 2008 to 2014.
NAPLAN results are reported using five national achievement scales, one for each of the NAPLAN assessment domains of reading, writing, spelling, grammar and punctuation, and numeracy. However, our focus is on trends, and the 2014 NAPLAN report only shows the annual time series from 2008 to 2014 for the subject areas of reading and numeracy. Each assessment consists of ten bands, which represent the increasing complexity of the knowledge and skills assessed by NAPLAN from years 3 to 9. The stated intention with the NAPLAN reporting scales is that any given score represents the same level of achievement over time. For example, according to the NAPLAN report a score of 700 in reading in one year represents the same level of achievement in other testing years.
The bands are related to national standards in Table 1. Six of the 10 bands are used at each year-level of students, and are then related to a national minimum standard. A keyword search for "minimum standard" found plenty of examples in relation to the detailed statistical results in the 365-page 2014 NAPLAN report but no further definition or discussion of the origin of the concept or its validity.
The exact quote on page v is (emphasis added): "The national minimum standard is the agreed minimum acceptable standard of knowledge and skills without which a student will have difficulty making sufficient progress at school. Students whose results are in the lowest band for the year level have not achieved the national minimum standard for that year. These students are likely to need focused intervention and additional support to help them achieve the skills they require to progress in schooling." We guess that the main criterion was concern for quality education but we couldn't find a description of how agreement was reached, including the relative roles of ACARA and the other stakeholders listed from the NAPLAN report in the introductory section.
The lowest of the six bands at each year-level is deemed to be below the minimum standard and the highest band is the highest possible for a given student year (band 6 for year 3, band 8 for year 5, band 9 for year 7, and band 10 for year 9). The minimum standard, as defined, is achieved in the second band. Table 2 shows four indicators for each year level for reading and numeracy, respectively:
In summary, the proportion of students with scores at or above the minimum standard is high with only 5-8% failing to reach that standard. But there is a significant difference between years in the proportion who are just at the minimum standard (rising) and the proportion who in terms of these criteria is classified excellent (declining). Most students fall in the three intermediate bands between "just making it" and "excellent", and these percentages develop a little differently from one year level to the next. For reading it rises from 60.4% in year 3 through 67.1% in year 5 and 71.8% in year 7, and then declines to 69.7% in year 9. The equivalent residual for the three intermediate bands is more constant for numeracy, changing from 70.0% for year 3 to 71.2% for year 5, 69.1% for year 7 and 68.3% for year 9 (on average close to 70% with no apparent trend through the years).
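The comparison of the intermediate-band shares can be checked with a few lines of arithmetic. The sketch below uses only the percentages quoted in the paragraph above; the unweighted mean across year-levels and the max-minus-min "spread" are our own illustrative summaries, not statistics taken from the NAPLAN report.

```python
# Share of students in the three intermediate bands (percent), as quoted in the text.
intermediate_share = {
    "reading": {3: 60.4, 5: 67.1, 7: 71.8, 9: 69.7},
    "numeracy": {3: 70.0, 5: 71.2, 7: 69.1, 9: 68.3},
}


def summarise(shares):
    """Return the unweighted mean and the max-minus-min spread of the shares."""
    values = list(shares.values())
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return round(mean, 2), round(spread, 1)


reading_mean, reading_spread = summarise(intermediate_share["reading"])
numeracy_mean, numeracy_spread = summarise(intermediate_share["numeracy"])

print(f"reading:  mean {reading_mean}%, spread {reading_spread} points")
print(f"numeracy: mean {numeracy_mean}%, spread {numeracy_spread} points")
```

On these figures the numeracy share stays within about three percentage points across the year-levels, while the reading share moves by more than eleven, which is the contrast the paragraph describes.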
The tables that follow describe the differences in the top scores among the various groups. Recognising the different patterns that came out of Table 2, we focus on excellence, both for its intrinsic interest and because the differences it reveals among the groups are among the most valid and useful to come out of the NAPLAN approach.
The four tables in this section relate to reading and numeracy skills rated in the top band in the NAPLAN tests, for each of years 3, 5, 7 and 9. Each state and territory is represented, and they show consistent patterns. Table 3 shows all students and the proportion of each gender in the top band, Table 4 the percentage of Indigenous and non-indigenous students in the top band, and Tables 5 and 6 the proportions in the top band according to "geolocation" (metropolitan, provincial, remote and very remote school locations), with all students in Table 5 and Indigenous students in Table 6.
In Table 3, the primary observation is a dramatic decline in the proportion of students in the top band measured by NAPLAN's reading test: from 24.5% of year 3 students through 14%, 10% and 5.8% in the subsequent year-levels. The pattern is different for numeracy, which falls from 14.6% in the top band of year 3 to 8.8% in year 5, but it then increases to 11.8% in year 7 before settling at 9.2% in year 9. Using measures like standard deviation liberally, ACARA evidently intends these statistics to be based on rigorous analysis, but the differences in these patterns are not explained in the commentary on the 2008-2014 time series on pages 300-302 of the NAPLAN report.
On the criteria set for the national assessment scale, agreed by ACARA and other stakeholders in the NAPLAN survey, and with the focus on excellence adopted in this section, the rapid deterioration of reading excellence as children progress through school is not explained, nor is the lack of match with numeracy excellence through the school years. Should there be a difference in excellence levels between literacy and numeracy? Wouldn't that mean that year 9 school students are "better" at numeracy and maths than at reading and literacy? If so, is that carried over into years 10 and 12 and into tertiary education? Where is the evidence for that?
Excellence as such should be an important determinant, given the degree of competitiveness that is highlighted by the development of international comparisons (epitomised by the OECD Program for International Student Assessment, PISA), which has sent waves of apprehension through educational communities and appears to have been a factor in former Prime Minister Julia Gillard's decision to introduce NAPLAN with the 2008 survey.
Still concentrating on all students (gender is covered below), definite geographical patterns emerge. For reading, the Australian Capital Territory has the highest proportion in the top band in year 3 (33%), followed by Victoria and NSW, Tasmania, and then by Queensland, South Australia and Western Australia, tailed by the Northern Territory. This pattern largely persists through year 5, but the more senior years see a relative improvement in the ranking of NSW, with Victoria falling back. Western Australia's ranking improves consistently from years 5 to 9, perhaps assisted by the exceptional economic conditions favouring that state through the mineral boom. The Northern Territory remains behind in all years.
The rankings are somewhat different for numeracy. NSW actually pips the ACT at the post, and the two are then followed by Victoria, WA, Queensland, Tasmania, SA and the NT. One consistent pattern is SA being last among the states. Queensland and Tasmania rank fifth and sixth, respectively. WA comes across better in fourth position, Victoria as number three, the ACT second, and NSW topping the eight states and territories on excellence in numeracy.
The above description should not be taken out of context. In general, the two subject areas have similar patterns. After combining them, NSW and ACT vie for first position followed by Victoria in third and WA in fourth position. Tasmania and Queensland are toss-ups for fifth and sixth, SA is seventh, and the Northern Territory is in last position.
These statistics give an approximate picture of the opportunities that exist for school students across Australia, but many things are left out, including the distribution of schools by jurisdiction (government, Catholic, independent), and the distances and relative importance of metropolitan, provincial and remote school locations. The position of Indigenous students, unhappily, is another significant factor. Finally, ranking itself is a normative or qualitative rather than exact measure. Nevertheless, the scores for excellence, as measured by the highest national assessment band at each year-level, appear highly relevant given the Australian sense that, rightly or wrongly, we are losing out to other countries on educational quality. Excellence, as distinct from general quality, is apparently neglected in the NAPLAN report. It should be recognised as another highly significant indicator.
This geographically related detail is important in an overview, but other differences can be more briefly described in the rest of this section. More details can be gleaned, as needed, by anyone delving into the tables. Table 3, however, contains the results for each gender at each of the four year-levels. For reading, the percentage in the top band falls consistently for both genders; for numeracy less so.
In reading, girls do consistently better at the "excellent" level than boys, from year 3 through 5, 7 and 9. The opposite applies to numeracy, where a higher proportion of boys reach the highest level of student achievement. As already established, however, the percentage in the highest band for both genders is better maintained across the year-levels in numeracy than in reading.
Table 4 contrasts the proportion of Indigenous and other students reaching the highest band of NAPLAN's national assessment scale. On literacy, more than one-quarter of non-indigenous children in year 3 doing the 2014 test were rated on this criterion of excellence, compared with only 5% of Indigenous students. By year 9, the excellence ratio had fallen to 0.6% for Indigenous students — one in about 170 students, compared with one in 16 mainstream students. We have already queried the rapid decline in reading excellence computed by the national assessment scale.
The numeracy results are not much better, though they don't show the same degree of deterioration as the year-level increases. Numeracy is consistently rated lower than literacy in the earlier school years but catches up in years 7 and 9. Still, only 0.8% of Indigenous students in year 9 were rated excellent on numeracy in 2014, equivalent to about one in 125 Indigenous students in the top band, compared with almost one in 10 non-indigenous students.
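The "one in N" figures above are simple conversions of the top-band percentages. A minimal sketch, using only the two Indigenous year 9 percentages quoted in the text (the helper name `one_in` is ours):

```python
def one_in(percent):
    """Convert a percentage into 'one in N' form, rounded to the nearest whole student."""
    return round(100 / percent)


# Year 9 Indigenous students in the top band, 2014, as quoted in the text.
reading_one_in = one_in(0.6)    # 167, i.e. "one in about 170"
numeracy_one_in = one_in(0.8)   # 125

print(f"reading:  one in {reading_one_in}")
print(f"numeracy: one in {numeracy_one_in}")
```

The same conversion applies to any of the small percentages quoted in this section.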
There are significant geographic differences here, of which some might be expected considering the different Indigenous population patterns. For reading, Tasmania tops the Indigenous scale in year 3 (10.9% in the top band), followed by Victoria, the ACT and NSW. WA and the Northern Territory are at the bottom, both areas with relatively large Indigenous populations. This does not explain why the percentages are lower for Indigenous people counted on their own, which is more likely to be a function of the relative importance of metropolitan versus remote school locations.
Whatever the reason, the underlying issue of discrimination should be addressed urgently, because the year 9 ratios sink to abysmal levels anywhere in Australia, though the same areas keep doing relatively better: Tasmania with 1.7% of Indigenous students followed by the ACT (1.2%), NSW and Victoria (0.7%). WA again trails the states (0.4%), and the NT clocks in at a minuscule 0.1% — equivalent to about one in 1,000 Indigenous students sitting the test.
The excellence patterns, while lower in the early years, are similar for numeracy. In year 3, Tasmania tops with 4.7%, followed by the ACT, Victoria and NSW. In year 9, the maximum ratio for Indigenous students was 1.3% (NSW), followed by Tasmania and Victoria. The Northern Territory supplied the only nil ratio found in Table 4, which in this case, with percentages reported to one decimal place, means less than 0.05%.
Summarising Table 4, non-indigenous students are about five times as likely as Indigenous students to reach the top band for reading in year 3, and 10 times as likely to do so in year 9 (within its lower overall ratios discussed above). For numeracy, non-indigenous students are some 10-12 times as likely to be in the top band in years 5, 7 and 9.
The impact of school location in 2014 is examined in Table 5 (all students) and Table 6 (Indigenous students). These tables cover years 3 and 9 only, but figures for the two intervening years are available as well.
The evidence is clear. For reading in year 3, 26.8% reached the highest band in metropolitan schools compared with 19.1% in provincial schools, 13.3% in schools classified as remote, and 5.8% in "very remote" schools. The same pattern at much lower levels applies to year 9 students: 6.8% for metropolitan, 3.3% for provincial, 1.9% for remote and 0.7% for very remote schools. For numeracy, the ratios in year 3 going from metropolitan to very remote declined from 16.4% through 10.3%, 6.6% to 3%. The numeracy ratio by year 9 also declined seriously, ranging from 10.9% in the top band in metropolitan schools through 4.6% in provincial, 2.3% for remote and 0.6% for very remote schools.
The year 3 proportion in the top band for reading/literacy was highest in the ACT followed by Tasmania, Victoria and NSW, with a similar pattern for provincial schools (not represented in the ACT). Victorian remote schools showed up most favourably with as many as 20% in the top band compared with a national average of 13.3%. Victoria stands out as well in the numeracy part of Table 5. This may be a function of Victoria being a relatively compact area, coupled with a relatively advanced education system.
In summary, a metropolitan location is heavily advantaged (presumably even more so towards central metropolitan areas). Remote and very remote schools are heavily up against the odds. The state and territory governments are of course highly conscious of these inequalities, but fixing them is a major problem, not just for the teaching of Indigenous students but for all students.
Table 6 concludes this section. It is structured in the same way as Table 5 but covers Indigenous students only. Again, metropolitan schools lead year 3, with 7.5% of students in the top band (exceeded in Victoria, Tasmania, NSW and the ACT). The proportion of Indigenous students in the top band in provincial schools falls to 5.2%, with Tasmania leading. This state also scores relatively highly in "remote" schools, way above the Australian average — presumably because there are few schools in this category in the island state which also has relatively more Indigenous people than any other state, and one or a very small number of schools happen to cater well for Indigenous children.
The top band for reading at year 9 declines drastically, with only two observations above 1% (Hobart, Tas and Canberra, ACT).
The observations for numeracy are generally lower in year 3, ranging from 3.7% in metropolitan schools to a mere 0.2% (one in 500) in very remote schools. By year 9, only metropolitan schools average more than 1% of Indigenous students in the top band (assisted by Hobart and Sydney, and to a lesser extent Melbourne and Perth). All provincial and remote schools scored less than 1% in the top band in year 9.
The subject matter in this section is based on the statistical work presented in Appendix 1. NAPLAN calculates the achievements of all students tested in the survey to reach the highest level in Year 9 as a function of its literacy and numeracy national assessment scale, based on 10 “bands” of which bands 1 to 6 are used in year 3, 3 to 8 in year 5, 4 to 9 in year 7, and 5 to 10 in year 9 (as described in the section on “Key Results from the National Assessment”). Appendix Table A1 shows that the annual variations between 2008 and 2014 were slight, expressed by the standard deviation (SD) which is a tiny proportion of the average in any year including years 3 and 5 in reading which show the highest SDs. Another expression of the low variability between years is the median observation, varying between 99.89% and 100.40% of the annual average for reading, and between 99.85% and 100.40% for numeracy. This median is close to one (100%) if the trend is flat.
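The two variability measures used here, the standard deviation as a proportion of the average and the median as a percentage of the average, can be sketched as follows. The seven scores are hypothetical round numbers chosen for illustration; they are not the values in Appendix Table A1, and the use of the sample (rather than population) standard deviation is our assumption.

```python
import statistics

# Hypothetical mean scale scores for one year-level over seven test years.
# Illustrative numbers only, NOT the values in Appendix Table A1.
scores = [500.0, 502.0, 499.0, 501.0, 503.0, 500.0, 502.0]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)        # sample standard deviation (our assumption)
median = statistics.median(scores)

sd_share = 100 * sd / mean           # SD as a percentage of the average
median_share = 100 * median / mean   # median as a percentage of the average

print(f"mean {mean:.1f}, SD {sd:.2f} ({sd_share:.2f}% of mean)")
print(f"median {median:.1f} ({median_share:.2f}% of mean)")
```

With a series this stable, the standard deviation is a fraction of one percent of the average and the median sits at 100% of it, which is the kind of result the paragraph describes.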
The 2008-14 trends look flat for all combinations of year and subject matter (Charts 1 and 2). This suggests that subjecting more than a million students every year to these tests, which are largely unrelated to the main literacy and numeracy curricula and eat into the time available for teaching these and other subject groups, would be very difficult to justify in benefit-cost terms, not to mention the added stress on students, parents and teachers revealed by the many critical comments, which cannot be readily measured as financial costs. A triennial NAPLAN test would be sufficient to provide virtually the same information. Moreover, what is now essentially a census of students conducted by NAPLAN might be replaceable by a large, well-designed survey using methodology already in place elsewhere, including at the Australian Bureau of Statistics.
Some minor apparent trends may be detected. Chart 1 shows overall reading achievement for each school year-level for each year from 2008 to 2014, and a dotted line showing the average across the seven years. For year 9 the dotted line is horizontal (no trend), and the apparent trends for the lower year-levels are much reduced or would disappear when 2008 is eliminated from the analysis. Arguably the first NAPLAN survey may not have been quite as perfectly executed in a technical sense as subsequent surveys, despite the NAPLAN report declaring all years fully compatible.
Trends are practically non-existent for numeracy (Chart 2), including year 5 despite a relatively low observation for 2008. There may be reasons for the apparent differences between the reading and numeracy scores that have been calculated, but any differences remain minor and the idea that an expensive annual analysis is needed to capture such differences has to be challenged. If the trend is flat it is also quite predictable, and there is little point in confirming this annually.
Table 7 shows a statistic called “the nature of the difference”, which represents an attempt to strengthen the standard measure of statistical standard deviation. According to NAPLAN 2014 (page iv), it was introduced in the 2014 comparison calculations to help interpret differences in results. However, NAPLAN’s adoption of this measure is based on a misconception.
The usual assumption in statistical analysis is that differences between two groups are due to chance, unless there is a rationale for why such differences might occur. This is known as the null hypothesis (H0), the baseline against which statistical assumptions are verified or rejected. It assumes that sample observations result purely from chance rather than being influenced by some non-random cause.
“Non-random” implies that the observations come from different “populations”, a term referring to the group from which a sample is drawn. In simple terms, the alternative hypothesis (H1, also known as the research hypothesis) predicts a difference between the base and current populations associated, say, with a scientific experiment. If the experiment produces a statistically significant change compared to the situation prior to the test, the alternative hypothesis H1 is supported over H0. If not, the null hypothesis wins out, at least until further experimental work suggests otherwise.
The NAPLAN report doesn’t explain and justify why it applies an H1 hypothesis rather than H0. It just superimposes another measure, “effect size”, on the conventional statistical test based on the null hypothesis. Considering the modest slope of any trend and the tendency for these trends to disappear across the seven years according to Charts 1 and 2, the added measure called “nature of the difference” is unlikely to be highly relevant, as well as actually conflicting with scientific method in rejecting the null hypothesis without providing a rationale for doing so. The justification for dropping the null hypothesis is further eroded vis-à-vis the cost and inconvenience of schools conducting annual full-size NAPLAN tests. See Appendix 1 for supplementary remarks.
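For readers unfamiliar with the two measures discussed above, the following sketch shows how a conventional significance statistic and an effect size are each computed from the same two group means. All numbers are hypothetical. Cohen's d is assumed here as the effect-size measure and a two-sample z-statistic as the significance test; the NAPLAN report's own formulae are not reproduced and may differ.

```python
import math


def z_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample z-statistic for a difference in means (tested against H0: no difference)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardised mean difference, pooling the two SDs (assumes equal group sizes)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Hypothetical groups: means of 505 and 500, SD of 80, 250,000 students each.
z = z_statistic(505.0, 80.0, 250_000, 500.0, 80.0, 250_000)
d = cohens_d(505.0, 80.0, 505.0 - 5.0, 80.0)

print(f"z = {z:.1f}, effect size d = {d:.4f}")
```

The z-statistic depends on the group sizes, while the effect size does not; how the two should be combined, and whether doing so is justified, is the question raised in the paragraph above.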
The annual cost of running NAPLAN has been consistently quoted as more than $100m by numerous people criticising the tests but the actual cost analysis is hard to find. The most concrete evidence we have detected is in an article by Bethany Hiatt, education editor of The West Australian newspaper, ‘Tests lashed as $100m waste’ dated 15 May 2012, when the NAPLAN tests were launched for that year. It begins as follows:
“Nearly 100 Australian university academics have signed a letter criticising national literacy and numeracy tests that start today as having "little merit".”
"As a group we are appalled at the way in which the Commonwealth Government has moved to a high-stakes testing regime … despite international evidence that such approaches do not improve children's learning outcomes," the letter says. "These tests have little merit given that they focus on assessment of learning rather than assessment for learning and they are being misused for … political agendas."
The academics, who signed to support a "Say No to NAPLAN" campaign discussed in a subsequent section (“Other Comments”), said the tests cost $100 million a year, money that could help children with learning difficulties. This amount includes the effort schools put into preparing students for the survey with the disturbances allegedly caused by the diversion of resources away from the main curricula.
It is important to check whether there are any plausible cost calculations apart from the $100m estimate. The only alternative numbers we have are from a Senate inquiry in preparation for the 2013-14 Federal Budget estimates in a document responding to three questions on notice:
These estimates add to about $25m, a far cry from the $100m-plus estimated by the nearly 100 academics who signed the letter criticising the NAPLAN tests as having little merit. Unless education authorities increased the school budgets immediately, it implies that three-quarters of the total cost is carried largely by the schools. This "cost" is difficult to establish in the absence of details on the survey that resulted in the $100m-plus estimate.
Bethany Hiatt of The West Australian highlighted the cost issue by quoting a respected private school principal who said that the NAPLAN literacy and numeracy tests cost $45 for each student and the money could be better spent on other educational needs. In a strongly worded note to parents, the former chair of the Association of Heads of Independent Schools of Australia WA branch and Tranby College principal Jo Bednall said the NAPLAN tests were a waste of money.
She said Tranby spent $14,567 on the tests “last year”, "which is more than a little frustrating when the school doesn't have a choice about whether to participate or not".
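The cost figures quoted in this section can be put side by side with some simple arithmetic. The sketch below uses only the dollar amounts given in the text. The final step, which applies the $45 per-student figure to one million students, is purely our illustrative assumption; none of the sources quoted makes that calculation.

```python
# Figures quoted in the text (Australian dollars).
total_estimate = 100_000_000    # annual cost cited by the academics
senate_estimate = 25_000_000    # approximate sum of the Senate estimates
cost_per_student = 45           # per-student cost quoted by the Tranby principal
tranby_spend = 14_567           # Tranby College's reported spend

# Share of the $100m not covered by the Senate figures.
school_share = (total_estimate - senate_estimate) / total_estimate

# Number of students implied by Tranby's spend at $45 each.
implied_students = round(tranby_spend / cost_per_student)

# ASSUMPTION, for illustration only: one million students, all at $45 each.
assumed_students = 1_000_000
national_at_quoted_rate = cost_per_student * assumed_students

print(f"share not covered by Senate figures: {school_share:.0%}")
print(f"students implied by Tranby's spend: {implied_students}")
print(f"$45 x one million students: ${national_at_quoted_rate:,}")
```

On that assumption the per-student figure would scale to about $45m, which lies between the Senate figures and the $100m estimate. It does not settle which is right.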
There are plenty of comments from educationalists and others; in fact, we have rarely seen so much criticism of a public-sector Australian project, which is saying a lot. We conclude that little useful knowledge would be lost by cutting the NAPLAN surveys to a triennial basis, but it would also be beneficial to review the basic assumptions behind conducting the survey by harmonising it with the broader principles of educating Australian school students.
As noted above, hard evidence of the cost to schools to account for the $100m or more is difficult to find, but the academic evidence is hardly made out of thin air. A senior lecturer in education at Murdoch University, Perth, WA, Greg Thompson, was awarded a $375,000 three-year research project in February 2012 to investigate the effects of NAPLAN on school communities. A paper co-written with Allen G. Harbaugh reports on a preliminary survey of teachers in WA and SA in 2012 generally expressing concern about the impact on pedagogy and curriculum. The paper still did not quantify the associated costs but concentrated on gathering evidence on teacher concerns.
In conclusion, the $100m cost estimate looks credible, but it would be good to have it verified, and we would appreciate hearing from anyone who has evidence to do so.
Appendix 2 reviews two academic views, by Dr Justin Coulson and Professor John Polesel (the latter heading a literature review of what they call NAPLAN’s “annual bureaucratic extravagance”). Coulson discusses the NAPLAN tests that were launched in schools in May 2015 (results not yet published). Both concentrate on the annual $100m cost of the survey; both are highly critical and no further comments are needed here.
Appendix 3 (“And One from the Coalface”) is part of a Senate submission from a primary school teacher in Canberra, Remana Dearden. It concerns the difficulties schools have in explaining and administering the NAPLAN survey, and its irrelevance to teachers. Please turn to it both for her comments and the cartoon she included which seems to tell all that is wrong with the NAPLAN approach. Rightly or wrongly attributing the advice to Einstein, “our education system” demands that we all have to learn to climb a tree whether we are a monkey, fish, dog, bird or elephant — or we will live our lives believing that we are stupid.
Appendix 4 summarises comments to a Life Matters program on ABC Radio National, dated 19 May 2015. It is background only but illustrates many aspects which bear on the subject of this article but largely defy formal analysis. It is another set of signals from the coalface.
We continue with the presentation of selected sets of views intended to be representative. There are many more on the Internet. Like the Life Matters program, the first two comment on the 2015 tests.
Christopher Bantick teaches English literature at an Anglican grammar school for boys in Melbourne. His comments relate to ACARA’s reported decision to replace the paper-and-pencil approach with computerised tests from 2016. He notes that the students’ fear of the tests causes school principals to recommend that students are tutored for the NAPLAN tests to limit this fear. “My own students have been prepared for the NAPLAN tests this week; I have given them practice NAPLAN questions and assessed them. I expect the class to not be intimidated by the NAPLAN experience.”
“Where NAPLAN testing fails is that it is a Neanderthal blunt club of a tool to determine progress. … The essential point of NAPLAN is to identify schools, owing to their results over time, which may need special assistance and support with the establishment of skills. … As a secondary-school English teacher, my concern with NAPLAN testing is that it does not measure development. What about the boy who sits in the back row of my class who is struggling with his writing and reading? His development on a NAPLAN test will seem negligible but I know he has made significant progress. He, and many like him nationally, are understandably nervous about NAPLAN. Their individual academic growth will not be measured. Their confidence will not be measured and they will be clustered as a mere unit of a statistical number-crunched graph.”
“Let there be no mistake, computer marking would lessen my workload instantly. It would also give me printouts of neat data. But it would not allow me to see the germination of insight and understanding, or the surprise of something beautiful written from a tender heart that has taken courage to pen. That's what NAPLAN can't assess. Creativity can't be number crunched, and computers don't get it.”
Sydney education reporter Eryk Banshaw noted: “Psychologists across the country are still witnessing high-levels of stress in young children in the lead up to the NAPLAN exams despite repeated warnings that precautions need to be taken with students in high-pressure environments. As more than a million school children prepare to sit the exams on Tuesday, students as young as eight are getting so anxious they are vomiting.”
“NAPLAN chiefs are urging teachers and parents to help calm down children across the country while psychologists continue to report increased levels of illness, sleeplessness and school avoidance. The warnings have followed years of principals voicing their concern that the burden of NAPLAN is harder on their youngest students, with those in year three confronting an entirely new format of testing at such a young age.”
NAPLAN's chief administrator, Dr Stanley Rabinowitz, urged parents and teachers to "control the stress". "Treat it as a normal day and move on," he said.
The Banshaw article includes 10 multiple-choice questions from NAPLAN which can be viewed by clicking the website and scrolling to the end of it.
The “Say No to NAPLAN” campaign was formed by a group of Australian academics teaching in universities. “As a group we are appalled at the way in which the Commonwealth government has moved to a high stakes testing regime in the form of NAPLAN, despite international evidence that such approaches do not improve children’s learning outcomes” (David Hornsby, 16 December 2013). At the end of September 2015 the list of academic signatories to the letter of support to say no to NAPLAN had grown to 129 (counted at the website).
The 20 two-page papers supporting the campaign (mostly dated 2012) are listed here. They deal with a range of aspects and interested parties should take a personal look. We refer briefly to a few of the papers dealing with NAPLAN issues generally, plus two papers in the next section which are among those advocating a greater role for music and other arts in school education.
The first paper in the collection is ‘Inappropriate Uses of NAPLAN Results’ by Margaret Wu and David Hornsby. It reinforces the general criticism which our own analysis generates, and is representative of the general thrust of the “say no” campaign:
The second paper, again by Hornsby and Wu, is “Misleading everyone with statistics”. All scores have a margin of error, which the NAPLAN reports themselves show to be large. A score of, say, 488 may carry a 95% margin of error of ±54, so the “true” score may be as low as 434 or as high as 542 within the 95% confidence interval.
The tests are, as the first paper showed, not diagnostic and don’t provide the kind of information required to inform teaching programs. Furthermore, the margin of error in those tests is so high that students may appear erroneously to be performing worse in successive tests, that is, being in different achievement bands solely due to statistical uncertainty.
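The arithmetic behind this point can be illustrated with a short sketch. It is an illustration only: the score of 488 and the ±54 margin come from the example above, while the later score of 470, the assumption that the same margin applies to it, and the function names are all hypothetical.

```python
def interval(score, margin):
    """Return the (low, high) range implied by a score and its margin of error."""
    return score - margin, score + margin


def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


MARGIN = 54  # 95% margin of error quoted in the example above

first_test = interval(488, MARGIN)   # score from the example above
second_test = interval(470, MARGIN)  # hypothetical later score (assumption)

print(first_test)                          # (434, 542)
print(second_test)                         # (416, 524)
print(overlaps(first_test, second_test))   # True
```

Because the two ranges overlap, the apparent 18-point fall in this hypothetical case cannot be distinguished from statistical uncertainty.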
ABC Radio National’s morning program Life Matters on 5 September 2015 broadcast “Music education, more important than you think”. It was presented by Natasha Mitchell with Isabelle Summerson as producer. The initial paragraph on the website sets the stage, asking for listeners to call the program with their experiences:
“When you think about what classes are most important to children's formal education, what comes to mind? Is it maths? English? Maybe science or history? Chances are you probably didn't think of music, but maybe we need to rethink the value we place on music class. Maybe it's not just a bludge lesson but critically important to children's development of analytical skills and other areas of learning. Internationally renowned conductor and music educator, Richard Gill, believes music should be at the top of the food chain in children’s schooling (our emphasis). Agree? What are your memories of music class at school? Did you get a musical education in or out of school? How do you feel it influenced your relationship with music today?”
The three guests on the program covered a comprehensive range of backgrounds. Richard Gill, as already noted, is conductor and music director of the Victorian Opera Company, advocating the view that the school curriculum should be book-ended by music at one end and physical activities at the other. Margie Moore is known to this Knowledge Base in connection with the outback NSW festival Moorambilla Voices and was listed in the program as arts and music consultant with Music Australia’s Music. Count Us In. The trio of guests was completed by Randy Glazer, senior music facilitator at Street University, Mt Druitt in Sydney’s west, who works with young people and others outside and after school.
Music and the arts appear to have had something of a renaissance among commentators in Australia which may be partly associated with the negative reputation acquired by NAPLAN. At least we didn’t expect two of the 20 two-page papers in the Say No to NAPLAN campaign to be specifically advocating music and the arts. Much more is needed, of course, for the arts and cultural policy to be properly recognised.
Robyn Ewing is Professor of Teacher Education and the Arts at the University of Sydney. Her contribution to the “Say No to NAPLAN” campaign is ‘The risks of NAPLAN for the Arts in Education.’
“The Arts are as old as human civilization and they enrich our lives in a myriad of ways. Quality arts experiences can and should have a profound experience on children’s lives and life chances and therefore should be an important part of the school curriculum.”
“Over the last fifteen years a succession of international research reports have clearly demonstrated that children who engage in quality arts processes, activities and experiences achieve better academically; …”
“Yet with increasing emphasis on high stakes testing such as NAPLAN in Australian schools, the Arts will continue to be relegated to the margins of the mandated curriculum. Those subjects that … can be measured by multiple choice testing will be given increasing priority. Arts, poetry, creative writing, music-making, aesthetic appreciation and dramatic performance cannot easily be graded after a thirty to forty minute test.”
“The kind of engagement with ideas and processes inherent in all Arts disciplines (reading, dance, drama, literature, media arts, music and visual arts) helps develop children’s already rich imagination and creativity.”
“It is the arts processes and the making or creating rather than the final outcome … that is the most important learning because that making process will inform the next one and provide opportunities to extend and amplify understandings.”
Conductor and music director of the Victorian State Opera Richard Gill is hard-hitting in ‘Wake up Australia, or we’ll have a nation of unimaginative robots’ (2011).
“I want to make my stance very clear from the outset: NAPLAN and My School have NOTHING absolutely NOTHING to do with the education of a child. This abhorrent and insidious method of assessing children, teachers and their schools needs to stop now.”
“[They] will be subjected to a style of teaching which is directed exclusively to producing satisfactory results in the NAPLAN tests and consequently scoring high ratings with My School.”
“Screaming the words literacy and numeracy from Canberra does not constitute having an educational policy. In fact the race to become the most literate and numerate schools with the best rankings nationally is exacting a terrible price.”
“Evidence is now available that schools all over the country are cutting back on arts education to devote more time to subjects which will make children literate.”
“Activities used in teaching NAPLAN tests destroy individuality, stifle creativity, stultify thought… . The very things that promote literacy and numeracy are the arts, beginning with serious arts education in the early years. If we want a creative nation, an imaginative nation, a thinking nation and a nation of individuals, then we must increase the time for arts education especially music education. If we want a nation of non-imaginative robots who can do NAPLAN tests then we are well on the way to achieving that condition.”
“Music … requires the student to have a capacity to work in the abstract, an ability to work across several skill areas simultaneously and the ability to rationalise this verbally. Children’s involvement in musical activity has a profound effect on the development of a child’s general learning.”
“Wake up Australia before it’s too late.”
Music Australia’s senior writer and editor of its Music Journal Graham Strahle wrote on 14 August 2014:
“One of the continuing concerns about NAPLAN is that it takes teachers’ attention away from subjects that it does not test. That includes music, along with other arts subjects, foreign languages and history. The consequence is not just that children may be under-achieving in these subjects but that their whole development may be negatively impacted.”
Like the present paper, Strahle quotes Greg Thompson, Allen G. Harbaugh, Nicky Dulfer, Justin Coulson and Richard Gill for promoting arts education as against NAPLAN’s negative impact on music and creativity. Music Australia chief Chris Bowen on 30 September 2015 welcomed the incoming communications and arts minister in the Turnbull cabinet, Mitch Fifield, agreeing with Joanna Mendelssohn’s welcoming the shift from “a very patrician approach to the arts as items to be consumed by the professions in their leisure hours” during Senator George Brandis’s reign in the portfolio. Mendelssohn continued (20 September):
“In his communications ministry Fifield will have to deal with the NBN and the digital revolution, and this fits in very well with some of the concerns of different areas of the arts. … I only hope Brandis’ Program for Excellence in the Arts is quietly abandoned and due process is restored.”
The omens seem to point to a better future for lobbying against NAPLAN and in favour of a greater role for arts education and a stronger cultural policy in Australia.
This Knowledge Base has for several years pointed to the potential that culture has for an economy. Recognition of our cultural assets and the need to protect and enrich them is part of a lecture we gave as early as 2007. It demonstrated a close parallel with another vulnerable type of asset, ecological capital, because both have such large non-renewable elements. But Australia’s cultural and environmental assets are also rich and, given proper high-priority support, robust.
The talk in 2007 has passed the test of time in one essential respect, as shown by this definition: “Cultural capital is the sum total of a country’s tangible and intangible cultural assets not already counted as other forms of capital. Culture is here defined widely not just to include museums and concert halls, music and the arts, but also the ambience which makes up the cohesive power of a society, its traditions and norms.”
Our current set of music scenarios for the next two decades outlines the big difference in prospects between nurturing our rich cultural capital and government policy largely ignoring it. Professor Julianne Schultz is a leading observer of cultural policy in Australia. Her 2015 address Comparative Advantage. Culture, Citizenship and Soft Power testifies to that and is strongly recommended reading. Her views are remarkably similar to our own, given that they resulted from different development paths.
Julianne Schultz notes that politicians often equate cultural policy with the arts. “A more sophisticated way of framing this builds the links between the creation of art of intrinsic value, and the commercialisation of related products and services – rather than considering it as a binary option.”
While she was advising the Australian government during its attempt to replace the 1994 Creative Nation cultural plan in 2013, “considerable effort went into ways of defining the activity that could derive from investment in artists. The best analogy was to equate this to the investment in pure scientific research. It may have a commercial and instrumental value, but the research itself is of singular importance. In Australia this debate has been stymied by equating culture with arts defined quite narrowly as the non-commercial sector.”
We totally agree with Julianne Schultz that there are more efficient ways of "meeting the competition" from other countries than those pursued in general trade policy, currently in the doldrums partly because of the collapse of the mineral boom. They include assessing Australia's strengths in the cultural area, and how to benefit from it. Schultz points out that the cultural sector keeps growing strongly globally, in contrast to commodity trade. Australia is well placed by its geographic location and its own long cultural traditions in a variety of forms. But culture is treated as a political stepchild where a unified portfolio would aim at bringing together the various components to reinforce each other:
“In 2009 UNESCO devised and adopted a statistical framework that was designed to capture the scale of activities and by providing an agreed international definition, make comparisons, and assessments of success more robust. The framework takes the major areas of cultural activity and divides them into six broad cognate groups and two related domains: heritage (which includes archaeological, physical, environmental, structural and intangible dimensions), performance (theatre, music, festivals), visual arts (from fine art to photography), audio-visual (film, tv, video), publishing (books, newspapers, magazines, libraries), design (fashion, architecture, graphic design, advertising), tourism, sport.”
Relative to NAPLAN in the current context, culture including arts education is crowded out — the $100m annual cost leaves little space for other possibilities, though it should if the widespread critique of the scheme is to have any effect. Again, policy-makers may have to look more widely, taking in science and technology which do not explicitly enter the Schultz framework in the previous paragraph.
The legitimate question is, how good or bad are we really when it comes to science and technology? We have a world-class organisation called the CSIRO and fine universities everywhere. The issues confronting us are very big, all with cultural overtones: some associated with the prevailing trends towards inequality, others with a lack of political understanding that economics depends on natural and cultural ecology and not just on the "classical" inputs of machines and humans, to take but two. Our largest ally, the United States, has similar apparent weaknesses but it is also the world's most advanced community when it comes to tertiary research.
Australia and America certainly cannot afford to be complacent. Both need to think laterally and not just be hung up on matters like the international PISA tests, which seem to have scared our educators and their governmental masters so badly. Contributions like Julianne Schultz’s would help immensely if those on the hill would only listen. Upgrading the abysmally low priority given to cultural policy would be an essential first step.
From an arts-related perspective, cutting back on the annual $100m cost of conducting NAPLAN would leave more than ample resources to devote to the stepchild in Australian statistical research: music and the other arts, which are funded at a fraction of the cost of NAPLAN. It would enable Australia to rid itself of the overtone of fear that it is lagging behind other countries in education quality, and to start concentrating on its strengths in order to conduct an internationally adapted cultural policy that incorporates all its aspects and, as Schultz recommends, unifies the portfolio in a common promotional thrust.
The main hope is that the bad days characterised by the Commonwealth budgets in 2015 and 2016 are coming to an end. They saw our main funding organisation, the Australia Council, being drastically cut back (especially the most creative activities showing new ways for the arts to develop), though the new Turnbull government has yet to be tested on this.
This paper goes to greater lengths than most other topics covered in the Knowledge Base because of the complexities and deep consequences associated with the issues. To ease reading while retaining the case for change, much of the material has been relegated to footnotes, appendices and background commentary. The key narrative in the body of the report is all considered necessary for understanding how and why the topic has become so dominant, and why it is particularly important for arts-related school education including music.
The following dot points represent the synthesis. See “Our Own Critique to This Point” for a more elaborate version.
File:Naplan analysis.pdf These statistics were compiled in support of the analysis presented in the body of the report. The basic structure is determined by the achievement data for each year-level, for reading and numeracy respectively. We designed eight worksheets based on the NAPLAN 2014 data: “R Year 9” (and 7, 5 and 3) and “N Year 9” (and 7, 5 and 3).
The structure of each worksheet is identical. The basic top left-half section shows annual statistics 2008-14 for all students, and by gender, Indigenous status and state/territory. These are averaged in the last column of this section, printed in red. The analysis of seven years of data covers each school year in the annual NAPLAN surveys. Please click the pdf icon to show four tables covering reading and four covering numeracy, each on a separate page. Software such as Adobe Acrobat is available commercially to convert the file back to Excel if so required. Alternatively we can email the Excel file (NAPLAN analysis.xlsx) to interested readers.
The gender difference is based on the arithmetic averages of male and female achievements, because we are interested in the absolute differences in the distributions and therefore treat each gender as a separate population. The gender analysis is based on the number of male achievements as a ratio of total achievements.
The Indigenous achievements are shown as the ratio to non-Indigenous achievements, again based on arithmetic averages for the two main groups.
The state and territory findings are related to the arithmetic average for all eight states and territories in the bottom part of the worksheet. The columns “relative to average” (printed in green in the right-hand part of each worksheet) provide the basis for deriving conventional standard deviations and medians for each statistical series (printed in red at the extreme right). The analysis is summarised in the following four small tables.
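The derivation of the “relative to average” columns and their summary statistics can be sketched as follows. This is one plausible reading of the worksheet layout, not a reproduction of it: the seven scores are hypothetical placeholders rather than NAPLAN figures, the function name is ours, and the choice of the population standard deviation (rather than the sample version) is an assumption.

```python
from statistics import mean, median, pstdev

# Hypothetical mean scores for one series across the seven survey years
# 2008-14. These are placeholders, NOT actual NAPLAN results.
scores = [400.0, 410.0, 405.0, 415.0, 420.0, 418.0, 418.0]


def relative_to_average(values):
    """Express each observation as a ratio to the series average."""
    avg = mean(values)
    return [v / avg for v in values]


ratios = relative_to_average(scores)

summary = {
    "sd": pstdev(ratios),      # population SD of the ratios (assumption)
    "median": median(ratios),  # median ratio, close to 1.0 for a flat series
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A small SD and a median close to 100% of the average indicate a series that barely moves from year to year, which is the pattern described for Table A1.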
Table A1 shows average achievement for reading and numeracy separately, for all students for each of the seven survey years, and year-level. Averages, standard deviations and medians are derived from this. The standard deviation measure of statistical variability (SD) is generally tiny but it does indicate what might be interpreted as trends as shown by the larger SDs for years 5 and 3 in reading and year 3 in numeracy. NAPLAN’s use of “nature of the difference” in Table 7 of the main report attempts to take this further, but the concept is not justified and conflicts with scientific method. The median observations in Table A1 also remain close to each average, ranging from 99.85% for year 9 numeracy to 100.40% for both reading and numeracy in year 3.
The “nature of the difference” concept adopted by NAPLAN distinguishes between five measures of statistical significance of change in average achievement between a base and current year (2008 and 2014 here). We regard its use as incorrect, because under the generally accepted null hypothesis the initial and current years are part of the same group (“population”). The current average achievement can be “substantially above” or just moderately “above” the initial finding, it can be close to it or not statistically significantly different, or it can be “below” or “substantially below”. In Table 7, no results are “below” or “substantially below” the base year, nor are any “substantially above”. Of the 52 criteria measured in Table 7, 33 showed no statistically significant changes for reading, but 19 moved upwards to a moderate extent. For numeracy, as many as 46 changes were not statistically significant, leaving only six with moderate increases.
Any trend judging from this is upward. Most of the findings judged statistically significant from the “nature of the difference” tests relate to the geographical distributions, concentrating on Queensland, WA, Tasmania and the ACT for both literacy and numeracy. Significance was also attributed to reading in years 3 and 5 for the total sample but not extending to years 7 and 9, and applied to each gender and Indigenous status. It might be argued that these categories are plausibly leading some modest growth in average achievement — to some extent representing a catch-up with other states and territories and with non-indigenous people still having a long way to go. The argument fails if the “nature of the difference” test is statistically invalid.
The largest conventional SD findings for reading at year 3 applied to Queensland and the NT, and to Indigenous people. Probably more significantly, all SDs were substantially reduced for reading as year-levels progressed, and more modestly for numeracy. Crude average SDs from the Appendix 1 worksheets for reading were .019 in year 3, .017 in year 5, .008 in year 7, and .006 in year 9. The comparable findings for numeracy were .011 in both years 3 and 5, and .007 in years 7 and 9.
The “null” hypothesis discussed in the initial discussion headed “Flat Annual Trends” is generally assumed to be the appropriate one until evidence indicates otherwise. The usual null hypothesis is that sample observations result purely from chance. The “alternative” hypothesis is that sample observations are influenced by some non-random cause, which must be explicitly justified.
The null hypothesis is the more straightforward: the only statistical probabilities to be calculated are those of chance effects. The null hypothesis must be tested, and rejected, before alternative hypotheses are introduced, including how, in effect, the NAPLAN population in the current year could come to differ from the population in the base year. We can find no evidence that the “nature of the difference” analysis followed a rejection of the null hypothesis that the nature of the sample has not changed.
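As a minimal sketch of what testing the null hypothesis involves, the following applies a standard textbook two-sample z-test to the difference between a base-year and a current-year mean. It is not the method used by NAPLAN or in our worksheets; the function name and all input figures are hypothetical and chosen purely for illustration.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-sided p) for the difference between two group means."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # SE of the difference
    z = (mean_b - mean_a) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-curve p-value
    return z, p


# Hypothetical inputs (NOT NAPLAN data): base year versus current year.
z, p = two_sample_z(mean_a=500.0, sd_a=70.0, n_a=1000,
                    mean_b=503.0, sd_b=70.0, n_b=1000)

ALPHA = 0.05  # conventional significance level (assumption)
reject_null = p < ALPHA

print(f"z = {z:.2f}, p = {p:.3f}, reject null: {reject_null}")
```

With these illustrative inputs the null hypothesis of no change is not rejected, so there would be no basis for going on to classify the difference further.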
The main conclusion is that adding other variables such as those in the “nature of the difference” analysis to the statistical analysis based on conventional standard deviations and measuring the significance of differences is not legitimate without prior testing. There are some relatively minor changes including improvements in the assessment of students in Queensland, WA, Tasmania and perhaps the two territories, and among Indigenous students, which are causing performance standards in these groups to get closer to other states and to non-indigenous people. There is still a long way to go, and adopting a method which is in effect adding a new element without proper justification is not the way to proceed.
Some differences are real, of course, and show up year after year – in fact, the annual NAPLAN numbers show remarkably consistent patterns which vouch for their statistical compatibility. In Table A2, girls consistently exceed the average reading achievements of boys by 1-2%, but the average boy is marginally (1%) better at numeracy.
Table A3 shows that Indigenous students consistently lag considerably behind non-indigenous students, but the difference is gradually reduced between year 3 and year 9 — possibly associated with dropout of the most disadvantaged groups (because they don’t take the test, and allegedly have been occasionally encouraged not to sit for the test in the interest of better overall scores for the school, or because some have left school by year 9).
There are also real differences in the performance of students in different states and territories (Table A4). The ACT consistently leads, with Victoria and then NSW following, ahead of Queensland, SA, WA and Tasmania. The Northern Territory is consistently at the bottom.
All this is of course valuable, as acknowledged in the body of this report, but it doesn’t warrant a constant check through an expensive annual survey.
Honorary Research Fellow at the Australian Institute of Business Wellbeing of the University of Wollongong Dr Justin Coulson is a psychology researcher, author and speaker. His opinion piece in The Daily Telegraph 20 May 2015 is headed, ‘Just admit it: Naplan is a complete failure’. Full article here.
“NAPLAN 2015 begins on May 12 in schools around Australia. This annual bureaucratic extravagance in the name of quality education and enhanced transparency will, by some conservative estimates, cost Australian taxpayers $100 million.”
“What do we get for our $100 million? Improved teaching standards? Greater insight into school performance? Increased student interest in learning? Enhanced student resources or boosts to teacher and student wellbeing? Better resources?”
“Sadly, Naplan delivers little, if any, educational value. In a report for the Whitlam Institute, Professor John Polesel described how the test is shown to possess poor reliability. The tests provide poor quality data. There is widespread anecdotal evidence of cheating and other breaches of testing protocol (such as schools asking poor-performing students to remain at home so as not to lower the school’s score on the My School website).” End of Coulson quote. Comments from the Polesel review follow.
In 2012, Professor Polesel headed a team with colleagues Nicky Dulfer and Dr Malcolm Turnbull from the University of Melbourne which completed a literature review commissioned by the Whitlam Institute of the University of Western Sydney.
“Considerable evidence may be found in the international literature regarding the negative impact of high stakes testing on students’ well-being, including its potential to impact on students’ self-esteem and lower teachers’ expectations of children. There is also evidence of negative effects on service delivery and professional-parent relationships and stress, anxiety, pressure and fear experienced by students.”
“Detailed findings such as these are not available in the Australian context, although similar concerns regarding NAPLAN have emerged from various sources, including a recent Australian survey of principals and teachers in independent schools, the recent Senate hearing into NAPLAN testing and reporting and a recent Queensland Studies Authority report, which expressed concern at the capacity of full cohort testing to lower the self-esteem, self-image and long-term confidence of under-performing students, thus widening the gap between them and higher achieving peers.”
We add that the adverse evidence on NAPLAN has accumulated since the Polesel review was completed in 2012.
“I teach Year 5/6 at a public primary school in the A.C.T. This submission is based on my experience with NAPLAN testing (I administered NAPLAN to the Year 5 cohort in my school this year and have taught Year 5 students each year of my teaching career so far) and discussions with teaching colleagues. The main purpose of this submission is to make the point that NAPLAN is not a useful diagnostic assessment tool for teachers and it uses valuable funding which could be better spent elsewhere.”
“The results of NAPLAN do not tell me, or any of the other teachers in my school, anything we do not already know about the students we teach.”
“This quote and cartoon best sums up how effective a standardised test is in assessing the true abilities of a child (there is some contention surrounding whether or not Einstein actually said it but it is still applicable):”
“Last year, as a staff, we analysed our NAPLAN results to determine which areas we needed to ‘target’ for future teaching. The test was administered in May and we were doing our analysis more than 3 months later. Spending almost an hour trying to work out why so many of the year 3 students chose the same wrong answer for a multiple choice question about the use of an ellipsis was, in my opinion, a complete waste of time which would have been better spent on planning lessons or preparing resources.”
“The cost of implementing NAPLAN each year has been estimated to be around $100 million. My school can no longer afford to have a teacher librarian on staff. $100 million would certainly go a long way towards ensuring every school had a teacher librarian who fully utilised the limitless potential of the school library to foster a love of literacy and learning in every student.”
“There is A LOT more to the success and effectiveness of a school than the results of four tests administered once a year that narrowly assess a miniscule part of what children should be able to do and what they should be learning. NAPLAN provides interesting and useful population data but it certainly does not give a true indication of the knowledge and ability of a child. Education is about the whole student, and the stakeholders of our education system have a duty to provide our students with a world class one.”
Remana Dearden, 7 June 2013: From ‘The effectiveness of the National Assessment Program - Literacy and Numeracy: Submission 68’ (to NAPLAN Senate Inquiry).
The ABC morning radio program Life Matters featured NAPLAN 2015 on 19 May, shortly after the conclusion of the tests. It was presented by Natasha Mitchell and produced by Linda Raine. It prompted a large number of comments in the week after the tests, raising a number of points that couldn’t be treated in the body of this paper. The excerpts do not pretend to be analytic. The ABC website contains the full comments and the audio of the program.
Hans Hoegh-Guldberg, on Knowledge Base 2 October 2015.