Thursday, August 12, 2010

Interesting Plot On Grade Inflation.

The graph above, taken from the blog In The Dark, shows what percentage of students got a grade of A on their A levels. I believe "A-levels" are exams taken by students who want to go to UK universities. From Wikipedia:

A levels are usually studied over a two year period and are widely recognised around the world, as well as being the standard entry qualification for assessing the suitability of applicants for academic courses in UK universities.
The author of the blog, Peter Coles, has an interesting write-up that everyone should really go and read. I will just post one snippet:
Nowadays, on average, about 26 per cent of students taking an A-level get a grade A. When I took mine (in 1981, if you must ask) the fraction getting an A was about 9%. It’s scary to think that I belong to a generation that must be so much less intelligent than the current one. Or could it be – dare I say it? – that A-level examinations might be getting easier?
Looking at the graph makes it clear that something happened around the mid-1980s that initiated an almost linear growth in the percentage of A-grades. I don’t know what will happen when the results come out next week, but it’s a reasonably safe bet that the trend will continue.
I am sure similar grade inflation is happening in the US.  Furthermore, being the younger generation that has benefited from such grade inflation, I can assure Peter our generation is not overwhelmingly more intelligent.

If I were the "grade-czar", I would assign grades based on standard deviations: A-type grades for students more than one sigma above the mean, B-type for those between the mean and +1 sigma, C-type for those between -1 sigma and the mean, D-type for those below -1 sigma, and F for someone who just wouldn't try or gave up completely. I would also not accept D credit counting toward graduation.

This way, an A student means the same from one generation to the next: one who repeatedly performs at a level one standard deviation above his peers.
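
To make the proposal concrete, here is a minimal Python sketch of the banding described above. The function names `grade_by_sigma` and `expected_shares` are made up for illustration, the choice of the population standard deviation is an assumption, and the expected shares only hold if raw scores happen to be normally distributed, which the proposal itself does not require. F is left out because it is meant for students who don't try, not for a score band.

```python
from statistics import NormalDist, mean, pstdev

def grade_by_sigma(scores):
    """Assign A/B/C/D by distance from the class mean, in standard deviations.

    Hypothetical helper: F is not assigned here, since the proposal reserves it
    for students who gave up rather than for a range of scores.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of this class
    grades = []
    for s in scores:
        # If everyone has the same score, treat every student as average.
        z = (s - mu) / sigma if sigma > 0 else 0.0
        if z > 1:
            grades.append("A")   # more than +1 sigma
        elif z >= 0:
            grades.append("B")   # mean to +1 sigma
        elif z >= -1:
            grades.append("C")   # -1 sigma to mean
        else:
            grades.append("D")   # below -1 sigma
    return grades

def expected_shares():
    """Share of students per band IF scores were exactly normal (an assumption)."""
    nd = NormalDist()
    upper_tail = 1 - nd.cdf(1)
    middle = nd.cdf(1) - nd.cdf(0)
    return {"A": upper_tail, "B": middle, "C": middle, "D": nd.cdf(-1)}
```

Under the normal assumption, `expected_shares()` gives roughly 16% A's, 34% B's, 34% C's and 16% D's, and those shares stay fixed from year to year no matter how the raw scores drift.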


  1. I'm not sure I understand your proposal. Is the distribution of raw grades assumed to be normal? Are you recommending As for > +1 sd, Bs for 0 to 1 sd, Cs for -1 to 0 sd, Ds for < -1 sd, and Fs for the people who don't show up? How does the F crowd fit? They will clearly have to go somewhere in the distribution. So, maybe it should be Ds for -2 to -1 sd and Fs for < -2 sd? That would produce 16% As, 34% Bs, 34% Cs, 14% Ds, and 2% Fs.

    What is the goal of education and testing? And how would a distributional approach help? I think I prefer having a well-thought-out standard of competency or set of standards defining each of the grade levels and then letting the distribution fall however it happens to fall. I prefer grades to reflect the absolute knowledge/ability of students, not their relative knowledge/ability. Hence, I am skeptical of approaches based on standard deviations or natural breaks or strong curves.

  2. Jonathan,

    The main point is to see the trend of grade inflation. The issue of how to disperse grades is a side note.

    Of course I prefer a "well-thought-out...". This is a blog post and so it isn't complete. I'm not going to cover all the bases in a two paragraph side-note.

    That said, standards change. What is important now isn't always what was important 100 years ago. Furthermore, as the plot in this post shows, standards changing makes it so that grades become meaningless. An "A" today does not mean what it meant a generation ago.

    Being one standard deviation above your peers is something that never changes. No matter where you go or what generation you live in, if you were an A student in my grading book, you would be someone who while at college was consistently a standard deviation above your peers. That statement will always have a well defined meaning.

    Instead, grades today are so subjective that you don't know what an A student means anymore. Nothing is well defined. You don't know how to compare one with past generations or even with others in the same generation.

  3. All that means that now students make better efforts to excel, and also they have much better tools to excel with, so there are more who meet A level.

    Take an analogy with brick layers. If we applied the statistical bullshit you proposed, only a few would be craftspersons, even though there may be lots of better brick layers than, say, 200 years ago!

    Here is my way to look at it: if a student works diligently, an A grade is given; the grades mostly measure the level of effort put in by the student, and a bit of luck in the exams.

    I would really like to see 50% or more getting A levels, demonstrating the seriousness of the students.

    You guys need to understand how to apply statistics; your A levels should be rescinded!

  4. Anonymous,

    So you are proposing an A is like a standard of certification. Meaning, using your bricklayer analogy, receiving an A means you are proficient at laying bricks and so if a higher percentage of students become able to adequately lay bricks then a higher percentage should get an A. I think this is what you mean.

    First off, I think that is fine, with one exception. I don't know about brick laying per se, but for many occupations the number of people proficient enough to get a task done is larger than the number of available positions.

    I think in the real world it works more like this (sticking to the bricklayer analogy): you may have a class of 100 people and they all work hard, and so 90 of them become qualified to lay bricks. Fine, but the employer only has 30 positions. Now who does he hire?

    With my grading procedure he sees immediately, of all the potential hires, who are the best brick layers, who are the worst, and of those left who fall slightly above average and slightly below average.

  5. No, the point about how to disperse grades is not a side note. It is the very heart of the issue, since it matters for how the graph is interpreted. If you go back and read the linked post and the BBC post linked from there, you'll find that the change in the 1980s that begins the upward trend was going from something resembling your suggestion, where a set percentage of students qualify at the A-level, to something resembling my suggestion, where a set standard is applied to all students and the percentage that passes falls wherever it falls. The change in the system was reasonable: set a standard and test to see whether people meet the standard.

    We now have two ways of reading the graph: progress or grade inflation. Which way to read the graph depends on whether the standard has been getting steadily easier since the 1980s. If the standard has been constant, then the graph shows steady improvement in the sense that more students are now able to meet that standard (as a percentage) than were able to meet it in 1990. If the standard has not been constant, then the graph is meaningless until we know how the standard has been changing (and my suggestion has not been correctly implemented).

    The linked post points to an article in the Chronicle of Higher Education that claims (without any data I could lay my hands on) that students who earned a B on the 1999 A-level exams performed no better than those who failed the A-level exams in 1991 on a college-entrance diagnostic exam that hadn't changed in that time. That looks bad, but it might not be incompatible with having a consistent A-level standard, provided being able to meet the A-level standard is no help on the diagnostic exam. If the A-level standard has been set too low and the poor- or middle-ability students have been steadily improving over the last 25-30 years, you might very well see such an effect.

    Whether the standard now being deployed is reasonable or rigorous enough or high enough is a question not really answered by the graph. And for that reason, the graph by itself doesn't show that the grades are meaningless. It doesn't really show *anything* without a bunch of contextual assumptions that haven't been filled in. Have the tests changed? Have the instructors changed? Has the tutoring system changed? Has the student body changed? Has government funding of education changed? And if any of these *have* changed, how? etc. etc. etc.

    The graph only shows grade inflation if you think the composition of the student body, the tutoring system, the instructors, and the goal of the tests have been roughly the same since the 1970s, AND the standard by which the goal is assessed has been lowered steadily since the mid 1980s.

    You could "fix" the system by simply setting a percentage of people who ought to qualify at the A level, as you suggested. But if the test isn't sensitive enough, even that might not help. After all, a perfect score on the GRE math section only puts you in the 90th percentile (approximately) of test takers.

    In any event, grading on a sliding scale is bad pedagogy. And it just isn't true that your proposal decreases inter-generational subjectivity of grades. If the population changes in any significant way with time, then being one standard deviation above the mean in generation 1 will be very different from being one standard deviation above the mean in generation 20. If you know what the test looks like and what the A-level pass requirements are, then yes, you do know what an A-level pass means today. The fact that we don't know those things doesn't mean that the A-level pass isn't perfectly well-defined or that grades aren't comparable intra-generationally.

  6. I liked the two questions that Jonathan had, "What is the goal of education and testing? And how would a distributional approach help?" Because these questions determine whether or not the grade inflation is a problem. If we are only interested in getting the students to some absolute level of understanding then grade inflation is not a problem, because ultimately the goal is to get as many people past the minimum requirement.

    But if we are trying to separate out "the smartest" or at least those who test better than everyone else, then we should expect to have the same percentage of people getting A's, and grade inflation would be a problem. So if you only want the "Top 10%" then by the standard metric that cannot be measured. Thus the grade scale has become "useless" because it does not accomplish its designed task of separating out a specific group of people from the rest.

    It just depends on what one is trying to accomplish. If we are trying to bring the students to some absolute level then an increase in the number of A's is a desired effect, but if we are trying to separate the students into levels of proficiency, then the increase in A's is a problem.

    And lastly, as for my opinion as to what this data shows: because there is such a sudden change in the number of A's beginning in the mid-1980s, that indicates a change in policy, and I don't think that change has been clearly expressed, understood or communicated. I think it indicates a change from the relative measurement of students to a more absolute measurement of students, but because the change was gradual rather than sudden (i.e. the Ministry of Education announcing a whole new grading system for A-levels), this indicates that the shift was made by steadily lowering the expectation for what makes an A. So apparently, while the system presents itself as separating the students into different levels of proficiency, those levels have become defined in such a way that they have lost their meaning. This means that if a university administrator looks at an application, they look at the A-level scores and in their mind they interpret them as meaning, "This student is at least better than X% of students that graduated," but in reality it now means that the student has passed a minimum level of aptitude, which is not what was expected by the administrator. Thus in a sense the grades are a lie because they do not represent what is expected of them.

  7. I really appreciate the comments and people have raised good points. Thank you all for making them. I think Jonathan brings up a good point that, despite what I said, how grades are dispersed is at the heart of the issue.

    So far I still like the idea of using standard deviations; however, I welcome the well-thought-out alternatives being presented.

  8. I can say from having served on my department's graduate admissions committee that the change from "grades as a ranking system" to "grades as a certificate of proficiency" (and it's not just on A-levels in the U.K.) makes it almost impossible to use grades for admissions purposes. Our department has kept informal statistics that show that as long as an incoming student has above a 3.2 GPA (we require at least a 3.0) when they enter the program there is no correlation between undergraduate GPA and comprehensive exam scores, graduation rates, or time to graduation.

  9. One other point - there is an advantage in going easy on students. One of the professors in the physics department at CU has shown that, unsurprisingly, majors where the first two years of classes give out higher grades tend to have far more students than majors that give out lower grades on average over their first two years of courses. Essentially, students choose to study what they perceive themselves to be good at, and that perception is not comparative but rather based on grades. So if you want more STEM majors, one way to do it is to give everybody A's in calculus.

  10. While I'm here, I might also praise something useful that's done by a handful of universities around the country, notably Cornell: they publish the median grade in a given course on the student's transcripts along with the grade earned. That way the grade can at the same time be comparative and recognize a level of achievement.

  11. Wow, that is cool Cornell did the median thing. It would be fun to sit on an admissions committee once just to see what various schools do.

  12. Joseph Smidt:

    We have very small classes in most colleges in Physics, Math and Engineering. If we apply just the statistics, we wind up with Garrison Keillor's intro to his program, "Where all the children are above average".

    What is wrong with a standard of proficiency? Professional Engineering exams try to do just that. Once you pass a PE, you are qualified to sign and certify. The same thing is true of bar exams for lawyers. What I have proposed is meritocracy, not number control.

    Statistics allows us to bs away a position and that is what you are doing. If a prof through your work deems you qualified for an A, that is it. That is the only standard one needs.

  13. Anonymous,

    You are right. It is hard to do adequate statistics if there are only a small number of students.

  14. Joseph Smidt,

    There is a similar problem in the work environment. A manager usually is responsible for about seven people, with a range from 4 to 11. He/she winds up rating each on either five or seven levels, and the last two or three spell trouble. Now if you were to build a crack team of five physics guys for five different specialties and you were forced to rate them such that two are least desirable... I hope you see the problem.

  15. Anonymous,

    The average college class has far more than 4 to 11 students. In many college classes you have 300 students. The odds of getting a significantly better or worse group are much higher with a group of 11 than with a group of 300.

    What this really comes down to is a view of what the purpose of grades is. If grades serve primarily a comparison purpose then grade inflation is simply unacceptable. If grades signify levels of proficiency then grade inflation isn't a problem at all, assuming that student performance is improving.

  16. I did my grad studies at Tufts. Average class size was SIX; largest class was in Fluid Mechanics, 21.

    Grade inflation at BS level does not mean much. In less than three years, it becomes what have you done lately syndrome, and nobody looks at the grades.

  17. I should have also said that you are correct to observe: Students are getting better because more of them work harder, and their efforts should not be averaged out! Give them an A, if they worked for it.

  18. Anonymous,

    What I would have liked to have seen is BYU suppress freshman grades the way schools like Caltech do. Then my GPA would have *really* been something. As some schools like Caltech understand, sometimes good students make a few mistakes when they first show up, not being used to college and living away from parents for the first time, etc. It is unfortunate that those initial grades drag down the rest of your GPA, especially given the fact that your freshman classes are probably the most irrelevant in terms of how well you understand physics.

  19. "Grade inflation at BS level does not mean much. In less than three years, it becomes what have you done lately syndrome, and nobody looks at the grades."

    By that argument grade inflation at the graduate level doesn't mean much either since nobody cares 3 years later. More to the point, I tend to agree when one has small class sizes it's really not fair to require that some students get C's or A's. What I do think is fair is making the long-term average over many small classes constant in time so that grades can be used comparatively.

  20. Joe,
    Just a note - MIT does the same thing with freshman grades as Caltech.

    Most grad schools solve this problem by putting much more weight on upper division physics and math. No graduate program really cares how you did on your freshman history class but almost all of them care how you did in E&M and quantum mechanics.
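
The class-size argument in the thread above (a group of 4 to 11 versus a lecture of 300) can be put in rough numbers. The sketch below is only an illustration under stated assumptions: students are treated as independent draws from a normal population, and the function name `prob_class_mean_off` and the 0.25 standard deviation threshold are arbitrary choices made here, not anything from the discussion. It uses the fact that the mean of a class of n students has a standard error of sigma/sqrt(n).

```python
from math import sqrt
from statistics import NormalDist

def prob_class_mean_off(n, threshold_sd=0.25):
    """Probability that a class of n students has a mean ability more than
    threshold_sd population standard deviations away from the population mean.

    Assumes independent draws from a normal population (an idealization).
    """
    # The class mean has standard error sigma / sqrt(n), so the threshold
    # expressed in standard errors is threshold_sd * sqrt(n).
    z = threshold_sd * sqrt(n)
    return 2 * (1 - NormalDist().cdf(z))

if __name__ == "__main__":
    for n in (5, 11, 21, 300):
        print(n, prob_class_mean_off(n))
```

With these assumptions an unusually strong or weak class of 11 is quite common, while for 300 students it is vanishingly rare, which is why curving a tiny class against itself is much less defensible than curving a large lecture.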

