Not too long ago, researchers at a large Midwestern university arranged to have a speaker give the same lecture to 154 undergraduates enrolled in eight sections of a required course. Or almost the same lecture: One small detail was different in half of the sections, so let’s call them lecture A and lecture B. Afterward, the students filled out a “teacher evaluation form.” The first part asked about the teacher’s competence and character; the second asked how much the students had learned. Part three consisted of four open-ended questions.

Most students had good things to say about either lecture: 97 percent of them had some positive comment on lecture A, 96 percent on lecture B, though lecture A received more positive comments than lecture B (412 vs. 339). When it came to negative comments, though, a starkly different picture emerged: Just 30 percent of the students had critical things to say about lecture A, but 79 percent disliked something about lecture B. Lecture A received a total of 52 negative comments. Lecture B, 205 — almost four times as many.

So what was the difference? Same lecture, same speaker, same class. But in lecture A, the male teacher referred in passing to “my partner Jennifer.” In lecture B, he casually mentioned “my partner Jason.” The result? Students reported that they had learned much less from the “gay” teacher than from the “straight” teacher: On a 10-point scale, the mean for the purportedly gay teacher was 5.85; for the purportedly straight one, 8.51. And in response to the question of whether the school should hire the instructor, an overwhelming 93 percent of students would have “unquestionably” hired his straight incarnation. But just 30 percent of students said they “might” hire the gay one, with a paltry 8 percent saying “definitely.”

Would the same shocking picture emerge at the University now? I sincerely hope not. But we don’t know: The University does not collect data on our faculty’s sexual orientation (thank you, University). And even if it did, it would not tell you very much: Only controlled, randomized studies like the one above can give you any reliable insight into the effect bias has on student evaluations. And we have not done such studies at the University. If we had, we would have looked at gender, race and national origin. Or even (though I shudder to think how) at “attractiveness” — hot teachers (please insert your own scare quotes) get better ratings. There’s a reason Rate My Professors openly (and repugnantly) invites students to comment on their professors’ looks.

Of course, none of us believe that we are those people who would judge a teacher on the basis of his or her skin color, ab definition, erotic preference or comic timing. But I know this: If our more than 40,000 students (or our thousands of faculty, staff and administrators) were free of misogyny, racism, homophobia, Islamophobia and whatever other bias you are worried about (if you worry about these things, as I believe we should), an army of social science researchers would descend on our lovely town to study this unicorn population.

Last month, the Daily published a well-balanced piece on our current collective discussion about publicizing course evaluations. Here is one paragraph that caught my attention:

“(Mika) LaVaque-Manty said based on his research, individual bias due to gender and race is evident in classrooms, both in open-ended comments and quantitative measurements. However, he also stressed that those biases tend to disappear from the overall quantitative data, except for some instances of gender bias appearing when data is analyzed at the departmental level.”

I have the highest respect for my colleague Mika, but I’m not sure what to make of this. Bias does not “disappear”; it only gets masked. A faculty member who receives lower rankings because she is Black, or gay, or wearing a hijab will not feel any better because a cursory look at aggregate data paints a rosier picture.

But unconscious bias and the much rarer instances of open bigotry are not the only reasons we should think very carefully about student evaluations, and even more carefully about making them public. Study after study confirms that rankings correlate most strongly with a single measure: “My expected grade is….” This is true at the University, and it is particularly true at the polar ends: On average, the most lenient graders get the highest rankings (which doesn’t mean that leniently graded classes cannot be fantastic for all sorts of reasons), and the harshest ones get the lowest rankings (and again, a strictly graded class could of course also be sub-par for any number of unrelated reasons). This is not surprising: Why would you rank highly a professor who just messed up your GPA? Why would you not think kind thoughts about the one who made your life easier? And why would you not want to know in advance which is which?

The consequences, however, might well be dire. I recently saw a list of our peer institutions that made their evaluations public. Most of them can be found in the upper right corner of this graph:

[Graph not reproduced here.]


Causation or correlation? Impossible to tell. But I know this: According to the research, the quickest way to improve your student evaluations is to give everybody an A. Or to cynically game the system. One faculty member told me, to my horror, “Oh, I grade the midterms really leniently and then I hit them on the final, after they hand in their evals.”

Students have told us that they are particularly interested in the answers to two questions: “expected grade” and “workload.” I get it. It’s perfectly legitimate to try and balance your workload. It’s perfectly legitimate to worry about your GPA. But it is less clear to me whether we, as an institution of higher learning, should assist students in finding the easiest classes. Worst-case scenario: the easier the class, the higher the enrollments, to the detriment of the difficult, challenging and perhaps uncomfortable classes that can lead to real intellectual growth. Conversely, what about faculty who’d prefer to teach smaller classes? Less grading, more time for research, right? Easy fix: Grade on a C-minus curve. If you have a thick skin and don’t care much about your public profile, your own workload just got lighter.

Don’t get me wrong. I’m not suggesting that good grades reflect problematic teaching or that a harsh grader is automatically a more challenging teacher — far from it. Personally, I’d be happy to do away with grades altogether. I’m also not suggesting that evaluations don’t tell you anything worth knowing or that I don’t trust the great majority of students. To be fair, I have gotten a great deal of valuable feedback from my own sets (in particular from the open comments). As a department chair, I do look at evaluations as one way to get a sense of how my colleagues are doing in the classroom. But like all my fellow chairs and like everybody I know who is engaged in hiring or promotion decisions, I treat that information with considerable caution, as data that need to be read in context. 

That said, I believe strongly that students should be able to find out more about the classes they are taking. But student evaluations are just a small piece of the puzzle, and not a particularly reliable one. In fact, a task force at the University recently concluded that “in courses (with) fewer than 50 students, regardless of the response rate, the evaluation is close to random.” Since just 18 percent of our classes enroll more than 50 students, that means that in the remaining 82 percent, the evaluations may tell you next to nothing about the quality of the education you are likely to receive.

In my personal and entirely unscientific experience, this rings true. I got one of my highest ratings ever for a class that I know was terrible. Not because I don’t generally care about teaching, but because my brother had just died, and I was consumed by grief. So this time around, I did not care much about teaching. I didn’t care much about my students. I didn’t care much about anything at all. I graded on auto-pilot. I gave almost everybody an A, not because I wanted good evaluations but because I didn’t want to punish students for what I knew was my problem, not theirs. If my evaluations had been public, you would have learned that most of my students “strongly agreed” that they had attended an “excellent class” by an “excellent teacher.” They did not. They had attended a badly prepared and badly taught class by a very sad teacher whose thoughts were elsewhere. 

Is there better information out there? Not really, and that’s a significant problem that students should indeed clamor to see addressed. Here is what I believe would help: first, a required link to a detailed syllabus for each course listing; second, a required substantial course description detailing the goals of the class, the material to be investigated, the balance between lecturing, discussion and engaged learning activities, and the nature of assessments; and third, significantly improved academic advising (my daughter waited three months for an appointment and only got help after I sent a note complaining that her e-mails went unanswered).

All of that would provide information much more important than access to a set of data that is likely quite random, pervaded by bias, tied to grade inflation, measuring nobody-quite-knows-what and likely to create anxiety and even public shame in a good number of perfectly wonderful faculty members, particularly vulnerable ones, such as lecturers or untenured professors.

I should stress that I am not opposed to giving students access to these data, in the context of good academic advising, as long as everybody knows what they can and cannot tell you. But don’t kid yourself: Doing so will not improve teaching at the University; it actually might do the opposite.

Instead, we should work together to find better ways to assess teaching and better ways to share information about courses, and, perhaps most importantly, to create an atmosphere in which we treat each other with thought, care and respect. Students pay for their education, and it’s not cheap, to put it mildly. But that doesn’t mean that a college class is a consumer good like any other. Teachers are humans, and grading them on a five-point scale as if they were microwave ovens on Amazon runs the risk, in and of itself, of devaluing this great shared enterprise that is the University.

Teaching and learning go together. The quality of a class depends on the professor; it also depends on the students. Remember Question 3 on our evals: “I learned a great deal from this course”? To be sure, it is the professor’s responsibility to create the conditions under which you can learn a great deal. But if you are a student, whether you actually learn a great deal is also up to you. Did you do all the homework? Did you attend all the lectures? Did you turn off your cellphone during class and close out of Facebook? Were you curious and engaged? Did you treat your teacher — and the very act of learning — with respect?

At present, it seems that we will thoroughly revise our current questionnaire and start sharing the new evaluation data with students in Fall 2016. And to repeat, I am fine with that, provided we do so carefully and responsibly, taking into account everything we know and everything we know that we don’t know, with respect for students and teachers alike.

Silke-Maria Weineck is chair of the Department of Comparative Literature, chair of the Senate Advisory Committee on University Affairs and a professor of German studies. 
