Why colleges’ “average class size” statistics are misleading — and how to fix them
Colleges and universities often report “average class size” statistics to provide information to students, prospective students, parents, and others evaluating colleges for their academic offerings and potential fit. These statistics can take several forms. For example:
- Amherst College reports an average (mean) class size of 19.
- Harvard University reports a median class size of 12.
- Columbia University reports that 80% of classes have fewer than 20 students.
However, each of these statistics, while technically accurate, is misleading because it significantly understates the class sizes that the average student will actually experience. That is, the metrics colleges report do not reflect what users of these statistics care about when they look up the numbers: the likely student experience. As described below, a preliminary look across a sample of national colleges and universities shows that the average class size experienced by an average student is more than double what colleges report using traditionally calculated statistics.
This discrepancy is based on a subtle but key distinction between average class size and average class size experienced by a student. These metrics are not the same. In this article, I’ll draw out this distinction and why exactly the current methods are misleading, and propose a better method for measuring this statistic in a way that provides relevant information to students and prospective students. Finally, I’ll use real data to show how implementation of this method changes our measurement of average class size from top US universities.
Illustration of the problem
At first, it might seem redundant to say that larger classes have more students taking them. But this is the crux of the mathematical problem with averaging class size data. Smaller classes (however defined) have fewer seats than larger classes, so, on average, students will find themselves in larger classes, which make up a greater proportion of the total seats across all classes.
To illustrate, consider a hypothetical university with 100 students. Let’s say this university offers only five classes; four of these classes have a class size of 1, and one has a class size of 96. The average (mean) class size for this university is (1 + 1 + 1 + 1 + 96) / 5 = 100 / 5 = 20 students.
However, something is misleading here. In this example, even though the average class size is 20, 96% of the students are in a class with 96 students! The classes with 1 student can, of course, only be taken by one student, so the other 96 students will end up in the larger class with each other.
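As a quick check on the example, here is a minimal Python sketch that counts seats rather than classes. The variable names are purely illustrative; the class sizes are the ones from the hypothetical university above.

```python
from collections import Counter

# Class sizes for the hypothetical university described above.
class_sizes = [1, 1, 1, 1, 96]

total_seats = sum(class_sizes)                      # 100 students, one seat each
mean_class_size = total_seats / len(class_sizes)    # the conventional average

# Group seats by class size: how many students sit in a class of each size.
seats_by_size = Counter()
for size in class_sizes:
    seats_by_size[size] += size

# Share of all students who experience each class size.
share_by_size = {size: seats / total_seats for size, seats in seats_by_size.items()}

print(mean_class_size)  # 20.0
print(share_by_size)    # {1: 0.04, 96: 0.96}
```

Counting seats instead of classes is what exposes the gap: the class of 96 is only one class out of five, but it holds 96% of the seats.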
To correct this statistic into something more accurate, we need to shift our thinking to change the unit of analysis from the class to the student. We can ask: how many students are in each student’s classes, on average? This type of thinking is ultimately what students and prospective students care about when evaluating these statistics: what is my experience at this school likely to be?
A more student-centric statistic
To move to a statistic that better corresponds to the student experience, we will use frequency weighting to find an adjusted mean, where we weight each class by the number of students enrolled in it. A class that contains 96 students should carry 96 times the weight of a class that contains one student.
For comparison, the unweighted mean (used by Amherst College, for instance, in its statistic) uses the following formula:

$$\bar{n} = \frac{1}{C} \sum_{c=1}^{C} n_c$$

where $n_c$ is the number of students in a given class $c$, and $C$ is the total number of classes.
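Our new frequency-weighted mean uses this formula:

$$\bar{n}_{\text{adj}} = \frac{\sum_{c=1}^{C} n_c \cdot n_c}{\sum_{c=1}^{C} n_c} = \frac{\sum_{c=1}^{C} n_c^2}{\sum_{c=1}^{C} n_c}$$

Why does a squared term appear in the formula? We want to weight each class size by the number of students in it, so each class size is multiplied by its weight, which is the same number. The denominator is the sum of the weights, which is simply the total number of seats across all classes.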
Let’s return to the example above with the university of 100 students. Using the same numbers that resulted in the average class size of 20, we can calculate our adjusted mean:

$$\frac{1^2 + 1^2 + 1^2 + 1^2 + 96^2}{1 + 1 + 1 + 1 + 96} = \frac{9220}{100} = 92.2$$

Our adjusted average (mean) class size (92.2) is more than four times the traditionally used mean! The interpretation of this number is that, for this hypothetical university, the average student sits in a class of about 92 students.
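The two averages can be compared in a few lines of Python. This is a minimal sketch, not a reference implementation; the function names are my own. The third function is only a sanity check: it builds one entry per student and takes a plain mean, which should agree with the frequency-weighted version.

```python
def mean_class_size(sizes):
    """Conventional average: every class counts once."""
    return sum(sizes) / len(sizes)


def student_weighted_mean(sizes):
    """Frequency-weighted average: each class is weighted by its own size."""
    return sum(n * n for n in sizes) / sum(sizes)


def student_level_mean(sizes):
    """Same quantity computed the long way, with one entry per student."""
    per_student = [n for n in sizes for _ in range(n)]
    return sum(per_student) / len(per_student)


sizes = [1, 1, 1, 1, 96]  # the hypothetical university from the example

print(mean_class_size(sizes))        # 20.0
print(student_weighted_mean(sizes))  # 92.2
print(student_level_mean(sizes))     # 92.2
```

When every class has the same size, the two averages coincide; the gap between them grows as class sizes become more uneven.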
Using real data: how does this change “average class sizes” in practice?
Now, let’s use some real data to illustrate how this new method changes the average class size statistics. The site CollegeData provides some statistics on the distribution of class sizes for each school, and we’ll use this as our sample. For instance, here is their reported data on Harvard:
To make this data usable for analysis, we need to make some assumptions. For each range of class sizes, I use the approximate midpoint (e.g., for classes in the 20–29 student bucket, I assume there are 25 students in each class). The trickier assumption is what to do with classes that are “over 100 students.” Without other baseline data, I assume for now that these classes are split evenly across four sizes: 25% have 125 students, 25% have 150 students, 25% have 175 students, and 25% have 200 students. Further sensitivity analysis could vary the shape of this distribution to see how much it affects the results.**
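The assumptions above can be sketched in Python as follows. Only the 20–29 → 25 midpoint and the four sizes for the “over 100” bucket come from the text. The other bucket labels, their midpoints, and the example shares are assumptions made purely for illustration, and are not Harvard’s or any school’s actual figures.

```python
# Representative class size per bounded bucket. Only "20-29" -> 25 is stated in
# the article; the other bucket labels and midpoints are ASSUMED for illustration.
BUCKET_MIDPOINTS = {
    "2-9": 5.5,
    "10-19": 15,
    "20-29": 25,
    "30-39": 35,
    "40-49": 45,
    "50-99": 75,
}

# From the article: classes over 100 are split evenly across these four sizes.
OVER_100_SIZES = [125, 150, 175, 200]


def expand_distribution(share_by_bucket, over_100_share):
    """Turn bucketed shares of classes into (class_size, share) pairs."""
    pairs = [(BUCKET_MIDPOINTS[b], s) for b, s in share_by_bucket.items()]
    per_size = over_100_share / len(OVER_100_SIZES)
    pairs += [(size, per_size) for size in OVER_100_SIZES]
    return pairs


def unadjusted_mean(pairs):
    """Conventional mean: each class counts once."""
    return sum(n * s for n, s in pairs) / sum(s for _, s in pairs)


def adjusted_mean(pairs):
    """Frequency-weighted mean: each class is weighted by its size."""
    return sum(n * n * s for n, s in pairs) / sum(n * s for n, s in pairs)


# HYPOTHETICAL shares of classes per bucket (made up, not real school data).
example_shares = {"2-9": 0.40, "10-19": 0.30, "20-29": 0.10,
                  "30-39": 0.05, "40-49": 0.05, "50-99": 0.05}
pairs = expand_distribution(example_shares, over_100_share=0.05)

print(round(unadjusted_mean(pairs), 1), round(adjusted_mean(pairs), 1))
```

Because the weighted formula only needs relative shares, it works equally well with percentages of classes or raw class counts. Swapping in a different `OVER_100_SIZES` list is one simple way to run the sensitivity analysis mentioned above.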
Under the original method, using this data, Harvard’s average class size is 22.1. Under the adjusted method (frequency weighting), their new average class size is 73.4 — a 232% increase!
How does this compare to other schools? We can repeat this process across a greater sample of schools. Using US News and World Report’s college and university rankings and the CollegeData statistics, I pulled the class size distribution data for the top 50 ranked national universities and top 10 ranked national liberal arts colleges.* The graph below shows, for each of these schools, the unadjusted mean on the x-axis and the adjusted mean on the y-axis. Data is available for download here.
We can see a clear linear relationship between average class size and adjusted average class size. The school with the smallest class size, Washington and Lee University, moved from an average of 15.3 to an adjusted average of 18.3, while the school with the largest class size, UC-San Diego, moved from an average of 53.7 to an adjusted average of 117.9.
We can also compute the percentage change between each school’s average class size and their adjusted average class size. Doing this, we find that our new statistic is on average 116% greater, meaning that a given student’s average class size is more than double the average class size.
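For concreteness, here is the percentage-change calculation applied to the three schools whose figures are quoted in this article. This is a small sketch with an illustrative function name; the 116% sample-wide average depends on the full dataset, which is not reproduced here.

```python
def pct_change(unadjusted, adjusted):
    """Percentage increase from the unadjusted to the adjusted average."""
    return (adjusted - unadjusted) / unadjusted * 100


# (unadjusted mean, adjusted mean) as quoted in the article.
schools = {
    "Harvard University": (22.1, 73.4),
    "Washington and Lee University": (15.3, 18.3),
    "UC-San Diego": (53.7, 117.9),
}

for name, (unadjusted, adjusted) in schools.items():
    print(f"{name}: {pct_change(unadjusted, adjusted):.1f}%")
# Harvard University: 232.1%
# Washington and Lee University: 19.6%
# UC-San Diego: 119.6%
```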
How much variance is there in the percentage change by school? Differences in the percentage change across colleges are driven by differences in their distribution of class sizes. Interestingly, when we look at a plot of the average class size vs. the percentage change from the unadjusted average to the adjusted average, we see no clear relationship:
Looking at the graph above, we can conclude (somewhat facetiously) that Harvard’s use of the unadjusted mean for average class size would be more misleading than its use by any other college in the sample!
Colleges and universities’ use of average class size statistics can be highly misleading because these statistics do not reflect the average student experience. We proposed using frequency weighting to produce a less misleading statistic: an adjusted average class size that better reflects what students, prospective students, and other users of these statistics are looking for. Applying the new formula to real-world data, we showed that for top universities, a given student’s average class size is more than double what would be reported by traditional “average class size” statistics.
* Data unavailable for UC-Berkeley, California Institute of Technology, Columbia University, University of Florida, University of Georgia, University of Michigan, University of Rochester, and the College of William and Mary.
** This is a somewhat conservative assumption because we know some classes in reality have over 200 students, but we can justify the assumption by the shape of the student experience utility response curve for class sizes, i.e., past a certain point of class size, the student experience is the same because the class is essentially a large lecture with probable breakout groups. Whether there are, say, 200 or 300 students in the class does not highly impact the student experience.