IB Score Reports

 

 

A senior IB examiner and test creator said recently, “You know, it’s not about sevens or threes or world averages, it’s about the two years that the students spend in your classroom.  That’s what really counts.”

He is right.  We sometimes lose sight of the fact that in many cases grades, marks and scores are not the same as education or experience.  Personal relationships are important, as is personal growth.

On the other hand, as professionals (teachers and administrators) we need to understand our jobs and our methods.  And if we are competent, dedicated professionals, then we will be looking for ways to improve.  When a school invests in the IBDP, it means that measurement of student performance is laid down in the 1-7 mark given out by the IBO (or “earned” by the candidates, depending on how you want to look at it).  And so, in some respects, it is about the sevens and the threes.  Whenever there is a reductionist system of reporting achievement, where two years of learning and growth come down to a single number between 1 and 7, then the highs, lows and middles will definitely be important to teachers and schools, whether they should be important or not.  This is the nature of reporting student achievement.

This analysis is meant to be constructive.  My hope is that by analyzing data effectively, the IB will be a better, stronger program and that schools who invest in it will benefit from its growth.

If you are at a school that offers IB courses, you might hear your Biology teachers complain that the IB doesn’t give out 7s to them, but they seem particularly generous with the Physics scores.  Or you might hear an English teacher reminiscing about the good old days, when students could actually get 7s, when 6 wasn’t the de facto maximum score.

Why are 7s so scarce in English and Biology?  Is it the teaching?  Is it because students who take English and Biology are generally weaker than those that attempt IB Physics?  Is it because the IB assessments are generally harder in English and Biology, so fewer students as a percentage of the population are able to achieve top marks?

Wandering Academic decided to give the IB world score distributions some serious thought.  It’s hard to avoid questions when you look at the data.  Things grab your attention like neon signs in a red light district (so I’ve heard).  In fact, the entire situation is much like a peepshow: portions of the process and philosophy behind the IB assessments are transparent, laid bare for all to see, but other portions are shrouded in mystery and intrigue, hidden behind a curtain.  If you work with IB, you probably have an overwhelming urge to pull back that curtain entirely, to see what’s happening in the darkness beyond.  We hoped that by presenting the data in a particular way, we would at least shed a dim light on what’s going on with IB assessment and marking.

The purpose of this inquiry is mainly to raise questions, as well as to present some interesting IB historical data in a format that is useful.  The IBO does not report data in a particularly useful format.  Be that as it may, even when cleaned up and clearly presented, I think you’ll find that the data doesn’t support hard conclusions.  What this inquiry might do is dissuade people from jumping to conclusions that aren’t sound and, Wandering Academic hopes, put a little pressure on the IB to provide this sort of analysis for schools (who pay huge sums to adopt the curriculum) and teachers (who exert huge energies to prepare students adequately).

Before the inquiry, a note on the visualization of the numbers.

  1. Line graphs are typically used for continuous data.  This data is not continuous, and really we should be using a bar graph.  The reason we chose line graphs is that the lines illustrate the shapes of the distributions from year to year more clearly than the bunched up bar graphs.
  2. In order to not misrepresent the data fundamentally, we have included tables to the right of each graph (see Appendix for full data set of selected course distributions).  The numbers in these tables are derived from IB statistical bulletins published each year (see IB Stat Bulletin for 2009). It has been impossible so far to compare results year to year in any meaningful way – the WA graphs and tables you see in this analysis and in the appendix allow for actual comparisons and data-related questions to be raised. The collection and calculation of this data was a group effort requiring serious elbow grease, in particular the design of the visualization, which I, the editor, cannot take credit for.
  3. For distribution percentages, the values are rounded to the nearest percentage point, where “-” means less than 0.5%.
  4. The data is correct.  It has been triple, double-decker checked, number by number.  The full data set can be viewed or downloaded at the link below.

See Appendix, Selected IB Score Distributions, Graphs and Tables
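
The rounding rule in note 3 can be sketched in a few lines of Python.  This is only an illustration of the convention described above, not the script used to build the tables: the function name format_share and the candidate counts are made up for the example, and rounding halves upward is an assumption, since the note only says "nearest percentage point".

```python
def format_share(count, total):
    """Return one grade's share of candidates as a whole-percent string."""
    pct = 100.0 * count / total
    if pct < 0.5:
        return "-"  # shares below half a percent are shown as a dash
    # Round half up (assumed); Python's built-in round() rounds halves to even.
    return f"{int(pct + 0.5)}%"


# Hypothetical candidate counts per grade for one course in one year.
# These are NOT IB figures; they only exercise the rule.
counts = {1: 3, 2: 40, 3: 210, 4: 520, 5: 610, 6: 330, 7: 87}
total = sum(counts.values())

row = {grade: format_share(n, total) for grade, n in counts.items()}
print(row)
```

Because each cell is rounded independently, a row of rounded percentages will not always add up to exactly 100.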

Let’s take a broad view of the data (referencing the Appendix here will be helpful).  There are a number of intriguing events.

  1. The difference between the shapes of the distributions is striking.  Some courses yield a very low percentage of 7s, others a relatively high percentage.  The same is true for very low scores, like 2s and 3s.  Some courses and Groups yield “flat” distributions, and others are “bunched up” in the middle.
  2. Within some courses, there is remarkable variation in the distribution from year to year, as well as a few courses that follow a regular distribution but for one obvious outlier.
  3. Some courses display an incredible regularity of score distribution from year to year.
  4. Some courses show a distinct distribution shift over time.
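
One way to put a number on "lockstep" versus "all over the map" is to measure how far one year's distribution sits from another's.  The sketch below uses total variation distance (half the sum of the absolute differences between the percentages at each grade).  That measure is our choice for illustration, not something the IB or the Appendix uses, and both pairs of distributions are invented.

```python
def total_variation(dist_a, dist_b):
    """Half the summed absolute difference between two percent distributions.

    0 means the two years are identical; 100 means they share no grades.
    """
    return 0.5 * sum(abs(dist_a.get(s, 0) - dist_b.get(s, 0)) for s in range(1, 8))


# Hypothetical percent distributions over grades 1-7 (each sums to 100).
stable_2005 = {1: 0, 2: 1, 3: 8, 4: 30, 5: 40, 6: 18, 7: 3}
stable_2009 = {1: 0, 2: 1, 3: 9, 4: 31, 5: 39, 6: 17, 7: 3}

shifting_2005 = {1: 1, 2: 5, 3: 15, 4: 25, 5: 30, 6: 20, 7: 4}
shifting_2009 = {1: 2, 2: 11, 3: 27, 4: 27, 5: 20, 6: 10, 7: 3}

print(total_variation(stable_2005, stable_2009))      # small: near-identical shapes
print(total_variation(shifting_2005, shifting_2009))  # large: the shape has moved
```

A single number like this says nothing about why a distribution moved; it only makes it easier to line courses up and see which ones moved most.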

Now a few striking cases, by way of example.

[Figure 1]

English A1 HL is one of the most popular courses in the IBDP.  As you can see in Fig. 1, almost 140,000 students took the course in the past five years.  Another striking feature of this course is the growth, which presumably relates to the growth in the number of schools offering IB courses. In 2005, slightly more than 21,000 candidates took the course, and four years later, almost 33,000 did so.  This represents a 51% increase.  And yet the score distributions are lockstep.

What makes anyone think that the distributions from year to year would vary to any significant degree, even if the population of candidates were increasing so rapidly?  The opposite question is raised as well: what makes anyone think that the distributions wouldn’t change?  The variability in scores could be due to variability in student performance, which could be a function of the relative strength of the population as students or of the strength of their teachers (or a combination), or it could be due to variability in scoring methods.

The IB is growing.  The growth that the IB is seeing is largely in the US and UK.  Recent media attention (NYTimes.com and The Independent) on the subject speaks to this, as well as the numbers in the IBO documents (IBO school growth stats).  If the growth the IB is seeing is in US and UK schools, representing a particular population of students, it would seem that the distributions would begin to skew, at least slightly.  This might be true because the students are in some way distinct from the international crowd, or it could be because the teachers are different.  Also, new IB programs mean a large influx of new teachers, those who are not as familiar with the IB program, assessments and methods for training students to perform well on those assessments.  The IB can be frustratingly bureaucratic and labyrinthine with rules and regulations, forms, curriculum choices and deadlines, especially for a fresh teacher.  Surely it would take a group of new teachers, or a freshly minted IB school, time to hit their stride with scores.  This doesn’t seem to be the case.

The other answer is that teaching has less to do with the results than the students themselves.  And although there doesn’t seem to be a correlation between SAT averages and average IB scores (see Wandering Academic School Rankings and Arugula Design’s Analysis) there might be some correlation: smart kids get higher scores, no matter if their teachers and administrators are new to the program or not.  Not enough data has been collected on this comparison yet, at least none that we can find.

Now, let’s consider a course with very high year-to-year variability, Environmental Systems SL (Fig. 2).

[Figure 2]

Environmental Systems is a far less popular course than English A1 HL, but even larger growth is evident from 2005 to 2009 (+73%).  Here we can see an interesting trend: the average score dropped from 4.69 in 2005 to 4.01 in 2009 (-14.5%), with a large drop between 2006 and 2007 (-7%).  Again, no solid conclusions can be drawn from this data alone, but the fact remains that as the population has increased, average scores have decreased.  But averages are only part of the story.  The line graph here is particularly vocal: the distributions are very different, compared to the lockstep agreement of the English HL distributions.  In particular, note the steep increase in 2s and 3s (100%+) compared to the decrease in 5s and 6s (approx. -50%).  Again, this could be due to the change in student population, the difficulty of the curriculum, or the scoring methods.  It’s as if Environmental Systems is falling in line with the other sciences by flattening the distribution.  Compare it to Biology SL, Physics SL and Chemistry SL (Fig. 3).
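
For readers who want to check the arithmetic, here is a minimal Python sketch of the two calculations behind figures like these: the percent change between two averages, and the average score implied by a grade distribution.  The 4.69 and 4.01 averages are the Environmental Systems SL figures quoted above; the distribution is hypothetical, and the function names are ours.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def mean_score(distribution):
    """Average grade implied by a {grade: share} distribution.

    Shares may be percentages or raw counts; they are normalised here.
    """
    total = sum(distribution.values())
    return sum(grade * share for grade, share in distribution.items()) / total


# Averages quoted in the text for Environmental Systems SL, 2005 and 2009.
drop = percent_change(4.69, 4.01)
print(round(drop, 1))

# Hypothetical percent distribution over grades 1-7 (not IB data).
example = {1: 0, 2: 10, 3: 20, 4: 30, 5: 25, 6: 10, 7: 5}
print(mean_score(example))
```

An average computed from rounded percentages can differ slightly from one computed from raw candidate counts, so small mismatches against published averages are to be expected.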

[Figure 3]

You’ll also notice that each of the other major SL science courses has seen a decrease in average score from 2005 to 2009, though a less dramatic one than Environmental Systems, while at the same time maintaining a far more consistent “flat” distribution.

Here’s another interesting fact: Almost no one gets 1s or 2s in English courses (Fig. 4).  English A1 courses appear to be unique in this respect (see Appendix for full comparison) among courses with significant populations, although History comes close.

[Figure 4]

The lack of 1s and 2s (and the relative scarcity of 7s) is noteworthy, but again, there are no solid conclusions available.  It may be that assessment methods particular to English A1 tend to shoe-horn candidates to the upper middle of the scale.  Anecdotally, an experienced English examiner I know told me that her highest scores tend to be moderated down and her lowest scores tend to be moderated up, which fits the worldwide distribution.  English represents the typical “bunched” distribution of humanities courses.

The last case we will consider here is Theatre SL, which jumps out from the group with its erratic distribution (Fig. 5).

[Figure 5]

The population of course takers is very low, but relatively stable.  We don’t see the growth here that other courses have experienced, possibly because not every school is large enough to support a full theatre program, or because, if they do have a theatre program, they focus on the HL, which has grown (1,298 in 2005 to 1,833 in 2009).  Nonetheless, between 2005 and 2008 1% or fewer students received 7s.  This is unique in itself.  But then in 2009, the 7s jump to 5%, which is more in line with the average humanities class, if still a little low.  What else happens to the distribution?  Well, the large numbers of 2s and 3s are shifted up to 5s and 6s, making it look much more like the other humanities distributions.  This could be related to the change in assessment model or curriculum that went into effect in 2009.  It would be interesting for an IB Theatre teacher or other insider to outline this change and speculate on the upswing in scores.  This is one case where it seems clear that a curriculum change went along with a shift in the basic standards for marking.

Is there a target distribution for various groups?  In general, the major science courses are fairly flat, and the Environmental Systems SL has shifted in that same direction.  In general, humanities classes have a high percentage of 4s and 5s and low percentage of 1s and 2s, and the Theatre SL distribution has shifted in that direction, as well.  If I had to make a prediction, I’d say that the 2010 numbers will support those trends.  Whether or not the IBO consciously manipulates grade boundaries to reflect these distribution targets is a question well worth considering.

The Appendix includes distributions of Extended Essay scores by Group.  A brief word on these: EEs are reported on a scale from E to A, not 1 to 7.  But it is clear immediately that “Excellent” ratings are much more likely for EEs than for courses, if we take A to be the equivalent of a 7 in a course.  But if A is equal to 6 and 7, the distributions are similar.  In Group 1 (Fig. 6), for example, around 20% of EE candidates earn As and around 16-20% earn 6s and 7s in English A1 HL, also part of Group 1, although this number has dropped somewhat since 2005 (Fig. 1).

[Figure 6]

Overall, I think these trends and comparisons will at least be interesting to schools and teachers, and I hope that it starts a discussion on more wide-ranging questions of curriculum and scoring methods.  In particular, I hope to encourage teachers of English A1 courses; for whatever reason, 7s are elusive in your discipline.  Don’t let those Physics and Chemistry 7s get you down: your averages are significantly higher.

I hope this will also show administrators and teachers (and parents!) that it is practically useless to judge the quality of a particular teacher according to raw score distributions.  So much evidence points to the fact that teachers are less responsible for these scores than they like to think.  And while school average IB Diploma candidate scores are fun to compare, they also are less a function of the quality of a school than they are a reflection on the strength of a group of students.

While this sort of analysis might not change the face of the IB, it does raise some questions about marking and performance that are important.  It is surprising that the IBO itself doesn’t do this kind of reporting; their yearly statistical bulletin is woefully difficult to read, as I mentioned. Comparative analysis is next to impossible. Culling the numbers and cleaning them up was labor intensive, much too labor intensive.  But until they get their act together with data reporting, others will have to step up.

If you think this analysis is interesting, please email it to a friend or colleague.  Also consider Tweeting (Twittering?) this link, posting a link on Facebook, or sharing it some other way.  You might also bookmark Wandering Academic to stay up to date on other analyses or articles we post.  If you use the images or the appendix for any other publication, please credit the website.  Comments and questions are welcome; leave them below.

 

25 Responses to Analysis of IB Score Distributions, 2005-2009

  1. Paul says:

    These graphs are very simple variations in grade distribution with time and we see that some vary in shape over the five year sampling period, and some do not. I’m not yet convinced there’s anything “intriguing” within them!

    Given the variation in Theater graphs, I agree that the 2009 change in curriculum may be a contributing factor. Especially since the English A1 curriculum was, I think, introduced for 2001 exams and so by 2005 teachers were very familiar with it and with how to prepare students for the assessments. That familiarity might contribute to its stability.

    As for the differences in shape (and hence relative numbers of 1s, 2s, 3s, etc), are these differences actually significant in an unrelated set of criteria-referenced distributions based on criteria that were developed at different times by unrelated groups of educators in unrelated fields?! I don’t know if the IB states anywhere that a 2 or a 7 in English HL should be equally as attainable as a 2 or a 7 in every other subject? If they do, they seem to have failed! However, if individual IB curricula are developed independently, without consideration of what proportion of students “ought” to get a 2 versus a 7 in that (or any other) subject, then it should come as no surprise at all that the shapes of the graphs vary.

    Finally, I find it interesting that it is conceded above that “while school average IB Diploma candidate scores are fun to compare, they also are less a function of the quality of a school than they are a reflection on the strength of a group of students” and yet this site prominently displays school “rankings” that include such averages!

    Best regards.

  2. I.B. Intrigued says:

    Paul,

    I find it intriguing that some courses exhibit great variation in score distributions year-to-year and that some do not. With respect to the lack of variation, let’s remember that the IB is (supposedly) not norm referencing. In courses where the distributions seem lock-step year after year, are we really to believe that students are naturally achieving the various levels in the same proportions, year after year? With respect to the variation, the obvious question is *why* is there such variation? Are student performances radically different each year? Is it measurement error? Changing standards? What assurance do we have that it’s actually the former and not one of the latter? And isn’t it intriguing that some courses do exhibit considerable variation while others exhibit essentially none? What is the explanation for that? Finally, is it in fact more difficult to earn a 7 in Biology HL than Physics HL? If so, do you think universities are made aware of that fact? Shouldn’t they be, since they often award credit for such scores, or (presumably) make admissions decisions based on such scores.

    The displays raise many important questions, which I hope the IB sees fit to address.

  3. editor says:

    Thanks, Paul, for your comments.
    You’ve hit the nail on the head, in my opinion: some of the shapes vary, some do not. You might not find this phenomenon intriguing, but I do, and I don’t think I’m alone. If nothing else, it raises the question “Why?” Why is there such remarkable regularity for some courses and in others the distributions are all over the map (or shifting in a particular direction)? Does this mean that the marking practices change dramatically from year to year? Does this mean that the student population changes that much year to year (although at least in English and History, an increasing population doesn’t seem to correlate with shifty distributions)? All sorts of questions arise from this. It’s all right if you are not interested.

    Your argument about the regularity of the English A1 curriculum is creative, but can’t be the entire story. Given the meteoric growth of the subject, the number of teachers would likewise swell, and the number of new teachers of the subject (or new to the IB in general) will necessarily be high. The theater shift is interesting because of the drama of its movement (pun intended), but also because it is a clear indication of how marking practices and curriculum affect scores. This can alert teachers and administrators to changes in other curricula, to look out for changes in marking standards as well. That seems pretty important to me.

    As for the distance between subjects: perhaps you’re right. Perhaps it is ludicrous to assume that a 7 in English would be roughly as attainable as a 7 in Physics. But is it? The scales are the same and the criterion for success in Higher Level classes applies to ALL HLs (in that, a candidate must achieve a certain score in each of the HLs in order to receive the diploma). The IB Diploma system appears to weight each HL equally. In fact, if it is an established fact that a 7 in Physics is demonstrably and admittedly harder to get than a 7 in English, well, the harder 7 ought to be weighted, shouldn’t it? Even more, wouldn’t students want to know this kind of thing? Scores are high stakes, whether or not we like that about them, and students have to game the system as well as they can. But in reality I think that the distributions are affected by more than their relative difficulty – the types of students who attempt each subject must come into play.

    At last, you found something interesting! I appreciate that, it means that the hard work has paid off. I display the rankings because they are data that can be lined up in a table, and it is fun and interesting to look at those numbers. Administrators can use the table to compare their school data with other benchmark schools. I don’t draw conclusions about the quality of schools listed, I simply rank the schools by the data available. In this case, it is pretty limited: SAT, GPA, etc. Particularly of interest is the relative weight that honors or advanced classes can receive at a school. People can use it as they see fit. You would be surprised, I suppose, by all the positive comments I get about that table. So, I guess I don’t mind if you don’t like the table. It took me a while to figure out how to make those rows sortable. (Actually, if you are a school head or counselor and want to help me update those numbers or list your school, let me know: editor [at] wanderingacademic.com)

    Thanks again Paul, I appreciate you taking the time to respond.

    • Paul says:

      Good morning “IB Intrigued”,
      Thanks for the reply.
      “You’ve hit the nail on the head, in my opinion: some of the shapes vary, some do not. You might not find this phenomenon intriguing, but I do, and I don’t think I’m alone. If nothing else, it raises the question “Why?””
      You express concern that the English data *don’t* show a variation and also that the Theatre data *do*. But I don’t know the origin of your concern: Are there statistical reasons to expect that, for a group of unrelated academic subjects, developed at different times by different people, the resulting grade distributions should or should not be similar? Or, does the IB state publicly that its curriculum teams develop courses with the specific intention of making each level of achievement equally attainable in each course? Any irregularity is only “remarkable” if there is a convincing reason why the data should *not* look like it does and I can’t yet figure out that convincing reason. And I’m not saying it doesn’t exist – I just don’t know what it is.
      The English sample is ~40x larger than the Theater sample, which makes comparison difficult, but samples of the order of ~30,000 (English) would seem large enough to me to eliminate some of the annual irregularities in that smaller sample (Theater).

      Regarding the awareness of universities, thousands of universities in North America (the IB’s apparent target market) make admissions offers based on regular high school grades that are *much* less reliable in terms of comparison with each other than IB or AP scores. And since universities only ever compare students against others in the same year, any variation from year to year is perhaps less relevant. A student who gets a 7 in 2009 will appear a better physics student than one who gets a 4 in the same year, regardless of how the distribution of grades changes the following year. I wonder if A grades in A-Levels are intended to be equally accessible? Do you know?

      Anyway, I’d be interested to hear more detail about exactly *why* you find the irregularities remarkable, which may help narrow down the “important questions” that should be addressed by the IB.
      Best regards.
      (PS I’m by no means an “IB apologist”! I’ve been as frustrated as anyone by some of their practices and recent decisions, but I think we have to be clear and specific about what concerns we raise with them).

    • Paul says:

      Hello Editor,
      Thanks for responding to my comments. It’s an interesting topic.
      A couple of comments back to you:
      My suggestion about English wasn’t intended as an “argument”, more of an observation. In my experience, teachers take a year or two to adjust to new IB curricula and associated student expectations. So it seems reasonable to me that in a course (English A1) for which the syllabus has not changed over a relatively long period, consistency is more likely.

      I simply don’t know if it’s “ludicrous” or not to expect a 7 to be equally attainable in English as in physics. That’s what I’m asking you (like my question to “IB Intrigued” above): what is your reason for thinking it *should* (or *should not*) be equally attainable? That will give me a clearer perspective from which to view the data and your analysis.

      It’s not that I “don’t like” the table! Of course it’s interesting. But the choice of “ranking” in the title rather than, say, “data” has clear connotations, intentional or not. While educators may view the table with consideration of factors (resources, demographics, course/school admission criteria, etc.) that influence the data, others will no doubt see “ranking” and assume that one school is better than another. If you simply “display the rankings because they are data that can be lined up in a table” and you “don’t draw conclusions about the quality of schools listed”, then why did you choose “ranking” as the heading and not “International School Standardized Testing Data”?

      All good thought-provoking stuff. Thanks for the time and effort.
      Paul

  4. Paul says:

    Good morning “IB Intrigued”,
    Thanks for the reply.
    “You’ve hit the nail on the head, in my opinion: some of the shapes vary, some do not. You might not find this phenomenon intriguing, but I do, and I don’t think I’m alone. If nothing else, it raises the question “Why?””
    You express concern that the English data *don’t* show a variation and also that the Theatre data *do*. But I don’t know the origin of your concern: Are there statistical reasons to expect that, for a group of unrelated academic subjects, developed at different times by different people, the resulting grade distributions should or should not be similar? Or, does the IB state publicly that its curriculum teams develop courses with the specific intention of making each level of achievement equally attainable in each course? Any irregularity is only “remarkable” (editor’s later comment) if there is a convincing reason why the data should *not* look like it does and I can’t yet figure out that convincing reason. And I’m not saying it doesn’t exist – I just don’t know what it is.
    The English sample is ~40x larger than the Theater sample, which makes comparison difficult, but samples of the order of ~30,000 (English) would seem large enough to me to eliminate some of the annual irregularities in that smaller sample (Theater).

    Regarding the awareness of universities, thousands of universities in North America (the IB’s apparent target market) make admissions offers based on regular high school grades that are *much* less reliable in terms of comparison with each other than IB or AP scores. And since universities only ever compare students against others in the same year, any variation from year to year is perhaps less relevant. A student who gets a 7 in 2009 will appear a better physics student than one who gets a 4 in the same year, regardless of how the distribution of grades changes the following year. I wonder if A grades in A-Levels are intended to be equally accessible? Do you know?

    Anyway, I’d be interested to hear more detail about exactly why you find the irregularities remarkable.
    Best regards.

    • I.B. Intrigued says:

      Paul,

      Thanks for your response.

      Lack of variation in score distributions raises concern for a criterion referenced assessment, especially since the IB has grown dramatically in recent years, which means many inexperienced schools have been preparing students. One explanation is that the IB is actually norm referencing their results. Tweaking their scores to achieve the same distribution year after year.

      Substantial variation in score distributions raises concern because one wonders what is causing the variability. Is it student ability and performance? If so, fine. But what if it’s measurement error or changing standards? How do we know?

      I can’t make my concerns any clearer. The burden is on the IB to explain what the data show.

      That high school grades are less reliable is hardly a defense of IB scoring.

      • Paul says:

        Hello,
        “One explanation is that the IB is actually norm referencing their results. Tweaking their scores to achieve the same distribution year after year.” That is one possible explanation, certainly – thank you for clarifying your concern. Since the scoring bands are reviewed each year, that could help that “tweaking”. But that’s one of my questions: does the IB have public aims regarding percentages of students who get each grade or equity between courses? Or do they flat out deny any attempts to promote consistency from year to year? Your concern *seems* to be (and please correct me if I’m misinterpreting) that the IB publicly denies the practice but does it anyway.
        If they don’t do any tweaking, I wonder how significant (statistically, not perceptually) are the variations (or lack of) in the graphs?

        “That high school grades are less reliable is hardly a defense of IB scoring.”
        Absolutely – it wasn’t meant to be. You mentioned university admissions and suggested that universities need to be aware of these concerns about IB grades. I am not yet convinced that the concerns presented here are any more significant than those that exist, and are recognized by many, for AP, SAT or other standardized tests. For me, any concerns that can be raised about IB scoring distributions are small compared to those that exist for non-standardized school grades, which form the basis of many more university admissions decisions than do IB grades.

        • I.B. Intrigued says:

          Hi Paul,

          The IBO explicitly states, “IB scores are criterion referenced rather than norm referenced.”
          Source: http://www.ibo.org/ibna/recognition/documents/2008DPleaflet.pdf

          Lock-step score distributions year after year strike me as surprising for a criterion referenced assessment — especially given the considerable rise in the number of examinees and the changing background of the candidates.

          Variability in score distributions on a criterion-referenced assessment should be a reflection of changing levels of achievement. Is that the case with the IB distributions? How do we know?

          Regarding universities, I mentioned not only admissions decisions, but also awarding of credit. Surely, a university expects a 7 to represent the same level of achievement year to year.

          Regarding other standardized assessments, the SAT is norm referenced, and thus is intended to show how a particular student’s performance compares with an established norm reference group. SAT scores are highly reliable.

          The AP is criterion referenced, but they use sophisticated statistical and psychometric techniques to ensure that scores are comparable year to year. In particular, they re-use questions (which are not released to the public) to equate scores on two different administrations of exams. In addition, they continually re-assess their standards to ensure alignment between AP scores and student achievement in the related university course. That, in fact, is their goal: to give assessments and scores that allow them to determine how a student would have fared in the related university course. You can read about the AP’s methods here:

          http://www.collegeboard.com/student/testing/ap/exgrd_set.html

          Best regards.

          • Paul says:

            “The IBO explicitly states, “IB scores are criterion referenced rather than norm referenced.”
            Source: http://www.ibo.org/ibna/recognition/documents/2008DPleaflet.pdf
            Lock-step score distributions year after year strike me as surprising for a criterion referenced assessment — especially given the considerable rise in the number of examinees and the changing background of the candidates.”

            Thank you for the patient explanations – this (above) summarizes it effectively.
            Some of the science distributions seem similar, but not as similar as the English. I wonder how they compare to the expected variation in criterion-referenced scores?
            An interesting topic, no doubt. I look forward to following the IB’s attempts to address the various assessment-related issues that educators (try to) raise with them.
            Regards,
            Paul

  5. Toby says:

    I don’t think I have the statistical prowess to judge either Paul’s or the editor’s comments above, but as a five-year IB history teacher, one fact seems to jump out at me in looking at these graphs. It’s a point that’s been made above, I think, but seems worth repeating. Regardless of test or course difficulty, and regardless of whether or not different subjects create wildly different standards for what a “3”, “5”, or “7” might be, in the end, the students are trying to get an IB diploma. One of the criteria for this is that they must take 3 HL classes. Moreover, they must achieve an average score of “4” in these three classes, and cannot get a “2” in any of them. In other words, a student could get “7s” in English HL and History HL, but get a “2” in Chemistry HL and lose the diploma.

    If different subjects seem to be awarding “2s” and “3s” at different rates, as a student, I’d be tempted to choose HL courses that give me the greatest chance of not getting a “2” or a “3”. Gaming the system, as you put it.

    In the end, I don’t believe we can achieve parity between the courses, and I’m not sure that I’d want it even if we could. But I am worried about what this “gaming” concept could mean for the aspect of the IB that I cherish the most: the concept that students should choose HL courses based upon what interests them. I’d hate to think that students might be signing up for my HL History course because they think it’s going to give them a better shot at getting the IB diploma than a course they might personally find more interesting, but that seems to offer a greater chance of failure.
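
The HL conditions Toby describes can be written down as a small check. This is only a sketch of the rule as stated in his comment (three HL subjects, an average of at least 4 across them, and no grade of 2), not the official IB diploma regulations. The function name `meets_hl_conditions` is hypothetical, and treating a 1 the same as a 2 is an assumption.

```python
def meets_hl_conditions(hl_grades):
    """Check the HL conditions as described in the comment above.

    Assumptions (not official IB rules): exactly three HL grades,
    an average of at least 4, and no grade of 2 or lower.
    """
    if len(hl_grades) != 3:
        return False
    average_ok = sum(hl_grades) / len(hl_grades) >= 4
    no_low_grade = all(grade > 2 for grade in hl_grades)
    return average_ok and no_low_grade


# Toby's example: two 7s and a 2 still fails, despite a high average.
print(meets_hl_conditions([7, 7, 2]))  # False
print(meets_hl_conditions([4, 4, 4]))  # True
print(meets_hl_conditions([5, 3, 3]))  # False (average below 4)
```

The first case shows why the rate at which a subject awards 2s matters more to a diploma candidate than its rate of 7s: a single low grade cannot be offset by high grades elsewhere.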

    • I.B. Intrigued says:

      Toby,

      Students should not infer that Subject A is easier than Subject B, simply because Subject A has a higher percentage of 7s (or whatever).

      Students choose their courses, and subjects that attract weaker students should exhibit lower score distributions (reflecting lower levels of achievement). Those that attract stronger students, even if the subject is in fact harder, might exhibit higher score distributions (reflecting higher levels of achievement). That is, assuming the IB is consistently assessing students!

      Without mentioning specific subjects, there are definitely HL courses at our school that attract weaker students, and HL courses that only attract our very best.

  6. [...] reading the interesting IB Statistical Analysis over at Wandering Academic, I was left with a number of unanswered questions. The post there deals [...]

  7. Nate says:

    The distributions are fairly stable – the ones that are not probably had some curriculum shift, as has been noted.

    I’ve taken the data and done some of my own aggregations. Some interesting patterns emerge. See http://blog.aruguladesigns.com/2010/08/16/maximize-your-ib-score/

    • editor says:

      Thanks for the follow-up analysis, it is really interesting. As I posted in your comments, it appears to show (at least with this limited course listing) that the ideal schedule is:
      1 – English A1 SL (5.00)
      2 – Spanish B HL (5.51)
      3 – Economics HL (5.09)
      4 – Chemistry HL (4.60)
      5 – Math SL (4.62)
      6 – exempt, by taking French ab initio (4.97)
      EE – English A1 (+1.24)

      This yields an expected total score of 31.03, which is easily more than any of the past 5 years’ average scores (May 2009 was 29.51). Thanks again, Arugula.
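
As a quick check of the arithmetic, the 31.03 figure is the sum of the six subject means plus the +1.24 listed for the EE. The sketch below uses only the numbers quoted in this comment; the variable names are illustrative.

```python
# Mean grades quoted in the comment above (subject -> mean grade).
schedule = {
    "English A1 SL": 5.00,
    "Spanish B HL": 5.51,
    "Economics HL": 5.09,
    "Chemistry HL": 4.60,
    "Math SL": 4.62,
    "French ab initio": 4.97,
}
ee_bonus = 1.24  # figure quoted for the English A1 EE entry
may_2009_average = 29.51  # world average quoted in the comment

subject_total = sum(schedule.values())
expected_total = subject_total + ee_bonus

print(round(subject_total, 2))                      # 29.79
print(round(expected_total, 2))                     # 31.03
print(round(expected_total - may_2009_average, 2))  # 1.52
```

So the "ideal" schedule comes out about 1.5 points above the May 2009 average, and most of that margin comes from the EE figure rather than from the subject choices themselves.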

  8. dtarcy says:

    This is good data, thanks. I would like to see how the various components of the program correlate. For example, in Group 4, 48% of the assessment comes from options or Internal Assessment. This differs from person to person, meaning only 52% of the grade comes from a common assessment tool. If a large part comes from student selection (say, options), how does the grade on one option correlate with the common grade component, compared with another option? E.g., of all students earning a 5 on Paper 2, what is the average grade for these students in each of the various options?
    Thanks for any input. I’ve been trying to get this data but have no idea how.
    Dave Tarcy

    • editor says:

      What you are suggesting is completely possible (if I understand your request), but the only way it could happen is if enough schools put enough pressure on the IBO to provide data on its assessments in a useful format. The statistical bulletin is not a useful format. And it doesn’t include component results, which would be hugely useful as a feedback tool for schools and teachers. If you would like to help me set up a petition, email me: editor@wanderingacademic.com. That’s really the only way that any seriously interesting analysis can be done on IB data, short of manually entering numbers from the statistical bulletin, and even that doesn’t have nearly the entire picture.

  9. Heather says:

    Hello :)! I would just like the advice of a pro to help me with my subject choices for IB.
    I take:
    English A1 HL
    Arabic A2 HL
    Math Studies
    Environmental Systems SL
    Theater Arts SL
    Geography HL

    The problem is that I don’t think I’m capable of getting a good grade in English A1 HL, so I’m unsure whether to change it to A2 and take Theater Arts at HL, which will also be very difficult.

    Could you please help me and tell me which is the better choice: English A1 HL or Theater Arts HL?
    Thank you.

    • David Tarcy says:

      Hello Heather,
      As you asked for advice, I think you are making wrong choices. Choice of subject should not be about which is easier, or where you can get a better grade. That is not what life’s about. You should choose things you are good at and enjoy because that is where you ultimately meet success. If you enjoy something, albeit something difficult, you will put the time and energy into it because you enjoy it. You will find success and inherent rewards in choosing things this way. If you choose to take easy things, or things which may lead to good grades (or good money), and those are your only reasons for choosing them, you are on a path to misery. You might meet the goal, but you won’t meet success, happiness, or any other form of internal satisfaction.

      Have you ever wondered why so many people become teachers? We don’t make great money, we work long hard hours, and often fight uphill political battles. The inherent rewards of our profession bring outstanding person after outstanding person into it. I honestly believe there are very few professions that bring the same creativity, lifelong passion, and intense internal satisfaction as teaching. So, think about what you are good at, what you enjoy, where your passion is, your beliefs, your real dreams, and follow them.
      Sincerely,
      David Tarcy
      Jakarta International School

  10. erin schuman says:

    Interesting analyses.
    Data are plural, a datum is singular. Therefore, data are, not data is.
    ems

  11. maarten gabriels says:

    I really have nothing to add to what was said above, but I have a question about IB Chemistry.

    To give a little background information: I just started the full IB diploma this year and I’m taking Chemistry SL.
    In chemistry I am having difficulty writing the calculation part of the lab reports that we do.
    So I asked my teacher for help at an early stage, because I couldn’t do it after attempting it myself. What my teacher told me yesterday is that, according to IB, since he helped me at an early stage of my lab report, the maximum grade that I can get for the DCP part is a 65%, which is a D. Is it true that 65% is the highest you can get when you’re helped at an early stage in IB Chemistry? It does not make any sense to me.

  12. Michael Kirk says:

    Having been trained to teach IB Biology I now find myself teaching IB English A1 and English Literature. There is irony at play in your numbers. If Biology is graded more objectively and English more subjectively (are we in agreement?), shouldn’t we expect to see this in the tightness of the data? Why is English tighter?
    Also, your comment that the best predictor of IB marks is the cleverness of the incoming students also suggests that a school’s admissions policies have more to do with mark outcomes than staffing. Ouch, that hurts.

    • editor says:

      Just because it is more subjective doesn’t mean that it is more random or even more consistent. And as for your second point, it’s a fair question: do the students matter more than the teachers? When applied to a standards-based curriculum, probably. When you are looking at the value-added scores of cohorts, then the teacher is the focus. But when you are looking at the achievement of a group of students against a “rigid” standard, then the students’ initial abilities (that is, where they begin their IB work) probably matter more. But I don’t know, really. I’m just speculating.
