Description
This is more of a design consideration for future development. Our current pair generator tries to minimize the score difference between the answers in each pair. With expert judges, this actually improves ranking reliability, since they can make fine distinctions between answers. With untrained judges, however, it hurts ranking reliability, since they find it harder to tell the answers apart.
For untrained judges, a gap between the scores of two answers makes it easier to tell them apart, and also gives 'incorrectly' judged answers more of a chance to climb back up the ranking. We should consider implementing such a gap in our pair generator.
There are two additional factors to consider, given that ComPAIR is a learning tool rather than an assessment tool:
- There might be more pedagogical benefit in having students try to distinguish between two very similar quality answers.
- Even with a score gap, around 12-15 rounds of comparisons are recommended for a reliable ranking, which is far more than ComPAIR's default of 3 rounds.
So perhaps the size of the gap could be made configurable.
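As a rough illustration, a configurable gap could be expressed as a window `[min_gap, max_gap]` on the score difference of candidate pairs. The sketch below is a hypothetical, simplified generator (the names `generate_pair`, `min_gap`, and `max_gap` are assumptions, not ComPAIR's actual API); setting `min_gap=0` reproduces the current "most similar scores" behaviour, while raising it enforces the gap discussed above.

```python
import itertools

def generate_pair(answers, scores, min_gap=0.0, max_gap=float("inf")):
    """Pick the pair whose score difference best fits the configured gap.

    Prefers pairs whose absolute score difference falls inside
    [min_gap, max_gap], choosing the one closest to the window's
    midpoint; falls back to the closest pair overall if none fit.
    Hypothetical sketch, not ComPAIR's real pair generator.
    """
    target = min_gap if max_gap == float("inf") else (min_gap + max_gap) / 2
    in_window, fallback = [], []
    for a, b in itertools.combinations(answers, 2):
        diff = abs(scores[a] - scores[b])
        (in_window if min_gap <= diff <= max_gap else fallback).append((diff, a, b))
    pool = in_window or fallback
    _, a, b = min(pool, key=lambda t: abs(t[0] - target))
    return a, b
```

For example, with scores `{"a1": 0.1, "a2": 0.15, "a3": 0.9}`, the default `min_gap=0` pairs the two near-identical answers, while `min_gap=0.5, max_gap=1.0` instead selects a pair with a clear quality gap.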
Thanks to Peter Thwaites (UCLouvain) for bringing this up and providing the papers below:
- Rangel-Smith and Lynch (2018), "Addressing the issue of bias in the measurement of…"
- Bramley (2015), "Investigating the reliability of adaptive comparat…"
- Bramley and Vitello (2019), "The effect of adaptivity on the reliability coeffi…"
Paper 1 provides recommendations for the score gap size. Papers 2 and 3 detail the issues with 'highly adaptive' pair generators like ComPAIR's.