The tutorial introduces and summarizes recent research on the
validity of evaluation experiments in information retrieval.
Evaluations based on the Cranfield paradigm essentially require topics as
descriptions of information needs, a document collection, systems to
compare, human jurors to judge the documents retrieved by the systems
against the information need descriptions, and some metric to compare
the systems. How many topics,
systems, jurors, and juror decisions are necessary to achieve valid
results? How can the validity be measured? Which metrics are the most
reliable, and which metrics are appropriate from a user perspective? In
addition, the tutorial discusses user-based evaluations and their
relation to batch experiments.