A particle track's responses in the detector systems are used as inputs to a classifier.
For example, tracking systems provide information about particle track parameters,
track fit quality, particle momentum and charge,
and also decay vertex coordinates. The RICH detector gives the particle emission angle,
delta log-likelihood values for
different particle type hypotheses, or the quality of the ring fit in the detector.
Calorimeters measure a particle's energy and,
for example, the number of responses for that particle.
The muon system tells whether a track has hits inside the system or not,
and how many active chambers correspond to the track.
There are many other features.
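As a rough illustration, such inputs can be collected into a per-track feature table. This is only a sketch; the column names below are invented for illustration and are not the actual detector branch names:

```python
import numpy as np
import pandas as pd

# Hypothetical feature table for a few reconstructed tracks.
# Column names are illustrative, not real detector branch names.
tracks = pd.DataFrame({
    "track_p": [12.3, 45.1, 3.7],        # momentum (GeV/c)
    "track_chi2_ndf": [1.1, 0.8, 2.4],   # track fit quality
    "track_charge": [1, -1, 1],          # measured charge
    "rich_dll_kaon": [0.5, -3.2, 7.9],   # RICH delta log-likelihood (toy)
    "calo_energy": [11.8, 44.0, 3.1],    # calorimeter energy deposit (GeV)
    "muon_has_hits": [0, 1, 0],          # hits in the muon system?
    "muon_n_stations": [0, 4, 0],        # active muon chambers on the track
})

# The classifier input is simply the per-track feature matrix.
X = tracks.to_numpy(dtype=float)
print(X.shape)  # (3, 7)
```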
The classifier has six output labels.
Five of them correspond to five different particle types:
muon, kaon, pion, proton and electron.
The last one corresponds to all other particle types,
as well as to noisy tracks, which are called ghosts.
A ghost is a track that was reconstructed by mistake by a track pattern recognition method,
and doesn't correspond to any real particle in the detector.
So, for each track recognized in the detector,
the classifier gives the probabilities that it belongs to each of these particle types.
This problem can be solved in different modes:
in multi-classification mode,
in one-particle-versus-rest mode, or in one-particle-versus-one-particle mode.
For example, one-versus-rest mode means that you
train a classifier to separate one particle type from all other particle types.
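These modes can be sketched with scikit-learn. The snippet below uses toy generated data standing in for the detector features; its six classes play the roles of the five particle types plus ghosts, and the one-versus-rest mode trains one binary classifier per type:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy stand-in for the detector features: six classes play the roles
# of muon, kaon, pion, proton, electron and ghost.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=6, n_clusters_per_class=1,
                           random_state=0)

# One-versus-rest mode: one binary classifier per particle type.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# For each track, the classifier gives a probability for every class.
proba = clf.predict_proba(X[:5])
print(proba.shape)                           # (5, 6)
print(np.allclose(proba.sum(axis=1), 1.0))   # rows sum to one
```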
Modern detectors in high-energy physics provide high-quality particle identification.
In terms of the area under the ROC curve,
it corresponds to values in the range from 0.9 to 0.995, depending on the particle type.
In your programming assignment for this week,
you will train your own classifier for
particle identification using a toy Monte Carlo sample,
and you will estimate the quality of the identification for each particle type.
In the machine learning community,
the ROC curve is plotted as
the true positive rate versus the false positive rate.
However, in high-energy physics,
the ROC curve is plotted as
one minus the false positive rate versus the true positive rate.
One minus the false positive rate is called
background rejection, and the true positive rate is called signal efficiency.
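Both conventions come from the same pair of rates, so they can be sketched from one set of classifier scores. The toy Gaussian score distributions below are an assumption made for the illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Toy scores: signal scores peak near 0.7, background near 0.3.
y_true = np.concatenate([np.ones(1000), np.zeros(1000)])
scores = np.concatenate([rng.normal(0.7, 0.2, 1000),
                         rng.normal(0.3, 0.2, 1000)])

fpr, tpr, _ = roc_curve(y_true, scores)

# Machine learning convention: true positive rate vs false positive rate.
ml_curve = (fpr, tpr)

# High-energy physics convention: background rejection vs signal efficiency.
signal_efficiency = tpr
background_rejection = 1.0 - fpr
hep_curve = (signal_efficiency, background_rejection)
print(background_rejection[0], background_rejection[-1])
```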
The quality of particle identification
depends on particle parameters such as its momentum,
transverse momentum or energy.
The figure on the slide demonstrates a strong dependence on the particle momentum
of the area under the ROC curve for one particle type separated from all other types.
The quality drops significantly in the low and high momentum regions.
However, it's preferable to have no such dependencies.
In other words, we would like
the particle identification quality to be flat, or uniform,
as a function of the different particle parameters.
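Such a dependence can be measured by computing the AUC separately in momentum bins. The sketch below uses toy data whose separating power is deliberately made to weaken away from mid momentum, imitating the behaviour described on the slide (the score model is invented for the illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 10000
momentum = rng.uniform(2.0, 100.0, n)   # toy momentum values (GeV/c)
y_true = rng.integers(0, 2, n)

# Toy scores whose separating power weakens at low and high momentum.
quality = 1.0 - np.abs(momentum - 50.0) / 60.0
scores = y_true * quality + rng.normal(0.0, 0.5, n)

# Measuring the AUC separately in momentum bins reveals the dependence.
bins = np.linspace(2.0, 100.0, 6)
aucs = []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (momentum >= lo) & (momentum < hi)
    aucs.append(roc_auc_score(y_true[mask], scores[mask]))
    print(f"p in [{lo:5.1f}, {hi:5.1f}) GeV/c: AUC = {aucs[-1]:.3f}")
```

A flat dependence would show roughly the same AUC in every bin; here the central bin is clearly better than the edges.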
Consider one more example:
a classifier that separates one particle type as signal versus
the rest of the particle types as background.
The output of this classifier is shown at the bottom of the slide.
Let's select the classifier output threshold that
corresponds to a global signal efficiency of 60 percent.
It means that this threshold selects 60 percent of all signal particles in the sample.
However, in different regions of particle transverse momentum,
the signal efficiency differs from the global one, as demonstrated in the top figure.
In several regions, the efficiency is higher than 60 percent.
But there are also many regions where
the signal efficiency is much lower than 60 percent.
Such effects introduce systematic uncertainties into a physics analysis.
It's preferable for the classifier to select the same fraction of
signal particles in each momentum or transverse momentum region, for example.
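The threshold-and-per-bin-efficiency procedure can be sketched on toy data. The score model and transverse momentum distribution below are invented for the illustration; the point is only that one global threshold gives different efficiencies in different pt bins:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
pt = rng.exponential(5.0, n)                     # toy transverse momentum
is_signal = rng.integers(0, 2, n).astype(bool)

# Toy classifier output that degrades at low pt, as in the slide.
score = np.where(is_signal, 0.5 + 0.3 * np.tanh(pt / 5.0), 0.4)
score = score + rng.normal(0.0, 0.15, n)

# Threshold for 60% *global* signal efficiency: the 40th percentile of
# the signal scores, so 60% of all signal tracks pass it.
threshold = np.quantile(score[is_signal], 0.4)
global_eff = np.mean(score[is_signal] > threshold)
print(f"global efficiency: {global_eff:.2f}")

# The same threshold gives different efficiencies in different pt bins.
bins = np.array([0.0, 2.0, 5.0, 10.0, 50.0])
effs = []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = is_signal & (pt >= lo) & (pt < hi)
    effs.append(np.mean(score[mask] > threshold))
    print(f"pt in [{lo:4.1f}, {hi:4.1f}): efficiency = {effs[-1]:.2f}")
```

Even though the global efficiency is 60 percent by construction, the lowest pt bin falls well below it.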
Similar dependencies are present in modern high-energy physics experiments.
For example, the figure shown on the slide is provided by the LHCb experiment.
The figure demonstrates the dependence of
the signal efficiency on the transverse momentum for two different classifiers,
and for three different global efficiencies:
60 percent, 80 percent and 90 percent.
In the next video,
we will consider several approaches to training
a classifier that avoids such dependencies.