Many
common pattern recognition algorithms are probabilistic in nature, in that they use statistical inference to find the best label for a given
instance. Unlike other algorithms, which simply output a "best"
label, often probabilistic algorithms also output a probability of the instance being described by the
given label. In addition, many probabilistic algorithms output a list of the N-best labels with associated
probabilities, for some value of N,
instead of simply a single best label. When the number of possible labels is
fairly small (e.g., in the case of classification), N may be set so that the probability of
all possible labels is output. Probabilistic algorithms have many advantages
over non-probabilistic algorithms:
· They output a confidence value associated with
their choice. (Note that some other algorithms may also output confidence
values, but in general, only for probabilistic algorithms is this value
mathematically grounded in probability
theory. Non-probabilistic confidence values cannot in general be given any
specific meaning, and can only be used to compare against other confidence values
output by the same algorithm.)
· Correspondingly, they can abstain when the confidence of choosing any
particular output is too low (a behavior sketched in the example after this list).
· Because of the probabilities output,
probabilistic pattern-recognition algorithms can be more effectively incorporated
into larger machine-learning tasks, in a way that partially or completely
avoids the problem of error
propagation.
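To make these points concrete, here is a minimal sketch of N-best output and abstention using scikit-learn's LogisticRegression; the iris dataset, the choice of N = 2, and the 0.7 confidence threshold are arbitrary illustrations, not part of the original discussion.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

probs = clf.predict_proba(X[:1])[0]      # one probability per possible label
n_best = np.argsort(probs)[::-1][:2]     # indices of the 2 most probable labels
for idx in n_best:
    print(clf.classes_[idx], probs[idx])

# Abstain when no label is confident enough (threshold is illustrative).
if probs.max() < 0.7:
    print("abstain: no label is confident enough")
```

Because predict_proba returns a full distribution over labels, the same output supports a single best label, an N-best list, or abstention without retraining the model.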
Number of important feature variables
Feature selection algorithms attempt to directly prune
out redundant or irrelevant features. General introductions to feature selection,
summarizing the main approaches and challenges, are available in the literature.
Because of its non-monotonous character, feature selection is an optimization
problem where, given a total of n features, the powerset consisting
of all 2^n − 1 non-empty subsets of features needs to be
explored. The Branch-and-Bound
algorithm does reduce this complexity but is intractable for medium to large
values of the number of available features n. Large-scale
comparisons of feature-selection algorithms have also been published.
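As an illustration of that combinatorial cost, the following sketch exhaustively scores every non-empty feature subset; the score function is a hypothetical placeholder for whatever subset-evaluation criterion is in use.

```python
from itertools import combinations

def best_subset(features, score):
    """Exhaustively score every non-empty subset and return the best one."""
    best, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):  # all C(n, k) subsets of size k
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best

# n = 4 features -> 2**4 - 1 = 15 candidate subsets; the toy score below
# simply prefers larger subsets and stands in for a real evaluation criterion.
features = ["f1", "f2", "f3", "f4"]
print(best_subset(features, score=lambda s: len(s)))
```

The subset count doubles with every added feature, which is why exact methods such as Branch-and-Bound still become intractable beyond moderate n.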
Techniques to transform the raw
feature vectors (feature extraction) are sometimes used prior to
application of the pattern-matching algorithm. For example, feature extraction algorithms attempt
to reduce a large-dimensionality feature vector into a smaller-dimensionality
vector that is easier to work with and encodes less redundancy, using
mathematical techniques such as principal component analysis (PCA).
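A minimal sketch of such a transformation, assuming scikit-learn's PCA implementation; the 64-dimensional digits dataset and the choice of 10 components are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)    # 64 raw features per instance
pca = PCA(n_components=10).fit(X)      # learn a 10-dimensional projection
X_reduced = pca.transform(X)           # shape: (n_samples, 10)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.sum())  # variance retained by 10 components
```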
The distinction between feature
selection and feature extraction is that the features produced by
feature extraction are of a different sort than the original
features and may not be easily interpretable, while the features left after
feature selection are simply a subset of the original features.
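To contrast with the PCA sketch above, here is a feature-selection counterpart using scikit-learn's SelectKBest; the univariate f_classif score and k = 2 are arbitrary illustrations. Unlike the PCA projection, the retained columns here are original features and keep their original interpretation.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(f_classif, k=2).fit(X, y)  # keep the 2 best-scoring features
print(selector.get_support(indices=True))         # indices into the original features
X_selected = selector.transform(X)                # columns are original features, unchanged
```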