org.dllearner.learningproblems
Class Heuristics

java.lang.Object
  extended by org.dllearner.learningproblems.Heuristics

public class Heuristics
extends Object

Implementation of various heuristics. The methods can be used in learning problems and various evaluation scripts. They are verified in unit tests and, thus, should be fairly stable.

Author:
Jens Lehmann

Nested Class Summary
static class Heuristics.HeuristicType
           
 
Constructor Summary
Heuristics()
           
 
Method Summary
static double getAScore(double recall, double precision)
          Computes arithmetic mean of precision and recall, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
static double getAScore(double recall, double precision, double beta)
          Computes the arithmetic mean of precision and recall weighted by beta, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
static double[] getAScoreApproximationStep1(double beta, int nrOfPosExamples, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)
          In the first step of the A-Score approximation, we estimate recall (taking the factor beta into account).
static double[] getAScoreApproximationStep2(int nrOfPosClassifiedPositives, double[] recallInterval, double beta, int nrOfRelevantInstances, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)
          In step 2 of the A-Score approximation, the precision and the overall A-Score are estimated based on the estimated recall.
static double[] getConfidenceInterval95Wald(int total, int success)
          Computes the 95% confidence interval of an experiment with boolean outcomes, e.g. heads or tails coin throws.
static double getFScore(double recall, double precision)
          Computes F1-Score.
static double getFScore(double recall, double precision, double beta)
          Computes F-beta-Score.
static double[] getFScoreApproximation(int nrOfPosClassifiedPositives, double recall, double beta, int nrOfRelevantInstances, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)
          This method can be used to approximate the F-Measure, thereby saving a lot of instance checks.
static double getJaccardCoefficient(int elementsIntersection, int elementsUnion)
          Computes the Jaccard coefficient of two sets.
static double[] getPredAccApproximation(int nrOfPositiveExamples, int nrOfNegativeExamples, double beta, int nrOfPosExampleInstanceChecks, int nrOfSuccessfulPosExampleChecks, int nrOfNegExampleInstanceChecks, int nrOfNegativeNegExampleChecks)
           
static double getPredictiveAccuracy(int nrOfExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives)
           
static double getPredictiveAccuracy(int nrOfPosExamples, int nrOfNegExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives, double beta)
           
static double getPredictiveAccuracy2(int nrOfExamples, int nrOfPosClassifiedPositives, int nrOfPosClassifiedNegatives)
           
static double getPredictiveAccuracy2(int nrOfPosExamples, int nrOfNegExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives, double beta)
           
 boolean isTooWeak(int nrOfPositiveExamples, int nrOfPosClassifiedPositives, double noise)
          Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
 boolean isTooWeak2(int nrOfPositiveExamples, int nrOfNegClassifiedPositives, double noise)
          Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Heuristics

public Heuristics()
Method Detail

getFScore

public static double getFScore(double recall,
                               double precision)
Computes F1-Score.

Parameters:
recall - Recall.
precision - Precision.
Returns:
Harmonic mean of precision and recall.

getFScore

public static double getFScore(double recall,
                               double precision,
                               double beta)
Computes F-beta-Score.

Parameters:
recall - Recall.
precision - Precision.
beta - Weights precision and recall. If beta is >1, then recall is more important than precision.
Returns:
Harmonic mean of precision and recall weighted by beta.
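
To illustrate the effect of beta, here is a minimal sketch of the standard F-beta formula, F = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The class name FScoreSketch is hypothetical and the handling of a zero denominator is an assumption; the actual implementation in Heuristics may differ.

```java
// Hypothetical stand-alone sketch; not the DL-Learner source.
class FScoreSketch {

    // Standard F-beta score; argument order mirrors getFScore(recall, precision, beta).
    static double fScore(double recall, double precision, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        // Assumption: return 0 when the score is undefined (denominator is 0).
        if (denominator == 0) {
            return 0;
        }
        return (1 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // With beta = 2, high recall is rewarded more than equally high precision.
        System.out.println(fScore(1.0, 0.5, 2.0)); // recall = 1.0, precision = 0.5
        System.out.println(fScore(0.5, 1.0, 2.0)); // recall = 0.5, precision = 1.0
    }
}
```

With beta = 1 the sketch reduces to the plain harmonic mean of precision and recall, i.e. the F1-Score.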

getAScore

public static double getAScore(double recall,
                               double precision)
Computes arithmetic mean of precision and recall, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.

Parameters:
recall - Recall.
precision - Precision.
Returns:
Arithmetic mean of precision and recall.

getAScore

public static double getAScore(double recall,
                               double precision,
                               double beta)
Computes the arithmetic mean of precision and recall weighted by beta, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.

Parameters:
recall - Recall.
precision - Precision.
beta - Weights precision and recall. If beta is >1, then recall is more important than precision.
Returns:
Arithmetic mean of precision and recall weighted by beta.
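
One plausible reading of the weighting is a weighted arithmetic mean in which recall counts beta times as much as precision. The sketch below assumes the formula (beta * recall + precision) / (beta + 1). Both this formula and the class name AScoreSketch are assumptions made for illustration and are not confirmed by this page.

```java
// Hypothetical stand-alone sketch; not the DL-Learner source.
class AScoreSketch {

    // Assumed weighting: recall counts beta times as much as precision.
    // Argument order mirrors getAScore(recall, precision, beta).
    static double aScore(double recall, double precision, double beta) {
        return (beta * recall + precision) / (beta + 1);
    }

    // Unweighted variant: plain arithmetic mean (beta = 1).
    static double aScore(double recall, double precision) {
        return aScore(recall, precision, 1.0);
    }

    public static void main(String[] args) {
        System.out.println(aScore(1.0, 0.5));      // plain mean of 1.0 and 0.5
        System.out.println(aScore(1.0, 0.5, 3.0)); // recall weighted three times
    }
}
```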

getJaccardCoefficient

public static double getJaccardCoefficient(int elementsIntersection,
                                           int elementsUnion)
Computes the Jaccard coefficient of two sets.

Parameters:
elementsIntersection - Number of elements in the intersection of the two sets.
elementsUnion - Number of elements in the union of the two sets.
Returns:
#intersection divided by #union.
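
As a small worked example, the sketch below derives the two counts from a pair of java.util.Set instances and divides them. The class JaccardSketch, the helper ofSets, and the argument validation are hypothetical additions for illustration; the page does not state how invalid counts are handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not the DL-Learner source.
class JaccardSketch {

    // Size of the intersection divided by the size of the union.
    static double jaccard(int elementsIntersection, int elementsUnion) {
        // Assumption: reject counts that cannot come from two real sets.
        if (elementsUnion <= 0 || elementsIntersection < 0
                || elementsIntersection > elementsUnion) {
            throw new IllegalArgumentException("invalid set sizes");
        }
        return elementsIntersection / (double) elementsUnion;
    }

    // Convenience helper that computes both counts from two sets.
    static <T> double ofSets(Set<T> a, Set<T> b) {
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        return jaccard(intersection.size(), union.size());
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> second = new HashSet<>(Arrays.asList("b", "c", "d"));
        // Intersection {b, c} has 2 elements, union {a, b, c, d} has 4.
        System.out.println(ofSets(first, second));
    }
}
```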

getPredictiveAccuracy

public static double getPredictiveAccuracy(int nrOfExamples,
                                           int nrOfPosClassifiedPositives,
                                           int nrOfNegClassifiedNegatives)

getPredictiveAccuracy

public static double getPredictiveAccuracy(int nrOfPosExamples,
                                           int nrOfNegExamples,
                                           int nrOfPosClassifiedPositives,
                                           int nrOfNegClassifiedNegatives,
                                           double beta)

getPredictiveAccuracy2

public static double getPredictiveAccuracy2(int nrOfExamples,
                                            int nrOfPosClassifiedPositives,
                                            int nrOfPosClassifiedNegatives)

getPredictiveAccuracy2

public static double getPredictiveAccuracy2(int nrOfPosExamples,
                                            int nrOfNegExamples,
                                            int nrOfPosClassifiedPositives,
                                            int nrOfNegClassifiedNegatives,
                                            double beta)

getConfidenceInterval95Wald

public static double[] getConfidenceInterval95Wald(int total,
                                                   int success)
Computes the 95% confidence interval of an experiment with boolean outcomes, e.g. heads or tails coin throws. It uses the very efficient, but still accurate Wald method.

Parameters:
success - Number of successes, e.g. number of times the coin shows head.
total - Total number of tries, e.g. total number of times the coin was thrown.
Returns:
A two element double array, where element 0 is the lower border and element 1 the upper border of the 95% confidence interval.
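
The description of getAScoreApproximationStep1 refers to a "modified Wald method". The following is a minimal sketch assuming that this means the adjusted Wald (Agresti-Coull) interval, which adds two successes and two failures before applying the normal approximation with z = 1.96. The class name WaldSketch, the clipping to [0, 1], and the argument check are assumptions; the actual implementation may differ.

```java
// Hypothetical stand-alone sketch; not the DL-Learner source.
class WaldSketch {

    // Returns {lower, upper} of an approximate 95% confidence interval.
    static double[] confidenceInterval95(int total, int success) {
        if (total < 0 || success < 0 || success > total) {
            throw new IllegalArgumentException("need 0 <= success <= total");
        }
        // Adjusted estimate: add 2 successes and 2 failures.
        double adjustedTotal = total + 4;
        double p = (success + 2) / adjustedTotal;
        // Half-width of the interval for a 95% confidence level.
        double margin = 1.96 * Math.sqrt(p * (1 - p) / adjustedTotal);
        // Assumption: clip the borders to the valid probability range.
        return new double[] { Math.max(0, p - margin), Math.min(1, p + margin) };
    }

    public static void main(String[] args) {
        // 50 heads in 100 coin throws.
        double[] interval = confidenceInterval95(100, 50);
        System.out.println(interval[0] + " .. " + interval[1]);
    }
}
```

The interval narrows as the number of tries grows, which is what makes it usable as a stopping criterion when only a sample of instance checks is performed.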

isTooWeak

public boolean isTooWeak(int nrOfPositiveExamples,
                         int nrOfPosClassifiedPositives,
                         double noise)
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.

Parameters:
nrOfPositiveExamples - The number of positive examples in the learning problem.
nrOfPosClassifiedPositives - The number of positive examples that were classified as positive by the hypothesis.
noise - The noise parameter is a value between 0 and 1, which indicates how noisy the example data is (0 = no noise, 1 = completely random). If a hypothesis contains more errors on the positive examples than the noise value multiplied by the number of all examples, then the hypothesis is too weak.
Returns:
True if the hypothesis is too weak and false otherwise.
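
The check can be sketched as follows. Because the method only receives the number of positive examples, the sketch assumes that the allowed number of errors is noise * nrOfPositiveExamples. This threshold, the static modifier, the range check on noise, and the class name TooWeakSketch are assumptions made for illustration.

```java
// Hypothetical stand-alone sketch; not the DL-Learner source.
class TooWeakSketch {

    // True if the hypothesis misses more positive examples than the noise allows.
    static boolean isTooWeak(int nrOfPositiveExamples,
                             int nrOfPosClassifiedPositives,
                             double noise) {
        if (noise < 0 || noise > 1) {
            throw new IllegalArgumentException("noise must be between 0 and 1");
        }
        // Errors on the positive examples: positives not covered by the hypothesis.
        int errors = nrOfPositiveExamples - nrOfPosClassifiedPositives;
        // Assumed threshold: noise times the number of positive examples.
        return errors > noise * nrOfPositiveExamples;
    }

    public static void main(String[] args) {
        // 10 positive examples, noise 0.2: up to 2 errors are tolerated.
        System.out.println(isTooWeak(10, 9, 0.2)); // 1 error
        System.out.println(isTooWeak(10, 7, 0.2)); // 3 errors
    }
}
```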

isTooWeak2

public boolean isTooWeak2(int nrOfPositiveExamples,
                          int nrOfNegClassifiedPositives,
                          double noise)
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.

Parameters:
nrOfPositiveExamples - The number of positive examples in the learning problem.
nrOfNegClassifiedPositives - The number of positive examples that were classified as negative by the hypothesis.
noise - The noise parameter is a value between 0 and 1, which indicates how noisy the example data is (0 = no noise, 1 = completely random). If a hypothesis contains more errors on the positive examples than the noise value multiplied by the number of all examples, then the hypothesis is too weak.
Returns:
True if the hypothesis is too weak and false otherwise.

getFScoreApproximation

public static double[] getFScoreApproximation(int nrOfPosClassifiedPositives,
                                              double recall,
                                              double beta,
                                              int nrOfRelevantInstances,
                                              int nrOfInstanceChecks,
                                              int nrOfSuccessfulInstanceChecks)
This method can be used to approximate the F-Measure, thereby saving a lot of instance checks. It assumes that all positive examples (or instances of a class) have already been tested via instance checks, i.e. recall is already known and only precision is approximated.

Parameters:
nrOfPosClassifiedPositives - Number of positive examples (instances of a class) that are classified as positive.
recall - The already known recall.
beta - Weights precision and recall. If beta is >1, then recall is more important than precision.
nrOfRelevantInstances - Number of relevant instances, i.e. number of instances, which would have been tested without approximations. TODO: relevant = pos + neg examples?
nrOfInstanceChecks - Performed instance checks for the approximation.
nrOfSuccessfulInstanceChecks - Number of performed instance checks that were successful.
Returns:
A two element array, where the first element is the computed F-beta score and the second element is the length of the 95% confidence interval around it.

getAScoreApproximationStep1

public static double[] getAScoreApproximationStep1(double beta,
                                                   int nrOfPosExamples,
                                                   int nrOfInstanceChecks,
                                                   int nrOfSuccessfulInstanceChecks)
In the first step of the A-Score approximation, we estimate recall (taking the factor beta into account). This is not much more than a wrapper around the modified Wald method.

Parameters:
beta - Weights precision and recall. If beta is >1, then recall is more important than precision.
nrOfPosExamples - Number of positive examples (or instances of the considered class).
nrOfInstanceChecks - Number of positive examples (or instances of the considered class) which have been checked.
nrOfSuccessfulInstanceChecks - Number of positive examples (or instances of the considered class), where the instance check returned true.
Returns:
A two element array, where the first element is the recall multiplied by beta and the second element is the length of the 95% confidence interval around it.

getAScoreApproximationStep2

public static double[] getAScoreApproximationStep2(int nrOfPosClassifiedPositives,
                                                   double[] recallInterval,
                                                   double beta,
                                                   int nrOfRelevantInstances,
                                                   int nrOfInstanceChecks,
                                                   int nrOfSuccessfulInstanceChecks)
In step 2 of the A-Score approximation, the precision and the overall A-Score are estimated based on the estimated recall.

Parameters:
nrOfPosClassifiedPositives - Number of positive examples (instances of a class) that are classified as positive.
recallInterval - The estimated recall, which needs to be given as a two element array with the first element being the mean value and the second element being the length of the interval (to be compatible with the step1 method).
beta - Weights precision and recall. If beta is >1, then recall is more important than precision.
nrOfRelevantInstances - Number of relevant instances, i.e. number of instances, which would have been tested without approximations.
nrOfInstanceChecks - Performed instance checks for the approximation.
nrOfSuccessfulInstanceChecks - Number of performed instance checks, which returned true.
Returns:
A two element array, where the first element is the estimated A-Score and the second element is the length of the 95% confidence interval around it.

getPredAccApproximation

public static double[] getPredAccApproximation(int nrOfPositiveExamples,
                                               int nrOfNegativeExamples,
                                               double beta,
                                               int nrOfPosExampleInstanceChecks,
                                               int nrOfSuccessfulPosExampleChecks,
                                               int nrOfNegExampleInstanceChecks,
                                               int nrOfNegativeNegExampleChecks)


DL-Learner is licensed under the terms of the GNU General Public License.
Copyright © 2007-2011 Jens Lehmann