Usage and code examples of the org.apache.commons.math3.stat.descriptive.rank.Percentile class

x33g5p2x, reposted on 2022-01-26 in Other

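This article collects code examples for the Java class org.apache.commons.math3.stat.descriptive.rank.Percentile and shows how the class is used in practice. The examples come mainly from GitHub, Stack Overflow, Maven and similar platforms and were extracted from selected projects, so they should serve as a useful reference. Details of the Percentile class:
Package path: org.apache.commons.math3.stat.descriptive.rank.Percentile
Class name: Percentile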

Introduction to Percentile

Provides percentile computation.

There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:

  1. Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
  2. If n = 1 return the unique array element (regardless of the value of p); otherwise
  3. Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference d between pos and floor(pos) (i.e. the fractional part of pos).
  4. If pos < 1 return the smallest element in the array.
  5. Else if pos >= n return the largest element in the array.
  6. Else let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower)
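
The six steps above can be sketched in plain Java without the library. This is only an illustration of the described position formula, not the Commons Math source: the class name PercentileSketch and the full sort (instead of the library's selection algorithm) are assumptions made for brevity.

```java
import java.util.Arrays;

class PercentileSketch {

    /** Estimates the p-th percentile (0 < p <= 100) following the six steps above. */
    static double percentile(double[] values, double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        if (values.length == 0) {
            return Double.NaN;               // documented result for an empty array
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);                 // step 1: work on a sorted copy
        int n = sorted.length;
        if (n == 1) {
            return sorted[0];                // step 2: single element, p is irrelevant
        }
        double pos = p * (n + 1) / 100.0;    // step 3: 1-based estimated position
        double d = pos - Math.floor(pos);    // fractional part of pos
        if (pos < 1) {
            return sorted[0];                // step 4: smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];            // step 5: largest element
        }
        int k = (int) Math.floor(pos);
        double lower = sorted[k - 1];        // element at 1-based position floor(pos)
        double upper = sorted[k];            // next element
        return lower + d * (upper - lower);  // step 6: linear interpolation
    }

    public static void main(String[] args) {
        double[] data = {15, 20, 35, 40, 50};
        System.out.println(percentile(data, 50)); // prints 35.0
        System.out.println(percentile(data, 25)); // prints 17.5
        System.out.println(percentile(data, 90)); // prints 50.0
    }
}
```

For n = 5 and p = 25 the position is 1.5, so the result lies halfway between the first two sorted elements; for p = 90 the position 5.4 is at least n, so the largest element is returned.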

To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by java.lang.Double#compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
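
The NaN ordering and the 2.5 median from the paragraph above can be checked with the standard library alone. The sketch below is illustrative: the class and method names are hypothetical, and it applies the position formula by hand rather than calling Percentile.

```java
import java.util.Arrays;

class NaNOrderingDemo {

    /** Returns a sorted copy; Arrays.sort(double[]) places NaN after every other value. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Median of the six-element example, using pos = p * (n + 1) / 100. */
    static double medianOfExample() {
        double[] sorted = sortedCopy(new double[] {Double.NaN, 3, 0, 4, 1, 2});
        double pos = 50.0 * (sorted.length + 1) / 100.0; // 3.5
        int k = (int) Math.floor(pos);                    // 3 (1-based position)
        double d = pos - k;                               // 0.5
        return sorted[k - 1] + d * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        double[] sorted = sortedCopy(new double[] {Double.NaN, 3, 0, 4, 1, 2});
        System.out.println(Arrays.toString(sorted)); // prints [0.0, 1.0, 2.0, 3.0, 4.0, NaN]
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // prints true
        System.out.println(medianOfExample()); // prints 2.5
    }
}
```

Because NaN sorts last, it counts toward n = 6 but does not take part in the interpolation between the third and fourth elements.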

Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.

Further, to include different estimation types such as R1 and R2, as described on the Quantile page (Wikipedia), a type-specific NaN handling strategy is used to closely match the results typically observed from popular tools such as R (R1-R9) and Excel (R7).
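
To make the difference between estimation types concrete, the following sketch compares the position rule described earlier, pos = p * (n + 1) / 100, with the R7 rule, pos = (n - 1) * p / 100 + 1, on a four-element sample. It is a plain Java illustration under stated assumptions: the class and method names are hypothetical, and it does not call the library's EstimationType implementations.

```java
import java.util.Arrays;

class EstimationTypeSketch {

    /** Linear interpolation at a 1-based, possibly fractional position in a sorted array. */
    static double atPosition(double[] sorted, double pos) {
        int n = sorted.length;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int k = (int) Math.floor(pos);
        double d = pos - k;
        return sorted[k - 1] + d * (sorted[k] - sorted[k - 1]);
    }

    /** Position rule described in this article: pos = p * (n + 1) / 100. */
    static double legacy(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return atPosition(sorted, p * (sorted.length + 1) / 100.0);
    }

    /** R7 position rule: pos = (n - 1) * p / 100 + 1. */
    static double r7(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return atPosition(sorted, (sorted.length - 1) * p / 100.0 + 1);
    }

    public static void main(String[] args) {
        double[] data = {10, 20, 30, 40};
        System.out.println(legacy(data, 25)); // prints 12.5
        System.out.println(r7(data, 25));     // prints 17.5
    }
}
```

On this small sample the two rules disagree by 5 for the same first quartile, which is the small-sample divergence the introduction warns about.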

Since 2.2, Percentile uses only selection instead of complete sorting and caches selection algorithm state between calls to the various evaluate methods. This greatly improves efficiency, both for a single percentile and multiple percentile computations. To maximize performance when multiple percentiles are computed based on the same data, users should set the data array once using either one of the #evaluate(double[],double) or #setData(double[]) methods and thereafter call #evaluate(double) with just the percentile provided.
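
"Selection instead of complete sorting" means finding the element that would sit at a given sorted index without ordering the rest of the array. A minimal quickselect sketch of that general idea follows. It is not the library's selector and does not cache pivots; the class name, the random pivot choice, and the fixed seed are assumptions made for illustration.

```java
import java.util.Random;

class QuickSelectSketch {

    /** Returns the k-th smallest element (0-based k) without fully sorting the input. */
    static double select(double[] values, int k) {
        if (k < 0 || k >= values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        double[] work = values.clone();      // leave the caller's array untouched
        Random random = new Random(42);      // fixed seed keeps runs reproducible
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {
            int pivot = partition(work, lo, hi, lo + random.nextInt(hi - lo + 1));
            if (k == pivot) {
                return work[k];
            } else if (k < pivot) {
                hi = pivot - 1;              // target lies in the left part
            } else {
                lo = pivot + 1;              // target lies in the right part
            }
        }
        return work[lo];
    }

    /** Lomuto partition: moves the pivot to its final sorted index and returns that index. */
    static int partition(double[] a, int lo, int hi, int pivotIndex) {
        double pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (Double.compare(a[i], pivot) < 0) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        double[] data = {9, 1, 8, 2, 7, 3};
        System.out.println(select(data, 2)); // prints 3.0
    }
}
```

Each pass discards the part of the array that cannot contain index k, so only one side is ever processed further, whereas a full sort orders both sides.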

Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the increment() or clear() method, it must be synchronized externally.

Code examples

Code example source: origin: org.apache.commons/commons-math3

/**
 * {@inheritDoc}
 */
@Override
public Percentile copy() {
  return new Percentile(this);
}

Code example source: origin: org.apache.commons/commons-math3

/**
 * Returns an estimate of the <code>p</code>th percentile of the values
 * in the <code>values</code> array.
 * <ul>
 * <li>Returns <code>Double.NaN</code> if <code>values</code> has length
 * <code>0</code></li>
 * <li>Returns (for any value of <code>p</code>) <code>values[0]</code>
 *  if <code>values</code> has length <code>1</code></li>
 * <li>Throws <code>IllegalArgumentException</code> if <code>values</code>
 * is null or p is not a valid quantile value (p must be greater than 0
 * and less than or equal to 100)</li>
 * </ul>
 * <p>
 * See {@link org.apache.commons.math3.stat.descriptive.rank.Percentile} for
 * a description of the percentile estimation algorithm used.</p>
 *
 * @param values input array of values
 * @param p the percentile value to compute
 * @return the percentile value or Double.NaN if the array is empty
 * @throws MathIllegalArgumentException if <code>values</code> is null
 * or p is invalid
 */
public static double percentile(final double[] values, final double p)
throws MathIllegalArgumentException {
    return PERCENTILE.evaluate(values,p);
}

Code example source: origin: org.apache.commons/commons-math3

/**
 * Copy constructor, creates a new {@code Percentile} identical
 * to the {@code original}
 *
 * @param original the {@code Percentile} instance to copy
 * @throws NullArgumentException if original is null
 */
public Percentile(final Percentile original) throws NullArgumentException {
  MathUtils.checkNotNull(original);
  estimationType   = original.getEstimationType();
  nanStrategy      = original.getNaNStrategy();
  kthSelector      = original.getKthSelector();
  setData(original.getDataRef());
  if (original.cachedPivots != null) {
    System.arraycopy(original.cachedPivots, 0, cachedPivots, 0, original.cachedPivots.length);
  }
  setQuantile(original.quantile);
}

Code example source: origin: org.apache.commons/commons-math3

/**
 * Returns the result of evaluating the statistic over the stored data.
 * <p>
 * The stored array is the one which was set by previous calls to
 * {@link #setData(double[])}
 * </p>
 * @param p the percentile value to compute
 * @return the value of the statistic applied to the stored data
 * @throws MathIllegalArgumentException if p is not a valid quantile value
 * (p must be greater than 0 and less than or equal to 100)
 */
public double evaluate(final double p) throws MathIllegalArgumentException {
  return evaluate(getDataRef(), p);
}

Code example source: origin: org.apache.commons/commons-math3

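public double evaluate(final double[] values, final double p) throws MathIllegalArgumentException {
  test(values, 0, 0); // validates the input array (rejects null)
  return evaluate(values, 0, values.length, p);
}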

Code example source: origin: salesforce/Argus

private Double _calculateNthPercentile(Collection<Double> values, Double percentileValue) {
  return new Percentile().evaluate(Doubles.toArray(values), percentileValue);
}

Code example source: origin: jpmml/jpmml-evaluator

@Override
public double doublePercentile(int percentile){

  if(this.size == 0){
    throw new IllegalStateException();
  }

  double[] data = new double[this.size];

  System.arraycopy(this.values, 0, data, 0, data.length);

  Arrays.sort(data);

  Percentile statistic = new Percentile();
  statistic.setData(data);

  return statistic.evaluate(percentile);
}

Code example source: origin: stanford-futuredata/macrobase

@Override
public void consume(List<Datum> records) {
  List<DatumWithNorm> toClassify = new ArrayList<>();
  double[] scores = new double[records.size()];
  for(int i = 0; i < records.size(); i++) {
    Datum d = records.get(i);
    DatumWithNorm dwn = new DatumWithNorm(d);
    toClassify.add(dwn);
    scores[i] = dwn.getNorm();
  }
  Percentile pCalc = new Percentile().withNaNStrategy(NaNStrategy.MAXIMAL);
  pCalc.setData(scores);
  double cutoff = pCalc.evaluate(scores, targetPercentile * 100);
  log.debug("{} Percentile Cutoff: {}", targetPercentile, cutoff);
  log.debug("Median: {}", pCalc.evaluate(50));
  log.debug("Max: {}", pCalc.evaluate(100));
  for(DatumWithNorm dwn : toClassify) {
    results.add(new OutlierClassificationResult(dwn.getDatum(),
                          dwn.getNorm() >= cutoff || dwn.getNorm().isInfinite()));
  }
}

Code example source: origin: zavtech/morpheus-core

@Override
public double getValue() {
  return new org.apache.commons.math3.stat.descriptive.rank.Percentile(nth * 100)
    .withEstimationType(org.apache.commons.math3.stat.descriptive.rank.Percentile.EstimationType.R_7)
    .withNaNStrategy(NaNStrategy.FIXED)
    .evaluate(values, 0, n);
}

Code example source: origin: linkedin/cruise-control

_percentile.setData(historyMetricValues.doubleArray());
double upperPercentileMetricValue = _percentile.evaluate(_anomalyUpperPercentile);
if (upperPercentileMetricValue <= SIGNIFICANT_METRIC_VALUE_THRESHOLD) {
 return null;
}
double lowerThreshold = _percentile.evaluate(_anomalyLowerPercentile) * _anomalyLowerMargin;
double currentMetricValue = current.metricValues().valuesFor(metricId).latest();

Code example source: origin: automatictester/lightning

@Override
protected int calculateNumericResult(DescriptiveStatistics ds) {
  ds.setPercentileImpl(new Percentile().withEstimationType(Percentile.EstimationType.R_3));
  return actualResult = (int) ds.getPercentile((double) percentile);
}

Code example source: origin: org.apache.commons/commons-math3

if (values == getDataRef()) {
  work = getDataRef();
} else {
  switch (nanStrategy) {
  case MAXIMAL:// Replace NaNs with +INFs
    work = replaceAndSlice(values, begin, length, Double.NaN, Double.POSITIVE_INFINITY);
    break;
  case MINIMAL:// Replace NaNs with -INFs
    work = replaceAndSlice(values, begin, length, Double.NaN, Double.NEGATIVE_INFINITY);
    break;
  case REMOVED:// Drop NaNs from data
    work = removeAndSlice(values, begin, length, Double.NaN);
    break;
  case FAILED:// just throw exception as NaN is un-acceptable
    work = copyOf(values, begin, length);
    MathArrays.checkNotNaN(work);
    break;
  default: //FIXED
    work = copyOf(values,begin,length);
    break;
  }
}

Code example source: origin: org.apache.commons/commons-math3

/**
 * Replace every occurrence of a given value with a replacement value in a
 * copied slice of array defined by array part from [begin, begin+length).
 * @param values the input array
 * @param begin start index of the array to include
 * @param length number of elements to include from begin
 * @param original the value to be replaced
 * @param replacement the value to be used for replacement
 * @return the copy of sliced array with replaced values
 */
private static double[] replaceAndSlice(final double[] values,
                    final int begin, final int length,
                    final double original,
                    final double replacement) {
  final double[] temp = copyOf(values, begin, length);
  for(int i = 0; i < length; i++) {
    temp[i] = Precision.equalsIncludingNaN(original, temp[i]) ?
         replacement : temp[i];
  }
  return temp;
}

Code example source: origin: org.apache.commons/commons-math3

/**
 * Get pivots which is either cached or a newly created one
 *
 * @param values array containing the input numbers
 * @return cached pivots or a newly created one
 */
private int[] getPivots(final double[] values) {
  final int[] pivotsHeap;
  if (values == getDataRef()) {
    pivotsHeap = cachedPivots;
  } else {
    pivotsHeap = new int[PIVOTS_HEAP_LENGTH];
    Arrays.fill(pivotsHeap, -1);
  }
  return pivotsHeap;
}

Code example source: origin: org.apache.mahout/mahout-mrlegacy

/**
 * @return an array of values to split the numeric feature's values on when
 *  building candidate splits. When input size is <= MAX_NUMERIC_SPLITS + 1, it will
 *  return the averages between successive values as split points. When larger, it will
 *  return MAX_NUMERIC_SPLITS approximate percentiles through the data.
 */
private static double[] chooseNumericSplitPoints(double[] values) {
 if (values.length <= 1) {
  return values;
 }
 if (values.length <= MAX_NUMERIC_SPLITS + 1) {
  double[] splitPoints = new double[values.length - 1];
  for (int i = 1; i < values.length; i++) {
   splitPoints[i-1] = (values[i] + values[i-1]) / 2.0;
  }
  return splitPoints;
 }
 Percentile distribution = new Percentile();
 distribution.setData(values);
 double[] percentiles = new double[MAX_NUMERIC_SPLITS];
 for (int i = 0 ; i < percentiles.length; i++) {
  double p = 100.0 * ((i + 1.0) / (MAX_NUMERIC_SPLITS + 1.0));
  percentiles[i] = distribution.evaluate(p);
 }
 return percentiles;
}

Code example source: origin: meyerjp3/psychometrics

/**
 * Computes the bandwidth
 */
private void computeBandwidth(){
  double n = (double)x.length;
  stats = new DescriptiveStatistics(x);
  stats.setPercentileImpl(new Percentile().withEstimationType(Percentile.EstimationType.R_7));//Use the same percentile method as R.
  double observedSd = stats.getStandardDeviation();
  double observedQ1 = stats.getPercentile(25);
  double observedQ3 = stats.getPercentile(75);
  double observedIqr = observedQ3-observedQ1;
  if(bandwidthType==BandwidthType.BW_NRD){
    //Scott's plugin bandwidth (bw.nrd in R)
    h = 1.06*Math.min(observedSd, observedIqr/1.34)*Math.pow(n, -1.0/5.0);
  }else{
    //Silverman's rule of thumb (bw.nrd0 is the default in R and the default here.)
    h = 0.9*Math.min(observedSd, observedIqr/1.34)*Math.pow(n, -1.0/5.0);
  }
  //apply adjustment factor
  h *= adjust;
}
