autorank

autorank(data,
         alpha=0.05,
         verbose=False,
         order='descending',
         approach='frequentist',
         rope=0.1,
         rope_mode='effsize',
         nsamples=50000,
         effect_size=None)

Automatically compares populations defined in a block-design data frame. Each column in the data frame contains the samples for one population. The data must not contain any NaNs. The data must have at least five measurements, i.e., rows. The current version is only reliable for fewer than 5000 measurements.

The following approach is implemented by this function.

  • First, all columns are checked with the Shapiro-Wilk test for normality. We use Bonferroni correction for these tests, i.e., alpha/len(data.columns).
  • If all columns are normal, we use Bartlett's test for homogeneity, otherwise we use Levene's test.
  • Based on the normality and the homogeneity, we select appropriate tests, effect sizes, and methods for determining the confidence intervals of the central tendency.
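
The Bonferroni step above can be illustrated with a minimal sketch. The helper names corrected_alpha and all_columns_normal are hypothetical and not part of autorank, and the comparison p >= alpha is an assumption about how the boundary case is treated.

```python
def corrected_alpha(alpha, n_populations):
    """Bonferroni-corrected significance level for the per-column normality tests."""
    return alpha / n_populations


def all_columns_normal(pvals_shapiro, alpha):
    """Return True if no Shapiro-Wilk p-value falls below the corrected alpha.

    Assumption: a column counts as normal when p >= corrected alpha.
    """
    alpha_normality = corrected_alpha(alpha, len(pvals_shapiro))
    return all(p >= alpha_normality for p in pvals_shapiro)


# Hypothetical p-values for three columns. 0.02 would be rejected at
# alpha = 0.05, but not at the corrected level 0.05 / 3.
example_pvals = [0.40, 0.02, 0.75]
print(corrected_alpha(0.05, len(example_pvals)))
print(all_columns_normal(example_pvals, 0.05))
```

The corrected level corresponds to the alpha_normality entry of the result, and the per-column p-values to pvals_shapiro.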

If all columns are normal, we calculate:

  • The mean value as central tendency.
  • The empirical standard deviation as measure for the variance.
  • The confidence interval for the mean value.
  • The effect size in comparison to the highest mean value using Cohen's d.

If at least one column is not normal, we calculate:

  • The median as central tendency.
  • The median absolute deviation from the median as measure for the variance.
  • The confidence interval for the median.
  • The effect size in comparison to the highest ranking approach using Cliff's delta.
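
For readers unfamiliar with the non-parametric statistics listed above, the following sketch computes the median absolute deviation and Cliff's delta using only the standard library. The function names are hypothetical. The MAD is shown unscaled, which is an assumption: an implementation may multiply it by a consistency constant.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def cliffs_delta(x, y):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(x) * len(y))."""
    greater = sum(1 for a in x for b in y if a > b)
    smaller = sum(1 for a in x for b in y if a < b)
    return (greater - smaller) / (len(x) * len(y))


# Hypothetical measurements for the highest-ranking population and one other.
best = [5, 6, 7, 8, 9]
other = [1, 2, 3, 4, 10]
print(median_absolute_deviation(best))
print(cliffs_delta(best, other))
```

Cliff's delta ranges from -1 to 1; a value of 0 means that neither population tends to produce larger values than the other.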

For the statistical tests, there are five variants:

  • If approach=='bayesian' we use a Bayesian signed rank test.
  • If there are two populations (columns) and both populations are normal, we use the paired t-test.
  • If there are two populations and at least one population is not normal, we use Wilcoxon's signed rank test.
  • If there are more than two populations and all populations are normal and homoscedastic, we use repeated measures ANOVA with Tukey's HSD as post-hoc test.
  • If there are more than two populations and at least one population is not normal or the populations are heteroscedastic, we use Friedman's test with the Nemenyi post-hoc test.
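
The five variants above amount to a small decision procedure. The sketch below restates it as a plain function. The name select_test and the returned labels are hypothetical; they are not the strings that autorank stores in the omnibus or posthoc fields.

```python
def select_test(approach, n_populations, all_normal, homoscedastic):
    """Map the properties of the data to the statistical test described above."""
    if n_populations < 2:
        raise ValueError("at least two populations are required")
    if approach == 'bayesian':
        return "Bayesian signed rank test"
    if n_populations == 2:
        # Homoscedasticity does not influence the choice for two populations.
        return "paired t-test" if all_normal else "Wilcoxon's signed rank test"
    if all_normal and homoscedastic:
        return "repeated measures ANOVA with Tukey's HSD"
    return "Friedman's test with Nemenyi post-hoc test"


print(select_test('frequentist', 2, all_normal=True, homoscedastic=True))
print(select_test('frequentist', 4, all_normal=True, homoscedastic=False))
print(select_test('bayesian', 4, all_normal=False, homoscedastic=False))
```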

Parameters

data (DataFrame): Each column contains a population and each row contains the paired measurements for the populations.

alpha (float, default=0.05): Significance level. We internally use correction to ensure that all results (incl. confidence intervals) together fulfill this confidence level.

verbose (bool, default=False): Prints decisions and p-values while running the autorank function to stdout.

order (string, default='descending'): Determines the ordering of the central tendencies of the populations for the ranking. 'descending' results in higher ranks for larger values. 'ascending' results in higher ranks for smaller values.

approach (string, default='frequentist'): With 'frequentist', a suitable frequentist statistical test is used (t-test, Wilcoxon signed rank test, ANOVA+Tukey's HSD, or Friedman+Nemenyi). With 'bayesian', the Bayesian signed rank test is used. (New in Version 1.1.0)

rope (float, default=0.1): Region of Practical Equivalence (ROPE) used for the Bayesian analysis. The statistical analysis assumes that differences from the central tendency that are within the ROPE do not matter in practice. Therefore, such deviations may be considered to be equivalent. The ROPE is defined as an interval around the central tendency and the calculation of the interval is determined by the rope_mode parameter. (New in Version 1.1.0)

rope_mode (string, default='effsize'): Method to calculate the size of the ROPE. With 'effsize', the ROPE is determined dynamically for each comparison of two populations as rope*effect_size, where effect_size is either Cohen's d (normal data) or Akinshin's gamma (non-normal data). With 'absolute', the value of the rope parameter is used as the ROPE without any modification. (New in Version 1.1.0)

nsamples (integer, default=50000): Number of samples used to estimate the posterior probabilities with the Bayesian signed rank test. (New in Version 1.1.0)

effect_size (string, default=None): Effect size measure that is used for reporting. If None, the effect size is automatically selected as described in the flow chart. The following effect sizes are supported: "cohen_d", "cliff_delta", "akinshin_gamma". (New in Version 1.1.0)
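
The interplay of the rope and rope_mode parameters can be sketched as follows. The function rope_width is hypothetical and only mirrors the description above. Taking the absolute value of the effect size is an assumption made here so that the width is never negative.

```python
def rope_width(rope, rope_mode, effect_size=None):
    """Size of the ROPE for one pairwise comparison, following the parameter description."""
    if rope_mode == 'effsize':
        if effect_size is None:
            raise ValueError("rope_mode='effsize' requires an effect size")
        # Assumption: the magnitude of the effect size is used.
        return rope * abs(effect_size)
    if rope_mode == 'absolute':
        # The rope value is used without any modification.
        return rope
    raise ValueError("rope_mode must be 'effsize' or 'absolute'")


print(rope_width(0.1, 'effsize', effect_size=0.8))
print(rope_width(0.1, 'absolute'))
```

With 'effsize', the same rope value yields a different interval for each pair of populations, whereas 'absolute' keeps the interval fixed in the units of the data.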

Returns

A named tuple of type RankResult with the following entries.

rankdf (DataFrame): Ranked populations including statistics about the populations.

pvalue (float): p-value of the omnibus test for the difference in central tendency between the populations. Not used with Bayesian statistics.

omnibus (string): Omnibus test that is used for the test of a difference in the central tendency.

posthoc (string): Posthoc test that was used. The posthoc test is performed even if the omnibus test is not significant. The results should only be used if the p-value of the omnibus test indicates significance. None in case of two populations and Bayesian statistics.

cd (float): The critical distance of the Nemenyi posthoc test, if it was used. Otherwise None.

all_normal (bool): True if all populations are normal, false if at least one is not normal.

pvals_shapiro (list): p-values of the Shapiro-Wilk tests for normality sorted by the order of the input columns.

homoscedastic (bool): True if populations are homoscedastic, false otherwise. None in case of Bayesian statistics.

pval_homogeneity (float): p-value of the test for homogeneity. None in case of Bayesian statistics.

homogeneity_test (string): Test used for homogeneity. Either 'bartlett' or 'levene'.

alpha (float): Family-wise significance level. Same as input parameter.

alpha_normality (float): Corrected alpha that is used for tests for normality.

num_samples (int): Number of samples within each population.

order (string): Order of the central tendencies used for ranking.

posterior_matrix (DataFrame): Matrix with the pair-wise posterior probabilities estimated with the Bayesian signed rank test. The matrix is a square matrix with the populations sorted by their central tendencies as rows and columns. The value of the matrix in the i-th row and the j-th column contains a 3-tuple (p_smaller, p_equal, p_greater) such that p_smaller is the probability that the population in column j is smaller than the population in row i, p_equal that both populations are equal, and p_greater that population j is larger than population i. If rope==0.0, the matrix contains only 2-tuples (p_smaller, p_greater) because equality is not possible without a ROPE. (New in Version 1.1.0)

decision_matrix (DataFrame): Matrix with the pair-wise decisions made with the Bayesian signed rank test. The matrix is a square matrix with the populations sorted by their central tendencies as rows and columns. The value of the matrix in the i-th row and the j-th column contains the value 'smaller' if the population in column j is significantly smaller than the population in row i, 'equal' if both populations are equivalent (i.e., have no practically relevant difference), 'larger' if the population in column j is significantly larger than the population in row i, and 'inconclusive' if the statistical analysis did not yield a definitive result. (New in Version 1.1.0)

rope (float): Region of Practical Equivalence (ROPE). Same as input parameter. (New in Version 1.1.0)

rope_mode (string): Mode for calculating the ROPE. Same as input parameter. (New in Version 1.1.0)
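
To make the relation between posterior_matrix and decision_matrix more concrete, the sketch below turns one posterior tuple into a decision. The function decide is hypothetical, and the threshold of 1 - alpha is an assumption; the text above does not state which probability is required for a definitive result.

```python
def decide(posterior, alpha=0.05):
    """Turn a (p_smaller, p_equal, p_greater) or (p_smaller, p_greater) tuple into a decision."""
    threshold = 1 - alpha  # assumed decision threshold
    if len(posterior) == 3:
        p_smaller, p_equal, p_greater = posterior
    else:
        # Without a ROPE (rope == 0.0) equality is not possible.
        p_smaller, p_greater = posterior
        p_equal = 0.0
    if p_smaller >= threshold:
        return 'smaller'
    if p_equal >= threshold:
        return 'equal'
    if p_greater >= threshold:
        return 'larger'
    return 'inconclusive'


# Hypothetical posterior tuples for single cells of the matrix.
print(decide((0.97, 0.02, 0.01)))
print(decide((0.30, 0.40, 0.30)))
print(decide((0.01, 0.99)))
```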

create_report

create_report(result, *, decimal_places)

Prints a report about the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

plot_stats

plot_stats(result, *, allow_insignificant, ax, width)

Creates a plot that supports the analysis of the results of the statistical test. The plot depends on the statistical test that was used.

  • Creates a Confidence Interval (CI) plot for a paired t-test between two normal populations. The confidence intervals are calculated with Bonferroni correction, i.e., a significance level of alpha/2 for each interval.
  • Creates a CI plot for Tukey's HSD as post-hoc test with the confidence intervals calculated using the HSD approach such that the family wise significance is alpha.
  • Creates Critical Distance (CD) diagrams for the Nemenyi post-hoc test. CD diagrams visualize the mean ranks of populations. Populations that are not significantly different are connected by a horizontal bar.
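
As background for the CD diagrams, the critical distance of the Nemenyi test is commonly computed as q_alpha * sqrt(k * (k + 1) / (6 * N)) for k populations and N measurements. The sketch below uses this textbook formula with the critical values for alpha = 0.05 as tabulated by Demšar (2006). It is not taken from the autorank source, and the function names are hypothetical.

```python
import math

# Critical values q_alpha for alpha = 0.05, keyed by the number of populations k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def critical_distance(n_populations, n_measurements):
    """Critical distance of the Nemenyi test at alpha = 0.05."""
    q_alpha = Q_ALPHA_005[n_populations]
    k = n_populations
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_measurements))


def significantly_different(mean_rank_a, mean_rank_b, cd):
    """Two populations differ if their mean ranks are more than cd apart."""
    return abs(mean_rank_a - mean_rank_b) > cd


cd = critical_distance(n_populations=4, n_measurements=30)
print(round(cd, 3))
print(significantly_different(1.5, 3.2, cd))
```

In a CD diagram, the horizontal bars connect exactly those populations whose mean ranks are no more than the critical distance apart.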

This function raises a ValueError if the omnibus test did not detect a significant difference. The allow_insignificant parameter allows the suppression of this exception and forces the creation of the plots.

Parameters

result (RankResult): Should be the return value of the autorank function.

allow_insignificant (bool, default=False): Forces plotting even if results are not significant.

ax (Axis, default=None): Matplotlib axis to which the results are added. A new figure with a single axis is created if None.

width (float, default=None): Specifies the width of the created plot if not None. By default, we use a width of 6. The height is automatically determined, based on the type of plot and the number of populations. This parameter is ignored if ax is not None.

Returns

Axis with the plot. None if no plot was generated.

latex_table

latex_table(result, *, decimal_places, label)

Creates a LaTeX table from the results dataframe of the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

label (str, default=None): Label of the table. Defaults to 'tbl:stat_results' if None.

latex_report

latex_report(result, *, decimal_places, prefix, generate_plots,
             figure_path, complete_document)

Creates a LaTeX report of the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

prefix (str, default=""): Prefix that is added before all labels and plot file names.

generate_plots (bool, default=True): Decides whether plots are generated when the results are statistically significant.

figure_path (str, default=""): Path where the plots shall be written to. Ignored if generate_plots is False.

complete_document (bool, default=True): Generates a complete LaTeX document if True. Otherwise only a single section is generated.