autorank

autorank(data,
         alpha=0.05,
         verbose=False,
         order='descending',
         approach='frequentist',
         rope=0.1,
         rope_mode='effsize',
         nsamples=50000,
         effect_size=None,
         force_mode=None,
         random_state=None,
         plot_order=None)

Automatically compares populations defined in a block-design data frame. Each column in the data frame contains the samples for one population. The data must not contain any NaNs and must have at least five measurements, i.e., rows. The current version is only reliable for fewer than 5000 measurements.

The following approach is implemented by this function.

First, all columns are checked with the Shapiro-Wilk test for normality. We use Bonferroni correction for these tests, i.e., alpha/len(data.columns).
If all columns are normal, we use Bartlett's test for homogeneity, otherwise we use Levene's test.
Based on the normality and the homogeneity, we select appropriate tests, effect sizes, and methods for determining the confidence intervals of the central tendency.
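
The preliminary checks described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's code: normality_summary is a hypothetical helper, the Shapiro-Wilk p-values are assumed to have been computed already, and a column is assumed to count as normal when its p-value is not below the corrected level.

```python
def normality_summary(shapiro_pvalues, alpha=0.05):
    """Hypothetical helper: apply the Bonferroni-corrected level to
    per-column Shapiro-Wilk p-values and pick the homogeneity test."""
    # Bonferroni correction: alpha divided by the number of columns
    alpha_normality = alpha / len(shapiro_pvalues)
    # Assumption: normality is rejected when p < alpha_normality
    all_normal = all(p >= alpha_normality for p in shapiro_pvalues)
    # Bartlett's test if all columns are normal, otherwise Levene's test
    homogeneity_test = "Bartlett's test" if all_normal else "Levene's test"
    return alpha_normality, all_normal, homogeneity_test


# Three columns, so the corrected level is 0.05 / 3
print(normality_summary([0.40, 0.03, 0.20]))
print(normality_summary([0.40, 0.01, 0.20]))
```

In the first call every p-value stays above the corrected level, so Bartlett's test would be chosen. In the second call one column falls below it, so Levene's test would be chosen.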

If all columns are normal, we calculate:

The mean value as central tendency.
The empirical standard deviation as a measure of the variance.
The confidence interval for the mean value.
The effect size in comparison to the highest mean value using Cohen's d.

If at least one column is not normal, we calculate:

The median as central tendency.
The median absolute deviation from the median as a measure of the variance.
The confidence interval for the median.
The effect size in comparison to the highest ranking approach using Cliff's delta.
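
For readers unfamiliar with the two effect sizes named above, the following sketch shows their textbook definitions using only the standard library. The function names cohen_d and cliff_delta are chosen for illustration, and the pooled standard deviation is the common textbook choice; the library's own implementation may differ in details.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def cliff_delta(x, y):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (nx * ny)."""
    greater = sum(1 for a in x for b in y if a > b)
    smaller = sum(1 for a in x for b in y if a < b)
    return (greater - smaller) / (len(x) * len(y))


x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
print(round(cohen_d(x, y), 4))  # -0.6325
print(cliff_delta(x, y))        # -0.36
```

Both values are negative here because x tends to be smaller than y; Cliff's delta is bounded by -1 and 1, while Cohen's d is unbounded.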

For the statistical tests, there are five variants:

If approach=='bayesian' we use a Bayesian signed rank test.
If there are two populations (columns) and both populations are normal, we use the paired t-test.
If there are two populations and at least one population is not normal, we use Wilcoxon's signed rank test.
If there are more than two populations and all populations are normal and homoscedastic, we use repeated measures ANOVA with Tukey's HSD as post-hoc test.
If there are more than two populations and at least one population is not normal or the populations are heteroscedastic, we use Friedman's test with the Nemenyi post-hoc test.
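
The five variants listed above amount to a small decision procedure. The sketch below restates it as a function; select_test and its return strings are hypothetical names used for illustration and are not part of the autorank API.

```python
def select_test(approach, n_populations, all_normal, homoscedastic):
    """Hypothetical helper mirroring the five documented variants."""
    if n_populations < 2:
        raise ValueError("at least two populations are required")
    # Variant 1: the Bayesian approach always uses the signed rank test
    if approach == "bayesian":
        return "Bayesian signed rank test"
    # Variants 2 and 3: two populations
    if n_populations == 2:
        return "paired t-test" if all_normal else "Wilcoxon signed rank test"
    # Variant 4: more than two, all normal and homoscedastic
    if all_normal and homoscedastic:
        return "repeated measures ANOVA with Tukey's HSD"
    # Variant 5: more than two, non-normal or heteroscedastic
    return "Friedman test with Nemenyi post-hoc test"


print(select_test("frequentist", 2, True, True))
print(select_test("frequentist", 4, True, False))
print(select_test("bayesian", 4, False, False))
```

Note that homoscedasticity only matters once there are more than two populations; for two populations the choice depends on normality alone.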

Parameters

data (DataFrame): Each column contains a population and each row contains the paired measurements for the populations.

alpha (float, default=0.05): Significance level. We internally apply corrections to ensure that all results (incl. confidence intervals) together fulfill this confidence level.

verbose (bool, default=False): Prints decisions and p-values while running the autorank function to stdout.

order (string, default='descending'): Determines the ordering of the central tendencies of the populations for the ranking. 'descending' results in higher ranks for larger values. 'ascending' results in higher ranks for smaller values.

approach (string, default='frequentist'): With 'frequentist', a suitable frequentist statistical test is used (t-test, Wilcoxon signed rank test, ANOVA+Tukey's HSD, or Friedman+Nemenyi). With 'bayesian', the Bayesian signed rank test is used. (New in Version 1.1.0)

rope (float, default=0.1): Region of Practical Equivalence (ROPE) used for the Bayesian analysis. The statistical analysis assumes that differences from the central tendency that are within the ROPE do not matter in practice. Therefore, such deviations may be considered to be equivalent. The ROPE is defined as an interval around the central tendency and the calculation of the interval is determined by the rope_mode parameter. (New in Version 1.1.0)

rope_mode (string, default='effsize'): Method to calculate the size of the ROPE. With 'effsize', the ROPE is determined dynamically for each comparison of two populations as rope*effect_size, where effect_size is either Cohen's d (normal data) or Akinshin's gamma (non-normal data). With 'absolute', the value of the rope parameter is used as the ROPE without any modification. (New in Version 1.1.0)

nsamples (integer, default=50000): Number of samples used to estimate the posterior probabilities with the Bayesian signed rank test. (New in Version 1.1.0)

effect_size (string, default=None): Effect size measure that is used for reporting. If None, the effect size is automatically selected as described in the flow chart. The following effect sizes are supported: "cohen_d", "cliff_delta", "akinshin_gamma". (New in Version 1.1.0)

force_mode (string, default=None): Can be used to force autorank to use parametric or nonparametric frequentist tests. With 'parametric' you automatically get the t-test/repeated measures ANOVA. With 'nonparametric' you automatically get Wilcoxon's signed rank test/Friedman test. In case of Bayesian statistics, this parameter is used to override the automatic selection of the effect size measure, such that 'parametric' uses Cohen's d and 'nonparametric' uses Akinshin's gamma, regardless of the normality of the data. If this parameter is None, the automatic selection is used. (Support for Bayesian statistics added in Version 1.3.0)

random_state (integer, default=None): Seed for random state. Forwarded to Bayesian signed rank test to enable reproducible sampling and, thereby, reproducible results. (New in Version 1.2.0)

plot_order (list, default=None): List with the order of the populations used for plotting, where reasonable (e.g., CI plots). If this is not None, this overrides the order parameter for visualizations. (New in Version 1.3.0)
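
The interaction of the rope and rope_mode parameters described above can be illustrated with a short sketch. The helper rope_width is hypothetical and follows the documented rule literally (rope*effect_size for 'effsize', the unmodified rope for 'absolute'); how the library treats the sign of the effect size is not stated here and is therefore not modeled.

```python
def rope_width(rope, rope_mode, effect_size=None):
    """Hypothetical helper: size of the ROPE for one pairwise comparison."""
    if rope_mode == "effsize":
        if effect_size is None:
            raise ValueError("rope_mode='effsize' needs an effect size")
        # ROPE scales with the effect size of the compared populations
        return rope * effect_size
    if rope_mode == "absolute":
        # ROPE is the rope value itself, without any modification
        return rope
    raise ValueError(f"unknown rope_mode: {rope_mode!r}")


# With the default rope=0.1 and an effect size of 0.5, the ROPE is 0.05
print(rope_width(0.1, "effsize", effect_size=0.5))
# In 'absolute' mode the effect size plays no role
print(rope_width(0.1, "absolute"))
```

The 'effsize' mode makes the notion of "practically equivalent" relative to the scale of the data, whereas 'absolute' is the natural choice when a meaningful threshold in the original units is known.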

Returns

A named tuple of type RankResult with the following entries.

rankdf (DataFrame): Ranked populations including statistics about the populations.

pvalue (float): p-value of the omnibus test for the difference in central tendency between the populations. Not used with Bayesian statistics.

omnibus (string): Omnibus test that is used for the test of a difference in the central tendency.

posthoc (string): Posthoc test that was used. The posthoc test is performed even if the omnibus test is not significant. The results should only be used if the p-value of the omnibus test indicates significance. None in case of two populations and Bayesian statistics.

cd (float): The critical distance of the Nemenyi posthoc test, if it was used. Otherwise None.

all_normal (bool): True if all populations are normal, false if at least one is not normal.

pvals_shapiro (list): p-values of the Shapiro-Wilk tests for normality sorted by the order of the input columns.

homoscedastic (bool): True if populations are homoscedastic, false otherwise. None in case of Bayesian statistics.

pval_homogeneity (float): p-value of the test for homogeneity. None in case of Bayesian statistics.

homogeneity_test (string): Test used for homogeneity. Either 'bartlet' or 'levene'.

alpha (float): Family-wise significance level. Same as input parameter.

alpha_normality (float): Corrected alpha that is used for tests for normality.

num_samples (int): Number of samples within each population.

order (string): Order of the central tendencies used for ranking.

sample_matrix (DataFrame): Matrix with SignedRankTest objects from package baycomp. Can be used to do further analysis, e.g. to generate plots using the built-in plot() method of baycomp. For a detailed description of methods and parameters, see the documentation of baycomp: https://baycomp.readthedocs.io/en/latest/classes.html#multiple-data-sets (New in Version 1.2.0)

posterior_matrix (DataFrame): Matrix with the pair-wise posterior probabilities estimated with the Bayesian signed rank test. The matrix is a square matrix with the populations sorted by their central tendencies as rows and columns. The value of the matrix in the i-th row and the j-th column contains a 3-tuple (p_smaller, p_equal, p_greater) such that p_smaller is the probability that the population in column j is smaller than the population in row i, p_equal that both populations are equal, and p_greater that population j is larger than population i. If rope==0.0, the matrix contains only 2-tuples (p_smaller, p_greater) because equality is not possible without a ROPE. (New in Version 1.1.0)

decision_matrix (DataFrame): Matrix with the pair-wise decisions made with the Bayesian signed rank test. The matrix is a square matrix with the populations sorted by their central tendencies as rows and columns. The value of the matrix in the i-th row and the j-th column is 'smaller' if the population in column j is significantly smaller than the population in row i, 'equal' if both populations are equivalent (i.e., have no practically relevant difference), 'larger' if the population in column j is significantly larger than the population in row i, and 'inconclusive' if the statistical analysis did not yield a definitive result. (New in Version 1.1.0)

rope (float): Region of Practical Equivalence (ROPE). Same as input parameter. (New in Version 1.1.0)

rope_mode (string): Mode for calculating the ROPE. Same as input parameter. (New in Version 1.1.0)

effect_size (string): Effect size measure that is used for reporting. Same as input parameter.

force_mode (string): If not None, this is the force mode that was used to select the tests. Either 'parametric' or 'nonparametric'.

plot_order (list): If not None, this is the fixed order that is used for plotting, where possible. Otherwise None. (New in Version 1.3.0)
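
To make the relation between the posterior_matrix and decision_matrix entries above more concrete, the following sketch turns one posterior tuple into one of the four documented decisions. The decision rule is an assumption made for illustration: a decision is taken once a posterior probability reaches 1 - alpha. The threshold the library actually uses is not stated in this section, and decide is a hypothetical helper.

```python
def decide(posterior, alpha=0.05):
    """Hypothetical helper: map a posterior tuple to a decision label."""
    threshold = 1 - alpha  # assumed decision threshold, not from the docs
    if len(posterior) == 3:
        p_smaller, p_equal, p_greater = posterior
    else:
        # Without a ROPE (rope == 0.0) there is no probability of equality
        p_smaller, p_greater = posterior
        p_equal = 0.0
    if p_equal >= threshold:
        return "equal"
    if p_smaller >= threshold:
        return "smaller"
    if p_greater >= threshold:
        return "larger"
    return "inconclusive"


print(decide((0.01, 0.02, 0.97)))  # larger
print(decide((0.30, 0.40, 0.30)))  # inconclusive
print(decide((0.99, 0.01)))        # smaller
```

A result is 'inconclusive' whenever no single outcome collects enough posterior probability, which is a different statement from the populations being equivalent.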

create_report

create_report(result, *, decimal_places)

Prints a report about the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

plot_stats

plot_stats(result, *, allow_insignificant, ax, width)

Creates a plot that supports the analysis of the results of the statistical test. The plot depends on the statistical test that was used.

Creates a Confidence Interval (CI) plot for a paired t-test between two normal populations. The confidence intervals are calculated with Bonferroni correction, i.e., a confidence level of alpha/2.
Creates a CI plot for Tukey's HSD as post-hoc test with the confidence intervals calculated using the HSD approach such that the family wise significance is alpha.
Creates Critical Distance (CD) diagrams for the Nemenyi post-hoc test. CD diagrams visualize the mean ranks of populations. Populations that are not significantly different are connected by a horizontal bar.

This function raises a ValueError if the omnibus test did not detect a significant difference. The allow_insignificant parameter allows the suppression of this exception and forces the creation of the plots.
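
The critical distance that underlies the CD diagrams mentioned above is commonly computed as CD = q_alpha * sqrt(k*(k+1)/(6*N)) for k populations and N measurements. The sketch below evaluates this textbook formula with critical values for alpha=0.05 as tabulated in the literature on the Nemenyi test; critical_distance and Q_ALPHA_005 are illustrative names, and the library may obtain the critical value differently.

```python
from math import sqrt

# Critical values q_alpha of the Nemenyi test for alpha = 0.05,
# indexed by the number of populations k (values from published tables)
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def critical_distance(k, n, q_table=Q_ALPHA_005):
    """Textbook critical distance for k populations and n measurements."""
    q_alpha = q_table[k]
    return q_alpha * sqrt(k * (k + 1) / (6 * n))


# Four populations measured on 14 paired rows
print(round(critical_distance(4, 14), 2))  # 1.25
```

Two populations whose mean ranks differ by less than this distance are not significantly different, which is what the horizontal bars in a CD diagram indicate.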

Parameters

result (RankResult): Should be the return value of the autorank function.

allow_insignificant (bool, default=False): Forces plotting even if results are not significant.

ax (Axis, default=None): Matplotlib axis to which the results are added. A new figure with a single axis is created if None.

width (float, default=None): Specifies the width of the created plot if not None. By default, we use a width of 6. The height is automatically determined, based on the type of plot and the number of populations. This parameter is ignored if ax is not None.

Returns

Axis with the plot. None if no plot was generated.

latex_table

latex_table(result, *, decimal_places, label, effect_size_relation,
            posterior_relation)

Creates a LaTeX table from the results dataframe of the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

label (str, default=None): Label of the table. Defaults to 'tbl:stat_results' if None.

effect_size_relation (str, default="best"): Specifies which effect size relation is used in the table. Can be "best", "above", or "both". If "best", the effect size is computed in relation to the best-ranked value. If "above", the effect size is computed in relation to the value in the row above. With "both", both the best and the above are included in the table. (New in Version 1.3.0)

posterior_relation (str, default="best"): Specifies which posterior relation is used in the table. Can be "best", "above", or "both". If "best", the posterior is computed in relation to the best-ranked value. If "above", the posterior is computed in relation to the value in the row above. With "both", both the best and the above are included in the table. (New in Version 1.3.0)

latex_report

latex_report(result, *, decimal_places, prefix, generate_plots,
             figure_path, complete_document)

Creates a LaTeX report of the statistical analysis.

Parameters

result (RankResult): Should be the return value of the autorank function.

decimal_places (int, default=3): Number of decimal places that are used for the report.

prefix (str, default=""): Prefix that is added before all labels and plot file names.

generate_plots (bool, default=True): Decides whether plots are generated when the results are statistically significant.

figure_path (str, default=""): Path where the plots shall be written to. Ignored if generate_plots is False.

complete_document (bool, default=True): Generates a complete LaTeX document if True. Otherwise only a single section is generated.