Statistics

Previous months:
2010 - 1003(10) - 1004(7) - 1005(4) - 1006(1) - 1007(2) - 1008(4) - 1010(1) - 1011(1)
2011 - 1105(2) - 1107(1) - 1111(1) - 1112(1)
2012 - 1203(1) - 1204(2) - 1205(1) - 1208(1) - 1210(1) - 1211(6) - 1212(1)
2013 - 1301(1) - 1304(3) - 1306(1) - 1307(1) - 1310(2)
2014 - 1402(1) - 1403(3) - 1404(2) - 1405(2) - 1407(1) - 1409(4) - 1410(4) - 1411(13) - 1412(4)
2015 - 1503(1) - 1505(2) - 1506(2) - 1507(3) - 1508(3) - 1509(1) - 1511(2) - 1512(6)
2016 - 1601(6) - 1602(3) - 1603(4) - 1604(2) - 1605(1) - 1607(5) - 1608(1) - 1609(4) - 1610(1) - 1611(1) - 1612(2)
2017 - 1701(4) - 1702(3) - 1703(5) - 1704(11) - 1705(12) - 1706(8) - 1707(2) - 1708(2) - 1709(1) - 1710(3) - 1711(5) - 1712(6)
2018 - 1801(5) - 1802(3) - 1803(4) - 1804(4) - 1805(3) - 1806(5) - 1807(2) - 1808(1) - 1809(3) - 1810(5) - 1811(4) - 1812(2)
2019 - 1901(3) - 1903(1) - 1904(2) - 1905(4) - 1906(1) - 1907(2) - 1908(1) - 1909(1) - 1910(2) - 1911(3) - 1912(1)
2020 - 2001(3) - 2002(1) - 2003(1) - 2004(3) - 2005(2) - 2006(2) - 2007(1) - 2008(3) - 2009(2) - 2010(2) - 2011(2) - 2012(12)
2021 - 2101(3) - 2102(3) - 2103(4) - 2104(1) - 2105(1) - 2106(2) - 2107(2) - 2109(1) - 2110(2) - 2111(3) - 2112(3)
2022 - 2201(1) - 2202(2) - 2204(2) - 2207(1) - 2209(2) - 2212(1)
2023 - 2301(1) - 2302(1) - 2303(1) - 2304(1) - 2305(1) - 2306(1) - 2307(1) - 2308(1) - 2309(1) - 2310(2) - 2311(1) - 2312(2)
2024 - 2402(2) - 2404(2) - 2406(3) - 2407(2) - 2408(1) - 2411(2)
2025 - 2502(2) - 2503(6) - 2504(2) - 2505(2) - 2506(2) - 2507(2) - 2508(1) - 2510(5) - 2511(1)
2026 - 2601(2) - 2602(2)

Recent submissions

Any replacements are listed farther down

[371] viXra:2602.0155 [pdf] submitted on 2026-02-26 09:49:23

Nested Sampling: A Critical and Comprehensive Theoretical Guide

Authors: L. Martino, F. Llorente
Comments: 28 Pages.

The Nested Sampling (NS) technique has gained widespread attention, particularly in cosmology and astronomy, due to its ability to efficiently explore high-likelihood regions, a feature akin to an implicit likelihood optimization that underlies its success. While the full theoretical derivation of NS is complex and involves several approximations, the central challenge lies in sampling from the likelihood-constrained priors, which is crucial for its performance. This work provides a comprehensive and detailed exposition of NS, clarifying both its theoretical foundations and practical challenges. We provide a thorough description of the NS procedure, emphasizing both its strengths and potential limitations. In doing so, this work seeks to deepen understanding of the method and to foster the development of future enhancements, novel variants, and more efficient implementations across a wide range of scientific applications.
Category: Statistics
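
As a rough illustration of the likelihood-constrained sampling step described above, the sketch below implements textbook nested sampling with the deterministic prior-volume approximation X_i ≈ exp(-i/N). It is an assumption-laden toy, not the authors' method: the hypothetical helpers `log_like` and `sample_prior` define a 1-D Gaussian likelihood under a uniform prior on [-5, 5], and constrained draws are made by naive rejection from the prior, which is only viable for cheap, low-dimensional problems.

```python
import math
import random


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(log_like, sample_prior, n_live=100, n_iter=600, seed=0):
    """Estimate log Z, where Z is the integral of L(theta) * prior(theta)."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_log_l = [log_like(x) for x in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log prior volume, X_0 = 1
    for i in range(1, n_iter + 1):
        # the live point with the lowest likelihood becomes the i-th dead point
        worst = min(range(n_live), key=live_log_l.__getitem__)
        log_l_min = live_log_l[worst]
        log_x = -i / n_live  # X_i ~= exp(-i / N)
        # log of the weight X_{i-1} - X_i
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = logaddexp(log_z, log_l_min + log_w)
        log_x_prev = log_x
        # naive rejection: sample the prior restricted to {theta : L(theta) > L_min}
        while True:
            x = sample_prior(rng)
            lx = log_like(x)
            if lx > log_l_min:
                break
        live[worst], live_log_l[worst] = x, lx
    # the remaining live points share the leftover prior volume X_final
    for lx in live_log_l:
        log_z = logaddexp(log_z, lx + log_x_prev - math.log(n_live))
    return log_z


SIGMA = 0.5  # hypothetical likelihood width


def log_like(theta):
    """Hypothetical Gaussian likelihood N(theta; 0, SIGMA^2)."""
    return -0.5 * (theta / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))


def sample_prior(rng):
    """Hypothetical uniform prior on [-5, 5] (density 1/10)."""
    return rng.uniform(-5.0, 5.0)


log_z_est = nested_sampling(log_like, sample_prior)
print(log_z_est, math.log(0.1))
```

In this toy case the true evidence is essentially 1/10, so the estimate should land near log(0.1). Replacing the rejection loop with a more efficient constrained sampler is precisely the practical difficulty the abstract singles out.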

[370] viXra:2602.0150 [pdf] submitted on 2026-02-26 21:24:53

A Unifying View of Multiple-Try Metropolis and Particle Metropolis-Hastings Algorithms

Authors: L. Martino
Comments: 27 Pages.

Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods are cornerstone techniques for Bayesian inference and stochastic optimization. The multiple-try Metropolis (MTM) algorithm generalizes the Metropolis-Hastings (MH) scheme by selecting the next state from a set of weighted candidates, improving exploration of the state space. Particle Metropolis-Hastings (PMH) integrates MCMC and SMC ideas to efficiently tackle high-dimensional targets with sequentially factorized structures, embedding a particle filter within an MH framework. While both approaches have been extensively studied, particularly for state-space models, their relationship has not been fully explored. In this work, we examine the connections and distinctions between MTM and PMH schemes, which motivates the design of novel, highly efficient algorithms for filtering and smoothing. Among these, we introduce a particle multiple-try Metropolis (P-MTM) method, which demonstrates excellent performance across a range of numerical experiments.
Category: Statistics
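
To make the MTM selection step concrete, the following is a minimal sketch of the classical multiple-try Metropolis algorithm with a symmetric Gaussian random-walk proposal and weights w(y) = π(y), a valid weight choice when the proposal is symmetric. It is not the P-MTM scheme introduced in the paper; the function names, the step size, the number of tries, and the standard normal target are illustrative assumptions.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multiple_try_metropolis(log_target, x0, n_samples, n_tries=5, step=2.0, seed=0):
    """Classical MTM with a symmetric Gaussian random-walk proposal and w(y) = pi(y)."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        # 1) draw K candidates y_1..y_K from q(. | x)
        ys = [x + rng.gauss(0.0, step) for _ in range(n_tries)]
        log_wy = [log_target(y) for y in ys]
        # 2) select one candidate with probability proportional to its weight
        m = max(log_wy)
        y = rng.choices(ys, weights=[math.exp(v - m) for v in log_wy], k=1)[0]
        # 3) draw K-1 reference points from q(. | y) and add the current state x
        refs = [y + rng.gauss(0.0, step) for _ in range(n_tries - 1)] + [x]
        log_wx = [log_target(z) for z in refs]
        # 4) accept with probability min(1, sum_j w(y_j) / sum_j w(x*_j))
        log_alpha = log_sum_exp(log_wy) - log_sum_exp(log_wx)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
        chain.append(x)
    return chain


def log_std_normal(x):
    """Illustrative target: unnormalized log density of N(0, 1)."""
    return -0.5 * x * x


chain = multiple_try_metropolis(log_std_normal, x0=0.0, n_samples=20_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)
```

The sample mean and variance should be close to 0 and 1. With K = 1 the scheme reduces to ordinary random-walk Metropolis-Hastings.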

[369] viXra:2601.0065 [pdf] submitted on 2026-01-15 10:10:17

Importance Sampling and Contrastive Learning Schemes for Parameter Estimation in Non-Normalized Models

Authors: L. Martino, L. Scaffidi, S. Mangano
Comments: 30 Pages.

Likelihood-approximation methods and contrastive learning (CL) are two prominent approaches for inference in models with an unknown partition function. In this work, we provide a detailed comparison between the likelihood approximation by Geyer's approach (GA) and CL. Rather than increasing the complexity of Geyer's method to enable comparison, as proposed in [1], we adopt the opposite strategy by simplifying CL. We introduce a class of IS-within-CL schemes that estimate the partition function via importance sampling (IS) and reduce the optimization problem to the original parameter space. This perspective motivates the development of novel variants, whose theoretical properties are analyzed and empirically compared in a replicable experimental study. The described IS-within-CL schemes yield a full approximation of the partition function, thus potentially enabling efficient Bayesian inference. An optimal independent proposal density for IS-within-CL methods and the GA is also introduced. Overall, this work contributes to a clearer unification of likelihood-approximation and CL approaches, offering both theoretical understanding and practical tools for inference in energy-based and non-normalized models. Related MATLAB and R code is also made freely available to support the reproducibility of the results.
Category: Statistics
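
The core ingredient, estimating an unknown partition function by importance sampling and plugging the estimate into the likelihood, can be illustrated on a toy model whose normalizing constant is known. The sketch below is a plain importance-sampling plug-in estimator, not the IS-within-CL schemes of the paper; the Gaussian energy function, the proposal q = N(0, 2^2), the synthetic data, and all sample sizes are hypothetical choices. Because one set of proposal draws is reused for every θ, it yields an approximation of Z(θ) over the whole grid.

```python
import math
import random

rng = random.Random(1)


def energy(x, theta):
    """Hypothetical toy energy E(x; theta) = theta x^2 / 2, so Z(theta) = sqrt(2 pi / theta)."""
    return 0.5 * theta * x * x


# Hypothetical proposal q = N(0, S^2); one set of draws is reused for every theta.
S_PROP = 2.0
N_IS = 20_000
xs = [rng.gauss(0.0, S_PROP) for _ in range(N_IS)]
log_q = [-0.5 * (x / S_PROP) ** 2 - math.log(S_PROP * math.sqrt(2 * math.pi)) for x in xs]


def z_hat(theta):
    """IS estimate: Z(theta) ~= (1/N) sum_i exp(-E(x_i; theta)) / q(x_i)."""
    return sum(math.exp(-energy(x, theta) - lq) for x, lq in zip(xs, log_q)) / N_IS


# Synthetic observations from the model with theta_true = 2, i.e. N(0, 1/2).
data = [rng.gauss(0.0, 1.0 / math.sqrt(2.0)) for _ in range(500)]
sum_sq = sum(y * y for y in data)


def approx_log_lik(theta):
    """Log-likelihood with the intractable log Z(theta) replaced by its IS estimate."""
    return -0.5 * theta * sum_sq - len(data) * math.log(z_hat(theta))


grid = [0.5 + 0.05 * k for k in range(91)]  # theta in [0.5, 5.0]
theta_hat_is = max(grid, key=approx_log_lik)
theta_hat_exact = len(data) / sum_sq  # closed-form MLE for this toy model
print(theta_hat_is, theta_hat_exact)
```

The grid maximizer of the approximate log-likelihood should be close to the exact maximum likelihood estimate n / Σ y_j², which is available here only because the toy partition function is known.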

[368] viXra:2510.0077 [pdf] submitted on 2025-10-14 21:00:14

Monty-Hall Theorem Bayes-Price Rule (Bayes Theorem) for a Three Parameter Event Space

Authors: Keshava Prasad Halemane
Comments: 7 Pages.

This research report presents the statement of the Monty-Hall Theorem and provides a constructive proof by solving the classical Monty-Hall Problem. It establishes the fact that the probability of winning the prize is indeed unaffected by a switched choice — very much unlike the most prevalent and widely accepted position held by the Leading Subject-Matter-Experts.
Category: Statistics

[367] viXra:2510.0059 [pdf] submitted on 2025-10-12 10:04:53

Arguments in Favor of the Berger-Parker Index as an Effective Sample Size: The Only True Particle Counter

Authors: L. Martino
Comments: 13 Pages.

In many fields, including computational statistics, ecology, economics, and physics, normalized weights define a discrete probability mass function over a set of entities/samples. The effective sample size (ESS) quantifies the concentration of these weights, providing a measure of sample representativeness. In this work, we show that, among various ESS formulations, the Berger-Parker index uniquely preserves the relative proportions of the weights, acting as a true particle counter. Other commonly used ESS expressions tend to overestimate the effective sample size when only normalized weights are considered. Several examples and a formal demonstration are provided.
Category: Statistics
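
For concreteness, three common ESS formulas applied to normalized weights are sketched below: the Berger-Parker index 1 / max_i w_i, the inverse Simpson index 1 / Σ_i w_i² (the classical particle-filter ESS), and the perplexity exp(-Σ_i w_i log w_i). The function names are illustrative. All three return N for uniform weights, and for any weights they satisfy Berger-Parker ≤ inverse Simpson ≤ perplexity, which is consistent with the abstract's remark that the other expressions give larger values.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def ess_berger_parker(weights):
    """1 / max_i w_i for normalized weights."""
    return 1.0 / max(normalize(weights))


def ess_inverse_simpson(weights):
    """1 / sum_i w_i^2, the classical particle-filter ESS."""
    return 1.0 / sum(w * w for w in normalize(weights))


def ess_perplexity(weights):
    """exp(Shannon entropy) of the normalized weights."""
    return math.exp(-sum(w * math.log(w) for w in normalize(weights) if w > 0))


w = [0.5, 0.25, 0.25]
print(ess_berger_parker(w), ess_inverse_simpson(w), ess_perplexity(w))
```

For w = [0.5, 0.25, 0.25] these give 2, 8/3 ≈ 2.67, and 2^1.5 ≈ 2.83 respectively.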

[366] viXra:2510.0016 [pdf] submitted on 2025-10-04 09:38:22

Consensus in Sequential Wrapper Feature Selection: a Unifying Approach

Authors: L. Martino, G. Villacrés, S. Arcidiacono
Comments: 12 Pages.

Feature selection is a crucial task in statistics and machine learning, with direct implications for model interpretability and computational efficiency. This study introduces a unifying approach that combines the four possible sequential wrapper methods employed for variable selection, aiming to exploit their complementary strengths. The proposed procedure computes feature relevance scores and, subsequently, integrates the outputs from each sequential wrapper method. The underlying idea is simple and efficient. We test it in a controlled experiment with a known ground truth. The results indicate that the ranking obtained by consensus clearly outperforms the individual rankings obtained by the wrapper methods.
Category: Statistics
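
One simple way to merge several feature rankings into a consensus is to average each feature's position across the rankings (a Borda-style rule) and sort by that average. The sketch below illustrates only this generic aggregation step; it is an assumption, not necessarily the scoring rule used in the paper, and the method and feature names are hypothetical.

```python
from collections import defaultdict


def consensus_ranking(rankings):
    """Aggregate rankings (lists ordered from most to least relevant) by mean position."""
    positions = defaultdict(list)
    for ranking in rankings.values():
        for pos, feature in enumerate(ranking):
            positions[feature].append(pos)
    mean_pos = {f: sum(p) / len(p) for f, p in positions.items()}
    # lower mean position means more relevant; ties are broken alphabetically
    return sorted(mean_pos, key=lambda f: (mean_pos[f], f))


# hypothetical outputs of four sequential wrapper methods
rankings = {
    "method_a": ["x1", "x2", "x3", "x4"],
    "method_b": ["x2", "x1", "x3", "x4"],
    "method_c": ["x1", "x3", "x2", "x4"],
    "method_d": ["x1", "x2", "x4", "x3"],
}
consensus = consensus_ranking(rankings)
print(consensus)
```

Here the mean positions are 0.25, 1.0, 2.0, and 2.75 for x1 to x4, so the consensus order is x1, x2, x3, x4 even though no two input rankings agree completely.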

[365] viXra:2510.0015 [pdf] submitted on 2025-10-04 10:00:17

Data-Driven Priors Via Hyper-Parameter Posteriors of Gaussian Processes

Authors: L. Martino, J. Lopez-Santiago, J. Miguez, G. Vazquez-Vilar
Comments: 26 Pages.

When neither prior knowledge nor expert opinion is available, non-informative priors provide a practical alternative for conducting Bayesian inference. However, in the context of model selection, genuinely non-informative priors do not exist. In fact, diffuse priors on the parameters can drastically alter the value of the Bayesian evidence, making them effectively highly informative, while improper priors are not allowed at all. Furthermore, in many real-world applications, the use of informative priors can substantially improve computational efficiency by driving sampling algorithms toward regions of high posterior probability. In this work, we introduce a data-driven procedure for automatic prior construction. The underlying idea is to exploit the posteriors of the hyper-parameters from non-parametric models to construct priors for Bayesian inference in parametric models. We test the proposed scheme in four different experiments, two of which involve real astronomical data.
Category: Statistics

[364] viXra:2510.0001 [pdf] submitted on 2025-10-01 20:58:31

Sampling from Mixtures with Negative Weights: Application to Density Approximation by Gaussian Processes

Authors: L. Martino
Comments: 13 Pages.

Mixtures of probability densities are widely used in statistics and machine learning. While classical mixtures restrict weights to be non-negative, allowing negative weights enables more flexible density approximation. However, negative weights introduce challenges in handling and sampling such distributions. For this purpose, we propose efficient Monte Carlo (MC) methods (including MC quadratures, rejection sampling, and importance sampling schemes) for computing integrals and generating samples from these mixtures. A tailored proposal density ensures accurate and efficient generation of (unweighted) samples. Applications in Gaussian process-based density estimation demonstrate the practical relevance and efficiency of the proposed schemes.
Category: Statistics
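
A baseline way to sample from such a mixture, assuming the signed mixture f(x) = Σ_i w_i g_i(x) is non-negative everywhere and its weights sum to one, is rejection sampling from the positive part f+(x) = Σ_{w_i > 0} w_i g_i(x) ≥ f(x): draw x from the normalized f+ and accept it with probability f(x) / f+(x), which gives an acceptance rate of 1 / Σ_{w_i > 0} w_i. This generic scheme is a sketch, not the tailored proposal of the paper; the Gaussian components and the example f(x) = 2 N(x; 0, 1) - N(x; 0, 0.5²), which is non-negative with variance 2 · 1 - 0.25 = 1.75, are illustrative choices.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def sample_signed_mixture(weights, mus, sds, n, seed=0):
    """Rejection sampling from f = sum_i w_i N(mu_i, sd_i^2) with some w_i < 0.

    Assumes f >= 0 everywhere and sum(weights) == 1. Proposal: the normalized
    positive part f+ = sum_{w_i > 0} w_i N(mu_i, sd_i^2), which satisfies f+ >= f.
    """
    rng = random.Random(seed)
    pos = [i for i, w in enumerate(weights) if w > 0]
    pos_w = [weights[i] for i in pos]
    samples, tries = [], 0
    while len(samples) < n:
        tries += 1
        i = rng.choices(pos, weights=pos_w, k=1)[0]  # pick a positive component
        x = rng.gauss(mus[i], sds[i])
        f_plus = sum(weights[j] * normal_pdf(x, mus[j], sds[j]) for j in pos)
        f = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds))
        if rng.random() * f_plus < f:  # accept with probability f(x) / f+(x)
            samples.append(x)
    return samples, n / tries


# illustrative target: f(x) = 2 N(x; 0, 1) - N(x; 0, 0.5^2), mean 0, variance 1.75
samples, acc_rate = sample_signed_mixture([2.0, -1.0], [0.0, 0.0], [1.0, 0.5], n=20_000)
var = sum(s * s for s in samples) / len(samples)
print(acc_rate, var)
```

The acceptance rate should be close to 1/2, and the empirical variance close to 1.75. When the positive weights sum to much more than one, this baseline becomes wasteful, which is the motivation for better-tailored proposals.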

[363] viXra:2508.0163 [pdf] submitted on 2025-08-27 20:17:12

Fast Resampling for Sequential Monte Carlo with Millions of Particles

Authors: Luca Martino, V. Elvira
Comments: 27 Pages.

Particle filters (PFs) and, more generally, sequential Monte Carlo (SMC) methods are essential tools for Bayesian inference. Over the years, many SMC variants have been proposed, yet their core always relies on importance sampling followed by a resampling step. While resampling is crucial to mitigate particle degeneracy and to maintain a stable approximation of the posterior distribution, it often represents a significant computational bottleneck. In this work, we present a novel, fast resampling procedure that provides significant computational gains in demanding (often high-dimensional) scenarios where a large number of particles is required and the effective sample size (ESS) is small. The effectiveness of the proposed approach is demonstrated through a series of numerical experiments showing remarkable performance. In addition, a theoretical analysis and related code implementation are provided.
Category: Statistics
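
For reference, the sketch below shows standard systematic resampling, a common O(N) baseline that uses a single uniform draw and one pass over the cumulative weights. It is not the fast procedure proposed in the paper; the function name and the example weights are illustrative. Systematic resampling gives particle i either floor(N w_i) or ceil(N w_i) copies, and its expected number of copies is N w_i.

```python
import random


def systematic_resample(weights, rng):
    """Return N ancestor indices drawn by systematic resampling in O(N)."""
    n = len(weights)
    total = sum(weights)
    u = rng.random() / n  # a single uniform draw in [0, 1/N)
    indices = []
    i = 0
    cum = weights[0] / total
    for k in range(n):
        point = u + k / n  # evenly spaced points u, u + 1/N, ..., u + (N-1)/N
        # advance until point lies in [cum_{i-1}, cum_i); the bound guards against round-off
        while point >= cum and i < n - 1:
            i += 1
            cum += weights[i] / total
        indices.append(i)
    return indices


rng = random.Random(0)
weights = [0.1, 0.2, 0.3, 0.4]
print(systematic_resample(weights, rng))
```

With N = 4 and these weights, N w_i = 0.4, 0.8, 1.2, 1.6, so each call returns 0 or 1 copies of the first two particles and 1 or 2 copies of the last two.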

[362] viXra:2507.0223 [pdf] submitted on 2025-07-31 20:05:01

Simulation of Generalized Tempered Stable (GTS) Random Variates via Series Representations: A Case Study of Bitcoin and Ethereum

Authors: A. H. Nzokem
Comments: 6 Pages.

The paper presents two series representations of a Lévy process for the Generalized Tempered Stable (GTS) distribution: a series representation generated by the inverse tail integral and a shot noise representation. Both series representations are used to simulate the daily returns of Bitcoin and Ethereum. The Q-Q plot analysis shows smooth linear patterns, indicating strong agreement between the empirical and theoretical GTS distributions.
Category: Statistics

[361] viXra:2507.0047 [pdf] submitted on 2025-07-06 21:18:35

A Simple Statistical Method to "replace" Yates Analysis & A.N.O.V.A.

Authors: S.C. Gaudie
Comments: 3 Pages. Contact: tetrahedron_1_3_6@aim.com (Note by viXra Admin: Author's name is required on the article; please cite and list scientific references)

At the most "basic level", this is a very simple method that is easy to understand. The "basics" consist of just subtracting one value from another, and they can be very revealing in showing differences between "intermediate, virtual results". For "clarification of understanding", most of these calculations are based on "idealised results", where the "calculated results" match the "obviously expected results" (the "originally inputted results"). Further "clarification of understanding" is achieved by using "children's stacking blocks" with "binary numbers" written on them as "equivalents" to the "experimental results". The "data set used is a simple experimental version": 3 variables (C = clear = 100, B = black = 010, A = amber = 001; N = no block with no number) at 2 levels (0 = absent; 1 = present). This gives 8 possible combinations:
Block colour/variable: NNN, NNA, NBN, NBA, CNN, CNA, CBN, CBA
Binary numbers: 000, 001, 010, 011, 100, 101, 110, 111
Decimal numbers: 0, 1, 2, 3, 4, 5, 6, 7
"Test data": 0, 1, 2, 3, 4, 5, 6, 11
For computer calculations, the binary numbers need a leading 1 to "keep all 3 numbers in place". It is fairly easy to extend this method to more variables and more levels, e.g. by using ternary numbers. This method is easier and better than Yates analysis of effects or ANalysis Of VAriance (ANOVA).
Category: Statistics

[360] viXra:2506.0111 [pdf] submitted on 2025-06-20 20:10:55

On a Probabilistic Iterated Factor Method

Authors: Theophilus Agama
Comments: 5 Pages.

We introduce a probabilistic version of the iterated factor method developed in our previous work. Let $n$ be drawn uniformly from the set $\{1, 2, \ldots, M\}$. Define $s(n) = \lfloor \sqrt{\log_2 n} / \log(\log_2 n) \rfloor$ and $t(n) = \sqrt{\log\log n}$, and let $k_n = \lfloor n/2 \rfloor$ with respect to $s$, meaning a construction related to the $s$-th iterated factor. As $N$ tends to infinity, with $Z_n$ normally distributed with mean 0 and variance 1, and under the condition that $\Pr[\nu(k_n) \le s \text{ and } |Z_n| \le t] \to 1$, we show that the inequality $\iota(2^n - 1) \le n - 1 + \log_2 n + C\sqrt{\log_2 n / \log\log_2 n}$ holds for some absolute constant $C > 0$. This result surpasses the $O(\log n / \log\log n)$ barrier that is guaranteed under the classical Brauer method (Brauer, 1939). It may thus be viewed as an introduction of probabilistic methods into the theory of addition chains.
Category: Statistics

[359] viXra:2506.0094 [pdf] submitted on 2025-06-18 20:44:42

Sampling from the Maxwell-Juttner Distribution

Authors: Luc Devroye
Comments: 10 Pages.

The Maxwell-Jüttner distribution pertains to the speeds of particles in a hypothetical gas of relativistic particles. We consider the generalized form of this law in $\mathbb{R}^d$ and show how to generate a random variate from this distribution. In particular, one can sample the radius of the $d$-dimensional random vector in expected time uniformly bounded over all dimensions $d$ and shape parameters (temperatures).
Category: Statistics

[358] viXra:2505.0016 [pdf] submitted on 2025-05-01 17:42:44

Full proof to the Gambler's Ruin Problem

Authors: Brian Yin
Comments: 5 Pages. (Note by viXra Admin: Please cite and list scientific references)

In this paper, we discuss the full proof of the Gambler's Ruin Problem, using a combination of probability theory, recurrence relations, and boundary conditions.
Category: Statistics
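
For reference, the standard result for a gambler who starts with i units, bets one unit per round, wins each round with probability p (q = 1 - p), and stops at 0 or N follows from the recurrence P_i = p P_{i+1} + q P_{i-1} with boundary conditions P_0 = 0 and P_N = 1: P_i = (1 - (q/p)^i) / (1 - (q/p)^N) for p ≠ q, and P_i = i/N for p = q. The sketch below checks this closed form against a direct numerical solution of the same recurrence. It is a textbook formulation, not necessarily the derivation used in the paper, and the function names are illustrative.

```python
def win_probability(i, n, p):
    """Closed form: P(reach n before 0 | start at i), stake 1, win probability p."""
    q = 1.0 - p
    if abs(p - q) < 1e-15:
        return i / n
    r = q / p
    return (1.0 - r ** i) / (1.0 - r ** n)


def win_probability_iterative(n, p, sweeps=5000):
    """Solve P_i = p P_{i+1} + q P_{i-1}, P_0 = 0, P_n = 1 by Gauss-Seidel sweeps."""
    q = 1.0 - p
    P = [0.0] * (n + 1)
    P[n] = 1.0
    for _ in range(sweeps):
        for i in range(1, n):
            P[i] = p * P[i + 1] + q * P[i - 1]
    return P


for p in (0.4, 0.5, 0.55):
    exact = [win_probability(i, 10, p) for i in range(11)]
    print(p, [round(v, 4) for v in exact])
```

The probability of ruin is the complement, 1 - P_i. For a fair game (p = 1/2) starting halfway, P_5 = 5/10 = 0.5.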

[357] viXra:2505.0005 [pdf] submitted on 2025-05-01 16:50:56

Automatic Uncertainty Evaluation for Determining the Number of Components in Nested Models

Authors: L. Martino, R. San Millan-Castillo, E. Morgado
Comments: 21 Pages.

In this work, we propose and examine two procedures for constructing intervals that capture the uncertainty associated with determining the effective number of components in model selection problems. The output of these methods is an interval (defined by two integer bounds) representing plausible values for the number of components. A detailed discussion is provided on the connection between the proposed approaches and the widely-used information criteria in the literature. Notably, the methods do not rely on the availability of a likelihood function, making them broadly applicable across various domains such as regression, classification, feature and/or order selection, clustering, and dimensionality reduction. These techniques leverage geometric properties of the error curve to construct the intervals. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and practical utility of the proposed procedures. Additionally, MATLAB code is provided to facilitate adoption by practitioners and researchers.
Category: Statistics

[356] viXra:2504.0171 [pdf] submitted on 2025-04-27 09:46:44

Compare and Combine Different Importance Ranking Methods for Feature Selection: a Gentle Review

Authors: M. Marinescu, L. Martino, G. Villacres, S. G. Arcidiacono, O. Barquero
Comments: 23 Pages.

Feature selection remains a highly relevant and actively researched topic across signal processing, statistics, and machine learning. It has gained new relevance recently, especially because of renewed interest in the so-called Shapley values. However, beyond the Shapley values, many possibilities exist to measure (explicitly or implicitly) the importance of a variable for a specific task. Given a measure of importance, we can obtain a ranking of the input features (involved, e.g., in a regression or classification problem), as provided by an algorithm and/or expert system. Consequently, it is also necessary to evaluate the obtained rankings, for instance to identify the most effective ranking method or to aggregate all results into an average ranking, akin to an ensemble average of expert opinions. In this work, we provide an exhaustive review of several scoring functions and techniques designed for evaluating the ranking methods with or without an available ground truth. Moreover, the work contains some novel elements, such as the use of other well-known indices, for instance, the Gini coefficient and effective sample size (ESS) measures. It is important to remark that the paper incorporates insights from a variety of sources across diverse scientific disciplines, including computational statistics, quantitative economics, and machine learning. Finally, we test the described schemes in a controlled experiment on feature selection, in order to compare different ranking methods and to assess their performance and robustness.
Category: Statistics

[355] viXra:2504.0119 [pdf] submitted on 2025-04-17 20:19:43

The Role of Statistics in Machine Learning Regression Models

Authors: Bamba Gueye, Laure Gouba
Comments: 22 Pages. 20 Figures

In this paper, we discuss the role of statistics in simple linear regression, multiple linear regression, and logistic regression. Python has been used to implement the algorithms in these models.
Category: Statistics
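
As a minimal illustration of the statistical quantities involved in simple linear regression, the sketch below computes the least-squares slope b = S_xy / S_xx and intercept a = ȳ - b x̄, the coefficient of determination R², and the standard error of the slope, using only the Python standard library. The function name and the small data set are hypothetical.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit y ~ a + b x with basic summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)       # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    r2 = 1.0 - sse / sst
    sigma2 = sse / (n - 2)                    # unbiased residual variance
    se_slope = math.sqrt(sigma2 / s_xx)
    return slope, intercept, r2, se_slope


# hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2, se_slope = simple_linear_regression(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} R2={r2:.4f} se(slope)={se_slope:.4f}")
```

For these data the fit is y ≈ 0.05 + 1.99 x with R² ≈ 0.997. Dividing the slope by its standard error gives the usual t statistic with n - 2 degrees of freedom.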

[354] viXra:2503.0127 [pdf] submitted on 2025-03-21 20:54:38

Notes on the Jellinek-Berry Thermostated Ideal Gas

Authors: Leo T. Butler, Alireza Sharifi
Comments: 12 Pages.

This note studies Hamiltonian systems that are thermostated using the Jellinek–Berry thermostat (J. Chem. Phys. 1988; Phys. Rev. A 1988). Jellinek, and Jellinek and Berry, propose an extension of Nosé's thermostat (J. Chem. Phys. 1984), introducing multiple functional parameters in order to achieve ergodicity of the thermostated dynamics. This family of Hamiltonian thermostats aims to simulate the macro canonical ensemble of a Hamiltonian $H$ by coupling $H$ to a 1-d heat reservoir with potential energy $v(s)$ and kinetic energy $p^2/2Q(s)$. Our note derives a normal form for the reservoir's potential energy, investigates when the Jellinek–Berry thermostated system admits a Nosé–Hoover reduction, and demonstrates that a Jellinek–Berry thermostated periodic ideal gas is completely integrable and satisfies a KAM twist condition called Rüssmann non-degeneracy. This is used to deduce that a thermostated, collision-less, non-ideal gas (i.e., one with a smooth potential energy) at sufficiently high reservoir temperatures has a positive-measure set of invariant tori; hence, the thermostated dynamics are non-ergodic.
Category: Statistics

[353] viXra:2503.0117 [pdf] submitted on 2025-03-19 14:35:15

A Note on Gradient-Based Parameter Estimation for Energy-Based Models

Authors: L. Martino, S. Ingrassia, S. Mangano, L. Scaffidi
Comments: 12 Pages.

Energy-based models (EBMs) are an important family of models in which part of the likelihood is intractable, and hence unknown. For this reason, parameter estimation in EBMs is a challenge for standard estimation methods. In this paper, we present a critical discussion of gradient-based approaches for inference in energy-based models. We provide detailed derivations of the different schemes, clarify their connections and differences, and give practical suggestions for their application. Specifically, we focus on a suitable choice of the proposal/reference density, which is crucial for the performance of the gradient-based procedures.
Category: Statistics

[352] viXra:2503.0116 [pdf] submitted on 2025-03-19 15:09:21

Multioutput Feature Selection for Emulation and Sensitivity Analysis

Authors: J. Vicent, L. Martino, J. Verrelst, J. P. Rivera Caicedo, G. Camps-Valls
Comments: 26 Pages. Published in IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-11, 2024

Statistical regression methods are widely used in remote sensing applications but tend to lack physical interpretability. In this paper, we introduce a methodological framework to improve model emulation and its understanding with machine learning feature selection. Our wrapper-forward feature selection method seamlessly integrates physics knowledge into model emulation, improving the trade-off between accuracy and interpretability. We illustrate our methodology by applying it to atmospheric radiative transfer models in the context of global sensitivity analysis (GSA) and emulation. Our approach consistently aligns with variance-based GSA, pinpointing the critical features of aerosol properties, solar zenith angle, and water vapor. While our physically-based emulators yield only a modest accuracy improvement of 0.2% over conventional Gaussian process emulators, their introduction signifies a step forward toward physics-aware, machine-learning-based emulation. The emulator performance remains stable even under substantial changes, further underscoring the reliability of our approach.
Category: Statistics

[351] viXra:2503.0115 [pdf] submitted on 2025-03-19 20:03:15

Multi-fidelity Gaussian Process Emulation for Atmospheric Radiative Transfer Models

Authors: J. Vicent, L. Martino, J. Verrelst, G. Camps-Valls
Comments: 28 Pages. Published in IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-10, 2023.

Atmospheric radiative transfer models (RTMs) are widely used in satellite data processing to correct for the scattering and absorption effects caused by aerosols and gas molecules in the Earth’s atmosphere. As the complexity of RTMs grows and the requirements for future Earth Observation missions become more demanding, the conventional Look-Up Table (LUT) interpolation approach faces important challenges. Emulators have been suggested as an alternative to LUT interpolation, but they are still too slow for operational satellite data processing. Our research introduces a solution that harnesses the power of multi-fidelity methods to improve the accuracy and runtime of Gaussian Process (GP) emulators. We investigate the impact of the number of fidelity layers, dimensionality reduction, and training dataset size on the performance of multi-fidelity GP emulators. We find that an optimal multi-fidelity emulator can achieve relative errors in surface reflectance below 0.5% and performs atmospheric correction of hyperspectral PRISMA satellite data (one million pixels) in a few minutes. Additionally, we provide a suite of functions and tools for automating the creation and generation of atmospheric RTM emulators.
Category: Statistics

[350] viXra:2503.0080 [pdf] submitted on 2025-03-13 20:56:57

Bayesian Approach to Hypothesis-Based Randomness Estimation

Authors: Alexander Rozenkevich
Comments: 6 Pages.

A Bayesian method for dynamic hypothesis-based randomness estimation of a sequence of experimental data is proposed. Examples of pseudorandom number generator testing are given.
Category: Statistics

[349] viXra:2503.0079 [pdf] submitted on 2025-03-13 20:56:30

Bayesian Approach to Hypothesis-Based Randomness Estimation (in Russian)

Authors: Alexander Rozenkevich
Comments: 6 Pages.

[348] viXra:2502.0136 [pdf] submitted on 2025-02-19 22:13:49

Logarithm of Exponential and Cauchy Random Variables

Authors: Josef Bukac
Comments: 3 Pages. (Note by viXra Admin: An abstract in the article is required)

We show how to calculate the expectation and the second moment of the logarithm of an exponential random variable; there are two ways to do this. The Cauchy distribution has no expectation, but we calculate the expectation and second moment of the logarithm of the absolute value of a Cauchy random variable.
Category: Statistics
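
For reference, the standard closed forms are E[ln X] = -γ and E[(ln X)²] = γ² + π²/6 for X ~ Exp(1), where γ ≈ 0.5772 is the Euler-Mascheroni constant, and E[ln|C|] = 0 and E[(ln|C|)²] = π²/4 for a standard Cauchy variable C. The sketch below checks these values by Monte Carlo. It assumes unit rate and the standard (unit-scale) Cauchy, uses illustrative names, and does not reproduce either of the paper's two derivations.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Standard closed forms (first moment, second moment) of the logarithm
EXACT = {
    "exp": (-EULER_GAMMA, EULER_GAMMA ** 2 + math.pi ** 2 / 6),  # log X, X ~ Exp(1)
    "cauchy": (0.0, math.pi ** 2 / 4),                          # log|C|, C ~ Cauchy(0, 1)
}


def log_moments_mc(draw, n=200_000, seed=0):
    """Monte Carlo estimates of E[log V] and E[(log V)^2] for V = draw(rng) > 0."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(n):
        z = math.log(draw(rng))
        m1 += z
        m2 += z * z
    return m1 / n, m2 / n


estimates = {
    "exp": log_moments_mc(lambda rng: rng.expovariate(1.0)),
    # |tan(pi (U - 1/2))| with U ~ Uniform(0, 1) is the absolute value of a standard Cauchy
    "cauchy": log_moments_mc(lambda rng: abs(math.tan(math.pi * (rng.random() - 0.5)))),
}
for name in EXACT:
    print(name, EXACT[name], estimates[name])
```

Unlike the Cauchy variable itself, ln|C| has finite moments of all orders, which is why these expectations exist.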

[347] viXra:2502.0097 [pdf] submitted on 2025-02-14 18:48:25

The Eggenberger-Polya Urn Process: Probabilities of Revisited Ball Ratios

Authors: Richard J. Mathar
Comments: 14 Pages.

The Eggenberger-Polya urn process starts with an urn that contains balls of two colors. At each step, a ball in the urn is chosen at random and a ball of the same color is added to the urn. The probabilities of finding specified numbers of balls of the two colors later on can be visualized as a non-isotropic walk of U(p) and R(ight) steps on a square lattice. We discuss some numerical aspects of the probability that a ratio of the ball numbers of the two colors reappears later on during the process.
Category: Statistics
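
The lattice-walk picture suggests a direct way to compute exact probabilities. Starting from a balls of the first color and b of the second, the probability that the first color is drawn exactly k times in n steps is the standard Polya (beta-binomial) formula C(n, k) · a^(k) · b^(n-k) / (a+b)^(n), where x^(m) = x(x+1)...(x+m-1) is the rising factorial. The sketch below uses it, with exact rational arithmetic, to compute only the simplest related quantity: the probability that the initial ratio a:b is present again at a fixed step n. This is an illustrative assumption; the "revisited ratio" probabilities studied in the paper may be defined differently (e.g., over the whole trajectory), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, m):
    """Rising factorial x (x+1) ... (x+m-1)."""
    out = 1
    for j in range(m):
        out *= x + j
    return out


def polya_distribution(a, b, n):
    """Exact P(k draws of color 1 in n steps), k = 0..n, starting from a and b balls."""
    denom = rising(a + b, n)
    return [Fraction(comb(n, k) * rising(a, k) * rising(b, n - k), denom) for k in range(n + 1)]


def prob_initial_ratio_at_step(a, b, n):
    """P(the ball counts after n steps are again in the ratio a:b)."""
    dist = polya_distribution(a, b, n)
    # counts after k draws of color 1 are (a + k, b + n - k); same ratio iff (a + k) b == a (b + n - k)
    return sum((p for k, p in enumerate(dist) if (a + k) * b == a * (b + n - k)), Fraction(0))


print(polya_distribution(1, 1, 4))  # uniform: each entry is 1/5
print(prob_initial_ratio_at_step(2, 1, 3))
```

Starting from one ball of each color, every outcome k = 0..n is equally likely (probability 1/(n+1)), a classical property of the Polya urn; starting from 2:1, the ratio is 2:1 again after 3 steps with probability 3/10.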

[346] viXra:2411.0080 [pdf] submitted on 2024-11-11 20:56:07

The Fundamental Problem of Causal Inference

Authors: Andrea Berdondini
Comments: 6 Pages.

The fundamental problem of causal inference defines the impossibility of associating a causal link to a correlation, in other words: correlation does not prove causality. This problem can be understood from two points of view: experimental and statistical. The experimental approach tells us that this problem arises from the impossibility of simultaneously observing an event both in the presence and absence of a hypothesis. The statistical approach, on the other hand, suggests that this problem stems from the error of treating tested hypotheses as independent of each other. Modern statistics tends to place greater emphasis on the statistical approach because, compared to the experimental point of view, it also shows us a way to solve the problem. Indeed, when testing many hypotheses, a composite hypothesis is constructed that tends to cover the entire solution space. Consequently, the composite hypothesis can be fitted to any data set by generating a random correlation. Furthermore, the probability that the correlation is random is equal to the probability of obtaining the same result by generating an equivalent number of random hypotheses.
Category: Statistics

[345] viXra:2411.0008 [pdf] submitted on 2024-11-02 07:03:48

A Subsampling Based Neural Network for Spatial Data

Authors: Debjoy Thakur
Comments: 28 Pages.

The application of deep neural networks to geospatial data has become an active research topic. A significant amount of statistical research has already been carried out in this direction, such as generalized least squares optimization incorporating the spatial variance-covariance matrix, or the use of basis functions in the input nodes of neural networks. However, for lattice data, there is no available literature on the asymptotic analysis of neural networks in regression for spatial data. This article proposes a consistent localized two-layer deep neural network-based regression for spatial data. We prove the consistency of this deep neural network for bounded and unbounded spatial domains under a fixed sampling design of mixed-increasing spatial regions. We prove that its asymptotic convergence rate is faster than that of the neural network in [zhan2024neural], and that it is an improved generalization of the neural network structure in [shen2023asymptotic]. We empirically observe that the rate of convergence of discrepancy measures between the empirical probability distributions of observed and predicted data becomes faster for a less smooth spatial surface. We apply our asymptotic analysis of deep neural networks to the estimation of the monthly average temperature of major cities in the USA from satellite images. This application is an effective showcase of non-linear spatial regression. We also demonstrate our methodology with simulated lattice data in various scenarios.
Category: Statistics

[344] viXra:2408.0103 [pdf] submitted on 2024-08-26 02:11:07

Digit Occurrence of an Ordinal’s Factorial in Euler’s Number

Authors: Parker Emmerson
Comments: 10 Pages. (Abstract added by viXra Admin as required - Please conform!)

We analyze the frequency of digit occurrences in that digit's factorial expression in different bases, and we write programs to visualize it.
Category: Statistics

[343] viXra:2407.0015 [pdf] submitted on 2024-07-02 06:50:01

Introduction to the Probability Theory

Authors: Taha Sochi
Comments: 185 Pages.

This book is a collection of notes and solved problems about probability theory. The book also contains proposed exercises attached to the solved problems as well as computer code (in C++) added to some of these problems for the purpose of calculation, testing, and simulation. Illustrations (such as figures and tables) are added when necessary or appropriate to enhance clarity and improve understanding. In most cases, intuitive arguments and methods are used to make the notes and solutions natural and instinctive. Like my previous books, maximum clarity was one of the main objectives and criteria in determining the style of writing, presenting, and structuring the book as well as selecting its contents.
Category: Statistics

[342] viXra:2407.0002 [pdf] submitted on 2024-07-01 14:25:14

Hybrid Approach of Hypothesis Testing to Test the Mean Difference Between Two Groups Utilising Gaussian Distribution and Confidence Interval

Authors: Kazi Sakib Hasan
Comments: 24 Pages.

This paper presents a new, simpler, and robust method of hypothesis testing for concluding whether there is a significant mean difference between two independent or paired samples, using the concepts of location, variability, confidence intervals, and the Gaussian distribution. For two-sample hypothesis testing, the t-test is widely used; besides this, the Wilcoxon signed-rank test and often the permutation test are also conducted. Each of these methods has its own rigor and drawbacks, which is why the general public and non-statistics students often find it hard to conduct experiments using them. To address these issues, this paper proposes a new method of hypothesis testing that essentially utilises the properties of normally distributed data and resampling, and is relatively easy to calculate using only pen and paper. A time complexity analysis of each program is also conducted to give a concise overview of which hypothesis testing algorithm is more efficient and faster to execute, since statisticians use a lot of software nowadays for their analytical tasks.
Category: Statistics
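
The abstract lists the permutation test as one of the standard alternatives. As a point of comparison only (this is not the paper's proposed hybrid method), a minimal two-sided permutation test for a difference in means of two independent samples might look like this; the function name and the default number of permutations are illustrative choices.

```python
import random

def permutation_test_mean_diff(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means of two independent samples.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        diff = sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x)
        if abs(diff) >= abs(observed):
            extreme += 1
    # add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Each permutation costs O(n_x + n_y), so the whole test is O(n_perm · (n_x + n_y)), which is one way to frame the time complexity comparison mentioned above.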

[341] viXra:2406.0160 [pdf] submitted on 2024-06-27 17:11:50

Likelihood Measures for Classifying Frequency Response Functions from Posture Control Experiments

Authors: Vittorio Lippi
Comments: 4 Pages. presented at International Conference on Mathematical Analysis and Applications in Science and Engineering 20 - 22 June 2024, Porto, Portugal

The frequency response function (FRF) is an established way to describe the outcome of experiments in posture control literature. The FRF is an empirical transfer function between an input stimulus and the induced body segment sway profile, represented as a vector of complex values associated with a vector of frequencies. Having obtained an FRF from a trial with a subject, it can be useful to quantify the likelihood that it belongs to a certain population, e.g., to diagnose a condition or to evaluate the human-likeness of a humanoid robot or a wearable device. In this work, a recently proposed method for FRF statistics based on confidence bands computed with the bootstrap will be summarized and, on its basis, possible ways to quantify the likelihood of FRFs belonging to a given set will be proposed.
Category: Statistics
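
A minimal sketch of a bootstrap confidence band, assuming a pointwise percentile band on the magnitude of the mean FRF across subjects; the cited method may build its bands differently (for example on the complex values, or with simultaneous coverage). Array shapes and the function name are hypothetical.

```python
import numpy as np

def bootstrap_magnitude_band(frfs, n_boot=2000, alpha=0.05, seed=0):
    """Pointwise percentile bootstrap band for |mean FRF| at each frequency.

    frfs: complex array of shape (n_subjects, n_freqs), one FRF per subject.
    Returns (lower, upper), each of length n_freqs.
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_freqs = frfs.shape
    boot = np.empty((n_boot, n_freqs))
    for b in range(n_boot):
        idx = rng.integers(0, n_subjects, size=n_subjects)  # resample whole subjects
        boot[b] = np.abs(frfs[idx].mean(axis=0))
    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lower, upper
```

Resampling whole subjects, rather than individual frequency bins, keeps the dependence between frequencies within each FRF intact.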

[340] viXra:2406.0159 [pdf] submitted on 2024-06-27 20:15:06

Random Field Theory for Testing Differences Between Frequency Response Functions in Posturography

Authors: Vittorio Lippi
Comments: 2 Pages. Presented at 9th International Posture Symposium, Smolenice 2023 (Note by viXra Admin: An abstract on the article is required)

The frequency response function (FRF) is an established way to describe the outcome of experiments in posture control literature. Specifically, the FRF is an empirical transfer function between an input stimulus and the induced body movement. By definition, the FRF is a complex function of frequency. When statistical analysis is performed to assess differences between groups of FRFs (e.g., obtained under different conditions or from a group of patients and a control group), the FRF's structure should be considered. Usually, the statistics are performed either on a scalar variable defined for the purpose, such as the norm of the difference between FRFs, or on the components considered independently (which can be applied to the real and imaginary components separately). In some cases both approaches are integrated, e.g., a frequency-by-frequency comparison is used as a post hoc test when the null hypothesis is rejected on the scalar value. The two components of the complex values can be tested with multivariate methods such as Hotelling's T², as done in previous work on the averages of the FRF over all the frequencies, where a further post hoc test is performed by applying the bootstrap to magnitude and phase separately. The problem with defining a scalar variable, such as the norm of the differences or the difference of the averages in the previous examples, is that it introduces an arbitrary metric that, although reasonable, has no substantial connection with the experiment unless the scalar value is assumed a priori as the object of the study, as in work where a human-likeness score for humanoid robots is defined on the basis of FRF differences. On the other hand, testing frequencies (and components) separately does not consider that the FRF's values are not independent, and applying corrections for multiple comparisons, e.g., Bonferroni, can result in an overly conservative approach that destroys the power of the experiment.
To properly account for the nature of the FRF, a method based on random field theory is presented, together with a case study using data from posture control experiments. To treat the two components (real and imaginary) as two variables, and to account for the fact that the same subject repeated the test in both conditions, a 1-D implementation of Hotelling's T² is used, as presented in previous work, but applied in the frequency domain instead of the time domain.
Category: Statistics
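
The per-frequency statistic that a 1-D Hotelling's T² analysis evaluates can be sketched as follows, assuming paired data (the same subjects in both conditions) and treating the real and imaginary parts of the FRF difference as a bivariate observation. This computes only the pointwise statistic; the random field theory step that sets a threshold for the whole frequency curve is not shown, and the function name is hypothetical.

```python
import numpy as np

def paired_hotelling_t2(frf_a, frf_b):
    """Paired Hotelling's T^2 at each frequency, using (Re, Im) as a 2-D vector.

    frf_a, frf_b: complex arrays of shape (n_subjects, n_freqs).
    Returns (t2, f_stat); at a single frequency, f_stat follows F(2, n - 2) under H0.
    """
    d = frf_a - frf_b
    n, n_freqs = d.shape
    p = 2  # two components: real and imaginary
    t2 = np.empty(n_freqs)
    for k in range(n_freqs):
        x = np.column_stack([d[:, k].real, d[:, k].imag])  # shape (n, 2)
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)  # unbiased 2x2 sample covariance
        t2[k] = n * mean @ np.linalg.solve(cov, mean)
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat
```

Testing each frequency with this statistic and a Bonferroni correction is exactly the conservative approach criticized above; random field theory instead accounts for the smoothness of the T² curve across frequencies.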

[339] viXra:2406.0055 [pdf] submitted on 2024-06-12 21:00:31

A Note on the Area Under the Likelihood and the Fake Evidence for Model Selection

Authors: L. Martino, F. Llorente
Comments: 22 Pages.

Improper priors are not allowed for the computation of the Bayesian evidence Z = p(y) (a.k.a. the marginal likelihood), since in this case Z is not completely specified, due to an arbitrary constant involved in the computation. However, in this work, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinitely many) models belonging to the same parametric family (i.e., for tuning the parameters of a parametric model). The quantities involved in this type of selection, however, cannot be considered Bayesian evidences: we suggest using the name "fake evidences" (or "areas under the likelihood" in the case of uniform improper priors). We also show that, in this model selection scenario, using a diffuse prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an application example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior, respectively. A numerical experiment is also provided, confirming all the previous statements.
Category: Statistics
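
For linear-in-the-parameters regression y = Φθ + ε with ε ~ N(0, σ²I), the area under the likelihood has a closed form: completing the square around the least-squares estimate gives (2πσ²)^(-(N-D)/2) |ΦᵀΦ|^(-1/2) exp(-RSS/(2σ²)), with N observations and D basis functions. With a Gaussian prior N(0, λ²I), the evidence behaves like (2πλ²)^(-D/2) times this area for large λ, so it tends to zero instead of recovering the area. The sketch below assumes a known noise level σ and a uniform improper prior with constant one; the function names are illustrative.

```python
import numpy as np

def area_under_likelihood(y, Phi, sigma):
    """Integral over theta of N(y | Phi theta, sigma^2 I) (uniform improper prior, constant 1)."""
    n, d = Phi.shape
    gram = Phi.T @ Phi
    theta_hat = np.linalg.solve(gram, Phi.T @ y)  # least-squares estimate
    rss = float(np.sum((y - Phi @ theta_hat) ** 2))
    _, logdet = np.linalg.slogdet(gram)
    log_area = (-(n - d) / 2 * np.log(2 * np.pi * sigma**2)
                - 0.5 * logdet - rss / (2 * sigma**2))
    return float(np.exp(log_area))

def gaussian_prior_evidence(y, Phi, sigma, lam):
    """Evidence with prior theta ~ N(0, lam^2 I), i.e. y ~ N(0, sigma^2 I + lam^2 Phi Phi^T)."""
    n = len(y)
    cov = sigma**2 * np.eye(n) + lam**2 * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(cov)
    quad = float(y @ np.linalg.solve(cov, y))
    return float(np.exp(-0.5 * (n * np.log(2 * np.pi) + logdet + quad)))
```

Comparing `gaussian_prior_evidence` for increasing `lam` with `area_under_likelihood` illustrates the statement above: the evidence keeps shrinking roughly like λ^(-D) rather than converging to the area.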

[338] viXra:2404.0105 [pdf] submitted on 2024-04-21 12:12:42

An Index of Effective Number of Variables for Uncertainty and Reliability Analysis in Model Selection Problems

Authors: L. Martino, E. Morgado, R. San Millan Castillo
Comments: 20 Pages.

An index of effective number of variables (ENV) is introduced for model selection in nested models. This is the case, for instance, when we have to decide the order of a polynomial function or the number of bases in a nonlinear regression, choose the number of clusters in a clustering problem, or the number of features in a variable selection application (to name a few examples). It is inspired by the maximum area under the curve (AUC) idea and the Gini index. The interpretation of the ENV index is identical to that of the effective sample size (ESS) indices with respect to a set of samples. The ENV index addresses some drawbacks of the elbow detectors described in the literature, and introduces different measures of uncertainty and reliability of the proposed solution. These novel reliability measures can also be employed jointly with different information criteria, such as the well-known AIC and BIC. Comparisons with classical and recent schemes are provided in different experiments involving real datasets. Related Matlab code is given.
Category: Statistics

[337] viXra:2404.0064 [pdf] submitted on 2024-04-13 20:44:31

A Uniform Lower Bound for the Probability of K Players Tied for First Place Using Supertelescoping Series

Authors: Mathis Antonetti
Comments: 3 Pages.

In this note, we establish a uniform lower bound (w.r.t. the number of players) for the probability of k players tied for first place in the geometric case. To derive this bound, we introduce the concept of supertelescoping series as a generalization of telescoping series. We also provide an insight into the relationship between supertelescoping series and supermartingales.
Category: Statistics
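
Such probabilities can be checked quickly by Monte Carlo. The sketch assumes (this is not taken from the note) that each of n players independently gets a Geometric(p) score on {1, 2, ...} and that first place means the highest score. For two players the tie probability is p²/(1 - (1 - p)²) = p/(2 - p), i.e. 1/3 for p = 1/2, which the estimate should approach. Function names are illustrative.

```python
import random

def geometric(rng, p):
    """Number of trials up to and including the first success (support 1, 2, ...)."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials

def prob_k_tied_first(n_players, k, p=0.5, n_sims=100_000, seed=0):
    """Monte Carlo estimate of P(exactly k players share the highest score)
    when every player's score is an independent Geometric(p) variable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        scores = [geometric(rng, p) for _ in range(n_players)]
        if scores.count(max(scores)) == k:
            hits += 1
    return hits / n_sims
```

Evaluating `prob_k_tied_first(n, k)` for growing n gives an empirical feel for whether the probability stays bounded away from zero uniformly in the number of players.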

[336] viXra:2402.0093 [pdf] submitted on 2024-02-18 11:04:32

Second Moment/Order Approximations by Kernel Smoothers with Application to Volatility Estimation

Authors: L. Beleña, E. Curbelo, L. Martino, V. Laparra
Comments: 14 Pages.

Volatility estimation and quantile regression are relevant and active research areas in statistics, machine learning and econometrics. In this work, we propose two procedures to estimate local variances in generic regression problems using kernel smoothers. The proposed schemes can be applied in multidimensional scenarios (not just for time series analysis) and, easily, in a multi-output framework as well. Moreover, they allow uncertainty estimation using a generic kernel smoother technique. Several numerical experiments show the benefits of the proposed methods, even when compared with benchmark techniques. One of these experiments involves a real dataset analysis.
Category: Statistics
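
One generic way to obtain local variances with a kernel smoother, shown here as a sketch and not necessarily either of the paper's two procedures: first smooth y to estimate the local mean, then smooth the squared residuals. The Gaussian kernel, the 1-D inputs and the bandwidth value are assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x] at the points x_eval (1-D)."""
    diff = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * diff**2)
    return (w @ y_train) / w.sum(axis=1)

def local_variance(x, y, x_eval, bandwidth):
    """Two-stage estimate: smooth y for the local mean, then smooth the squared residuals."""
    mean_at_data = nadaraya_watson(x, y, x, bandwidth)
    sq_resid = (y - mean_at_data) ** 2
    return nadaraya_watson(x, sq_resid, x_eval, bandwidth)
```

The same two-stage idea works with any linear smoother in place of Nadaraya-Watson, and with multidimensional inputs if the kernel distance is computed on vectors.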

[335] viXra:2402.0061 [pdf] submitted on 2024-02-12 07:13:17

Fit Probability Density Function Without Knowing the Form of Distribution

Authors: Dajun Chen
Comments: 2 Pages.

This paper proposes two methods for fitting a probability density function using only samples from the distribution. The methods are inspired by Generative Adversarial Networks. The demos run in PyTorch and are available at https://github.com/chendajunAlpha/Fit-probability-density-function
Category: Statistics

[334] viXra:2312.0089 [pdf] submitted on 2023-12-17 14:49:03

The Excess Mortality is Strongly Underestimated

Authors: Hans Lugtigheid
Comments: 15 Pages.

This article analyses the conjecture that excess mortality during the pandemic is underestimated. I use the numbers from the CBS (Dutch Central Bureau for Statistics) as an example. As a baseline, we take the expected mortality for 2021 and 2022 from 2019. I correct this expected mortality for the estimated number of people who died in earlier years than expected because of the pandemic. For 2021 this correction is 8K. The CBS expects mortality to be almost equal to the estimate from 2019. The excess mortality then increases from 16K (CBS) to 24K. I present the following idea to explain the difference. At the beginning of every year, the numbers of people in year groups are usually adjusted by applying a historically determined percentage to the population on January 1st. Covid hits the weakest the hardest. This changes the distribution of the expected remaining life years in the year group, and thus the average expected remaining life years. Hence the percentage has to be adjusted; the expected mortality then decreases and the excess mortality increases. Excess mortality within a year consists of people who, for example, died in April from covid but would have died in October without the pandemic. With this number, total excess mortality rises by 6K to 30K. Excess mortality is divided into covid and non-covid deaths. The large increase in non-covid deaths is striking. The analysis supports the conjecture that excess mortality is underestimated. Note: the numbers in this article are for the Netherlands. For your own country, use the appropriate numbers.
Category: Statistics

[333] viXra:2312.0088 [pdf] submitted on 2023-12-17 23:25:17

Expected Mortality: Adjustment for Distribution in Age-Groups

Authors: Hans Lugtigheid
Comments: 4 Pages.

This article discusses the influence of a disturbance such as covid on the calculation of life expectancy in year groups, etc. Life expectancies in year groups are usually adjusted at the beginning of the year based on the population at that time. This is done with a percentage based on previous years; this percentage reflects volume. During the pandemic, the weak were hit heavily by covid. As a consequence, the distribution of life expectancy within the year groups changes. This increases the life expectancy and decreases the expected mortality in the year group, so the calculation for the year groups has to be adjusted accordingly. In this article I give an example of such an adjustment. Similar statistics can be adjusted in the same way.
Category: Statistics
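
A toy calculation of the mechanism described above, with entirely hypothetical numbers: a year group is split into frail and robust members with different annual death probabilities. If an epidemic removes frail members before January 1, applying the historical percentage to the new total overstates the expected deaths.

```python
def expected_deaths(frail, robust, q_frail, q_robust):
    """Expected deaths in one year for a year group split into frail and robust members."""
    return frail * q_frail + robust * q_robust

# Hypothetical year group before the pandemic (illustrative numbers only).
frail, robust = 1_000, 9_000
q_frail, q_robust = 0.30, 0.02
historical_rate = expected_deaths(frail, robust, q_frail, q_robust) / (frail + robust)

# Suppose the epidemic has killed 200 extra frail members before January 1.
frail_after = frail - 200
population_after = frail_after + robust

naive = historical_rate * population_after  # old percentage applied to the new total
adjusted = expected_deaths(frail_after, robust, q_frail, q_robust)  # composition-aware
```

Here the naive expectation is 470.4 deaths against 420 once the changed composition is taken into account, so excess mortality measured against the naive baseline would be understated by about 50 in this toy group.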

[332] viXra:2311.0085 [pdf] submitted on 2023-11-19 02:50:22

A Framework for Modeling, Analyzing, and Decision-Making in Disease Spread Dynamics and Medicine/Vaccine Distribution

Authors: Zenin Easa Panthakkalakath, Neeraj, Jimson Mathew
Comments: 12 Pages.

The challenges posed by epidemics and pandemics are immense, especially if the causes are novel. This article introduces a versatile open-source simulation framework designed to model the intricate dynamics of infectious diseases across diverse population centres. Taking inspiration from historical precedents such as the Spanish flu and COVID-19, and from geographical economic theories such as central place theory, the simulation integrates agent-based modelling to depict the movement and interactions of individuals within different settlement hierarchies. Additionally, the framework provides a tool for decision-makers to assess and plan optimal distribution strategies for limited resources like vaccines or cures, as well as to impose mobility restrictions.
Category: Statistics

[331] viXra:2310.0050 [pdf] submitted on 2023-10-10 22:03:45

Ratios of Exponential Functions, Interpolation

Authors: Bukac Josef
Comments: 11 Pages. We use interpolation to get the starting values of parameters. Another paper about The singularity of the atrix appearing in the Gauss-Newton method will follow.

We describe models of proportions depending on some independent quantitative variables. An explicit formula for inverse matrices facilitates interpolation as a way to calculate the starting values for iterations in nonlinear regression with logistic functions or ratios of exponential functions.
Category: Statistics
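
A minimal sketch of the interpolation idea for the simplest case, a two-parameter logistic p(x) = 1/(1 + exp(-(a + bx))): taking the logit of two observed proportions gives a 2×2 linear system whose explicit inverse yields starting values for a Gauss-Newton iteration. The paper's models (ratios of exponential functions, more variables) and its inverse-matrix formula are more general; the function name here is hypothetical.

```python
import math

def logistic_start_values(x1, p1, x2, p2):
    """Starting values (a, b) for p(x) = 1 / (1 + exp(-(a + b*x))) obtained by
    interpolating two observed proportions: logit(p_i) = a + b * x_i."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if x1 == x2:
        raise ValueError("interpolation nodes must be distinct")
    l1 = math.log(p1 / (1 - p1))
    l2 = math.log(p2 / (1 - p2))
    # explicit inverse of [[1, x1], [1, x2]] applied to (l1, l2)
    b = (l2 - l1) / (x2 - x1)
    a = l1 - b * x1
    return a, b
```

When the data follow the model exactly, the interpolated values are the true parameters; with noisy data they are only a starting point for the nonlinear least-squares iterations.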

[330] viXra:2310.0032 [pdf] submitted on 2023-10-06 15:52:16

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.

In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split into two blocks, and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate nonlinear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered, and the cloud of samples obtained by ATAIS is recycled and re-weighted to obtain a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics
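
For Gaussian noise, the covariance that maximizes the likelihood for a fixed θ has a well-known closed form: the average outer product of the residuals. This is presumably the kind of analytical formula a frequentist covariance step can use, although the paper's exact update may differ. With it plugged in, the quadratic term of the log-likelihood becomes the constant T·P, so the profile log-likelihood depends on θ only through a log-determinant. Names are hypothetical.

```python
import numpy as np

def ml_noise_covariance(residuals):
    """Maximum-likelihood covariance of zero-mean Gaussian noise.

    residuals: array of shape (T, P), rows r_t = y_t - f(theta, t) for P outputs.
    """
    T = residuals.shape[0]
    return residuals.T @ residuals / T

def profile_log_likelihood(residuals):
    """Gaussian log-likelihood with the covariance replaced by its ML estimate."""
    T, P = residuals.shape
    sigma_hat = ml_noise_covariance(residuals)
    _, logdet = np.linalg.slogdet(sigma_hat)
    # sum_t r_t^T sigma_hat^{-1} r_t = trace(sigma_hat^{-1} * T * sigma_hat) = T * P
    return float(-0.5 * T * (P * np.log(2 * np.pi) + logdet + P))
```

Evaluating `profile_log_likelihood` at each proposed θ is one simple way to turn the covariance step into an importance weight over the model parameters.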

[329] viXra:2309.0010 [pdf] submitted on 2023-09-01 07:15:18

Securing the Foundations of Probability Theory

Authors: Randolph L. Gerl
Comments: 6 Pages.

Several traditional problems in probability theory are discussed and a resolution to them is proposed. The use of probability theory in the study of physical reality is contrasted with its use in pure mathematics and the latter is found to be problematic. The proposed resolution is postulated to work for all physical reality but is inclusive enough to cover many situations in pure mathematics.
Category: Statistics

[328] viXra:2308.0183 [pdf] submitted on 2023-08-27 16:13:11

Linear Compositional Regression

Authors: Josef Bukac
Comments: 7 Pages. A paper on interpolation by generalized logistic functions will follow

We study the properties of regression coefficients when the sum of the dependent variables is one, i.e., the dependent variables are compositional. We show that the sum of intercepts is equal to one and the sum of other corresponding regression coefficients is zero. We do it for simple linear regressions and also for a more general case using matrix notation. The last part treats the case when the dependent variables do not sum up to one. We simplify the well-known formula derived by the use of Lagrange multipliers.
Category: Statistics
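
A quick numerical check of the first result on hypothetical simulated data. Because ordinary least squares is linear in the response, summing the fitted coefficients across outputs is the same as regressing the row sums (all equal to one) on the design; with an intercept column that regression returns (1, 0), so the intercepts sum to one and the slopes sum to zero.

```python
import numpy as np

# Hypothetical data: three compositional responses whose rows sum to one.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
raw = np.column_stack([1 + x, 2 + 0.5 * x, 5 - 0.3 * x]) + rng.uniform(0, 1, size=(n, 3))
Y = raw / raw.sum(axis=1, keepdims=True)  # each row sums to one

X = np.column_stack([np.ones(n), x])  # design with an intercept column
B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # column j holds (intercept_j, slope_j)

print(B[0].sum())  # sum of intercepts: one, up to rounding
print(B[1].sum())  # sum of slopes: zero, up to rounding
```

The same argument holds with several explanatory variables, as long as every output is regressed on the same design containing an intercept.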

[327] viXra:2307.0056 [pdf] submitted on 2023-07-11 16:50:28

An Automatic Counting System of Small Objects in Noisy Images with a Noisy Labelled Dataset: Computing the Number of Microglial Cells in Biomedical Images

Authors: L. Martino, P. Paradas, L. Carro, M. M. Garcia, C. Goicoechea, S. Ingrassi
Comments: 20 Pages.

[326] viXra:2306.0081 [pdf] submitted on 2023-06-14 03:36:54

Statistics of L1 Distances in the Finite Square Lattice

Authors: Richard J. Mathar
Comments: 12 Pages.

The L1 distance between two points in a square lattice is the sum of the horizontal and vertical absolute differences of the Cartesian coordinates and, as in graph theory, also the minimum number of edges to walk to reach one point from the other. The manuscript contains a Java program that computes, in a finite square grid of fixed shape, the number of point pairs as a function of that distance.
Category: Statistics
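
The manuscript's program is in Java; the same counts can be sketched in Python, both by brute force and from the number of ways each horizontal and vertical offset can occur. The grid is taken here as m × n points, which is an assumption about what "fixed shape" means; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def l1_distance_counts_brute(m, n):
    """Count unordered point pairs of an m x n grid by L1 distance (brute force)."""
    points = [(i, j) for i in range(m) for j in range(n)]
    return Counter(abs(a - c) + abs(b - d) for (a, b), (c, d) in combinations(points, 2))

def l1_distance_counts(m, n):
    """Same counts from the offsets: a horizontal offset dx > 0 occurs in 2*(m - dx)
    ordered ways (m ways for dx = 0), likewise vertically; halve for unordered pairs."""
    counts = Counter()
    for dx in range(m):
        for dy in range(n):
            if dx == 0 and dy == 0:
                continue  # a point paired with itself is not a pair
            ordered = (m - dx) * (1 if dx == 0 else 2) * (n - dy) * (1 if dy == 0 else 2)
            counts[dx + dy] += ordered // 2
    return counts
```

The offset version needs O(m·n) work instead of the O((m·n)²) of the brute-force loop, which matters for large grids.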

[325] viXra:2305.0011 [pdf] submitted on 2023-05-03 01:21:57

Winning at War: Comparing Different Strategies in a Card Game

Authors: Hakon Olav Torvik
Comments: 6 Pages.

The card game "war" is a simple game usually assumed to involve no element of strategy, only luck. I challenge this notion by noting that the order in which cards are placed back into the deck can be used as a strategy. I simulate the game with different strategies and find that the strategies can significantly increase the chances of winning, but usually increase the time it takes to complete the game. This is, however, dependent on the opponent using specific strategies. The best strategic advice seems to be to trick your opponent into following an ordered strategy while you use a random one, a strategy some might object to.
Category: Statistics
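
A minimal simulation along these lines, with simplified rules that are assumptions rather than the paper's exact setup: ranks 2 to 14, a tie ("war") adds one face-down card per player before both reveal again, and a player who cannot reveal a card loses. The strategy only decides the order in which the winner puts the pot at the bottom of their deck. Strategy names and functions are hypothetical.

```python
import random
from collections import deque

def order_cards(cards, strategy, rng):
    """Return the won cards in the order they go to the bottom of the winner's deck."""
    if strategy == "high_first":
        return sorted(cards, reverse=True)
    if strategy == "low_first":
        return sorted(cards)
    if strategy == "random":
        shuffled = list(cards)
        rng.shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown strategy: {strategy}")

def play_war(strategy_a, strategy_b, seed=0, max_rounds=50_000):
    """Play one simplified game of War and return (winner, rounds).

    winner is "A", "B", or None if nobody has won after max_rounds rounds.
    """
    rng = random.Random(seed)
    deck = [rank for rank in range(2, 15) for _ in range(4)]  # 2..14, ace high
    rng.shuffle(deck)
    hands = {"A": deque(deck[:26]), "B": deque(deck[26:])}
    strategies = {"A": strategy_a, "B": strategy_b}

    for rounds in range(1, max_rounds + 1):
        pot = []
        while True:
            # a player who cannot reveal a card (e.g. during a war) loses
            if not hands["A"]:
                return "B", rounds
            if not hands["B"]:
                return "A", rounds
            a, b = hands["A"].popleft(), hands["B"].popleft()
            pot += [a, b]
            if a != b:
                winner = "A" if a > b else "B"
                break
            # tie: each player adds one face-down card (if any), then both reveal again
            for player in ("A", "B"):
                if hands[player]:
                    pot.append(hands[player].popleft())
        hands[winner].extend(order_cards(pot, strategies[winner], rng))
        loser = "B" if winner == "A" else "A"
        if not hands[loser]:
            return winner, rounds
    return None, max_rounds

def win_rate(strategy_a, strategy_b, n_games=100):
    """Share of finished games won by player A, one game per seed."""
    wins = finished = 0
    for seed in range(n_games):
        winner, _ = play_war(strategy_a, strategy_b, seed=seed)
        if winner is not None:
            finished += 1
            wins += winner == "A"
    return wins / finished if finished else float("nan")
```

By symmetry, `win_rate("random", "random")` should be close to 0.5. Deterministic orderings played against each other can cycle forever, which is why `play_war` stops after `max_rounds` and reports `None`.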

[324] viXra:2304.0006 [pdf] submitted on 2023-04-01 22:25:09

The Greggs-Pret Index: a Machine Learning Analysis of Consumer Habits as a Metric for the Socio-Economic North-South Divide in England

Authors: Robin Smith, Kristian C. Z. Haverson
Comments: 5 Pages.

In England, it is anecdotally remarked that the number of Greggs bakeries to be found in a town is a reliable measure of the area's 'Northern-ness'. Conversely, a commercial competitor to Greggs in the baked goods and sandwiches market, Pret-a-Manger, is reputed to be popular in more 'southern' areas of England. Using a Support Vector Machine and an Artificial Neural Network (ANN) Regression Model, the relative geographical distributions of Greggs and Pret have been utilised for the first time to quantify the North-South divide in England. The calculated dividing lines were each compared with another line, based on Gross Disposable Household Income (GDHI). The lines match remarkably well, and we conclude that this is likely because much of England's wealth is concentrated in London, as are most of England's Pret-a-Manger shops. Further studies were conducted based on the relative geographical distributions of the popular supermarkets Morrisons and Waitrose, which are also considered to have a North-South association. This analysis yields different results. For all metrics, the North-South dividing line passes close to the M1 Watford Gap services. As part of a common British idiom, this location is often quoted as one point along the English North-South divide, and it is notable that this work agrees. This tongue-in-cheek analysis aims to draw attention to more serious factors underlying the North-South divide, such as life expectancy, education, and poverty.
Category: Statistics

[323] viXra:2303.0043 [pdf] submitted on 2023-03-07 02:38:32

[Consideration of Immunity Loss in the Epidemic Model of Respiratory Viruses]

Authors: Johnny J. Mafra Jr.
Comments: 16 Pages.

A previous work on Covid-19 forecasting failed miserably to predict the evolution of the epidemic under the massive vaccination carried out during 2021. This paper aims to address its weak point, which was not considering immunity loss in its model. The set of SIR equations was revised to include immunity loss, the Beta profile was recalculated, and the model was tuned using real data from 2021. In this way, good agreement between the simulation and the data was achieved, roughly within the calculated uncertainty of 25%. The simulation for 2022 reproduced the Omicron peak, but shifted in time. The probable explanation is an imbalance in the Beta profile at the beginning of 2022, resulting in a bigger peak in January and, consequently, a smaller one later, because more people were immune. The hypothesis of different immunity losses for natural and vaccine immunity was also explored; this case showed a theoretical profile similar to the observed data. In a limiting-case theoretical study, it was verified that the multi-year epidemic evolution most similar to the real data was the case in which vaccination did not prevent transmission, or prevented as little as 20% of it. Simulation showed, as expected, that if Beta is below some limit the epidemic vanishes. Data showed that Covid-19 seems to be vanishing naturally by itself, meaning that no measures so far were effective. New approaches based on ventilation and air sterilization using GUV are speculated to provide better performance in combating epidemics. Suggestions on how to test those approaches are presented.
Category: Statistics
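
A minimal SIRS sketch with a constant transmission rate β (the paper uses a time-varying Beta profile tuned to data, which is not reproduced here): immunity is lost at rate ω, so recovered people return to the susceptible pool. With ω > 0 and β > γ the infection settles at the endemic level I* = ω(1 - γ/β)/(γ + ω), while for β < γ it dies out, matching the observation that the epidemic vanishes below some limit. Parameter values are illustrative.

```python
def simulate_sirs(beta, gamma, omega, i0=1e-3, days=365, dt=0.1):
    """Forward-Euler integration of an SIR model with immunity loss (SIRS),
    in population fractions:
        dS/dt = -beta*S*I + omega*R
        dI/dt =  beta*S*I - gamma*I
        dR/dt =  gamma*I - omega*R
    Returns a list of daily (S, I, R) tuples.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    history = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i * dt
            recov = gamma * i * dt
            waning = omega * r * dt
            s, i, r = s - new_inf + waning, i + new_inf - recov, r + recov - waning
        history.append((s, i, r))
    return history
```

Setting `omega=0` recovers the plain SIR model, in which the epidemic always burns out; the waning term is what allows repeated waves and an endemic state.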

[322] viXra:2302.0081 [pdf] submitted on 2023-02-17 17:09:23

Under What Requirements Will Bayes’ Theorem be Meaningful?

Authors: Joseph Palazzo
Comments: 4 Pages.

We establish that all the pertinent elements of an assertion must be real; if an assertion contains an element M which cannot be classified as real, we say that it is contaminated. We then show that Bayes' Theorem is invalid.
Category: Statistics

[321] viXra:2301.0134 [pdf] submitted on 2023-01-25 13:50:10

Correlation Between Substance Representing that Tier and Its Typical Price in Several Games Using a Tier System

Authors: Kyumin Nam
Comments: 3 Pages.

The substances representing each tier (Iron, Bronze, Silver, Gold, Platinum, Diamond) in several games using a tier system and their typical prices (USD/gram) have a positive correlation [1, 2, 5].
Category: Statistics

[320] viXra:2212.0092 [pdf] submitted on 2022-12-09 13:55:42

Plithogenic Probability & Statistics Are Generalizations of Multivariate Probability & Statistics

Authors: Florentin Smarandache
Comments: 10 Pages.

In this paper we exemplify the types of Plithogenic Probability and, respectively, Plithogenic Statistics. Several applications are given. The Plithogenic Probability of an event to occur is composed of the chances that the event occurs with respect to all random variables (parameters) that determine it. Each such variable is described by a Probability Distribution (Density) Function, which may be a classical, (T,I,F)-neutrosophic, I-neutrosophic, (T,F)-intuitionistic fuzzy, (T,N,F)-picture fuzzy, (T,N,F)-spherical fuzzy, or (other fuzzy extension) distribution function. The Plithogenic Probability is a generalization of the classical MultiVariate Probability. The analysis of the events described by the plithogenic probability is the Plithogenic Statistics.
Category: Statistics

[319] viXra:2209.0132 [pdf] submitted on 2022-09-23 13:33:45

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 10 Pages.

We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme compared with benchmark techniques in the literature.
Category: Statistics
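
The UAED itself is not reproduced here. For comparison only, a common geometric heuristic for elbow detection (a swapped-in technique, not the paper's) rescales both axes of the error curve to [0, 1] and picks the point farthest from the chord joining its first and last points. The function name is illustrative.

```python
import numpy as np

def elbow_max_distance(errors):
    """Index of the 'elbow' of an error curve: the point farthest from the straight
    line joining the first and last points, after rescaling both axes to [0, 1]."""
    e = np.asarray(errors, dtype=float)
    k = np.arange(len(e), dtype=float)
    x = k / k[-1]
    y = (e - e.min()) / (e.max() - e.min())
    # chord from (0, y[0]) to (1, y[-1]); perpendicular distance of every point to it
    dx, dy = 1.0, y[-1] - y[0]
    dist = np.abs(dy * x - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))
```

The returned index is 0-based: if `errors[k]` is the error with k + 1 components, the suggested number of components is the index plus one. Unlike a likelihood-based criterion, this only needs the error curve.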

[318] viXra:2209.0123 [pdf] submitted on 2022-09-22 20:33:40

Spectral Information Criterion for Automatic Elbow Detection

Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 20 Pages.

We introduce a generalized information criterion which contains other well-known information criteria, such as BIC and AIC, as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria since, e.g., knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that is often much smaller than the total number of possible models. The elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Several experiments involving ideal scenarios, synthetic data and real data show the benefits of the proposed scheme. Matlab code related to the experiments is available.
Category: Statistics
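
Since SIC contains AIC and BIC as special cases, it may help to recall how those two criteria select a polynomial order under Gaussian noise. The sketch uses the profile log-likelihood with σ̂² = RSS/n, so -2 log L = n(log(2πσ̂²) + 1); it is not an implementation of SIC, and the function name is illustrative.

```python
import numpy as np

def information_criteria(x, y, max_order):
    """(order, AIC, BIC) for polynomial fits of order 0..max_order under Gaussian noise."""
    n = len(y)
    results = []
    for order in range(max_order + 1):
        coeffs = np.polyfit(x, y, order)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        k = order + 2  # polynomial coefficients plus the noise variance
        neg2loglik = n * (np.log(2 * np.pi * rss / n) + 1)
        results.append((order, neg2loglik + 2 * k, neg2loglik + k * np.log(n)))
    return results
```

Both criteria are the same fit term plus a penalty linear in k; BIC's log(n) penalty is harsher than AIC's 2 once n exceeds about 7, so BIC tends to pick smaller models.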

[317] viXra:2207.0168 [pdf] submitted on 2022-07-28 22:43:14

Explicit Statistical Assay of Suicide Rates in Germany and Brazil

Authors: Anurag Dutta, Manan Roy Choudhury, Seemantini Chattopadhyay
Comments: 10 Pages. (A non-essential image deemed to be insensitive is blocked by viXra Admin)

Background: Suicide, the act of intentionally harming or killing oneself, has risen sharply in recent times. It is the result of mental disorders stemming from depression, anxiety, or stress.
Methods: In this study, we analyzed the dataset of suicide cases for one developing country, Brazil, and one developed country, Germany, using statistical methods along with machine learning techniques to obtain a clear picture.
Results: We found that the suicide rate in Brazil is quite high in comparison with the suicide rate in Germany.
Conclusions: Our results provide some evidence that the development status of a country, along with other factors such as per-capita income, employment and literacy, affects its suicide rate in one way or another.
Category: Statistics

[316] viXra:2204.0154 [pdf] submitted on 2022-04-26 18:14:52

The Proof of Riemann Hypothesis

Authors: Minuk Choi
Comments: 7 Pages.

The proposition “the ratio of numbers with an even number and with an odd number of prime factors, none repeated, is 50 : 50” is equivalent to the Riemann hypothesis. I prove this proposition using the posterior distribution of the discrete uniform distribution.
Category: Statistics
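
The counts in the proposition can be tabulated with the Möbius function μ: square-free n with an even number of prime factors have μ(n) = 1 (including n = 1), those with an odd number have μ(n) = -1, and the difference of the two counts up to x is the Mertens function M(x). The Riemann hypothesis is known to be equivalent to M(x) = O(x^(1/2+ε)) for every ε > 0, a statement about how fast the two counts balance. The sketch below only tabulates the counts; it proves nothing, and the function names are illustrative.

```python
def mobius_sieve(limit):
    """Möbius function mu[0..limit] by a simple sieve (mu[0] is unused and set to 0)."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for multiple in range(2 * p, limit + 1, p):
            is_prime[multiple] = False
        for multiple in range(p, limit + 1, p):
            mu[multiple] = -mu[multiple]  # one more distinct prime factor
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0  # divisible by a square
    mu[0] = 0
    return mu

def squarefree_parity_counts(limit):
    """(even, odd): square-free n <= limit with an even / odd number of prime factors."""
    mu = mobius_sieve(limit)
    even = sum(1 for n in range(1, limit + 1) if mu[n] == 1)
    odd = sum(1 for n in range(1, limit + 1) if mu[n] == -1)
    return even, odd  # even - odd is the Mertens function M(limit)
```

For example, up to 10 the even group is {1, 6, 10} and the odd group is {2, 3, 5, 7}, so M(10) = -1.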

[315] viXra:2202.0089 [pdf] submitted on 2022-02-13 23:14:57

Some (Vaguely Meaningful) Fun With A Coin Toss Game: The "(St.) Petersburg" Game Paradox

Authors: Gary J. Duggan
Comments: 5 Pages.

A simple coin toss game, attributed to Nicolaus Bernoulli in the early 1700s, results in a mathematical paradox which still appears to be subject to what might be described as "conceptual" rather than "mathematical" solutions. A mathematical solution is given showing that, if the number of games is 2^m-1 then the average payout per game for this number of games is m/(2-(1/2^(m-1))).
Category: Statistics
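
The closed form can be checked with exact arithmetic, assuming the usual payout of 2^(k-1) when the first head appears on toss k and the idealized split in which, out of 2^m - 1 games, 2^(m-k) first show heads on toss k (k = 1, ..., m). Each toss position then contributes 2^(m-1) to the total payout, so the average is m·2^(m-1)/(2^m - 1) = m/(2 - 1/2^(m-1)). Function names are illustrative.

```python
from fractions import Fraction

def idealized_average_payout(m):
    """Average payout per game over 2**m - 1 games under the idealized split in
    which 2**(m - k) games first show heads on toss k and pay 2**(k - 1)."""
    games = sum(2 ** (m - k) for k in range(1, m + 1))  # equals 2**m - 1
    total = sum(2 ** (m - k) * 2 ** (k - 1) for k in range(1, m + 1))  # m * 2**(m-1)
    return Fraction(total, games)

def closed_form(m):
    """The formula from the abstract: m / (2 - 1/2**(m - 1))."""
    return Fraction(m) / (2 - Fraction(1, 2 ** (m - 1)))
```

Because the average grows roughly like m/2, i.e. logarithmically in the number of games, it stays finite for any finite number of games even though the expected payout of a single game is infinite.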

[314] viXra:2202.0084 [pdf] submitted on 2022-02-13 23:24:08

COVID-19 and All-Cause Mortality Data by Age Group Reveals Risk of COVID Vaccine-Induced Fatality is Equal to or Greater than the Risk of a COVID death for all Age Groups Under 80 Years Old as of 6 February 2022

Authors: Kathy Dopp, Stephanie Seneff
Comments: 21 Pages.

As of 6 February 2022, based on publicly available official UK and US data, all age groups under 50 years old are at greater risk of fatality after receiving a COVID-19 inoculation than an unvaccinated person is at risk of a COVID-19 death. All age groups under 80 years old have virtually no benefit from receiving a COVID-19 inoculation, and the younger ages incur significant risk. This analysis is conservative because it ignores the fact that inoculation-induced adverse events such as thrombosis, myocarditis, Bell’s palsy, and other vaccine-induced injuries can lead to shortened life span. When one takes into consideration the fact that there is approximately a 90% decrease in risk of COVID-19 death if early treatment is provided to all symptomatic high-risk persons, one can only conclude that mandates of COVID-19 inoculations are ill-advised. Considering the emergence of antibody-resistant variants like Delta and Omicron, for most age groups COVID-19 vaccine inoculations result in higher death rates than COVID-19 does for the unvaccinated.
Category: Statistics

[313] viXra:2201.0152 [pdf] submitted on 2022-01-23 18:41:23

Forensic Analysis of Lucy I and Lucy II

Authors: Robert Bennett
Comments: 6 Pages.

A quantitative test of the probability that two sets of photos show the same woman. The result for 7 facial characteristics in each photo is that the odds are 30 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics

[312] viXra:2112.0158 [pdf] submitted on 2021-12-30 18:14:01

An Exhaustive Variable Selection Study for Linear Models of Soundscape Emotions: Rankings and Gibbs Analysis

Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 18 Pages.

In recent years, soundscapes have become one of the most active topics in Acoustics, providing a holistic approach to the acoustic environment that involves human perception and context. Soundscape-elicited emotions are central, yet substantially more subtle and less noticed than those elicited by speech or music. Currently, soundscape emotion recognition is a hot topic in the literature. We provide an exhaustive variable selection study (i.e., a selection of the soundscape indicators) on a well-known dataset (emo-soundscapes). We consider linear soundscape emotion models for two soundscape descriptors: arousal and valence. Several ranking schemes and procedures for selecting the number of variables are applied. We have also performed an alternating optimization scheme for obtaining the best sequences while keeping a certain number of features fixed. Furthermore, we have designed a novel technique based on Gibbs sampling, which provides a more complete and clear view of the relevance of each variable. Finally, we have also compared our results with the analysis obtained by the classical methods based on p-values. As a result of our study, we suggest two simple and parsimonious linear models of only 7 and 16 variables (out of the 122 possible features) for the two outputs (arousal and valence), respectively. The suggested linear models provide very good and competitive performance, with R² > 0.86 and R² > 0.63 (values obtained after a cross-validation procedure), respectively.
Category: Statistics
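
The abstract does not specify the Gibbs-based technique. As a generic sketch only, a Gibbs sampler over variable-inclusion indicators can target a distribution proportional to exp(-BIC/2), updating one indicator at a time from its conditional; the inclusion frequencies then summarize the relevance of each variable. The BIC target, the absence of burn-in and the function names are all assumptions.

```python
import numpy as np

def bic_linear(X, y, mask):
    """BIC (up to a constant) of an OLS fit with intercept on the columns selected by mask."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coeffs) ** 2))
    k = design.shape[1] + 1  # coefficients plus the noise variance
    return n * np.log(rss / n) + k * np.log(n)

def gibbs_variable_selection(X, y, n_iter=200, seed=0):
    """Gibbs sampler over inclusion indicators with target proportional to exp(-BIC / 2).
    Returns the fraction of sweeps in which each variable was included."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mask = np.zeros(d, dtype=bool)
    counts = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            mask[j] = True
            bic_in = bic_linear(X, y, mask)
            mask[j] = False
            bic_out = bic_linear(X, y, mask)
            # conditional P(include j | other indicators) under the exp(-BIC/2) target
            z = np.clip((bic_in - bic_out) / 2, -700, 700)
            p_in = 1.0 / (1.0 + np.exp(z))
            mask[j] = rng.random() < p_in
        counts += mask
    return counts / n_iter
```

Unlike a single ranking, the inclusion frequencies also show how uncertain the selection of each borderline variable is.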

[311] viXra:2112.0058 [pdf] submitted on 2021-12-12 20:44:49

An Ordered Sample Mean That's a Bit Like Simpson's Rule

Authors: D Williams
Comments: 2 Pages.

A "Simpson's Rule"-like ordered sample mean is compared with the standard version. It appears to be better, at least for small sample sizes. A related integral approximation is also given and tested against the Midpoint Rule. Other types of ordered sample means need investigating.
Category: Statistics

[310] viXra:2112.0013 [pdf] submitted on 2021-12-02 02:52:21

Minimum with Inequality Constraint Applied to Increasing Cubic, Logistic and Gompertz or Convex Quartic and Biexponential Regressions

Authors: Josef Bukac
Comments: 28 Pages.

We present a method of minimizing an objective function subject to an inequality constraint. It enables us to minimize the sum of squares of deviations in linear regression under inequality restrictions. We demonstrate how to calculate the coefficients of a cubic function under the restriction that it is increasing, and we also mention how to fit a convex quartic polynomial. We use such results for interpolation as a method of calculating starting values for iterative methods of fitting some specific functions, such as four-parameter logistic, positive bi-exponential, or Gompertz functions. Curvature-driven interpolation enables such calculations, since otherwise solutions to the interpolation equations may not exist or may not be unique. We also present examples to illustrate how it works and compare our approach with that of Zhang (2020).
Category: Statistics

[309] viXra:2111.0150 [pdf] submitted on 2021-11-28 14:29:35

Bayesian Inference Via Generalized Thermodynamic Integration

Authors: F. Llorente, L. Martino, D. Delgado
Comments: 17 Pages.

The idea of using a path of tempered posterior distributions has been widely applied in the literature for the computation of marginal likelihoods (a.k.a. Bayesian evidence). Thermodynamic integration, path sampling, and annealed importance sampling are well-known examples of algorithms belonging to this family of methods. In this work, we introduce a generalized thermodynamic integration (GTI) scheme that can perform complete Bayesian inference, i.e., GTI can approximate generic posterior expectations (not only the marginal likelihood). Several application scenarios for GTI are discussed, and various numerical simulations are provided.
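
As an illustration of the identity that GTI generalizes, $\log Z = \int_0^1 \mathbb{E}_{p_\beta}[\log L(\theta)]\,d\beta$ with $p_\beta \propto L^\beta \pi$, here is a minimal Python sketch on a hypothetical conjugate Gaussian model. It is standard thermodynamic integration, not the authors' GTI scheme; the exact power-posterior moments stand in for MCMC draws so that the quadrature can be checked against the closed-form evidence.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def expected_loglik(beta, y, sigma2, prior_var):
    """E[log L(theta)] under the power posterior p_beta(theta) ∝ L(theta)^beta * prior(theta).

    Hypothetical conjugate toy model: y | theta ~ N(theta, sigma2), theta ~ N(0, prior_var).
    Every power posterior is Gaussian, so the expectation has a closed form.
    """
    var_beta = 1.0 / (1.0 / prior_var + beta / sigma2)
    mean_beta = var_beta * beta * y / sigma2
    expected_sq_err = (y - mean_beta) ** 2 + var_beta  # E[(y - theta)^2]
    return -0.5 * math.log(2.0 * math.pi * sigma2) - expected_sq_err / (2.0 * sigma2)


def thermodynamic_log_evidence(y, sigma2, prior_var, n_grid=2001):
    """log Z = integral over beta in [0, 1] of E_beta[log L], by the trapezoidal rule."""
    h = 1.0 / (n_grid - 1)
    vals = [expected_loglik(i * h, y, sigma2, prior_var) for i in range(n_grid)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


y_obs, noise_var, prior_var = 1.3, 0.5, 4.0
ti_estimate = thermodynamic_log_evidence(y_obs, noise_var, prior_var)
# Closed-form evidence for this model: y ~ N(0, prior_var + noise_var)
exact = log_normal_pdf(y_obs, 0.0, prior_var + noise_var)
print(f"TI: {ti_estimate:.6f}  exact: {exact:.6f}")
```

In practice the inner expectations would be estimated from MCMC samples of each tempered posterior; the β grid is often concentrated near 0, where the integrand changes fastest.
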
Category: Statistics

[308] viXra:2111.0145 [pdf] submitted on 2021-11-28 17:11:33

Effective Sample Size Approximations as Entropy Measures

Authors: L. Martino, V. Elvira
Comments: 11 Pages.

In this work, we analyze alternative effective sample size (ESS) measures for importance sampling algorithms. More specifically, we study a family of ESS approximations introduced in [11]. We show that all the ESS functions in this family (called the Huggins-Roy family) satisfy all the theoretical conditions required in [17]. We also highlight the relationship of this family with the Rényi entropy. Through numerical simulations, we study the performance of different ESS approximations, also introducing an optimal linear combination of the most promising ESS indices in the literature. Moreover, we obtain the best ESS approximation within the Huggins-Roy family, which provides an almost perfect match with the theoretical ESS values.
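
A minimal Python sketch of a Rényi-entropy-based ESS family, assuming it takes the form $\mathrm{ESS}_\beta = \left(\sum_n \bar w_n^{\beta}\right)^{1/(1-\beta)}$ with normalized weights $\bar w_n$, i.e. the exponential of the Rényi entropy of order β. This is our reading of the Huggins-Roy family, which the abstract does not write out; the function name `ess_renyi` is hypothetical.

```python
import math


def normalize(weights):
    """Turn unnormalized importance weights into normalized weights summing to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def ess_renyi(weights, beta):
    """ESS_beta = (sum_n wbar_n**beta) ** (1 / (1 - beta)) = exp(Renyi entropy of order beta).

    beta = 2 gives the classic 1 / sum(wbar**2); beta = 1 is the Shannon limit
    exp(-sum wbar log wbar); beta = inf gives 1 / max(wbar); beta = 0 counts non-zero weights.
    """
    wbar = [w for w in normalize(weights) if w > 0.0]
    if beta == 1:
        return math.exp(-sum(w * math.log(w) for w in wbar))
    if math.isinf(beta):
        return 1.0 / max(wbar)
    return sum(w ** beta for w in wbar) ** (1.0 / (1.0 - beta))


weights = [0.1, 2.0, 0.5, 0.4, 1.0]  # hypothetical unnormalized weights
for b in (0, 0.5, 1, 2, math.inf):
    print(b, round(ess_renyi(weights, b), 4))
```

Because Rényi entropy is non-increasing in its order, the printed values shrink as β grows, and all of them equal N when the weights are uniform.
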
Category: Statistics

[307] viXra:2111.0012 [pdf] submitted on 2021-11-02 20:50:03

A Revised Comparison Between Fama and French Five-Factor Model and Three-Factor Model: Based on China's A-Share Market

Authors: Zhijing Zhang, Yue Yu, Qinghua Ma, Haixiang Yao
Comments: 18 Pages.

In response to some contradictory results in existing research, this paper selects China's latest stock data, from 2005 to 2020, for empirical analysis. By choosing data from this period, we avoid the periods of China's significant stock market reforms and thus reduce the impact of government policy on the factor effects. The redundant factors (HML, CMA) are orthogonalized, and a regression analysis of the 5×5 Size-B/M and Size-Inv portfolios is carried out with these two orthogonalized factors. We find that HML and CMA are still significant in many portfolios, indicating that they have strong explanatory ability, which is also consistent with the results of the GRS test. All of this shows that the five-factor model has a better ability to explain excess returns. In the concrete analysis, this paper uses five-factor 25-group portfolio return calculations, five-factor regression analysis, orthogonal treatment, five-factor 25-group regressions, and the GRS test to explain more comprehensively the strong explanatory ability of the five-factor model for excess returns. We then analyze the possible reasons for the strong explanatory ability of HML, CMA, and RMW in terms of the price-to-book ratio, turnover rate, and correlation coefficients. We also give a detailed explanation of the results and analyze the changes in China's stock market policy and in investors' investment styles in recent years. Finally, this paper puts forward some suggestions for the development of asset pricing models and of China's stock market.
Category: Statistics

[306] viXra:2110.0128 [pdf] submitted on 2021-10-22 04:13:21

Violating the Second Law of Thermodynamics in a Dynamical System Through Equivalence Closure Via Mutual Information Carriers of a 5-Tuple Measure Space

Authors: Deep Bhattacharjee
Comments: 22 Pages, 5 Figures, TechRxiv (Computations), Ergodic Theory

The time and space averages of an ergodic system following the 5-tuple relations $(A, \sim, J, \Sigma, \mu)$, through the initial increment from $a + b\theta$ to $a + c + b\theta$, indicate that entropy is preserved in deterministic yet dynamical and conservative systems, holding for the set $S_p = S_1 \sum_{i=2}^{\infty} S_i$ with $S$ the entropy, $\exists\,(S_\infty = \cdots = S_3 = S_2) > S_1$, obeying the Poincaré recurrence theorem throughout the constant attractor $A$. This in turn establishes equivalence closure as the property of the induced systems to resemble entropy-conserving scenarios.
Category: Statistics

[305] viXra:2110.0032 [pdf] submitted on 2021-10-07 09:24:06

On the Safe Use of Prior Densities for Bayesian Model Selection in Physics

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.

The application of Bayesian inference in physics for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, unlike the posterior distribution, marginal likelihoods depend strongly on the choice of prior, even when the data are very informative. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided, and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions with illustrative numerical examples (providing some related code), one of which involves a real-world application on exoplanet detection.
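
To make the prior-sensitivity point concrete, the following Python sketch uses a hypothetical conjugate model ($y_i \sim \mathcal{N}(\theta, \sigma^2)$ with known $\sigma^2 = 0.25$ and prior $\theta \sim \mathcal{N}(0, s_0^2)$) and made-up data. It is an illustration, not the paper's code: widening the prior barely moves the posterior mean, while the log marginal likelihood keeps falling by roughly $\log 10$ per decade of $s_0$.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_evidence(data, noise_var, prior_var, prior_mean=0.0):
    """log p(y) for y_i ~ N(theta, noise_var) i.i.d. with prior theta ~ N(prior_mean, prior_var).

    Uses p(y) = (2 pi noise_var)^(-n/2) exp(-SS / (2 noise_var)) sqrt(2 pi noise_var / n)
                * N(ybar | prior_mean, prior_var + noise_var / n),  SS = sum (y_i - ybar)^2.
    """
    n = len(data)
    ybar = sum(data) / n
    ss = sum((y - ybar) ** 2 for y in data)
    return (-0.5 * n * math.log(2.0 * math.pi * noise_var) - ss / (2.0 * noise_var)
            + 0.5 * math.log(2.0 * math.pi * noise_var / n)
            + log_normal_pdf(ybar, prior_mean, prior_var + noise_var / n))


def posterior_mean(data, noise_var, prior_var, prior_mean=0.0):
    """Conjugate posterior mean of theta."""
    precision = 1.0 / prior_var + len(data) / noise_var
    return (prior_mean / prior_var + sum(data) / noise_var) / precision


data = [2.1, 1.7, 2.4, 1.9, 2.2, 2.0, 1.8, 2.3]  # hypothetical observations
for prior_sd in (1.0, 10.0, 100.0, 1000.0):
    v = prior_sd ** 2
    print(f"prior sd {prior_sd:>7}: log Z = {log_evidence(data, 0.25, v):8.4f}, "
          f"posterior mean = {posterior_mean(data, 0.25, v):.4f}")
```

As $s_0 \to \infty$ the evidence tends to zero even though the posterior settles, which mirrors why an improper prior leaves the evidence undetermined.
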
Category: Statistics

[304] viXra:2109.0178 [pdf] submitted on 2021-09-24 07:34:10

Optimality in Noisy Importance Sampling

Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 13 Pages.

Many applications in signal processing and machine learning require the study of probability density functions (pdfs) that can only be accessed through noisy evaluations. In this work, we analyze noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate information about the variance of the noisy realizations, placing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in a noisy IS framework.
Category: Statistics

[303] viXra:2107.0131 [pdf] submitted on 2021-07-23 19:10:35

Measurement Space Partitioning for Estimation and Prediction

Authors: Glenn Healey, Shiyuan Zhao
Comments: 31 Pages.

An important and challenging problem in the evaluation of baseball players is the quantification of batted-ball talent. This problem has traditionally been addressed using linear regression on the value of a statistic derived from a set of observations. We use large sets of trajectory measurements acquired by in-game sensors to show that the predictive value of a batted ball depends on its physical properties. This knowledge is exploited to estimate batted-ball distributions defined over a multidimensional measurement space from observed distributions by using regression parameters that adapt to batted ball properties. This process is central to a new method for estimating batted-ball talent. The domain of the batted-ball distributions is defined by a partition of measurement space that is selected to optimize the accuracy of the estimates. We present examples illustrating facets of the new approach and use a set of experiments to show that the new method generates estimates that are significantly more accurate than those generated using current methods. The new methodology supports the use of fine-grained contextual adjustments and we show that this process further improves the accuracy of the technique.
Category: Statistics

[302] viXra:2107.0031 [pdf] submitted on 2021-07-05 20:36:40

Improvement of the Matlab Program Proposed in Vixra:2103.0018

Authors: Joh. J. Sauren, Aloys J. Sipers
Comments: 8 Pages. [Corrections made by viXra Admin to conform with the requirements of viXra.org]

In this article, the Matlab program proposed in the article viXra:2103.0018 is improved. Further, the constant d3 depends on the constants d2 and a3. Three theorems are stated on the generating functions for the constants d2 and a3. The first two theorems provide analytical expressions for these generating functions, whereas the third theorem relates them.
Category: Statistics

[301] viXra:2106.0144 [pdf] submitted on 2021-06-24 18:41:26

Wave Packets of Relaxation Type in Boundary Problems of Quantum Mechanics

Authors: Igor B. Krasnyuk
Comments: 23 Pages.

An initial-boundary value problem for the linear Schrödinger equation with nonlinear functional boundary conditions is considered. It is shown that the attractor of the problem contains periodic piecewise constant functions on the complex plane with finitely many points of discontinuity per period. The method of reducing the problem to a system of integro-difference equations is applied. Applications to optical resonators with feedback are considered. The elements of the attractor can be interpreted as white and black solitons in nonlinear optics.
Category: Statistics

[300] viXra:2106.0036 [pdf] submitted on 2021-06-07 20:38:46

Introduction to the Gaussian Information Criterion

Authors: Russell Leidich
Comments: 9 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]

There are many applications involving physical measurements which are expected to result in a probability density function (PDF) which is asymptotically Gaussian (normal) or lognormal. In the latter case, we can simply take the logs of the (positive) samples in order to obtain the former, so the math in this paper will focus exclusively on Gaussians.

For example, we would expect the radio power received at a dish to be lognormally distributed, given a sufficiently broad swath of sky observed for a sufficiently long duration, and in the relative absence of terrestrial radio interference. However, if we were then to focus on a particular star system, the observed "experimental" PDF could substantially deviate from that "background" PDF. It might not even be lognormal if, for example, the star exhibits peaks in radio power at a few distinct frequencies.

It would therefore be useful to have a means to quantify the "surprise" factor of experimental PDFs relative to an established background PDF which is known to be, or be equivalent to, a Gaussian. If a given experimental PDF were also known to be Gaussian, then we could do this by employing the Kullback-Leibler (KL) divergence from one to the other, as Gupta appears to have done for the multidimensional case.
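
When both PDFs are univariate Gaussians, the KL divergence mentioned above has the closed form $D_{\mathrm{KL}}(P\,\|\,Q) = \log(\sigma_Q/\sigma_P) + \frac{\sigma_P^2 + (\mu_P-\mu_Q)^2}{2\sigma_Q^2} - \frac12$. A minimal Python sketch with hypothetical parameter values follows; it covers only this Gaussian-to-Gaussian case, not the paper's Gaussian Information Criterion.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) in nats for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


# Hypothetical numbers: background N(0, 1) vs an "experimental" N(0.5, 1.5),
# e.g. both obtained after taking logs of lognormal power samples.
surprise_nats = kl_gaussian(0.5, 1.5, 0.0, 1.0)
print(surprise_nats)                 # experimental relative to background, in nats
print(surprise_nats / math.log(2))   # the same divergence in bits
```

Note that the divergence is asymmetric, so which PDF plays the role of the reference matters.
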

When the experimental PDF is not known to be Gaussian (or any PDF archetype, for that matter), the situation is more complicated, mainly because we are forced to deal with a real-valued set of samples sorted in ascending order (a 1D point cloud, to be precise, although "vector" will suffice for brevity) rather than an analytic function. Ranking the information cost of encoding such a vector, versus others arising from other experiments, under the prior assumption of the same background PDF, is the subject of this paper. We also investigate the question of ascertaining which background PDF is the most useful for discriminating anomalous from mundane experimental PDFs.

Category: Statistics

[299] viXra:2104.0046 [pdf] submitted on 2021-04-09 17:05:32

Respiratory Viruses Epidemic Dynamics Covid-19 Case Study and Forecast for 2021 in the Most Affected Countries

Authors: Johnny J. Mafra Jr.
Comments: 20 Pages.

We researched and adopted a method that introduces seasonal behavior into the SIR model to study the dynamics of COVID-19. The method is based on calculating β for each week of the year from the seasonal behavior previously observed in several of the world's most affected countries and regions. Vaccination, which is expected to have a major effect on these dynamics in 2021, was also included in the model. The model was used to build a simulator, which was then used to determine β and to forecast COVID-19 cases for the USA, Brazil, and India. β was found to range seasonally from 0.15 to 0.40, or from 0.10 to 0.80, depending on the region. Vaccination was found to be very effective in reducing cases in 2021, and herd immunity is expected to be reached when around 55% of the population is immune. The simulation led to some unexpected findings, such as the effect of lockdowns on later waves of the epidemic and on the epidemic dynamics. We found a condition under which exogenous respiratory viruses trigger a major epidemic, and a condition that explains why a respiratory virus to which part of the population is already immune shows seasonal behavior with a small number of cases. These dynamics explain the evolution of COVID-19 in 2020 and 2021, and even of the Spanish flu in 1918 and 1919.
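
A minimal Python sketch of an SIR model with a weekly transmission rate and a vaccination flow, in the spirit of the method described above. Everything numeric is an assumption: the cosine β profile (chosen only to span the quoted 0.15 to 0.40 range), γ = 0.1 per day, the population size, and the constant daily vaccinations; the paper instead estimates β per week from observed data. The helper uses the classic homogeneous-mixing threshold $1 - 1/R_0$, under which a 55% herd-immunity level would correspond to $R_0 \approx 2.2$ (our arithmetic, not a figure from the paper).

```python
import math


def simulate_seasonal_sir(beta_by_week, gamma, n_days, population,
                          initial_infected, daily_vaccinations=0.0):
    """Daily Euler steps of an SIR model whose transmission rate beta changes weekly.

    Hypothetical simplification: vaccinated susceptibles move straight to the removed class R.
    Returns a list of (S, I, R) tuples, one per day after the update.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = []
    for day in range(n_days):
        beta = beta_by_week[(day // 7) % len(beta_by_week)]  # weekly beta, repeating yearly
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        vaccinated = min(daily_vaccinations, max(s - new_infections, 0.0))
        s -= new_infections + vaccinated
        i += new_infections - new_recoveries
        r += new_recoveries + vaccinated
        history.append((s, i, r))
    return history


def herd_immunity_threshold(r0):
    """Classic homogeneous-mixing threshold 1 - 1/R0."""
    return 1.0 - 1.0 / r0


# Hypothetical seasonal profile oscillating between 0.15 and 0.40 over 52 weeks.
weekly_beta = [0.275 + 0.125 * math.cos(2.0 * math.pi * w / 52) for w in range(52)]
run = simulate_seasonal_sir(weekly_beta, gamma=0.1, n_days=365, population=1_000_000,
                            initial_infected=100, daily_vaccinations=2_000)
peak_day = max(range(len(run)), key=lambda d: run[d][1])
print("peak infections on day", peak_day, "=", round(run[peak_day][1]))
print("herd-immunity threshold for R0 = 1/0.45:", herd_immunity_threshold(1 / 0.45))
```
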
Category: Statistics

[298] viXra:2103.0173 [pdf] submitted on 2021-03-27 02:04:39

Datasailr an R Package for Row by Row Data Processing, Using Datasailr Script

Authors: Toshihiro Umehara
Comments: 8 Pages.

Data processing and data cleaning are essential steps before applying statistical or machine learning procedures. R provides a flexible, vector-based way of processing data. R packages also provide other ways of manipulating data, such as SQL and chained functions. I present yet another way to process data row by row, using a data-manipulation-oriented script, DataSailr script. This article introduces the datasailr package and shows the potential benefits of using a domain-specific language for data processing.
Category: Statistics

[297] viXra:2103.0079 [pdf] submitted on 2021-03-12 01:15:46

The Scale Invariant Prior and Its Generalizations

Authors: Stephen P. Smith
Comments: 8 Pages.

The scale invariant prior is revisited, for a single variance parameter and for a variance-covariance matrix. These results are generalized to develop different scale invariant priors where probability measure is assigned through the sum of variance components that represent partitions of total variance, or through a sum of variance-covariance matrices representing partitions of a total variance-covariance matrix.
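
For reference, the textbook scale-invariant priors that such a revisit starts from are shown below, together with the one-line change-of-variables check of their invariance; the paper's generalizations to sums of variance components may differ in detail. Here $\Sigma$ is a $p \times p$ covariance matrix, $c > 0$, and $A$ is any invertible $p \times p$ matrix.

$$p(\sigma^2) \propto \frac{1}{\sigma^2}, \qquad \tau = c\,\sigma^2 \;\Rightarrow\; p(\tau) = p(\sigma^2)\left|\frac{d\sigma^2}{d\tau}\right| \propto \frac{c}{\tau}\cdot\frac{1}{c} = \frac{1}{\tau},$$

$$p(\Sigma) \propto |\Sigma|^{-(p+1)/2}, \qquad \Psi = A \Sigma A^{\top} \;\Rightarrow\; p(\Psi) \propto |\Sigma|^{-(p+1)/2}\,|\det A|^{-(p+1)} = |\Psi|^{-(p+1)/2},$$

where the factor $|\det A|^{-(p+1)}$ is the inverse Jacobian of the map $\Sigma \mapsto A\Sigma A^{\top}$ on symmetric matrices.
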
Category: Statistics

[296] viXra:2103.0018 [pdf] submitted on 2021-03-03 14:39:39

On the Computation of the Principal Constants $d_{2}$ and $d_{3}$ Used to Construct Control Limits for Control Charts Applied in Statistical Process Control

Authors: Joh. J. Sauren
Comments: 3 Pages.

In this communication a short and straightforward algorithm, written in Octave (version 6.1.0 (2020-11-26))/Matlab (version '9.9.0.1538559 (R2020b) Update 3'), is proposed for brute-force computation of the principal constants $d_{2}$ and $d_{3}$ used to calculate control limits for various types of variables control charts encountered in statistical process control (SPC).
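
The constants have standard integral representations for the range $W$ of $n$ independent $\mathcal{N}(0,1)$ draws, with $d_2 = \mathbb{E}[W]$ and $d_3 = \mathrm{sd}(W)$. The Python sketch below evaluates them by midpoint-rule quadrature rather than by the authors' brute-force Octave/Matlab program; the grid limits and resolution are arbitrary choices.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d2_d3(n, lower=-7.0, upper=7.0, cells=560):
    """Constants d2 = E[W] and d3 = sd(W) for the range W of n i.i.d. N(0, 1) draws.

    Midpoint-rule quadrature of the classical integrals
        E[W]   = int [1 - F(t)^n - (1 - F(t))^n] dt
        E[W^2] = 2 int int_{s<t} [1 - (1 - F(s))^n - F(t)^n + (F(t) - F(s))^n] ds dt
    """
    h = (upper - lower) / cells
    cdf = [normal_cdf(lower + (k + 0.5) * h) for k in range(cells)]
    mean_w = h * sum(1.0 - f ** n - (1.0 - f) ** n for f in cdf)
    second = 0.0
    for i, fs in enumerate(cdf):
        p_min_above = (1.0 - fs) ** n  # P(all draws > s)
        second += 1.0 - p_min_above - fs ** n  # diagonal cell (s = t), counted once
        for ft in cdf[i + 1:]:
            # off-diagonal cells with s < t, counted twice by symmetry
            second += 2.0 * (1.0 - p_min_above - ft ** n + (ft - fs) ** n)
    second *= h * h
    return mean_w, math.sqrt(second - mean_w ** 2)


for n in (2, 5):
    d2, d3 = d2_d3(n)
    print(f"n = {n}: d2 = {d2:.4f}, d3 = {d3:.4f}")
```

For $n = 2$ the exact values are $d_2 = 2/\sqrt{\pi} \approx 1.128$ and $d_3 = \sqrt{2 - 4/\pi} \approx 0.853$, which gives a convenient check on the quadrature.
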
Category: Statistics

[295] viXra:2103.0008 [pdf] submitted on 2021-03-02 17:19:51

Compressed Particle Methods for Expensive Models with Application in Astronomy and Remote Sensing

Authors: L. Martino, V. Elvira, J. Lopez-Santiago, G. Camps-Valls
Comments: 41 Pages.

In many inference problems, the evaluation of complex and costly models is required. In this context, Bayesian methods have become very popular in several fields in recent years for parameter inversion, model selection, and uncertainty quantification. Bayesian inference requires the approximation of complicated integrals involving (often costly) posterior distributions. Generally, this approximation is obtained by means of Monte Carlo (MC) methods. To reduce the computational cost of the corresponding technique, surrogate models (also called emulators) are often employed. Another alternative is the so-called Approximate Bayesian Computation (ABC) scheme. ABC does not require evaluating the costly model, only the ability to simulate artificial data according to that model. Moreover, ABC requires choosing a suitable distance between real and artificial data. In this work, we introduce a novel approach in which the expensive model is evaluated only at some well-chosen samples. The selection of these nodes is based on the so-called compressed Monte Carlo (CMC) scheme. We provide theoretical results supporting the novel algorithms and give empirical evidence of the performance of the proposed method in several numerical experiments. Two of them are real-world applications in astronomy and satellite remote sensing.
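
For readers unfamiliar with ABC, here is a minimal Python sketch of plain ABC rejection sampling (not the compressed Monte Carlo approach proposed in the paper). The Gaussian "model", the uniform prior, the sample-mean distance, and the tolerance are all hypothetical choices made for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, sample_prior, distance, epsilon, n_proposals, rng):
    """Plain ABC rejection: keep prior draws whose simulated data land within epsilon of the observations."""
    accepted = []
    for _ in range(n_proposals):
        theta = sample_prior(rng)
        synthetic = simulate(theta, rng)
        if distance(synthetic, observed) <= epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
n_obs = 30
observed = [rng.gauss(3.0, 1.0) for _ in range(n_obs)]  # hypothetical data, true mean 3


def simulate(theta, rng):
    # Stand-in "model": n_obs Gaussian draws with unknown mean theta and unit variance
    return [rng.gauss(theta, 1.0) for _ in range(n_obs)]


def distance(a, b):
    # Distance between data sets, computed on a summary statistic (the sample mean)
    return abs(statistics.fmean(a) - statistics.fmean(b))


posterior = abc_rejection(observed, simulate, lambda r: r.uniform(-10.0, 10.0),
                          distance, epsilon=0.1, n_proposals=20_000, rng=rng)
print(len(posterior), "accepted; ABC posterior mean =", round(statistics.fmean(posterior), 3))
```

Each proposal costs one full model simulation, which is exactly the expense that motivates evaluating a costly model only at a few well-chosen nodes.
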
Category: Statistics

[294] viXra:2102.0094 [pdf] submitted on 2021-02-17 23:44:54

The kth Power Expectile Estimation and Testing

Authors: Fuming Lin, Yingying Jiang, Yong Zhou
Comments: 57 Pages.

This paper develops the theory of kth power expectile estimation and considers related hypothesis tests for the coefficients of linear regression models. We prove that the asymptotic covariance matrix of kth power expectile regression converges to that of quantile regression as k converges to one, and hence provide a moment estimator of the asymptotic covariance matrix of quantile regression. The kth power expectile regression is then used to test for homoskedasticity and conditional symmetry of the data. Detailed comparisons of the local power of the kth power expectile regression tests, the quantile regression test, and the expectile regression test are provided. When the underlying distribution is not standard normal, results show that the optimal k is often larger than 1 and smaller than 2, which suggests that the general kth power expectile regression is necessary.
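
A minimal Python sketch of the sample kth power expectile, assuming the usual asymmetric loss $|\tau - \mathbb{1}\{y < \mu\}|\,|y - \mu|^k$, so that k = 2 gives the ordinary expectile and k → 1 approaches the quantile. It estimates a location only, not the regression coefficients or tests developed in the paper; the function name is hypothetical.

```python
def kth_power_expectile(data, tau, k, tol=1e-10):
    """Sample kth power expectile: argmin over mu of sum_i |tau - 1{y_i < mu}| * |y_i - mu|**k.

    Assumes k > 1, so the objective is convex and its derivative is non-decreasing in mu;
    the root of the derivative is found by bisection between min(data) and max(data).
    """
    def derivative(mu):
        # d/dmu of the objective, divided by the positive constant k
        total = 0.0
        for y in data:
            u = y - mu
            if u > 0:
                total -= tau * u ** (k - 1)
            elif u < 0:
                total += (1.0 - tau) * (-u) ** (k - 1)
        return total

    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if derivative(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
for k in (1.001, 1.5, 2.0):
    print(k, round(kth_power_expectile(sample, 0.5, k), 4))
```

With τ = 0.5, k = 2 returns the sample mean (22.0 here), while k close to 1 returns a value near the median (3.0), illustrating the quantile limit; intermediate k trades robustness against efficiency.
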
Category: Statistics

[293] viXra:2102.0027 [pdf] submitted on 2021-02-05 13:05:30

Hamiltonian Markov Chain Monte Carlo Using Second Derivatives

Authors: Stephen P. Smith
Comments: 10 Pages.

Hamiltonian Markov Chain Monte Carlo is one of the established methods for conducting a Bayesian simulation. This method uses evaluations of the probability density and its gradient at particular values of the variables. This paper describes how to incorporate information from second derivatives that relate to a direction set, and how to modify the simulation accordingly.
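
The baseline that the paper extends is ordinary gradient-based HMC. A minimal Python sketch of that baseline follows (identity mass matrix, leapfrog integrator, Metropolis correction); it does not include the second-derivative modification described in the abstract, and the Gaussian target and tuning constants are hypothetical.

```python
import math
import random


def hmc(log_density, grad_log_density, x0, n_samples, step_size, n_leapfrog, rng):
    """Standard Hamiltonian Monte Carlo with an identity mass matrix and leapfrog integration."""
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        p0 = [rng.gauss(0.0, 1.0) for _ in x]  # fresh Gaussian momentum
        x_new = list(x)
        grad = grad_log_density(x_new)
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p0, grad)]  # initial half step
        for step in range(n_leapfrog):
            x_new = [xi + step_size * pi for xi, pi in zip(x_new, p)]  # full position step
            grad = grad_log_density(x_new)
            scale = step_size if step < n_leapfrog - 1 else 0.5 * step_size  # last one is a half step
            p = [pi + scale * gi for pi, gi in zip(p, grad)]
        # Hamiltonian H = -log pi(x) + |p|^2 / 2; accept with probability min(1, exp(H_old - H_new))
        h_old = -log_density(x) + 0.5 * sum(pi * pi for pi in p0)
        h_new = -log_density(x_new) + 0.5 * sum(pi * pi for pi in p)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(list(x))
    return samples


# Hypothetical target: independent Gaussians with standard deviations 1 and 2.
def log_target(x):
    return -0.5 * (x[0] ** 2 + x[1] ** 2 / 4.0)


def grad_log_target(x):
    return [-x[0], -x[1] / 4.0]


draws = hmc(log_target, grad_log_target, [0.0, 0.0], 4000, 0.2, 8, random.Random(1))
for d in range(2):
    column = [s[d] for s in draws]
    mean = sum(column) / len(column)
    var = sum((c - mean) ** 2 for c in column) / len(column)
    print(f"dim {d}: mean {mean:+.3f}, variance {var:.3f}")
```

A single step size must suit the stiffest direction of the target; curvature (second-derivative) information is one way to adapt the dynamics to directions of very different scale.
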
Category: Statistics

[292] viXra:2102.0026 [pdf] submitted on 2021-02-05 22:05:06

Revisiting the UK EU Membership Referendum (Brexit) Poll Tracker

Authors: Michaelino Mervisiano
Comments: 37 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]

On 23 June 2016, the United Kingdom (UK) European Union (EU) membership referendum resulted in 51.9% of voters voting to leave the EU, popularly termed Brexit. Given its significant implications, correctly predicting Brexit was crucial, but most pollsters predicted incorrectly. This paper assesses whether Brexit was evident and predictable from the pre-referendum poll data. Unlike previous studies, whose analytical tools are limited to latest-poll analysis, descriptive statistics, point estimates, and simple linear regression, this project uses more robust and sophisticated statistical methodologies.
Category: Statistics

[291] viXra:2101.0082 [pdf] submitted on 2021-01-13 14:07:54

Special Beta Distribution for Big Data Analysis: X ~ Beta (α = λ + 1, β = 2 – λ)

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 89 Pages. [Corrections made to conform with the requirements on the Submission Form]

This book discusses the special case of the Beta distribution with α = λ + 1 and β = 2 – λ. In comparison with the continuous Bernoulli distribution, we examine how changes in λ affect the pdf of the special Beta distribution. We then derive the sufficient statistic, the point estimator, the confidence interval, the test statistic, and the goodness of fit. At λ = 0.5, the special Beta distribution differs from the continuous Bernoulli distribution. As λ increases from small to large values, the pdf of the special Beta distribution changes smoothly, whereas the pdf of the continuous Bernoulli distribution changes sharply. As the sample size becomes large, both distributions approach the normal distribution, with different relationships between λ and the sum of the samples.
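
A minimal Python sketch comparing the two densities, assuming the standard continuous Bernoulli form $p(x\mid\lambda) = C(\lambda)\,\lambda^x(1-\lambda)^{1-x}$ with $C(\lambda) = 2\,\mathrm{artanh}(1-2\lambda)/(1-2\lambda)$ and $C(0.5) = 2$. At λ = 0.5 the continuous Bernoulli is uniform, whereas Beta(1.5, 1.5) peaks at $4/\pi \approx 1.27$, consistent with the statement that the two differ there. The helper names are hypothetical.

```python
import math


def continuous_bernoulli_pdf(x, lam):
    """Continuous Bernoulli density C(lam) * lam**x * (1 - lam)**(1 - x) on [0, 1], for 0 < lam < 1."""
    if abs(lam - 0.5) < 1e-9:
        const = 2.0  # limit of the normalizing constant at lam = 0.5
    else:
        const = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return const * lam ** x * (1.0 - lam) ** (1.0 - x)


def special_beta_pdf(x, lam):
    """Beta(alpha = lam + 1, beta = 2 - lam) density on (0, 1); alpha + beta = 3 for every lam."""
    log_b = math.lgamma(lam + 1.0) + math.lgamma(2.0 - lam) - math.lgamma(3.0)
    return math.exp(lam * math.log(x) + (1.0 - lam) * math.log(1.0 - x) - log_b)


def integrate(f, cells=4000):
    """Midpoint rule on (0, 1), which never evaluates the endpoints."""
    h = 1.0 / cells
    return h * sum(f((j + 0.5) * h) for j in range(cells))


for lam in (0.1, 0.5, 0.9):
    print(f"lam={lam}: CB(0.5)={continuous_bernoulli_pdf(0.5, lam):.4f}, "
          f"Beta(0.5)={special_beta_pdf(0.5, lam):.4f}, "
          f"CB mass={integrate(lambda x: continuous_bernoulli_pdf(x, lam)):.5f}, "
          f"Beta mass={integrate(lambda x: special_beta_pdf(x, lam)):.5f}")
```
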
Category: Statistics

[290] viXra:2101.0046 [pdf] submitted on 2021-01-06 17:52:45

Analyzing the Side Force on a Baseball Using Hawk-Eye Measurements

Authors: Glenn Healey, Lequan Wang
Comments: 19 Pages.

We use Hawk-Eye measurements to analyze the side force on a baseball.
Category: Statistics

[289] viXra:2101.0034 [pdf] submitted on 2021-01-05 09:18:19

Multi Categories Analytic Method Using Continuous Bernoulli Distribution and Conditional Distribution

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 195 Pages.

This book provides four model designs to discuss how the continuous Bernoulli (CB) distribution extends to the analysis of K categories. In contrast to the discrete multinomial distribution, which is extended from the Bernoulli distribution via the additive property, for the continuous Bernoulli the pdf, cdf, and distribution of the random variable must be checked, together with whether the characteristics of the CB distribution are maintained. Model 1 is derived from the random-variable method (variable-added); Models 2 and 3 are derived from probability model building and are suited to the parameter-added case and to the conditional relationship of variables, respectively. Model 4 is derived from the continuous trinomial distribution and is suited to the joint relationship of variables.
Category: Statistics

[288] viXra:2012.0221 [pdf] submitted on 2020-12-30 12:07:53

The Continuous Bernoulli Approaching Distribution When λ → 0 and the Continuous Binomial Distribution

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 37 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]

We provide mathematical derivations and numerical explanations to verify that, as λ → 0, the continuous Bernoulli distribution approaches the exponential distribution (Chapter 1), and that, as λ → 0 and λ → 1, the continuous binomial distribution approaches the Gamma distribution (Chapter 3). Meanwhile, Chapter 2 describes how to compute the continuous binomial distribution, which can be derived from the continuous Bernoulli distribution.
Category: Statistics

[287] viXra:2012.0088 [pdf] submitted on 2020-12-12 09:51:59

Continuous Bernoulli Distribution-Simulator and Test Statistic

Authors: Kuan-Sian Wang, Mei-Yu Lee
Comments: Pages.

We discuss the simulator and test statistic of the continuous Bernoulli distribution, which is important for testing the pervasive error of variational autoencoders in deep learning. We provide the sufficient statistic, the point estimator, the confidence interval, the test statistic, goodness of fit, and a one-way test for the continuous Bernoulli distribution. In addition, the continuous binomial distribution can be derived, so the confidence interval and the test can be applied to two continuous Bernoulli populations. The continuous trinomial distribution can also be derived. The computer software for this book can be downloaded from https://github.com/meiyulee/continuous_Bernoulli
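
A simulator for the continuous Bernoulli distribution can be sketched with inverse-transform sampling, since its CDF $F(x) = (r^x - 1)/(r - 1)$ with $r = \lambda/(1-\lambda)$ inverts in closed form. The Python below is an illustrative sketch, not the software distributed via the GitHub link above; the function names are hypothetical, and the mean formula is included only to sanity-check the draws.

```python
import math
import random


def cb_cdf(x, lam):
    """CDF of the continuous Bernoulli on [0, 1]: (r**x - 1) / (r - 1) with r = lam / (1 - lam)."""
    if abs(lam - 0.5) < 1e-9:
        return x
    r = lam / (1.0 - lam)
    return (r ** x - 1.0) / (r - 1.0)


def cb_inverse_cdf(u, lam):
    """Inverse of cb_cdf: x = log(1 + u (r - 1)) / log(r), used for inverse-transform sampling."""
    if abs(lam - 0.5) < 1e-9:
        return u
    return math.log1p(u * (2.0 * lam - 1.0) / (1.0 - lam)) / math.log(lam / (1.0 - lam))


def cb_mean(lam):
    """E[X] = lam / (2 lam - 1) + 1 / (2 atanh(1 - 2 lam)); equal to 0.5 at lam = 0.5."""
    if abs(lam - 0.5) < 1e-9:
        return 0.5
    return lam / (2.0 * lam - 1.0) + 1.0 / (2.0 * math.atanh(1.0 - 2.0 * lam))


def simulate_cb(n, lam, rng):
    """Draw n continuous Bernoulli samples by inverse-transform sampling."""
    return [cb_inverse_cdf(rng.random(), lam) for _ in range(n)]


rng = random.Random(42)
draws = simulate_cb(100_000, 0.25, rng)
print("sample mean:", round(sum(draws) / len(draws), 4),
      " theoretical mean:", round(cb_mean(0.25), 4))
```
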
Category: Statistics

[286] viXra:2012.0044 [pdf] submitted on 2020-12-07 13:36:14

On a Linnik Theorem in Theory of Errors

Authors: Abdelmajid Ben Hadj Salem
Comments: 7 Pages. In French.

In this note, we give a proof of a theorem of Linnik concerning the theory of errors, stated in his book "Least squares method and the mathematical bases of the statistical theory of the treatment of observations", without proof.
Category: Statistics

[285] viXra:2012.0038 [pdf] submitted on 2020-12-06 14:50:31

Automatic Emulator and Optimized Look-up Table Generation for Radiative Transfer Models

Authors: L. Martino, J. Vicent, G. Camps-Valls
Comments: 5 Pages.

This paper introduces an automatic methodology to construct emulators of costly radiative transfer models (RTMs). The proposed method is sequential and adaptive, and it is based on the notion of acquisition functions: instead of optimizing the unknown underlying RTM function, we aim to approximate it accurately. The Automatic Gaussian Process Emulator (AGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the accurate design of an acquisition function that favors sampling in low-density regions and flatness of the interpolation function. We illustrate the capabilities of the method on toy examples and in the construction of an optimal look-up table for atmospheric correction based on MODTRAN5.
Category: Statistics

[284] viXra:2012.0037 [pdf] submitted on 2020-12-06 19:04:22

Adaptive Sequential Interpolator Using Active Learning for Efficient Emulation of Complex Systems

Authors: L.Martino, D. Heestermans Svendsen, J. Vicent, G. Camps-Valls
Comments: 5 Pages.

Many fields of science and engineering require complex and computationally expensive models to understand the processes involved in the system of interest. Because of the high cost involved, however, the required study becomes cumbersome. This paper introduces an interpolation procedure that belongs to the family of active learning algorithms, in order to construct cheap surrogate models of such costly complex systems. The proposed technique is sequential and adaptive, and is based on the optimization of a suitable acquisition function. We illustrate its efficiency in a toy example and in the construction of an emulator of an atmosphere modeling system.
Category: Statistics

[283] viXra:2012.0036 [pdf] submitted on 2020-12-06 19:06:41

Particle Group Metropolis Methods for Tracking the Leaf Area Index

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Monte Carlo (MC) algorithms are widely used for Bayesian inference in statistics, signal processing, and machine learning. In this work, we introduce a Markov chain Monte Carlo (MCMC) technique driven by a particle filter. The resulting scheme is a generalization of the so-called Particle Metropolis-Hastings (PMH) method, in which a suitable Markov chain of sets of weighted samples is generated. We also introduce a marginal version for jointly inferring dynamic and static variables. The proposed algorithms outperform the corresponding standard PMH schemes, as shown by numerical experiments.
Category: Statistics

[282] viXra:2012.0035 [pdf] submitted on 2020-12-06 15:16:02

Group Metropolis Sampling

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing, and machine learning. Two well-known classes of MC methods are importance sampling (IS) techniques and Markov chain Monte Carlo (MCMC) algorithms. In this work, we introduce the Group Importance Sampling (GIS) framework, in which different sets of weighted samples are properly summarized with one summary particle and one summary weight. GIS facilitates the design of novel efficient MC techniques. For instance, we present the Group Metropolis Sampling (GMS) algorithm, which produces a Markov chain of sets of weighted samples. GMS generally outperforms other multiple-try schemes, as shown by numerical simulations.
Category: Statistics

[281] viXra:2012.0034 [pdf] submitted on 2020-12-05 11:18:45

Joint Gaussian Processes for Inverse Modeling

Authors: D. Heestermans Svendsen, L. Martino, M. Campos-Taberner, G. Camps-Valls
Comments: 5 Pages.

Solving inverse problems is central in geosciences and remote sensing. Very often a mechanistic physical model of the system exists that solves the forward problem. Numerically inverting the implied radiative transfer model (RTM) equations, however, poses challenging and computationally demanding problems. Statistical models tackle the inverse problem and predict the biophysical parameter of interest from radiance data, exploiting either in situ data or simulated data from an RTM. We introduce a novel nonlinear and nonparametric statistical inversion model that incorporates both real observations and RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid framework for exploiting the regularities between the two types of data in order to perform inverse modeling. Advantages of the JGP method over competing strategies are shown both on a simple toy example and in leaf area index (LAI) retrieval from Landsat data combined with simulated data generated by the PROSAIL model.
Category: Statistics

[280] viXra:2012.0033 [pdf] submitted on 2020-12-05 11:25:51

Distributed Particle Metropolis-Hastings Schemes

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

We introduce a Particle Metropolis-Hastings algorithm driven by several parallel particle filters. Communication with the central node requires transmitting only one set of weighted samples per filter. Furthermore, the marginal version of this scheme, called the Distributed Particle Marginal Metropolis-Hastings (DPMMH) method, is also presented. DPMMH can be used to make inference on both dynamic and static variables of interest. Ergodicity is guaranteed, and numerical simulations show the advantages of the novel schemes.
Category: Statistics

[279] viXra:2012.0032 [pdf] submitted on 2020-12-05 22:19:11

Probabilistic Cross-Validation Estimators for Gaussian Process Regression

Authors: L. Martino, V. Laparra, G. Camps-Valls
Comments: 5 Pages.

Gaussian Processes (GPs) are state-of-the-art tools for regression. Inference of GP hyperparameters is typically done by maximizing the marginal log-likelihood (ML). If the data truly follow the GP model, the ML approach is optimal and computationally efficient. Unfortunately, very often this is not the case, and suboptimal results are obtained in terms of prediction error. Alternative procedures such as cross-validation (CV) schemes are often employed instead, but they usually incur high computational costs. We propose a probabilistic version of CV (PCV) based on two different model pieces in order to reduce the dependence on a specific model choice. PCV combines the benefits of both approaches and allows us to find the solution for either the maximum a posteriori (MAP) or the minimum mean square error (MMSE) estimator. Experiments in controlled situations reveal that the PCV solution outperforms ML for both estimators, and that the PCV-MMSE results outperform other traditional approaches.
Category: Statistics

[278] viXra:2012.0031 [pdf] submitted on 2020-12-05 22:21:01

Recycling Gibbs Sampling

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning, and statistics. The key point for the successful application of the Gibbs sampler is the ability to draw samples from the full-conditional probability density functions efficiently. In the general case this is not possible, so auxiliary samples must be generated in order to speed up the convergence of the chain; however, this intermediate information is ultimately discarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency at no extra cost. Theoretical and exhaustive numerical comparisons show the validity of the approach.
Category: Statistics

[277] viXra:2012.0030 [pdf] submitted on 2020-12-05 22:23:48

Multioutput Automatic Emulator for Radiative Transfer Models

Authors: D. Heestermans Svendsen, L. Martino, J. Vicent, G. Camps-Valls
Comments: 4 Pages.

This paper introduces a methodology to construct emulators of costly radiative transfer models (RTMs). The proposed methodology is sequential and adaptive, and it is based on the notion of acquisition functions in Bayesian optimization. Here, instead of optimizing the unknown underlying RTM function, one aims to achieve accurate approximations. The Automatic Multi-Output Gaussian Process Emulator (AMOGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the accurate design of an acquisition function that favors sampling in low density regions and flatness of the interpolation function. We illustrate the promising capabilities of the method for the construction of an emulator for a standard leaf-canopy RTM.
Category: Statistics

[276] viXra:2011.0183 [pdf] submitted on 2020-11-26 11:08:04

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

A Monte Carlo simulation is performed to calculate the probability of a given difference between the average value in a random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists in comparing the probability distribution functions for the percentage and for the number of voters for Mr. Lukashenko at each polling station.
Category: Statistics
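The core Monte Carlo step described above can be sketched as repeated simple random sampling from a finite set. The per-station percentages below are invented for illustration, and `prob_mean_deviation` is a hypothetical helper, not the author's procedure.

```python
import random
import statistics

def prob_mean_deviation(population, k, delta, n_trials=20000, seed=1):
    """Estimate P(|mean(sample) - mean(population)| > delta) for simple random
    samples of size k drawn without replacement from a finite population."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(population, k)   # without replacement
        if abs(statistics.fmean(sample) - mu) > delta:
            hits += 1
    return hits / n_trials

# Hypothetical per-station percentages (invented for illustration only).
stations = [62.0, 75.5, 81.2, 58.9, 90.1, 70.3, 66.7, 84.4, 77.0, 69.8]
print(prob_mean_deviation(stations, k=4, delta=3.0))
```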

[275] viXra:2011.0015 [pdf] submitted on 2020-11-02 21:38:17

Probability and Stochastic Analysis in Reproducing Kernels and Division by Zero Calculus

Authors: Tsutomu Matsuura, Hiroshi Okumura, Saburou Saitoh
Comments: 21 Pages.

Professor Rolin Zhang kindly invited us to give a keynote talk at the 6th Int'l Conference on Probability and Stochastic Analysis (ICPSA 2021), January 5-7, 2021, in Sanya, China, so we state the basic interrelations between reproducing kernels and division by zero from the viewpoint of the conference topics. Since the connections between reproducing kernels and probability and stochastic analysis are already fundamental and well known, we mainly refer to the basic relations with our new division by zero $1/0=0/0=z/0=\tan(\pi/2) =\log 0 =0, [(z^n)/n]_{n=0} = \log z$, $[e^{(1/z)}]_{z=0} = 1$.
Category: Statistics

[274] viXra:2010.0257 [pdf] submitted on 2020-10-31 19:46:07

Hidden Markov Model Evaluation from First Principles

Authors: Russell Leidich
Comments: 9 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]

Hidden Markov models (HMMs) are a class of generative stochastic process models which seek to explain, in the simplest possible terms subject to inherent structural constraints, a set of equally long sequences (time series) of observations. Given such a set, an HMM can be trivially constructed which will reproduce the set exactly. Such an approach, however, would amount to overfitting the data, yielding a model that fails to generalize to new observations of the same physical system under analysis. It’s therefore important to consider the information cost (entropy) of describing the HMM itself – not just the entropy of reproducing the observations, which would be zero in the foregoing extreme case, but in general would be the negative log of the probability of such reproduction occurring by chance. The sum of these entropies would then be suitable for the purpose of ranking a set of candidate HMMs by their respective likelihoods of having actually generated the observations in the first place. To the author’s knowledge, however, no approach has yet been derived for the purpose of measuring HMM entropy from first principles, which is the subject of this paper, notwithstanding the popular use of the Bayesian information criterion (BIC) for this purpose.
Category: Statistics
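The "entropy of reproducing the observations" mentioned above, the negative log of the probability that a given HMM generates the sequence, is conventionally computed with the standard forward algorithm. The sketch below assumes a discrete-output HMM with made-up parameters and reports the result in bits; it does not implement the paper's entropy of the HMM itself, and it omits the rescaling that long sequences would need to avoid underflow.

```python
import math

def hmm_neg_log2_prob(obs, pi, A, B):
    """Forward algorithm: return -log2 P(obs | HMM) in bits.
    pi[s]: initial probabilities, A[r][s]: transition r -> s, B[s][o]: emission."""
    n_states = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n_states)) * B[s][o]
                 for s in range(n_states)]
    return -math.log2(sum(alpha))

# Hypothetical two-state HMM with binary observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_neg_log2_prob([0, 1, 0], pi, A, B))
```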

[273] viXra:2010.0002 [pdf] submitted on 2020-10-01 10:42:20

Random Walks Are Not So Random, After All

Authors: Arturo Tozzi
Comments: 9 Pages.

Physical and biological phenomena are often portrayed in terms of random walks, white noise, Markov paths, and stochastic trajectories with subsequent symmetry breaks. Here we show that this approach from dynamical systems theory is not profitable when random walks occur in phase spaces of dimensions higher than two. The more dimensions there are, the more the (seemingly) stochastic paths are constrained, because their trajectories cannot return to the starting point. This means that high-dimensional tracks, ubiquitous in real-world physical/biological phenomena, cannot be operationally treated in terms of closed paths, symplectic manifolds, Betti numbers, the Jordan theorem, or topological vortexes. This also means that memoryless events disconnected from the past, such as Markov chains, cannot exist in high dimensions. Having expunged the operational role of random walks in the assessment of experimental phenomena, we aim to somewhat “redeem” stochasticity. We suggest two methodological accounts, alternative to random walks, that partially rescue the operational role of white noise and Markov chains. The first option is to assess multidimensional systems in lower dimensions; the second is to establish a different role for random walks. We describe the two alternatives in detail and provide heterogeneous examples from boosting chemistry, tunneling nanotubes, backward entropy, and chaotic attractors.
Category: Statistics

[272] viXra:2009.0135 [pdf] submitted on 2020-09-19 11:03:03

Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 50 Pages.

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods such as a probabilistic version of the well-known kernel ridge regression, and drawing connections among them, via dual formulations, and discussion of their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide an understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data-science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics

[271] viXra:2009.0082 [pdf] submitted on 2020-09-12 12:49:17

Importance of Statistical Tools and Methods in Data Science

Authors: Krish Bajaj
Comments: 10 Pages.

This paper aims to highlight the prominent position of statistics as a foundational pillar for descriptive and inferential statistical analysis, used to deduce underlying patterns in a population by looking at a sample drawn from that population. It focuses on the intuitive aspects of statistical tools and their relevance and applicability. The paper concludes by highlighting some common misconceptions and misuses of statistics.
Category: Statistics

[270] viXra:2008.0131 [pdf] submitted on 2020-08-18 20:30:23

Combining Radar, Weather, and Optical Measurements to Model the Dependence of Baseball Lift on Spin and Surface Roughness

Authors: Glenn Healey, Lequan Wang
Comments: 24 Pages.

An accurate model for the lift force on a baseball is important for several applications. The precision of previous models has been limited by the use of small samples of measurements acquired in controlled experiments. The increased prevalence of ball-tracking radar systems provides an abundant source of data for modeling, but the effective use of these data requires overcoming several challenges. We develop a new model that uses this radar data and is constrained by the physical principles and measurements derived from the controlled experiments. The modeling process accounts for the uncertainty in different data sources while exploiting the size and diversity of the radar measurements to mitigate the effects of systematic biases, outliers, and the lack of geometric information that is typically available in controlled experiments. Fine-grained weather data is associated with each radar measurement to enable compensation for the local air density. We show that the new model is accurate enough to capture changes in lift due to small changes in surface roughness which could not be discerned by previous models.
Category: Statistics

[269] viXra:2008.0107 [pdf] submitted on 2020-08-15 11:37:29

Application of Markov Chain Model in Completion Rates

Authors: Idd Sifael Omary, Ngong-homa Jackson, Timothy A. Peter
Comments: 35 Pages. BSc. (Mathematics and Statistics) Research Report Mwenge Catholic University July, 2016.

Completion rate and enrollment forecasting is an essential element in budgeting, resource allocation, and overall planning for the growth of the education sector. This paper demonstrates the use of Markov chain techniques in studying the progression of BSMST Programme students at MWECAU from entry/enrollment in each academic year to graduation after the expected years of study. The target population included all BSMST Programme students at MWECAU from 2013 to 2015. The model was used to determine students’ completion/dropout rates, retention rates, and expected duration of study by sex. We established the completion rates for male and female students as well as the dropout rates. We examined how long it takes the Markov transition probability matrices of BSMST students at MWECAU to reach a steady state, and how the established completion and dropout rates behave as absorbing states. We also compared the expected duration of university education of female and male students in the BSMST Programme. The model was suitable only for short-term projections.
Category: Statistics
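A sketch of the absorbing-chain quantities the abstract refers to, assuming a hypothetical three-year programme with transient states Year 1 to Year 3 and absorbing states Graduated and Dropped out; all transition probabilities are invented. With $Q$ the transient-to-transient block and $R$ the transient-to-absorbing block, the fundamental matrix $N = (I - Q)^{-1}$ gives the expected number of years spent in each state, $t = N\mathbf{1}$ the expected years before leaving the programme, and $B = NR$ the completion and dropout probabilities.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def absorbing_chain_summary(Q, R):
    n = len(Q)
    i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]
    N = invert(i_minus_q)          # fundamental matrix: expected visits per state
    B = matmul(N, R)               # absorption probabilities [graduate, drop out]
    t = [sum(row) for row in N]    # expected years before absorption
    return N, B, t

# Hypothetical rows: Year 1, Year 2, Year 3 (repeat 0.1 each year).
Q = [[0.1, 0.8, 0.0],
     [0.0, 0.1, 0.8],
     [0.0, 0.0, 0.1]]
R = [[0.0, 0.1],     # Year 1 -> dropout 0.1
     [0.0, 0.1],     # Year 2 -> dropout 0.1
     [0.85, 0.05]]   # Year 3 -> graduate 0.85, dropout 0.05
N, B, t = absorbing_chain_summary(Q, R)
print("P(complete | start Year 1) =", round(B[0][0], 4), "expected years =", round(t[0], 4))
```

With these invented numbers a Year 1 entrant graduates with probability about 0.746 and spends about 2.98 years in the programme on average.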

[268] viXra:2008.0065 [pdf] submitted on 2020-08-10 16:54:00

La Théorie des Erreurs (Theory of Errors)

Authors: Philippe Hottier, Abdelmajid Ben Hadj Salem
Comments: 137 Pages. In French. Comments welcome.

This is a digital version of the manuscript of a course on the theory of errors given by Engineer-in-Chief Philippe Hottier in the 1980s at the French National School of Geographic Sciences. The course lays the foundations of the least-squares method for linear models.
Category: Statistics

[267] viXra:2007.0240 [pdf] submitted on 2020-07-30 21:01:00

An Alternative Model of Probability Theory

Authors: D Williams
Comments: 15 Pages.

An alternative model of probability theory is given and compared with the standard version. Difficulties in extending the Central Limit Theorem to sums of random variables (rather than averages) are shown and then resolved using the new model and dx-less integrals. Some new types of sample means are proposed and tested against the standard version.
Category: Statistics

[266] viXra:2006.0023 [pdf] submitted on 2020-06-03 09:40:46

Causal Inference for COVID-19 Interventions

Authors: Vikas Ramachandra
Comments: 14 Pages.

The exponential spread of the COVID-19 pandemic has caused countries to impose drastic measures on the public including social distancing, movement restrictions and lockdowns. These government interventions have led to different mobility patterns for the populations. We propose a method of causal inference using community mobility datasets to determine the treatment effects of government interventions on population mobility related outcomes. We first identify the changepoint based on the data of government interventions. We also perform changepoint detection to verify that there is indeed a changepoint at the time of intervention. Then we estimate the mobility trends using a Bayesian structural causal model and project the counterfactual. This is compared to the actual values after interventions to give the treatment effect of interventions. As a specific example, we analyze mobility trends in India before and after interventions. Our analysis shows that there are significant changes in mobility due to government interventions. Our paper aims to provide insights into changes in response to government measures and we hope that it is helpful to those making critical decisions to combat COVID-19.
Category: Statistics

[265] viXra:2006.0014 [pdf] submitted on 2020-06-01 12:18:06

Conditio Sine Qua Non

Authors: Ilija Barukčić
Comments: 27 pages. (C) Ilija Barukčić, 2020, Jever, Germany. All rights reserved.

Aims: Various objectively given and real processes or events (necessary conditions) are among the foundations of human life. However, a generally accepted, logically consistent (bio)mathematical description of these natural processes is still not in sight. Methods: Discrete random variables are analysed. Results: The mathematical formula of the necessary condition is developed. The impact of study design on the results of a study is considered. Conclusion: Study data can be analysed for necessary conditions.
Category: Statistics

[264] viXra:2005.0215 [pdf] submitted on 2020-05-21 20:15:49

Introduction to Neutrosophic Statistics, Translated Arabic Version

Authors: Huda E. Khalid, Ahmed K. Essa
Comments: 167 Pages. ISBN: 978-1-59973-906-9

Although neutrosophic statistics was defined in 1996 and then published in 1998 in the book entitled "Neutrosophy / Neutrosophic Logic, Set, and Probability", it has received little attention or development to this day. The same was true of neutrosophic probability, apart from a few scattered articles offering modest developments that hardly match the comprehensiveness of the underlying idea; it was published in 2013 in the book entitled "Introduction to Neutrosophic Measure, Integral, and Probability". Neutrosophic statistics is an extension of traditional (classical) statistics in which one deals with set values instead of crisp values, so that in most classical statistical equations and formulas several numbers can easily be replaced by sets. That is, operations are performed on sets rather than on numbers, using indeterminate parameters (imprecise, uncertain, or even completely unknown) instead of the ordinary parameters customary in classical statistics.
Category: Statistics

[263] viXra:2005.0182 [pdf] submitted on 2020-05-17 18:03:58

Estimated Life Expectancy Impact of Sars-Cov-2 Infection on the Entire German Population

Authors: Tobias Martens, Wieland Lühder
Comments: 3 Pages.

The life expectancy of the currently living German population is calculated per age and as a weighted average. The same calculation is repeated after considering everyone infected with, and potentially killed by, SARS-CoV-2 within one year, given the current age-dependent lethality estimates from a study at Imperial College London [1]. For an average life expectancy of 83.0 years in the current population, the reduction due to SARS-CoV-2 infection amounts to 2.0 (1.1-3.9) months. The individual values show a maximum of 7.7 (4.4-15.2) months for a 70-year-old. People below age 50 lose less than 1 month on average.
Category: Statistics
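The population-weighted averaging described above reduces to simple arithmetic. A minimal sketch, assuming that the expected loss for an age group equals its infection fatality rate times its remaining life expectancy, with three invented age groups (not the German data and not the Imperial College estimates); the function name is hypothetical.

```python
def mean_life_expectancy_loss_months(pop, remaining_le, ifr):
    """Population-weighted expected loss of remaining life, in months, if everyone
    is infected once. All inputs are per age group (hypothetical simplification:
    loss = infection fatality rate * remaining life expectancy in years)."""
    total = sum(pop)
    loss_years = sum(p * f * le for p, f, le in zip(pop, ifr, remaining_le)) / total
    return 12.0 * loss_years

# Invented groups: population (millions), remaining life expectancy (years), IFR.
print(mean_life_expectancy_loss_months([40, 40, 20], [50, 25, 10], [0.0001, 0.005, 0.05]))
```

Here the loss is (0.2 + 5 + 10) / 100 = 0.152 years, i.e. about 1.82 months.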

[262] viXra:2004.0452 [pdf] submitted on 2020-04-19 11:42:28

Multiple Sclerosis is Caused by an Epstein-Barr Virus Infection

Authors: Ilija Barukčić
Comments: 17 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Aim: The relationship between Epstein-Barr virus and multiple sclerosis is assessed once again in order to gain a better understanding of this disease. Methods: A systematic review and meta-analysis is provided, aimed at answering, among others, the following question: is there a cause-effect relationship between Epstein-Barr virus and multiple sclerosis? The conditio sine qua non relationship proved the hypothesis: without an Epstein-Barr virus infection, no multiple sclerosis. The mathematical formula of the causal relationship k proved the hypothesis of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis. Significance was indicated by a p-value of less than 0.05. Results: The data of the studies analysed provide evidence that an Epstein-Barr virus infection is a necessary condition (a conditio sine qua non) of multiple sclerosis. More than that, the data of the studies analysed provided impressive evidence of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis. Conclusion: Multiple sclerosis is caused by an Epstein-Barr virus infection.
Category: Statistics

[261] viXra:2004.0425 [pdf] submitted on 2020-04-17 13:15:53

Automatic Tempered Posterior Distributions for Inverse Problems

Authors: Luca Martino
Comments: 7 Pages.

We propose a new Monte Carlo technique for Bayesian inverse problems. The power of the noise perturbation in the observation model is estimated jointly with the rest of the parameters. Moreover, it is also used as a tempering parameter. Hence, a sequence of tempered posterior densities is considered, where the tempering parameter is automatically selected according to the current estimate of the power of the noise perturbation.
Category: Statistics

[260] viXra:2004.0060 [pdf] submitted on 2020-04-02 23:05:18

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 11 Pages.

It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following statement by statistical means: in 3-dimensional Euclidean space, consider point particles whose speeds are c and whose directions are uniformly distributed in space (their reference system is R0 if their average velocity is 0). When some of these particles (whose reference system is Ru), as a particle swarm, move in a certain direction with a group speed u (i.e., the norm of the average velocity) relative to R0, their (or the sub-particle swarm’s) average speed relative to Ru is slower than that of particles (or a sub-particle swarm of the same scale) in R0 relative to R0. The degree of slowing depends on the speed u of Ru and accords with the quantitative $c^2 - u^2$ relationship described by the Lorentz factor.
Category: Statistics

[259] viXra:2003.0340 [pdf] submitted on 2020-03-16 13:55:18

Modeling and Projecting Offensive Value Using Combined Hit-Tracking and Speed Measurements

Authors: Glenn Healey
Comments: 20 Pages.

Outcome-based statistics for representing batter and pitcher skill have been shown to have a low degree of repeatability due to the effects of multiple confounding variables such as the defense, weather, and ballpark. Statistics derived from pitch and hit-tracking data acquired by the Statcast system have been shown to provide greater repeatability and predictive value than outcome-based statistics. The wOBA cube representation uses three-dimensional hit-tracking data to compute intrinsic batted ball statistics for batters and pitchers. While providing more reliable measures than outcome-based statistics, this representation also revealed that running speed is an important determinant of batter success. We address this issue by building a four-dimensional model for a batted ball's value as a function of its physical contact parameters and the batter's time-to-first speed.
Category: Statistics

[258] viXra:2002.0368 [pdf] submitted on 2020-02-19 13:31:17

The Risk Ratio is Logically Inconsistent

Many different measures of association are used in the medical literature; the relative risk is one of them. However, to judge whether the results of studies are reliable, it is essential to use, among other things, measures of association that are logically consistent. In this paper, we present how to deal with one of the most commonly used measures of association, the relative risk. The conclusion is inescapable that the relative risk is logically inconsistent and should no longer be used.
Category: Statistics

[257] viXra:2001.0650 [pdf] submitted on 2020-01-29 12:50:56

An Example of The Use of The Least-Squares Method

Authors: Abdelmajid Ben Hadj Salem
Comments: 9 Pages. In French.

In this paper, we present an example of the use of the least-squares method in topographic and surveying works.
Category: Statistics
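As an illustration of least squares in surveying, the sketch below adjusts a tiny hypothetical leveling network: benchmark A with known height 100.000 m, unknown heights of B and C, and three measured height differences that do not close exactly. Equal weights are assumed and the readings are invented; this is not the example from the paper, and the helper names are hypothetical.

```python
def solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system M x = v."""
    n = len(M)
    a = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def least_squares(A, l):
    """Unweighted least squares via the normal equations (A^T A) x = A^T l."""
    At = list(zip(*A))
    N = [[sum(p * q for p, q in zip(ri, rj)) for rj in At] for ri in At]
    u = [sum(p * q for p, q in zip(ri, l)) for ri in At]
    return solve(N, u)

H_A = 100.000                      # known benchmark height (m)
# Unknowns [H_B, H_C]; observations: A->B = 2.010, B->C = 1.005, A->C = 3.021 (m)
A = [[1, 0], [-1, 1], [0, 1]]
l = [H_A + 2.010, 1.005, H_A + 3.021]
H_B, H_C = least_squares(A, l)
print(f"H_B = {H_B:.3f} m, H_C = {H_C:.3f} m")
```

The 6 mm misclosure (2.010 + 1.005 versus 3.021) is spread equally over the three observations, giving $H_B = 102.012$ m and $H_C = 103.019$ m.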

[256] viXra:2001.0052 [pdf] submitted on 2020-01-04 16:39:29

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics
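The simplest estimator covered by such reviews is naive Monte Carlo: $Z = \int p(y\mid\theta)\,p(\theta)\,d\theta \approx \frac{1}{N}\sum_i p(y\mid\theta_i)$ with $\theta_i$ drawn from the prior. The sketch assumes a conjugate toy model ($y\mid\theta \sim \mathcal N(\theta, 1)$, $\theta \sim \mathcal N(0, 1)$) so that the exact evidence $\mathcal N(y; 0, 2)$ is available for comparison; the model and function names are illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def naive_mc_evidence(y, n_samples=50000, seed=0):
    """Average the likelihood p(y | theta) over draws theta ~ prior.
    Toy model (assumed): y | theta ~ N(theta, 1), theta ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)          # draw from the prior
        total += normal_pdf(y, theta, 1.0)   # likelihood of the observed y
    return total / n_samples

y = 1.0
z_mc = naive_mc_evidence(y)
z_exact = normal_pdf(y, 0.0, 2.0)            # marginal of y is N(0, 2)
print(z_mc, z_exact)
```

This estimator works here because prior and likelihood overlap; with a concentrated likelihood and a diffuse prior its variance grows quickly, which is one reason the more refined techniques exist.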

[255] viXra:2001.0003 [pdf] submitted on 2020-01-01 06:38:26

Probability Models and Ultralogics

Authors: Robert A. Herrmann
Comments: 10 Pages.

In this paper, we show how nonstandard consequence operators, ultralogics, can generate the general informational content displayed by probability models: in particular, models that state a specific probability that an event will occur and models that use a specific distribution to predict that an event will occur. These results have many diverse applications and even apply to the collapse of the wave function.
Category: Statistics

[254] viXra:1912.0129 [pdf] submitted on 2019-12-06 12:00:41

Statins and Death Due to Any Cause – All Doubts Removed?

Objective: To date, it is quite common to claim that some patient groups benefit from statin therapy in both primary and secondary prevention of cardiovascular disease, while the use of higher-intensity statin therapies is equally emphasized. In this review, the efficacy of statin therapy is explored in light of the available study data. Methods: In all, 40 studies with a sample size of n = 88388 were re-analyzed. The exclusion relationship was used to test the null hypothesis: a certain statin does exclude death due to any cause. The causal relationship k was used to test the data for causality. The level of significance was set to Alpha = 0.05. Results: The data of the studies re-analyzed provide convincing evidence that statins unfortunately do not exclude death due to any cause. An immediate discontinuation of statin therapy should be considered. Conclusions: Overwhelming evidence suggests that the risk of potentially harmful effects of statin therapy far outweighs any real or perceived benefit. Keywords: Statins, death, causal relationship. Barukcic@t-online.de
Category: Statistics

[253] viXra:1911.0237 [pdf] submitted on 2019-11-13 13:18:58

Human Cytomegalovirus is the Cause of Essential Hypertension

Objective: To our knowledge, no study has provided strict evidence of a clear relationship between human cytomegalovirus (HCMV) infection and human essential hypertension (EH). Methods: To examine the possible role of HCMV in the etiology of EH, a literature search of the electronic database PubMed was performed. Data were accurately assessed and re-analyzed with new statistical methods. Results: The meta-analysis results of this study provide evidence that HCMV infection and essential hypertension are connected. Conclusions: Without HCMV infection, no EH. Keywords: Human cytomegalovirus, essential hypertension, causal relationship.
Category: Statistics

[252] viXra:1911.0184 [pdf] submitted on 2019-11-10 04:05:32

Without a Varicella Zoster Virus Infection, no Schizophrenia

Objective: Despite decades of research and major efforts, a cause or the cause of schizophrenia has still not been identified. Although many studies indicate that infectious agents are related to schizophrenia, no definite consensus has been reached on this issue. Methods: The purpose of this study was to investigate the relationship between varicella zoster virus (VZV) and schizophrenia while relying on new statistical methods. Results: The meta-analysis results provide striking evidence that VZV is a necessary condition of schizophrenia. Conclusions: There is some weak evidence that VZV infection is the cause of schizophrenia. Keywords: Varicella zoster virus, schizophrenia, causal relationship.
Category: Statistics

[251] viXra:1911.0024 [pdf] submitted on 2019-11-01 11:17:10

The P Value of Likely Extreme Events

Authors: Ilija Barukčić
Comments: 21 pages.

Objective: Sometimes it is necessary to calculate the P Value of extreme events $x_t$ such as $p(x_t) = 1$, while reliable methods are rare. Methods: A systematic approach to the problem of the P Values of extreme events is provided. Results: New theorems for calculating P Values of extremely likely events are developed. Conclusions: It is possible to calculate P Values even of extreme events. E-mail: Barukcic@t-online.de Keywords: P Value, likely events, cause, effect, causal relationship.
Category: Statistics

[250] viXra:1910.0656 [pdf] submitted on 2019-10-31 17:01:19

On Maximum Likelihood Estimates for the Shape Parameter of the Generalized Pareto Distribution

Authors: Kouider Mohammed Ridha
Comments: 6 Pages.

The generalized Pareto distribution (GPD) has been widely used in extreme value analysis, for example to model exceedances over a threshold. When the GPD is applied to real data sets, the results depend substantially on the parameter estimation process. Maximum likelihood estimation is usually preferred because it provides a consistent estimator with the lowest bias and variance. The objective of the present study is to develop efficient methods for computing the maximum likelihood estimate of the shape parameter, or extreme value index, based on numerical maximization of the log-likelihood; to this end, an algorithm for computing the maximum likelihood estimates of the GPD parameters is introduced. Finally, numerical examples are given to illustrate the results and to investigate the behavior of the method.
Category: Statistics
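One common numerical route to the GPD maximum likelihood estimate (a reparameterization usually attributed to Grimshaw, not necessarily the algorithm of this paper) sets $\theta = \xi/\sigma$. For fixed $\theta$ the log-likelihood is maximized by $\xi(\theta) = s(\theta)/n$ with $s(\theta) = \sum_i \log(1+\theta y_i)$, leaving the one-dimensional profile $\ell^*(\theta) = -n\log\big(s(\theta)/(n\theta)\big) - s(\theta) - n$, after which $\hat\sigma = \hat\xi/\hat\theta$. The sketch maximizes $\ell^*$ with a grid search plus golden-section refinement; the search bounds and the synthetic data are assumptions.

```python
import math
import random

def gpd_profile_loglik(theta, y):
    """Profile log-likelihood of the GPD in theta = xi / sigma."""
    n = len(y)
    if abs(theta) < 1e-12:                      # exponential limit, xi -> 0
        return -n * math.log(sum(y) / n) - n
    s = sum(math.log1p(theta * v) for v in y)
    return -n * math.log(s / (n * theta)) - s - n

def gpd_mle(y, grid_size=200, iters=60):
    """Grid search, then golden-section refinement of the profile likelihood."""
    lo = -(1.0 - 1e-6) / max(y)                 # support needs 1 + theta*y > 0
    hi = 50.0 * len(y) / sum(y)                 # generous upper bound (assumption)
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    vals = [gpd_profile_loglik(t, y) for t in grid]
    k = max(range(grid_size), key=vals.__getitem__)
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, grid_size - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = gpd_profile_loglik(c, y), gpd_profile_loglik(d, y)
    for _ in range(iters):
        if fc > fd:                             # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = gpd_profile_loglik(c, y)
        else:                                   # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = gpd_profile_loglik(d, y)
    theta = (a + b) / 2.0
    if abs(theta) < 1e-12:
        return 0.0, sum(y) / len(y)
    xi = sum(math.log1p(theta * v) for v in y) / len(y)
    return xi, xi / theta                       # (shape, scale)

# Synthetic exceedances from GPD(xi = 0.3, sigma = 1) via the inverse CDF.
rng = random.Random(42)
data = [((1.0 - rng.random()) ** -0.3 - 1.0) / 0.3 for _ in range(3000)]
xi_hat, sigma_hat = gpd_mle(data)
print(f"xi_hat = {xi_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```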

Replacements of recent Submissions

[157] viXra:2601.0127 [pdf] replaced on 2026-02-08 00:15:36

Neutrinos De-mystified[:] Do Energetic, Negligible Mass Neutrinos Violate Einstein's E=mCC?

Authors: Carl Littmann
Comments: 8 Pages.

Einstein’s Relativity Theory emphasizes that "if a body radiates a given amount of Energy, that emitting body loses a Mass equal to that emitted Energy divided by the speed of light squared". But if that lost mass can’t be fully found by adding up all the resulting products, including negligible-mass high-energy Neutrinos, where did that mass go? My paper asserts that the lost (hidden) mass was ‘injected’ into the ‘aether’, increasing the aether’s mass. As Einstein even said, in 1930, "Space is Eating-Up matter!" I use that "Einstein Statement" to estimate a minimum mass density of aether in Space, i.e., a key estimate but still likely much too low. I also show that Neutrino propagation is likely an Ethereal Pulse or Stress, like a Twisting Spring Pulse (wave), instead of a forward or backward pulse; thus it is likely not a Particle mass flying through space like a bullet or ‘baseball’. And I give more details and address related questions.
Category: Statistics

[156] viXra:2601.0065 [pdf] replaced on 2026-02-24 22:06:15

Importance Sampling and Contrastive Learning Schemes for Parameter Estimation in Non-Normalized Models

Authors: L. Martino, L. Scaffidi, S. Mangano
Comments: 30 Pages.

[155] viXra:2511.0110 [pdf] replaced on 2026-01-25 15:40:45

A Kinetic Route to the Lorentz Transform and Beyond

Authors: Jayanta Majumder
Comments: 9 Pages.

We model an elementary particle as a closed, lightlike intrinsic motion with rest-cycle period $\tau$ that can undergo bodily translation without ever exceeding speed $c$. A local triangle construction and cycle averaging yield the Pythagorean relation $T^{2}=\tau^{2}+(x/c)^{2}$, where $x$ is the net spatial advance of the wavefront over one intrinsic cycle. Interpreting the exchange between intrinsic cycling ($T$) and bodily shift ($x/c$) as symmetric two-channel kinetics with rate $k(t)$ and integrating yields a hyperbolic rotation (Lorentz boost) with rapidity $\phi=\int k\,dt$ and $v/c=\tanh\phi$. In the small-signal limit this identifies $k=F/(mc)$, linking the kinetic picture to Newton's second law, while the $\tanh$ nonlinearity enforces the $c$ bound. We also give a physical reading of relative rapidity as the net logarithmic bias needed to map between motion states.
Category: Statistics
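Two consequences of the relations in the abstract can be checked numerically. Assuming the net advance over one cycle is $x = vT$ for bodily speed $v$ (our assumption, not stated in the abstract), $T^2 = \tau^2 + (vT/c)^2$ gives $T = \tau/\sqrt{1 - v^2/c^2}$, the usual time-dilation factor; and since $v/c = \tanh\phi$, adding rapidities reproduces the relativistic velocity-addition formula $(\beta_1+\beta_2)/(1+\beta_1\beta_2)$. Function names are hypothetical; speeds are expressed as $\beta = v/c$.

```python
import math

def boost_period(tau, beta):
    """Solve T^2 = tau^2 + (beta*T)^2 for T, assuming x = v*T (beta = v/c)."""
    return tau / math.sqrt(1.0 - beta * beta)

def compose_velocities(beta1, beta2):
    """Collinear velocity composition by adding rapidities phi = atanh(beta)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta2))

print(boost_period(1.0, 0.6))          # period stretched by the Lorentz factor
print(compose_velocities(0.5, 0.5))    # matches (0.5 + 0.5) / (1 + 0.25)
```

Adding rapidities keeps the composed speed below $c$ for any $\beta_1, \beta_2 < 1$, which is the bound the $\tanh$ nonlinearity enforces.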

[154] viXra:2510.0077 [pdf] replaced on 2025-12-20 02:02:16

Monty-hall Theorem Bayes-price Rule (Bayes Theorem) for a Three Parameter Event Space

Authors: Keshava Prasad Halemane
Comments: 11 Pages. 2 Tables

This research report presents the statement of the Monty-Hall Theorem and provides a constructive proof by solving the classical Monty-Hall Problem. It establishes the fact that the probability of winning the prize is indeed unaffected by a switched-choice — very much unlike the most prevalent and widely accepted position held by the Leading Subject-Matter-Experts.
Category: Statistics

[153] viXra:2510.0001 [pdf] replaced on 2026-01-25 11:33:37

Sampling from Mixtures with Negative Weights: Application to Density Approximation by Gaussian Processes

Authors: L. Martino
Comments: 17 Pages.

Mixtures of probability densities are widely used in statistics and machine learning. While classical mixtures restrict weights to be non-negative, allowing negative weights enables more flexible density approximation. However, negative weights introduce challenges in handling and sampling such distributions. For this purpose, we propose efficient Monte Carlo (MC) methods (including MC quadratures, rejection sampling and importance sampling schemes) for computing integrals and generating samples from these mixtures. A tailored proposal density ensures accurate and efficient generation of (unweighted) samples. Applications in Gaussian process-based density estimation demonstrate the practical relevance and efficiency of proposed schemes.
Category: Statistics
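A generic way to sample such a mixture (not necessarily the tailored proposal of the paper) is rejection sampling with the positive part as proposal. If $f(x) = \sum_k w_k\,\mathcal N(x;\mu_k,\sigma_k^2) \ge 0$ everywhere with $\sum_k w_k = 1$, then $f \le f_+ = \sum_{w_k>0} w_k\,\mathcal N(x;\mu_k,\sigma_k^2)$, so drawing from the normalized $f_+$ and accepting with probability $f/f_+$ yields exact, unweighted samples with acceptance rate $1/\sum_{w_k>0} w_k$. The example mixture $2\,\mathcal N(0,1) - \mathcal N(0,0.5^2)$ is an assumption chosen because it is non-negative everywhere; function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_signed_mixture(weights, means, sds, n, seed=0):
    """Rejection sampling from f(x) = sum_k w_k N(x; m_k, s_k^2) with some w_k < 0,
    assuming f >= 0 everywhere and sum_k w_k = 1.
    Proposal: the normalized positive part f_plus."""
    rng = random.Random(seed)
    pos = [k for k, w in enumerate(weights) if w > 0]
    pos_w = [weights[k] for k in pos]

    def f(x):
        return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

    def f_plus(x):
        return sum(weights[k] * normal_pdf(x, means[k], sds[k]) for k in pos)

    out, proposals = [], 0
    while len(out) < n:
        k = rng.choices(pos, weights=pos_w)[0]   # pick a positive component
        x = rng.gauss(means[k], sds[k])
        proposals += 1
        if rng.random() * f_plus(x) < f(x):      # accept with prob f(x) / f_plus(x)
            out.append(x)
    return out, proposals

samples, proposals = sample_signed_mixture([2.0, -1.0], [0.0, 0.0], [1.0, 0.5], 40000)
print(len(samples) / proposals)   # close to 1 / 2 for this mixture
```

For this mixture the mean is 0 and the variance is $2 \cdot 1 - 1 \cdot 0.25 = 1.75$, which the samples should reproduce.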

[152] viXra:2508.0163 [pdf] replaced on 2026-02-05 21:18:48

Fast Resampling for Sequential Monte Carlo with Millions of Particles

Authors: L. Martino, V. Elvira
Comments: 31 Pages.

Particle filters (PFs) and, more generally, sequential Monte Carlo (SMC) methods are essential tools for Bayesian inference. Over the years, many SMC variants have been proposed, yet their core always relies on importance sampling followed by a resampling step. While resampling is crucial to mitigate particle degeneracy and to maintain a stable approximation of the posterior distribution, it often represents a significant computational bottleneck. In this work, we present a novel, fast resampling procedure that provides significant computational gains in demanding (often high-dimensional) scenarios where a large number of particles is required and the effective sample size (ESS) is small. The effectiveness of the proposed approach is demonstrated through a series of numerical experiments showing remarkable performance. In addition, a theoretical analysis and a related code implementation are provided.
Category: Statistics
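For reference, the classical ESS estimate 1/sum(w_n^2) (normalized weights) and a conventional systematic resampling step can be sketched as follows. The resampling step uses binary search on the cumulative weights. This is the standard baseline, not the fast resampling procedure proposed in the paper, and the function names are our own.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def ess(weights):
    """Classical ESS estimate 1 / sum(wbar_n^2), with wbar the normalized weights."""
    s = sum(weights)
    return 1.0 / sum((w / s) ** 2 for w in weights)

def systematic_resample(weights, rng=random):
    """Return N particle indices drawn by systematic resampling.

    A single uniform u0 in [0, 1/N) is shifted by k/N for k = 0..N-1, and each
    point is mapped to the particle whose cumulative-weight interval contains it.
    """
    n = len(weights)
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # protect against round-off in the last cumulative sum
    u0 = rng.random() / n
    return [bisect_left(cdf, u0 + k / n) for k in range(n)]
```

Systematic resampling uses one random number per step and keeps each particle's number of copies within one of its expected value N * wbar_n.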

[151] viXra:2502.0136 [pdf] replaced on 2025-03-04 07:38:31

Logarithm of Exponential and Cauchy Random Variables

Authors: Josef Bukac
Comments: 5 Pages.

[150] viXra:2406.0055 [pdf] replaced on 2025-04-08 18:36:58

A Note on the Area Under the Likelihood and the Fake Evidence for Model Selection

Authors: L. Martino, F. Llorente
Comments: 29 Pages.

Improper priors are not allowed for the computation of the Bayesian evidence Z = p(y) (a.k.a. marginal likelihood), since in this case Z is not completely specified due to an arbitrary constant involved in the computation. However, in this work, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinite) models belonging to the same parametric family (i.e., for tuning parameters of a parametric model). Nevertheless, the quantities involved in this type of selection cannot be considered Bayesian evidences: we suggest the name "fake evidences" (or "areas under the likelihood" in the case of uniform improper priors). We also show that, in this model selection scenario, using a proper prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood obtained with a uniform improper prior. We first discuss this from a general point of view. Then we provide, as an application example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior, respectively. A numerical experiment is also provided confirming and checking all the previous statements.
Category: Statistics
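A toy illustration of why a widening proper prior does not reproduce the area under the likelihood. Assume y_i ~ N(theta, 1) and compare two priors. A uniform improper prior with constant 1 gives the area A = integral of L(theta) d theta. A proper prior theta ~ N(0, sigma^2) gives the evidence Z(sigma). Both have closed forms in this model. Z(sigma) = A * N(ybar; 0, sigma^2 + 1/n) decays roughly like 1/sigma as sigma grows, so its limit is 0 rather than A. The model and function names are assumptions for illustration, not the regression setting of the paper.

```python
import math

def log_area_under_likelihood(y):
    """log A, with A = integral of prod_i N(y_i; theta, 1) d theta (uniform prior, constant 1)."""
    n = len(y)
    ybar = sum(y) / n
    s = sum((yi - ybar) ** 2 for yi in y)
    log_c = -0.5 * n * math.log(2 * math.pi) - 0.5 * s
    # Integrating exp(-n (theta - ybar)^2 / 2) over theta gives sqrt(2 pi / n).
    return log_c + 0.5 * math.log(2 * math.pi / n)

def log_evidence_gaussian_prior(y, sigma):
    """log Z(sigma) for the proper prior theta ~ N(0, sigma^2): Z = A * N(ybar; 0, sigma^2 + 1/n)."""
    n = len(y)
    ybar = sum(y) / n
    v = sigma ** 2 + 1.0 / n
    log_norm = -0.5 * math.log(2 * math.pi * v) - ybar ** 2 / (2 * v)
    return log_area_under_likelihood(y) + log_norm
```

Evaluating `log_evidence_gaussian_prior(y, sigma)` for increasing `sigma` shows log Z(sigma) falling without bound, while `log_area_under_likelihood(y)` stays fixed.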

[149] viXra:2406.0055 [pdf] replaced on 2024-06-26 08:31:05

A Note on the Area Under the Likelihood and the Fake Evidence for Model Selection

Authors: L. Martino, F. Llorente
Comments: 22 Pages.

[148] viXra:2310.0032 [pdf] replaced on 2024-12-04 19:15:45

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 45 Pages.

In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multioutput model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split into two blocks, and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target adaptive importance sampling (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the non-linear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered, and the cloud of samples obtained by ATAIS is recycled and re-weighted to obtain a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics
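Fix the model parameters theta and assume Gaussian noise, y_t = f_t(theta) + e_t with e_t ~ N(0, Sigma). The covariance that maximizes the likelihood then has a closed form: the sample covariance of the residuals, divided by T rather than T - 1. We assume this is the kind of "known analytical optimization formula" the abstract refers to for the frequentist step. The function name and the list-of-lists data layout are hypothetical.

```python
def covariance_mle(y, f_theta):
    """ML estimate of Sigma for y_t = f_t(theta) + e_t, e_t ~ N(0, Sigma), theta fixed.

    y and f_theta are lists of T vectors (lists of length d); returns a d x d list.
    """
    T = len(y)
    d = len(y[0])
    # Residuals r_t = y_t - f_t(theta).
    res = [[yt[j] - ft[j] for j in range(d)] for yt, ft in zip(y, f_theta)]
    # Sigma_hat = (1/T) * sum_t r_t r_t^T
    return [[sum(r[i] * r[j] for r in res) / T for j in range(d)] for i in range(d)]
```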

[147] viXra:2310.0032 [pdf] replaced on 2024-08-04 20:54:26

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 45 Pages.

[146] viXra:2310.0032 [pdf] replaced on 2024-02-06 21:09:42

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.

[145] viXra:2307.0056 [pdf] replaced on 2025-01-21 20:45:14

A Machine Learning Algorithm for the Quantification and Uncertainty Analysis of the Number of Spinal Microglia Trainable in Small and Heterogeneous Datasets

Authors: L. Martino, M. M. Garcia, P. S. Paradas, E. Curbelo
Comments: 25 Pages.

Counting immunopositive cells on biological tissues generally requires either manual annotation or (when available) automatic rough systems for scanning signal surface and intensity in whole slide imaging. In this work, we tackle the problem of counting microglial cells in lumbar spinal cord cross-sections of rats by omitting cell detection and focusing only on the counting task. Manual cell counting is, however, a time-consuming task, and additionally entails extensive personnel training. The classic automatic color-based methods only roughly report the total labeled area and intensity (protein quantification) and do not specifically provide information on cell number. Since the images to be analyzed have a high resolution but a large number of pixels contain just noise or artifacts, we first perform a preprocessing step that generates several filtered images. Then, we design an automatic kernel counter that is a non-parametric and non-linear method. The proposed scheme can be easily trained on small datasets since, in its basic version, it relies on only one hyper-parameter. However, being non-parametric and non-linear, the proposed algorithm is flexible enough to express all the information contained in rich and heterogeneous datasets as well (providing the maximum overfit if required). Furthermore, the proposed kernel counter also provides an uncertainty estimate for the given prediction, and can directly tackle the case of receiving several expert opinions on the same image. Different numerical experiments with artificial and real datasets show very promising results. Related Matlab code is also provided.
Category: Statistics
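A one-hyperparameter, non-parametric counter can be sketched with a generic Nadaraya-Watson kernel smoother. The predicted count is a kernel-weighted average of the training counts, and the weighted standard deviation serves as an uncertainty estimate. This is our own illustration of the general idea, not the authors' kernel counter. The feature vectors stand in for hypothetical image descriptors, and the bandwidth h is the single hyperparameter. Training images with several expert counts can simply appear several times with different labels.

```python
import math

def kernel_count(x_new, feats, counts, h):
    """Nadaraya-Watson estimate of a count and its weighted standard deviation.

    feats: training feature vectors (lists of floats); counts: their cell counts;
    h: bandwidth of the Gaussian kernel (the only hyperparameter).
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x_new, f)) for f in feats]
    dmin = min(d2)
    # Subtracting dmin in the exponent avoids all weights underflowing for small h.
    w = [math.exp(-(d - dmin) / (2 * h * h)) for d in d2]
    z = sum(w)
    mean = sum(wi * c for wi, c in zip(w, counts)) / z
    var = sum(wi * (c - mean) ** 2 for wi, c in zip(w, counts)) / z
    return mean, math.sqrt(var)
```

A very small `h` reproduces the label of the nearest training image (maximal overfit). A very large `h` returns the global average count.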

[144] viXra:2209.0132 [pdf] replaced on 2023-06-14 09:56:05

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023

We design a Universal Automatic Elbow Detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme compared with benchmark techniques in the literature.
Category: Statistics
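A common generic heuristic for locating an elbow picks the point of a decreasing error curve that lies farthest below the straight line joining its first and last points. The sketch below implements that heuristic for comparison purposes only. It is not the UAED proposed in the paper, and the function name is hypothetical.

```python
def elbow_by_chord_distance(errors):
    """0-based index of the point farthest below the chord from the first to the last point.

    errors[k] is the fitting error of the model with k + 1 components (assumed decreasing).
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two points")
    y0, y1 = errors[0], errors[-1]
    best, best_gap = 0, float("-inf")
    for k, e in enumerate(errors):
        # Vertical gap between chord and curve; its argmax equals that of the
        # perpendicular distance, since the two differ by a constant factor.
        chord = y0 + (y1 - y0) * k / (n - 1)
        gap = chord - e
        if gap > best_gap:
            best, best_gap = k, gap
    return best
```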

[143] viXra:2209.0132 [pdf] replaced on 2023-06-06 15:06:55

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023.

[142] viXra:2209.0132 [pdf] replaced on 2022-10-11 11:49:06

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.

We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme compared with benchmark techniques in the literature.
Category: Statistics

[141] viXra:2209.0132 [pdf] replaced on 2022-10-09 09:46:48

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.

[140] viXra:2209.0123 [pdf] replaced on 2023-06-06 14:57:34

Spectral Information Criterion for Automatic Elbow Detection

Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 22 Pages. (to appear) Expert Systems With Applications, 2023.

We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that is often much smaller than the total number of possible models. The elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two involving real datasets. They are all real-world applications, such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
Category: Statistics
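Under Gaussian errors, AIC and BIC for a model with k parameters both take the form n*log(V_k) + lambda*k, with V_k the fitting error: lambda = 2 for AIC and lambda = log(n) for BIC. Let lambda range over all non-negative values. The models that are optimal for at least one lambda are the vertices of the lower convex hull of the points (k, n*log(V_k)), from the smallest k up to the global minimum. The sketch computes that set. It is our own reading of how a criterion containing AIC and BIC as special cases can yield a subset of "elbows", not the exact SIC definition from the paper. Function names are hypothetical.

```python
import math

def penalized_criterion(errors, n, lam):
    """C(k) = n*log(V_k) + lam*k for k = 1..K, where errors[k-1] = V_k."""
    return [n * math.log(v) + lam * k for k, v in enumerate(errors, start=1)]

def candidate_models(errors, n):
    """Orders k that minimize n*log(V_k) + lam*k for some lam >= 0 (ties ignored).

    These are the lower-convex-hull vertices of (k, n*log V_k), walked from the
    smallest k up to the hull vertex with the lowest value.
    """
    pts = [(k, n * math.log(v)) for k, v in enumerate(errors, start=1)]
    hull = []
    for p in pts:  # Andrew's monotone chain, lower hull only (points sorted by k)
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop the middle point when hull[-2] -> hull[-1] -> p is not a left turn.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    best = min(range(len(hull)), key=lambda i: hull[i][1])
    return [k for k, _ in hull[: best + 1]]
```

AIC (lambda = 2) and BIC (lambda = log n) each select one element of this set, corresponding to two particular slopes of a supporting line.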

[139] viXra:2204.0074 [pdf] replaced on 2022-07-17 15:30:07

Matter Theory on EM field

Authors: Sheng-Ping Wu
Comments: 12 Pages.

This article tries to unify the four basic forces by means of Maxwell's equations, the only experimental theory. Self-consistent Maxwell equations with the e-current coming from the matter current are proposed, and are solved for electrons and for the structures of particles and atomic nuclei. The static properties and decays are derived, all of which match experimental data. The equation of general relativity with a purely electromagnetic field is discussed as the basis of this theory. Finally, the elementary agreement between this theory and QED and the weak theory is discussed.
Category: Statistics

[138] viXra:2201.0152 [pdf] replaced on 2022-02-13 20:15:49

Forensic Analysis of Lucy I and Lucy II

Authors: Robert Bennett
Comments: 6 Pages.

A quantitative test for the probability that two sets of photos are of the same woman. The result for 7 facial characteristics in each photo is that the odds are 13 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics

[137] viXra:2112.0158 [pdf] replaced on 2022-07-17 10:28:12

An Exhaustive Variable Selection Study for Linear Models of Soundscape Emotions: Rankings and Gibbs Analysis

Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 26 Pages. (to appear) IEEE Transactions on Audio, Speech and Language Processing

[136] viXra:2111.0145 [pdf] replaced on 2025-03-23 16:12:27

Effective Sample Size Approximations as Entropy Measures

Authors: L. Martino, V. Elvira
Comments: 31 Pages.

In this work, we analyze alternative effective sample size (ESS) metrics for importance sampling algorithms and discuss a possible extended range of applications. We show the relationship between the ESS expressions used in the literature and two entropy families, the Renyi and Tsallis entropies. The Renyi entropy is connected to the Huggins-Roy's ESS family introduced in [22]. We prove that all the ESS functions included in the Huggins-Roy's family fulfill all the desirable theoretical conditions. We also analyze the connections with several other fields, such as the Hill numbers introduced in ecology, the Gini inequality coefficient employed in economics, and the Gini impurity index used mainly in machine learning, to name a few. Finally, by numerical simulations, we study the performance of different ESS expressions contained in the previous ESS families in terms of approximation of the theoretical ESS definition, and show the application of ESS formulas in a variable selection problem.
Category: Statistics
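The link between ESS formulas and the Renyi entropy can be written compactly. For normalized weights w_n, exp(H_alpha) = (sum_n w_n^alpha)^(1/(1-alpha)). Here alpha = 2 recovers the classical 1/sum(w_n^2), alpha -> infinity gives 1/max(w_n), alpha = 0 counts the non-zero weights, and alpha -> 1 gives exp of the Shannon entropy. We assume this is the form of the Huggins-Roy family mentioned in the abstract. The function name is hypothetical.

```python
import math

def renyi_ess(weights, alpha):
    """ESS = exp(Renyi entropy of order alpha) of the normalized weights."""
    s = sum(weights)
    w = [x / s for x in weights if x > 0]  # zero weights contribute nothing
    if alpha == 1:
        # Shannon limit: exp(-sum w log w), the perplexity of the weights.
        return math.exp(-sum(x * math.log(x) for x in w))
    if math.isinf(alpha):
        return 1.0 / max(w)
    return sum(x ** alpha for x in w) ** (1.0 / (1.0 - alpha))
```

All orders return N for uniform weights and 1 for a single non-zero weight. Since the Renyi entropy is non-increasing in alpha, so is this ESS.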

[135] viXra:2111.0145 [pdf] replaced on 2024-12-04 18:58:55

Effective Sample Size Approximations as Entropy Measures

Authors: L. Martino, V. Elvira
Comments: 17 Pages.

In this work, we analyze alternative effective sample size (ESS) measures for importance sampling algorithms. We show the relationship between the ESS expressions used in the literature and two entropy families, the Renyi and Tsallis entropies. The Renyi entropy is connected to the Huggins-Roy's ESS family introduced in [12]. We prove that all the ESS functions included in the Huggins-Roy's family fulfill all the desirable theoretical conditions. Moreover, we show that the Gini impurity index can be converted into a proper ESS formula. We also highlight its connection with the Tsallis entropy. Finally, by numerical simulations, we study the performance of different ESS expressions contained in the previous ESS families in terms of approximation of the theoretical ESS definition.
Category: Statistics

[134] viXra:2110.0032 [pdf] replaced on 2022-06-10 12:08:11

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike for the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, also providing some related code, one of which involves a real-world application on exoplanet detection.
Category: Statistics
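The prior sensitivity of the marginal likelihood shows up in a standard textbook case, the Jeffreys-Lindley paradox. This toy example is not taken from the paper. Take y_i ~ N(theta, 1) for i = 1..n, and compare M0: theta = 0 with M1: theta ~ N(0, tau^2). The Bayes factor reduces to a ratio of two normal densities of the sample mean ybar. For fixed data, a wider ("more uninformative") prior under M1 drives the Bayes factor toward M0, even though the posterior of theta under M1 barely changes. The function names are hypothetical.

```python
import math

def log_normal_pdf(x, var):
    """log N(x; 0, var)."""
    return -0.5 * math.log(2 * math.pi * var) - x * x / (2 * var)

def log_bayes_factor_10(ybar, n, tau):
    """log BF of M1 (theta ~ N(0, tau^2)) against M0 (theta = 0) for y_i ~ N(theta, 1).

    All other data-dependent factors cancel; ybar ~ N(0, tau^2 + 1/n) under M1
    and ybar ~ N(0, 1/n) under M0.
    """
    return log_normal_pdf(ybar, tau ** 2 + 1.0 / n) - log_normal_pdf(ybar, 1.0 / n)
```

With n = 50 and ybar = 0.4 (about 2.8 standard errors from zero), tau = 1 gives a positive log Bayes factor (favoring M1), whereas tau = 1000 gives a negative one (favoring M0).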

[133] viXra:2110.0032 [pdf] replaced on 2022-05-11 12:51:09

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.

[132] viXra:2110.0032 [pdf] replaced on 2022-03-23 12:51:19

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 34 Pages.

[131] viXra:2110.0032 [pdf] replaced on 2021-11-07 07:59:55

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods show strong dependence on the prior choice, even when the data are very informative, unlike the posterior distribution. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions by illustrative numerical examples (providing some related code), one of which involves a real-world application on exoplanet detection.
Category: Statistics

[130] viXra:2109.0178 [pdf] replaced on 2022-01-13 03:48:54

Optimality in Noisy Importance Sampling

Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 14 Pages. Signal Processing, Volume 194, 2022, 108455 - doi:10.1016/j.sigpro.2022.108455

[129] viXra:2105.0051 [pdf] replaced on 2025-04-09 07:22:40

Lorentz Transformation of Gravitational Field

Authors: Zhi Cheng
Comments: 7 Pages.

The transformation relationship between different reference systems follows the principle of Lorentz transformation. In general relativity, the curvature of space-time caused by mass is regarded as a non-inertial frame of reference, so there is also a problem of frame of reference transformation. If the concept of virtual space-time is introduced, we can see that the existence of gravity can also introduce the Lorentz transformation relationship, so that we can deal with the problem of gravity in a simpler way. This article analyzes the static gravitational field, and obtains a result consistent with the Schwarzschild solution of relativity in a more concise way.
Category: Statistics

[128] viXra:2011.0183 [pdf] replaced on 2021-01-29 20:41:23

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 7 Pages.

Usually, one wants to have a simple picture of the trustworthiness of the main election results. However, in some situations only partial information about the elections is available. Here we suggest some criteria for comparing the available information with the official results. One of the criteria consists in comparing the mean value over the available sample with the official mean value. A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. Another method is an analysis of the nature of the peculiarities in the probability distribution functions, consisting in a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station. The last criterion is more aesthetic than revealing. It could be applied to arbitrary electoral systems, such as those of the United Kingdom or the United States, if one wants to extract the main result in a few pictures.
Category: Statistics
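The Monte Carlo comparison of a sample mean with the mean over the full set can be sketched as below. We assume the sample is a simple random sample drawn without replacement from per-station values (for example, the percentage for one candidate at each polling station). The function name and inputs are hypothetical, and the paper may define the statistic differently.

```python
import random
import statistics

def prob_sample_mean_deviation(population, k, observed_gap, trials=10000, rng=random):
    """Monte Carlo estimate of P(|mean(sample) - mean(population)| >= observed_gap)
    for simple random samples of k units drawn without replacement."""
    mu = statistics.fmean(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, k)
        if abs(statistics.fmean(sample) - mu) >= observed_gap:
            hits += 1
    return hits / trials
```

A small returned probability means that the observed gap between the partial sample and the official mean would be unlikely under purely random selection of stations.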

[127] viXra:2011.0183 [pdf] replaced on 2020-12-02 10:52:49

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists of a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station.
Category: Statistics

[126] viXra:2011.0183 [pdf] replaced on 2020-12-01 11:48:00

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

[125] viXra:2009.0135 [pdf] replaced on 2021-07-11 15:23:07

A Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: L. Martino, J. Read, "A Joint introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman filtering and other Kernel Smoothers", Information Fusion, Volume 74, Pages 17-38, 2021

[124] viXra:2009.0135 [pdf] replaced on 2021-03-24 18:50:49

A Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 52 Pages. (to appear) Information Fusion

[123] viXra:2009.0135 [pdf] replaced on 2020-09-21 16:45:05

Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 50 Pages.

[122] viXra:2004.0425 [pdf] replaced on 2021-02-27 09:46:18

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, F. Llorente, E. Curbelo, J. Lopez-Santiago, J. Miguez
Comments: 18 Pages.

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. A complete Bayesian study over the model parameters and the scale parameter can also be performed. Numerical experiments show the benefits of the proposed approach.
Category: Statistics
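The alternation between sampling and optimization can be sketched for a scalar parameter. This is a toy version under our own assumptions, not the authors' algorithm. The Gaussian proposal is adapted by weighted moment matching. The noise power is set to the maximum-likelihood value sigma^2 = min over theta of ||y - f(theta)||^2 / T, approximated over all samples drawn so far. Because the current sigma^2 divides the squared error, it acts as the temperature of the posterior. All names are hypothetical.

```python
import math
import random

def gauss_logpdf(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def tempered_ais(y, f, prior_logpdf, mu=0.0, sd=5.0, iters=10, m=500, rng=random):
    """Toy scalar-theta sketch: IS steps alternating with an ML update of sigma^2.

    f(theta) returns the model output (a list as long as y). Returns the final
    proposal mean (used as a posterior-mean estimate) and the noise-power estimate.
    """
    T = len(y)
    best_sse = float("inf")
    sigma2 = None
    for _ in range(iters):
        thetas = [rng.gauss(mu, sd) for _ in range(m)]
        sse = [sum((a - b) ** 2 for a, b in zip(y, f(th))) for th in thetas]
        # Optimization step: ML noise power = smallest mean squared residual seen so far.
        best_sse = min(best_sse, min(sse))
        sigma2 = max(best_sse / T, 1e-12)
        # Sampling step: IS weights for the posterior tempered by the current sigma^2.
        # The -T/2 * log(sigma2) term is constant within an iteration and cancels.
        logw = [-s / (2 * sigma2) + prior_logpdf(th) - gauss_logpdf(th, mu, sd)
                for th, s in zip(thetas, sse)]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        z = sum(w)
        w = [wi / z for wi in w]
        # Adapt the proposal to the weighted moments (the floor prevents collapse).
        mu = sum(wi * th for wi, th in zip(w, thetas))
        var = sum(wi * (th - mu) ** 2 for wi, th in zip(w, thetas))
        sd = max(math.sqrt(var), 1e-2)
    return mu, sigma2
```

For instance, suppose y_t = 2 x_t plus N(0, 0.1^2) noise and f(theta) = [theta * x_t]. The returned mean should then land near 2 and sigma^2 near 0.01, up to Monte Carlo and data noise.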

[121] viXra:2004.0425 [pdf] replaced on 2020-09-06 08:24:05

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempering parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempering parameter is automatically selected according to the current estimate of the noise power. Numerical experiments show the benefits of the proposed approach.
Category: Statistics

[120] viXra:2004.0425 [pdf] replaced on 2020-09-03 11:41:36

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.

[119] viXra:2004.0060 [pdf] replaced on 2020-04-22 09:01:21

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 14 Pages.

[118] viXra:2004.0060 [pdf] replaced on 2020-04-14 08:41:36

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 14 Pages.

[117] viXra:2002.0368 [pdf] replaced on 2020-02-29 11:03:38

The Relative Risk Is Logically Inconsistent

[116] viXra:2001.0052 [pdf] replaced on 2021-02-06 13:32:52

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 91 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratios of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state of the art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical analysis and numerical experiments.
Category: Statistics
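The simplest estimator of Z = p(y) averages the likelihood over draws from the prior (the naive Monte Carlo estimator). The sketch checks it against the closed form available in a toy conjugate model, y_i ~ N(theta, 1) with theta ~ N(0, s^2). The model and function names are assumptions for illustration only. When the posterior is much narrower than the prior, this estimator becomes very inefficient, which motivates the more elaborate techniques the review compares.

```python
import math
import random

def log_lik(theta, y):
    """log prod_i N(y_i; theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)

def naive_mc_log_evidence(y, prior_sd, m=100000, rng=random):
    """log of (1/m) * sum_j L(theta_j), with theta_j drawn from the prior N(0, prior_sd^2)."""
    lls = [log_lik(rng.gauss(0.0, prior_sd), y) for _ in range(m)]
    top = max(lls)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in lls) / m)

def exact_log_evidence(y, prior_sd):
    """Closed form for this model: Z = A * N(ybar; 0, prior_sd^2 + 1/n)."""
    n = len(y)
    ybar = sum(y) / n
    s = sum((yi - ybar) ** 2 for yi in y)
    v = prior_sd ** 2 + 1.0 / n
    log_a = -0.5 * n * math.log(2 * math.pi) - 0.5 * s + 0.5 * math.log(2 * math.pi / n)
    return log_a - 0.5 * math.log(2 * math.pi * v) - ybar ** 2 / (2 * v)
```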

[115] viXra:2001.0052 [pdf] replaced on 2020-05-18 05:13:39

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.

[114] viXra:2001.0052 [pdf] replaced on 2020-05-15 16:58:59

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.