Statistics

Previous months:
2010 - 1003(10) - 1004(7) - 1005(4) - 1006(1) - 1007(2) - 1008(4) - 1010(1) - 1011(1)
2011 - 1105(2) - 1107(1) - 1111(1) - 1112(1)
2012 - 1203(1) - 1204(2) - 1205(1) - 1208(1) - 1210(1) - 1211(6) - 1212(1)
2013 - 1301(1) - 1304(3) - 1306(1) - 1307(1) - 1310(2)
2014 - 1402(1) - 1403(3) - 1404(2) - 1405(2) - 1407(1) - 1409(4) - 1410(4) - 1411(13) - 1412(4)
2015 - 1503(1) - 1505(2) - 1506(2) - 1507(3) - 1508(3) - 1509(1) - 1511(2) - 1512(6)
2016 - 1601(6) - 1602(3) - 1603(4) - 1604(2) - 1605(1) - 1607(5) - 1608(1) - 1609(4) - 1610(1) - 1611(1) - 1612(2)
2017 - 1701(4) - 1702(3) - 1703(5) - 1704(11) - 1705(12) - 1706(8) - 1707(2) - 1708(2) - 1709(1) - 1710(3) - 1711(5) - 1712(6)
2018 - 1801(5) - 1802(3) - 1803(4) - 1804(4) - 1805(3) - 1806(5) - 1807(2) - 1808(1) - 1809(3) - 1810(5) - 1811(4) - 1812(2)
2019 - 1901(3) - 1903(1) - 1904(2) - 1905(4) - 1906(1) - 1907(2) - 1908(1) - 1909(1) - 1910(2) - 1911(3) - 1912(1)
2020 - 2001(4) - 2002(1) - 2003(1) - 2004(3) - 2005(2) - 2006(2) - 2007(1) - 2008(3) - 2009(2) - 2010(2) - 2011(2) - 2012(12)
2021 - 2101(3) - 2102(3) - 2103(4) - 2104(1) - 2106(2) - 2107(2) - 2109(1) - 2110(2) - 2111(3) - 2112(3)
2022 - 2201(1) - 2202(2) - 2204(2) - 2207(1) - 2209(2) - 2212(1)
2023 - 2301(1) - 2302(1) - 2303(1) - 2304(1) - 2305(1) - 2306(1) - 2307(1) - 2308(1) - 2309(1) - 2310(2) - 2311(1) - 2312(2)
2024 - 2402(2)

Recent submissions

Any replacements are listed farther down

[337] viXra:2402.0093 [pdf] submitted on 2024-02-18 11:04:32

Second Moment/order Approximations by Kernel Smoothers with Application to Volatility Estimation

Authors: L. Beleña, E. Curbelo, L. Martino, V. Laparra
Comments: 14 Pages.

Volatility estimation and quantile regression are relevant active research areas in statistics, machine learning and econometrics. In this work, we propose two procedures to estimate local variances in generic regression problems by using kernel smoothers. The proposed schemes can be applied in multidimensional scenarios (not just for time series analysis) and easily in a multi-output framework, as well. Moreover, they allow the possibility of providing uncertainty estimation using a generic kernel smoother technique. Several numerical experiments show the benefits of the proposed methods, even when compared with benchmark techniques. One of these experiments involves a real dataset analysis.
Category: Statistics
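
As a rough illustration of the idea in abstract [337], the sketch below estimates a local variance as a kernel-smoothed second moment minus the squared kernel-smoothed mean. It uses a plain Nadaraya-Watson smoother with a Gaussian kernel; the function names, the bandwidth and the synthetic data are assumptions made for illustration, and this is not necessarily either of the two procedures proposed by the authors.

```python
import math
import random

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the normalising constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)

def nw_smooth(x0, xs, targets, bandwidth):
    """Nadaraya-Watson estimate at x0: kernel-weighted average of the targets."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, targets)) / total

def local_variance(x0, xs, ys, bandwidth):
    """Local variance as smoothed second moment minus squared smoothed mean."""
    m1 = nw_smooth(x0, xs, ys, bandwidth)
    m2 = nw_smooth(x0, xs, [y * y for y in ys], bandwidth)
    return max(m2 - m1 * m1, 0.0)  # clip tiny negative rounding errors

# Synthetic heteroscedastic data (hypothetical): the noise std grows with x.
random.seed(0)
xs = [i / 200 for i in range(401)]  # grid on [0, 2]
ys = [math.sin(x) + random.gauss(0.0, 0.1 + 0.4 * x) for x in xs]

var_low = local_variance(0.2, xs, ys, bandwidth=0.15)
var_high = local_variance(1.8, xs, ys, bandwidth=0.15)
```

Because the same kernel weights are used for both moments, the difference is a weighted variance and is non-negative up to rounding; the estimate at x = 1.8 should come out larger than the one at x = 0.2, matching how the data were generated.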

[336] viXra:2402.0061 [pdf] submitted on 2024-02-12 07:13:17

Fit Probability Density Function Without Knowing the Form of Distribution

Authors: Dajun Chen
Comments: 2 Pages.

This paper proposes two methods for fitting a probability density function using only samples from the distribution. The methods are inspired by Generative Adversarial Networks. The demos run in PyTorch and are available at https://github.com/chendajunAlpha/Fit-probability-density-function
Category: Statistics

[335] viXra:2312.0089 [pdf] submitted on 2023-12-17 14:49:03

The Excess Mortality is Strongly Underestimated

Authors: Hans Lugtigheid
Comments: 15 Pages.

This article analyses the conjecture that excess mortality during the pandemic is underestimated. I use the numbers from the CBS (Dutch Central Bureau for Statistics) as an example. As a baseline we take the expected mortality for 2021 and 2022 from 2019. I correct this expected mortality with the estimated number of people who died in earlier years than expected because of the pandemic. For 2021 this correction is 8K. The CBS expects the mortality to be almost equal to the estimate from 2019. Then the excess mortality increases from 16K (CBS) to 24K. I present the following idea to explain the difference. At the beginning of every year the numbers of people in year groups are usually adjusted by applying a historically determined percentage to the population at January first. Covid hits the weakest the hardest. This changes the distribution of the expected remaining life years in the year group, and thus the average expected remaining life years. Hence the percentage has to be adjusted. Then the expected mortality decreases and the excess mortality increases. The excess mortality within a year consists of people who, for example, died in April from covid but who would have died in October without the pandemic. With this number total excess mortality rises by 6K to 30K. Excess mortality is divided into covid and non-covid. The large increase in non-covid deaths is striking. The analysis supports the conjecture that excess mortality is underestimated. Note: The numbers in this article are for the Netherlands. For your own country use the appropriate numbers.
Category: Statistics

[334] viXra:2312.0088 [pdf] submitted on 2023-12-17 23:25:17

Expected Mortality: Adjustment for Distribution in Age-Groups

Authors: Hans Lugtigheid
Comments: 4 Pages.

This article discusses the influence of a disturbance like covid on the calculation of life expectancy in year groups and similar quantities. Life expectancies in year groups are usually adjusted at the beginning of the year based on the population at the beginning of the year. This is done with a percentage based on previous years. This percentage is a reflection of volume. With the pandemic the weak were hit heavily by covid. A consequence is that the distribution of life expectancy changes in the year groups. This increases the life expectancy and decreases the expected mortality in the year group. Then the calculation for the year groups has to be adjusted accordingly. In this article I give an example of such an adjustment. One can adjust similar statistics accordingly.
Category: Statistics

[333] viXra:2311.0085 [pdf] submitted on 2023-11-19 02:50:22

A Framework for Modeling, Analyzing, and Decision-Making in Disease Spread Dynamics and Medicine/Vaccine Distribution

Authors: Zenin Easa Panthakkalakath, Neeraj, Jimson Mathew
Comments: 12 Pages.

The challenges posed by epidemics and pandemics are immense, especially if the causes are novel. This article introduces a versatile open-source simulation framework designed to model intricate dynamics of infectious diseases across diverse population centres. Taking inspiration from historical precedents such as the Spanish flu and COVID-19, and geographical economic theories such as Central place theory, the simulation integrates agent-based modelling to depict the movement and interactions of individuals within different settlement hierarchies. Additionally, the framework provides a tool for decision-makers to assess and strategize optimal distribution plans for limited resources like vaccines or cures as well as to impose mobility restrictions.
Category: Statistics

[332] viXra:2310.0050 [pdf] submitted on 2023-10-10 22:03:45

Ratios of Exponential Functions, Interpolation

Authors: Bukac Josef
Comments: 11 Pages. We use interpolation to get the starting values of parameters. Another paper about the singularity of the matrix appearing in the Gauss-Newton method will follow.

We describe models of proportions depending on some independent quantitative variables. An explicit formula for inverse matrices facilitates interpolation as a way to calculate the starting values for iterations in nonlinear regression with logistic functions or ratios of exponential functions.
Category: Statistics

[331] viXra:2310.0032 [pdf] submitted on 2023-10-06 15:52:16

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.

In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split in two blocks and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS is recycled and re-weighted for obtaining a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics

[330] viXra:2309.0010 [pdf] submitted on 2023-09-01 07:15:18

Securing the Foundations of Probability Theory

Authors: Randolph L. Gerl
Comments: 6 Pages.

Several traditional problems in probability theory are discussed and a resolution to them is proposed. The use of probability theory in the study of physical reality is contrasted with its use in pure mathematics and the latter is found to be problematic. The proposed resolution is postulated to work for all physical reality but is inclusive enough to cover many situations in pure mathematics.
Category: Statistics

[329] viXra:2308.0183 [pdf] submitted on 2023-08-27 16:13:11

Linear Compositional Regression

Authors: Josef Bukac
Comments: 7 Pages. A paper on interpolation by generalized logistic functions will follow

We study the properties of regression coefficients when the sum of the dependent variables is one, i.e., the dependent variables are compositional. We show that the sum of intercepts is equal to one and the sum of other corresponding regression coefficients is zero. We do it for simple linear regressions and also for a more general case using matrix notation. The last part treats the case when the dependent variables do not sum up to one. We simplify the well known formula derived by the use of Lagrange multipliers.
Category: Statistics
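
The property stated in abstract [329] for simple linear regressions can be checked numerically. The sketch below fits one ordinary least-squares line per component of a composition and sums the fitted intercepts and slopes. The data are synthetic and the helper names are hypothetical; this illustrates only the simple-regression case, not the paper's matrix derivation.

```python
import random

def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(1)
xs = [random.uniform(0.0, 10.0) for _ in range(50)]

# Hypothetical three-part compositions: positive parts normalised to sum to one.
rows = []
for x in xs:
    raw = [random.uniform(0.1, 1.0) + 0.05 * x,
           random.uniform(0.1, 1.0),
           random.uniform(0.1, 1.0)]
    total = sum(raw)
    rows.append([r / total for r in raw])

# One regression per component, all on the same regressor x.
fits = [simple_ols(xs, [row[j] for row in rows]) for j in range(3)]
intercept_sum = sum(a for a, _ in fits)
slope_sum = sum(b for _, b in fits)
```

Since the slopes are linear in the dependent variable, their sum is the slope of the regression of the constant 1 on x, which is zero; the intercepts then sum to the mean of the constant, which is one. Up to floating-point rounding, `intercept_sum` and `slope_sum` should reflect this.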

[328] viXra:2307.0056 [pdf] submitted on 2023-07-11 16:50:28

An Automatic Counting System of Small Objects in Noisy Images with a Noisy Labelled Dataset: Computing the Number of Microglial Cells in Biomedical Images

Authors: L. Martino, P. Paradas, L. Carro, M. M. Garcia, C. Goicoechea, S. Ingrassi
Comments: 20 Pages.

Counting immunopositive cells on biological tissues generally requires either manual annotation or (when available) automatic rough systems, for scanning signal surface and intensity in whole slide imaging. In this work, we tackle the problem of counting microglial cells in biomedical images that represent lumbar spinal cord cross-sections of rats. Note that counting microglial cells is typically a time-consuming task, and additionally entails extensive personnel training. We skip the task of detecting the cells and we focus only on the counting problem. Firstly, a linear predictor is designed based on the information provided by filtered images, obtained by applying color threshold values to the labelled images in the dataset. Non-linear extensions and other improvements are presented. The choice of the threshold values is also discussed. Different numerical experiments show the capability of the proposed algorithms. Furthermore, the proposed schemes could be applied to different counting problems of small objects in other types of images (from satellites, telescopes, and/or drones, to name a few).
Category: Statistics

[327] viXra:2306.0081 [pdf] submitted on 2023-06-14 03:36:54

Statistics of L1 Distances in the Finite Square Lattice

Authors: Richard J. Mathar
Comments: 12 Pages.

The L1 distance between two points in a square lattice is the sum of horizontal and vertical absolute differences of the Cartesian coordinates and - as in graph theory - also the minimum number of edges to walk to reach one point from the other. The manuscript contains a Java program that computes in a finite square grid of fixed shape the number of point pairs as a function of that distance.
Category: Statistics
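
The quantity described in abstract [327] can be sketched by brute force. The function below counts unordered pairs of points in a small rectangular grid by their L1 distance. It is an independent illustration and not the Java program from the manuscript, which may use a different convention (for example ordered pairs) or a faster method; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def l1_pair_counts(width, height):
    """Count unordered pairs of lattice points by their L1 (Manhattan) distance."""
    points = [(x, y) for x in range(width) for y in range(height)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[abs(x1 - x2) + abs(y1 - y2)] += 1
    return dict(sorted(counts.items()))

counts_2x2 = l1_pair_counts(2, 2)
counts_3x3 = l1_pair_counts(3, 3)
```

The brute-force loop is quadratic in the number of points, so it is only meant for small grids where the counts can be inspected directly.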

[326] viXra:2305.0011 [pdf] submitted on 2023-05-03 01:21:57

Winning at War: Comparing Different Strategies in a Card Game

Authors: Hakon Olav Torvik
Comments: 6 Pages.

The card game "war" is a simple game usually assumed to not include any element of strategy, only luck. I challenge this notion by noticing that the order of placing cards back into the deck can be used as a strategy. I simulate the game with different strategies, and find that the strategies can significantly increase the chances of winning, but usually increase the time it takes to complete the game. This is however dependent on your opponent using specific strategies. The best advice on strategy seems to be tricking your opponent into following an ordered strategy, while you use a random strategy, a strategy some might object to.
Category: Statistics

[325] viXra:2304.0006 [pdf] submitted on 2023-04-01 22:25:09

The Greggs-Pret Index: a Machine Learning Analysis of Consumer Habits as a Metric for the Socio-Economic North-South Divide in England

Authors: Robin Smith, Kristian C. Z. Haverson
Comments: 5 Pages.

In England, it is anecdotally remarked that the number of Greggs bakeries to be found in a town is a reliable measure of the area’s 'Northern-ness'. Conversely, a commercial competitor to Greggs in the baked goods and sandwiches market, Pret-a-Manger, is reputed to be popular in more 'southern' areas of England. Using a Support Vector Machine and an Artificial Neural Network (ANN) Regression Model, the relative geographical distributions of Greggs and Pret have been utilised for the first time to quantify the North-South divide in England. The calculated dividing lines were each compared to another line, based on Gross Domestic Household Income (GDHI). The lines match remarkably well, and we conclude that this is likely because much of England's wealth is concentrated in London, as are most of England's Pret-a-Manger shops. Further studies were conducted based on the relative geographical distributions of popular supermarkets Morrisons and Waitrose, which are also considered to have a North-South association. This analysis yields different results. For all metrics, the North-South dividing line passes close to the M1 Watford Gap services. As a common British idiom, this location is oft quoted as one point along the English North-South divide, and it is notable that this work agrees. This tongue-in-cheek analysis aims to draw attention to more serious factors underlying the North-South divide, such as life expectancy, education, and poverty.
Category: Statistics

[324] viXra:2303.0043 [pdf] submitted on 2023-03-07 02:38:32

[Consideration of Immunity Loss in the Epidemic Model of Respiratory Viruses]

Authors: Johnny J. Mafra Jr.
Comments: 16 Pages.

A previous work on Covid-19 forecasting failed badly to predict the epidemic evolution under the massive vaccination carried out during 2021. This paper aims to work around its weak point, which was that its model did not consider immunity loss. The set of SIR equations was revised to include immunity loss, the Beta profile was recalculated, and the model was tuned using real data from 2021. In this way a good agreement between the simulation and the data was achieved, roughly within the calculated uncertainty of 25%. The simulation for 2022 reproduced the Omicron peak, but shifted in time. The probable explanation is an imbalance in the Beta profile at the beginning of 2022, resulting in a bigger peak in January and consequently a smaller one later, due to more immune people. The hypothesis of different immunity losses for natural and vaccine immunities was explored. This case showed a theoretical profile similar to the real data observed. As a limit-case theoretical study, it was verified that the multi-year epidemic evolution most similar to the real data was the case in which vaccination did not prevent transmission, or prevented as little as 20% of it. Simulation showed, as expected, that if Beta is below some limit the epidemic vanishes. Data showed that Covid-19 seems to be naturally vanishing by itself, meaning that no measures so far were effective. New approaches based on ventilation and air sterilization using GUV are speculated to provide better performance in combating the epidemic. Suggestions on how to test those approaches are presented.
Category: Statistics

[323] viXra:2302.0081 [pdf] submitted on 2023-02-17 17:09:23

Under What Requirements Will Bayes’ Theorem be Meaningful?

Authors: Joseph Palazzo
Comments: 4 Pages.

We establish that all the pertinent elements of an assertion must be real. If it contains an element M which cannot be classified as real, we say that the assertion is contaminated. We then show that Bayes’ Theorem is invalid.
Category: Statistics

[322] viXra:2301.0134 [pdf] submitted on 2023-01-25 13:50:10

Correlation Between Substance Representing that Tier and Its Typical Price in Several Games Using a Tier System

Authors: Kyumin Nam
Comments: 3 Pages.

Substances representing tiers (Iron, Bronze, Silver, Gold, Platinum, Diamond) and their typical prices (USD/gram) in several games using a tier system have a positive correlation [1, 2, 5].
Category: Statistics

[321] viXra:2212.0092 [pdf] submitted on 2022-12-09 13:55:42

Plithogenic Probability & Statistics Are Generalizations of Multivariate Probability & Statistics

Authors: Florentin Smarandache
Comments: 10 Pages.

In this paper we exemplify the types of Plithogenic Probability and respectively Plithogenic Statistics. Several applications are given. The Plithogenic Probability of an event to occur is composed from the chances that the event occurs with respect to all random variables (parameters) that determine it. Each such variable is described by a Probability Distribution (Density) Function, which may be a classical, (T,I,F)-neutrosophic, I-neutrosophic, (T,F)-intuitionistic fuzzy, (T,N,F)-picture fuzzy, (T,N,F)-spherical fuzzy, or (other fuzzy extension) distribution function. The Plithogenic Probability is a generalization of the classical MultiVariate Probability. The analysis of the events described by the plithogenic probability is the Plithogenic Statistics.
Category: Statistics

[320] viXra:2209.0132 [pdf] submitted on 2022-09-23 13:33:45

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 10 Pages.

We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme over benchmark techniques in the literature.
Category: Statistics

[319] viXra:2209.0123 [pdf] submitted on 2022-09-22 20:33:40

Spectral Information Criterion for Automatic Elbow Detection

Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 20 Pages.

We introduce a generalized information criterion which contains other well-known information criteria, such as BIC and AIC, as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that often is much smaller than the total number of possible models. The elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the sets of elbows is suggested as well. Several experiments involving ideal scenarios, synthetic data and real data show the benefits of the proposed scheme. Matlab code related to the experiments is available.
Category: Statistics

[318] viXra:2207.0168 [pdf] submitted on 2022-07-28 22:43:14

Explicit Statistical Assay of Suicide Rates in Germany and Brazil

Authors: Anurag Dutta, Manan Roy Choudhury, Seemantini Chattopadhyay
Comments: 10 Pages. (A non-essential image deemed to be insensitive is blocked by viXra Admin)

Background: Suicide, the act of intentionally harming or killing oneself, is rising sharply these days. It is the result of mental disorders resulting from depression, anxiety, or stress.
Methods: In this study, we have analyzed the dataset of suicide cases for one developing country - "Brazil", and one developed country - "Germany", and have used Statistical Methods, along with Machine Learning techniques to obtain a clear idea.
Results: We discovered that the Suicide Rate in Brazil is quite high in comparison to the Suicide Rate in Germany.
Conclusions: Our results provide a shred of evidence that the development status of the country, along with some more factors, like per-capita income, employment, literacy, etc., in some way or the other affects the suicide rate of a country.
Category: Statistics

[317] viXra:2204.0154 [pdf] submitted on 2022-04-26 18:14:52

The Proof of Riemann Hypothesis

Authors: Minuk Choi
Comments: 7 Pages.

The proposition that “the ratio of numbers that have an even number and an odd number of prime factors, none repeated, is 50 : 50” is equivalent to the Riemann hypothesis. I prove this proposition using the posterior distribution of the discrete uniform distribution.
Category: Statistics

[316] viXra:2202.0089 [pdf] submitted on 2022-02-13 23:14:57

Some (Vaguely Meaningful) Fun With A Coin Toss Game: The "(St.) Petersburg" Game Paradox

Authors: Gary J. Duggan
Comments: 5 Pages.

A simple coin toss game, attributed to Nicolaus Bernoulli in the early 1700s, results in a mathematical paradox which still appears to be subject to what might be described as "conceptual" rather than "mathematical" solutions. A mathematical solution is given showing that, if the number of games is 2^m-1 then the average payout per game for this number of games is m/(2-(1/2^(m-1))).
Category: Statistics
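
The closed form quoted in abstract [316] can be reproduced under one plausible reading, sketched below with exact rational arithmetic. The payout rule and the idealised outcome frequencies are assumptions, since the abstract does not state them: a game ending on toss k is taken to pay 2^(k-1), and among 2^m - 1 games exactly 2^(m-k) are taken to end on toss k for k = 1, ..., m. The function names are hypothetical.

```python
from fractions import Fraction

def idealised_average_payout(m):
    """Average payout over 2**m - 1 games with idealised outcome frequencies.

    Assumption (not stated in the abstract): a game ending on toss k pays
    2**(k-1), and exactly 2**(m-k) of the games end on toss k, for k = 1..m.
    """
    games = sum(2 ** (m - k) for k in range(1, m + 1))  # equals 2**m - 1
    payout = sum(2 ** (m - k) * 2 ** (k - 1) for k in range(1, m + 1))
    return Fraction(payout, games)

def abstract_formula(m):
    """The closed form quoted in the abstract: m / (2 - 1/2**(m-1))."""
    return Fraction(m) / (2 - Fraction(1, 2 ** (m - 1)))

# Compare the two expressions exactly for m = 1..20.
checks = [idealised_average_payout(m) == abstract_formula(m) for m in range(1, 21)]
```

Under these assumptions every toss count k contributes the same total payout 2^(m-1), so the total is m * 2^(m-1) and dividing by 2^m - 1 gives the quoted expression. The average grows roughly like m/2, that is, logarithmically in the number of games.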

[315] viXra:2202.0084 [pdf] submitted on 2022-02-13 23:24:08

COVID-19 and All-Cause Mortality Data by Age Group Reveals Risk of COVID Vaccine-Induced Fatality is Equal to or Greater than the Risk of a COVID death for all Age Groups Under 80 Years Old as of 6 February 2022

Authors: Kathy Dopp, Stephanie Seneff
Comments: 21 Pages.

As of 6 February 2022, based on publicly available official UK and US data, all age groups under 50 years old are at greater risk of fatality after receiving a COVID-19 inoculation than an unvaccinated person is at risk of a COVID-19 death. All age groups under 80 years old have virtually no benefit from receiving a COVID-19 inoculation, and the younger ages incur significant risk. This analysis is conservative because it ignores the fact that inoculation-induced adverse events such as thrombosis, myocarditis, Bell’s palsy, and other vaccine-induced injuries can lead to shortened life span. When one takes into consideration the fact that there is approximately a 90% decrease in risk of COVID-19 death if early treatment is provided to all symptomatic high-risk persons, one can only conclude that mandates of COVID-19 inoculations are ill-advised. Considering the emergence of antibody-resistant variants like Delta and Omicron, for most age groups COVID-19 vaccine inoculations result in higher death rates than COVID-19 does for the unvaccinated.
Category: Statistics

[314] viXra:2201.0152 [pdf] submitted on 2022-01-23 18:41:23

Forensic Analysis of Lucy I and Lucy II

Authors: Robert Bennett
Comments: 6 Pages.

A quantitative test for the probability that two sets of photos are of the same woman. The result for 7 facial characteristics in each photo is that the odds are 30 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics

[313] viXra:2112.0158 [pdf] submitted on 2021-12-30 18:14:01

An Exhaustive Variable Selection Study for Linear Models of Soundscape Emotions: Rankings and Gibbs Analysis

Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 18 Pages.

In recent years, soundscapes have become one of the most active topics in Acoustics, providing a holistic approach to the acoustic environment, which involves human perception and context. Soundscape-elicited emotions are central and substantially subtle and unnoticed (compared to speech or music). Currently, soundscape emotion recognition is a hot topic in the literature. We provide an exhaustive variable selection study (i.e., a selection of the soundscape indicators) to a well-known dataset (emo-soundscapes). We consider linear soundscape emotion models for two soundscape descriptors: arousal and valence. Several ranking schemes and procedures for selecting the number of variables are applied. We have also performed an alternating optimization scheme for obtaining the best sequences keeping fixed a certain number of features. Furthermore, we have designed a novel technique based on Gibbs sampling, which provides a more complete and clear view of the relevance of each variable. Finally, we have also compared our results with the analysis obtained by the classical methods based on p-values. As a result of our study, we suggest two simple and parsimonious linear models of only 7 and 16 variables (within the 122 possible features) for the two outputs (arousal and valence), respectively. The suggested linear models provide very good and competitive performance, with R2 > 0.86 and R2 > 0.63 (values obtained after a cross-validation procedure), respectively.
Category: Statistics

[312] viXra:2112.0058 [pdf] submitted on 2021-12-12 20:44:49

An Ordered Sample Mean That's a Bit Like Simpson's Rule

Authors: D Williams
Comments: 2 Pages.

A "Simpson's Rule"-like Ordered Sample Mean is compared with the standard version. It appears to be better at least for small sample sizes. A related integral approximation is also given and tested against the Mid Point Rule. Other Types of Ordered Sample Means need investigating.
Category: Statistics

[311] viXra:2112.0013 [pdf] submitted on 2021-12-02 02:52:21

Minimum with Inequality Constraint Applied to Increasing Cubic, Logistic and Gomperz or Convex Quartic and Biexponential Regressions

Authors: Josef Bukac
Comments: 28 Pages.

We present a method of minimizing an objective function subject to an inequality constraint. It enables us to minimize the sum of squares of deviations in linear regression under inequality restrictions. We demonstrate how to calculate the coefficients of cubic function under the restriction that it is increasing, we also mention how to fit a convex quartic polynomial. We use such results for interpolation as a method for calculation of starting values for iterative methods of fitting some specific functions, such as four-parameter logistic, positive bi-exponential, or Gomperz functions. Curvature-driven interpolation enables such calculations for otherwise solutions to interpolation equations may not exist or may not be unique. We also present examples to illustrate how it works and compare our approach with that of Zhang (2020).
Category: Statistics

[310] viXra:2111.0150 [pdf] submitted on 2021-11-28 14:29:35

Bayesian Inference Via Generalized Thermodynamic Integration

Authors: F. Llorente, L. Martino, D. Delgado
Comments: 17 Pages.

The idea of using a path of tempered posterior distributions has been widely applied in the literature for the computation of marginal likelihoods (a.k.a., Bayesian evidence). Thermodynamic integration, path sampling and annealing importance sampling are well-known examples of algorithms belonging to this family of methods. In this work, we introduce a generalized thermodynamic integration (GTI) scheme which is able to perform a complete Bayesian inference, i.e., GTI can approximate generic posterior expectations (not only the marginal likelihood). Several scenarios of application of GTI are discussed and different numerical simulations are provided.
Category: Statistics
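
For orientation on abstract [310], the sketch below shows standard thermodynamic integration, not the generalized GTI scheme of the paper. It uses the identity log Z = integral over beta in [0, 1] of E_beta[log L(theta)], where the tempered posterior p_beta is proportional to prior times likelihood^beta. The toy model (standard normal prior, one Gaussian observation with unit variance) is an assumption chosen so that tempered posteriors can be sampled exactly and the evidence is known in closed form; all names are hypothetical.

```python
import math
import random

def log_likelihood(theta, y):
    """Log of N(y; theta, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - theta) ** 2

def tempered_posterior_sample(beta, y, rng):
    """Exact draw from p_beta(theta) ∝ N(theta; 0, 1) * N(y; theta, 1)**beta."""
    precision = 1.0 + beta
    return rng.gauss(beta * y / precision, math.sqrt(1.0 / precision))

def thermodynamic_integration(y, n_temps=21, n_samples=5000, seed=0):
    """Estimate log Z as the integral over beta in [0, 1] of E_beta[log L]."""
    rng = random.Random(seed)
    betas = [i / (n_temps - 1) for i in range(n_temps)]
    means = []
    for beta in betas:
        draws = (tempered_posterior_sample(beta, y, rng) for _ in range(n_samples))
        means.append(sum(log_likelihood(t, y) for t in draws) / n_samples)
    # Trapezoidal rule over the temperature grid.
    return sum(0.5 * (means[i] + means[i + 1]) * (betas[i + 1] - betas[i])
               for i in range(n_temps - 1))

y_obs = 1.0
log_z_ti = thermodynamic_integration(y_obs)
log_z_exact = -0.5 * math.log(4.0 * math.pi) - y_obs ** 2 / 4.0  # log N(y; 0, 2)
```

In realistic problems the tempered posteriors cannot be sampled exactly and an MCMC sampler would replace `tempered_posterior_sample`; the exact sampler is used here only to keep the example short and checkable against `log_z_exact`.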

[309] viXra:2111.0145 [pdf] submitted on 2021-11-28 17:11:33

Effective Sample Size Approximations as Entropy Measures

Authors: L. Martino, V. Elvira
Comments: 11 Pages.

In this work, we analyze alternative effective sample size (ESS) measures for importance sampling algorithms. More specifically, we study a family of ESS approximations introduced in [11]. We show that all the ESS functions included in this family (called Huggins-Roy's family) satisfy all the required theoretical conditions introduced in [17]. We also highlight the relationship of this family with the Rényi entropy. By numerical simulations, we study the performance of different ESS approximations introducing also an optimal linear combination of the most promising ESS indices introduced in literature. Moreover, we obtain the best ESS approximation within the Huggins-Roy's family, that provides almost a perfect match with the theoretical ESS values.
Category: Statistics
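
To make the link to entropy in abstract [309] concrete, the sketch below computes the widely used ESS approximation 1 / sum of squared normalised weights, together with the exponential of the Rényi entropy of order beta of the normalised weights. Treating the latter as the form of the family discussed in the abstract is an assumption on my part; the abstract itself only states that the family is related to the Rényi entropy. Function names are hypothetical.

```python
import math

def normalise(weights):
    """Scale non-negative importance weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def ess_inverse_sum_squares(weights):
    """Standard ESS approximation: 1 / sum of squared normalised weights."""
    return 1.0 / sum(w * w for w in normalise(weights))

def ess_renyi(weights, beta):
    """exp of the Rényi entropy of order beta of the normalised weights.

    Assumed form of the family: (sum_n w_n**beta) ** (1 / (1 - beta)).
    beta = 1 is handled as the limit, exp of the Shannon entropy.
    """
    w_bar = [w for w in normalise(weights) if w > 0.0]
    if abs(beta - 1.0) < 1e-12:
        return math.exp(-sum(w * math.log(w) for w in w_bar))
    return sum(w ** beta for w in w_bar) ** (1.0 / (1.0 - beta))

uniform = [1.0] * 10              # all samples equally weighted
degenerate = [1.0] + [0.0] * 9    # a single sample carries all the weight
```

With this form, beta = 2 recovers the inverse sum of squares, uniform weights give an ESS equal to the number of samples for every beta, and a degenerate weight vector gives an ESS of one.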

[308] viXra:2111.0012 [pdf] submitted on 2021-11-02 20:50:03

A Revised Comparison Between Fama and French Five-Factor Model and Three-Factor Model——based on China's a-Share Market

Authors: Zhijing Zhang, Yue Yu, Qinghua Ma, Haixiang Yao
Comments: 18 Pages.

In view of some contradicting results in existing research, this paper selects China's latest stock data from 2005 to 2020 for empirical analysis. By choosing this period's data, we avoid the periods of China's significant stock market reforms to reduce the impact of the government's policy on the factor effect. In this paper, the redundant factors (HML, CMA) are orthogonalized, and the regression analysis of 5*5 portfolio of Size-B/M and Size-Inv is carried out with these two orthogonalized factors. We find that the HML and the CMA are still significant in many portfolios, indicating that they have a strong explanatory ability, which is also consistent with the results of GRS test. All these show that the five-factor model has a better ability to explain the excess return rate. In the concrete analysis, this paper uses the methods of the five-factor 25-group portfolio returns calculation, the five-factor regression analysis, the orthogonal treatment, the five-factor 25-group regression and the GRS test to more comprehensively explain the excellent explanatory ability of the five-factor model to the excess return. Then, we analyze the possible reasons for the strong explanatory ability of the HML, CMA and RMW from the aspects of price to book ratio, turnover rate and correlation coefficient. We also give a detailed explanation of the results, and analyze the changes of China's stock market policy and investors' investment style in recent years. Finally, this paper attempts to put forward some useful suggestions on the development of asset pricing model and China's stock market.
Category: Statistics

[307] viXra:2110.0128 [pdf] submitted on 2021-10-22 04:13:21

Violating the Second Law of Thermodynamics in a Dynamical System Through Equivalence Closure Via Mutual Information Carriers of a 5-Tuple Measure Space

Authors: Deep Bhattacharjee
Comments: 22 Pages, 5 Figures, TechRxiv (Computations), Ergodic Theory

Time and space average of an ergodic system following the 5-tuple relations (A,~,J,Σ,μ) through the initial increment from a+bθ to a+c+bθ indicates the entropy to be reserved in the deterministic yet dynamical and conservative systems to hold for the set S_p = S_1 ∑_(i=2)^∞ S_i keeping S as the entropy ∃(S_∞ = ⋯ = S_3 = S_2) > S_1 obeying the Poincaré recurrence theorem throughout the constant attractor A. This in turn states the facts of the equivalence closure as the property of the induced systems to resemble an entropy-conserving scenario.
Category: Statistics

[306] viXra:2110.0032 [pdf] submitted on 2021-10-07 09:24:06

On the Safe Use of Prior Densities for Bayesian Model Selection in Physics

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.

The application of Bayesian inference in physics for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods show strong dependence on the prior choice, even when the data are very informative, unlike the posterior distribution. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions by illustrative numerical examples (providing some related code). One of them involves a real-world application on exoplanet detection.
Category: Statistics

[305] viXra:2109.0178 [pdf] submitted on 2021-09-24 07:34:10

Optimality in Noisy Importance Sampling

Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 13 Pages.

Many applications in signal processing and machine learning require the study of probability density functions (pdfs) that can only be accessed through noisy evaluations. In this work, we analyze the noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate the information of the variance of the noisy realizations, proposing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in a noisy IS framework.
Category: Statistics
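
As a rough illustration of the setting described in viXra:2109.0178 above, the following minimal Python sketch estimates a normalizing constant by importance sampling when the target density can only be evaluated with noise. It is not the authors' method and does not use their optimal proposals. The unnormalized Gaussian target, the wider Gaussian proposal, the multiplicative unit-mean lognormal noise, and the function name `noisy_importance_sampling` are all assumptions chosen for illustration.

```python
import math
import random


def noisy_importance_sampling(n_samples, noise_std, seed=0):
    """Estimate Z = integral of exp(-x^2 / 2) dx from noisy target evaluations.

    Assumptions (illustrative only): Gaussian proposal with standard
    deviation 2, and multiplicative lognormal noise with mean 1, so each
    noisy evaluation is an unbiased estimate of the target value.
    """
    rng = random.Random(seed)
    prop_std = 2.0
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, prop_std)
        target = math.exp(-0.5 * x * x)  # unnormalized target
        # Unit-mean lognormal factor: E[exp(s*g - s^2/2)] = 1 for g ~ N(0, 1).
        noise = math.exp(noise_std * rng.gauss(0.0, 1.0) - 0.5 * noise_std ** 2)
        noisy_target = target * noise
        q = math.exp(-0.5 * (x / prop_std) ** 2) / (prop_std * math.sqrt(2.0 * math.pi))
        total += noisy_target / q  # noisy importance weight
    return total / n_samples


if __name__ == "__main__":
    estimate = noisy_importance_sampling(50_000, noise_std=0.5, seed=1)
    print("estimate:", estimate, "exact:", math.sqrt(2.0 * math.pi))
```

Because the noise has mean one, the average of the noisy weights remains an unbiased estimate of the normalizing constant; the noise only inflates its variance.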

[304] viXra:2107.0131 [pdf] submitted on 2021-07-23 19:10:35

Measurement Space Partitioning for Estimation and Prediction

Authors: Glenn Healey, Shiyuan Zhao
Comments: 31 Pages.

An important and challenging problem in the evaluation of baseball players is the quantification of batted-ball talent. This problem has traditionally been addressed using linear regression on the value of a statistic derived from a set of observations. We use large sets of trajectory measurements acquired by in-game sensors to show that the predictive value of a batted ball depends on its physical properties. This knowledge is exploited to estimate batted-ball distributions defined over a multidimensional measurement space from observed distributions by using regression parameters that adapt to batted ball properties. This process is central to a new method for estimating batted-ball talent. The domain of the batted-ball distributions is defined by a partition of measurement space that is selected to optimize the accuracy of the estimates. We present examples illustrating facets of the new approach and use a set of experiments to show that the new method generates estimates that are significantly more accurate than those generated using current methods. The new methodology supports the use of fine-grained contextual adjustments and we show that this process further improves the accuracy of the technique.
Category: Statistics

[303] viXra:2107.0031 [pdf] submitted on 2021-07-05 20:36:40

Improvement of the Matlab Program Proposed in Vixra:2103.0018

Authors: Joh. J. Sauren, Aloys J. Sipers
Comments: 8 Pages. [Corrections made by viXra Admin to conform with the requirements of viXra.org]

In this article, the Matlab program proposed in the article viXra:2103.0018 is improved. Further, the constant d3 depends on the constants d2 and a3. Three theorems are stated on the generating functions for the constants d2 and a3. The first two theorems provide analytical expressions for these generating functions, whereas the third theorem relates them.
Category: Statistics

[302] viXra:2106.0144 [pdf] submitted on 2021-06-24 18:41:26

Wave Packets of Relaxation Type in Boundary Problems of Quantum Mechanics

Authors: Igor B. Krasnyuk
Comments: 23 Pages.

An initial value boundary problem for the linear Schrödinger equation with nonlinear functional boundary conditions is considered. It is shown that the attractor of the problem contains periodic piecewise constant functions on the complex plane with finitely many points of discontinuity on a period. The method of reduction of the problem to a system of integro-difference equations has been applied. Applications to optical resonators with feedback have been considered. The elements of the attractor can be interpreted as white and black solitons in nonlinear optics.
Category: Statistics

[301] viXra:2106.0036 [pdf] submitted on 2021-06-07 20:38:46

Introduction to the Gaussian Information Criterion

Authors: Russell Leidich
Comments: 9 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]

There are many applications involving physical measurements which are expected to result in a probability density function (PDF) which is asymptotically Gaussian (normal) or lognormal. In the latter case, we can simply take the logs of the (positive) samples in order to obtain the former, so the math in this paper will focus exclusively on Gaussians.

For example, we would expect the distribution of radio power received at a dish to be lognormally distributed, given a sufficiently broad swath of sky to observe for a sufficiently long duration, and in the relative absence of terrestrial radio interference. However, if we were then to focus on a particular star system, the observed "experimental" PDF could substantially deviate from that "background" PDF. It might not even be lognormal if, for example, the star exhibits peaks in radio power at a few distinct frequencies.

It would therefore be useful to have a means to quantify the "surprise" factor of experimental PDFs relative to an established background PDF which is known to be, or be equivalent to, a Gaussian. If a given experimental PDF were also known to be Gaussian, then we could do this by employing the Kullback-Leibler (KL) divergence from one to the other, as Gupta appears to have done for the multidimensional case.

When the experimental PDF is not known to be Gaussian (or any PDF archetype, for that matter), the situation is more complicated, mainly because we are forced to deal with a real-valued set of samples ordered by increasing positivity -- a 1D point cloud, to be precise, although "vector" will suffice for brevity -- rather than an analytic function. Ranking the information cost of encoding such a vector, versus others arising from other experiments, under the prior assumption of the same background PDF, is the subject of this paper. We also investigate the question of ascertaining which background PDF is the most useful for the sake of discriminating anomalous from mundane experimental PDFs.


Category: Statistics

[300] viXra:2104.0046 [pdf] submitted on 2021-04-09 17:05:32

Respiratory Viruses Epidemic Dynamics Covid-19 Case Study and Forecast for 2021 in the Most Affected Countries

Authors: Johnny J. Mafra Jr.
Comments: 20 Pages.

A method was researched and adopted to introduce seasonal behavior into the SIR model in order to study the dynamics of covid-19. This method is based on the calculation of β for each week of the year from the previously observed seasonal behavior of several countries and regions, which are the most affected in the world. Vaccination, which will be a factor of major effect on these dynamics in 2021, was also included in the model. The model was used to build a simulator, with which β was determined and covid-19 cases were forecast for the USA, Brazil and India. β was found to range seasonally from 0.15 to 0.40 or from 0.10 to 0.80, depending on the region. It was found that vaccination will be very effective in reducing cases in 2021 and that herd immunity will be reached when around 55% of the population is immune. The simulation led to some unexpected findings, such as the effect of lockdown in later waves of the epidemic, and about the epidemic dynamics. A condition was found for exogenic respiratory viruses that triggers a major epidemic, as well as a condition that explains why a respiratory virus to which part of the population is already immune has a seasonal behavior, with a small number of cases. These dynamics explain the evolution of covid-19 in 2020 and 2021 and even the Spanish flu in 1918 and 1919.
Category: Statistics
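
The idea in viXra:2104.0046 above, an SIR model whose transmission rate β changes week by week and which includes vaccination, can be sketched in a few lines of Python. This is only an illustrative sketch and not the author's simulator. The cosine shape of the seasonal curve, the peak week, the recovery rate `gamma`, the daily time step, the interpretation of β as a per-day rate, and the vaccination rate are all assumptions; only the range 0.15 to 0.40 is taken from the abstract.

```python
import math


def seasonal_beta(week, beta_min=0.15, beta_max=0.40, peak_week=2):
    """Hypothetical weekly beta: a cosine oscillating between beta_min and beta_max."""
    phase = 2.0 * math.pi * (week - peak_week) / 52.0
    return beta_min + 0.5 * (beta_max - beta_min) * (1.0 + math.cos(phase))


def simulate_sir(days, i0=1e-4, gamma=0.1, vacc_rate=0.0):
    """Daily Euler steps of an SIR model on population fractions (s, i, r).

    vacc_rate is the assumed fraction of susceptibles vaccinated per day;
    vaccinated individuals are moved directly to the removed compartment.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for day in range(days):
        beta = seasonal_beta((day // 7) % 52)
        new_infections = beta * s * i
        new_recoveries = gamma * i
        new_vaccinated = vacc_rate * s
        s = s - new_infections - new_vaccinated
        i = i + new_infections - new_recoveries
        r = r + new_recoveries + new_vaccinated
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    no_vacc = simulate_sir(365)
    with_vacc = simulate_sir(365, vacc_rate=0.002)
    print("peak infected fraction, no vaccination:", max(i for _, i, _ in no_vacc))
    print("peak infected fraction, with vaccination:", max(i for _, i, _ in with_vacc))
```

Comparing the two runs gives a quick feel for how a seasonal β and a vaccination term interact, although the numbers themselves carry no epidemiological meaning under these made-up parameters.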

[299] viXra:2103.0173 [pdf] submitted on 2021-03-27 02:04:39

Datasailr an R Package for Row by Row Data Processing, Using Datasailr Script

Authors: Toshihiro Umehara
Comments: 8 Pages.

Data processing and data cleaning are essential steps before applying statistical or machine learning procedures. R provides a flexible way for data processing using vectors. R packages also provide other ways for manipulating data such as using SQL and using chained functions. I present yet another way to process data in a row by row manner using a data manipulation oriented script, DataSailr script. This article introduces the datasailr package, and shows potential benefits of using a domain-specific language for data processing.
Category: Statistics

[298] viXra:2103.0079 [pdf] submitted on 2021-03-12 01:15:46

The Scale Invariant Prior and Its Generalizations

Authors: Stephen P. Smith
Comments: 8 Pages.

The scale invariant prior is revisited, for a single variance parameter and for a variance-covariance matrix. These results are generalized to develop different scale invariant priors where probability measure is assigned through the sum of variance components that represent partitions of total variance, or through a sum of variance-covariance matrices representing partitions of a total variance-covariance matrix.
Category: Statistics

[297] viXra:2103.0018 [pdf] submitted on 2021-03-03 14:39:39

On the Computation of the Principal Constants $d_{2}$ and $d_{3}$ Used to Construct Control Limits for Control Charts Applied in Statistical Process Control

Authors: Joh. J. Sauren
Comments: 3 Pages.

In this communication a short and straightforward algorithm, written in Octave (version 6.1.0 (2020-11-26))/Matlab (version '9.9.0.1538559 (R2020b) Update 3'), is proposed for brute-force computation of the principal constants $d_{2}$ and $d_{3}$ used to calculate control limits for various types of variables control charts encountered in statistical process control (SPC).
Category: Statistics
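
For readers unfamiliar with the constants in viXra:2103.0018 above: for samples of size n from a standard normal distribution, $d_{2}$ is the mean of the sample range and $d_{3}$ is its standard deviation. The Python sketch below estimates both by plain Monte Carlo simulation. It is one possible brute-force approach and is not claimed to be the algorithm of the paper, which is written in Octave/Matlab; the function name `range_constants` and the number of trials are assumptions.

```python
import math
import random


def range_constants(n, n_trials=200_000, seed=0):
    """Monte Carlo estimates of d2 (mean) and d3 (standard deviation)
    of the range of n independent standard normal observations."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w = max(sample) - min(sample)  # sample range
        total += w
        total_sq += w * w
    d2 = total / n_trials
    d3 = math.sqrt(total_sq / n_trials - d2 * d2)
    return d2, d3


if __name__ == "__main__":
    for n in (2, 5):
        d2, d3 = range_constants(n)
        print(n, round(d2, 3), round(d3, 3))
```

The estimates can be compared against the values tabulated in standard SPC references, for example $d_{2} \approx 2.326$ and $d_{3} \approx 0.864$ for n = 5.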

[296] viXra:2103.0008 [pdf] submitted on 2021-03-02 17:19:51

Compressed Particle Methods for Expensive Models with Application in Astronomy and Remote Sensing

Authors: L. Martino, V. Elvira, J. Lopez-Santiago, G. Camps-Valls
Comments: 41 Pages.

In many inference problems, the evaluation of complex and costly models is often required. In this context, Bayesian methods have become very popular in several fields over the last years, in order to obtain parameter inversion, model selection or uncertainty quantification. Bayesian inference requires the approximation of complicated integrals involving (often costly) posterior distributions. Generally, this approximation is obtained by means of Monte Carlo (MC) methods. In order to reduce the computational cost of the corresponding technique, surrogate models (also called emulators) are often employed. Another alternative approach is the so-called Approximate Bayesian Computation (ABC) scheme. ABC does not require the evaluation of the costly model but the ability to simulate artificial data according to that model. Moreover, in ABC, the choice of a suitable distance between real and artificial data is also required. In this work, we introduce a novel approach where the expensive model is evaluated only in some well-chosen samples. The selection of these nodes is based on the so-called compressed Monte Carlo (CMC) scheme. We provide theoretical results supporting the novel algorithms and give empirical evidence of the performance of the proposed method in several numerical experiments. Two of them are real-world applications in astronomy and satellite remote sensing.
Category: Statistics

[295] viXra:2102.0094 [pdf] submitted on 2021-02-17 23:44:54

The kth Power Expectile Estimation and Testing

Authors: Fuming Lin, Yingying Jiang, Yong Zhou
Comments: 57 Pages.

This paper develops the theory of the kth power expectile estimation and considers its relevant hypothesis tests for coefficients of linear regression models. We prove that the asymptotic covariance matrix of kth power expectile regression converges to that of quantile regression as k converges to one, and hence provide a moment estimator of the asymptotic covariance matrix of quantile regression. The kth power expectile regression is then utilized to test for homoskedasticity and conditional symmetry of the data. Detailed comparisons of the local power among the kth power expectile regression tests, the quantile regression test, and the expectile regression test are provided. When the underlying distribution is not standard normal, results show that the optimal k is often larger than 1 and smaller than 2, which suggests that the general kth power expectile regression is necessary.
Category: Statistics

[294] viXra:2102.0027 [pdf] submitted on 2021-02-05 13:05:30

Hamiltonian Markov Chain Monte Carlo Using Second Derivatives

Authors: Stephen P. Smith
Comments: 10 Pages.

Hamiltonian Markov Chain Monte Carlo is one of the established methods to conduct a Bayesian simulation. This method uses evaluations of the probability density and its gradient at particular variables. This present paper describes how to incorporate information from second derivatives that relate to a direction set, and describes how to modify the simulation accordingly.
Category: Statistics
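
The abstract of viXra:2102.0027 above notes that Hamiltonian Markov Chain Monte Carlo uses evaluations of the density and its gradient. A minimal Python sketch of the standard method (leapfrog integration followed by a Metropolis accept/reject step) is given below for orientation. It does not include the second-derivative extension proposed in the paper. The standard normal target, the step size, the number of leapfrog steps and the function name `hmc_standard_normal` are assumptions.

```python
import math
import random


def hmc_standard_normal(n_samples, step=0.2, n_leapfrog=10, seed=0):
    """Standard HMC for a one-dimensional N(0, 1) target (illustrative only)."""
    rng = random.Random(seed)

    def potential(x):
        # Negative log-density up to an additive constant.
        return 0.5 * x * x

    def grad_potential(x):
        return x

    x = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # resample the momentum
        x_new, p_new = x, p
        # Leapfrog integration: half step in p, full steps in x and p, half step in p.
        p_new -= 0.5 * step * grad_potential(x_new)
        for k in range(n_leapfrog):
            x_new += step * p_new
            if k < n_leapfrog - 1:
                p_new -= step * grad_potential(x_new)
        p_new -= 0.5 * step * grad_potential(x_new)
        # Metropolis correction based on the change in total energy.
        h_old = potential(x) + 0.5 * p * p
        h_new = potential(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = hmc_standard_normal(20_000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean:", mean, "sample variance:", var)
```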

[293] viXra:2102.0026 [pdf] submitted on 2021-02-05 22:05:06

Revisiting the UK EU Membership Referendum (Brexit) Poll Tracker

Authors: Michaelino Mervisiano
Comments: 37 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]

On 23 June 2016, the United Kingdom (UK) European Union (EU) membership referendum resulted in 51.9% of voters voting to leave the EU, popularly termed Brexit. Given its significant implications, correctly predicting Brexit was crucial, but most pollsters predicted incorrectly. This paper assesses whether Brexit was evident and predictable from the pre-referendum polls data. Unlike previous studies, whose analytical tools are limited to latest poll analysis, descriptive statistics, point estimates, and simple linear regression, this project uses more robust and sophisticated statistical methodologies.
Category: Statistics

[292] viXra:2101.0082 [pdf] submitted on 2021-01-13 14:07:54

Special Beta Distribution for Big Data Analysis: X ~ Beta (α = λ + 1, β = 2 – λ)

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 89 Pages. [Corrections made to conform with the requirements on the Submission Form]

This book discusses the special case of the Beta distribution with α = λ + 1 and β = 2 – λ. For comparison with the continuous Bernoulli distribution, we examine how changes in λ affect the pdf of the special Beta distribution. We then derive the sufficient statistic, the point estimator, the confidence interval, the test statistic, and the goodness of fit. The special Beta distribution in the case λ = 0.5 is different from the continuous Bernoulli distribution. The special Beta distribution pdf changes smoothly, whereas the continuous Bernoulli distribution pdf has a big wave as λ goes from small to large. As the sample size becomes large, the two distributions approximate the Normal distribution, with different relationships between λ and the sum of samples.
Category: Statistics

[291] viXra:2101.0046 [pdf] submitted on 2021-01-06 17:52:45

Analyzing the Side Force on a Baseball Using Hawk-Eye Measurements

Authors: Glenn Healey, Lequan Wang
Comments: 19 Pages.

We use Hawk-Eye measurements to analyze the side force on a baseball.
Category: Statistics

[290] viXra:2101.0034 [pdf] submitted on 2021-01-05 09:18:19

Multi Categories Analytic Method Using Continuous Bernoulli Distribution and Conditional Distribution

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 195 Pages.

This book provides four model designs to discuss how the continuous Bernoulli distribution extends to the analysis of K categories. In contrast to the discrete polynomial distribution, which is extended from the Bernoulli distribution by means of the additive property, the random variable of the continuous Bernoulli must have its pdf, cdf and distribution tested and checked for whether it maintains the characteristics of the CB distribution. Model 1 is from the random variable method (variable-added); Models 2 and 3 are from probability model-building and are suitable for the parameter-added case and the conditional relationship of variables, respectively. Model 4 is from the continuous trinomial distribution and is suitable for the joint relationship of variables.
Category: Statistics

[289] viXra:2012.0221 [pdf] submitted on 2020-12-30 12:07:53

The Continuous Bernoulli Approaching Distribution When λ → 0 and the Continuous Binomial Distribution

Authors: Kuan-Shian Wang, Mei-Yu Lee
Comments: 37 Pages. [Corrections are made by viXra Admin to comply with the rules of viXra.org]

We provide the mathematical deduction and numerical explanations to verify that as λ → 0, the continuous Bernoulli approximates to the exponential distribution in Chapter 1 and as λ → 0 and λ → 1, the continuous binomial distribution will approximate to Gamma distribution in Chapter 3. Meanwhile, Chapter 2 describes how to compute the continuous Binomial distribution which can be derived by the continuous Bernoulli.
Category: Statistics

[288] viXra:2012.0088 [pdf] submitted on 2020-12-12 09:51:59

Continuous Bernoulli Distribution-Simulator and Test Statistic

Authors: Kuan-Sian Wang, Mei-Yu Lee
Comments: Pages.

We discuss the simulator and test statistic of the continuous Bernoulli distribution, which is important to test the pervasive error of variational autoencoders in deep learning. We provide the sufficient statistic, the point estimator, the confidence interval, the test statistic, goodness of fit, and a one-way test for the continuous Bernoulli distribution. In addition, the continuous binomial distribution can be derived, so the confidence interval and the test can be worked out for two continuous Bernoulli populations. The continuous trinomial distribution can also be found. Please download the computer software of this book from https://github.com/meiyulee/continuous_Bernoulli
Category: Statistics

[287] viXra:2012.0044 [pdf] submitted on 2020-12-07 13:36:14

On a Linnik Theorem in Theory of Errors

Authors: Abdelmajid Ben Hadj Salem
Comments: 7 Pages. In French.

In this note, we give a proof of a theorem of Linnik concerning the theory of errors, stated in his book "Least squares method and the mathematical bases of the statistical theory of the treatment of observations", without proof.
Category: Statistics

[286] viXra:2012.0038 [pdf] submitted on 2020-12-06 14:50:31

Automatic Emulator and Optimized Look-up Table Generation for Radiative Transfer Models

Authors: L. Martino, J. Vicent, G. Camps-Valls
Comments: 5 Pages.

This paper introduces an automatic methodology to construct emulators for costly radiative transfer models (RTMs). The proposed method is sequential and adaptive, and it is based on the notion of the acquisition function by which instead of optimizing the unknown RTM underlying function we propose to achieve accurate approximations. The Automatic Gaussian Process Emulator (AGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the accurate design of an acquisition function that favors sampling in low density regions and flatness of the interpolation function. We illustrate the good capabilities of the method in toy examples and for the construction of an optimal look-up-table for atmospheric correction based on MODTRAN5.
Category: Statistics

[285] viXra:2012.0037 [pdf] submitted on 2020-12-06 19:04:22

Adaptive Sequential Interpolator Using Active Learning for Efficient Emulation of Complex Systems

Authors: L. Martino, D. Heestermans Svendsen, J. Vicent, G. Camps-Valls
Comments: 5 Pages.

Many fields of science and engineering require the use of complex and computationally expensive models to understand the involved processes in the system of interest. Nevertheless, due to the high cost involved, the required study becomes a cumbersome process. This paper introduces an interpolation procedure which belongs to the family of active learning algorithms, in order to construct cheap surrogate models of such costly complex systems. The proposed technique is sequential and adaptive, and is based on the optimization of a suitable acquisition function. We illustrate its efficiency in a toy example and for the construction of an emulator of an atmosphere modeling system.
Category: Statistics

[284] viXra:2012.0036 [pdf] submitted on 2020-12-06 19:06:41

Particle Group Metropolis Methods for Tracking the Leaf Area Index

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Monte Carlo (MC) algorithms are widely used for Bayesian inference in statistics, signal processing, and machine learning. In this work, we introduce a Markov Chain Monte Carlo (MCMC) technique driven by a particle filter. The resulting scheme is a generalization of the so-called Particle Metropolis-Hastings (PMH) method, where a suitable Markov chain of sets of weighted samples is generated. We also introduce a marginal version for the goal of jointly inferring dynamic and static variables. The proposed algorithms outperform the corresponding standard PMH schemes, as shown by numerical experiments.
Category: Statistics

[283] viXra:2012.0035 [pdf] submitted on 2020-12-06 15:16:02

Group Metropolis Sampling

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. Two well-known classes of MC methods are the Importance Sampling (IS) techniques and the Markov Chain Monte Carlo (MCMC) algorithms. In this work, we introduce the Group Importance Sampling (GIS) framework where different sets of weighted samples are properly summarized with one summary particle and one summary weight. GIS facilitates the design of novel efficient MC techniques. For instance, we present the Group Metropolis Sampling (GMS) algorithm which produces a Markov chain of sets of weighted samples. GMS in general outperforms other multiple try schemes as shown by means of numerical simulations.
Category: Statistics

[282] viXra:2012.0034 [pdf] submitted on 2020-12-05 11:18:45

Joint Gaussian Processes for Inverse Modeling

Authors: D. Heestermans Svendsen, L. Martino, M. Campos-Taberner, G. Camps-Valls
Comments: 5 Pages.

Solving inverse problems is central in geosciences and remote sensing. Very often a mechanistic physical model of the system exists that solves the forward problem. Inverting the implied radiative transfer model (RTM) equations numerically implies, however, challenging and computationally demanding problems. Statistical models tackle the inverse problem and predict the biophysical parameter of interest from radiance data, exploiting either in situ data or simulated data from an RTM. We introduce a novel nonlinear and nonparametric statistical inversion model which incorporates both real observations and RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid framework for exploiting the regularities between the two types of data, in order to perform inverse modeling. Advantages of the JGP method over competing strategies are shown on both a simple toy example and in leaf area index (LAI) retrieval from Landsat data combined with simulated data generated by the PROSAIL model.
Category: Statistics

[281] viXra:2012.0033 [pdf] submitted on 2020-12-05 11:25:51

Distributed Particle Metropolis-Hastings Schemes

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

We introduce a Particle Metropolis-Hastings algorithm driven by several parallel particle filters. The communication with the central node requires the transmission of only a set of weighted samples, one per filter. Furthermore, the marginal version of the previous scheme, called Distributed Particle Marginal Metropolis-Hastings (DPMMH) method, is also presented. DPMMH can be used for making inference on both a dynamical and static variable of interest. The ergodicity is guaranteed, and numerical simulations show the advantages of the novel schemes.
Category: Statistics

[280] viXra:2012.0032 [pdf] submitted on 2020-12-05 22:19:11

Probabilistic Cross-Validation Estimators for Gaussian Process Regression

Authors: L. Martino, V. Laparra, G. Camps-Valls
Comments: 5 Pages.

Gaussian Processes (GPs) are state-of-the-art tools for regression. Inference of GP hyperparameters is typically done by maximizing the marginal log-likelihood (ML). If the data truly follows the GP model, using the ML approach is optimal and computationally efficient. Unfortunately, very often this is not the case and suboptimal results are obtained in terms of prediction error. Alternative procedures such as cross-validation (CV) schemes are often employed instead, but they usually incur high computational costs. We propose a probabilistic version of CV (PCV) based on two different model pieces in order to reduce the dependence on a specific model choice. PCV presents the benefits from both approaches, and allows us to find the solution for either the maximum a posteriori (MAP) or the Minimum Mean Square Error (MMSE) estimators. Experiments in controlled situations reveal that the PCV solution outperforms ML for both estimators, and that the PCV-MMSE results outperform other traditional approaches.
Category: Statistics

[279] viXra:2012.0031 [pdf] submitted on 2020-12-05 22:21:01

Recycling Gibbs Sampling

Authors: L. Martino, V. Elvira, G. Camps-Valls
Comments: 5 Pages.

Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning and statistics. The key point for the successful application of the Gibbs sampler is the ability to draw samples from the full-conditional probability density functions efficiently. In the general case this is not possible, so in order to speed up the convergence of the chain, it is required to generate auxiliary samples. However, such intermediate information is finally disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. Theoretical and exhaustive numerical comparisons show the validity of the approach.
Category: Statistics
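
For context on viXra:2012.0031 above, the following Python sketch shows a plain Gibbs sampler in a case where the full conditionals can be sampled exactly: a bivariate normal with zero means, unit variances and correlation `rho`. It illustrates only the standard algorithm; the recycling of auxiliary samples proposed in the paper is not implemented. The target, the burn-in length and the function name `gibbs_bivariate_normal` are assumptions.

```python
import math
import random


def gibbs_bivariate_normal(n_samples, rho=0.8, burn_in=500, seed=0):
    """Standard Gibbs sampler for a zero-mean, unit-variance bivariate normal.

    Full conditionals: x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for t in range(n_samples + burn_in):
        x = rng.gauss(rho * y, cond_std)  # draw from p(x | y)
        y = rng.gauss(rho * x, cond_std)  # draw from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(20_000)
    # With unit variances, E[x * y] equals the correlation rho.
    print("estimate of E[xy]:", sum(a * b for a, b in draws) / len(draws))
```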

[278] viXra:2012.0030 [pdf] submitted on 2020-12-05 22:23:48

Multioutput Automatic Emulator for Radiative Transfer Models

Authors: D. Heestermans Svendsen, L. Martino, J. Vicent, G. Camps-Valls
Comments: 4 Pages.

This paper introduces a methodology to construct emulators of costly radiative transfer models (RTMs). The proposed methodology is sequential and adaptive, and it is based on the notion of acquisition functions in Bayesian optimization. Here, instead of optimizing the unknown underlying RTM function, one aims to achieve accurate approximations. The Automatic Multi-Output Gaussian Process Emulator (AMOGAPE) methodology combines the interpolation capabilities of Gaussian processes (GPs) with the accurate design of an acquisition function that favors sampling in low density regions and flatness of the interpolation function. We illustrate the promising capabilities of the method for the construction of an emulator for a standard leaf-canopy RTM.
Category: Statistics

[277] viXra:2011.0183 [pdf] submitted on 2020-11-26 11:08:04

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists in comparing the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station.
Category: Statistics

[276] viXra:2011.0015 [pdf] submitted on 2020-11-02 21:38:17

Probability and Stochastic Analysis in Reproducing Kernels and Division by Zero Calculus

Authors: Tsutomu Matsuura, Hiroshi Okumura, Saburou Saitoh
Comments: 21 Pages.

Professor Rolin Zhang kindly extended an invitation to The 6th Int'l Conference on Probability and Stochastic Analysis (ICPSA 2021), January 5-7, 2021 in Sanya, China as a Keynote speaker, and so we will state the basic interrelations between reproducing kernels and division by zero from the viewpoint of the conference topics. The connections between reproducing kernels and Probability and Stochastic Analysis are already fundamental and well-known, and so we will mainly refer to the basic relations with our new division by zero $1/0=0/0=z/0=\tan(\pi/2) =\log 0 =0, [(z^n)/n]_{n=0} = \log z$, $[e^{(1/z)}]_{z=0} = 1$.
Category: Statistics

[275] viXra:2010.0257 [pdf] submitted on 2020-10-31 19:46:07

Hidden Markov Model Evaluation from First Principles

Authors: Russell Leidich
Comments: 9 Pages. [Corrections made by viXra Admin to conform with the requirements on the Submission Form]

Hidden Markov models (HMMs) are a class of generative stochastic process models which seek to explain, in the simplest possible terms subject to inherent structural constraints, a set of equally long sequences (time series) of observations. Given such a set, an HMM can be trivially constructed which will reproduce the set exactly. Such an approach, however, would amount to overfitting the data, yielding a model that fails to generalize to new observations of the same physical system under analysis. It’s therefore important to consider the information cost (entropy) of describing the HMM itself – not just the entropy of reproducing the observations, which would be zero in the foregoing extreme case, but in general would be the negative log of the probability of such reproduction occurring by chance. The sum of these entropies would then be suitable for the purpose of ranking a set of candidate HMMs by their respective likelihoods of having actually generated the observations in the first place. To the author’s knowledge, however, no approach has yet been derived for the purpose of measuring HMM entropy from first principles, which is the subject of this paper, notwithstanding the popular use of the Bayesian information criterion (BIC) for this purpose.
Category: Statistics

[274] viXra:2010.0002 [pdf] submitted on 2020-10-01 10:42:20

Random Walks Are Not So Random, After All

Authors: Arturo Tozzi
Comments: 9 Pages.

Physical and biological phenomena are often portrayed in terms of random walks, white noise, Markov paths, and stochastic trajectories with subsequent symmetry breaks. Here we show that this approach from dynamical systems theory is not profitable when random walks occur in phase spaces of dimensions higher than two. The more dimensions there are, the more the (seemingly) stochastic paths are constrained, because their trajectories cannot return to the starting point. This means that high-dimensional tracks, ubiquitous in real world physical/biological phenomena, cannot be operationally treated in terms of closed paths, symplectic manifolds, Betti numbers, Jordan theorem, topological vortexes. This also means that memoryless events disconnected from the past such as Markov chains cannot exist in high dimensions. Having expunged the operational role of random walks in the assessment of experimental phenomena, we aim to somewhat “redeem” stochasticity. We suggest two methodological accounts alternative to random walks that partially rescue the operational role of white noise and Markov chains. The first option is to assess multidimensional systems in lower dimensions; the second option is to establish a different role for random walks. We describe the two alternatives at length and provide heterogeneous examples from boosting chemistry, tunneling nanotubes, backward entropy, chaotic attractors.
Category: Statistics

[273] viXra:2009.0135 [pdf] submitted on 2020-09-19 11:03:03

Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 50 Pages.

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods such as a probabilistic version of the well-known kernel ridge regression, drawing connections among them via dual formulations, and discussing their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data-science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics

[272] viXra:2009.0082 [pdf] submitted on 2020-09-12 12:49:17

Importance of Statistical Tools and Methods in Data Science

Authors: Krish Bajaj
Comments: 10 Pages.

This paper aims to highlight the prominent position of statistics as a foundational pillar for descriptive and inferential statistical analysis to deduce underlying patterns in a population by looking at a sample drawn from the population. It focuses on the intuitive aspects of the statistical tools and their relevance and applicability. The paper concludes by highlighting some common misconceptions and misuses of statistics.
Category: Statistics

[271] viXra:2008.0131 [pdf] submitted on 2020-08-18 20:30:23

Combining Radar, Weather, and Optical Measurements to Model the Dependence of Baseball Lift on Spin and Surface Roughness

Authors: Glenn Healey, Lequan Wang
Comments: 24 Pages.

An accurate model for the lift force on a baseball is important for several applications. The precision of previous models has been limited by the use of small samples of measurements acquired in controlled experiments. The increased prevalence of ball-tracking radar systems provides an abundant source of data for modeling, but the effective use of these data requires overcoming several challenges. We develop a new model that uses this radar data and is constrained by the physical principles and measurements derived from the controlled experiments. The modeling process accounts for the uncertainty in different data sources while exploiting the size and diversity of the radar measurements to mitigate the effects of systematic biases, outliers, and the lack of geometric information that is typically available in controlled experiments. Fine-grained weather data is associated with each radar measurement to enable compensation for the local air density. We show that the new model is accurate enough to capture changes in lift due to small changes in surface roughness which could not be discerned by previous models.
Category: Statistics

[270] viXra:2008.0107 [pdf] submitted on 2020-08-15 11:37:29

Application of Markov Chain Model in Completion Rates

Authors: Idd Sifael Omary, Ngong-homa Jackson, Timothy A. Peter
Comments: 35 Pages. BSc. (Mathematics and Statistics) Research Report Mwenge Catholic University July, 2016.

Completion rate and enrollment forecasting is an essential element in budgeting, resource allocation, and the overall planning for the growth of the education sector. Our paper demonstrates the use of Markov chain techniques in studying the progression of BSMST Programme students from the time of entry/enrollment in each academic year to graduation after the expected years of study at MWECAU. The target population included all BSMST Programme students at MWECAU from 2013 to 2015. The model was used to determine the students' completion/dropout rate, retention rate, and expected duration of completion by sex. We established the completion rates for male and female students, as well as the dropout rates. We examined how long it takes the Markov transition probability matrices of BSMST students at MWECAU to reach a steady state, and how the established completion and dropout rates behave as absorbing states. We also compared female students' expectation of university education with that of male students in the BSMST Programme. The model was suitable only for making short-term projections.
Category: Statistics

[269] viXra:2008.0065 [pdf] submitted on 2020-08-10 16:54:00

La Théorie des Erreurs (Theory of Errors)

Authors: Philippe Hottier, Abdelmajid Ben Hadj Salem
Comments: 137 Pages. In French. Comments welcome.

This is a digital version of the manuscript of a course on the theory of errors given by Engineer-in-Chief Philippe Hottier in the 1980s at the French National School of Geographic Sciences. The course gives the foundations of the method of least squares for the case of linear models.
Category: Statistics

[268] viXra:2007.0240 [pdf] submitted on 2020-07-30 21:01:00

An Alternative Model of Probability Theory

Authors: D Williams
Comments: 15 Pages.

An alternative model of probability theory is given and compared with the standard version. Difficulties in extending the Central Limit Theorem for sums of random variables (rather than averages) are shown and then resolved using the new model and dx-less integrals. Some new types of sample means are proposed and tested against the standard version.
Category: Statistics

[267] viXra:2006.0023 [pdf] submitted on 2020-06-03 09:40:46

Causal Inference for COVID-19 Interventions

Authors: Vikas Ramachandra
Comments: 14 Pages.

The exponential spread of the COVID-19 pandemic has caused countries to impose drastic measures on the public including social distancing, movement restrictions and lockdowns. These government interventions have led to different mobility patterns for the populations. We propose a method of causal inference using community mobility datasets to determine the treatment effects of government interventions on population mobility related outcomes. We first identify the changepoint based on the data of government interventions. We also perform changepoint detection to verify that there is indeed a changepoint at the time of intervention. Then we estimate the mobility trends using a Bayesian structural causal model and project the counterfactual. This is compared to the actual values after interventions to give the treatment effect of interventions. As a specific example, we analyze mobility trends in India before and after interventions. Our analysis shows that there are significant changes in mobility due to government interventions. Our paper aims to provide insights into changes in response to government measures and we hope that it is helpful to those making critical decisions to combat COVID-19.
Category: Statistics

[266] viXra:2006.0014 [pdf] submitted on 2020-06-01 12:18:06

Conditio Sine Qua Non

Authors: Ilija Barukčić
Comments: 27 pages. (C) Ilija Barukčić, 2020, Jever, Germany. All rights reserved.

Aims: Different processes or events which are objectively given and real are equally one of the foundations of human life (necessary conditions) too. However, a generally accepted, logically consistent (bio)-mathematical description of these natural processes is still not in sight. Methods: Discrete random variables are analysed. Results: The mathematical formula of the necessary condition is developed. The impact of study design on the results of a study is considered. Conclusion: Study data can be analysed for necessary conditions.
Category: Statistics

[265] viXra:2005.0215 [pdf] submitted on 2020-05-21 20:15:49

Introduction to Neutrosophic Statistics Translated Arabic Version مقدمة في الاحصاء النيوتروسوفكي

Authors: Huda E. Khalid, Ahmed K. Essa
Comments: 167 Pages. ISBN: 978-1-59973-906-9

Although neutrosophic statistics was defined in 1996 and then published in 1998 in the book titled "Neutrosophy / Neutrosophic Logic, Set, and Probability", it has not received much attention or development to this day. The same was true of neutrosophic probability, apart from a few scattered articles that saw modest development hardly commensurate with the generality of the underlying idea, and that were published in 2013 in the book titled "Introduction to Neutrosophic Measure, Integral and Probability". Neutrosophic statistics is an extension of traditional (classical) statistics in which one deals with set values rather than crisp values, so that in most classical statistical equations and formulas it is easy to replace several numbers with sets. That is, operations are carried out on sets instead of on numbers, using indeterminate parameters (imprecise, uncertain, or even completely unknown) in place of the usual parameters familiar from classical statistics.
Category: Statistics

[264] viXra:2005.0182 [pdf] submitted on 2020-05-17 18:03:58

Estimated Life Expectancy Impact of Sars-Cov-2 Infection on the Entire German Population

Authors: Tobias Martens, Wieland Lühder
Comments: 3 Pages.

The life expectancy of the currently living German population is calculated per age and as a weighted average. The same calculation is repeated after considering everyone infected with and potentially killed by SARS-CoV-2 within one year, given the current age-dependent lethality estimates from a study at London Imperial College [1]. For an average life expectancy of 83.0 years in the current population, the reduction due to SARS-CoV-2 infection amounts to 2.0 (1.1-3.9) months. The individual values show a maximum of 7.7 (4.4-15.2) months for a 70-year-old. People below age 50 lose less than 1 month on average.
Category: Statistics

[263] viXra:2004.0452 [pdf] submitted on 2020-04-19 11:42:28

Multiple Sclerosis is Caused by an Epstein Bar Virus Infection

Authors: Ilija Barukčić
Comments: 17 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Aim: The relationship between Epstein-Barr virus and multiple sclerosis is assessed once again in order to gain a better understanding of this disease. Methods: A systematic review and meta-analysis is provided, aimed at answering, among others, the following question: is there a cause-effect relationship between Epstein-Barr virus and multiple sclerosis? The conditio sine qua non relationship was used to test the hypothesis: without an Epstein-Barr virus infection, no multiple sclerosis. The mathematical formula of the causal relationship k was used to test the hypothesis of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis. Significance was indicated by a p-value of less than 0.05. Results: The data of the studies analysed provide evidence that an Epstein-Barr virus infection is a necessary condition (a conditio sine qua non) of multiple sclerosis. Moreover, the data of the studies analysed provide impressive evidence of a cause-effect relationship between Epstein-Barr virus infection and multiple sclerosis. Conclusion: Multiple sclerosis is caused by an Epstein-Barr virus infection.
Category: Statistics

[262] viXra:2004.0425 [pdf] submitted on 2020-04-17 13:15:53

Automatic Tempered Posterior Distributions for Inverse Problems

Authors: Luca Martino
Comments: 7 Pages.

We propose a new Monte Carlo technique for Bayesian inversion problems. The power of the noise perturbation in the observation model is estimated jointly with the rest of the parameters. Moreover, it is also used as a tempering parameter. Hence, a sequence of tempered posterior densities is considered, where the tempering parameter is automatically selected according to the current estimate of the power of the noise perturbation.
Category: Statistics

[261] viXra:2004.0060 [pdf] submitted on 2020-04-02 23:05:18

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 11 Pages.

It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following problem by means of statistics: In 3-dimensional Euclidean space, for point particles whose speeds are c and whose directions are uniformly distributed in space (assuming these particles’ reference system is R0 if their average velocity is 0), when some particles (assuming their reference system is Ru), as a particle swarm, move in a certain direction with a group speed u (i.e., the norm of the average velocity) relative to R0, their (or the sub-particle swarm’s) average speed relative to Ru is slower than that of particles (or the same scale sub-particle swarm) in R0 relative to R0. The degree of slowing depends on the speed u of Ru and accords with the quantitative $c^2 - u^2$ relationship described by the Lorentz factor.
Category: Statistics

[260] viXra:2003.0340 [pdf] submitted on 2020-03-16 13:55:18

Modeling and Projecting Offensive Value Using Combined Hit-Tracking and Speed Measurements

Authors: Glenn Healey
Comments: 20 Pages.

Outcome-based statistics for representing batter and pitcher skill have been shown to have a low degree of repeatability due to the effects of multiple confounding variables such as the defense, weather, and ballpark. Statistics derived from pitch and hit-tracking data acquired by the Statcast system have been shown to provide greater repeatability and predictive value than outcome-based statistics. The wOBA cube representation uses three-dimensional hit-tracking data to compute intrinsic batted ball statistics for batters and pitchers. While providing more reliable measures than outcome-based statistics, this representation also revealed that running speed is an important determinant of batter success. We address this issue by building a four-dimensional model for a batted ball's value as a function of its physical contact parameters and the batter's time-to-first speed.
Category: Statistics

[259] viXra:2002.0368 [pdf] submitted on 2020-02-19 13:31:17

The Risk Ratio is Logically Inconsistent

Authors: Ilija Barukčić
Comments: 9 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Many different measures of association are used in the medical literature; the relative risk is one of these measures. However, to judge whether the results of studies are reliable, it is essential to use, among others, measures of association which are logically consistent. In this paper, we present how to deal with one of the most commonly used measures of association, the relative risk. The conclusion is inescapable that the relative risk is logically inconsistent and should not be used any longer.
Category: Statistics

[258] viXra:2001.0650 [pdf] submitted on 2020-01-29 12:50:56

An Example of The Use of The Least-Squares Method

Authors: Abdelmajid Ben Hadj Salem
Comments: 9 Pages. In French.

In this paper, we present an example of the use of the least-squares method in topographic and surveying works.
Category: Statistics

[257] viXra:2001.0052 [pdf] submitted on 2020-01-04 16:39:29

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratios of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics

[256] viXra:2001.0037 [pdf] submitted on 2020-01-03 14:40:30

Anomaly Detection for Cybersecurity: Time Series Forecasting and Deep Learning

Authors: Giordano Colò
Comments: 32 Pages.

Finding anomalies when dealing with a great amount of data creates issues related to the heterogeneity of different values and to the difficulty of modelling trend data over time. In this paper we combine the classical methods of time series analysis with deep learning techniques, with the aim of improving the forecast when facing time series with long-term dependencies. Starting with forecasting methods and comparing the expected values with the observed ones, we find anomalies in time series. We apply this model to a bank cybersecurity case to find anomalous behavior related to the usage of branch applications.
Category: Statistics

[255] viXra:2001.0003 [pdf] submitted on 2020-01-01 06:38:26

Probability Models and Ultralogics

Authors: Robert A. Herrmann
Comments: 10 Pages.

In this paper, we show how nonstandard consequence operators, ultralogics, can generate the general informational content displayed by probability models. In particular, we consider models that state a specific probability that an event will occur and models that use a specific distribution to predict that an event will occur. These results have many diverse applications and even apply to the collapse of the wave function.
Category: Statistics

[254] viXra:1912.0129 [pdf] submitted on 2019-12-06 12:00:41

Statins and Death Due to Any Cause – All Doubts Removed?

Authors: Ilija Barukčić
Comments: 29 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Objective: To date, it is quite common to claim that some patient groups benefit from statin therapy in both primary and secondary prevention of cardiovascular disease while equally the use of higher-intensity statin therapies is emphasized. In this review, the efficacy of statin therapy in light of the study data available is explored. Methods: All in all, 40 studies with a sample size of n = 88388 were re-analyzed. The exclusion relationship was used to test the null-hypothesis: a certain statin does exclude death due to any cause. The causal relationship k was used to test the data for causality. The level of significance was set to Alpha = 0,05. Results: The data of the studies reanalyzed provide convincing evidence that statins unfortunately do not exclude death due to any cause. An immediate statin therapy discontinuation should be considered. Conclusions: Overwhelming evidence suggests that the potential harmful effects of statin therapy far outweigh any real or perceived benefit. Keywords: Statins, death, causal relationship. Barukcic@t-online.de
Category: Statistics

[253] viXra:1911.0237 [pdf] submitted on 2019-11-13 13:18:58

Human Cytomegalovirus is the Cause of Essential Hypertension

Authors: Ilija Barukčić
Comments: 26 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Objective: To our knowledge, no study has provided strict evidence of a clear relationship between a human cytomegalovirus (HCMV) infection and human essential hypertension (EH). Methods: To examine the possible role of HCMV in the etiology of EH, a literature search of the electronic database PubMed was performed. Data were accurately assessed and re-analyzed by new statistical methods. Results: The meta-analysis results of this study provide evidence that HCMV infection and essential hypertension are connected. Conclusions: Without HCMV infection no EH. Keywords: Human cytomegalovirus, essential hypertension, causal relationship.
Category: Statistics

[252] viXra:1911.0184 [pdf] submitted on 2019-11-10 04:05:32

Without a Varicella Zoster Virus Infection, no Schizophrenia

Authors: Ilija Barukčić
Comments: 17 pages. Copyright © 2018 by Ilija Barukčić, Jever, Germany. Published by:

Objective: Despite decades of research and major efforts, a cause or the cause of schizophrenia is still not identified. Although many studies indicate that infectious agents are related to schizophrenia, no definite consensus has been reached on this issue. Methods: The purpose of this study was to investigate the relationship between varicella zoster virus (VZV) and schizophrenia while relying on new statistical methods. Results: The meta-analysis results provide striking evidence that VZV is a necessary condition of schizophrenia. Conclusions: There is some weak evidence that VZV infection is the cause of schizophrenia. Keywords: Varicella zoster virus, schizophrenia, causal relationship.
Category: Statistics

[251] viXra:1911.0024 [pdf] submitted on 2019-11-01 11:17:10

The P Value of Likely Extreme Events

Authors: Ilija Barukčić
Comments: 21 pages.

Objective: Sometimes there are circumstances where it is necessary to calculate the P Value of extremely likely events xt like p(xt) = 1 while reliable methods are rare. Methods: A systematic approach to the problem of the P Values of extremely likely events is provided. Results: New theorems for calculating P Values of extremely likely events are developed. Conclusions: It is possible to calculate the P Values even of extreme events. E-mail: Barukcic@t-online.de Keywords: P Value, likely events, cause, effect, causal relationship.
Category: Statistics

[250] viXra:1910.0656 [pdf] submitted on 2019-10-31 17:01:19

On Maximum Likelihood Estimates for the Shape Parameter of the Generalized Pareto Distribution

Authors: Kouider Mohammed Ridha
Comments: 6 Pages.

The generalized Pareto distribution (GPD) has been widely used in extreme value analysis, for example to model exceedances over a threshold. The behavior of the GPD when applied to real data sets depends substantially on the parameter estimation process. Maximum likelihood is usually the preferred estimation method because it yields a consistent estimator with the lowest bias and variance. The objective of the present study is to develop efficient methods for computing the maximum likelihood estimator of the shape parameter, or extreme value index. These are based on numerical methods for maximizing the log-likelihood, and we introduce an algorithm for computing the maximum likelihood estimates of the GPD parameters. Finally, numerical examples are given to illustrate the results and to investigate the behavior of the method.
Category: Statistics
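
For readers of the entry above, the log-likelihood that such a numerical method would maximize can be sketched in its standard textbook form. The notation is assumed here for illustration and is not taken from the paper: $\xi$ is the shape parameter, $\sigma > 0$ the scale parameter, and $x_1,\ldots,x_n$ the exceedances over the threshold.

$$ F(x) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}, \qquad \xi \neq 0 $$

$$ \ell(\xi,\sigma) = -n \ln \sigma - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{n} \ln\left(1 + \frac{\xi x_{i}}{\sigma}\right), \qquad 1 + \frac{\xi x_{i}}{\sigma} > 0 \ \text{for all } i $$

No closed-form maximizer exists in general, which is why the estimates are obtained numerically.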

[249] viXra:1910.0219 [pdf] submitted on 2019-10-13 10:03:32

Herpes Simplex Virus Type 1 is the Cause of Alzheimer’s Disease

Authors: Ilija Barukčić
Comments: 46 Pages.

Objective: The possible involvement of viruses, specifically Herpes simplex virus type 1 (HSV-1), in senile dementia of the Alzheimer type has been investigated by numerous publications. Over 120 publications provide direct or indirect evidence of a potential relationship between Herpes simplex virus type 1 and Alzheimer’s disease (AD), but a causal relation has not yet been established. Methods: A systematic review and re-analysis of studies which investigated the relationship between HSV-1 and AD by HSV-1 immunoglobulin G (IgG) serology and polymerase chain reaction (PCR) methods was conducted. The method of the conditio sine qua non relationship (SINE) was used to test the hypothesis: without HSV-1 infection of human brain no AD. The method of the conditio per quam relationship (IMP) was used to test the hypothesis: if HSV-1 infection of human brain then AD. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between HSV-1 and AD. Significance was indicated by a p-value of less than 0.05. Results: The studies analyzed were able to provide strict evidence that HSV-1 is a necessary condition (a conditio sine qua non), a sufficient condition and a necessary and sufficient condition of AD. Furthermore, the cause-effect relationship between HSV-1 and AD was highly significant. Conclusions: The data analyzed provide sufficient evidence to conclude that HSV-1 is the cause of AD. Keywords: Herpes simplex virus type 1, Alzheimer’s disease, causal relationship.
Category: Statistics

[248] viXra:1909.0376 [pdf] submitted on 2019-09-17 06:56:34

A Fully Bayesian Solution to K-Sample Tests for Comparison and the Behrens-Fisher Problem Based on the Henstock-Kurzweil Integral

Authors: Fabrice J. P. R. Pautot
Comments: 17 Pages.

We present a simple, fully probabilistic, Bayesian solution to k-sample omnibus tests for comparison, with the Behrens-Fisher problem as a special case, which is free from the many defects found in the standard, classical frequentist, likelihoodist and Bayesian approaches to those problems. We solve the main measure-theoretic difficulty for degenerate problems with continuous parameters of interest and Lebesgue-negligible point null hypothesis by approximating the corresponding continuous random variables by sequences of discrete ones defined on partitions of the parameter spaces and by taking the limit of the prior-to-posterior ratios of the probability of the null hypothesis for the corresponding discrete problems. Those limits are well defined under proper technicalities thanks to the Henstock-Kurzweil integral that is as powerful as the Lebesgue integral but still relies on Riemann sums, which are essential in the present approach. The solutions to the relative continuous problems take the form of Bayes-Poincaré factors that are new objects in Bayesian probability theory and should play a key role in the general theory of point null hypothesis testing, including other important problems such as the Jeffreys-Lindley paradox.
Category: Statistics

[247] viXra:1908.0288 [pdf] submitted on 2019-08-15 11:43:49

Learning a Function Over Distributions

Authors: Glenn Healey, Shiyuan Zhao
Comments: 16 Pages.

We present a method for learning a function over distributions. The method is based on generalizing nonparametric kernel regression by using the earth mover's distance as a metric for distribution space. The technique is applied to the problem of learning the dependence of pitcher performance in baseball on multidimensional pitch distributions that are controlled by the pitcher. The distributions are derived from sensor measurements that capture the physical properties of each pitch. Finding this dependence allows the recovery of optimal pitch frequencies for individual pitchers. This application is amenable to the use of signatures to represent the distributions and a whitening step is employed to account for the correlations and variances of the pitch variables. Cross validation is used to optimize the kernel smoothing parameter. A set of experiments demonstrates that the method accurately predicts changes in pitcher performance in response to changes in pitch distribution.
Category: Statistics
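
The idea in the entry above, kernel regression where the inputs are distributions compared by the earth mover's distance, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional samples of equal size (where the earth mover's distance reduces to the mean absolute difference of sorted values), a Gaussian kernel, and a hand-picked bandwidth, and it omits the signatures, whitening, and cross validation mentioned in the abstract. The function names `emd_1d` and `kernel_regress` are hypothetical.

```python
import math


def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the real line, the optimal
    transport plan matches sorted values, so the distance is the mean
    absolute difference between the sorted samples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def kernel_regress(train_dists, train_targets, query, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel on the EMD.

    Each training distribution is weighted by how close it is to the
    query distribution; the prediction is the weighted mean of targets.
    """
    weights = [
        math.exp(-0.5 * (emd_1d(d, query) / bandwidth) ** 2)
        for d in train_dists
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, train_targets)) / total


# Toy inputs (made up for illustration): two training distributions.
train = [[0.0, 1.0], [10.0, 11.0]]
targets = [1.0, 5.0]
near_first = kernel_regress(train, targets, [0.0, 1.0], bandwidth=1.0)
midway = kernel_regress(train, targets, [5.0, 6.0], bandwidth=1.0)
```

A query identical to the first training distribution is dominated by that distribution's target, while a query equidistant from both returns the average of the two targets.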

[246] viXra:1907.0430 [pdf] submitted on 2019-07-24 05:53:26

Without Oxygen no Burning Candle

Authors: Ilija Barukčić
Comments: 24 pages. Copyright © 2019 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Objective. Under certain circumstances, the results of multiple investigations – particularly, rigorously-designed trials, can be summarized by systematic reviews and meta-analyses. However, the results of properly conducted meta-analyses can but need not be stronger than single investigations, if (publication) bias is not considered to a necessary extent. Methods. In assessing the significance of publication bias due to study design simple to handle statistical measures for quantifying publication bias are developed and discussed which can be used as a characteristic of a meta-analysis. In addition, these measures may permit comparisons of publication biases between different meta-analyses. Results. Various properties and the performance of the new measures of publication bias are studied and illustrated using simulations and clearly described thought experiments. As a result, individual studies can be reviewed with a higher degree of certainty. Conclusions. Publication bias due to study design is a serious problem in scientific research, which can affect the validity and generalization of conclusions. The index of unfairness and the index of independence are of use to quantify publication bias and to improve the quality of systematic reviews and meta-analyses. Keywords: study design, study type, measuring technique, publication bias
Category: Statistics

[245] viXra:1907.0077 [pdf] submitted on 2019-07-04 06:22:03

Expansions of Maximum and Minimum from Generalized Maxwell Distribution

Authors: Jianwen Huang, Xinling Liu, Jianjun Wang
Comments: 13 Pages.

The generalized Maxwell distribution is an extension of the classic Maxwell distribution. In this paper, we concentrate on the joint distributional asymptotics of normalized maxima and minima. Under optimal normalizing constants, asymptotic expansions of the joint distribution and density for normalized partial maxima and minima are established. These expansions are used to deduce the speeds of convergence of the joint distribution and density of normalized maxima and minima to their corresponding ultimate limits. Numerical analyses are provided to support our results.
Category: Statistics

[244] viXra:1906.0370 [pdf] submitted on 2019-06-19 09:29:49

Formulation of the Classical Probability and Some Probability Distributions Due to Neutrosophic Logic and Its Impact on Decision Making (Arabic Version)

Authors: Rafif Alhabib, Moustafa Mzher Ranna, Haitham Farah, A.A. Salama
Comments: 169 Pages.

The core of our research is the application of neutrosophic logic to part of classical probability theory, by presenting classical probability and some probability distributions according to neutrosophic logic and then studying the effect of using this logic on the decision-making process, with continual comparison between classical logic and neutrosophic logic through the studies and results. The thesis comprises five chapters.
Category: Statistics

[243] viXra:1905.0538 [pdf] submitted on 2019-05-29 00:56:35

Atherosclerosis is an Infectious Disease

Authors: Ilija Barukčić
Comments: 16 pages. Copyright © 2019 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Aim: Rheumatoid arthritis (RA) is associated with increased risk of coronary artery disease (CAD). Studies reported that anti-rheumatic drug usage is associated with decreased risk of CAD events in RA patients. This study was conducted to investigate the effect of some anti-inflammatory drugs (etanercept, leflunomide, etoricoxib) on the development of CAD events among patients with RA using anti-rheumatic drugs in comparison with nonusers. Methods: A systematic review of CAD events in RA patients who used leflunomide, etanercept and etoricoxib was performed, and these patients were compared with RA patients who do not use these drugs. The exclusion relationship and the causal relationship k were used to test the significance of the result. A p-value of < 0.05 was treated as significant. Results: Among RA patients, use of leflunomide (p (EXCL) =0,999022483; X2 (EXCL) = 0,06; k = -0,03888389; p-value ( k | HGD) =0,00037588), etanercept and etoricoxib was associated with significantly decreased incidence of CAD. The use of leflunomide, etanercept and etoricoxib excludes cardiac events in RA patients. Conclusion: The results of the study provide further support for the infectious hypothesis of atherosclerosis. Key words: atherosclerosis, rheumatoid arthritis, therapy, causal relationship
Category: Statistics

[242] viXra:1905.0371 [pdf] submitted on 2019-05-19 06:42:36

Glyphosate and Non-Hodgkin Lymphoma: no Causal Relationship

Authors: Ilija Barukčić
Comments: 29 pages. Copyright © 2019 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Objective: Herbicides are used worldwide by both residential and agricultural users. Based on the statistical analysis of some epidemiologic studies, the International Agency for Research on Cancer classified the broad-spectrum herbicide glyphosate (GS) in 2015 as potentially carcinogenic to humans, especially with respect to non-Hodgkin lymphoma (NHL). In this systematic review and re-analysis, the relationship between glyphosate and NHL was re-investigated. Methods: A systematic review and re-analysis of studies which investigated the relationship between GS and NHL was conducted. The method of the conditio sine qua non relationship, the method of the conditio per quam relationship, the method of the exclusion relationship and the mathematical formula of the causal relationship k were used to test the hypothesis. Significance was indicated by a p-value of less than 0.05. Results: The studies analyzed do not provide any direct or indirect evidence that NHL is caused by GS. Conclusion: In this re-analysis, no causal relationship was apparent between glyphosate and NHL and its subtypes. Keywords: Glyphosate, Non-Hodgkin lymphoma, no causal relationship
Category: Statistics

[241] viXra:1905.0345 [pdf] submitted on 2019-05-18 15:08:29

Conjecture Sur Les Familles Exponentielles

Authors: Idriss olivier BADO
Comments: 5 Pages.

In this article we establish some properties of random variables and then propose a conjecture related to the exponential family. This conjecture seems interesting to me. Our results are based on the consideration of continuous random variables $X_{i}$ defined on the same space $\Omega$ and following the same super-extra density law of parameter $\theta_{i}$ and canonical function $T$. Let $n\in \mathbb{N}^{*}$. Considering the random variable $J$ and $I$ a subset of $\{1,2,\dots,n\}$ such that $X_{J}=\inf_{i\in I}(X_{i})$, we show that $$\forall i\in I:\mathbb{P}( J=i)=\frac{\theta_{i}\prod_{j\in I}c(\theta_{j})}{\sum_{j\in I}\theta_{j}}\int_{T(\Omega)}e^{-x}dx.$$ We conjecture that if the density of $X_{i}$ is $c(\theta_{i})e^{-\theta_{i}T(x)}\mathbf{1}_{\Omega}(x)$, then there exist two functions $h$ and $r$ such that $$\forall i\in I:\mathbb{P}( J=i)=\frac{r(\theta_{i})\prod_{j\in I}h(\theta_{j})}{\sum_{j\in I}r(\theta_{j})}\int_{T(\Omega)}e^{-x}dx$$
Category: Statistics
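
As a rough numerical illustration of the kind of statement in the abstract above, the following sketch checks only the classical special case of independent exponential variables with rates $\theta_i$ (that is, $T(x)=x$ on $[0,\infty)$), where the probability that $X_i$ attains the minimum is known to be $\theta_i/\sum_j \theta_j$. It is not the paper's general formula; the function name `argmin_frequencies` and the rate values are hypothetical choices made for this example.

```python
import random

def argmin_frequencies(rates, n_trials=100_000, seed=0):
    """Estimate P(J = i), where J is the index of the smallest of
    independent exponential draws with the given rates."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    for _ in range(n_trials):
        draws = [rng.expovariate(rate) for rate in rates]
        counts[draws.index(min(draws))] += 1
    return [c / n_trials for c in counts]

# Hypothetical parameters theta_i, chosen only for illustration.
rates = [0.5, 1.0, 2.5]
estimated = argmin_frequencies(rates)
expected = [r / sum(rates) for r in rates]  # classical exponential result
for i, (est, exp) in enumerate(zip(estimated, expected)):
    print(f"i={i}: simulated {est:.3f}, theta_i/sum(theta) {exp:.3f}")
```

The simulated frequencies should agree with the closed-form ratios up to Monte Carlo error.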

[240] viXra:1905.0211 [pdf] submitted on 2019-05-14 14:10:00

Modelling Passive Forever Churn via Bayesian Survival Analysis

Authors: Gavin Steininger
Comments: 8 Pages.

This paper presents an approach to modelling passive forever churn (i.e., the probability that a user never returns to a game that does not require them to cancel it). The approach is based on parametric mixture models (Weibull, Gamma, and Log-normal) for return times. The model and data are inverted using Bayesian methods (MCMC and DIC) to get parameter estimates and uncertainties, as well as to determine the return time distribution for retained users. The inversion scheme is tested on three groups of simulated data sets and one observed data set. The simulated data are generated with each of the parametric models. Each data set is censored to six time horizons, creating 18 data sets. All data sets are inverted with all three parametric models and the DIC is used to select the return time distribution. For all data sets the true return time distribution (i.e., the one that is used to simulate the data) has the best DIC value; for 16 inversions the true return time distribution is found to be significantly better than the other options. For the observed data set inversion, the scheme is able to accurately estimate the percentage of users that did return (before the game transitioned into open beta) given 14 days of observations.
Category: Statistics

[239] viXra:1904.0348 [pdf] submitted on 2019-04-17 07:39:46

Remark on Possible Use of Quadruple Neutrosophic Numbers for Realistic Modelling of Physical Systems

Authors: Victor Christianto, Florentin Smarandache
Comments: 6 Pages. This paper has been submitted to Axioms journal (MDPI). Comments are welcome

During mathematical modeling of a real technical system, we can meet any type and rate of model uncertainty. Its causes can be the incognizance of modelers or data inaccuracy. So, the classification of uncertainties with respect to their sources distinguishes between aleatory and epistemic ones. The aleatory uncertainty is an inherent data variation associated with the investigated system or its environment. The epistemic one is an uncertainty that is due to a lack of knowledge of quantities or processes of the system or the environment. In this short communication, we discuss quadruple neutrosophic numbers and their potential application for realistic modelling of physical systems, especially in the reliability assessment of engineering structures.
Category: Statistics

[238] viXra:1904.0333 [pdf] submitted on 2019-04-18 03:53:13

The Theorems of Rao--Blackwell and Lehmann--Scheffe, Revisited

Authors: Hazhir Homei
Comments: 5 Pages.

It has been stated in the literature that for finding uniformly minimum-variance unbiased estimator through the theorems of Rao-Blackwell and Lehmann-Scheffe, the sufficient statistic should be complete; otherwise the discussion and the way of finding uniformly minimum-variance unbiased estimator should be changed, since the sufficiency assumption in the Rao-Blackwell and Lehmann-Scheffe theorems limits its applicability. So, it seems that the sufficiency assumptions should be expressed in a way that the uniformly minimum-variance unbiased estimator be derivable via the Rao-Blackwell and Lehmann-Scheffe theorems.
Category: Statistics

Replacements of recent Submissions

[141] viXra:2310.0032 [pdf] replaced on 2024-02-06 21:09:42

Adaptive Posterior Distributions for Uncertainty Analysis of Covariance Matrices in Bayesian Inversion Problems for Multioutput Signals

Authors: E. Curbelo, L. Martino, F. Llorente, D. Delgado-Gomez
Comments: 28 Pages.

In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based on two main ideas: the variables of interest are split in two blocks and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target AIS (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the nonlinear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS is recycled and re-weighted for obtaining a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches.
Category: Statistics

[140] viXra:2209.0132 [pdf] replaced on 2023-06-14 09:56:05

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023

We design a Universal Automatic Elbow Detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
Category: Statistics

[139] viXra:2209.0132 [pdf] replaced on 2023-06-06 15:06:55

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 25 Pages. (to appear) Digital Signal Processing, 2023.

We design a Universal Automatic Elbow Detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, and dimension reduction. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
Category: Statistics

[138] viXra:2209.0132 [pdf] replaced on 2022-10-11 11:49:06

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.

We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
Category: Statistics

[137] viXra:2209.0132 [pdf] replaced on 2022-10-09 09:46:48

Universal and Automatic Elbow Detection for Learning the Effective Number of Components in Model Selection Problems

Authors: E. Morgado, L. Martino, R. San Millán-Castillo
Comments: 15 Pages.

We design a universal automatic elbow detector (UAED) for deciding the effective number of components in model selection problems. The relationship with the information criteria widely employed in the literature is also discussed. The proposed UAED does not require the knowledge of a likelihood function and can be easily applied in diverse applications, such as regression and classification, feature and/or order selection, clustering, dimension reduction, etc. Several experiments involving synthetic and real data show the advantages of the proposed scheme with benchmark techniques in the literature.
Category: Statistics

[136] viXra:2209.0123 [pdf] replaced on 2023-06-06 14:57:34

Spectral Information Criterion for Automatic Elbow Detection

Authors: L. Martino, R. San Millán-Castillo, E. Morgado
Comments: 22 Pages. (to appear) Expert Systems With Applications, 2023.

We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since the knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that often is much smaller than the total number of possible models. The elements of this subset are "elbows" of the error curve. A practical rule for selecting a unique model within the sets of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two experiments involving real datasets. They are all real-world applications such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
Category: Statistics

[135] viXra:2204.0074 [pdf] replaced on 2022-07-17 15:30:07

Matter Theory on EM field

Authors: Sheng-Ping Wu
Comments: 12 Pages.

This article tries to unify the four basic forces by Maxwell equations, the only experimental theory. Self-consistent Maxwell equations with the e-current coming from matter current are proposed, and are solved for electrons and the structures of particles and atomic nuclei. The static properties and decay are derived, and all agree with experimental data. The equation of general relativity purely with electromagnetic field is discussed as the basis of this theory. In the end, the elementary conformity between this theory and QED and weak theory is discussed.
Category: Statistics

[134] viXra:2201.0152 [pdf] replaced on 2022-02-13 20:15:49

Forensic Analysis of Lucy I and Lucy II

Authors: Robert Bennett
Comments: 6 Pages.

A quantitative test for the probability that two sets of photos are of the same woman. The result for 7 facial characteristics in each photo is that the odds are 13 million to 1 that Lucy I and Lucy II are the same person.
Category: Statistics

[133] viXra:2112.0158 [pdf] replaced on 2022-07-17 10:28:12

An Exhaustive Variable Selection Study for Linear Models of Soundscape Emotions: Rankings and Gibbs Analysis

Authors: R. San Millán-Castillo, L. Martino, E. Morgado, F. Llorente
Comments: 26 Pages. (to appear) IEEE Transactions on Audio, Speech and Language Processing

In recent years, soundscapes have become one of the most active topics in Acoustics, providing a holistic approach to the acoustic environment, which involves human perception and context. Soundscapes-elicited emotions are central and substantially subtle and unnoticed (compared to speech or music). Currently, soundscape emotion recognition is a hot topic in the literature. We provide an exhaustive variable selection study (i.e., a selection of the soundscapes indicators) to a well-known dataset (emo-soundscapes). We consider linear soundscape emotion models for two soundscapes descriptors: arousal and valence. Several ranking schemes and procedures for selecting the number of variables are applied. We have also performed an alternating optimization scheme for obtaining the best sequences keeping fixed a certain number of features. Furthermore, we have designed a novel technique based on Gibbs sampling, which provides a more complete and clear view of the relevance of each variable. Finally, we have also compared our results with the analysis obtained by the classical methods based on p-values. As a result of our study, we suggest two simple and parsimonious linear models of only 7 and 16 variables (within the 122 possible features) for the two outputs (arousal and valence), respectively. The suggested linear models provide very good and competitive performance, with R2 > 0.86 and R2 > 0.63 (values obtained after a cross-validation procedure), respectively.
Category: Statistics

[132] viXra:2110.0032 [pdf] replaced on 2022-06-10 12:08:11

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike for the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, providing also some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics
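
The prior sensitivity mentioned in the abstract above can be seen in a textbook conjugate example. The sketch below is not taken from the paper: it assumes a hypothetical model $y_i = \theta + \text{noise}$ with known noise standard deviation `sigma` and a zero-mean Gaussian prior on $\theta$ with standard deviation `tau`, for which the marginal likelihood and the posterior mean are available in closed form. The data values and function names are illustrative only.

```python
import math

def log_marginal_likelihood(y, sigma, tau):
    """log p(y) for y_i = theta + e_i, e_i ~ N(0, sigma^2), theta ~ N(0, tau^2).

    Marginally, y ~ N(0, sigma^2 I + tau^2 11^T); the determinant and the
    quadratic form are evaluated in closed form (Sherman-Morrison).
    """
    n = len(y)
    s1 = sum(y)
    s2 = sum(v * v for v in y)
    denom = sigma**2 + n * tau**2
    quad = (s2 - tau**2 * s1**2 / denom) / sigma**2
    log_det = 2 * n * math.log(sigma) + math.log(1 + n * tau**2 / sigma**2)
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)

def posterior_mean(y, sigma, tau):
    """Posterior mean of theta under the same conjugate model."""
    n = len(y)
    return tau**2 * sum(y) / (sigma**2 + n * tau**2)

# Hypothetical data and noise level, chosen only for illustration.
y = [1.2, 0.8, 1.1, 0.9, 1.0]
sigma = 0.5
for tau in (1.0, 10.0, 100.0, 1000.0):
    lml = log_marginal_likelihood(y, sigma, tau)
    pm = posterior_mean(y, sigma, tau)
    print(f"tau={tau:7.1f}  log p(y)={lml:9.4f}  posterior mean={pm:.6f}")
```

As `tau` grows, the posterior mean settles near the sample mean, while the log marginal likelihood keeps decreasing by roughly `log(10)` per tenfold increase in `tau`. In this toy model, a more diffuse prior barely changes the parameter estimate but steadily penalizes the model in a Bayes factor comparison.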

[131] viXra:2110.0032 [pdf] replaced on 2022-05-11 12:51:09

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 38 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike for the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, providing also some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics

[130] viXra:2110.0032 [pdf] replaced on 2022-03-23 12:51:19

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 34 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods depend on the prior choice. For model selection, even diffuse priors can actually be very informative, unlike for the parameter estimation problem. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are discussed and many possible solutions, proposed in the literature, to design objective priors for model selection are described. Some of them also allow the use of improper priors. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe the main issues and possible solutions by illustrative numerical examples, providing also some related code. One of them involves a real-world application on exoplanet detection.
Category: Statistics

[129] viXra:2110.0032 [pdf] replaced on 2021-11-07 07:59:55

On the Safe Use of Prior Densities for Bayesian Model Selection

Authors: F. Llorente, L. Martino, E. Curbelo, J. Lopez-Santiago, D. Delgado
Comments: 25 Pages.

The application of Bayesian inference for the purpose of model selection is very popular nowadays. In this framework, models are compared through their marginal likelihoods, or their quotients, called Bayes factors. However, marginal likelihoods show strong dependence on the prior choice, even when the data are very informative, unlike the posterior distribution. Furthermore, when the prior is improper, the marginal likelihood of the corresponding model is undetermined. In this work, we aim to raise awareness about the issue of prior sensitivity of the marginal likelihood and its role in model selection. We also comment on the use of uninformative priors, which are very common choices in practice. Several practical suggestions are provided and possible solutions allowing the use of improper priors are discussed. The connection between the marginal likelihood approach and the well-known information criteria is also presented. We describe all the issues and possible solutions by illustrative numerical examples (providing some related code). One of them involves a real-world application on exoplanet detection.
Category: Statistics

[128] viXra:2109.0178 [pdf] replaced on 2022-01-13 03:48:54

Optimality in Noisy Importance Sampling

Authors: F. Llorente, L. Martino, J. Read, D. Delgado
Comments: 14 Pages. Signal Processing, Volume 194, 2022, 108455 - doi:10.1016/j.sigpro.2022.108455

Many applications in signal processing and machine learning require the study of probability density functions (pdfs) that can only be accessed through noisy evaluations. In this work, we analyze the noisy importance sampling (IS), i.e., IS working with noisy evaluations of the target density. We present the general framework and derive optimal proposal densities for noisy IS estimators. The optimal proposals incorporate the information of the variance of the noisy realizations, proposing points in regions where the noise power is higher. We also compare the use of the optimal proposals with previous optimality approaches considered in a noisy IS framework.
Category: Statistics
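
The setting of the abstract above, importance sampling where the target density can only be evaluated with noise, can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's method: the target is a hypothetical unnormalized N(2, 1) density, each evaluation is multiplied by a positive noise factor with mean 1, and the proposal is a fixed N(0, 3^2) density rather than one of the optimal proposals the paper derives. The function names are made up for this example.

```python
import math
import random

def noisy_target(x, rng):
    """Unnormalized N(2, 1) density times a positive noise factor of mean 1.

    Both the target and the noise model are assumptions for illustration.
    """
    clean = math.exp(-0.5 * (x - 2.0) ** 2)
    return clean * rng.uniform(0.5, 1.5)

def noisy_snis_mean(n_samples=100_000, proposal_std=3.0, seed=1):
    """Self-normalized importance sampling estimate of the target mean,
    using noisy evaluations of the target in the weights."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, proposal_std)
        # Unnormalized proposal density; constants cancel after normalization.
        q = math.exp(-0.5 * (x / proposal_std) ** 2)
        w = noisy_target(x, rng) / q
        num += w * x
        den += w
    return num / den

print(f"estimated mean: {noisy_snis_mean():.3f} (noise-free target mean is 2)")
```

Because the noise factor has mean 1 and is independent of the sample location, the self-normalized estimator still converges to the noise-free mean; the noise only inflates its variance, which is the effect that motivates looking for better proposals.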

[127] viXra:2011.0183 [pdf] replaced on 2021-01-29 20:41:23

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 7 Pages.

Usually, one wants to have a simple picture of the trustworthiness of the main election results. However, in some situations only partial information about the elections is available. Here we suggest some criteria for comparing the available information with the official results. One of the criteria consists in comparing the mean value over the available sample with the official mean value. A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. Another method is an analysis of the nature of the peculiarities in the probability distribution functions, consisting in a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station. The last criterion is more aesthetic than revealing. It could be applied to arbitrary election systems, such as those of the United Kingdom or the United States, if one wants to extract the main result in a few pictures.
Category: Statistics
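
The Monte Carlo step described in the abstract above (the probability that the mean of a random sample differs from the mean over the total set by a given amount) can be sketched generically. The code below is an assumption-laden illustration, not the paper's procedure: the per-station values are synthetic, and the function name `tail_probability`, the sample size, and the gap threshold are hypothetical.

```python
import random

def tail_probability(values, sample_size, observed_gap, n_trials=5_000, seed=0):
    """Fraction of random subsamples (drawn without replacement) whose mean
    differs from the mean of all values by at least observed_gap."""
    rng = random.Random(seed)
    total_mean = sum(values) / len(values)
    hits = 0
    for _ in range(n_trials):
        sample = rng.sample(values, sample_size)
        if abs(sum(sample) / sample_size - total_mean) >= observed_gap:
            hits += 1
    return hits / n_trials

# Synthetic per-station shares, generated only to make the example runnable.
rng = random.Random(42)
station_shares = [rng.uniform(0.5, 0.9) for _ in range(2_000)]
p = tail_probability(station_shares, sample_size=100, observed_gap=0.03)
print(f"P(|sample mean - total mean| >= 0.03) ~ {p:.4f}")
```

A small estimated probability would indicate that a gap of that size is unlikely to arise from random sampling alone, under the assumption that the available sample is a uniformly random subset of the total set.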

[126] viXra:2011.0183 [pdf] replaced on 2020-12-02 10:52:49

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists of a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station.
Category: Statistics

[125] viXra:2011.0183 [pdf] replaced on 2020-12-01 11:48:00

Statistical Analysis of the Presidential Elections in Belarus in 2020

Authors: Sergey L. Cherkas
Comments: 4 Pages. Both English and Russian versions of the paper

A Monte Carlo simulation is performed to calculate the probability of the difference between the average value in some random sample and the average over the total set. A method for analyzing the nature of the peculiarities in the probability distribution functions is suggested. The method consists of a comparison of the probability distribution functions for the percentage and the number of voters for Mr. Lukashenko in each polling station.
Category: Statistics

[124] viXra:2009.0135 [pdf] replaced on 2021-07-11 15:23:07

A Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: L. Martino, J. Read, "A Joint introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman filtering and other Kernel Smoothers", Information Fusion, Volume 74, Pages 17-38, 2021

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods a probabilistic version of the well-known kernel ridge regression, and drawing connections among them, via dual formulations, and discussion of their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data-science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics

[123] viXra:2009.0135 [pdf] replaced on 2021-03-24 18:50:49

A Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 52 Pages. (to appear) Information Fusion

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods a probabilistic version of the well-known kernel ridge regression, and drawing connections among them, via dual formulations, and discussion of their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data-science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics

[122] viXra:2009.0135 [pdf] replaced on 2020-09-21 16:45:05

Joint Introduction to Gaussian Processes and Relevance Vector Machines with Connections to Kalman Filtering and Other Kernel Smoothers

Authors: L. Martino, J. Read
Comments: 50 Pages.

The expressive power of Bayesian kernel-based methods has led them to become an important tool across many different facets of artificial intelligence, and useful to a plethora of modern application domains, providing both power and interpretability via uncertainty analysis. This article introduces and discusses two methods which straddle the areas of probabilistic Bayesian schemes and kernel methods for regression: Gaussian Processes and Relevance Vector Machines. Our focus is on developing a common framework with which to view these methods, via intermediate methods a probabilistic version of the well-known kernel ridge regression, and drawing connections among them, via dual formulations, and discussion of their application in the context of major tasks: regression, smoothing, interpolation, and filtering. Overall, we provide understanding of the mathematical concepts behind these models, and we summarize and discuss in depth different interpretations and highlight the relationship to other methods, such as linear kernel smoothers, Kalman filtering and Fourier approximations. Throughout, we provide numerous figures to promote understanding, and we make numerous recommendations to practitioners. Benefits and drawbacks of the different techniques are highlighted. To our knowledge, this is the most in-depth study of its kind to date focused on these two methods, and will be relevant to theoretical understanding and practitioners throughout the domains of data-science, signal processing, machine learning, and artificial intelligence in general.
Category: Statistics

[121] viXra:2004.0425 [pdf] replaced on 2021-02-27 09:46:18

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, F. Llorente, E. Curbelo, J. Lopez-Santiago, J. Miguez
Comments: 18 Pages.

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempered parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempered parameter is automatically selected according to the actual estimation of the noise power. A complete Bayesian study over the model parameters and the scale parameter can also be performed. Numerical experiments show the benefits of the proposed approach.
Category: Statistics
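
The alternation of sampling and optimization steps described in the abstract above can be sketched on a toy problem. This is not the authors' algorithm: the model (a scalar location parameter observed in Gaussian noise), the Gaussian proposal, its moment-matching adaptation, and the name `tempered_ais` are all assumptions chosen to keep the example short. The only idea carried over from the abstract is that the current noise-power estimate sets how flat (tempered) the posterior is at each iteration.

```python
import numpy as np

def tempered_ais(y, n_iter=30, n_samples=500, prior_std=10.0, seed=0):
    """Toy adaptive importance sampler with a noise-power-driven tempering."""
    rng = np.random.default_rng(seed)
    mu, std = 0.0, 5.0                  # Gaussian proposal parameters (assumed)
    sigma2 = np.mean((y - mu) ** 2)     # crude, large initial noise-power estimate
    history = []
    for _ in range(n_iter):
        # Sampling step: draw from the proposal and weight by the
        # posterior built with the current noise-power estimate.
        theta = rng.normal(mu, std, size=n_samples)
        sse = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
        log_post = -0.5 * sse / sigma2 - 0.5 * (theta / prior_std) ** 2
        log_q = -0.5 * ((theta - mu) / std) ** 2 - np.log(std)  # constants dropped
        log_w = log_post - log_q
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Optimization step: maximum-likelihood noise power at the best sample.
        best = theta[np.argmax(log_post)]
        sigma2 = np.mean((y - best) ** 2)
        # Proposal adaptation by weighted moment matching.
        mu = np.sum(w * theta)
        std = max(np.sqrt(np.sum(w * (theta - mu) ** 2)), 1e-3)
        history.append(sigma2)
    return mu, sigma2, history

# Synthetic data (assumed): true parameter 3.0, true noise power 0.25.
data_rng = np.random.default_rng(1)
y = 3.0 + 0.5 * data_rng.standard_normal(200)
theta_hat, sigma2_hat, sigma2_history = tempered_ais(y)
print(theta_hat, sigma2_hat)
```

Because the initial noise-power estimate is large, the first posterior is broad and easy to hit with a wide proposal; as the estimate shrinks toward the data noise level, the sequence of posteriors sharpens without a hand-tuned tempering schedule.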

[120] viXra:2004.0425 [pdf] replaced on 2020-09-06 08:24:05

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempered parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempered parameter is automatically selected according to the actual estimation of the noise power. Numerical experiments show the benefits of the proposed approach.
Category: Statistics

[119] viXra:2004.0425 [pdf] replaced on 2020-09-03 11:41:36

Automatic Tempered Posterior Distributions for Bayesian Inversion Problems

Authors: L. Martino, J. Lopez-Santiago, J. Miguez
Comments: 17 Pages.

We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempered parameter for the posterior distribution of the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempered parameter is automatically selected according to the actual estimation of the noise power. Numerical experiments show the benefits of the proposed approach.
Category: Statistics

[118] viXra:2004.0060 [pdf] replaced on 2020-04-22 09:01:21

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 14 Pages.

It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following problem by means of statistics: In 3-dimensional Euclidean space, for point particles whose speeds are $c$ and whose directions are uniformly distributed in space (assuming these particles’ reference system is $R_0$ if their average velocity is 0), when some particles (assuming their reference system is $R_u$), as a particle swarm, move in a certain direction with a group speed $u$ (i.e., the norm of the average velocity) relative to $R_0$, their (or the sub-particle swarm’s) average speed relative to $R_u$ is slower than that of particles (or the same scale sub-particle swarm) in $R_0$ relative to $R_0$. The degree of slowing depends on the speed $u$ of $R_u$ and accords with the quantitative $c^2 - u^2$ relationship described by the Lorentz factor.
Category: Statistics

[117] viXra:2004.0060 [pdf] replaced on 2020-04-14 08:41:36

Study on the Average Speed of a Particle Swarm Derived from Particles with the Same Speed and Random Directions in Space

Authors: Tao Guo
Comments: 14 Pages.

It has been more than 100 years since the advent of special relativity, but the reasons behind the related phenomena are still unknown. This article aims to inspire people to think about such problems. With the help of Mathematica software, I have proven the following problem by means of statistics: In 3-dimensional Euclidean space, for point particles whose speeds are $c$ and whose directions are uniformly distributed in space (assuming these particles’ reference system is $R_0$ if their average velocity is 0), when some particles (assuming their reference system is $R_u$), as a particle swarm, move in a certain direction with a group speed $u$ (i.e., the norm of the average velocity) relative to $R_0$, their (or the sub-particle swarm’s) average speed relative to $R_u$ is slower than that of particles (or the same scale sub-particle swarm) in $R_0$ relative to $R_0$. The degree of slowing depends on the speed $u$ of $R_u$ and accords with the quantitative $c^2 - u^2$ relationship described by the Lorentz factor.
Category: Statistics

[116] viXra:2002.0368 [pdf] replaced on 2020-02-29 11:03:38

The Relative Risk Is Logically Inconsistent

Authors: Ilija Barukčić
Comments: 10 Pages. (C) Ilija Barukčić, 2019, Jever, Germany. All rights reserved.

Many different measures of association are used in the medical literature; the relative risk is one of them. However, to judge whether the results of studies are reliable, it is essential to use, among others, measures of association that are logically consistent. In this paper, we present how to deal with one of the most commonly used measures of association, the relative risk. The conclusion is inescapable that the relative risk is logically inconsistent and should no longer be used.
Category: Statistics
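
For readers unfamiliar with the measure discussed in the abstract above, the following minimal sketch computes the relative risk from a 2x2 contingency table using its standard textbook definition. The function name `relative_risk` and the counts in the usage lines are hypothetical and are not taken from the paper; the sketch does not reproduce or evaluate the paper's argument.

```python
def relative_risk(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases):
    """Relative risk: risk among the exposed divided by risk among the unexposed."""
    exposed_total = exposed_cases + exposed_noncases
    unexposed_total = unexposed_cases + unexposed_noncases
    if exposed_total == 0 or unexposed_total == 0:
        raise ValueError("each exposure group must contain at least one subject")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed subjects fall ill.
rr = relative_risk(20, 80, 10, 90)
print(rr)
```

With these counts the exposed group has twice the risk of the unexposed group. The explicit error for a zero unexposed risk marks the one case where the ratio has no value at all.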

[115] viXra:2001.0052 [pdf] replaced on 2021-02-06 13:32:52

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 91 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics
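
As a concrete instance of the normalizing-constant problem described in the abstract above, the sketch below estimates a marginal likelihood by naive Monte Carlo (averaging the likelihood over prior draws) and compares it with the closed-form value. The conjugate Gaussian model, the parameter values, and the function names are assumptions chosen because the exact answer is available; this is one of the simplest estimators of its kind and is not claimed to be the method the review recommends.

```python
import numpy as np

def log_evidence_exact(y, sigma, tau):
    """Closed-form log marginal likelihood for y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)."""
    n = len(y)
    # Marginally, y is zero-mean Gaussian with covariance sigma^2 I + tau^2 11^T.
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def log_evidence_naive_mc(y, sigma, tau, n_samples=100_000, seed=0):
    """Naive Monte Carlo: log of the average likelihood over draws from the prior."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, size=n_samples)
    resid = y[None, :] - theta[:, None]
    log_lik = (-0.5 * np.sum(resid**2, axis=1) / sigma**2
               - len(y) * np.log(sigma * np.sqrt(2.0 * np.pi)))
    # Log-mean-exp for numerical stability.
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))

# Synthetic data (assumed): 10 observations around a true mean of 1.0.
data_rng = np.random.default_rng(42)
y = 1.0 + data_rng.standard_normal(10)

exact = log_evidence_exact(y, sigma=1.0, tau=2.0)
approx = log_evidence_naive_mc(y, sigma=1.0, tau=2.0)
print(exact, approx)
```

The estimator works here because the prior is only a few times wider than the posterior. When the prior is much more diffuse than the posterior, few prior draws land where the likelihood is large and the variance of this estimator grows quickly.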

[114] viXra:2001.0052 [pdf] replaced on 2020-05-18 05:13:39

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics

[113] viXra:2001.0052 [pdf] replaced on 2020-05-15 16:58:59

Marginal Likelihood Computation for Model Selection and Hypothesis Testing: an Extensive Review

Authors: F. Llorente, L. Martino, D. Delgado, J. Lopez-Santiago
Comments: 58 Pages.

This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
Category: Statistics

[112] viXra:1910.0219 [pdf] replaced on 2019-10-15 00:13:36

Herpes Simplex Virus Type 1 is the Cause of Alzheimer’s Disease

Authors: Ilija Barukčić
Comments: 35 Pages.

Objective: The possible involvement of viruses, specifically Herpes simplex virus type 1 (HSV-1), in senile dementia of the Alzheimer type has been investigated by numerous publications. Over 120 publications provide direct or indirect evidence of a potential relationship between Herpes simplex virus type 1 and Alzheimer’s disease (AD), but a causal relation has not yet been established. Methods: A systematic review and re-analysis of studies which investigated the relationship between HSV-1 and AD by HSV-1 immunoglobulin G (IgG) serology and polymerase chain reaction (PCR) methods was conducted. The method of the conditio sine qua non relationship (SINE) was used to prove the hypothesis: without HSV-1 infection of human brain no AD. The method of the conditio per quam relationship (IMP) was used to prove the hypothesis: if HSV-1 infection of human brain then AD. The mathematical formula of the causal relationship k was used to test the hypothesis of whether there is a cause-effect relationship between HSV-1 and AD. Significance was indicated by a p-value of less than 0.05. Results: The studies analyzed were able to provide strict evidence that HSV-1 is a necessary condition (a conditio sine qua non), a sufficient condition and a necessary and sufficient condition of AD. Furthermore, the cause-effect relationship between HSV-1 and AD was highly significant. Conclusions: The data analyzed provide sufficient evidence to conclude that HSV-1 is the cause of AD. Keywords: Herpes simplex virus type 1, Alzheimer’s disease, causal relationship.
Category: Statistics

[111] viXra:1909.0376 [pdf] replaced on 2019-09-24 07:16:46

A Fully Bayesian Solution to K-Sample Tests for Comparison and the Behrens-Fisher Problem Based on the Henstock-Kurzweil Integral

Authors: Fabrice J.P.R. Pautot
Comments: 17 Pages.

We present a simple, fully probabilistic, Bayesian solution to K-sample omnibus tests for comparison, with the Behrens-Fisher problem as a special case, which is free from the many defects found in the standard, classical, frequentist, likelihoodist and Bayesian approaches to those problems. We solve the main measure-theoretic difficulty for degenerate problems with continuous parameters of interest and Lebesgue-negligible point null hypothesis by approximating the corresponding continuous random variables by sequences of discrete ones defined on partitions of the parameter spaces and by taking the limit of the prior-to-posterior ratios of the probability of the null hypothesis for the corresponding discrete problems. Those limits are well defined under proper technicalities thanks to the Henstock-Kurzweil integral that is as powerful as the Lebesgue integral but still relies on Riemann sums, which are essential in the present approach. The solutions to the relative continuous problems take the form of Bayes-Poincaré factors that are new objects in Bayesian probability theory and should play a key role in the general theory of point null hypothesis testing, including other important problems such as the Jeffreys-Lindley paradox.
Category: Statistics

[110] viXra:1905.0371 [pdf] replaced on 2020-01-04 08:24:03

Glyphosate and Non-Hodgkin Lymphoma: no Causal Relationship

Authors: Ilija Barukčić
Comments: 41 Pages.

Objective: Herbicides are used worldwide by both residential and agricultural users. Due to the statistical analysis of some epidemiologic studies, the International Agency for Research on Cancer classified the broad-spectrum herbicide glyphosate (GS) in 2015 as potentially carcinogenic to humans, especially with respect to non-Hodgkin lymphoma (NHL). In this systematic review and re-analysis, the relationship between glyphosate and NHL was re-investigated. Methods: A systematic review and re-analysis of studies which investigated the relationship between GS and NHL was conducted. The method of the conditio sine qua non relationship, the method of the conditio per quam relationship, the method of the exclusion relationship and the mathematical formula of the causal relationship k were used to prove the hypothesis. Significance was indicated by a p-value of less than 0.05. Results: The studies analyzed do not provide any direct or indirect evidence that NHL is caused by GS. Conclusion: In this re-analysis, no causal relationship was apparent between glyphosate and NHL and its subtypes. Keywords: Glyphosate, Non-Hodgkin lymphoma, no causal relationship
Category: Statistics

[109] viXra:1905.0371 [pdf] replaced on 2020-01-01 12:38:11

Glyphosate and Non-Hodgkin Lymphoma: no Causal Relationship

Authors: Ilija Barukčić
Comments: 41 Pages.

Objective: Herbicides are used worldwide by both residential and agricultural users. Due to the statistical analysis of some epidemiologic studies, the International Agency for Research on Cancer classified the broad-spectrum herbicide glyphosate (GS) in 2015 as potentially carcinogenic to humans, especially with respect to non-Hodgkin lymphoma (NHL). In this systematic review and re-analysis, the relationship between glyphosate and NHL was re-investigated. Methods: A systematic review and re-analysis of studies which investigated the relationship between GS and NHL was conducted. The method of the conditio sine qua non relationship, the method of the conditio per quam relationship, the method of the exclusion relationship and the mathematical formula of the causal relationship k were used to prove the hypothesis. Significance was indicated by a p-value of less than 0.05. Results: The studies analyzed do not provide any direct or indirect evidence that NHL is caused by GS. Conclusion: In this re-analysis, no causal relationship was apparent between glyphosate and NHL and its subtypes. Keywords: Glyphosate, Non-Hodgkin lymphoma, no causal relationship
Category: Statistics