Publication • Sep 20, 2018

An R package for efficient estimation of smooth distributions from coarsely binned data

Marius D. Pascariu, Maciej J. Dańko, Jonas Schöley, Silvia Rizzi (2018). Journal of Open Source Software, 10.21105/joss.00937.

ungroup is an R library that introduces a versatile method for ungrouping histograms.

ungroup is an open source software library written in the R programming language (R Core Team, 2018) that introduces a versatile method for ungrouping histograms (binned count data), assuming that the counts are Poisson distributed and that the underlying sequence to be estimated over a fine grid is smooth. The method is based on the composite link model (Thompson & Baker, 1981), which extends standard generalized linear models, and estimation is achieved by maximizing a penalized likelihood (Eilers, 2007). The penalized composite link model (PCLM) implements the idea that observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the expected counts at a fine resolution and has to be estimated from the aggregated data. The penalized likelihood is maximized efficiently by a version of the iteratively re-weighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing the Bayesian Information Criterion or Akaike’s Information Criterion (Hastie & Tibshirani, 1990).
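To make the estimation idea concrete, here is a minimal from-scratch sketch of a penalized composite link model fitted by iteratively re-weighted least squares. It is not the ungroup implementation and does not use the package's API: the function names (`composition_matrix`, `second_difference_penalty`, `solve`, `pclm_sketch`), the second-order difference penalty, the identity basis (one coefficient per fine-grid cell), and the fixed smoothing parameter `lam` are all assumptions made for illustration. In particular, the sketch does not select the smoothing parameter by BIC or AIC.

```python
import math


def composition_matrix(bin_widths):
    """Build the I x J 0/1 matrix C that sums fine-grid cells into coarse bins."""
    n_fine = sum(bin_widths)
    rows, start = [], 0
    for width in bin_widths:
        rows.append([1.0 if start <= j < start + width else 0.0
                     for j in range(n_fine)])
        start += width
    return rows


def second_difference_penalty(n):
    """Return P = D'D, where D takes second-order differences of a length-n vector."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        coefs = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for a, ca in coefs.items():
            for b, cb in coefs.items():
                P[a][b] += ca * cb
    return P


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def pclm_sketch(counts, bin_widths, lam=1.0, max_iter=100, tol=1e-8):
    """Estimate a smooth fine-grid sequence from binned Poisson counts.

    Model: counts[i] ~ Poisson(mu[i]), mu = C @ gamma, gamma = exp(beta).
    Each iteration solves (X'WX + lam * P) beta = X'W (y - mu + X beta),
    with X = C * diag(gamma) and W = diag(1 / mu).
    """
    C = composition_matrix(bin_widths)
    I, J = len(C), len(C[0])
    P = second_difference_penalty(J)
    beta = [math.log(sum(counts) / J)] * J  # start from a flat distribution
    for _ in range(max_iter):
        gamma = [math.exp(b) for b in beta]
        mu = [sum(C[i][j] * gamma[j] for j in range(J)) for i in range(I)]
        X = [[C[i][j] * gamma[j] for j in range(J)] for i in range(I)]
        # Working response of the linearized model.
        z = [counts[i] - mu[i] + sum(X[i][j] * beta[j] for j in range(J))
             for i in range(I)]
        A = [[sum(X[i][a] * X[i][b] / mu[i] for i in range(I)) + lam * P[a][b]
              for b in range(J)] for a in range(J)]
        rhs = [sum(X[i][a] * z[i] / mu[i] for i in range(I)) for a in range(J)]
        new_beta = solve(A, rhs)
        delta = max(abs(n - o) for n, o in zip(new_beta, beta))
        beta = new_beta
        if delta < tol:
            break
    return [math.exp(b) for b in beta]


if __name__ == "__main__":
    # Hypothetical data: four bins, each covering five fine-grid cells.
    fine = pclm_sketch([10, 40, 30, 20], [5, 5, 5, 5], lam=1.0)
    print([round(v, 2) for v in fine])
```

A design note on the sketch: because a second-order difference penalty leaves constant vectors unpenalized, the converged estimates sum to the total of the observed counts, while individual bin totals are only approximately reproduced, more loosely as `lam` grows. In practice `lam` would be chosen over a grid by an information criterion, as the text describes.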