# glum package

The two main classes in glum are GeneralizedLinearRegressor and GeneralizedLinearRegressorCV. Most users will use fit() and predict()

class glum.BinomialDistribution

A class for the Binomial distribution.

The Binomial distribution is for targets y in [0, 1].

deviance(y, mu, sample_weight=1)

Compute the deviance.

The deviance is a weighted sum of the unit deviances, $$\sum_i s_i \times d(y_i, \mu_i)$$, where $$d(y, \mu)$$ is the unit deviance and $$s$$ are weights. In terms of the log likelihood, it is $$-2\phi \times [L(y, \mu, \phi / s) - L(y, y, \phi / s)]$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

log_likelihood(y, mu, sample_weight=None, dispersion=1)

Compute the log likelihood.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=1)) – Ignored.

Return type

float

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the unit-level deviance.

See superclass documentation.

Parameters
• y (array-like) –

• mu (array-like) –

Return type

array-like

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Get the unit-level expected variance.

See superclass documentation.

Parameters

mu (array-like) –

Return type

array-like

unit_variance_derivative(mu)

Get the derivative of the unit variance.

See superclass documentation.

Parameters

mu (array-like or float) –

Return type

array-like

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

Bases: Link

The complementary log-log link function log(-log(-p)).

derivative(mu)

Get the derivative of the cloglog link.

See superclass documentation.

Parameters

mu (array-like) –

Return type

array-like

inverse(lin_pred)

Get the inverse of the cloglog link.

See superclass documentation.

Note: since passing a very large value might result in an output of one, this function bounds the output to be between [50*eps, 1 - 50*eps], where eps is floating point epsilon.

Parameters

lin_pred (array-like) –

Return type

array-like

inverse_derivative(lin_pred)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative2(lin_pred)

Compute 2nd derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array, shape (n_samples,)) – Usually the (fitted) linear predictor.

Get the logit function of mu.

See superclass documentation.

Parameters

mu (array-like) –

Return type

numpy.ndarray

class glum.ExponentialDispersionModel

Bases: object

Base class for reproductive Exponential Dispersion Models (EDM).

The PDF of $$Y \sim \mathrm{EDM}(\mu, \phi)$$ is given by

$\begin{split}p(y \mid \theta, \phi) &= c(y, \phi) \exp((\theta y - A(\theta)_ / \phi) \\ &= \tilde{c}(y, \phi) \exp(-d(y, \mu) / (2\phi))\end{split}$

with mean $$\mathrm{E}(Y) = A'(\theta) = \mu$$, variance $$\mathrm{var}(Y) = \phi \cdot v(\mu)$$, unit variance $$v(\mu)$$ and unit deviance $$d(y, \mu)$$.

## Properties

lower_bound upper_bound include_lower_bound include_upper_bound

in_y_range()
Return type

ndarray

unit_variance()
unit_variance_derivative()
variance()
Parameters

mu (ndarray) –

Return type

ndarray

variance_derivative()
unit_deviance()
unit_deviance_derivative()
deviance()
deviance_derivative()
starting_mu()
_mu_deviance_derivative()
eta_mu_deviance()
Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

References

deviance(y, mu, sample_weight=1)

Compute the deviance.

The deviance is a weighted sum of the unit deviances, $$\sum_i s_i \times d(y_i, \mu_i)$$, where $$d(y, \mu)$$ is the unit deviance and $$s$$ are weights. In terms of the log likelihood, it is $$-2\phi \times [L(y, \mu, \phi / s) - L(y, y, \phi / s)]$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

abstract property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

abstract property include_upper_bound: bool

Return whether upper_bound is allowed as a value of y.

abstract property lower_bound: float

Get the lower bound of values for the EDM.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

abstract unit_deviance(y, mu)

Compute the unit deviance.

In terms of the log likelihood $$L$$, the unit deviance is $$-2\phi\times [L(y, \mu, \phi) - L(y, y, \phi)].$$

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

abstract unit_variance(mu)

Compute the unit variance function.

The unit variance $$v(\mu)$$ determines the variance as a function of the mean $$\mu$$ by $$\mathrm{var}(y_i) = (\phi / s_i) \times v(\mu_i)$$. It can also be derived from the unit deviance $$d(y, \mu)$$ as

$v(\mu) = \frac{2}{\frac{\partial^2 d(y, \mu)}{\partial\mu^2}}\big|_{y=\mu}.$

See also variance().

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

abstract unit_variance_derivative(mu)

Compute the derivative of the unit variance with respect to mu.

Return $$v'(\mu)$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

abstract property upper_bound: float

Get the upper bound of values for the EDM.

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

Class for the Gamma distribution.

deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

log_likelihood(y, mu, sample_weight=None, dispersion=None)

Compute the log likelihood.

For 1 < power < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=None)) – Dispersion parameter $$\phi$$. Estimated if None.

Return type

float

property lower_bound: float

Return the lowest value of y allowed.

property power: float

Return the Tweedie power parameter.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the deviance of each observation.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Tweedie distribution v(mu) = mu^power.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Tweedie distribution.

Equation: $$v(\mu) = p \times \mu^{(p-1)}$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

class glum.GeneralizedHyperbolicSecant

A class for the Generalized Hyperbolic Secant (GHS) distribution.

The GHS distribution is for targets y in (-∞, +∞).

deviance(y, mu, sample_weight=1)

Compute the deviance.

The deviance is a weighted sum of the unit deviances, $$\sum_i s_i \times d(y_i, \mu_i)$$, where $$d(y, \mu)$$ is the unit deviance and $$s$$ are weights. In terms of the log likelihood, it is $$-2\phi \times [L(y, \mu, \phi / s) - L(y, y, \phi / s)]$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the unit-level deviance.

See superclass documentation.

Parameters
• y (array-like) –

• mu (array-like) –

Return type

array-like

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Get the unit-level expected variance.

See superclass documentation.

Parameters

mu (array-like or float) –

Return type

array-like

unit_variance_derivative(mu)

Get the derivative of the unit variance.

See superclass documentation.

Parameters

mu (array-like or float) –

Return type

array-like

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

class glum.GeneralizedLinearRegressor(alpha=None, l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, alpha_search=False, alphas=None, n_alphas=100, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=None, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, force_all_finite=True, drop_first=False, robust=True, expected_information=False)

Bases: GeneralizedLinearRegressorBase

Regression via a Generalized Linear Model (GLM) with penalties.

GLMs based on a reproductive Exponential Dispersion Model (EDM) aimed at fitting and predicting the mean of the target y as mu=h(X*w). Therefore, the fit minimizes the following objective function with combined L1 and L2 priors as regularizer:

1/(2*sum(s)) * deviance(y, h(X*w); s)
+ alpha * l1_ratio * ||P1*w||_1
+ 1/2 * alpha * (1 - l1_ratio) * w*P2*w


with inverse link function h and s=sample_weight. Note that, for alpha=0 the unregularized GLM is recovered. This is not the default behavior (see alpha parameter description for details). Additionally, for sample_weight=None, one has s_i=1 and sum(s)=n_samples. For P1=P2='identity', the penalty is the elastic net:

alpha * l1_ratio * ||w||_1 + 1/2 * alpha * (1 - l1_ratio) * ||w||_2^2.


If you are interested in controlling the L1 and L2 penalties separately, keep in mind that this is equivalent to:

a * L1 + b * L2,


where:

alpha = a + b and l1_ratio = a / (a + b).


The parameter l1_ratio corresponds to alpha in the R package glmnet, while alpha corresponds to the lambda parameter in glmnet. Specifically, l1_ratio = 1 is the lasso penalty.

Parameters
• alpha ({float, array-like}, optional (default=None)) – Constant that multiplies the penalty terms and thus determines the regularization strength. If alpha_search is False (the default), then alpha must be a scalar or None (equivalent to alpha=1.0). If alpha_search is True, then alpha must be an iterable or None. See alpha_search to find how the regularization path is set if alpha is None. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities).

• l1_ratio (float, optional (default=0)) – The elastic net mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0, the penalty is an L2 penalty. For l1_ratio = 1, it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

• P1 ({'identity', array-like}, shape (n_features,), optional (default='identity')) – This array controls the strength of the regularization for each coefficient independently. A high value will lead to higher regularization while a value of zero will remove the regularization on this parameter. Note that n_features = X.shape[1]. If X is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.

• P2 ({'identity', array-like, sparse matrix}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty w*P2*w. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as diagonal (square) matrix. The default 'identity' sets the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1 and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. If X is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expanded X matrix.

• fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X * coef + intercept).

• family (str or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. the loss function to minimize. If a string, one of: 'binomial', 'gamma', 'gaussian', 'inverse.gaussian', 'normal', 'poisson', 'tweedie' or 'negative.binomial'. Note that 'tweedie' sets the power of the Tweedie distribution to 1.5; to use another value, specify it in parentheses (e.g., 'tweedie (1.5)'). The same applies for 'negative.binomial' and theta parameter.

• link ({'auto', 'identity', 'log', 'logit', 'cloglog'} or Link, optional (default='auto')) –

The link function of the GLM, i.e. mapping from linear predictor (X * coef) to expectation (mu). Option 'auto' sets the link depending on the chosen family as follows:

• 'identity' for family 'normal'

• 'log' for families 'poisson', 'gamma', 'inverse.gaussian' and 'negative.binomial'.

• 'logit' for family 'binomial'

• solver ({'auto', 'irls-cd', 'irls-ls', 'lbfgs', 'trust-constr'}, optional (default='auto')) –

Algorithm to use in the optimization problem:

• 'auto': 'irls-ls' if l1_ratio is zero and 'irls-cd' otherwise.

• 'irls-cd': Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in the fit method, X should be directly passed as a Fortran-contiguous Numpy array or sparse CSC matrix.

• 'irls-ls': Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.

• 'lbfgs': Scipy’s L-BFGS-B optimizer. It cannot deal with L1 penalties.

• 'trust-constr': Calls scipy.optimize.minimize(method='trust-constr'). It cannot deal with L1 penalties. This solver can optimize problems with inequality constraints, passed via A_ineq and b_ineq. It will be selected automatically when inequality constraints are set and solver='auto'. Note that using this method can lead to significantly increased runtimes by a factor of ten or higher.

• max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.

• gradient_tol (float, optional (default=None)) –

Stopping criterion. If None, solver-specific defaults will be used. The default value for most solvers is 1e-4, except for 'trust-constr', which requires more conservative convergence settings and has a default value of 1e-8.

For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when max{|g_i|, i = 1, ..., n} <= tol, where g_i is the i-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached when sum_i(|minimum norm of g_i|), where g_i is the subgradient of the objective and the minimum norm of g_i is the element of the subgradient with the smallest L2 norm.

If you wish to only use a step-size tolerance, set gradient_tol to a very small number.

• step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than step_size_tol. This stopping criterion is disabled when step_size_tol is None.

• hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.

• warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to fit as initialization for coef_ and intercept_ (supersedes start_params). If False or if the attribute coef_ does not exist (first call to fit), start_params sets the start values for coef_ and intercept_.

• alpha_search (bool, optional (default=False)) –

Whether to search along the regularization path for the best alpha. When set to True, alpha should either be None or an iterable. To determine the regularization path, the following sequence is used:

1. If alpha is an iterable, use it directly. All other parameters governing the regularization path are ignored.

2. If min_alpha is set, create a path from min_alpha to the lowest alpha such that all coefficients are zero.

3. If min_alpha_ratio is set, create a path where the ratio of min_alpha / max_alpha = min_alpha_ratio.

4. If none of the above parameters are set, use a min_alpha_ratio of 1e-6.

• alphas (DEPRECATED. Use alpha instead.) –

• n_alphas (int, optional (default=100)) – Number of alphas along the regularization path

• min_alpha_ratio (float, optional (default=None)) – Length of the path. min_alpha_ratio=1e-6 means that min_alpha / max_alpha = 1e-6. If None, 1e-6 is used.

• min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over [max_alpha, min_alpha].

• start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if warm_start is False or if fit is called for the first time (so that self.coef_ does not exist yet). If None, all coefficients are set to zero and the start value for the intercept is the weighted average of y (If fit_intercept is True). If an array, used directly as start values; if fit_intercept is True, its first element is assumed to be the start value for the intercept_. Note that n_features* = X.shape[1] + fit_intercept, i.e. it includes the intercept.

• selection (str, optional (default='cyclic')) – For the CD solver ‘cd’, the coordinates (features) can be updated in either cyclic or random order. If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially when gradient_tol is higher than 1e-4.

• random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection is 'random'.

• copy_X (bool, optional (default=None)) – Whether to copy X. Since X is never modified by GeneralizedLinearRegressor, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. If False, X will not be copied and there will be an error if you pass an X in the wrong format, such as providing integer X and float y. If None, X will not be copied unless it is in the wrong format.

• check_input (bool, optional (default=True)) – Whether to bypass several checks on input: y values in range of family, sample_weight non-negative, P2 positive semi-definite. Don’t use this parameter unless you know what you are doing.

• verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This features requires having the tqdm package installed. For the L-BFGS and 'trust-constr' solvers, set verbose to any positive number for verbosity.

• scale_predictors (bool, optional (default=False)) –

If True, estimate a scaled model where all predictors have a standard deviation of 1. This can result in better estimates if predictors are on very different scales (for example, centimeters and kilometers).

Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if scale_predictors is True.

• lower_bounds (array-like, shape (n_features,), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver ('irls-cd').

• upper_bounds (array-like, shape=(n_features,), optional (default=None)) – See lower_bounds.

• A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form A_ineq w <= b_ineq. Setting inequality constraints forces the use of the local gradient-based solver 'trust-constr', which may increase runtime significantly. Note that the constraints only apply to coefficients related to features in X. If you want to constrain the intercept, add it to the feature matrix X manually and set fit_intercept==False.

• b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form A_ineq w <= b_ineq. Refer to the documentation of A_ineq for details.

• drop_first (bool, optional (default = False)) – If True, drop the first column when encoding categorical variables. Set this to True when alpha=0 and solver=’auto’ to prevent an error due to a singular feature matrix.

• robust (bool, optional (default = False)) – If true, then robust standard errors are computed by default.

• expected_information (bool, optional (default = False)) – If true, then the expected information matrix is computed by default. Only relevant when computing robust standard errors.

• force_all_finite (bool) –

coef_

Estimated coefficients for the linear predictor (X*coef_+intercept_) in the GLM.

Type

numpy.array, shape (n_features,)

intercept_

Intercept (a.k.a. bias) added to linear predictor.

Type

float

n_iter_

Actual number of iterations used in solver.

Type

int

Notes

The fit itself does not need outcomes to be from an EDM, but only assumes the first two moments to be $$\mu_i \equiv \mathrm{E}(y_i) = h(x_i' w)$$ and $$\mathrm{var}(y_i) = (\phi / s_i) v(\mu_i)$$. The unit variance function $$v(\mu_i)$$ is a property of and given by the specific EDM; see background.

The parameters $$w$$ (coef_ and intercept_) are estimated by minimizing the deviance plus penalty term, which is equivalent to (penalized) maximum likelihood estimation.

For alpha > 0, the feature matrix X should be standardized in order to penalize features equally strong. Call sklearn.preprocessing.StandardScaler before calling fit.

If the target y is a ratio, appropriate sample weights s should be provided. As an example, consider Poisson distributed counts z (integers) and weights s = exposure (time, money, persons years, …). Then you fit y ≡ z/s, i.e. GeneralizedLinearModel(family='poisson').fit(X, y, sample_weight=s). The weights are necessary for the right (finite sample) mean. Consider $$\bar{y} = \sum_i s_i y_i / \sum_i s_i$$: in this case, one might say that $$y$$ follows a ‘scaled’ Poisson distribution. The same holds for other distributions.

References

For the coordinate descent implementation:
aic(X, y, sample_weight=None)

Akaike’s information criteria. Computed as: $$-2\log\hat{\mathcal{L}} + 2\hat{k}$$ where $$\hat{\mathcal{L}}$$ is the maximum likelihood estimate of the model, and $$\hat{k}$$ is the effective number of parameters. See _compute_information_criteria for more information on the computation of $$\hat{k}$$.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’

• y (array-like, shape (n_samples,)) – Same data as used in ‘fit’

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’

aicc(X, y, sample_weight=None)

Second-order Akaike’s information criteria (or small sample AIC). Computed as: $$-2\log\hat{\mathcal{L}} + 2\hat{k} + \frac{2k(k+1)}{n-k-1}$$ where $$\hat{\mathcal{L}}$$ is the maximum likelihood estimate of the model, $$n$$ is the number of training instances, and $$\hat{k}$$ is the effective number of parameters. See _compute_information_criteria for more information on the computation of $$\hat{k}$$.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’

• y (array-like, shape (n_samples,)) – Same data as used in ‘fit’

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’

bic(X, y, sample_weight=None)

Bayesian information criterion. Computed as: $$-2\log\hat{\mathcal{L}} + k\log(n)$$ where $$\hat{\mathcal{L}}$$ is the maximum likelihood estimate of the model, $$n$$ is the number of training instances, and $$\hat{k}$$ is the effective number of parameters. See _compute_information_criteria for more information on the computation of $$\hat{k}$$.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’

• y (array-like, shape (n_samples,)) – Same data as used in ‘fit’

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’

coef_table(confidence_level=0.95, X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None)

Get a table of of the regression coefficients.

Includes coefficient estimates, standard errors, t-values, p-values and confidence intervals.

Parameters
• confidence_level (float, optional, default=0.95) – The confidence level for the confidence intervals.

• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

Returns

A table of the regression results.

Return type

pandas.DataFrame

covariance_matrix(X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, skip_checks=False)

Calculate the covariance matrix for generalized linear models.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

• store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.

• skip_checks (boolean, optional, default=False) – Whether to skip input validation. For internal use only.

Notes

We support three types of covariance matrices:

• non-robust

• robust (HC-1)

• clustered

For maximum-likelihood estimator, the covariance matrix takes the form $$\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)$$ where $$\mathcal{H}^{-1}$$ is the inverse Hessian and $$\mathcal{I}$$ is the Information matrix. The different types of covariance matrices use different approximation of these quantities.

The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.

The robust (HC-1) covariance matrix takes the form $$\mathbf{H}^{−1} (\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{−1}(\hat{\theta})$$ where $$\mathbf{H}$$ is the empirical Hessian and $$\mathbf{G}$$ is the gradient. We apply a finite-sample correction of $$\frac{N}{N-p}$$.

The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using $$\mathbf{G}^{T}(\hat{\theta} \mathbf{G}(\hat{\theta})$$ directly, we first sum over all the groups first. The finite-sample correction is affected as well, becoming $$\frac{M}{M-1} \frac{N}{N-p}$$ where $$M$$ is the number of groups.

References

property family_instance: ExponentialDispersionModel

Return an ExponentialDispersionModel.

fit(X, y, sample_weight=None, offset=None, store_covariance_matrix=False, clusters=None, weights_sum=None)

Fit a Generalized Linear Model.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a float32 matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.

• y (array-like, shape (n_samples,)) – Target values.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has $$\mathrm{var}(y_i) = \phi \times v(mu) / w_i$$. If $$y_i \sim EDM(\mu, \phi / w_i)$$, then $$\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)$$, i.e. the mean of $$y$$ is a weighted average with weights equal to sample_weight.

• offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected y by 3 if the link is linear and will multiply expected y by 3 if the link is logarithmic.

• store_covariance_matrix (bool, optional (default=False)) – Whether to estimate and store the covariance matrix of the parameter estimates. If True, the covariance matrix will be available in the covariance_matrix_ attribute after fitting.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• weights_sum (float, optional (default=None)) –

Return type

self

get_formatted_diagnostics(full_report=False, custom_columns=None)

Get formatted diagnostics; can be printed with _report_diagnostics.

Parameters
• full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.

• custom_columns (iterable, optional (default=None)) – Print only the specified columns.

Return type

Union[str, DataFrame]

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing – A MetadataRequest encapsulating routing information.

Return type

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

linear_predictor(X, offset=None, alpha_index=None, alpha=None)

Compute the linear predictor, X * coef_ + intercept_.

If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].

Parameters
• X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.

• offset (array-like, shape (n_samples,), optional (default=None)) –

• alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).

• alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).

Returns

The linear predictor.

Return type

array, shape (n_samples, n_alphas)

Return a Link.

predict(X, sample_weight=None, offset=None, alpha_index=None, alpha=None)

Predict using GLM with feature matrix X.

If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].

Parameters
• X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.

• offset (array-like, shape (n_samples,), optional (default=None)) –

• alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).

• alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).

Returns

Predicted values times sample_weight.

Return type

array, shape (n_samples, n_alphas)

report_diagnostics(full_report=False, custom_columns=None)

Print diagnostics to stdout.

Parameters
• full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.

• custom_columns (iterable, optional (default=None)) – Print only the specified columns.

Return type

None

score(X, y, sample_weight=None, offset=None)

Compute $$D^2$$, the percentage of deviance explained.

$$D^2$$ is a generalization of the coefficient of determination $$R^2$$. The $$R^2$$ uses the squared error and the $$D^2$$, the deviance. Note that those two are equal for family='normal'.

$$D^2$$ is defined as $$D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})}{D_{\mathrm{null}}}$$, $$D_{\mathrm{null}}$$ is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one and it can be negative.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.

• y (array-like, shape (n_samples,)) – True values of target.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.

• offset (array-like, shape (n_samples,), optional (default=None)) –

Returns

D^2 of self.predict(X) w.r.t. y.

Return type

float

set_fit_request(*, clusters='$UNCHANGED$', offset='$UNCHANGED$', sample_weight='$UNCHANGED$', store_covariance_matrix='$UNCHANGED$', weights_sum='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to fit.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• clusters (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for clusters parameter in fit.

• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in fit.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

• store_covariance_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for store_covariance_matrix parameter in fit.

• weights_sum (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights_sum parameter in fit.

• self (GeneralizedLinearRegressor) –

Returns

self – The updated object.

Return type

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

set_predict_request(*, alpha='$UNCHANGED$', alpha_index='$UNCHANGED$', offset='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to predict.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• alpha (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for alpha parameter in predict.

• alpha_index (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for alpha_index parameter in predict.

• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in predict.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict.

• self (GeneralizedLinearRegressor) –

Returns

self – The updated object.

Return type

object

set_score_request(*, offset='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to score.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

• self (GeneralizedLinearRegressor) –

Returns

self – The updated object.

Return type

object

std_errors(X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False)

Calculate standard errors for generalized linear models.

See covariance_matrix for an in-depth explanation of how the standard errors are computed.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

• store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.

wald_test(R=None, features=None, r=None, X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None)

Compute the Wald test statistic and p-value for a linear hypothesis.

The left hand side of the hypothesis may be specified in the following ways:

• R: The restriction matrix representing the linear combination of coefficients to test.

• features: The name of a feature or a list of features to test.

The right hand side of the tested hypothesis is specified by r.

Parameters
• R (np.ndarray, optional, default=None) – The restriction matrix representing the linear combination of coefficients to test.

• features (Union[str, list[str]], optional, default=None) – The name of a feature or a list of features to test.

• r (np.ndarray, optional, default=None) – The vector representing the values of the linear combination. If None, the test is for whether the linear combinations of the coefficients are zero.

• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

Returns

NamedTuple with test statistic, p-value, and degrees of freedom.

Return type

WaldTestResult

class glum.GeneralizedLinearRegressorCV(l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, n_alphas=100, alphas=None, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=True, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, force_all_finite=True, cv=None, n_jobs=None, drop_first=False, robust=True, expected_information=False)

Bases: GeneralizedLinearRegressorBase

Generalized linear model with iterative fitting along a regularization path.

The best model is selected by cross-validation.

Cross-validated regression via a Generalized Linear Model (GLM) with penalties. For more on GLMs and on these parameters, see the documentation for GeneralizedLinearRegressor. CV conventions follow sklearn.linear_model.LassoCV.

Parameters
• l1_ratio (float or array of floats, optional (default=0)) – If you pass l1_ratio as an array, the fit method will choose the best value of l1_ratio and store it as self.l1_ratio.

• P1 ({'identity', array-like}, shape (n_features,), optional (default='identity')) – This array controls the strength of the regularization for each coefficient independently. A high value will lead to higher regularization while a value of zero will remove the regularization on this parameter. Note that n_features = X.shape[1]. If X is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.

• P2 ({'identity', array-like, sparse matrix}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty w*P2*w. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as diagonal (square) matrix. The default 'identity' sets the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1 and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. If X is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expanded X matrix.

• fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X * coef + intercept).

• family (str or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. the loss function to minimize. If a string, one of: 'binomial', 'gamma', 'gaussian', 'inverse.gaussian', 'normal', 'poisson', 'tweedie' or 'negative.binomial'. Note that 'tweedie' sets the power of the Tweedie distribution to 1.5; to use another value, specify it in parentheses (e.g., 'tweedie (1.5)'). The same applies for 'negative.binomial' and theta parameter.

• link ({'auto', 'identity', 'log', 'logit', 'cloglog'} or Link, optional (default='auto')) –

The link function of the GLM, i.e. mapping from linear predictor (X * coef) to expectation (mu). Option 'auto' sets the link depending on the chosen family as follows:

• 'identity' for family 'normal'

• 'log' for families 'poisson', 'gamma', 'inverse.gaussian' and 'negative.binomial'.

• 'logit' for family 'binomial'

• solver ({'auto', 'irls-cd', 'irls-ls', 'lbfgs'}, optional (default='auto')) –

Algorithm to use in the optimization problem:

• 'auto': 'irls-ls' if l1_ratio is zero and 'irls-cd' otherwise.

• 'irls-cd': Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in the fit method, X should be directly passed as a Fortran-contiguous Numpy array or sparse CSC matrix.

• 'irls-ls': Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.

• 'lbfgs': Scipy’s L-BFGS-B optimizer. It cannot deal with L1 penalties.

• max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.

• gradient_tol (float, optional (default=None)) –

Stopping criterion. If None, solver-specific defaults will be used. The default value for most solvers is 1e-4, except for 'trust-constr', which requires more conservative convergence settings and has a default value of 1e-8.

For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when max{|g_i|, i = 1, ..., n} <= tol, where g_i is the i-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached when sum_i(|minimum norm of g_i|), where g_i is the subgradient of the objective and the minimum norm of g_i is the element of the subgradient with the smallest L2 norm.

If you wish to only use a step-size tolerance, set gradient_tol to a very small number.

• step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than step_size_tol. This stopping criterion is disabled when step_size_tol is None.

• hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.

• warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to fit as initialization for coef_ and intercept_ (supersedes start_params). If False or if the attribute coef_ does not exist (first call to fit), start_params sets the start values for coef_ and intercept_.

• n_alphas (int, optional (default=100)) – Number of alphas along the regularization path

• alphas (array-like, optional (default=None)) – List of alphas for which to compute the models. If None, the alphas are set automatically. Setting None is preferred.

• min_alpha_ratio (float, optional (default=None)) – Length of the path. min_alpha_ratio=1e-6 means that min_alpha / max_alpha = 1e-6. If None, 1e-6 is used.

• min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over [max_alpha, min_alpha].

• start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if warm_start is False or if fit is called for the first time (so that self.coef_ does not exist yet). If None, all coefficients are set to zero and the start value for the intercept is the weighted average of y (If fit_intercept is True). If an array, used directly as start values; if fit_intercept is True, its first element is assumed to be the start value for the intercept_. Note that n_features* = X.shape[1] + fit_intercept, i.e. it includes the intercept.

• selection (str, optional (default='cyclic')) – For the CD solver ‘cd’, the coordinates (features) can be updated in either cyclic or random order. If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially when gradient_tol is higher than 1e-4.

• random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection is 'random'.

• copy_X (bool, optional (default=None)) – Whether to copy X. Since X is never modified by GeneralizedLinearRegressor, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. If False, X will not be copied and there will be an error if you pass an X in the wrong format, such as providing integer X and float y. If None, X will not be copied unless it is in the wrong format.

• check_input (bool, optional (default=True)) – Whether to bypass several checks on input: y values in range of family, sample_weight non-negative, P2 positive semi-definite. Don’t use this parameter unless you know what you are doing.

• verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This features requires having the tqdm package installed. For the L-BFGS solver, set verbose to any positive number for verbosity.

• scale_predictors (bool, optional (default=False)) –

If True, estimate a scaled model where all predictors have a standard deviation of 1. This can result in better estimates if predictors are on very different scales (for example, centimeters and kilometers).

Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if scale_predictors is True.

• lower_bounds (array-like, shape (n_features), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver ('irls-cd').

• upper_bounds (array-like, shape=(n_features), optional (default=None)) – See lower_bounds.

• A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form A_ineq w <= b_ineq.

• b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form A_ineq w <= b_ineq.

• cv (int, cross-validation generator or Iterable, optional (default=None)) –

Determines the cross-validation splitting strategy. One of:

• None, to use the default 5-fold cross-validation,

• int, to specify the number of folds.

• Iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, KFold is used

• n_jobs (int, optional (default=None)) – The maximum number of concurrently running jobs. The number of jobs that are needed is len(l1_ratio) x n_folds. -1 is the same as the number of CPU on your machine. None means 1 unless in a joblib.parallel_backend context.

• drop_first (bool, optional (default = False)) – If True, drop the first column when encoding categorical variables.

• force_all_finite (bool) –

• robust (bool) –

• expected_information (bool) –

alpha_

The amount of regularization chosen by cross validation.

Type

float

alphas_

Alphas used by the model.

Type

array, shape (n_l1_ratios, n_alphas)

l1_ratio_

The compromise between L1 and L2 regularization chosen by cross validation.

Type

float

coef_

Estimated coefficients for the linear predictor in the GLM at the optimal (l1_ratio_, alpha_).

Type

array, shape (n_features,)

intercept_

Intercept (a.k.a. bias) added to linear predictor.

Type

float

n_iter_

The number of iterations run by the CD solver to reach the specified tolerance for the optimal alpha.

Type

int

coef_path_

Estimated coefficients for the linear predictor in the GLM at every point along the regularization path.

Type

array, shape (n_folds, n_l1_ratios, n_alphas, n_features)

deviance_path_

Deviance for the test set on each fold, varying alpha.

Type

array, shape(n_folds, n_alphas)

robust

If true, then robust standard errors are computed by default.

Type

bool, optional (default = False)

expected_information

If true, then the expected information matrix is computed by default. Only relevant when computing robust standard errors.

Type

bool, optional (default = False)

coef_table(confidence_level=0.95, X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None)

Get a table of of the regression coefficients.

Includes coefficient estimates, standard errors, t-values, p-values and confidence intervals.

Parameters
• confidence_level (float, optional, default=0.95) – The confidence level for the confidence intervals.

• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

Returns

A table of the regression results.

Return type

pandas.DataFrame

covariance_matrix(X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, skip_checks=False)

Calculate the covariance matrix for generalized linear models.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

• store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.

• skip_checks (boolean, optional, default=False) – Whether to skip input validation. For internal use only.

Notes

We support three types of covariance matrices:

• non-robust

• robust (HC-1)

• clustered

For maximum-likelihood estimator, the covariance matrix takes the form $$\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)$$ where $$\mathcal{H}^{-1}$$ is the inverse Hessian and $$\mathcal{I}$$ is the Information matrix. The different types of covariance matrices use different approximation of these quantities.

The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.

The robust (HC-1) covariance matrix takes the form $$\mathbf{H}^{−1} (\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{−1}(\hat{\theta})$$ where $$\mathbf{H}$$ is the empirical Hessian and $$\mathbf{G}$$ is the gradient. We apply a finite-sample correction of $$\frac{N}{N-p}$$.

The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using $$\mathbf{G}^{T}(\hat{\theta} \mathbf{G}(\hat{\theta})$$ directly, we first sum over all the groups first. The finite-sample correction is affected as well, becoming $$\frac{M}{M-1} \frac{N}{N-p}$$ where $$M$$ is the number of groups.

References

property family_instance: ExponentialDispersionModel

Return an ExponentialDispersionModel.

fit(X, y, sample_weight=None, offset=None, store_covariance_matrix=False, clusters=None)

Choose the best model along a ‘regularization path’ by cross-validation.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a float32 matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.

• y (array-like, shape (n_samples,)) – Target values.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has $$\mathrm{var}(y_i) = \phi \times v(mu) / w_i$$. If $$y_i \sim EDM(\mu, \phi / w_i)$$, then $$\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)$$, i.e. the mean of $$y$$ is a weighted average with weights equal to sample_weight.

• offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected y by 3 if the link is linear and will multiply expected y by 3 if the link is logarithmic.

• store_covariance_matrix (bool, optional (default=False)) – Whether to store the covariance matrix of the parameter estimates corresponding to the best best model.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

get_formatted_diagnostics(full_report=False, custom_columns=None)

Get formatted diagnostics; can be printed with _report_diagnostics.

Parameters
• full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.

• custom_columns (iterable, optional (default=None)) – Print only the specified columns.

Return type

Union[str, DataFrame]

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing – A MetadataRequest encapsulating routing information.

Return type

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

linear_predictor(X, offset=None, alpha_index=None, alpha=None)

Compute the linear predictor, X * coef_ + intercept_.

If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].

Parameters
• X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.

• offset (array-like, shape (n_samples,), optional (default=None)) –

• alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).

• alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).

Returns

The linear predictor.

Return type

array, shape (n_samples, n_alphas)

Return a Link.

predict(X, sample_weight=None, offset=None, alpha_index=None, alpha=None)

Predict using GLM with feature matrix X.

If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].

Parameters
• X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.

• offset (array-like, shape (n_samples,), optional (default=None)) –

• alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).

• alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).

Returns

Predicted values times sample_weight.

Return type

array, shape (n_samples, n_alphas)

report_diagnostics(full_report=False, custom_columns=None)

Print diagnostics to stdout.

Parameters
• full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.

• custom_columns (iterable, optional (default=None)) – Print only the specified columns.

Return type

None

score(X, y, sample_weight=None, offset=None)

Compute $$D^2$$, the percentage of deviance explained.

$$D^2$$ is a generalization of the coefficient of determination $$R^2$$. The $$R^2$$ uses the squared error and the $$D^2$$, the deviance. Note that those two are equal for family='normal'.

$$D^2$$ is defined as $$D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})}{D_{\mathrm{null}}}$$, $$D_{\mathrm{null}}$$ is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one and it can be negative.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.

• y (array-like, shape (n_samples,)) – True values of target.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.

• offset (array-like, shape (n_samples,), optional (default=None)) –

Returns

D^2 of self.predict(X) w.r.t. y.

Return type

float

set_fit_request(*, clusters='$UNCHANGED$', offset='$UNCHANGED$', sample_weight='$UNCHANGED$', store_covariance_matrix='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to fit.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• clusters (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for clusters parameter in fit.

• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in fit.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

• store_covariance_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for store_covariance_matrix parameter in fit.

• self (GeneralizedLinearRegressorCV) –

Returns

self – The updated object.

Return type

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

set_predict_request(*, alpha='$UNCHANGED$', alpha_index='$UNCHANGED$', offset='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to predict.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• alpha (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for alpha parameter in predict.

• alpha_index (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for alpha_index parameter in predict.

• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in predict.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict.

• self (GeneralizedLinearRegressorCV) –

Returns

self – The updated object.

Return type

object

set_score_request(*, offset='$UNCHANGED$', sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

• True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

• False: metadata is not requested and the meta-estimator will not pass it to score.

• None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

• str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
• offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.

• sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

• self (GeneralizedLinearRegressorCV) –

Returns

self – The updated object.

Return type

object

std_errors(X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False)

Calculate standard errors for generalized linear models.

See covariance_matrix for an in-depth explanation of how the standard errors are computed.

Parameters
• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

• store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.

wald_test(R=None, features=None, r=None, X=None, y=None, mu=None, offset=None, sample_weight=None, dispersion=None, robust=None, clusters=None, expected_information=None)

Compute the Wald test statistic and p-value for a linear hypothesis.

The left hand side of the hypothesis may be specified in the following ways:

• R: The restriction matrix representing the linear combination of coefficients to test.

• features: The name of a feature or a list of features to test.

The right hand side of the tested hypothesis is specified by r.

Parameters
• R (np.ndarray, optional, default=None) – The restriction matrix representing the linear combination of coefficients to test.

• features (Union[str, list[str]], optional, default=None) – The name of a feature or a list of features to test.

• r (np.ndarray, optional, default=None) – The vector representing the values of the linear combination. If None, the test is for whether the linear combinations of the coefficients are zero.

• X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.

• y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.

• mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.

• offset (array-like, optional, default=None) – Array with additive offsets.

• sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.

• dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.

• robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.

• clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.

• expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.

Returns

NamedTuple with test statistic, p-value, and degrees of freedom.

Return type

WaldTestResult

Bases: Link

The identity link function g(x) = x.

derivative(mu)

Get the derivative of the identity link, a vector of ones.

See superclass documentation.

Parameters

mu (array-like) –

inverse(lin_pred)

Compute the inverse link function h(lin_pred).

Gives the inverse relationship between linear predictor and the mean mu ≡ E(Y), i.e. h(linear predictor) = mu.

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative(lin_pred)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative2(lin_pred)

Compute second derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

See superclass documentation.

Parameters

mu (array-like) –

class glum.InverseGaussianDistribution

Class for the scaled Inverse Gaussian distribution.

deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

log_likelihood(y, mu, sample_weight=None, dispersion=None)

Compute the log likelihood.

For 1 < power < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=None)) – Dispersion parameter $$\phi$$. Estimated if None.

Return type

float

property lower_bound: float

Return the lowest value of y allowed.

property power: float

Return the Tweedie power parameter.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the deviance of each observation.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Tweedie distribution v(mu) = mu^power.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Tweedie distribution.

Equation: $$v(\mu) = p \times \mu^{(p-1)}$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

Bases: object

Abstract base class for Link functions.

abstract derivative(mu)

Compute the derivative of the link g'(mu).

Parameters

mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.

abstract inverse(lin_pred)

Compute the inverse link function h(lin_pred).

Gives the inverse relationship between linear predictor, lin_pred ≡ X * w, and the mean, mu ≡ E(Y), i.e. h(lin_pred) = mu.

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

abstract inverse_derivative(lin_pred)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

abstract inverse_derivative2(lin_pred)

Compute second derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

Compute the link function g(mu).

The link function links the mean, mu ≡ E(Y), to the linear predictor X * w, i.e. g(mu) is equal to the linear predictor.

Parameters

mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.

Bases: Link

The log link function log(x).

derivative(mu)

Get the derivative of the log link, one over mu.

Parameters

mu (array-like) –

Return type

numpy.ndarray

inverse(lin_pred)

Get the inverse of the log link, the exponential function.

See superclass documentation.

Parameters

lin_pred (array-like) –

Return type

numpy.ndarray

inverse_derivative(lin_pred)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative2(lin_pred)

Compute 2nd derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

Get the log of mu.

See superclass documentation.

Parameters

mu (array-like) –

Return type

numpy.ndarray

Bases: Link

The logit link function logit(x).

derivative(mu)

Get the derivative of the logit link.

See superclass documentation.

Parameters

mu (array-like) –

Return type

array-like

inverse(lin_pred)

Get the inverse of the logit link.

See superclass documentation.

Note: since passing a very large value might result in an output of one, this function bounds the output to be between [50*eps, 1 - 50*eps], where eps is floating point epsilon.

Parameters

lin_pred (array-like) –

Return type

array-like

inverse_derivative(lin_pred)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative2(lin_pred)

Compute 2nd derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array, shape (n_samples,)) – Usually the (fitted) linear predictor.

Get the logit function of mu.

See superclass documentation.

Parameters

mu (array-like) –

Return type

numpy.ndarray

class glum.NegativeBinomialDistribution(theta=1.0)

A class for the Negative Binomial distribution.

A Negative Binomial distribution with mean $$\mu = \mathrm{E}(Y)$$ is uniquely defined by its mean-variance relationship $$\mathrm{var}(Y) \propto \mu + \theta * \mu^2$$.

Parameters

theta (float, optional (default=1.0)) – The dispersion parameter from unit_variance $$v(\mu) = \mu + \theta * \mu^2$$. For $$\theta <= 0$$, no distribution exists.

References

For the log-likelihood and deviance:
deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

log_likelihood(y, mu, sample_weight=None, dispersion=1)

Compute the log likelihood.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=1.0)) – Ignored.

Return type

float

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

property theta: float

Return the Negative Binomial theta parameter.

unit_deviance(y, mu)

Get the deviance of each observation.

Parameters
• y (ndarray) –

• mu (ndarray) –

Return type

ndarray

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Negative Binomial distribution v(mu) = mu + theta * mu^2.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Negative Binomial distribution.

Equation: $$v(\mu) = 1 + 2 \times \theta \times \mu$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

class glum.NormalDistribution

Class for the Normal (a.k.a. Gaussian) distribution.

deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

log_likelihood(y, mu, sample_weight=None, dispersion=None)

Compute the log likelihood.

For 1 < power < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=None)) – Dispersion parameter $$\phi$$. Estimated if None.

Return type

float

property lower_bound: float

Return the lowest value of y allowed.

property power: float

Return the Tweedie power parameter.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the deviance of each observation.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Tweedie distribution v(mu) = mu^power.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Tweedie distribution.

Equation: $$v(\mu) = p \times \mu^{(p-1)}$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

class glum.PoissonDistribution

Class for the scaled Poisson distribution.

deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

log_likelihood(y, mu, sample_weight=None, dispersion=None)

Compute the log likelihood.

For 1 < power < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=None)) – Dispersion parameter $$\phi$$. Estimated if None.

Return type

float

property lower_bound: float

Return the lowest value of y allowed.

property power: float

Return the Tweedie power parameter.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the deviance of each observation.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Tweedie distribution v(mu) = mu^power.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Tweedie distribution.

Equation: $$v(\mu) = p \times \mu^{(p-1)}$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

class glum.TweedieDistribution(power=0)

A class for the Tweedie distribution.

A Tweedie distribution with mean $$\mu = \mathrm{E}(Y)$$ is uniquely defined by its mean-variance relationship $$\mathrm{var}(Y) \propto \mu^{\mathrm{power}}$$.

Special cases are:

Power

Distribution

0

Normal

1

Poisson

(1, 2)

Compound Poisson

2

Gamma

3

Inverse Gaussian

Parameters

power (float, optional (default=0)) – The variance power of the unit_variance $$v(\mu) = \mu^{\mathrm{power}}$$. For $$0 < \mathrm{power} < 1$$, no distribution exists.

deviance(y, mu, sample_weight=None)

Compute the deviance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

Return type

float

deviance_derivative(y, mu, sample_weight=1)

Compute the derivative of the deviance with respect to mu.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')

Estimate the dispersion parameter $$\phi$$.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.

• ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.

• {'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.

• 'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.

• (default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.

Return type

float

eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)

Compute eta, mu and the deviance.

Compute: * the linear predictor, eta, as cur_eta + factor * X_dot_d; * the link-function-transformed prediction, mu; * the deviance.

Returns

• numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta.

• numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.

• float – The deviance.

Parameters

• factor (float) –

• cur_eta (ndarray) –

• X_dot_d (ndarray) –

• y (ndarray) –

• sample_weight (ndarray) –

in_y_range(x)

Return True if x is in the valid range of the EDM.

Parameters

x (array-like, shape (n_samples,)) – Target values.

Return type

np.ndarray

property include_lower_bound: bool

Return whether lower_bound is allowed as a value of y.

log_likelihood(y, mu, sample_weight=None, dispersion=None)

Compute the log likelihood.

For 1 < power < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.

• dispersion (float, optional (default=None)) – Dispersion parameter $$\phi$$. Estimated if None.

Return type

float

property lower_bound: float

Return the lowest value of y allowed.

property power: float

Return the Tweedie power parameter.

Compute the gradient and negative Hessian of the log likelihood row-wise.

Returns

• numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.

• numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.

Parameters

• coef (ndarray) –

• X (Union[MatrixBase, StandardizedMatrix]) –

• y (ndarray) –

• sample_weight (ndarray) –

• eta (ndarray) –

• mu (ndarray) –

• offset (ndarray) –

unit_deviance(y, mu)

Get the deviance of each observation.

unit_deviance_derivative(y, mu)

Compute the derivative of the unit deviance with respect to mu.

The derivative of the unit deviance is given by $$-2 \times (y - \mu) / v(\mu)$$, where $$v(\mu)$$ is the unit variance.

Parameters
• y (array-like, shape (n_samples,)) – Target values.

• mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

array-like, shape (n_samples,)

unit_variance(mu)

Compute the unit variance of a Tweedie distribution v(mu) = mu^power.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

unit_variance_derivative(mu)

Compute the derivative of the unit variance of a Tweedie distribution.

Equation: $$v(\mu) = p \times \mu^{(p-1)}$$.

Parameters

mu (array-like, shape (n_samples,)) – Predicted mean.

Return type

numpy.ndarray, shape (n_samples,)

variance(mu, dispersion=1, sample_weight=1)

Compute the variance function.

The variance of $$Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)$$ is $$\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)$$, with unit variance $$v(\mu)$$ and weights $$s_i$$.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

variance_derivative(mu, dispersion=1, sample_weight=1)

Compute the derivative of the variance with respect to mu.

The derivative of the variance is equal to $$(\phi / s_i) * v'(\mu_i)$$, where $$v(\mu)$$ is the unit variance and $$s_i$$ are weights.

Parameters
• mu (array-like, shape (n_samples,)) – Predicted mean.

• dispersion (float, optional (default=1)) – Dispersion parameter $$\phi$$.

• sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.

Return type

array-like, shape (n_samples,)

Bases: Link

The Tweedie link function x^(1-p) if p≠1 and log(x) if p=1.

Parameters

p (float) –

derivative(mu)

Get the derivative of the Tweedie link.

See superclass documentation.

Parameters

mu (array-like) –

inverse(**kwargs)

Compute the inverse link function h(lin_pred).

Gives the inverse relationship between linear predictor, lin_pred ≡ X * w, and the mean, mu ≡ E(Y), i.e. h(lin_pred) = mu.

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative(**kwargs)

Compute the derivative of the inverse link function h'(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

inverse_derivative2(**kwargs)

Compute second derivative of the inverse link function h''(lin_pred).

Parameters

lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.

See superclass documentation.

Parameters

mu (array-like) –

For the Tweedie distribution, this code follows actuarial best practices regarding link functions. Note that these links are sometimes not canonical:

• identity for normal (p=0);

• no convention for p < 0, so let’s leave it as identity;

• log otherwise.

Parameters
Return type