glum package
The two main classes in glum are GeneralizedLinearRegressor and GeneralizedLinearRegressorCV. Most users will use fit() and predict().
- class glum.BinomialDistribution
Bases: ExponentialDispersionModel
A class for the Binomial distribution.
The Binomial distribution models outcomes y in [0, 1].
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=1)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
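As a rough illustration of the definition above, the following standalone sketch computes a weighted sum of Binomial unit deviances in plain Python. It does not call glum: the helper names `binomial_unit_deviance` and `deviance` are hypothetical, and any weight normalization glum may apply internally is not reproduced.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_unit_deviance(y, mu):
    """Unit deviance 2 * [l(y, y) - l(y, mu)] for an outcome y in [0, 1]."""
    saturated = _xlogy(y, y) + _xlogy(1 - y, 1 - y)
    fitted = _xlogy(y, mu) + _xlogy(1 - y, 1 - mu)
    return 2.0 * (saturated - fitted)


def deviance(y, mu, sample_weight=None):
    """Weighted sum of unit deviances (hypothetical helper; weights default to 1)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return sum(
        w * binomial_unit_deviance(yi, mi)
        for yi, mi, w in zip(y, mu, sample_weight)
    )


y = [0.0, 1.0, 1.0]
mu = [0.2, 0.7, 0.9]
total = deviance(y, mu)
```

For 0/1 outcomes the saturated log likelihood is zero, so the total reduces to minus twice the summed log probability of the observed outcomes.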
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
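A Pearson-based estimate of this kind can be sketched in plain Python as follows. This is an illustration under assumptions, not glum's implementation: the function name `pearson_dispersion` is hypothetical, and dividing by the sum of the weights minus `ddof` is an assumed normalization.

```python
def pearson_dispersion(y, mu, unit_variance, sample_weight=None, ddof=1):
    """Pearson-type estimate of phi (sketch; the denominator is an assumption)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    # Weighted sum of squared Pearson residuals (y - mu)^2 / v(mu).
    numerator = sum(
        w * (yi - mi) ** 2 / unit_variance(mi)
        for yi, mi, w in zip(y, mu, sample_weight)
    )
    return numerator / (sum(sample_weight) - ddof)


def poisson_unit_variance(mu):
    """Example unit variance v(mu) = mu, chosen only for illustration."""
    return mu


y = [1.0, 2.0, 3.0, 4.0]
mu = [1.5, 1.5, 3.5, 3.5]
phi_hat = pearson_dispersion(y, mu, poisson_unit_variance)
```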
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
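The three return values can be illustrated with a small standalone sketch. It assumes a log link and a Poisson-type unit deviance purely for the sake of example. The function name `eta_mu_deviance_sketch` is hypothetical and the code does not reproduce glum's optimized routine.

```python
import math


def eta_mu_deviance_sketch(factor, cur_eta, X_dot_d, y, sample_weight):
    """Return (eta, mu, deviance) for a log link and Poisson unit deviance."""
    # Linear predictor after a step of size `factor` along direction d.
    eta = [e + factor * d for e, d in zip(cur_eta, X_dot_d)]
    # Inverse log link maps the linear predictor to the mean.
    mu = [math.exp(e) for e in eta]
    dev = 0.0
    for yi, mi, w in zip(y, mu, sample_weight):
        ylogy = yi * math.log(yi / mi) if yi > 0 else 0.0
        dev += w * 2.0 * (ylogy - yi + mi)
    return eta, mu, dev


eta, mu, dev = eta_mu_deviance_sketch(
    factor=0.5,
    cur_eta=[0.0, 0.0],
    X_dot_d=[1.0, 2.0],
    y=[1.0, 2.0],
    sample_weight=[1.0, 1.0],
)
```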
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=1)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=1)) – Ignored.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
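The formula \(2 \times (\mu - y) / v(\mu)\) can be checked numerically. The sketch below uses a Poisson-type unit deviance with \(v(\mu) = \mu\) as an assumed example and compares the closed form against a central finite difference. All function names are hypothetical and nothing here calls glum.

```python
import math


def poisson_unit_deviance(y, mu):
    """Poisson unit deviance 2 * [y * log(y / mu) - y + mu]."""
    ylogy = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (ylogy - y + mu)


def unit_deviance_derivative(y, mu, unit_variance):
    """Closed form 2 * (mu - y) / v(mu)."""
    return 2.0 * (mu - y) / unit_variance(mu)


def poisson_unit_variance(mu):
    """Unit variance v(mu) = mu."""
    return mu


y, mu, h = 3.0, 2.0, 1e-6
analytic = unit_deviance_derivative(y, mu, poisson_unit_variance)
# Central finite difference in mu, holding y fixed.
numeric = (
    poisson_unit_deviance(y, mu + h) - poisson_unit_deviance(y, mu - h)
) / (2.0 * h)
```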
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
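The relation between the unit variance and the curvature of the unit deviance can be verified numerically. The sketch below assumes the gamma unit deviance \(d(y, \mu) = 2[(y - \mu)/\mu - \log(y/\mu)]\), for which \(v(\mu) = \mu^2\). The helper names are hypothetical and the finite-difference step `h` is an arbitrary choice.

```python
import math


def gamma_unit_deviance(y, mu):
    """Gamma unit deviance 2 * [(y - mu) / mu - log(y / mu)]."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))


def unit_variance_from_deviance(unit_deviance, mu, h=1e-3):
    """Approximate v(mu) = 2 / (d^2 d(y, mu) / d mu^2) evaluated at y = mu."""
    y = mu  # the second derivative is taken in mu with y held at mu
    second = (
        unit_deviance(y, mu + h)
        - 2.0 * unit_deviance(y, mu)
        + unit_deviance(y, mu - h)
    ) / h ** 2
    return 2.0 / second


v_numeric = unit_variance_from_deviance(gamma_unit_deviance, 2.0)
```

With `mu = 2.0` the result should be close to `mu ** 2`, up to finite-difference error.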
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
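The scaling of the variance and of its derivative by \(\phi / w_i\) can be sketched for scalar inputs as below. The gamma-type unit variance \(v(\mu) = \mu^2\), with derivative \(2\mu\), is an assumed example, and the function names are hypothetical stand-ins rather than glum's methods.

```python
def variance(mu, unit_variance, dispersion=1.0, sample_weight=1.0):
    """Variance v(mu) * phi / w for a single observation."""
    return unit_variance(mu) * dispersion / sample_weight


def variance_derivative(mu, unit_variance_derivative, dispersion=1.0, sample_weight=1.0):
    """Derivative of the variance with respect to mu: v'(mu) * phi / w."""
    return unit_variance_derivative(mu) * dispersion / sample_weight


def gamma_unit_variance(mu):
    """Example unit variance v(mu) = mu^2."""
    return mu ** 2


def gamma_unit_variance_derivative(mu):
    """Derivative of the example unit variance: v'(mu) = 2 * mu."""
    return 2.0 * mu


var = variance(3.0, gamma_unit_variance, dispersion=0.5, sample_weight=2.0)
dvar = variance_derivative(
    3.0, gamma_unit_variance_derivative, dispersion=0.5, sample_weight=2.0
)
```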
- class glum.CloglogLink
Bases: Link
The complementary log-log link function log(-log(1-mu)).
- derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- inverse(lin_pred)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative(lin_pred)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative2(lin_pred)
Compute the second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
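For orientation, the complementary log-log link and its inverse can be written out for scalar inputs as in the sketch below. The function names are hypothetical and the formulas are derived by hand from log(-log(1-mu)). This is not glum's vectorized implementation.

```python
import math


def cloglog_link(mu):
    """g(mu) = log(-log(1 - mu)) for mu in (0, 1)."""
    return math.log(-math.log1p(-mu))


def cloglog_inverse(lin_pred):
    """h(eta) = 1 - exp(-exp(eta)), the inverse of g."""
    return -math.expm1(-math.exp(lin_pred))


def cloglog_derivative(mu):
    """g'(mu) = 1 / ((mu - 1) * log(1 - mu))."""
    return 1.0 / ((mu - 1.0) * math.log1p(-mu))


def cloglog_inverse_derivative(lin_pred):
    """h'(eta) = exp(eta - exp(eta))."""
    return math.exp(lin_pred - math.exp(lin_pred))


mu = 0.3
eta = cloglog_link(mu)
round_trip = cloglog_inverse(eta)
```

Because h is the inverse of g, the product `cloglog_derivative(mu) * cloglog_inverse_derivative(eta)` should equal 1 up to rounding.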
- class glum.ExponentialDispersionModel
Bases: object
Base class for reproductive Exponential Dispersion Models (EDM).
The PDF of \(Y \sim \mathrm{EDM}(\theta, \phi)\) is given by
\[p(y \mid \theta, \phi) = \exp \left(\frac{y \theta - b(\theta)}{\phi / w} + c(y; w / \phi) \right),\]
where \(\theta\) is the natural (canonical) parameter, \(\phi\) is the dispersion parameter, \(w\) is a given weight, \(b\) is the cumulant function and \(c\) is a normalization term.
It can be shown that \(\mathrm{E}(Y) = b'(\theta)\) and \(\mathrm{var}(Y) = b''(\theta) \times \phi / w\).
References
https://en.wikipedia.org/wiki/Exponential_dispersion_model
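The identities \(\mathrm{E}(Y) = b'(\theta)\) and \(\mathrm{var}(Y) = b''(\theta) \times \phi / w\) can be checked numerically for a concrete case. The sketch below assumes the Bernoulli cumulant function \(b(\theta) = \log(1 + e^\theta)\) with \(\phi / w = 1\), so the mean should be the sigmoid \(p\) and the variance \(p(1 - p)\). The helper names and step sizes are illustrative choices.

```python
import math


def cumulant(theta):
    """Cumulant function b(theta) = log(1 + exp(theta)) of the Bernoulli EDM."""
    return math.log1p(math.exp(theta))


def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


theta = 0.4
p = 1.0 / (1.0 + math.exp(-theta))
mean_numeric = first_derivative(cumulant, theta)   # approximates b'(theta)
var_numeric = second_derivative(cumulant, theta)   # approximates b''(theta)
```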
- deviance(y, mu, sample_weight=1)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- abstract property include_lower_bound: bool
Return whether lower_bound is allowed as a value of y.
- abstract property include_upper_bound: bool
Return whether upper_bound is allowed as a value of y.
- abstract property lower_bound: float
Get the lower bound of values for the EDM.
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- abstractmethod unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- abstractmethod unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- abstractmethod unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- abstract property upper_bound: float
Get the upper bound of values for the EDM.
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.GammaDistribution
Bases: ExponentialDispersionModel
Class for the gamma distribution.
The gamma distribution models outcomes y in (0, +∞).
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if None.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.GeneralizedHyperbolicSecant
Bases: ExponentialDispersionModel
A class for the Generalized Hyperbolic Secant (GHS) distribution.
The GHS distribution models outcomes y in (-∞, +∞).
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=1)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.GeneralizedLinearRegressor(*, alpha=None, l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, max_inner_iter=100000, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, alpha_search=False, alphas=None, n_alphas=100, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=None, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, monotonic_constraints=None, force_all_finite=True, drop_first=False, robust=True, expected_information=False, formula=None, interaction_separator=':', categorical_format='{name}[{category}]', cat_missing_method='fail', cat_missing_name='(MISSING)')
Bases: GeneralizedLinearRegressorBase
Regression via a Generalized Linear Model (GLM) with penalties.
GLMs based on a reproductive Exponential Dispersion Model (EDM) aim at fitting and predicting the mean of the target y as mu=h(X*w). Therefore, the fit minimizes the following objective function with combined L1 and L2 priors as regularizer:
1/(2*sum(s)) * deviance(y, h(X*w); s) + alpha * l1_ratio * ||P1*w||_1 + 1/2 * alpha * (1 - l1_ratio) * w*P2*w
with inverse link function h and s=sample_weight. Note that, for alpha=0, the unregularized GLM is recovered. This is not the default behavior (see the alpha parameter description for details). Additionally, for sample_weight=None, one has s_i=1 and sum(s)=n_samples. For P1=P2='identity', the penalty is the elastic net:
alpha * l1_ratio * ||w||_1 + 1/2 * alpha * (1 - l1_ratio) * ||w||_2^2.
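To make the objective concrete, the sketch below evaluates it in plain Python for one special case: a normal family with identity link (so the unit deviance is (y - mu)^2), no intercept, and P1=P2='identity'. The function name `penalized_objective` and the toy data are hypothetical. glum itself is not called.

```python
def penalized_objective(X, y, w, alpha, l1_ratio, sample_weight=None):
    """Evaluate 1/(2*sum(s)) * deviance + elastic net penalty (normal, identity link)."""
    s = [1.0] * len(y) if sample_weight is None else list(sample_weight)
    # Identity link: mu = X * w, computed row by row.
    mu = [sum(xij * wj for xij, wj in zip(row, w)) for row in X]
    # Normal deviance: weighted sum of squared residuals.
    deviance = sum(si * (yi - mi) ** 2 for si, yi, mi in zip(s, y, mu))
    l1_norm = sum(abs(wj) for wj in w)
    l2_norm_sq = sum(wj ** 2 for wj in w)
    return (
        deviance / (2.0 * sum(s))
        + alpha * l1_ratio * l1_norm
        + 0.5 * alpha * (1.0 - l1_ratio) * l2_norm_sq
    )


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.0]
w = [1.0, 2.0]
value = penalized_objective(X, y, w, alpha=0.1, l1_ratio=0.5)
unpenalized = penalized_objective(X, y, w, alpha=0.0, l1_ratio=0.5)
```

Setting `alpha=0.0` leaves only the scaled deviance term, matching the remark that the unregularized GLM is recovered in that case.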
If you are interested in controlling the L1 and L2 penalties separately, keep in mind that this is equivalent to:
a * L1 + b * L2,
where:
alpha = a + b and l1_ratio = a / (a + b).
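The conversion can be sketched as a small helper. It assumes that "L2" above denotes the term 1/2 * ||w||_2^2, so that a equals alpha * l1_ratio and b equals alpha * (1 - l1_ratio). The function name `to_alpha_l1_ratio` is hypothetical and not part of glum.

```python
def to_alpha_l1_ratio(a, b):
    """Map separate L1 weight a and L2 weight b to (alpha, l1_ratio)."""
    if a < 0 or b < 0:
        raise ValueError("penalty weights must be non-negative")
    alpha = a + b
    # With no penalty at all, l1_ratio is arbitrary; 0.0 is used here.
    l1_ratio = a / alpha if alpha > 0 else 0.0
    return alpha, l1_ratio


alpha, l1_ratio = to_alpha_l1_ratio(a=0.3, b=0.1)
# Recover the separate weights as a consistency check.
a_back = alpha * l1_ratio
b_back = alpha * (1.0 - l1_ratio)
```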
The parameter l1_ratio corresponds to alpha in the R package glmnet, while alpha corresponds to the lambda parameter in glmnet. Specifically, l1_ratio = 1 is the lasso penalty.
Read more in background.
- Parameters:
alpha ({float, array-like}, optional (default=None)) – Constant that multiplies the penalty terms and thus determines the regularization strength. If alpha_search is False (the default), then alpha must be a scalar or None (equivalent to alpha=0). If alpha_search is True, then alpha must be an iterable or None. See alpha_search to find how the regularization path is set if alpha is None. See the notes for the exact mathematical meaning of this parameter. alpha=0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities).
l1_ratio (float, optional (default=0)) – The elastic net mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0, the penalty is an L2 penalty. For l1_ratio = 1, it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
P1 ({'identity', array-like, None}, shape (n_features,), optional (default='identity')) – This array controls the strength of the regularization for each coefficient independently. A high value will lead to higher regularization while a value of zero will remove the regularization on this parameter. Note that n_features = X.shape[1]. If X is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.
P2 ({'identity', array-like, sparse matrix, None}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty w*P2*w. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as a diagonal (square) matrix. The default 'identity' and None set the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1, and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. If X is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expanded X matrix.
fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X * coef + intercept).
family (str or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. the loss function to minimize. If a string, one of: 'binomial', 'gamma', 'gaussian', 'inverse.gaussian', 'normal', 'poisson', 'tweedie' or 'negative.binomial'. Note that 'tweedie' sets the power of the Tweedie distribution to 1.5; to use another value, specify it in parentheses (e.g., 'tweedie (1.5)'). The same applies for 'negative.binomial' and its theta parameter.
link ({'auto', 'identity', 'log', 'logit', 'cloglog'} or Link, optional (default='auto')) –
The link function of the GLM, i.e. the mapping from the expectation (mu) to the linear predictor (X * coef). Option 'auto' sets the link depending on the chosen family as follows:
'identity' for family 'normal'.
'log' for families 'poisson', 'gamma', 'inverse.gaussian' and 'negative.binomial'.
'logit' for family 'binomial'.
solver ({'auto', 'closed-form', 'irls-cd', 'irls-ls', 'lbfgs', 'trust-constr'}, optional (default='auto')) –
Algorithm to use in the optimization problem:
'auto': 'closed-form' for eligible Gaussian identity-link problems without L1 regularization, 'irls-ls' for other pure-L2 cases, and 'irls-cd' otherwise.
'closed-form': Direct linear solve for eligible Gaussian identity-link problems (ridge/OLS/WLS).
'irls-cd': Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in the fit method, X should be directly passed as a Fortran-contiguous NumPy array or sparse CSC matrix.
'irls-ls': Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.
'lbfgs': SciPy's L-BFGS-B optimizer. It cannot deal with L1 penalties.
'trust-constr': Calls scipy.optimize.minimize(method='trust-constr'). It cannot deal with L1 penalties. This solver can optimize problems with inequality constraints, passed via A_ineq and b_ineq. It will be selected automatically when inequality constraints are set and solver='auto'. Note that using this method can lead to significantly increased runtimes by a factor of ten or higher.
max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.
max_inner_iter (int, optional (default=100000)) – The maximal number of iterations for the inner solver in the IRLS-CD algorithm. This parameter is only used when solver='irls-cd'.
gradient_tol (float, optional (default=None)) –
Stopping criterion. If None, solver-specific defaults will be used. The default value for most solvers is 1e-4, except for 'trust-constr', which requires more conservative convergence settings and has a default value of 1e-8.
For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when max{|g_i|, i = 1, ..., n} <= tol, where g_i is the i-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached when sum_i(|minimum norm of g_i|) <= tol, where g_i is the subgradient of the objective and the minimum norm of g_i is the element of the subgradient with the smallest L2 norm.
If you wish to only use a step-size tolerance, set gradient_tol to a very small number.
step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than step_size_tol. This stopping criterion is disabled when step_size_tol is None.
hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.
warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to fit as initialization for coef_ and intercept_ (supersedes start_params). If False or if the attribute coef_ does not exist (first call to fit), start_params sets the start values for coef_ and intercept_.
alpha_search (bool, optional (default=False)) –
Whether to search along the regularization path for the best alpha. When set to True, alpha should either be None or an iterable. To determine the regularization path, the following sequence is used:
If alpha is an iterable, use it directly. All other parameters governing the regularization path are ignored.
If min_alpha is set, create a path from min_alpha to the lowest alpha such that all coefficients are zero.
If min_alpha_ratio is set, create a path where the ratio of min_alpha / max_alpha = min_alpha_ratio.
If none of the above parameters are set, use a min_alpha_ratio of 1e-6 if n_samples >= n_features, else 1e-2.
alphas (DEPRECATED. Use alpha instead.)
n_alphas (int, optional (default=100)) – Number of alphas along the regularization path.
min_alpha_ratio (float, optional (default=None)) – Length of the path. min_alpha_ratio=1e-6 means that min_alpha / max_alpha = 1e-6. If None, 1e-6 is used when n_samples >= n_features, else 1e-2.
min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over [max_alpha, min_alpha].
start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if warm_start is False or if fit is called for the first time (so that self.coef_ does not exist yet). If None, all coefficients are set to zero and the start value for the intercept is the weighted average of y (if fit_intercept is True). If an array, used directly as start values; if fit_intercept is True, its first element is assumed to be the start value for the intercept_. Note that n_features* = X.shape[1] + fit_intercept, i.e. it includes the intercept.
selection (str, optional (default='cyclic')) – For the CD solver ('irls-cd'), the coordinates (features) can be updated in either cyclic or random order. If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially when gradient_tol is higher than 1e-4.
random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection is 'random'.
copy_X (bool, optional (default=None)) – Whether to copy X. Since X is never modified by GeneralizedLinearRegressor, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. If False, X will not be copied and there will be an error if you pass an X in the wrong format, such as providing integer X and float y (only guaranteed for numpy arrays and pandas data frames). If None, X will not be copied unless it is in the wrong format.
check_input (bool, optional (default=True)) – Whether to run several checks on input: y values in range of family, sample_weight non-negative, P2 positive semi-definite. Set to False to bypass these checks. Don’t change this parameter unless you know what you are doing.
verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This feature requires having the tqdm package installed. For the L-BFGS and 'trust-constr' solvers, set verbose to any positive number for verbosity.
scale_predictors (bool, optional (default=False)) –
If True, scale all predictors to have standard deviation one. Should be set to True if alpha > 0 and if you want coefficients to be penalized equally.
Reported coefficient estimates are always at the original scale.
Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if scale_predictors is True.
lower_bounds (array-like, shape (n_features,), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver ('irls-cd').
upper_bounds (array-like, shape=(n_features,), optional (default=None)) – See lower_bounds.
A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form A_ineq w <= b_ineq. Setting inequality constraints forces the use of the local gradient-based solver 'trust-constr', which may increase runtime significantly. Note that the constraints only apply to coefficients related to features in X. If you want to constrain the intercept, add it to the feature matrix X manually and set fit_intercept=False.
b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form A_ineq w <= b_ineq. Refer to the documentation of A_ineq for details.
drop_first (bool, optional (default = False)) – If True, drop the first column when encoding categorical variables. Set this to True when alpha=0 and solver='auto' to prevent an error due to a singular feature matrix. In the case of using a formula with interactions, setting this argument to True ensures structural full-rankness (it is equivalent to ensure_full_rank in formulaic and tabmat).
robust (bool, optional (default = False)) – If true, then robust standard errors are computed by default.
expected_information (bool, optional (default = False)) – If true, then the expected information matrix is computed by default. Only relevant when computing robust standard errors.
formula (formulaic.FormulaSpec) – A formula accepted by formulaic. It can either be a one-sided formula, in which case y must be specified in fit, or a two-sided formula, in which case y must be None.
interaction_separator (str, default=":") – The separator between the names of interacted variables.
categorical_format (str, optional, default='{name}[{category}]') – Format string for categorical features. The format string should contain the placeholder {name} for the feature name and {category} for the category name. Only used if X is a pandas DataFrame.
cat_missing_method (str {'fail'|'zero'|'convert'}, default='fail') –
How to handle missing values in categorical columns. Only used if X is a pandas data frame.
If 'fail', raise an error if there are missing values.
If 'zero', missing values will represent all-zero indicator columns.
If 'convert', missing values will be converted to the cat_missing_name category.
cat_missing_name (str, default='(MISSING)') – Name of the category to which missing values will be converted if cat_missing_method='convert'. Only used if X is a pandas data frame.
monotonic_constraints (Mapping[str, str] | None)
force_all_finite (bool)
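The precedence rules listed under alpha_search can be illustrated with a short sketch. This is not glum's implementation: the helper name alpha_path is hypothetical, max_alpha is passed in by hand rather than derived from the data, and the log-spaced grid is an assumption, since the text above only fixes the endpoints and the number of points.

```python
import math


def alpha_path(max_alpha, n_alphas=100, min_alpha=None, min_alpha_ratio=None,
               n_samples=None, n_features=None):
    """Return a decreasing grid from max_alpha down to the minimum alpha.

    Hypothetical helper: the log spacing is an assumption of this sketch.
    """
    if min_alpha is None:
        if min_alpha_ratio is None:
            # Documented fallback: 1e-6 if n_samples >= n_features, else 1e-2.
            min_alpha_ratio = 1e-6 if n_samples >= n_features else 1e-2
        min_alpha = max_alpha * min_alpha_ratio
    if n_alphas == 1:
        return [max_alpha]
    log_max, log_min = math.log(max_alpha), math.log(min_alpha)
    step = (log_min - log_max) / (n_alphas - 1)
    return [math.exp(log_max + i * step) for i in range(n_alphas)]


# Neither min_alpha nor min_alpha_ratio is set, so the ratio defaults to 1e-6.
path = alpha_path(max_alpha=10.0, n_alphas=5, n_samples=200, n_features=20)
```

An explicit min_alpha takes precedence over min_alpha_ratio here, mirroring the order in which the rules are listed.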
- coef_
Estimated coefficients for the linear predictor (X*coef_+intercept_) in the GLM.
- Type:
numpy.array, shape (n_features,)
- intercept_
Intercept (a.k.a. bias) added to linear predictor.
- Type:
float
- n_iter_
Actual number of iterations used in solver.
- Type:
int
- col_means_
The means of the columns of the design matrix X.
- Type:
array, shape (n_features,)
- col_stds_
The standard deviations of the columns of the design matrix X.
- Type:
array, shape (n_features,)
Notes
The fit itself does not need outcomes to be from an EDM, but only assumes the first two moments to be \(\mu_i \equiv \mathrm{E}(y_i) = h(x_i' w)\) and \(\mathrm{var}(y_i) = (\phi / s_i) v(\mu_i)\). The unit variance function \(v(\mu_i)\) is a property of and given by the specific EDM; see background.
The parameters \(w\) (coef_ and intercept_) are estimated by minimizing the deviance plus penalty term, which is equivalent to (penalized) maximum likelihood estimation.
If the target y is a ratio, appropriate sample weights s should be provided. As an example, consider Poisson distributed counts z (integers) and weights s = exposure (time, money, person-years, …). Then you fit y ≡ z/s, i.e. GeneralizedLinearRegressor(family='poisson').fit(X, y, sample_weight=s). The weights are necessary for the right (finite sample) mean. Consider \(\bar{y} = \sum_i s_i y_i / \sum_i s_i\): in this case, one might say that \(y\) follows a ‘scaled’ Poisson distribution. The same holds for other distributions.
References
- For the coordinate descent implementation:
Guo-Xun Yuan, Chia-Hua Ho, Chih-Jen Lin An Improved GLMNET for L1-regularized Logistic Regression, Journal of Machine Learning Research 13 (2012) 1999-2030 https://www.csie.ntu.edu.tw/~cjlin/papers/l1_glmnet/long-glmnet.pdf
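The remark in the Notes about ratio targets and exposure weights can be checked with plain arithmetic. The data below are made up for illustration; the sketch only shows why the exposure-weighted mean of y = z/s recovers the overall rate, and does not call glum.

```python
# Hypothetical data: counts z observed over exposures s (e.g. years).
z = [0, 2, 1, 5]
s = [0.5, 2.0, 1.0, 4.0]

# Target passed to the model: the observed rate per unit of exposure.
y = [z_i / s_i for z_i, s_i in zip(z, s)]

# The exposure-weighted mean of the rates ...
weighted_mean = sum(s_i * y_i for s_i, y_i in zip(s, y)) / sum(s)
# ... equals the overall rate, total counts over total exposure.
overall_rate = sum(z) / sum(s)
# The unweighted mean of the rates generally does not.
unweighted_mean = sum(y) / len(y)
```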
- aic(X, y, sample_weight=None, *, context=None)
Akaike’s information criterion. Computed as: \(-2\log\hat{\mathcal{L}} + 2\hat{k}\) where \(\hat{\mathcal{L}}\) is the maximized likelihood of the model, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- aicc(X, y, sample_weight=None, *, context=None)
Second-order Akaike’s information criterion (or small sample AIC). Computed as: \(-2\log\hat{\mathcal{L}} + 2\hat{k} + \frac{2\hat{k}(\hat{k}+1)}{n-\hat{k}-1}\) where \(\hat{\mathcal{L}}\) is the maximized likelihood of the model, \(n\) is the number of training instances, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- bic(X, y, sample_weight=None, *, context=None)
Bayesian information criterion. Computed as: \(-2\log\hat{\mathcal{L}} + \hat{k}\log(n)\) where \(\hat{\mathcal{L}}\) is the maximized likelihood of the model, \(n\) is the number of training instances, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
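The three criteria documented for aic, aicc and bic differ only in their penalty terms, which the following sketch makes explicit. It evaluates the stated formulas for a given log-likelihood; the function name information_criteria and the input values are hypothetical, and the estimation of the effective number of parameters is not reproduced here.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return (AIC, AICc, BIC) for k effective parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    # The small-sample correction is only defined for n > k + 1.
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * log_likelihood + k * math.log(n)
    return aic, aicc, bic


# Hypothetical fit: log-likelihood -120 with 3 effective parameters on 50 rows.
aic, aicc, bic = information_criteria(log_likelihood=-120.0, k=3, n=50)
```

As n grows with k fixed, the AICc correction term shrinks towards zero, so AICc approaches AIC.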
- coef_table(X=None, y=None, sample_weight=None, offset=None, *, confidence_level=0.95, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, context=None)
Get a table of the regression coefficients.
Includes coefficient estimates, standard errors, t-values, p-values and confidence intervals.
- Parameters:
confidence_level (float, optional, default=0.95) – The confidence level for the confidence intervals.
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed or if standard errors, etc. are not desired.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
A table of the regression results.
- Return type:
pandas.DataFrame
- covariance_matrix(X=None, y=None, sample_weight=None, offset=None, *, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, skip_checks=False, context=None)
Calculate the covariance matrix for generalized linear models.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.
skip_checks (boolean, optional, default=False) – Whether to skip input validation. For internal use only.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
Notes
We support three types of covariance matrices:
non-robust
robust (HC-1)
clustered
For a maximum-likelihood estimator, the covariance matrix takes the form \(\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)\) where \(\mathcal{H}^{-1}\) is the inverse Hessian and \(\mathcal{I}\) is the information matrix. The different types of covariance matrices use different approximations of these quantities.
The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.
The robust (HC-1) covariance matrix takes the form \(\mathbf{H}^{-1}(\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{-1}(\hat{\theta})\) where \(\mathbf{H}\) is the empirical Hessian and \(\mathbf{G}\) is the gradient. We apply a finite-sample correction of \(\frac{N}{N-p}\).
The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using \(\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta})\) directly, we first sum the gradients within each group. The finite-sample correction is affected as well, becoming \(\frac{M}{M-1}\frac{N}{N-p}\) where \(M\) is the number of groups.
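The difference between the robust and the clustered middle term can be seen on a tiny example. The sketch below is illustrative only: the gradient rows and cluster labels are made up, the helper names meat and sum_within_groups are hypothetical, and it assumes that N counts observations and p counts parameters in the correction factors. The Hessian part of the sandwich is left out.

```python
def meat(rows):
    """Return G^T G for a list of per-row gradient vectors."""
    p = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]


def sum_within_groups(rows, groups):
    """Sum the gradient rows within each group label."""
    totals = {}
    for row, g in zip(rows, groups):
        acc = totals.setdefault(g, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return list(totals.values())


# Hypothetical per-observation gradients (N = 4, p = 2) and cluster labels.
G = [[1.0, 0.5], [-1.0, 0.5], [2.0, -1.0], [0.0, 1.0]]
clusters = ["a", "a", "b", "b"]

robust_meat = meat(G)
clustered_meat = meat(sum_within_groups(G, clusters))

N, p, M = len(G), len(G[0]), len(set(clusters))
robust_factor = N / (N - p)
clustered_factor = M / (M - 1) * N / (N - p)
```

Summing within groups before forming the outer products lets gradients of observations in the same cluster cancel or reinforce each other, which is why the two middle terms differ.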
- property family_instance: ExponentialDispersionModel
Return an ExponentialDispersionModel.
- fit(X, y=None, sample_weight=None, offset=None, *, store_covariance_matrix=False, clusters=None, weights_sum=None, context=None)
Fit a Generalized Linear Model.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a float32 matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.
y (array-like, shape (n_samples,)) – Target values.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has \(\mathrm{var}(y_i) = \phi \times v(\mu) / w_i\). If \(y_i \sim EDM(\mu, \phi / w_i)\), then \(\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)\), i.e. the mean of \(y\) is a weighted average with weights equal to sample_weight.
offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected y by 3 if the link is linear and will multiply expected y by exp(3) if the link is logarithmic.
store_covariance_matrix (bool, optional (default=False)) – Whether to estimate and store the covariance matrix of the parameter estimates. If True, the covariance matrix will be available in the covariance_matrix_ attribute after fitting.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
weights_sum (float, optional (default=None))
- Return type:
self
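How an offset enters the expected value depends on the link, because it is added to the linear predictor before the inverse link is applied. The sketch below works this through for a single made-up linear predictor value; it uses only the definitions of the identity and log links and does not call glum.

```python
import math

# Hypothetical linear predictor value X * coef_ + intercept_ for one row.
eta = 0.7
offset = 3.0

# Identity link: the inverse link is the identity, so the offset adds to the mean.
mu_identity = eta
mu_identity_offset = eta + offset

# Log link: the inverse link is exp, so the offset multiplies the mean by exp(offset).
mu_log = math.exp(eta)
mu_log_offset = math.exp(eta + offset)

# To scale the mean by an exposure of 3, pass log(3) as the offset instead.
mu_log_exposure = math.exp(eta + math.log(3.0))
```

This is why exposure is usually supplied as log(exposure) when it is passed as an offset under a log link.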
- get_formatted_diagnostics(*, full_report=False, custom_columns=None)
Get formatted diagnostics which can be printed with report_diagnostics.
- Parameters:
full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.
custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type:
str | DataFrame
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- linear_predictor(X, offset=None, *, alpha_index=None, alpha=None, context=None)
Compute the linear predictor, X * coef_ + intercept_.
If alpha_search is True, but alpha_index and alpha are both None, the predictions are for the last alpha value self._alphas[-1].
- Parameters:
X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.
offset (array-like, shape (n_samples,), optional (default=None)) – Offset added to the linear predictor.
alpha_index (int or sequence of int, optional (default=None)) – Index (or indices) into the fitted alpha path. Only valid when alpha_search is True. Incompatible with alpha.
alpha (float or sequence of float, optional (default=None)) – Alpha value(s) to predict at, resolved to the closest index on the fitted alpha path. Only valid when alpha_search is True. Incompatible with alpha_index.
context (int or mapping, optional (default=None)) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
Shape (n_samples,) when no alpha_index / alpha is given or when a scalar alpha is passed. Shape (n_samples, len(alpha_index)) when a sequence is passed.
- Return type:
np.ndarray
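The alpha parameter is described as being resolved to the closest index on the fitted path. One plausible reading is sketched below; the helper closest_alpha_index is hypothetical, the path values are made up, and measuring closeness by absolute difference is an assumption, since the distance used is not stated here.

```python
def closest_alpha_index(alphas, alpha):
    """Return the index of the path entry nearest to alpha (absolute distance)."""
    return min(range(len(alphas)), key=lambda i: abs(alphas[i] - alpha))


# Hypothetical fitted path, ordered from strongest to weakest regularization.
fitted_alphas = [1.0, 0.1, 0.01, 0.001]

# A scalar alpha resolves to a single index ...
idx = closest_alpha_index(fitted_alphas, 0.08)
# ... and a sequence of alphas resolves to one index per requested value.
indices = [closest_alpha_index(fitted_alphas, a) for a in (0.5, 0.002)]
```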
- predict(X, sample_weight=None, offset=None, *, alpha_index=None, alpha=None, context=None)
Predict using GLM with feature matrix X.
If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].
- Parameters:
X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.
offset (array-like, shape (n_samples,), optional (default=None))
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).
alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
Shape (n_samples,) when no alpha_index / alpha is given or when a scalar alpha is passed. Shape (n_samples, len(alpha_index)) when a sequence is passed.
- Return type:
np.ndarray
- report_diagnostics(*, full_report=False, custom_columns=None)
Print diagnostics to stdout.
- Parameters:
full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.
custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type:
None
- score(X, y, sample_weight=None, offset=None, *, context=None)
Compute \(D^2\), the percentage of deviance explained.
\(D^2\) is a generalization of the coefficient of determination \(R^2\). \(R^2\) uses the squared error, whereas \(D^2\) uses the deviance. Note that the two are equal for family='normal'.
\(D^2\) is defined as \(D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})} {D_{\mathrm{null}}}\), where \(D_{\mathrm{null}}\) is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one, and the score can be negative.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.
y (array-like, shape (n_samples,)) – True values of target.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.
offset (array-like, shape (n_samples,), optional (default=None))
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
D^2 of self.predict(X) w.r.t. y.
- Return type:
float
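The definition of \(D^2\) can be evaluated by hand for small inputs. The sketch below is unweighted and does not call glum: the function names are hypothetical, the data are made up, and the null model is taken to predict the mean of y, which is an assumption of this example. The normal unit deviance is the squared error and the Poisson unit deviance is the textbook form.

```python
import math


def normal_unit_deviance(y, mu):
    return (y - mu) ** 2


def poisson_unit_deviance(y, mu):
    # The y * log(y / mu) term is taken as 0 when y == 0.
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - y + mu)


def d2_score(y_true, y_pred, unit_deviance):
    """Return 1 - D(y_true, y_pred) / D_null with an intercept-only null model."""
    y_mean = sum(y_true) / len(y_true)
    dev = sum(unit_deviance(y, mu) for y, mu in zip(y_true, y_pred))
    dev_null = sum(unit_deviance(y, y_mean) for y in y_true)
    return 1.0 - dev / dev_null


y_true = [1.0, 0.0, 3.0, 4.0]
y_pred = [1.5, 0.5, 2.5, 3.5]

d2_normal = d2_score(y_true, y_pred, normal_unit_deviance)
d2_poisson = d2_score(y_true, y_pred, poisson_unit_deviance)
```

With the squared-error deviance, d2_normal is exactly the usual \(R^2\) of these predictions; the Poisson value differs because errors at small means are penalized more heavily.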
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- std_errors(X=None, y=None, sample_weight=None, offset=None, *, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, context=None)
Calculate standard errors for generalized linear models.
See covariance_matrix for an in-depth explanation of how the standard errors are computed.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
offset (array-like, optional, default=None) – Array with additive offsets.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- wald_test(X=None, y=None, sample_weight=None, offset=None, *, R=None, features=None, terms=None, formula=None, r=None, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, context=None)
Compute the Wald test statistic and p-value for a linear hypothesis.
The left hand side of the hypothesis may be specified in the following ways:
R: The restriction matrix representing the linear combination of coefficients to test.
features: The name of a feature or a list of features to test.
terms: The name of a term or a list of terms to test.
formula: A formula string specifying the hypothesis to test.
The right hand side of the tested hypothesis is specified by r. In the case of a terms-based test, the null hypothesis is that each coefficient relating to a term equals the corresponding value in r.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
offset (array-like, optional, default=None) – Array with additive offsets.
R (np.ndarray, optional, default=None) – The restriction matrix representing the linear combination of coefficients to test.
features (Union[str, list[str]], optional, default=None) – The name of a feature or a list of features to test.
terms (Union[str, list[str]], optional, default=None) – The name of a term or a list of terms to test. It can cover one or more coefficients. In the case of a model based on a formula, a term is one of the expressions separated by
+signs. Otherwise, a term is one column in the input data. As categorical variables need not be one-hot encoded in glum, in their case, the hypothesis to be tested is that the coefficients of all categories are equal tor.r (Sequence, optional, default=None) – The vector representing the values of the linear combination. If None, the test is for whether the linear combinations of the coefficients are zero.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
formula (str | None)
- Returns:
NamedTuple with test statistic, p-value, and degrees of freedom.
- Return type:
WaldTestResult
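The hypothesis described above has the form R @ coef = r. As a rough illustration of the textbook Wald statistic for that hypothesis (not glum's internal code), the sketch below uses only NumPy and the standard library. The helper name wald_statistic and the toy coefficient and covariance values are hypothetical. The p-value shortcut is valid only for a single restriction (one degree of freedom).

```python
import math

import numpy as np


def wald_statistic(coef, cov, R, r=None):
    """Textbook Wald statistic for H0: R @ coef = r (illustrative helper)."""
    R = np.atleast_2d(np.asarray(R, dtype=float))
    coef = np.asarray(coef, dtype=float)
    # A missing r means "test whether the linear combinations are zero".
    r = np.zeros(R.shape[0]) if r is None else np.asarray(r, dtype=float)
    diff = R @ coef - r
    # W = diff' (R V R')^{-1} diff, where V is the coefficient covariance.
    middle = R @ np.asarray(cov, dtype=float) @ R.T
    stat = float(diff @ np.linalg.solve(middle, diff))
    return stat, R.shape[0]


# Toy values, assumed for illustration only.
coef = np.array([0.5, 2.0, -1.0])
cov = np.diag([0.04, 0.25, 0.01])

# Test whether the second coefficient is zero.
stat, df = wald_statistic(coef, cov, R=[[0.0, 1.0, 0.0]])

# With one restriction, the chi-squared(1) survival function
# equals erfc(sqrt(W / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))

# Same restriction, but with right hand side r = 2.0.
stat_r, _ = wald_statistic(coef, cov, R=[[0.0, 1.0, 0.0]], r=[2.0])
```

With several restrictions the statistic is computed the same way, but the p-value would come from a chi-squared distribution with as many degrees of freedom as R has rows.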
- class glum.GeneralizedLinearRegressorCV(*, l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, max_inner_iter=100000, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, n_alphas=100, alphas=None, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=None, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, monotonic_constraints=None, force_all_finite=True, cv=None, n_jobs=None, drop_first=False, robust=True, expected_information=False, formula=None, interaction_separator=':', categorical_format='{name}[{category}]', cat_missing_method='fail', cat_missing_name='(MISSING)')
Bases: GeneralizedLinearRegressorBase
Generalized linear model with iterative fitting along a regularization path.
The best model is selected by cross-validation.
Cross-validated regression via a Generalized Linear Model (GLM) with penalties. For more on GLMs and on these parameters, see the documentation for GeneralizedLinearRegressor.
- Parameters:
l1_ratio (float or array of floats, optional (default=0)) – If you pass l1_ratio as an array, the fit method will choose the best value of l1_ratio and store it as self.l1_ratio.
P1 ({'identity', array-like, None}, shape (n_features,), optional (default='identity')) – This array controls the strength of the regularization for each coefficient independently. A high value will lead to higher regularization while a value of zero will remove the regularization on this parameter. Note that n_features = X.shape[1]. If X is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.
P2 ({'identity', array-like, sparse matrix, None}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty w*P2*w. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as diagonal (square) matrix. The default 'identity' and None set the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1 and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. If X is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expanded X matrix.
fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X * coef + intercept).
family (str or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. the loss function to minimize. If a string, one of: 'binomial', 'gamma', 'gaussian', 'inverse.gaussian', 'normal', 'poisson', 'tweedie' or 'negative.binomial'. Note that 'tweedie' sets the power of the Tweedie distribution to 1.5; to use another value, specify it in parentheses (e.g., 'tweedie (1.5)'). The same applies for 'negative.binomial' and theta parameter.
link ({'auto', 'identity', 'log', 'logit', 'cloglog'}, Link or None, optional (default='auto')) –
The link function of the GLM, i.e. mapping from linear predictor (X * coef) to expectation (mu). Option 'auto' sets the link depending on the chosen family as follows:
'identity' for family 'normal'
'log' for families 'poisson', 'gamma', 'inverse.gaussian' and 'negative.binomial'.
'logit' for family 'binomial'
solver ({'auto', 'closed-form', 'irls-cd', 'irls-ls', 'lbfgs', 'trust-constr'}, optional (default='auto')) –
Algorithm to use in the optimization problem:
'auto': 'closed-form' for eligible Gaussian identity-link problems without L1 regularization, 'irls-ls' for other pure-L2 cases, and 'irls-cd' otherwise.
'closed-form': Direct linear solve for eligible Gaussian identity-link problems (ridge/OLS/WLS).
'irls-cd': Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in the fit method, X should be directly passed as a Fortran-contiguous Numpy array or sparse CSC matrix.
'irls-ls': Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.
'lbfgs': Scipy’s L-BFGS-B optimizer. It cannot deal with L1 penalties.
max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.
max_inner_iter (int, optional (default=100000)) – The maximal number of iterations for the inner solver in the IRLS-CD algorithm. This parameter is only used when solver='irls-cd'.
gradient_tol (float, optional (default=None)) –
Stopping criterion. If None, solver-specific defaults will be used. The default value for most solvers is 1e-4, except for 'trust-constr', which requires more conservative convergence settings and has a default value of 1e-8.
For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when max{|g_i|, i = 1, ..., n} <= tol, where g_i is the i-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached when sum_i(|minimum norm of g_i|) <= tol, where g_i is the subgradient of the objective and the minimum norm of g_i is the element of the subgradient with the smallest L2 norm.
If you wish to only use a step-size tolerance, set gradient_tol to a very small number.
step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than step_size_tol. This stopping criterion is disabled when step_size_tol is None.
hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.
warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to fit as initialization for coef_ and intercept_ (supersedes start_params). If False or if the attribute coef_ does not exist (first call to fit), start_params sets the start values for coef_ and intercept_.
n_alphas (int, optional (default=100)) – Number of alphas along the regularization path.
alphas (array-like, optional (default=None)) – List of alphas for which to compute the models. If None, the alphas are set automatically. Setting None is preferred.
min_alpha_ratio (float, optional (default=None)) – Length of the path. min_alpha_ratio=1e-6 means that min_alpha / max_alpha = 1e-6. If None, 1e-6 is used when n_samples >= n_features, else 1e-2.
min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over [max_alpha, min_alpha].
start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if warm_start is False or if fit is called for the first time (so that self.coef_ does not exist yet). If None, all coefficients are set to zero and the start value for the intercept is the weighted average of y (if fit_intercept is True). If an array, used directly as start values; if fit_intercept is True, its first element is assumed to be the start value for the intercept_. Note that n_features* = X.shape[1] + fit_intercept, i.e. it includes the intercept.
selection (str, optional (default='cyclic')) – For the CD solver ‘cd’, the coordinates (features) can be updated in either cyclic or random order. If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially when gradient_tol is higher than 1e-4.
random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection is 'random'.
copy_X (bool, optional (default=None)) – Whether to copy X. Since X is never modified by GeneralizedLinearRegressor, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. If False, X will not be copied and there will be an error if you pass an X in the wrong format, such as providing integer X and float y (only guaranteed for numpy arrays and pandas data frames). If None, X will not be copied unless it is in the wrong format.
check_input (bool, optional (default=True)) – Whether to bypass several checks on input: y values in range of family, sample_weight non-negative, P2 positive semi-definite. Don’t use this parameter unless you know what you are doing.
verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This feature requires having the tqdm package installed. For the L-BFGS solver, set verbose to any positive number for verbosity.
scale_predictors (bool, optional (default=False)) –
If True, estimate a scaled model where all predictors have a standard deviation of 1. This can result in better estimates if predictors are on very different scales (for example, centimeters and kilometers).
Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if scale_predictors is True.
lower_bounds (array-like, shape (n_features), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver ('irls-cd').
upper_bounds (array-like, shape=(n_features), optional (default=None)) – See lower_bounds.
A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form A_ineq w <= b_ineq.
b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form A_ineq w <= b_ineq.
cv (int, cross-validation generator or Iterable, optional (default=None)) –
Determines the cross-validation splitting strategy. One of:
None, to use the default 5-fold cross-validation,
int, to specify the number of folds.
Iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
n_jobs (int, optional (default=None)) – The maximum number of concurrently running jobs. The number of jobs that are needed is len(l1_ratio) x n_folds. -1 is the same as the number of CPUs on your machine. None means 1 unless in a joblib.parallel_backend context.
drop_first (bool, optional (default = False)) – If True, drop the first column when encoding categorical variables. Set this to True when alpha=0 and solver='auto' to prevent an error due to a singular feature matrix. In the case of using a formula with interactions, setting this argument to True ensures structural full-rankness (it is equivalent to ensure_full_rank in formulaic and tabmat).
formula (FormulaSpec) – A formula accepted by formulaic. It can either be a one-sided formula, in which case y must be specified in fit, or a two-sided formula, in which case y must be None.
interaction_separator (str, default=":") – The separator between the names of interacted variables.
categorical_format (str, default="{name}[{category}]") – The format string used to generate the names of categorical variables. Has to include the placeholders {name} and {category}. Only used if formula is not None.
monotonic_constraints (Mapping[str, str] | None)
force_all_finite (bool)
robust (bool)
expected_information (bool)
cat_missing_method (str)
cat_missing_name (str)
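To make the relationship between n_alphas, min_alpha_ratio and min_alpha concrete, here is a minimal sketch of how such a regularization path could be laid out. It assumes a logarithmically spaced grid, which is common for regularization paths but is an assumption here, not a statement about glum's implementation. The helpers default_min_alpha_ratio and alpha_grid and the value max_alpha=10.0 are hypothetical.

```python
import numpy as np


def default_min_alpha_ratio(n_samples, n_features):
    """Default documented for min_alpha_ratio=None."""
    return 1e-6 if n_samples >= n_features else 1e-2


def alpha_grid(max_alpha, n_alphas=100, min_alpha_ratio=1e-6):
    """Decreasing grid from max_alpha down to max_alpha * min_alpha_ratio.

    Log spacing is an assumption made for this illustration.
    """
    min_alpha = max_alpha * min_alpha_ratio
    return np.logspace(np.log10(max_alpha), np.log10(min_alpha), n_alphas)


# Hypothetical largest penalty for some data set.
ratio = default_min_alpha_ratio(n_samples=1000, n_features=20)
grid = alpha_grid(max_alpha=10.0, n_alphas=100, min_alpha_ratio=ratio)
```

The grid starts at the strongest penalty and ends at the weakest, so the ratio of its last to its first element equals min_alpha_ratio.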
- alpha_
The amount of regularization chosen by cross validation.
- Type:
float
- alphas_
Alphas used by the model.
- Type:
array, shape (n_l1_ratios, n_alphas)
- l1_ratio_
The compromise between L1 and L2 regularization chosen by cross validation.
- Type:
float
- coef_
Estimated coefficients for the linear predictor in the GLM at the optimal (l1_ratio_, alpha_).
- Type:
array, shape (n_features,)
- intercept_
Intercept (a.k.a. bias) added to linear predictor.
- Type:
float
- n_iter_
The number of iterations run by the CD solver to reach the specified tolerance for the optimal alpha.
- Type:
int
- coef_path_
Estimated coefficients for the linear predictor in the GLM at every point along the regularization path, per fold and l1_ratio.
- Type:
array, shape (n_folds, n_l1_ratios, n_alphas, n_features)
- intercept_path_
Estimated intercepts at every point along the regularization path, per fold and l1_ratio.
- Type:
array, shape (n_folds, n_l1_ratios, n_alphas, 1)
- deviance_path_
Deviance for the test set on each fold, varying alpha.
- Type:
array, shape(n_folds, n_l1_ratios, n_alphas)
- train_deviance_path_
Deviance for the training set on each fold, varying alpha.
- Type:
array, shape(n_folds, n_l1_ratios, n_alphas)
- robust
If true, then robust standard errors are computed by default.
- Type:
bool, optional (default = False)
- expected_information
If true, then the expected information matrix is computed by default. Only relevant when computing robust standard errors.
- Type:
bool, optional (default = False)
- categorical_format
Format string for categorical features. The format string should contain the placeholder {name} for the feature name and {category} for the category name. Only used if X is a pandas DataFrame.
- Type:
str, optional (default = “{name}[{category}]”)
- cat_missing_method
How to handle missing values in categorical columns. Only used if X is a pandas data frame.
if ‘fail’, raise an error if there are missing values.
if ‘zero’, missing values will represent all-zero indicator columns.
if ‘convert’, missing values will be converted to the cat_missing_name category.
- Type:
str {‘fail’|’zero’|’convert’}, default=’fail’
- cat_missing_name
Name of the category to which missing values will be converted if cat_missing_method='convert'. Only used if X is a pandas data frame.
- Type:
str, default=’(MISSING)’
- col_means_
The means of the columns of the design matrix X.
- Type:
array, shape (n_features,)
- col_stds_
The standard deviations of the columns of the design matrix X.
- Type:
array, shape (n_features,)
- coef_table(X=None, y=None, sample_weight=None, offset=None, *, confidence_level=0.95, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, context=None)
Get a table of the regression coefficients.
Includes coefficient estimates, standard errors, t-values, p-values and confidence intervals.
- Parameters:
confidence_level (float, optional, default=0.95) – The confidence level for the confidence intervals.
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed or if standard errors, etc. are not desired.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
A table of the regression results.
- Return type:
pandas.DataFrame
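The columns listed above (estimates, standard errors, t-values and confidence intervals) are related by simple arithmetic. The sketch below shows that arithmetic under a normal approximation, which is an assumption made for illustration; the reference distribution glum actually uses is not stated here. The helper coef_summary and the input numbers are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def coef_summary(coef, std_errors, confidence_level=0.95):
    """Test statistics and confidence bounds from estimates and standard errors."""
    coef = np.asarray(coef, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    # Two-sided critical value under a normal approximation (assumed).
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return {
        "coef": coef,
        "se": se,
        "t_value": coef / se,
        "ci_lower": coef - z * se,
        "ci_upper": coef + z * se,
    }


# Hypothetical estimates and standard errors.
summary = coef_summary([1.0, -0.5], [0.2, 0.25], confidence_level=0.95)
```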
- covariance_matrix(X=None, y=None, sample_weight=None, offset=None, *, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, skip_checks=False, context=None)
Calculate the covariance matrix for generalized linear models.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.
skip_checks (boolean, optional, default=False) – Whether to skip input validation. For internal use only.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
Notes
We support three types of covariance matrices:
non-robust
robust (HC-1)
clustered
For a maximum-likelihood estimator, the covariance matrix takes the form \(\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)\) where \(\mathcal{H}^{-1}\) is the inverse Hessian and \(\mathcal{I}\) is the information matrix. The different types of covariance matrices use different approximations of these quantities.
The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.
The robust (HC-1) covariance matrix takes the form \(\mathbf{H}^{-1}(\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{-1}(\hat{\theta})\) where \(\mathbf{H}\) is the empirical Hessian and \(\mathbf{G}\) is the gradient. We apply a finite-sample correction of \(\frac{N}{N-p}\).
The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using \(\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta})\) directly, we first sum over all the groups. The finite-sample correction is affected as well, becoming \(\frac{M}{M-1}\frac{N}{N-p}\) where \(M\) is the number of groups.
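The three estimators described in these notes can be sketched for the simplest case, a Gaussian model with identity link, where the Hessian is proportional to X'X and the per-observation gradient is proportional to x_i times the residual. This is a NumPy illustration of the formulas in the notes, not glum's implementation; the simulated data, the cluster layout and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Least-squares fit (the Gaussian, identity-link maximum-likelihood estimate).
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# The dispersion cancels in the sandwich product, so it is omitted here.
H_inv = np.linalg.inv(X.T @ X)   # inverse Hessian
G = X * resid[:, None]           # one gradient row per observation

# Non-robust: inverse Fisher information, scaled by the estimated dispersion.
sigma2 = resid @ resid / (n - p)
cov_nonrobust = sigma2 * H_inv

# Robust (HC-1): sandwich with the N / (N - p) correction.
cov_robust = n / (n - p) * H_inv @ (G.T @ G) @ H_inv

# Clustered: sum the gradient rows within each group first.
m = 20
clusters = np.repeat(np.arange(m), n // m)
G_grouped = np.zeros((m, p))
np.add.at(G_grouped, clusters, G)
cov_clustered = (
    m / (m - 1) * n / (n - p) * H_inv @ (G_grouped.T @ G_grouped) @ H_inv
)

# Standard errors are the square roots of the diagonal.
std_errors = np.sqrt(np.diag(cov_robust))
```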
- property family_instance: ExponentialDispersionModel
Return an ExponentialDispersionModel.
- fit(X, y=None, sample_weight=None, offset=None, *, store_covariance_matrix=False, clusters=None, context=None)
Choose the best model along a ‘regularization path’ by cross-validation.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a float32 matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.
y (array-like, shape (n_samples,)) – Target values.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has \(\mathrm{var}(y_i) = \phi \times v(\mu) / w_i\). If \(y_i \sim EDM(\mu, \phi / w_i)\), then \(\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)\), i.e. the mean of \(y\) is a weighted average with weights equal to sample_weight.
offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected y by 3 if the link is linear and will multiply expected y by exp(3) if the link is logarithmic.
store_covariance_matrix (bool, optional (default=False)) – Whether to store the covariance matrix of the parameter estimates corresponding to the best model.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
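Because the offset is added to the linear predictor, its effect on the expected value depends on the inverse link: an additive shift under an identity link, and a multiplicative factor of exp(offset) under a log link. The short sketch below checks this numerically; the linear predictor values are made up for illustration.

```python
import numpy as np

# Hypothetical linear predictor values, X @ coef + intercept.
eta = np.array([0.0, 0.5, 1.0])
offset = 3.0

# Identity link: mu = eta, so the offset shifts the mean.
mu_identity = eta
mu_identity_offset = eta + offset
shift = mu_identity_offset - mu_identity

# Log link: mu = exp(eta), so the offset scales the mean by exp(offset).
mu_log = np.exp(eta)
mu_log_offset = np.exp(eta + offset)
factor = mu_log_offset / mu_log
```

For exposure-type adjustments under a log link, this is why the logarithm of the exposure, rather than the exposure itself, is the natural offset.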
- get_formatted_diagnostics(*, full_report=False, custom_columns=None)
Get formatted diagnostics which can be printed with report_diagnostics.
- Parameters:
full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.
custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type:
str | DataFrame
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- linear_predictor(X, offset=None, *, alpha_index=None, alpha=None, context=None)
Compute the linear predictor, X * coef_ + intercept_.
When neither alpha_index nor alpha is given, predictions are for the best CV-selected (l1_ratio_, alpha_).
When either alpha_index or alpha is specified, predictions are for the corresponding alpha values on the full-data refit path for the best l1_ratio_.
- Parameters:
X (array-like, shape (n_samples, n_features)) – Observations.
offset (array-like, shape (n_samples,), optional (default=None)) – Offset added to the linear predictor.
alpha_index (int or sequence of int, optional (default=None)) – Index (or indices) into the alpha path from the full-data refit. Incompatible with alpha.
alpha (float or sequence of float, optional (default=None)) – Alpha value(s) to predict at, resolved to the closest index on the refit alpha path. Incompatible with alpha_index.
context (int or mapping, optional (default=None)) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
Shape (n_samples,) when no alpha_index/alpha is given or when a scalar alpha is passed. Shape (n_samples, len(alpha_index)) when a sequence is passed.
- Return type:
np.ndarray
- predict(X, sample_weight=None, offset=None, *, alpha_index=None, alpha=None, context=None)
Predict using GLM with feature matrix X.
If alpha_search is True, but alpha_index and alpha are both None, we use the last alpha value self._alphas[-1].
- Parameters:
X (array-like, shape (n_samples, n_features)) – Observations. X may be a pandas data frame with categorical types. If X was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will be numpy.nan.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.
offset (array-like, shape (n_samples,), optional (default=None))
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case alpha_search is True. Incompatible with alpha (see below).
alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case alpha_search is True. Incompatible with alpha_index (see above).
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
Shape (n_samples,) when no alpha_index/alpha is given or when a scalar alpha is passed. Shape (n_samples, len(alpha_index)) when a sequence is passed.
- Return type:
np.ndarray
- report_diagnostics(*, full_report=False, custom_columns=None)
Print diagnostics to stdout.
- Parameters:
full_report (bool, optional (default=False)) – Print all available information. When False and custom_columns is None, a restricted set of columns is printed out.
custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type:
None
- score(X, y, sample_weight=None, offset=None, *, context=None)
Compute \(D^2\), the percentage of deviance explained.
\(D^2\) is a generalization of the coefficient of determination \(R^2\). The \(R^2\) uses the squared error and the \(D^2\), the deviance. Note that those two are equal for family='normal'.
\(D^2\) is defined as \(D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})} {D_{\mathrm{null}}}\), where \(D_{\mathrm{null}}\) is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one and it can be negative.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.
y (array-like, shape (n_samples,)) – True values of target.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.
offset (array-like, shape (n_samples,), optional (default=None))
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- Returns:
D^2 of self.predict(X) w.r.t. y.
- Return type:
float
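As a worked illustration of the \(D^2\) definition, the sketch below evaluates it for the normal family, where the deviance reduces to the weighted sum of squared errors and \(D^2\) coincides with \(R^2\). The helpers normal_deviance and d2_score are hypothetical, and taking the null model to be the weighted mean of the supplied y is an assumption made here.

```python
import numpy as np


def normal_deviance(y, mu, sample_weight):
    """Deviance of the normal family: weighted sum of squared errors."""
    return float(np.sum(sample_weight * (y - mu) ** 2))


def d2_score(y, mu, sample_weight=None):
    """D^2 = 1 - D(y, mu) / D_null for the normal family (illustrative)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w = np.ones_like(y) if sample_weight is None else np.asarray(
        sample_weight, dtype=float
    )
    # Intercept-only model: predict the weighted mean everywhere (assumed).
    null_mu = np.full_like(y, np.average(y, weights=w))
    return 1.0 - normal_deviance(y, mu, w) / normal_deviance(y, null_mu, w)


y = np.array([1.0, 2.0, 3.0, 4.0])
score_good = d2_score(y, np.array([1.5, 1.5, 3.5, 3.5]))
score_perfect = d2_score(y, y)
score_bad = d2_score(y, y[::-1])  # reversed predictions do worse than the mean
```

The reversed predictions give a negative value, which is the case the description above refers to when it says the score can be negative.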
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- std_errors(X=None, y=None, sample_weight=None, offset=None, *, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, store_covariance_matrix=False, context=None)
Calculate standard errors for generalized linear models.
See covariance_matrix for an in-depth explanation of how the standard errors are computed.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
offset (array-like, optional, default=None) – Array with additive offsets.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
store_covariance_matrix (boolean, optional, default=False) – Whether to store the covariance matrix in the model instance. If a covariance matrix has already been stored, it will be overwritten.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
- wald_test(X=None, y=None, sample_weight=None, offset=None, *, R=None, features=None, terms=None, formula=None, r=None, mu=None, dispersion=None, robust=None, clusters=None, expected_information=None, context=None)
Compute the Wald test statistic and p-value for a linear hypothesis.
The left-hand side of the hypothesis may be specified in the following ways:
- R: The restriction matrix representing the linear combination of coefficients to test.
- features: The name of a feature or a list of features to test.
- terms: The name of a term or a list of terms to test.
- formula: A formula string specifying the hypothesis to test.
The right-hand side of the tested hypothesis is specified by r. In the case of a terms-based test, the null hypothesis is that each coefficient relating to a term equals the corresponding value in r.
- Parameters:
X ({array-like, sparse matrix}, shape (n_samples, n_features), optional) – Training data. Can be omitted if a covariance matrix has already been computed.
y (array-like, shape (n_samples,), optional) – Target values. Can be omitted if a covariance matrix has already been computed.
sample_weight (array-like, shape (n_samples,), optional, default=None) – Individual weights for each sample.
offset (array-like, optional, default=None) – Array with additive offsets.
R (np.ndarray, optional, default=None) – The restriction matrix representing the linear combination of coefficients to test.
features (Union[str, list[str]], optional, default=None) – The name of a feature or a list of features to test.
terms (Union[str, list[str]], optional, default=None) – The name of a term or a list of terms to test. It can cover one or more coefficients. In the case of a model based on a formula, a term is one of the expressions separated by + signs. Otherwise, a term is one column in the input data. As categorical variables need not be one-hot encoded in glum, in their case, the hypothesis to be tested is that the coefficients of all categories are equal to r.
r (Sequence, optional, default=None) – The vector representing the values of the linear combination. If None, the test is for whether the linear combinations of the coefficients are zero.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=None) – Whether to compute robust standard errors instead of normal ones. If not specified, the model’s robust attribute is used.
clusters (array-like, optional, default=None) – Array with cluster membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=None) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors. If not specified, the model’s expected_information attribute is used.
context (Optional[Union[int, Mapping[str, Any]]], default=None) – The context to add to the evaluation context of the formula with, e.g., custom transforms. If an integer, the context is taken from the stack frame of the caller at the given depth. Otherwise, a mapping from variable names to values is expected. By default, no context is added. Set context=0 to make the calling scope available.
formula (str, optional, default=None) – A formula string specifying the hypothesis to test.
- Returns:
NamedTuple with test statistic, p-value, and degrees of freedom.
- Return type:
WaldTestResult
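As a rough illustration of what a Wald test computes for a single linear restriction, the sketch below evaluates W = (Rβ − r)² / (R V Rᵀ) and the matching chi-squared p-value with one degree of freedom. It uses only the standard library; the helper name `wald_single_restriction`, the coefficient vector, and the covariance matrix are made up for the example, and this is not glum's implementation.

```python
import math


def wald_single_restriction(coef, cov, R, r=0.0):
    """Wald statistic and p-value for one linear restriction sum(R * coef) = r."""
    k = len(coef)
    # Estimated linear combination minus its hypothesized value.
    diff = sum(R[j] * coef[j] for j in range(k)) - r
    # Variance of the linear combination: R V R^T.
    var = sum(R[i] * cov[i][j] * R[j] for i in range(k) for j in range(k))
    stat = diff ** 2 / var
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical fitted coefficients and covariance matrix.
coef = [0.5, 1.2, -0.3]
cov = [
    [0.04, 0.00, 0.00],
    [0.00, 0.09, 0.01],
    [0.00, 0.01, 0.16],
]
# Test whether the second coefficient is zero.
stat, p_value = wald_single_restriction(coef, cov, R=[0.0, 1.0, 0.0])
```

With several restrictions, R becomes a matrix and the statistic a quadratic form with as many degrees of freedom as there are rows in R.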
- class glum.IdentityLink
Bases: Link
The identity link function.
- derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- inverse(lin_pred)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative(lin_pred)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative2(lin_pred)
Compute second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
- class glum.InverseGaussianDistribution
Bases: ExponentialDispersionModel
Class for the inverse Gaussian distribution.
The inverse Gaussian distribution models outcomes y in (0, +∞).
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
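To make the "weighted sum of unit deviances" concrete, here is a small standard-library sketch for the inverse Gaussian case. It assumes the textbook unit deviance (y − μ)² / (y μ²), which follows from the log-likelihood difference above with φ = 1; the function names are illustrative and not part of glum.

```python
def inv_gaussian_unit_deviance(y, mu):
    """Unit deviance of the inverse Gaussian distribution (phi = 1)."""
    return (y - mu) ** 2 / (y * mu ** 2)


def inv_gaussian_deviance(y, mu, sample_weight=None):
    """Weighted sum of the unit deviances over all observations."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return sum(
        w * inv_gaussian_unit_deviance(yi, mi)
        for w, yi, mi in zip(sample_weight, y, mu)
    )


y = [1.0, 2.0, 4.0]
mu = [1.0, 1.0, 2.0]
unweighted = inv_gaussian_deviance(y, mu)
weighted = inv_gaussian_deviance(y, mu, sample_weight=[1.0, 2.0, 4.0])
```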
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
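The Pearson-based estimate can be sketched as follows, assuming the usual unweighted estimator: the sum of squared Pearson residuals (y − μ)² / v(μ) divided by n − ddof. How glum normalizes when sample weights are supplied is not stated here, so weights are left out; `pearson_dispersion` is a hypothetical helper, and v(μ) = μ³ is the inverse Gaussian unit variance.

```python
def pearson_dispersion(y, mu, unit_variance, ddof=1):
    """Pearson estimate of phi: sum((y - mu)^2 / v(mu)) / (n - ddof)."""
    chi2 = sum((yi - mi) ** 2 / unit_variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - ddof)


y = [1.0, 3.0, 2.0, 6.0]
mu = [2.0, 2.0, 2.0, 2.0]
# Inverse Gaussian unit variance v(mu) = mu^3.
phi_hat = pearson_dispersion(y, mu, unit_variance=lambda m: m ** 3)
```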
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
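The three return values can be pictured with the following list-based sketch. It assumes a log link (so mu = exp(eta)) and the inverse Gaussian unit deviance (y − μ)² / (y μ²); both choices, and the name `eta_mu_deviance_sketch`, are assumptions made for illustration rather than a description of glum's internals.

```python
import math


def eta_mu_deviance_sketch(factor, cur_eta, X_dot_d, y, sample_weight):
    """Return (eta, mu, deviance) after a step of size `factor`."""
    # Move the linear predictor along the search direction.
    eta = [e + factor * d for e, d in zip(cur_eta, X_dot_d)]
    # Apply the inverse link; a log link is assumed here.
    mu = [math.exp(e) for e in eta]
    # Inverse Gaussian deviance as a weighted sum of unit deviances.
    dev = sum(
        w * (yi - mi) ** 2 / (yi * mi ** 2)
        for w, yi, mi in zip(sample_weight, y, mu)
    )
    return eta, mu, dev


eta, mu, dev = eta_mu_deviance_sketch(
    factor=0.5,
    cur_eta=[0.0, 0.0],
    X_dot_d=[1.0, -1.0],
    y=[1.0, 2.0],
    sample_weight=[1.0, 1.0],
)
```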
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if None.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
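The closed form 2 × (μ − y) / v(μ) can be checked numerically. The sketch below does so for the inverse Gaussian case, assuming the unit deviance (y − μ)² / (y μ²) and the unit variance v(μ) = μ³, and compares the formula with a central finite difference. The function names are illustrative only.

```python
def unit_deviance(y, mu):
    """Inverse Gaussian unit deviance."""
    return (y - mu) ** 2 / (y * mu ** 2)


def unit_deviance_derivative(y, mu):
    """Closed form 2 * (mu - y) / v(mu) with v(mu) = mu^3."""
    return 2.0 * (mu - y) / mu ** 3


y, mu, h = 2.0, 1.5, 1e-6
# Central finite difference with respect to mu.
numeric = (unit_deviance(y, mu + h) - unit_deviance(y, mu - h)) / (2.0 * h)
analytic = unit_deviance_derivative(y, mu)
```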
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
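Since the variance is v(μ) × φ / w, differentiating with respect to μ gives v'(μ) × φ / w. The scalar sketch below illustrates this for a power unit variance v(μ) = μ^p (p = 3 for the inverse Gaussian) and compares it with a finite difference; the helper functions are illustrative and not glum code.

```python
def variance(mu, dispersion=1.0, sample_weight=1.0, power=3):
    """Variance v(mu) * phi / w with unit variance v(mu) = mu ** power."""
    return mu ** power * dispersion / sample_weight


def variance_derivative(mu, dispersion=1.0, sample_weight=1.0, power=3):
    """Derivative v'(mu) * phi / w with v'(mu) = power * mu ** (power - 1)."""
    return power * mu ** (power - 1) * dispersion / sample_weight


mu, phi, w, h = 2.0, 0.5, 4.0, 1e-6
analytic = variance_derivative(mu, phi, w)
# Central finite difference of the variance with respect to mu.
numeric = (variance(mu + h, phi, w) - variance(mu - h, phi, w)) / (2.0 * h)
```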
- class glum.Link
Bases: object
Abstract base class for link functions.
- abstractmethod derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- abstractmethod inverse(lin_pred)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- abstractmethod inverse_derivative(lin_pred)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- abstractmethod inverse_derivative2(lin_pred)
Compute second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- abstractmethod link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
- class glum.LogLink
Bases: Link
The log link function log(x).
- derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- inverse(lin_pred)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative(lin_pred)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative2(lin_pred)
Compute second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
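For the log link, the five methods above reduce to elementary expressions. The scalar sketch below mirrors the method names with plain functions to show how g(mu) = log(mu) and h(eta) = exp(eta) relate; it is a standalone illustration using only the standard library, not the class itself.

```python
import math


def link(mu):
    """g(mu) = log(mu)."""
    return math.log(mu)


def derivative(mu):
    """g'(mu) = 1 / mu."""
    return 1.0 / mu


def inverse(lin_pred):
    """h(eta) = exp(eta), so that h(g(mu)) = mu."""
    return math.exp(lin_pred)


def inverse_derivative(lin_pred):
    """h'(eta) = exp(eta)."""
    return math.exp(lin_pred)


def inverse_derivative2(lin_pred):
    """h''(eta) = exp(eta)."""
    return math.exp(lin_pred)


mu = 2.5
round_trip = inverse(link(mu))
```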
- class glum.LogitLink
Bases: Link
The logit link function logit(x).
- derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- inverse(lin_pred)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative(lin_pred)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative2(lin_pred)
Compute second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
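The logit link follows the same pattern with g(mu) = log(mu / (1 − mu)) and the logistic function as its inverse. The scalar sketch below writes out the link, its inverse, and the derivatives as plain functions; it is an illustration with the standard library only, not the class implementation.

```python
import math


def link(mu):
    """g(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


def derivative(mu):
    """g'(mu) = 1 / (mu * (1 - mu))."""
    return 1.0 / (mu * (1.0 - mu))


def inverse(lin_pred):
    """h(eta) = 1 / (1 + exp(-eta)), the logistic function."""
    return 1.0 / (1.0 + math.exp(-lin_pred))


def inverse_derivative(lin_pred):
    """h'(eta) = p * (1 - p) with p = h(eta)."""
    p = inverse(lin_pred)
    return p * (1.0 - p)


def inverse_derivative2(lin_pred):
    """h''(eta) = p * (1 - p) * (1 - 2 * p) with p = h(eta)."""
    p = inverse(lin_pred)
    return p * (1.0 - p) * (1.0 - 2.0 * p)


round_trip = inverse(link(0.3))
```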
- class glum.NegativeBinomialDistribution(theta=1.0)
Bases: ExponentialDispersionModel
A class for the Negative Binomial distribution.
A negative binomial distribution with mean \(\mu = \mathrm{E}(Y)\) is uniquely defined by its mean-variance relationship \(\mathrm{var}(Y) \propto \mu + \theta * \mu^2\).
- Parameters:
theta (float, optional (default=1.0)) – The dispersion parameter from the unit_variance \(v(\mu) = \mu + \theta * \mu^2\). For \(\theta \leq 0\), no distribution exists.
References
- For the log-likelihood and deviance:
M. L. Zwilling Negative Binomial Regression, The Mathematica Journal 2013. https://www.mathematica-journal.com/2013/06/27/negative-binomial-regression/
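The mean-variance relationship above is easy to evaluate directly. The following sketch computes v(μ) = μ + θ μ² for scalars and rejects θ ≤ 0, for which no distribution exists; `negative_binomial_unit_variance` is an illustrative helper, not the class's `unit_variance` method.

```python
def negative_binomial_unit_variance(mu, theta=1.0):
    """Unit variance v(mu) = mu + theta * mu^2 of the negative binomial."""
    if theta <= 0:
        raise ValueError("theta must be strictly positive")
    return mu + theta * mu ** 2


v_default = negative_binomial_unit_variance(2.0)             # theta = 1.0
v_small_theta = negative_binomial_unit_variance(2.0, 0.5)    # less overdispersion
```

As θ shrinks toward zero, v(μ) approaches μ, the Poisson unit variance.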
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=1)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=1.0)) – Ignored.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- property theta
Return the negative binomial theta parameter.
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.NormalDistribution
Bases: ExponentialDispersionModel
Class for the normal (a.k.a. Gaussian) distribution.
The normal distribution models outcomes y in (-∞, +∞).
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
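For the normal distribution, the log-likelihood definition of the deviance collapses to a weighted sum of squared errors. The sketch below checks this with φ = 1, building the unit deviance from the Gaussian log density and comparing it with (y − μ)². The helper names and the sample data are made up for the example.

```python
import math


def normal_unit_loglik(y, mu, phi=1.0):
    """Log density of a normal distribution with mean mu and variance phi."""
    return -((y - mu) ** 2) / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)


def unit_deviance_from_loglik(y, mu):
    """2 * [l(y, y, 1) - l(y, mu, 1)]: saturated model versus model at hand."""
    return 2.0 * (normal_unit_loglik(y, y) - normal_unit_loglik(y, mu))


y = [1.0, 2.0, 4.0]
mu = [1.5, 2.0, 3.0]
w = [1.0, 1.0, 2.0]

deviance = sum(wi * unit_deviance_from_loglik(yi, mi) for wi, yi, mi in zip(w, y, mu))
weighted_sse = sum(wi * (yi - mi) ** 2 for wi, yi, mi in zip(w, y, mu))
```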
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if None.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v'(\mu)\) is the derivative of the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.PoissonDistribution
Bases: ExponentialDispersionModel
Class for the scaled Poisson distribution.
The Poisson distribution models discrete outcomes y in [0, +∞).
See the documentation of the superclass, ExponentialDispersionModel, for details.
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
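As an illustration of the Poisson case, the sketch below uses the standard unit deviance 2 × [y log(y / μ) − (y − μ)], with the term y log(y / μ) taken as 0 when y = 0, and sums it with weights. The function names are illustrative and the data are invented for the example.

```python
import math


def poisson_unit_deviance(y, mu):
    """Poisson unit deviance 2 * [y * log(y / mu) - (y - mu)]."""
    # y * log(y / mu) tends to 0 as y tends to 0, so treat y == 0 separately.
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (y_log_term - y + mu)


def poisson_deviance(y, mu, sample_weight=None):
    """Weighted sum of the Poisson unit deviances."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return sum(
        w * poisson_unit_deviance(yi, mi)
        for w, yi, mi in zip(sample_weight, y, mu)
    )


y = [0.0, 3.0, 2.0]
mu = [2.0, 3.0, 1.0]
total = poisson_deviance(y, mu)
```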
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if None.
- Return type:
float
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
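The relationship between the unit variance and the second derivative of the unit deviance can be verified numerically. The sketch below assumes the standard Poisson unit deviance, approximates its second derivative in μ at y = μ with a central difference, and recovers v(μ) = μ up to discretization error. All names are illustrative.

```python
import math


def poisson_unit_deviance(y, mu):
    """Poisson unit deviance 2 * [y * log(y / mu) - (y - mu)]."""
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (y_log_term - y + mu)


def second_derivative_in_mu(d, y, mu, h=1e-4):
    """Central finite-difference approximation of the second derivative in mu."""
    return (d(y, mu + h) - 2.0 * d(y, mu) + d(y, mu - h)) / h ** 2


mu = 3.0
# v(mu) = 2 / (second derivative of d(y, mu) in mu, evaluated at y = mu).
unit_variance_estimate = 2.0 / second_derivative_in_mu(poisson_unit_deviance, mu, mu)
```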
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
ndarray
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to
mu.The derivative of the variance is equal to \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(ws_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- class glum.TweedieDistribution(power=0)
Bases: ExponentialDispersionModel
A class for the Tweedie distribution.
A Tweedie distribution with mean \(\mu = \mathrm{E}(Y)\) is uniquely defined by its mean-variance relationship \(\mathrm{var}(Y) \propto \mu^{\mathrm{p}}\).
Special cases are:
Power     Distribution        Support
0         Normal              (-∞, +∞)
1         Poisson             [0, +∞)
(1, 2)    Compound Poisson    [0, +∞)
2         Gamma               (0, +∞)
3         Inverse Gaussian    (0, +∞)
See the documentation of the superclass, ExponentialDispersionModel, for details.
- Parameters:
power (float, optional (default=0)) – The variance power of the unit_variance \(v(\mu) = \mu^{\mathrm{power}}\). For \(0 < \mathrm{power} < 1\), no distribution exists.
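The table of special cases can be read as a lookup from the variance power to a named distribution and its support. The following is a minimal sketch of that lookup in plain Python. The function names `tweedie_special_case` and `unit_variance` are hypothetical helpers written for illustration and are not part of glum.

```python
def tweedie_special_case(power):
    """Return (distribution name, support) for a Tweedie variance power.

    Hypothetical helper mirroring the table of special cases; not glum API.
    """
    if 0 < power < 1:
        raise ValueError("No Tweedie distribution exists for 0 < power < 1.")
    if power == 0:
        return "Normal", "(-inf, +inf)"
    if power == 1:
        return "Poisson", "[0, +inf)"
    if 1 < power < 2:
        return "Compound Poisson", "[0, +inf)"
    if power == 2:
        return "Gamma", "(0, +inf)"
    if power == 3:
        return "Inverse Gaussian", "(0, +inf)"
    # Powers outside the listed special cases have no entry in the table.
    return "Unnamed Tweedie", None


def unit_variance(mu, power):
    """Unit variance v(mu) = mu ** power for a scalar mean."""
    return mu ** power


name, support = tweedie_special_case(1.5)
print(name, support, unit_variance(2.0, 1.5))
```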
- deviance(y, mu, sample_weight=None)
Compute the deviance.
The deviance is a weighted sum of the unit deviances. In terms of the unit log likelihood \(\ell\), it equals \(2 \sum_i [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
float
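To make the relationship between the deviance and the log likelihood concrete, here is a small sketch for the Poisson case (power 1) with \(\phi = 1\). It uses scalar Python lists rather than arrays, and all function names are illustrative assumptions rather than glum functions. Whether the library applies any additional normalization to the weighted sum is not stated here, so none is applied in the sketch.

```python
import math


def poisson_log_likelihood(y, mu):
    """Unit Poisson log likelihood: y * log(mu) - mu - log(y!)."""
    log_term = y * math.log(mu) if y > 0 else 0.0
    return log_term - mu - math.lgamma(y + 1.0)


def poisson_unit_deviance(y, mu):
    """Unit deviance 2 * (y * log(y / mu) - y + mu), with 0 * log(0) taken as 0."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - y + mu)


def deviance(y, mu, sample_weight=None):
    """Weighted sum of unit deviances (illustrative, no normalization)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return sum(
        w * poisson_unit_deviance(yi, mi)
        for yi, mi, w in zip(y, mu, sample_weight)
    )


# The unit deviance equals twice the gap to the saturated model (mu = y).
y_i, mu_i = 3.0, 2.0
gap = 2.0 * (poisson_log_likelihood(y_i, y_i) - poisson_log_likelihood(y_i, mu_i))
print(abs(gap - poisson_unit_deviance(y_i, mu_i)) < 1e-12)
```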
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to mu.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for mu.
method ({'pearson', 'deviance'}, optional (default='pearson')) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type:
float
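As a rough illustration of the two estimation methods, the sketch below computes a textbook Pearson or deviance-based estimate of \(\phi\) for scalar lists. The denominator `n - ddof` is the usual textbook choice and is an assumption here, since how the library normalizes under non-unit weights is not stated in this reference. The function name and the callables passed in are hypothetical.

```python
def estimate_dispersion(y, mu, unit_variance, unit_deviance=None,
                        sample_weight=None, ddof=1, method="pearson"):
    """Textbook dispersion estimate (illustrative, not glum's implementation)."""
    n = len(y)
    if sample_weight is None:
        sample_weight = [1.0] * n
    if method == "pearson":
        # Weighted sum of squared Pearson residuals (y - mu)^2 / v(mu).
        total = sum(
            w * (yi - mi) ** 2 / unit_variance(mi)
            for yi, mi, w in zip(y, mu, sample_weight)
        )
    elif method == "deviance":
        if unit_deviance is None:
            raise ValueError("unit_deviance is required for method='deviance'.")
        # Weighted sum of unit deviances d(y, mu).
        total = sum(
            w * unit_deviance(yi, mi)
            for yi, mi, w in zip(y, mu, sample_weight)
        )
    else:
        raise ValueError("method must be 'pearson' or 'deviance'.")
    return total / (n - ddof)


# Poisson-like unit variance v(mu) = mu.
phi_hat = estimate_dispersion([1.0, 3.0], [2.0, 2.0], unit_variance=lambda m: m)
print(phi_hat)
```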
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute eta, mu and the deviance.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The linear predictor, eta, as cur_eta + factor * X_dot_d.
numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction, mu.
float – The deviance.
- Parameters:
link (Link)
factor (float)
- Return type:
tuple[ndarray, ndarray, float]
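The three return values can be sketched as follows, assuming a log link and a Poisson unit deviance for concreteness. This is a scalar, list-based illustration with hypothetical function names; the library method takes a Link object and array inputs and is likely implemented quite differently.

```python
import math


def poisson_unit_deviance(y, mu):
    """Unit deviance 2 * (y * log(y / mu) - y + mu), with 0 * log(0) taken as 0."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - y + mu)


def eta_mu_deviance_sketch(cur_eta, X_dot_d, factor, y, sample_weight):
    """Step the linear predictor, map it to the mean, and sum the deviance."""
    # Linear predictor after a step of size `factor` along direction d.
    eta = [e + factor * xd for e, xd in zip(cur_eta, X_dot_d)]
    # Inverse of the (assumed) log link maps eta to the mean.
    mu = [math.exp(e) for e in eta]
    # Deviance as the weighted sum of unit deviances.
    dev = sum(
        w * poisson_unit_deviance(yi, mi)
        for yi, mi, w in zip(y, mu, sample_weight)
    )
    return eta, mu, dev


eta, mu, dev = eta_mu_deviance_sketch(
    cur_eta=[0.0, 0.0], X_dot_d=[1.0, -1.0], factor=0.5,
    y=[1.0, 2.0], sample_weight=[1.0, 1.0],
)
print(eta, mu, dev)
```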
- in_y_range(x)
Return True if x is in the valid range of the EDM.
- Return type:
ndarray
- property include_lower_bound: bool
Return whether lower_bound is allowed as a value of y.
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
For 1 < p < 2, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if None.
- Return type:
float
- property lower_bound: float
Get the lower bound of values for the EDM.
- property power
Return the Tweedie power parameter.
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns:
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters:
link (Link)
X (MatrixBase | StandardizedMatrix)
- to_tweedie(safe=True)
Return the Tweedie representation of a distribution if it exists.
- unit_deviance(y, mu)
Compute the unit deviance.
In terms of the unit log likelihood \(\ell\), the unit deviance is \(2 [\ell(y_i, y_i, \phi) - \ell(y_i, \mu, \phi)]\), i.e. twice the difference between the log likelihood of a saturated model (with one parameter per observation) and the model at hand.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to mu.
The derivative of the unit deviance is given by \(2 \times (\mu - y) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters:
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type:
array-like, shape (n_samples,)
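The derivative formula \(2 (\mu - y) / v(\mu)\) can be checked numerically against a finite difference of the unit deviance. The sketch below uses the standard closed form of the Tweedie unit deviance for powers other than 0, 1 and 2, with positive scalar `y` and `mu`. The function names are illustrative and not taken from glum.

```python
def tweedie_unit_deviance(y, mu, power):
    """Tweedie unit deviance for power not in {0, 1, 2}, with y > 0 and mu > 0."""
    p = power
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def tweedie_unit_deviance_derivative(y, mu, power):
    """Analytic derivative 2 * (mu - y) / v(mu), with v(mu) = mu ** power."""
    return 2.0 * (mu - y) / mu ** power


def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


y, mu, power = 2.0, 3.0, 1.5
numeric = central_difference(lambda m: tweedie_unit_deviance(y, m, power), mu)
analytic = tweedie_unit_deviance_derivative(y, mu, power)
print(abs(numeric - analytic) < 1e-6)
```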
- unit_variance(mu)
Compute the unit variance.
The unit variance, \(v(\mu) \equiv b''((b')^{-1} (\mu))\), determines the variance as a function of the mean \(\mu\) by \(\mathrm{var}(y_i) = v(\mu_i) \times \phi / w_i\). It can also be derived from the unit deviance \(d(y, \mu)\) as
\[v(\mu) = 2 \div \frac{\partial^2 d(y, \mu)}{\partial\mu^2} \big| _{y=\mu}.\]
See also variance().
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
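The identity linking the unit variance to the second derivative of the unit deviance can be verified numerically. The following sketch, under the assumption of a Tweedie power outside {0, 1, 2} and a positive scalar mean, estimates the second derivative with a finite difference and compares the result with \(\mu^{\mathrm{power}}\). The helper names are hypothetical.

```python
def tweedie_unit_deviance(y, mu, power):
    """Tweedie unit deviance for power not in {0, 1, 2}, with y > 0 and mu > 0."""
    p = power
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def unit_variance_from_deviance(mu, power, h=1e-3):
    """Estimate v(mu) = 2 / (second derivative of d in mu), evaluated at y = mu."""
    y = mu  # y is held fixed at mu while the deviance is differentiated in mu
    second_derivative = (
        tweedie_unit_deviance(y, mu + h, power)
        - 2.0 * tweedie_unit_deviance(y, mu, power)
        + tweedie_unit_deviance(y, mu - h, power)
    ) / h ** 2
    return 2.0 / second_derivative


mu, power = 3.0, 1.5
estimate = unit_variance_from_deviance(mu, power)
print(abs(estimate - mu ** power) < 1e-3)
```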
- unit_variance_derivative(mu)
Compute the derivative of the unit variance with respect to mu.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / w_i)\) takes the form \(v(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to mu.
The derivative of the variance is equal to \(v'(\mu_i) \times \phi / w_i\), where \(v(\mu)\) is the unit variance and \(w_i\) are weights.
- Parameters:
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which the variance is inversely proportional.
- Return type:
array-like, shape (n_samples,)
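For the Tweedie unit variance \(v(\mu) = \mu^{\mathrm{power}}\), the variance and its derivative with respect to the mean have simple closed forms: \(\mu^{p} \phi / w\) and \(p \mu^{p-1} \phi / w\). The sketch below writes both for scalar inputs and checks the derivative with a finite difference. It is an illustration with hypothetical function names, not the library implementation.

```python
def tweedie_variance(mu, power, dispersion=1.0, sample_weight=1.0):
    """Variance v(mu) * phi / w with v(mu) = mu ** power."""
    return mu ** power * dispersion / sample_weight


def tweedie_variance_derivative(mu, power, dispersion=1.0, sample_weight=1.0):
    """Derivative v'(mu) * phi / w with v'(mu) = power * mu ** (power - 1)."""
    return power * mu ** (power - 1) * dispersion / sample_weight


mu, power, phi, w = 2.0, 1.5, 0.5, 4.0
h = 1e-6
# Symmetric finite difference of the variance in mu.
numeric = (
    tweedie_variance(mu + h, power, phi, w)
    - tweedie_variance(mu - h, power, phi, w)
) / (2.0 * h)
analytic = tweedie_variance_derivative(mu, power, phi, w)
print(abs(numeric - analytic) < 1e-6)
```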
- class glum.TweedieLink(power)
Bases: Link
The Tweedie link function x^(1-p) if p≠1 and log(x) if p=1.
See the documentation of the superclass, Link, for details.
- derivative(mu)
Compute the derivative of the link function.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- inverse(**kwargs)
Compute the inverse link function.
The inverse link function h gives the inverse relationship between the linear predictor, X * w, and the mean, mu ≡ E(Y), so that h(X * w) = mu.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative(**kwargs)
Compute the derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- inverse_derivative2(**kwargs)
Compute second derivative of the inverse link function.
- Parameters:
lin_pred (array-like, shape (n_samples,)) – Usually the (fitted) linear predictor.
- link(mu)
Compute the link function.
The link function g links the mean, mu ≡ E(Y), to the linear predictor, X * w, so that g(mu) is equal to the linear predictor.
- Parameters:
mu (array-like, shape (n_samples,)) – Usually the (predicted) mean.
- to_tweedie(safe=True)
Return the Tweedie representation of a link function if it exists.
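The definition of the Tweedie link, x^(1-p) if p≠1 and log(x) if p=1, determines its inverse and derivative directly. The following scalar sketch writes these out and checks that the inverse undoes the link. The function names are assumptions made for illustration and do not correspond to glum methods, which operate on arrays.

```python
import math


def tweedie_link(mu, p):
    """g(mu) = log(mu) if p == 1, else mu ** (1 - p)."""
    return math.log(mu) if p == 1 else mu ** (1 - p)


def tweedie_link_derivative(mu, p):
    """g'(mu) = 1 / mu if p == 1, else (1 - p) * mu ** (-p)."""
    return 1.0 / mu if p == 1 else (1 - p) * mu ** (-p)


def tweedie_link_inverse(lin_pred, p):
    """h(eta) = exp(eta) if p == 1, else eta ** (1 / (1 - p))."""
    return math.exp(lin_pred) if p == 1 else lin_pred ** (1.0 / (1 - p))


# Round trip: the inverse link recovers the mean from the linear predictor.
for p in (0, 1, 1.5, 3):
    eta = tweedie_link(2.0, p)
    print(p, abs(tweedie_link_inverse(eta, p) - 2.0) < 1e-12)
```

Note that p = 0 reduces to the identity link, since mu ** (1 - 0) is mu itself.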
- glum.get_link(link, family)
For the Tweedie distribution, this code follows actuarial best practices regarding link functions. Note that these links are sometimes not canonical:
identity for normal (p = 0);
no convention for p < 0, so let’s leave it as identity;
log otherwise.
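The convention listed above amounts to a simple rule on the Tweedie power. As a minimal sketch, the hypothetical helper below returns a link name as a string. The real function takes a link and a family and presumably returns a Link object, so this only mirrors the decision rule and not its signature.

```python
def default_tweedie_link_name(power):
    """Return 'identity' for power <= 0 and 'log' otherwise (illustrative rule)."""
    return "identity" if power <= 0 else "log"


for power in (-1, 0, 1, 1.5, 2, 3):
    print(power, default_tweedie_link_name(power))
```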
- Parameters:
link (str | Link)
family (ExponentialDispersionModel)
- Return type: