glum package
The two main classes in glum
are GeneralizedLinearRegressor
and GeneralizedLinearRegressorCV
. Most users will use fit()
and predict()
- class glum.GeneralizedLinearRegressor(alpha=None, l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, alpha_search=False, alphas=None, n_alphas=100, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=None, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, force_all_finite=True)
Bases:
glum._glm.GeneralizedLinearRegressorBase
Regression via a Generalized Linear Model (GLM) with penalties.
GLMs based on a reproductive Exponential Dispersion Model (EDM) aimed at fitting and predicting the mean of the target
y
asmu=h(X*w)
. Therefore, the fit minimizes the following objective function with combined L1 and L2 priors as regularizer:1/(2*sum(s)) * deviance(y, h(X*w); s) + alpha * l1_ratio * ||P1*w||_1 + 1/2 * alpha * (1 - l1_ratio) * w*P2*w
with inverse link function
h
ands=sample_weight
. Note that, forsample_weight=None
, one hass_i=1
andsum(s)=n_samples
. ForP1=P2='identity'
, the penalty is the elastic net:alpha * l1_ratio * ||w||_1 + 1/2 * alpha * (1 - l1_ratio) * ||w||_2^2.
If you are interested in controlling the L1 and L2 penalties separately, keep in mind that this is equivalent to:
a * L1 + b * L2,
where:
alpha = a + b and l1_ratio = a / (a + b).
The parameter
l1_ratio
corresponds to alpha in the R package glmnet, whilealpha
corresponds to the lambda parameter in glmnet. Specifically,l1_ratio = 1
is the lasso penalty.Read more in background.
- Parameters
alpha ({float, array-like}, optional (default=None)) – Constant that multiplies the penalty terms and thus determines the regularization strength. If
alpha_search
isFalse
(the default), thenalpha
must be a scalar or None (equivalent toalpha=1.0
). Ifalpha_search
isTrue
, thenalpha
must be an iterable orNone
. Seealpha_search
to find how the regularization path is set ifalpha
isNone
. See the notes for the exact mathematical meaning of this parameter.alpha = 0
is equivalent to unpenalized GLMs. In this case, the design matrixX
must have full column rank (no collinearities).l1_ratio (float, optional (default=0)) – The elastic net mixing parameter, with
0 <= l1_ratio <= 1
. Forl1_ratio = 0
, the penalty is an L2 penalty.For l1_ratio = 1
, it is an L1 penalty. For0 < l1_ratio < 1
, the penalty is a combination of L1 and L2.P1 ({'identity', array-like}, shape (n_features,), optional (default='identity')) – With this array, you can exclude coefficients from the L1 penalty. Set the corresponding value to 1 (include) or 0 (exclude). The default value
'identity'
is the same as a 1d array of ones. Note thatn_features = X.shape[1]
. IfX
is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.P2 ({'identity', array-like, sparse matrix}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty
w*P2*w
. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as diagonal (square) matrix. The default'identity'
sets the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1 and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. IfX
is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expandedX
matrix.fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (
X * coef + intercept
).family ({'normal', 'poisson', 'gamma', 'gaussian', 'inverse.gaussian', 'binomial'} or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. which distribution from the EDM, specifies the loss function to be minimized.
link ({'auto', 'identity', 'log', 'logit'} or Link, optional (default='auto')) –
The link function of the GLM, i.e. mapping from linear predictor (
X * coef
) to expectation (mu
). Option'auto'
sets the link depending on the chosen family as follows:'identity'
for family'normal'
/'gaussian'
'log'
for families'poisson'
,'gamma'
and'inverse.gaussian'
'logit'
for family'binomial'
solver ({'auto', 'irls-cd', 'irls-ls', 'lbfgs', 'trust-constr'}, optional (default='auto')) –
Algorithm to use in the optimization problem:
'auto'
:'irls-ls'
ifl1_ratio
is zero and'irls-cd'
otherwise.'irls-cd'
: Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in thefit
method,X
should be directly passed as a Fortran-contiguous Numpy array or sparse CSC matrix.'irls-ls'
: Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.'lbfgs'
: Scipy’s L-BFGS-B optimizer. It cannot deal with L1 penalties.'trust-constr'
: Callsscipy.optimize.minimize(method='trust-constr')
. It cannot deal with L1 penalties. This solver can optimize problems with inequality constraints, passed viaA_ineq
andb_ineq
. It will be selected automatically when inequality constraints are set andsolver='auto'
. Note that using this method can lead to significantly increased runtimes by a factor of ten or higher.
max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.
gradient_tol (float, optional (default=None)) –
Stopping criterion. If
None
, solver-specific defaults will be used. The default value for most solvers is1e-4
, except for'trust-constr'
, which requires more conservative convergence settings and has a default value of1e-8
.For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when
max{|g_i|, i = 1, ..., n} <= tol
, whereg_i
is thei
-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached whensum_i(|minimum norm of g_i|)
, whereg_i
is the subgradient of the objective and the minimum norm ofg_i
is the element of the subgradient with the smallest L2 norm.If you wish to only use a step-size tolerance, set
gradient_tol
to a very small number.step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than
step_size_tol
. This stopping criterion is disabled whenstep_size_tol
isNone
.hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.
warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to
fit
as initialization forcoef_
andintercept_
(supersedesstart_params
). IfFalse
or if the attributecoef_
does not exist (first call tofit
),start_params
sets the start values forcoef_
andintercept_
.alpha_search (bool, optional (default=False)) –
Whether to search along the regularization path for the best alpha. When set to
True
,alpha
should either beNone
or an iterable. To determine the regularization path, the following sequence is used:If
alpha
is an iterable, use it directly. All other parameters governing the regularization path are ignored.If
min_alpha
is set, create a path frommin_alpha
to the lowest alpha such that all coefficients are zero.If
min_alpha_ratio
is set, create a path where the ratio ofmin_alpha / max_alpha = min_alpha_ratio
.If none of the above parameters are set, use a
min_alpha_ratio
of1e-6
.
alphas (DEPRECATED. Use
alpha
instead.) –n_alphas (int, optional (default=100)) – Number of alphas along the regularization path
min_alpha_ratio (float, optional (default=None)) – Length of the path.
min_alpha_ratio=1e-6
means thatmin_alpha / max_alpha = 1e-6
. IfNone
,1e-6
is used.min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over
[max_alpha, min_alpha]
.start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if
warm_start
isFalse
or iffit
is called for the first time (so thatself.coef_
does not exist yet). IfNone
, all coefficients are set to zero and the start value for the intercept is the weighted average ofy
(Iffit_intercept
isTrue
). If an array, used directly as start values; iffit_intercept
isTrue
, its first element is assumed to be the start value for theintercept_
. Note thatn_features* = X.shape[1] + fit_intercept
, i.e. it includes the intercept.selection (str, optional (default='cyclic')) – For the CD solver ‘cd’, the coordinates (features) can be updated in either cyclic or random order. If set to
'random'
, a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially whengradient_tol
is higher than1e-4
.random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer,
random_state
is the seed used by the random number generator; if aRandomState
instance,random_state
is the random number generator; ifNone
, the random number generator is theRandomState
instance used bynp.random
. Used whenselection
is'random'
.copy_X (bool, optional (default=None)) – Whether to copy
X
. SinceX
is never modified byGeneralizedLinearRegressor
, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. IfFalse
,X
will not be copied and there will be an error if you pass anX
in the wrong format, such as providing integerX
and floaty
. IfNone
,X
will not be copied unless it is in the wrong format.check_input (bool, optional (default=True)) – Whether to bypass several checks on input:
y
values in range offamily
,sample_weight
non-negative,P2
positive semi-definite. Don’t use this parameter unless you know what you are doing.verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This features requires having the tqdm package installed. For the L-BFGS and
'trust-constr'
solvers, setverbose
to any positive number for verbosity.scale_predictors (bool, optional (default=False)) –
If
True
, estimate a scaled model where all predictors have a standard deviation of 1. This can result in better estimates if predictors are on very different scales (for example, centimeters and kilometers).Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if
scale_predictors
isTrue
.lower_bounds (array-like, shape (n_features,), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver (
'irls-cd'
).upper_bounds (array-like, shape=(n_features,), optional (default=None)) – See
lower_bounds
.A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form
A_ineq w <= b_ineq
. Setting inequality constraints forces the use of the local gradient-based solver'trust-constr'
, which may increase runtime siginifcantly. Note that the constraints only apply to coefficients related to features inX
. If you want to constrain the intercept, add it to the feature matrixX
manually and setfit_intercept==False
.b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form
A_ineq w <= b_ineq
. Refer to the documentation ofA_ineq
for details.force_all_finite (bool) –
- coef_
Estimated coefficients for the linear predictor (X*coef_+intercept_) in the GLM.
- Type
numpy.array, shape (n_features,)
- intercept_
Intercept (a.k.a. bias) added to linear predictor.
- Type
float
- n_iter_
Actual number of iterations used in solver.
- Type
int
Notes
The fit itself does not need outcomes to be from an EDM, but only assumes the first two moments to be \(\mu_i \equiv \mathrm{E}(y_i) = h(x_i' w)\) and \(\mathrm{var}(y_i) = (\phi / s_i) v(\mu_i)\). The unit variance function \(v(\mu_i)\) is a property of and given by the specific EDM; see background.
The parameters \(w\) (
coef_
andintercept_
) are estimated by minimizing the deviance plus penalty term, which is equivalent to (penalized) maximum likelihood estimation.For
alpha > 0
, the feature matrixX
should be standardized in order to penalize features equally strong. Callsklearn.preprocessing.StandardScaler
before callingfit
.If the target
y
is a ratio, appropriate sample weightss
should be provided. As an example, consider Poisson distributed countsz
(integers) and weightss = exposure
(time, money, persons years, …). Then you fity ≡ z/s
, i.e.GeneralizedLinearModel(family='poisson').fit(X, y, sample_weight=s)
. The weights are necessary for the right (finite sample) mean. Consider \(\bar{y} = \sum_i s_i y_i / \sum_i s_i\): in this case, one might say that \(y\) follows a ‘scaled’ Poisson distribution. The same holds for other distributions.References
- For the coordinate descent implementation:
Guo-Xun Yuan, Chia-Hua Ho, Chih-Jen Lin An Improved GLMNET for L1-regularized Logistic Regression, Journal of Machine Learning Research 13 (2012) 1999-2030 https://www.csie.ntu.edu.tw/~cjlin/papers/l1_glmnet/long-glmnet.pdf
- aic(X, y, sample_weight=None)
Akaike’s information criteria. Computed as: \(-2\log\hat{\mathcal{L}} + 2\hat{k}\) where \(\hat{\mathcal{L}}\) is the maximum likelihood estimate of the model, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
- aicc(X, y, sample_weight=None)
Second-order Akaike’s information criteria (or small sample AIC). Computed as: \(-2\log\hat{\mathcal{L}} + 2\hat{k} + \frac{2k(k+1)}{n-k-1}\) where \(\hat{\mathcal{L}}\) is the maximum likelihood estimate of the model, \(n\) is the number of training instances, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
- bic(X, y, sample_weight=None)
Bayesian information criterion. Computed as: \(-2\log\hat{\mathcal{L}} + k\log(n)\) where \(\hat{\mathcal{L}}\) is the maximum likelihood estimate of the model, \(n\) is the number of training instances, and \(\hat{k}\) is the effective number of parameters. See _compute_information_criteria for more information on the computation of \(\hat{k}\).
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Same data as used in ‘fit’
y (array-like, shape (n_samples,)) – Same data as used in ‘fit’
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Same data as used in ‘fit’
- covariance_matrix(X, y, mu=None, offset=None, sample_weight=None, dispersion=None, robust=True, clusters=None, expected_information=False)
Calculate the covariance matrix for generalized linear models.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data.
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=True) – Whether to compute robust standard errors instead of normal ones.
clusters (array-like, optional, default=None) – Array with clusters membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=False) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors.
Notes
We support three types of covariance matrices:
non-robust
robust (HC-1)
clustered
For maximum-likelihood estimator, the covariance matrix takes the form \(\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)\) where \(\mathcal{H}^{-1}\) is the inverse Hessian and \(\mathcal{I}\) is the Information matrix. The different types of covariance matrices use different approximation of these quantities.
The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.
The robust (HC-1) covariance matrix takes the form \(\mathbf{H}^{−1} (\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{−1}(\hat{\theta})\) where \(\mathbf{H}\) is the empirical Hessian and \(\mathbf{G}\) is the gradient. We apply a finite-sample correction of \(\frac{N}{N-p}\).
The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using \(\mathbf{G}^{T}(\hat{\theta} \mathbf{G}(\hat{\theta})\) directly, we first sum over all the groups first. The finite-sample correction is affected as well, becoming \(\frac{M}{M-1} \frac{N}{N-p}\) where \(M\) is the number of groups.
References
- property family_instance: glum._distribution.ExponentialDispersionModel
Return an
ExponentialDispersionModel
.
- fit(X, y, sample_weight=None, offset=None, weights_sum=None)
Fit a Generalized Linear Model.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a
float32
matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.y (array-like, shape (n_samples,)) – Target values.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has \(\mathrm{var}(y_i) = \phi \times v(mu) / w_i\). If \(y_i \sim EDM(\mu, \phi / w_i)\), then \(\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)\), i.e. the mean of \(y\) is a weighted average with weights equal to
sample_weight
.offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected
y
by 3 if the link is linear and will multiply expectedy
by 3 if the link is logarithmic.weights_sum (float, optional (default=None)) –
- Return type
self
- get_formatted_diagnostics(full_report=False, custom_columns=None)
Get formatted diagnostics; can be printed with _report_diagnostics.
- Parameters
full_report (bool, optional (default=False)) – Print all available information. When
False
andcustom_columns
isNone
, a restricted set of columns is printed out.custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type
Union[str, pandas.core.frame.DataFrame]
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- linear_predictor(X, offset=None, alpha_index=None, alpha=None)
Compute the linear predictor,
X * coef_ + intercept_
.If
alpha_search
isTrue
, butalpha_index
andalpha
are bothNone
, we use the last alpha valueself._alphas[-1]
.- Parameters
X (array-like, shape (n_samples, n_features)) – Observations.
X
may be a pandas data frame with categorical types. IfX
was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will benumpy.nan
.offset (array-like, shape (n_samples,), optional (default=None)) –
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha
(see below).alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha_index
(see above).
- Returns
The linear predictor.
- Return type
array, shape (n_samples, n_alphas)
- property link_instance: glum._link.Link
Return a
Link
.
- predict(X, sample_weight=None, offset=None, alpha_index=None, alpha=None)
Predict using GLM with feature matrix
X
.If
alpha_search
isTrue
, butalpha_index
andalpha
are bothNone
, we use the last alpha valueself._alphas[-1]
.- Parameters
X (array-like, shape (n_samples, n_features)) – Observations.
X
may be a pandas data frame with categorical types. IfX
was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will benumpy.nan
.sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.
offset (array-like, shape (n_samples,), optional (default=None)) –
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha
(see below).alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha_index
(see above).
- Returns
Predicted values times
sample_weight
.- Return type
array, shape (n_samples, n_alphas)
- report_diagnostics(full_report=False, custom_columns=None)
Print diagnostics to
stdout
.- Parameters
full_report (bool, optional (default=False)) – Print all available information. When
False
andcustom_columns
isNone
, a restricted set of columns is printed out.custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type
None
- score(X, y, sample_weight=None, offset=None)
Compute \(D^2\), the percentage of deviance explained.
\(D^2\) is a generalization of the coefficient of determination \(R^2\). The \(R^2\) uses the squared error and the \(D^2\), the deviance. Note that those two are equal for
family='normal'
.\(D^2\) is defined as \(D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})}{D_{\mathrm{null}}}\), \(D_{\mathrm{null}}\) is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one and it can be negative.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.
y (array-like, shape (n_samples,)) – True values of target.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.
offset (array-like, shape (n_samples,), optional (default=None)) –
- Returns
D^2 of self.predict(X) w.r.t. y.
- Return type
float
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- std_errors(X, y, mu=None, offset=None, sample_weight=None, dispersion=None, robust=True, clusters=None, expected_information=False)
Calculate standard errors for generalized linear models.
See covariance_matrix for an in-depth explanation of how the standard errors are computed.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data.
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=True) – Whether to compute robust standard errors instead of normal ones.
clusters (array-like, optional, default=None) – Array with clusters membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=False) – Whether to use the expected or observed information matrix. Only relevant when computing robust std-errors.
- class glum.GeneralizedLinearRegressorCV(l1_ratio=0, P1='identity', P2='identity', fit_intercept=True, family='normal', link='auto', solver='auto', max_iter=100, gradient_tol=None, step_size_tol=None, hessian_approx=0.0, warm_start=False, n_alphas=100, alphas=None, min_alpha_ratio=None, min_alpha=None, start_params=None, selection='cyclic', random_state=None, copy_X=True, check_input=True, verbose=0, scale_predictors=False, lower_bounds=None, upper_bounds=None, A_ineq=None, b_ineq=None, force_all_finite=True, cv=None, n_jobs=None)
Bases:
glum._glm.GeneralizedLinearRegressorBase
Generalized linear model with iterative fitting along a regularization path.
The best model is selected by cross-validation.
Cross-validated regression via a Generalized Linear Model (GLM) with penalties. For more on GLMs and on these parameters, see the documentation for
GeneralizedLinearRegressor
. CV conventions followsklearn.linear_model.LassoCV
.- Parameters
l1_ratio (float or array of floats, optional (default=0)) – If you pass
l1_ratio
as an array, thefit
method will choose the best value ofl1_ratio
and store it asself.l1_ratio
.P1 ({'identity', array-like}, shape (n_features,), optional (default='identity')) – With this array, you can exclude coefficients from the L1 penalty. Set the corresponding value to 1 (include) or 0 (exclude). The default value
'identity'
is the same as a 1d array of ones. Note thatn_features = X.shape[1]
. IfX
is a pandas DataFrame with a categorical dtype and P1 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical.P2 ({'identity', array-like, sparse matrix}, shape (n_features,) or (n_features, n_features), optional (default='identity')) – With this option, you can set the P2 matrix in the L2 penalty
w*P2*w
. This gives a fine control over this penalty (Tikhonov regularization). A 2d array is directly used as the square matrix P2. A 1d array is interpreted as diagonal (square) matrix. The default'identity'
sets the identity matrix, which gives the usual squared L2-norm. If you just want to exclude certain coefficients, pass a 1d array filled with 1 and 0 for the coefficients to be excluded. Note that P2 must be positive semi-definite. IfX
is a pandas DataFrame with a categorical dtype and P2 has the same size as the number of columns, the penalty of the categorical column will be applied to all the levels of the categorical. Note that if P2 is two-dimensional, its size needs to be of the same length as the expandedX
matrix.fit_intercept (bool, optional (default=True)) – Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (
X * coef + intercept
).family ({'normal', 'poisson', 'gamma', 'inverse.gaussian', 'binomial'} or ExponentialDispersionModel, optional (default='normal')) – The distributional assumption of the GLM, i.e. which distribution from the EDM, specifies the loss function to be minimized.
link ({'auto', 'identity', 'log', 'logit'} or Link, optional (default='auto')) –
The link function of the GLM, i.e. mapping from linear predictor (
X * coef
) to expectation (mu
). Option'auto'
sets the link depending on the chosen family as follows:'identity'
for family'normal'
'log'
for families'poisson'
,'gamma'
and'inverse.gaussian'
'logit'
for family'binomial'
solver ({'auto', 'irls-cd', 'irls-ls', 'lbfgs'}, optional (default='auto')) –
Algorithm to use in the optimization problem:
'auto'
:'irls-ls'
ifl1_ratio
is zero and'irls-cd'
otherwise.'irls-cd'
: Iteratively reweighted least squares with a coordinate descent inner solver. This can deal with L1 as well as L2 penalties. Note that in order to avoid unnecessary memory duplication of X in thefit
method,X
should be directly passed as a Fortran-contiguous Numpy array or sparse CSC matrix.'irls-ls'
: Iteratively reweighted least squares with a least squares inner solver. This algorithm cannot deal with L1 penalties.'lbfgs'
: Scipy’s L-BFGS-B optimizer. It cannot deal with L1 penalties.
max_iter (int, optional (default=100)) – The maximal number of iterations for solver algorithms.
gradient_tol (float, optional (default=None)) –
Stopping criterion. If
None
, solver-specific defaults will be used. The default value for most solvers is1e-4
, except for'trust-constr'
, which requires more conservative convergence settings and has a default value of1e-8
.For the IRLS-LS, L-BFGS and trust-constr solvers, the iteration will stop when
max{|g_i|, i = 1, ..., n} <= tol
, whereg_i
is thei
-th component of the gradient (derivative) of the objective function. For the CD solver, convergence is reached whensum_i(|minimum norm of g_i|)
, whereg_i
is the subgradient of the objective and the minimum norm ofg_i
is the element of the subgradient with the smallest L2 norm.If you wish to only use a step-size tolerance, set
gradient_tol
to a very small number.step_size_tol (float, optional (default=None)) – Alternative stopping criterion. For the IRLS-LS and IRLS-CD solvers, the iteration will stop when the L2 norm of the step size is less than
step_size_tol
. This stopping criterion is disabled whenstep_size_tol
isNone
.hessian_approx (float, optional (default=0.0)) – The threshold below which data matrix rows will be ignored for updating the Hessian. See the algorithm documentation for the IRLS algorithm for further details.
warm_start (bool, optional (default=False)) – Whether to reuse the solution of the previous call to
fit
as initialization forcoef_
andintercept_
(supersedesstart_params
). IfFalse
or if the attributecoef_
does not exist (first call tofit
),start_params
sets the start values forcoef_
andintercept_
.n_alphas (int, optional (default=100)) – Number of alphas along the regularization path
alphas (array-like, optional (default=None)) – List of alphas for which to compute the models. If
None
, the alphas are set automatically. SettingNone
is preferred.min_alpha_ratio (float, optional (default=None)) – Length of the path.
min_alpha_ratio=1e-6
means thatmin_alpha / max_alpha = 1e-6
. IfNone
,1e-6
is used.min_alpha (float, optional (default=None)) – Minimum alpha to estimate the model with. The grid will then be created over
[max_alpha, min_alpha]
.start_params (array-like, shape (n_features*,), optional (default=None)) – Relevant only if
warm_start
isFalse
or iffit
is called for the first time (so thatself.coef_
does not exist yet). IfNone
, all coefficients are set to zero and the start value for the intercept is the weighted average ofy
(Iffit_intercept
isTrue
). If an array, used directly as start values; iffit_intercept
isTrue
, its first element is assumed to be the start value for theintercept_
. Note thatn_features* = X.shape[1] + fit_intercept
, i.e. it includes the intercept.selection (str, optional (default='cyclic')) – For the CD solver ‘cd’, the coordinates (features) can be updated in either cyclic or random order. If set to
'random'
, a random coefficient is updated every iteration rather than looping over features sequentially in the same order, which often leads to significantly faster convergence, especially whengradient_tol
is higher than1e-4
.random_state (int or RandomState, optional (default=None)) – The seed of the pseudo random number generator that selects a random feature to be updated for the CD solver. If an integer,
random_state
is the seed used by the random number generator; if aRandomState
instance,random_state
is the random number generator; ifNone
, the random number generator is theRandomState
instance used bynp.random
. Used whenselection
is'random'
.copy_X (bool, optional (default=None)) – Whether to copy
X
. SinceX
is never modified byGeneralizedLinearRegressor
, this is unlikely to be needed; this option exists mainly for compatibility with other scikit-learn estimators. IfFalse
,X
will not be copied and there will be an error if you pass anX
in the wrong format, such as providing integerX
and floaty
. IfNone
,X
will not be copied unless it is in the wrong format.check_input (bool, optional (default=True)) – Whether to bypass several checks on input:
y
values in range offamily
,sample_weight
non-negative,P2
positive semi-definite. Don’t use this parameter unless you know what you are doing.verbose (int, optional (default=0)) – For the IRLS solver, any positive number will result in a pretty progress bar showing convergence. This features requires having the tqdm package installed. For the L-BFGS solver, set
verbose
to any positive number for verbosity.scale_predictors (bool, optional (default=False)) –
If
True
, estimate a scaled model where all predictors have a standard deviation of 1. This can result in better estimates if predictors are on very different scales (for example, centimeters and kilometers).Advanced developer note: Internally, predictors are always rescaled for computational reasons, but this only affects results if
scale_predictors
isTrue
.lower_bounds (array-like, shape (n_features), optional (default=None)) – Set a lower bound for the coefficients. Setting bounds forces the use of the coordinate descent solver (
'irls-cd'
).upper_bounds (array-like, shape=(n_features), optional (default=None)) – See
lower_bounds
.A_ineq (array-like, shape=(n_constraints, n_features), optional (default=None)) – Constraint matrix for linear inequality constraints of the form
A_ineq w <= b_ineq
.b_ineq (array-like, shape=(n_constraints,), optional (default=None)) – Constraint vector for linear inequality constraints of the form
A_ineq w <= b_ineq
.cv (int, cross-validation generator or Iterable, optional (default=None)) –
Determines the cross-validation splitting strategy. One of:
None
, to use the default 5-fold cross-validation,int
, to specify the number of folds.Iterable
yielding (train, test) splits as arrays of indices.
For integer/
None
inputs,KFold
is usedn_jobs (int, optional (default=None)) – The maximum number of concurrently running jobs. The number of jobs that are needed is
len(l1_ratio)
xn_folds
.-1
is the same as the number of CPU on your machine.None
means1
unless in ajoblib.parallel_backend
context.force_all_finite (bool) –
- alpha_
The amount of regularization chosen by cross validation.
- Type
float
- alphas_
Alphas used by the model.
- Type
array, shape (n_l1_ratios, n_alphas)
- l1_ratio_
The compromise between L1 and L2 regularization chosen by cross validation.
- Type
float
- coef_
Estimated coefficients for the linear predictor in the GLM at the optimal (
l1_ratio_
,alpha_
).- Type
array, shape (n_features,)
- intercept_
Intercept (a.k.a. bias) added to linear predictor.
- Type
float
- n_iter_
The number of iterations run by the CD solver to reach the specified tolerance for the optimal alpha.
- Type
int
- coef_path_
Estimated coefficients for the linear predictor in the GLM at every point along the regularization path.
- Type
array, shape (n_folds, n_l1_ratios, n_alphas, n_features)
- deviance_path_
Deviance for the test set on each fold, varying alpha.
- Type
array, shape(n_folds, n_alphas)
- covariance_matrix(X, y, mu=None, offset=None, sample_weight=None, dispersion=None, robust=True, clusters=None, expected_information=False)
Calculate the covariance matrix for generalized linear models.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data.
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=True) – Whether to compute robust standard errors instead of normal ones.
clusters (array-like, optional, default=None) – Array with clusters membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=False) – Whether to use the expected or observed information matrix. Only relevant when computing robust standard errors.
Notes
We support three types of covariance matrices:
non-robust
robust (HC-1)
clustered
For maximum-likelihood estimator, the covariance matrix takes the form \(\mathcal{H}^{-1}(\theta_0)\mathcal{I}(\theta_0) \mathcal{H}^{-1}(\theta_0)\) where \(\mathcal{H}^{-1}\) is the inverse Hessian and \(\mathcal{I}\) is the Information matrix. The different types of covariance matrices use different approximation of these quantities.
The non-robust covariance matrix is computed as the inverse of the Fisher information matrix. This assumes that the information matrix equality holds.
The robust (HC-1) covariance matrix takes the form \(\mathbf{H}^{−1} (\hat{\theta})\mathbf{G}^{T}(\hat{\theta})\mathbf{G}(\hat{\theta}) \mathbf{H}^{−1}(\hat{\theta})\) where \(\mathbf{H}\) is the empirical Hessian and \(\mathbf{G}\) is the gradient. We apply a finite-sample correction of \(\frac{N}{N-p}\).
The clustered covariance matrix uses a similar approach to the robust (HC-1) covariance matrix. However, instead of using \(\mathbf{G}^{T}(\hat{\theta} \mathbf{G}(\hat{\theta})\) directly, we first sum over all the groups first. The finite-sample correction is affected as well, becoming \(\frac{M}{M-1} \frac{N}{N-p}\) where \(M\) is the number of groups.
References
- property family_instance: glum._distribution.ExponentialDispersionModel
Return an
ExponentialDispersionModel
.
- fit(X, y, sample_weight=None, offset=None)
Choose the best model along a ‘regularization path’ by cross-validation.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data. Note that a
float32
matrix is acceptable and will result in the entire algorithm being run in 32-bit precision. However, for problems that are poorly conditioned, this might result in poor convergence or flawed parameter estimates. If a Pandas data frame is provided, it may contain categorical columns. In that case, a separate coefficient will be estimated for each category. No category is omitted. This means that some regularization is required to fit models with an intercept or models with several categorical columns.y (array-like, shape (n_samples,)) – Target values.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights w_i for each sample. Note that, for an Exponential Dispersion Model (EDM), one has \(\mathrm{var}(y_i) = \phi \times v(mu) / w_i\). If \(y_i \sim EDM(\mu, \phi / w_i)\), then \(\sum w_i y_i / \sum w_i \sim EDM(\mu, \phi / \sum w_i)\), i.e. the mean of \(y\) is a weighted average with weights equal to
sample_weight
.offset (array-like, shape (n_samples,), optional (default=None)) – Added to linear predictor. An offset of 3 will increase expected
y
by 3 if the link is linear and will multiply expectedy
by 3 if the link is logarithmic.
- get_formatted_diagnostics(full_report=False, custom_columns=None)
Get formatted diagnostics; can be printed with _report_diagnostics.
- Parameters
full_report (bool, optional (default=False)) – Print all available information. When
False
andcustom_columns
isNone
, a restricted set of columns is printed out.custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type
Union[str, pandas.core.frame.DataFrame]
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- linear_predictor(X, offset=None, alpha_index=None, alpha=None)
Compute the linear predictor,
X * coef_ + intercept_
.If
alpha_search
isTrue
, butalpha_index
andalpha
are bothNone
, we use the last alpha valueself._alphas[-1]
.- Parameters
X (array-like, shape (n_samples, n_features)) – Observations.
X
may be a pandas data frame with categorical types. IfX
was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will benumpy.nan
.offset (array-like, shape (n_samples,), optional (default=None)) –
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha
(see below).alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha_index
(see above).
- Returns
The linear predictor.
- Return type
array, shape (n_samples, n_alphas)
- property link_instance: glum._link.Link
Return a
Link
.
- predict(X, sample_weight=None, offset=None, alpha_index=None, alpha=None)
Predict using GLM with feature matrix
X
.If
alpha_search
isTrue
, butalpha_index
andalpha
are bothNone
, we use the last alpha valueself._alphas[-1]
.- Parameters
X (array-like, shape (n_samples, n_features)) – Observations.
X
may be a pandas data frame with categorical types. IfX
was also a data frame with categorical types during fitting and a category wasn’t observed at that point, the corresponding prediction will benumpy.nan
.sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights to multiply predictions by.
offset (array-like, shape (n_samples,), optional (default=None)) –
alpha_index (int or list[int], optional (default=None)) – Sets the index of the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha
(see below).alpha (float or list[float], optional (default=None)) – Sets the alpha(s) to use in case
alpha_search
isTrue
. Incompatible withalpha_index
(see above).
- Returns
Predicted values times
sample_weight
.- Return type
array, shape (n_samples, n_alphas)
- report_diagnostics(full_report=False, custom_columns=None)
Print diagnostics to
stdout
.- Parameters
full_report (bool, optional (default=False)) – Print all available information. When
False
andcustom_columns
isNone
, a restricted set of columns is printed out.custom_columns (iterable, optional (default=None)) – Print only the specified columns.
- Return type
None
- score(X, y, sample_weight=None, offset=None)
Compute \(D^2\), the percentage of deviance explained.
\(D^2\) is a generalization of the coefficient of determination \(R^2\). The \(R^2\) uses the squared error and the \(D^2\), the deviance. Note that those two are equal for
family='normal'
.\(D^2\) is defined as \(D^2 = 1 - \frac{D(y_{\mathrm{true}}, y_{\mathrm{pred}})}{D_{\mathrm{null}}}\), \(D_{\mathrm{null}}\) is the null deviance, i.e. the deviance of a model with intercept alone. The best possible score is one and it can be negative.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Test samples.
y (array-like, shape (n_samples,)) – True values of target.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Sample weights.
offset (array-like, shape (n_samples,), optional (default=None)) –
- Returns
D^2 of self.predict(X) w.r.t. y.
- Return type
float
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- std_errors(X, y, mu=None, offset=None, sample_weight=None, dispersion=None, robust=True, clusters=None, expected_information=False)
Calculate standard errors for generalized linear models.
See covariance_matrix for an in-depth explanation of how the standard errors are computed.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training data.
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, optional, default=None) – Array with predictions. Estimated if absent.
offset (array-like, optional, default=None) – Array with additive offsets.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Individual weights for each sample.
dispersion (float, optional, default=None) – The dispersion parameter. Estimated if absent.
robust (boolean, optional, default=True) – Whether to compute robust standard errors instead of normal ones.
clusters (array-like, optional, default=None) – Array with clusters membership. Clustered standard errors are computed if clusters is not None.
expected_information (boolean, optional, default=False) – Whether to use the expected or observed information matrix. Only relevant when computing robust std-errors.
- class glum.TweedieDistribution(power=0)
Bases:
glum._distribution.ExponentialDispersionModel
A class for the Tweedie distribution.
A Tweedie distribution with mean \(\mu = \mathrm{E}(Y)\) is uniquely defined by its mean-variance relationship \(\mathrm{var}(Y) \propto \mu^{\mathrm{power}}\).
Special cases are:
Power
Distribution
0
Normal
1
Poisson
(1, 2)
Compound Poisson
2
Gamma
3
Inverse Gaussian
- Parameters
power (float, optional (default=0)) – The variance power of the unit_variance \(v(\mu) = \mu^{\mathrm{power}}\). For \(0 < \mathrm{power} < 1\), no distribution exists.
- deviance(y, mu, sample_weight=None)
Compute the deviance.
- Parameters
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
- Return type
float
- deviance_derivative(y, mu, sample_weight=1)
Compute the derivative of the deviance with respect to
mu
.- Parameters
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,) (default=1)) – Weights or exposure to which variance is inverse proportional.
- Return type
array-like, shape (n_samples,)
- dispersion(y, mu, sample_weight=None, ddof=1, method='pearson')
Estimate the dispersion parameter \(\phi\).
- Parameters
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=None)) – Weights or exposure to which variance is inversely proportional.
ddof (int, optional (default=1)) – Degrees of freedom consumed by the model for
mu
.{'pearson' (method =) – Whether to base the estimate on the Pearson residuals or the deviance.
'deviance'} – Whether to base the estimate on the Pearson residuals or the deviance.
(default='pearson') (optional) – Whether to base the estimate on the Pearson residuals or the deviance.
- Return type
float
- eta_mu_deviance(link, factor, cur_eta, X_dot_d, y, sample_weight)
Compute
eta
,mu
and the deviance.Compute: * the linear predictor,
eta
, ascur_eta + factor * X_dot_d
; * the link-function-transformed prediction,mu
; * the deviance.- Returns
numpy.ndarray, shape (X.shape[0],) – The linear predictor,
eta
.numpy.ndarray, shape (X.shape[0],) – The link-function-transformed prediction,
mu
.float – The deviance.
- Parameters
link (glum._link.Link) –
factor (float) –
cur_eta (numpy.ndarray) –
X_dot_d (numpy.ndarray) –
y (numpy.ndarray) –
sample_weight (numpy.ndarray) –
- in_y_range(x)
Return
True
ifx
is in the valid range of the EDM.- Parameters
x (array-like, shape (n_samples,)) – Target values.
- Return type
np.ndarray
- property include_lower_bound: bool
Return whether
lower_bound
is allowed as a value ofy
.
- log_likelihood(y, mu, sample_weight=None, dispersion=None)
Compute the log likelihood.
For
1 < power < 2
, we use the series approximation by Dunn and Smyth (2005) to compute the normalization term.- Parameters
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Sample weights.
dispersion (float, optional (default=None)) – Dispersion parameter \(\phi\). Estimated if
None
.
- Return type
float
- property lower_bound: Union[float, int]
Return the lowest value of
y
allowed.
- property power: float
Return the Tweedie power parameter.
- rowwise_gradient_hessian(link, coef, dispersion, X, y, sample_weight, eta, mu, offset=None)
Compute the gradient and negative Hessian of the log likelihood row-wise.
- Returns
numpy.ndarray, shape (X.shape[0],) – The gradient of the log likelihood, row-wise.
numpy.ndarray, shape (X.shape[0],) – The negative Hessian of the log likelihood, row-wise.
- Parameters
link (glum._link.Link) –
coef (numpy.ndarray) –
X (Union[tabmat.matrix_base.MatrixBase, tabmat.standardized_mat.StandardizedMatrix]) –
y (numpy.ndarray) –
sample_weight (numpy.ndarray) –
eta (numpy.ndarray) –
mu (numpy.ndarray) –
offset (Optional[numpy.ndarray]) –
- unit_deviance(y, mu)
Get the deviance of each observation.
- unit_deviance_derivative(y, mu)
Compute the derivative of the unit deviance with respect to
mu
.The derivative of the unit deviance is given by \(-2 \times (y - \mu) / v(\mu)\), where \(v(\mu)\) is the unit variance.
- Parameters
y (array-like, shape (n_samples,)) – Target values.
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type
array-like, shape (n_samples,)
- unit_variance(mu)
Compute the unit variance of a Tweedie distribution
v(mu) = mu^power
.- Parameters
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type
numpy.ndarray, shape (n_samples,)
- unit_variance_derivative(mu)
Compute the derivative of the unit variance of a Tweedie distribution.
Equation: \(v(\mu) = p \times \mu^{(p-1)}\).
- Parameters
mu (array-like, shape (n_samples,)) – Predicted mean.
- Return type
numpy.ndarray, shape (n_samples,)
- variance(mu, dispersion=1, sample_weight=1)
Compute the variance function.
The variance of \(Y_i \sim \mathrm{EDM}(\mu_i, \phi / s_i)\) is \(\mathrm{var}(Y_i) = (\phi / s_i) * v(\mu_i)\), with unit variance \(v(\mu)\) and weights \(s_i\).
- Parameters
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.
- Return type
array-like, shape (n_samples,)
- variance_derivative(mu, dispersion=1, sample_weight=1)
Compute the derivative of the variance with respect to
mu
.The derivative of the variance is equal to \((\phi / s_i) * v'(\mu_i)\), where \(v(\mu)\) is the unit variance and \(s_i\) are weights.
- Parameters
mu (array-like, shape (n_samples,)) – Predicted mean.
dispersion (float, optional (default=1)) – Dispersion parameter \(\phi\).
sample_weight (array-like, shape (n_samples,), optional (default=1)) – Weights or exposure to which variance is inverse proportional.
- Return type
array-like, shape (n_samples,)