Add bivariate copula implementation #137
chrischu12 wants to merge 98 commits into simon-hirsch:main
Conversation
simon-hirsch left a comment:
I am missing a base class for the copulas, because right now all distributions need to implement the param_link parameter themselves. This should be solved by a CopulaMixin class that implements the necessary methods.
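A minimal sketch of what such a mixin could look like. The names CopulaMixin, param_links, and param_link_inverse are taken from this discussion; everything else (the attribute layout, the tanh squashing, the fallback behavior) is a hypothetical illustration, not the library's API:

```python
import numpy as np

class CopulaMixin:
    """Hypothetical mixin so copula distributions share the parameter-link
    plumbing instead of each distribution re-implementing param_link."""

    # Subclasses provide a mapping {param_index: link object}; an empty
    # mapping falls back to the identity on the squashed scale.
    param_links = {}

    def param_link_inverse(self, eta, param=0):
        # Squash the linear predictor into (-1, 1) so the copula-specific
        # inverse link always receives a valid correlation-like value.
        squashed = np.tanh(np.asarray(eta, dtype=float) / 2) * (1 - 1e-8)
        link = self.param_links.get(param)
        return link.inverse(squashed) if link is not None else squashed
```

With this in place, distributions such as BivariateCopulaNormal would inherit the parameter-link handling instead of redefining it.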
Pull Request Overview
This PR adds support for bivariate copulas by introducing new link functions and multivariate copula distributions, and integrates them into the online multivariate distribution regression estimator.
- Introduce FisherZLink and TauToPar in copulalinks.py
- Extend base Distribution to accept param_links and update existing distributions
- Add BiCopNormal and MarginalCopula classes for Gaussian copula modeling
- Integrate copula-based parameter handling into online_mvdistreg
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ondil/link/copulalinks.py | New Fisher Z and Kendall-tau link functions |
| src/ondil/link/__init__.py | Export copula link functions |
| src/ondil/base/distribution.py | Add support for param_links in base Distribution |
| src/ondil/distributions/normal.py | Update constructor signature for param_links |
| src/ondil/distributions/mv_t_low_rank.py | Pass param_links through to superclass |
| src/ondil/distributions/mv_normal_chol.py | Pass param_links through to superclass |
| src/ondil/distributions/mv_marg_cop.py | New MarginalCopula distribution |
| src/ondil/distributions/bicop_normal.py | New BiCopNormal Gaussian copula distribution |
| src/ondil/distributions/__init__.py | Register new copula distributions |
| src/ondil/estimators/online_mvdistreg.py | Integrate copula option and branching logic into fitting routine |
Comments suppressed due to low confidence (3)
src/ondil/distributions/normal.py:47
- The param_links parameter should be typed as dict[int, LinkFunction] instead of LinkFunction, and default to an empty dict. Update the annotation to param_links: dict[int, LinkFunction] = {}.

  param_links: LinkFunction = {},
src/ondil/distributions/bicop_normal.py:33
- The constructor uses a singular param_link but the base class expects param_links: dict[int, LinkFunction]. Consider renaming to param_links and accepting a mapping keyed by parameter index.

  param_link: LinkFunction = TauToPar(),
src/ondil/estimators/online_mvdistreg.py:690
- The code previously scaled X and stored it in X_scaled but now passes the raw X to _outer_fit, bypassing the scaling step. It should likely be X_scaled here.

  self._outer_fit(X=X, y=y, theta=theta)
Force-pushed from cc8a021 to 48f3249
Force-pushed from 63c30eb to f5a3724
@chrischu12 I have changed the base branch as #134 is merged now.
Force-pushed from c403d59 to ac94ff5
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
Force-pushed from f958ffd to b5207eb
# If no regularization is allowed, we just return the theta
return theta

def _make_initial_eta(self, theta: np.ndarray):
Should this be on the copula class?
theta[a] = self.distribution.set_initial_guess(y, theta[a], p)
theta = self._handle_path_regularization(theta=theta, p=p, a=a)
# if (inner_iteration == 0) and (outer_iteration == 0) & (a == 0):
#     theta[a] = self.distribution.set_initial_guess(y, theta[a], p)
This should not be commented out, as it is necessary for the multivariate case. Properly switch between the copula and the multivariate case in the fitting.
dl2_link = self.distribution.cube_to_flat(dl2_link, param=p)
dl2_link = dl2_link[:, k]
else:
    eta = self._make_initial_eta(theta)

# If the likelihood is at some point decreasing, we're breaking
# Hence we need to store previous iteration values:
if (inner_iteration > 0) | (outer_iteration > 0):
if (inner_iteration == 0) and (outer_iteration == 0):

    eta_elem, param=p, k=k, d=self.dim_
)

if issubclass(self.distribution.__class__, CopulaMixin):
This block should be an eta_to_theta_element method on the copula class; then we don't need to import and check for specific copulas in the main estimator class.
    param=p,
).reshape(-1, 1)

if isinstance(
Same for this block: it should also be a function on the copula class (the same eta_to_theta_element as above).
old_likelihood = self._current_likelihood[a]

# If the LL is decreasing, we're resetting to the previous iteration
if ((outer_iteration > 0) | (inner_iteration > 1)) & decreasing:
if issubclass(self.distribution.__class__, CopulaMixin) and p == 0:
    out[a][p] = np.tanh(out[a][p] / 2) * (1 - 1e-8)
    out[a][p] = self.distribution.param_link_inverse(
        out[a][p] * (1 - 1e-8), param=0
    ) * (1 - 1e-8)
This pattern is duplicated a few times across the code. Can this be moved into param_link_inverse for the copula?
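One hedged way to do the consolidation the comment asks for. This assumes the copula keeps its links in self.links and that the 1 - 1e-8 shrinkage is intentional; BivariateCopulaBase, IdentityLink, and _SHRINK are illustrative names, not the library's:

```python
import numpy as np

_SHRINK = 1 - 1e-8  # keeps the correlation strictly inside (-1, 1)

class BivariateCopulaBase:
    """Sketch: fold the repeated tanh-squash into param_link_inverse so the
    estimator calls one method instead of duplicating the pattern."""

    def __init__(self, links):
        self.links = links  # {param_index: object with .inverse(...)}

    def param_link_inverse(self, eta, param=0):
        # tanh-squash first, then apply the copula-specific inverse link.
        rho = np.tanh(np.asarray(eta, dtype=float) / 2) * _SHRINK
        return self.links[param].inverse(rho) * _SHRINK

# The estimator call site would then collapse to a single line, e.g.:
#   out[a][p] = self.distribution.param_link_inverse(out[a][p], param=p)
```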
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (2)
src/ondil/distributions/__init__.py:66
- New copula distributions and copula link functions are added, but there are no corresponding unit tests under tests/ (the existing distribution/link tests exclude CopulaMixin). Adding at least basic shape/domain tests and a few numeric sanity checks for logpdf and derivatives would help prevent regressions in the estimator integration.
from .beta import Beta
from .betainflated import BetaInflated
from .betainflatedzero import BetaInflatedZero
from .bicop_clayton import BivariateCopulaClayton
from .bicop_gumbel import BivariateCopulaGumbel
from .bicop_normal import BivariateCopulaNormal
from .bicop_studentt import BivariateCopulaStudentT
from .exponential import Exponential
from .gamma import Gamma
from .gumbel import Gumbel
from .inversegamma import InverseGamma
from .inversegaussian import InverseGaussian
from .johnsonsu import JSU
from .logistic import Logistic
from .lognormal import LogNormal
from .lognormalmedian import LogNormalMedian
from .mv_normal_chol import MultivariateNormalInverseCholesky
from .mv_normal_low_rank import MultivariateNormalInverseLowRank
from .mv_normal_modchol import MultivariateNormalInverseModifiedCholesky
from .mv_t_chol import MultivariateStudentTInverseCholesky
from .mv_t_low_rank import MultivariateStudentTInverseLowRank
from .mv_t_modchol import MultivariateStudentTInverseModifiedCholesky
from .normal import Normal, NormalMeanVariance
from .poisson import Poisson
from .powerexponential import PowerExponential
from .reversegumbel import ReverseGumbel
from .skew_t import SkewT, SkewTMeanStd
from .studentt import StudentT
from .weibull import Weibull
from .zeroadjustedgamma import ZeroAdjustedGamma
__all__ = [
"Normal",
"NormalMeanVariance",
"StudentT",
"SkewT",
"SkewTMeanStd",
"JSU",
"BetaInflated",
"Gamma",
"Beta",
"LogNormal",
"LogNormalMedian",
"Logistic",
"Exponential",
"Poisson",
"PowerExponential",
"Gumbel",
"InverseGaussian",
"ReverseGumbel",
"InverseGamma",
"MultivariateNormalInverseCholesky",
"MultivariateNormalInverseLowRank",
"MultivariateNormalInverseModifiedCholesky",
"MultivariateStudentTInverseCholesky",
"MultivariateStudentTInverseLowRank",
"MultivariateStudentTInverseModifiedCholesky",
"BetaInflatedZero",
"ZeroAdjustedGamma",
"BivariateCopulaNormal",
"BivariateCopulaGumbel",
"BivariateCopulaClayton",
"BivariateCopulaStudentT",
"Weibull",
]
src/ondil/estimators/online_mvdistreg.py:1344
- The decreasing-likelihood rollback guard uses outer_iteration >= 0, which is always true. If decreasing becomes true on the first iteration, prev_theta / prev_* were never initialized, leading to an UnboundLocalError. This condition likely should mirror the original intent (only roll back when previous-state snapshots exist).
if ((outer_iteration >= 0) | (inner_iteration > 1)) & decreasing:
    warnings.warn("Likelihood is decreasing. Breaking.")
    # Reset to values from the previous iteration
    theta = prev_theta
    self._model_selection = prev_model_selection
    self._x_gram[p] = prev_x_gram
    self._y_gram[p] = prev_y_gram
    self.coef_ = prev_beta
    self.coef_path_ = prev_beta_path
    self._current_likelihood[a] = old_likelihood
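A sketch of the presumably intended guard, pulled out as a helper so the intent ("only roll back once a previous-iteration snapshot exists") is explicit. The iteration names mirror the code above; the helper itself is illustrative:

```python
def should_rollback(outer_iteration: int, inner_iteration: int,
                    decreasing: bool) -> bool:
    # Snapshots of prev_theta / prev_* exist only after at least one
    # completed outer iteration, or past the first inner iteration.
    has_snapshot = (outer_iteration > 0) or (inner_iteration > 1)
    return has_snapshot and decreasing
```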
return 1.0 / (1.0 - np.abs(x)) ** 2

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    return -2.0 * np.sign(x) / (np.abs(x) - 1.0) ** 3
GumbelParameterToKendallsTau.link_derivative / link_second_derivative are inconsistent with link() (which returns sign(x) - 1/x_safe). The derivative of -1/x is 1/x^2 (up to the near-zero safeguard), not a function of (1-|x|); this will make the chain rule terms incorrect.
Suggested change — replace:

return 1.0 / (1.0 - np.abs(x)) ** 2

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    return -2.0 * np.sign(x) / (np.abs(x) - 1.0) ** 3

with:

"""
First derivative of link(x) = sign(x) - 1/x_safe.
For |x| >= eps, x_safe = x and d/dx[-1/x] = 1/x^2.
For |x| < eps, x_safe is constant (±eps), so the derivative is 0.
"""
x = np.asarray(x, dtype=float)
abs_x = np.abs(x)
# For |x| < eps, link() uses a constant 1/x_safe term -> derivative 0.
x_safe = np.where(abs_x < self.eps, self.eps, x)
return np.where(abs_x < self.eps, 0.0, 1.0 / (x_safe ** 2))

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    """
    Second derivative of link(x) consistent with link_derivative().
    For |x| >= eps, d/dx[1/x^2] = -2/x^3.
    For |x| < eps, link_derivative is 0, so the second derivative is 0.
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    x_safe = np.where(abs_x < self.eps, self.eps, x)
    return np.where(abs_x < self.eps, 0.0, -2.0 / (x_safe ** 3))
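A quick finite-difference check makes this kind of link/derivative mismatch easy to catch in a unit test. A self-contained sketch, assuming the link is tau = sign(theta) - 1/theta with an eps safeguard as described above (the module-level EPS and standalone functions are illustrative):

```python
import numpy as np

EPS = 1e-6  # near-zero safeguard, assumed value

def link(x):
    x = np.asarray(x, dtype=float)
    x_safe = np.where(np.abs(x) < EPS, EPS, x)
    return np.sign(x) - 1.0 / x_safe

def link_derivative(x):
    # Proposed fix: d/dx[-1/x] = 1/x^2 away from the safeguard region.
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    x_safe = np.where(abs_x < EPS, EPS, x)
    return np.where(abs_x < EPS, 0.0, 1.0 / x_safe**2)

# Central differences away from zero should agree with the analytic form:
x = np.array([0.5, 2.0, -3.0])
h = 1e-6
fd = (link(x + h) - link(x - h)) / (2 * h)
```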
def element_link_function(
    self,
    y: np.ndarray,
    param: int = 0,
    k: int = 0,
    d: int = 0,
) -> np.ndarray:
    """Apply the element-wise link function for a copula parameter."""
    return self.links[param].element_link(y)

def element_link_function_derivative(
    self,
    y: np.ndarray,
    param: int = 0,
    k: int = 0,
    d: int = 0,
) -> np.ndarray:
    """Apply the first derivative of the element-wise link function."""
    return self.links[param].element_derivative(y)
element_link_function* methods call LinkFunction.element_link / element_derivative, but the base LinkFunction API only defines link / link_derivative (element-wise variants exist only for matrix link implementations). As written, these helpers will raise AttributeError for scalar copula links like FisherZLink; consider delegating to link / link_derivative (or gating on matrix-shaped parameters).
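A hedged sketch of the suggested delegation — fall back to the scalar link / link_derivative API when a link offers no element_* variants. The class name and the getattr pattern are illustrative, not the library's implementation:

```python
import numpy as np

class ScalarElementLinkFallback:
    """Sketch: delegate element-wise calls to the scalar link API when a
    link (e.g. a copula link like FisherZLink) has no element_* methods."""

    def element_link_function(self, y, param=0, k=0, d=0):
        link = self.links[param]
        # Matrix links expose element_link; scalar links only link().
        fn = getattr(link, "element_link", link.link)
        return fn(y)

    def element_link_function_derivative(self, y, param=0, k=0, d=0):
        link = self.links[param]
        fn = getattr(link, "element_derivative", link.link_derivative)
        return fn(y)
```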
| """ | ||
| fitted = self.flat_to_cube(eta, param=param) | ||
| fitted = self.link_inverse(fitted, param=param) | ||
| return self.log_likelihood(y, theta={**theta, param: fitted}) |
param_conditional_likelihood calls self.log_likelihood(...), but Distribution exposes loglikelihood(...) / logpdf(...) rather than log_likelihood. This will raise at runtime if used; update to the correct method name (and ensure the call signature matches how copula distributions represent theta).
Suggested change — replace:

return self.log_likelihood(y, theta={**theta, param: fitted})

with:

return self.loglikelihood(y, theta={**theta, param: fitted})
for p in range(self.distribution.n_params):
    array = np.zeros((N, self.n_dist_elements_[p]))
    for k in range(self.n_dist_elements_[p]):
        array[:, k] = (
            make_model_array(
                X=X_scaled,
                eq=self._equation[p][k],
                fit_intercept=self._fit_intercept[p],
            )
            @ self.coef_[p][k][a, :]
        ).squeeze()

    out[a][p] = self.distribution.flat_to_cube(array, p)
    if not issubclass(self.distribution.__class__, CopulaMixin) or (
        issubclass(self.distribution.__class__, CopulaMixin) and p == 1
    ):
        out[a][p] = self.distribution.link_inverse(out[a][p], p)
    if issubclass(self.distribution.__class__, CopulaMixin) and p == 0:
        out[a][p] = np.tanh(out[a][p] / 2) * (1 - 1e-8)
        out[a][p] = self.distribution.param_link_inverse(
            out[a][p] * (1 - 1e-8), param=0
        ) * (1 - 1e-8)
predict_all_adr() recomputes out[a][p] in a second unconditional loop (lines ~1658+), even after the a <= optimal_adr_/else branch. This overwrites the earlier results and also defeats the a > optimal_adr_ fast-path copy; it looks like an indentation/duplication bug and can yield inconsistent predictions.
# out = self.distribution.link_inverse(out, 0)
out = np.tanh(out / 2) * (1 - 1e-8)
if issubclass(self.distribution.__class__, CopulaMixin):
    out = self.distribution.param_link_inverse(out * (1 - 1e-8), param=0) * (
        1 - 1e-8
    )
predict() now applies tanh(out/2) unconditionally (even for non-copula distributions) and no longer calls distribution.link_inverse. This will return parameters on the wrong scale for standard multivariate distributions; apply the tanh/param-link logic only for CopulaMixin (and otherwise keep link_inverse).
Suggested change — replace:

# out = self.distribution.link_inverse(out, 0)
out = np.tanh(out / 2) * (1 - 1e-8)
if issubclass(self.distribution.__class__, CopulaMixin):
    out = self.distribution.param_link_inverse(out * (1 - 1e-8), param=0) * (
        1 - 1e-8
    )

with:

if issubclass(self.distribution.__class__, CopulaMixin):
    # For copula distributions, first map the linear predictor through tanh
    # to ensure parameters lie in (-1, 1), then apply the copula-specific
    # parameter link inverse.
    out = np.tanh(out / 2) * (1 - 1e-8)
    out = self.distribution.param_link_inverse(out, param=0)
else:
    # For non-copula distributions, use the standard link inverse to map
    # from the linear predictor scale back to the distribution parameter scale.
    out = self.distribution.link_inverse(out, 0)
return (np.pi / 2) * np.cos((np.pi / 2) * x)

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    return -((np.pi / 2) ** 2) * np.sin((np.pi / 2) * x)

def inverse_derivative(self, x: np.ndarray) -> np.ndarray:
    return (2 / np.pi) / np.sqrt(1 - x**2)
ParameterToKendallsTau.link_derivative / link_second_derivative and inverse_derivative do not match the documented transforms. For tau = (2/pi) * arcsin(rho), link_derivative should be (2/pi) / sqrt(1 - rho^2) (and inverse_derivative should be (pi/2) * cos((pi/2) * tau)); the current implementation appears swapped, which will break gradient/Hessian-based estimation.
Suggested change — replace:

return (np.pi / 2) * np.cos((np.pi / 2) * x)

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    return -((np.pi / 2) ** 2) * np.sin((np.pi / 2) * x)

def inverse_derivative(self, x: np.ndarray) -> np.ndarray:
    return (2 / np.pi) / np.sqrt(1 - x**2)

with:

# d/d rho [ (2/pi) * arcsin(rho) ] = (2/pi) / sqrt(1 - rho^2)
return (2 / np.pi) / np.sqrt(1 - x**2)

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    # d^2/d rho^2 [ (2/pi) * arcsin(rho) ] = (2/pi) * rho * (1 - rho^2)^(-3/2)
    return (2 / np.pi) * x * (1 - x**2) ** (-1.5)

def inverse_derivative(self, x: np.ndarray) -> np.ndarray:
    # d/d tau [ sin(pi/2 * tau) ] = (pi/2) * cos(pi/2 * tau)
    return (np.pi / 2) * np.cos((np.pi / 2) * x)
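The chain-rule identity link'(rho) * inverse'(link(rho)) = 1 gives a cheap sanity test for the corrected formulas. A self-contained sketch of the documented transforms (standalone functions, for illustration only):

```python
import numpy as np

def link(rho):
    # Kendall's tau for a Gaussian copula: tau = (2/pi) * arcsin(rho)
    return (2 / np.pi) * np.arcsin(rho)

def link_derivative(rho):
    # Corrected: d tau / d rho = (2/pi) / sqrt(1 - rho^2)
    return (2 / np.pi) / np.sqrt(1 - rho**2)

def inverse(tau):
    # rho = sin(pi/2 * tau)
    return np.sin((np.pi / 2) * tau)

def inverse_derivative(tau):
    # Corrected: d rho / d tau = (pi/2) * cos(pi/2 * tau)
    return (np.pi / 2) * np.cos((np.pi / 2) * tau)

rho = np.array([-0.9, -0.3, 0.0, 0.5, 0.99])
```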
def link(self, x: np.ndarray) -> np.ndarray:
    return x / (2 + np.abs(x))

def inverse(self, x: np.ndarray) -> np.ndarray:
    abs_x = np.clip(np.abs(x), 0, 1 - self.eps)
    return 2 * x / (1 - abs_x)

def link_derivative(self, x: np.ndarray) -> np.ndarray:
    abs_x = np.clip(np.abs(x), 0, 1 - self.eps)
    return 2 / (1 - abs_x) ** 2

def link_second_derivative(self, x: np.ndarray) -> np.ndarray:
    abs_x = np.clip(np.abs(x), 0, 1 - self.eps)
    sign_x = np.sign(x)
    return 4 * sign_x / (1 - abs_x) ** 3

def inverse_derivative(self, x: np.ndarray) -> np.ndarray:
    abs_x = np.clip(np.abs(x), 0, 1 - self.eps)
    return 2 / (1 - abs_x) ** 2
ClaytonParameterToKendallsTau.link_derivative / link_second_derivative currently implement derivatives of the inverse mapping (theta = 2*tau/(1-|tau|)), not the derivative of link() (tau = theta/(2+|theta|)). This will yield incorrect gradients/Hessians in estimation; update these methods to differentiate the actual link() expression (including handling of abs).
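Differentiating link() itself gives (a sketch, ignoring the eps safeguard near the boundary): for tau(theta) = theta / (2 + |theta|), the first derivative is 2 / (2 + |theta|)^2 on both sign branches, and the second derivative is -4 * sign(theta) / (2 + |theta|)^3. The standalone functions below are illustrative, with a finite-difference cross-check:

```python
import numpy as np

def link(theta):
    # Clayton: Kendall's tau = theta / (2 + |theta|)
    return theta / (2 + np.abs(theta))

def link_derivative(theta):
    # d/dtheta [theta / (2 + |theta|)] = 2 / (2 + |theta|)^2 on both branches
    return 2.0 / (2 + np.abs(theta)) ** 2

def link_second_derivative(theta):
    # d^2/dtheta^2 = -4 * sign(theta) / (2 + |theta|)^3
    return -4.0 * np.sign(theta) / (2 + np.abs(theta)) ** 3

# Finite-difference cross-check away from theta = 0:
theta = np.array([0.7, 3.0, -1.5])
h = 1e-4
fd1 = (link(theta + h) - link(theta - h)) / (2 * h)
fd2 = (link(theta + h) - 2 * link(theta) + link(theta - h)) / h**2
```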
else:
    self.adr_steps_ = 1
In fit(), when max_regularisation_size is None the code forces self.adr_steps_ = 1, overriding distribution.get_regularization_size(...). This effectively disables AD-R regularization for all distributions by default (not just copulas) and looks like an unintended behavior change; consider keeping the computed size unless explicitly capped.
Suggested change — remove:

else:
    self.adr_steps_ = 1
# Compute dI/dp via inbeder for masked positions
idxs = np.flatnonzero(mask)
dIdp_vals = np.empty(idxs.size, dtype=float)
_, dIdp, _ = inbeder_vec_numba(xmax, t2, 0.5)

dIdp = np.zeros_like(x_b, dtype=float)
dIdp.flat[idxs] = dIdp_vals
_diff_quantile_nu() allocates dIdp_vals but never fills it, and then assigns those uninitialized values into dIdp; the actual dIdp returned by inbeder_vec_numba(...) is ignored. This makes the quantile derivative incorrect (and can introduce NaNs due to uninitialized memory).
Suggested change — replace:

# Compute dI/dp via inbeder for masked positions
idxs = np.flatnonzero(mask)
dIdp_vals = np.empty(idxs.size, dtype=float)
_, dIdp, _ = inbeder_vec_numba(xmax, t2, 0.5)
dIdp = np.zeros_like(x_b, dtype=float)
dIdp.flat[idxs] = dIdp_vals

with:

# Compute dI/dp via inbeder
_, dIdp_all, _ = inbeder_vec_numba(xmax, t2, 0.5)
# Initialize full dIdp array and fill only masked positions
dIdp = np.zeros_like(x_b, dtype=float)
dIdp[mask] = dIdp_all[mask]
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.