Multivariate Distributions

SklarPy contains many multivariate distributions. Unlike the univariate distributions, these are not wrappers of scipy objects (with the exceptions of mvt_normal and mvt_student_t).

All implemented multivariate distributions can be fitted to both multivariate numpy and pandas data, and provide easy saving and plotting methods.

Which multivariate distributions are implemented?

Currently, the following multivariate distributions are implemented:

Multivariate Distributions

Family          Name                               SklarPy Model
--------------  ---------------------------------  ----------------
Normal Mixture  Normal / Gaussian                  mvt_normal
Normal Mixture  Student-T                          mvt_student_t
Normal Mixture  Skewed-T                           mvt_skewed_t
Normal Mixture  Generalized Hyperbolic             mvt_gh
Normal Mixture  Symmetric Generalized Hyperbolic   mvt_sgh
Normal Mixture  Hyperbolic                         mvt_hyperbolic
Normal Mixture  Symmetric Hyperbolic               mvt_shyperbolic
Normal Mixture  Normal-Inverse Gaussian (NIG)      mvt_nig
Normal Mixture  Symmetric Normal-Inverse Gaussian  mvt_snig
Normal Mixture  Marginal Hyperbolic                mvt_mh
Normal Mixture  Symmetric Marginal Hyperbolic      mvt_smh
Numerical       Gaussian KDE                       mvt_gaussian_kde

All Normal-Mixture models use the parameterization specified by McNeil, Frey and Embrechts (2005).
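
To give a feel for what a normal mixture is, the McNeil, Frey and Embrechts construction writes a random vector as X = loc + W * gamma + sqrt(W) * A @ Z, where Z is standard normal, W is a non-negative mixing variable independent of Z, and A @ A.T equals the shape matrix. The sketch below samples from the symmetric case (gamma = 0) with a generalized inverse Gaussian mixing variable, using only numpy and scipy. The helper name sample_symmetric_gh and the mapping onto scipy.stats.geninvgauss are assumptions made for illustration, not SklarPy's own implementation.

```python
import numpy as np
from scipy.stats import geninvgauss


def sample_symmetric_gh(n, lamb, chi, psi, loc, shape, seed=None):
    """Hypothetical helper: draw n samples of X = loc + sqrt(W) * A @ Z."""
    rng = np.random.default_rng(seed)
    loc = np.asarray(loc, dtype=float).ravel()
    d = loc.size

    # W ~ GIG(lamb, chi, psi), with density proportional to
    # w**(lamb - 1) * exp(-(chi / w + psi * w) / 2).
    # scipy's geninvgauss(p, b) matches this with p = lamb,
    # b = sqrt(chi * psi) and scale = sqrt(chi / psi).
    w = geninvgauss.rvs(
        lamb,
        np.sqrt(chi * psi),
        scale=np.sqrt(chi / psi),
        size=n,
        random_state=rng,
    )

    # Cholesky factor A such that A @ A.T = shape
    a = np.linalg.cholesky(np.asarray(shape, dtype=float))
    z = rng.standard_normal((n, d))
    return loc + np.sqrt(w)[:, None] * (z @ a.T)


# lamb = (d + 1) / 2 = 1.5 corresponds to the 2-dimensional hyperbolic case
samples = sample_symmetric_gh(
    5000, lamb=1.5, chi=6.8, psi=10.0,
    loc=[33.0, 44.0],
    shape=[[1.72, 2.28], [2.28, 6.27]],
    seed=0,
)
print(samples.mean(axis=0))
```

In this symmetric case the covariance of X is E[W] times the shape matrix, which is why the fitted cov and shape matrices in the example further down differ by a constant factor.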

PreFitContinuousMultivariate

This is the base class for all multivariate distributions. It implements the following methods and attributes:

  • logpdf (log of the probability density function)

  • pdf (probability density function)

  • cdf (cumulative distribution function)

  • mc_cdf (Monte Carlo approximation of the cumulative distribution function)

  • rvs (random variate generator / sampler)

  • likelihood (likelihood function)

  • loglikelihood (log of the likelihood function)

  • aic (Akaike Information Criterion)

  • bic (Bayesian Information Criterion)

  • marginal_pairplot (pairplot of the marginal distributions)

  • pdf_plot (plot of the probability density function)

  • cdf_plot (plot of the cumulative distribution function)

  • mc_cdf_plot (plot of the Monte Carlo approximation of the cumulative distribution function)

  • num_params (the number of parameters in the distribution)

  • num_scalar_params (the number of scalar values across all parameters in the distribution)

  • fit (fitting the distribution to data)

mc_cdf is a Monte Carlo approximation of the cumulative distribution function. It is mainly useful for distributions without a closed-form cumulative distribution function, where the alternative, numerical integration, is computationally expensive.
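
The idea behind a Monte Carlo CDF can be sketched in a few lines: draw many samples from the distribution and report the fraction that fall at or below the evaluation point in every coordinate. The helper mc_cdf_estimate below is a hypothetical name used for illustration and is written with plain numpy; it is not SklarPy's mc_cdf implementation.

```python
import numpy as np


def mc_cdf_estimate(x, samples):
    """Hypothetical helper: fraction of samples with every coordinate <= x."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    return float(np.mean(np.all(samples <= x, axis=1)))


rng = np.random.default_rng(0)

# Two independent standard normals, so the exact CDF at the origin
# is 0.5 * 0.5 = 0.25.
draws = rng.standard_normal((100_000, 2))
estimate = mc_cdf_estimate([0.0, 0.0], draws)
print(estimate)
```

The error of such an estimate shrinks roughly with the square root of the number of samples, so more draws give a more accurate approximation at the cost of more computation.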

num_params is the number of parameter objects in the distribution, i.e. a vector / matrix is counted as 1. num_scalar_params counts the number of unique scalar values across all parameter objects.
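
As an illustration of the difference, the sketch below counts both quantities for a parameter set shaped like that of a 2-dimensional symmetric hyperbolic distribution. It assumes that "unique" means the off-diagonal entries of a symmetric matrix are counted once; the helper count_params and the parameter values are made up for illustration.

```python
import numpy as np


def count_params(params):
    """Hypothetical helper: (number of parameter objects, unique scalars)."""
    num_scalars = 0
    for value in params.values():
        arr = np.asarray(value, dtype=float)
        is_square = arr.ndim == 2 and arr.shape[0] == arr.shape[1]
        if is_square and arr.shape[0] > 1 and np.allclose(arr, arr.T):
            # Symmetric d x d matrix: only d * (d + 1) / 2 entries are free.
            d = arr.shape[0]
            num_scalars += d * (d + 1) // 2
        else:
            # Scalars and vectors contribute every entry.
            num_scalars += arr.size
    return len(params), num_scalars


params = {
    'chi': 6.8,
    'psi': 10.0,
    'loc': np.array([[33.0], [44.0]]),
    'shape': np.array([[1.7, 2.3], [2.3, 6.3]]),
}
num_params, num_scalar_params = count_params(params)
print(num_params, num_scalar_params)
```

Here the four parameter objects (chi, psi, loc, shape) hold 1 + 1 + 2 + 3 = 7 unique scalar values.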

Also note that pdf and cdf plots are only implemented for 2-dimensional distributions.

FittedContinuousMultivariate

This class is the fitted version of PreFitContinuousMultivariate’s subclasses. It implements the same methods as PreFitContinuousMultivariate, but does not require params as an argument. It also implements the following additional methods and attributes:

  • params (the fitted parameters)

  • num_variables (the number of variables the distribution is fitted to)

  • fitted_num_data_points (the number of observations used to fit the distribution)

  • converged (whether the fitting algorithm converged)

  • summary (a summary of the fitted distribution)

  • save (save the fitted distribution object)

Multivariate Example

Here we use the multivariate normal and multivariate symmetric hyperbolic distributions, though the same methods and attributes are available for all multivariate distributions:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# specifying the parameters of the multivariate normal distribution we are
# sampling from
my_mu: np.ndarray = np.array([33, 44], dtype=float)
my_corr: np.ndarray = np.array([[1, 0.7], [0.7, 1]], dtype=float)
my_sig: np.ndarray = np.array([1.3, 2.5])
my_cov: np.ndarray = np.diag(my_sig) @ my_corr @ np.diag(my_sig)
my_mvn_params: tuple = (my_mu, my_cov)

# generating multivariate random normal variables
from sklarpy.multivariate import mvt_normal

rvs: np.ndarray = mvt_normal.rvs(1000, my_mvn_params)
rvs_df: pd.DataFrame = pd.DataFrame(rvs, columns=['Wife Age', 'Husband Age'],
                                    dtype=float)

# fitting a symmetric hyperbolic dist to our generated data using
# Maximum Likelihood Estimation
from sklarpy.multivariate import mvt_shyperbolic

fitted_msh = mvt_shyperbolic.fit(rvs_df, method='mle', show_progress=True)

# printing our fitted parameters
print(fitted_msh.params.to_dict)
print(fitted_msh.params.cov)
{'chi': 6.817911964473556, 'psi': 10.0, 'loc': array([[32.99012429],
       [43.91822886]]), 'shape': array([[1.72408489, 2.27711492],
       [2.27711492, 6.27443288]])}

[[1.78702958 2.36025021]
 [2.36025021 6.50350643]]

Printing a summary of our fit:

print(fitted_msh.summary())
                             summary
Distribution         mvt_shyperbolic
#Variables                         2
#Params                            4
#Scalar Params                     7
Converged                       True
Likelihood                       0.0
Log-Likelihood           -3664.49604
AIC                       7342.99208
BIC                      7377.346367
#Fitted Data Points             1000
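
The AIC and BIC values in this summary can be checked by hand. The sketch below assumes the standard definitions, AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, with k taken as the number of scalar parameters. That choice of k is an assumption; it reproduces the reported values but is not taken from SklarPy's source.

```python
import math

# Values copied from the summary above
log_likelihood = -3664.49604
num_scalar_params = 7
num_data_points = 1000

# Standard information criteria, with k = number of scalar parameters
aic = 2 * num_scalar_params - 2 * log_likelihood
bic = num_scalar_params * math.log(num_data_points) - 2 * log_likelihood

print(round(aic, 5), round(bic, 6))
```

The reported likelihood of 0.0 is not an error: exponentiating a log-likelihood of about -3664 underflows to zero in double precision, which is why the log-likelihood is the more useful figure.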

Plotting our fitted distribution:

fitted_msh.pdf_plot(show=False)
fitted_msh.mc_cdf_plot(show=False)
fitted_msh.marginal_pairplot(show=False)
plt.show()
[Figures: plots of the fitted symmetric hyperbolic distribution]

Saving our fitted parameters:

fitted_msh.params.save()

Reloading the saved parameters and using them to fit another distribution of the same type:

from sklarpy import load

loaded_msh_params = load('mvt_shyperbolic.pickle')
param_fitted_msh = mvt_shyperbolic.fit(params=loaded_msh_params)