A copula based representation for tailings dam failures

Laura Maria Canno Ferreira Fais; Verónica Andrea González-López; Diego Samuel Rodrigues; Rafael Rodrigues de Moraes

doi:10.1051/fopen/2020011

All issues

Volume 3 (2020)

4open, 3 (2020) 12

Full HTML

Open Access

Issue		4open Volume 3, 2020


Article Number		12
Number of page(s)		10
Section		Mathematics - Applied Mathematics
DOI		https://doi.org/10.1051/fopen/2020011
Published online		25 September 2020

4open 2020, 3, 12

Research Article

A copula based representation for tailings dam failures

Laura Maria Canno Ferreira Fais¹, Verónica Andrea González-López², Diego Samuel Rodrigues¹ and Rafael Rodrigues de Moraes²^*

¹ School of Technology, University of Campinas, Paschoal Marmo 1888, 13484-332 Limeira, S.P., Brazil
² Department of Statistics, University of Campinas, Sergio Buarque de Holanda 651, 13083-859 Campinas, S.P., Brazil

^* Corresponding author: rafael.moraes@gmx.de

Received: 16 March 2020
Accepted: 13 August 2020

Abstract

In this article, we model the dependence between dam factor and D _max, where dam factor is an indicator of risk of a tailings dam failure, which involves the height H of the tailings dam, the volume of material housed by the tailings dam V_T and the volume dispensed by the tailings dam, V_F, when the dam breaks. And, D_max is the maximum distance traveled by the material released by the tailings dam, after the collapse. With the dependence found via copula models and Bayesian estimation, given a range of dam factor, we estimate the probability of the released material to exceed a certain threshold. Since the dam factor involves the released volume V_F (unknown before the dam break), we present a naive way to estimate it using V_T and H. In this way, it is possible to estimate the dam factor of a tailings dam and with such a value to identify the probability of the tailings dam to show a D_max that exceeds a certain threshold.

Key words: Copula models / Bayesian estimation / Conditional probability

© L.M. Canno Ferreira Fais, et al., Published by EDP Sciences, 2020

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Tailings dams are scattered throughout various regions of the world, many of them already inactive, others still operating. Some countries have turned their attention to the conditions of such reservoirs, as entire regions have suffered from the impact of tailings dams breaks. The impacts of such ruptures have been incalculable, taking lives, leaving useless areas, and polluting rivers, projecting damages miles away from the reservoir site. Despite the demand imposed by several entities, the problem of regularizing these dams is complex due to several reasons. Many reservoirs are not used and their maintenance requires specialized equipment/technicians. In other cases, it is not easy to identify those responsible for the reservoirs because they work (or worked) for companies which are no longer in operation.

Several aspects must be taken into account in establishing the reliability of a tailings dam, the construction method, its size, the region where it is located, etc. For example, when talking about the region where it is located, factors such as the proximity of watercourses or cities raises the risk of the tailings dam in case of failure. Moreover, it is very common to find entire cities standing in the vicinity of reservoirs of this type, since the mining activity that surrounds the reservoir generates jobs and as a consequence facilitates the rising of communities that are sustained mainly by this activity. As expected, the databases that can help understand the dynamics of these tailings dams break are small (few cases), or at least those from which comparable information has been collected.

A valuable database of 35 tailings dam breaks reported around the world is presented by [1]. Information such as the volume of the material in a tailings dam (V_T) could be used to determine the volume of the material to be released in a dam breaking (V_F). The height H, attained by the material contained in the dam, and the volume of the released material could be used to estimate the distance reached by the released material (D_max). Among the amounts to consider, the following two quantities are usually incorporated into the studies: (i) H × V_F (dam factor), see [2], (ii) H_f = H × $\frac{V_{F}}{V_{T}}$ $\frac{{V}_F}{{V}_T}$ × V_F (dam factor related to the fractional volume $\frac{V_{F}}{V_{T}}$ $\frac{{V}_F}{{V}_T}$ ), see [1]. In the literature there are several proposals to determine V_F and D_max. The most recent according to our knowledge is also given in [1], where the authors propose a stochastic linear relationship between log(V_F) and log(V_T), and also a stochastic linear relationship between D_max (or log(D_max)) and the dam factor version (i) or (ii).

Several aspects arise in the statistical perspective used in [1]. The first one is about the estimation of the parameters of the linear relations. A frequentist view as it is used could be compromised by the limited number of cases available in the database, since the law of large numbers (large sample size) is not feasible, which results in a lack of good properties in parameter estimators. To overcome this limitation, a Bayesian approach is then recommended. The second aspect presented in [1] is the imposition of linearity in the relationships to be estimated. This assumption is quite ambitious and, although the authors take the precaution of analyzing the goodness of the fit, the validity of the diagnoses also ends up being compromised by the size of the data set. Using the same data as [1], here we address both issues, making more flexible the relationship between V_T and V_F and also between dam factor and D_max using copula models and from a Bayesian perspective.

This article is organized as follows. Section Theoretical Background exposes the theoretical foundations of the work, with a focus on dependence models (see [3]). Inspecting the dependence relationships of the data provided in [1], the model selection procedure and the referred Bayesian parameter estimation are presented in Section Model Representation. Section Chances for Large Values of Dmax shows a dam factor-based prediction study in order to determine the distance that the released material can travel after a breakdown. In this same section, we offer a strategy to identify the reservoir’s dam factor. Finally, Section Conclusion closes the paper with some general remarks and conclusions.

Theoretical background

What really interests us is the relationship between X₁ and X₂, where X₁ is the variable V_T (or dam factor) and X₂ is the variable V_F (or D_max) corresponding to tailings dams. Figures 1a, 2a and 3a show the scatter plots between the pairs (see [1]), with the most extreme cases highlighted.

Figure 1

Scatterplot for V_T (×10⁶ m³) vs. V_F (×10⁶ m³) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V.

Figure 2

Scatterplot for H × V_F vs. D_max with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_max in kilometres (km).

Figure 3

Scatterplot for H_f vs. D_max with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_max in kilometres (km).

We approach this problem using the concept of copula, where if H is the cumulative distribution function of (X₁, X₂), being those continuous random variables, there is a function C such that for all (x, y) ∈ Image (X₁, X₂),

$H (x, y) = C (F_{1} (x), F_{2} (y)), wit h F_{1} (x) = H (x, \infty) and F_{2} (y) = H (\infty, y) .$ $H(x,y)=C({F}_1(x),{F}_2(y)),\enspace \hspace{1em}\mathrm{wit}\mathrm{h}\hspace{1em}{F}_1(x)=H(x,\mathrm{\infty })\hspace{1em}\mathrm{and}\hspace{1em}{F}_2(y)=H(\mathrm{\infty },y).$ (1)

If the function C is the 2-copula of (X₁, X₂), C(u, v) = Prob (F₁(X₁) ≤ u, F₂(X₂) ≤ v), for u, v ∈ [0,1], then, C is the joint distribution of the variables U $∶ =$ $:=$ F₁(X₁) and V $∶ =$ $:=$ F₂(X₂). The function C is the one that we want to identify. The result given by (1) – see [4] – allows the decomposition of H into three functions: the marginal distribution F₁, the marginal distribution F₂, and the copula function C. F₁ and F₂ are marginally determined, that is, it is enough to inspect the stochastic behavior of X₁ to determine F₁ and similarly, X₂ is enough to determine F₂. Already to determine C, we need both variables duly transformed by their marginal distributions. Copula functions cover all dependence types, since ∀(u, v) ∈ [0, 1]² and for all copula C(u, v),

$W (u, v) = \max {0, u + v - 1} \leq C (u, v) \leq \min {u, v} = M (u, v) .$ $W(u,v)=\mathrm{max}\{0,u+v-1\}\le C(u,v)\le \mathrm{min}\{u,v\}=M(u,v).$ (2)

(X₁, X₂) has 2-copula M (W) if and only if X₂ is a monotone nondecreasing (nonincreasing) funcion of X₁ almost surely (see [5]). This property demonstrates that for X₁ and X₂ verify a linear relationship the 2-copula of (X₁, X₂) can only be M (Spearman correlation coefficient = 1) or W (Spearman correlation coefficient = −1). In the next example, we show a family of copulas indexed by a parameter θ ∈ [1, ∞). This allows covering the cases in the spectrum considered by the result (2).

Example 2.1

Consider (u, v) ∈ [0, 1]², θ ∈ [1, ∞), the Gumbel–Hougaard copula is given by,

$C (u, v θ) ∶ = \exp (- ((- \ln (u))^{θ} + (- \ln (v))^{θ})^{\frac{1}{θ}}) .$ $C\left(u,v,\theta \right):=\mathrm{exp}(-((-\mathrm{ln}(u){)}^{\theta }+(-\mathrm{ln}(v){)}^{\theta }{)}^{\frac{1}{\theta }}).$

The Gumbel–Hougaard family is a family inside the Archimedean class (see [3]). That being, then there is a function ϕ: [0, 1] → [0, ∞] associated with this family which generates it. In this case, ϕ_θ(t) $∶ =$ $:=$ (−ln(t))^θ and the generation of C occurs since the following analytical expression is allowed:

$C (u, v | θ) = ϕ_{θ}^{- 1} (ϕ_{θ} (u) + ϕ_{θ} (v)),$ $C(u,v|\theta )={\phi }_{\theta }^{-1}({\phi }_{\theta }(u)+{\phi }_{\theta }(v)),$ (3)where $ϕ_{θ}^{- 1} (z) = \exp (- z^{\frac{1}{θ}}) .$ ${\phi }_{\theta }^{-1}(z)=\mathrm{exp}(-{z}^{\frac{1}{\theta }}).$ Note that C(u, v|θ = 1) = uv (independence between the random variables) and C(u, v|θ) → M(u, v) when θ → ∞ (positive and strict dependence between the variables). This means that the Gumbel–Hougaard family allows to model positive dependence types with a spectrum that goes from independence to perfect positive dependence, not being possible for this family the cases from independence to the copula W. Also, the Kendall’s tau coefficient is given by $τ_{θ} = \frac{θ - 1}{θ} \in [0, 1]$ ${\tau }_{\theta }=\frac{\theta -1}{\theta }\in [0,\enspace 1]$ because θ ∈ [1, ∞). So, the spectrum of τ_θ summarizes the positive dependence attained by the family. {C(·,·|θ)}_θ∈[1,∞) is a positively ordered family since, given θ, θ′ such that 1 ≤ θ ≤ θ′ < ∞, C(u, v|θ) ≤ C(u, v|θ′), ∀(u,v) ∈ [0, 1]². Features such as those mentioned for the Gumbel–Hougaard copula serve as a framework to interpret the dependence of random variables. This subject will be central in the next sections when the dependence between the variables of interest is determined.

In the application, we compare several copula models, some of them to be formulated considering the equation (3) (Archimedean copulas) with appropriate generators ϕ_θ, for which the pseudo-inverse of ϕ_θ is defined as $ϕ_{θ}^{[- 1]} (s) ∶ = ϕ_{θ}^{- 1} (s)$ ${\phi }_{\theta }^{\left[-1\right]}(s):={\phi }_{\theta }^{-1}(s)$ when 0 ≤ s ≤ ϕ_θ(0) and $ϕ_{θ}^{[- 1]} (s) ∶ = 0$ ${\phi }_{\theta }^{\left[-1\right]}(s):=0$ if ϕ_θ(0) ≤ s ≤ ∞. For instance, if ϕ_θ(t) $∶ = \frac{1}{θ}$ $:=\frac{1}{\theta }$ (t^−θ − 1), θ ∈ [−1, ∞)\{0}, the model generated by equation (3) is called Clayton family, if $ϕ_{θ} (t) ∶ = - \ln (\frac{e^{- θ t} - 1}{e^{- θ} - 1})$ ${\phi }_{\theta }(t):=-\mathrm{ln}\left(\frac{{e}^{-{\theta t}}-1}{{e}^{-\theta }-1}\right)$ , θ ∈ (−∞, ∞)\{0}, the model generated by equation (3) is called Frank family. And, if ϕ_θ(t) $∶ =$ $:=$ −ln(1 − (1 − t)^θ), θ ∈ [1, ∞), the model generated by equation (3) is called Joe family. Clayton and Frank reach the limits W and M, see equation (2). Joe’s family follows the same pattern as the Gumbel–Hougaard, only allowing cases from the independence to M. We also consider two well-known copulas belonging to the family of elliptical copulas with the shape,

$C (u, v | ρ) = ψ (ψ^{- 1} (u), ψ^{- 1} (v) | ρ), (u, v) \in [0,1]^{2},$ $C(u,v|\rho )=\psi ({\psi }^{-1}(u),\hspace{1em}{\psi }^{-1}(v)|\rho ),\hspace{1em}(u,v)\in [\mathrm{0,1}{]}^2,$ (4)for an appropriate function ψ and parameter ρ ∈ [−1, 1]. (i) The Gaussian copula given by ψ(t) $∶ =$ $:=$ Φ(t) which is the usual cumulative standard Gaussian distribution, N(0, 1) and ψ(s, t|ρ) $∶ =$ $:=$ Φ(s, t|ρ) which is the bivariate standard Gaussian distribution centered in zero, N₂(0, P) with $P = (\begin{array}{l} 1 & ρ \\ ρ & 1 \end{array})$ $\mathbf{P}=\left(\begin{array}{ll}1& \rho \\ \rho & 1\end{array}\right)$ ; (ii) the t-Student copula given by ψ(t) $∶ =$ $:=$ T_η(t) which is the cumulative of the univariate t-Student distribution with η degrees and ψ(s, t|ρ) $∶ =$ $:=$ T_η(s, t|ρ) that is the bivariate cumulative t-Student distribution with η degrees of freedom and ρ correlation.

As can be verified by the examples of copulas registered here, there is a huge diversity concerning the possibilities of definition for the relationships between X₁ and X₂. Many of these are available in statistical libraries (see for example copula package in R project [R-project],¹ that in our case is used for the modeling).

With the variety of models introduced previously, we wish to cover a considerable range of dependence types that allow us to determine the best representation of the dependence between X₁ and X₂. For comparison between the models, we will adopt a model selection criterion (see [6]).

Model representation

In order to proceed with the estimation of the model, the original observations ${(x_{1 i}, x_{2 i})}_{i = 1}^{n}$ $\{({x}_{1i},{x}_{2i}){\}}_{i=1}^n$ are replaced by their re-scaled marginal ranks to [0, 1], $u_{i} ∶ = \frac{| {j : 1 \leq j \leq n, x_{1 j} \leq x_{1 i}} |}{n + 1}$ ${u}_i:=\frac{|\{j:1\le j\le n,\mathrm{\enspace }{x}_{1j}\le {x}_{1i}\}|}{n+1}$ and $v_{i} ∶ = \frac{| {j : 1 \leq j \leq n, x_{2 j} \leq x_{2 i}} |}{n + 1},$ ${v}_i:=\frac{|\{j:1\le j\le n,\mathrm{\enspace }{x}_{2j}\le {x}_{2i}\}|}{n+1},$ i = 1, …, n, where |A| denotes the cardinal of the set A. This step has two purposes, namely to neutralize the effect arising from the (possibly) difference in measurement scale of the original observations and to mitigate the problems of evaluating the copula likelihood function in the boundaries of the region [0, 1] × [0, 1]. This latter task is done by dividing the ranks by n + 1 so that the pseudo-observations, ${u_{1 i}}_{i = 1}^{n}$ $\{{u}_{1i}{\}}_{i=1}^n$ (or ${v_{2 i}}_{i = 1}^{n})$ $\{{v}_{2i}{\}}_{i=1}^n)$ , never attain 0 or 1. Pseudo observations are nonparametric estimates of U = F₁(X₁) (or V = F₂(X₂)), furthermore, the function C is the distribution of (U, V), which leads us to infer that the dependence exposed by equation (1) is revealed when exploring the dispersion between the pseudo-observations. See the scatterplot for the three cases: (1) V_T vs. V_F, (2) H × V_F vs. D_max, (3) H_f vs. D_max, of the values U vs. V represented by the pseudo-observations ${(u_{i}, v_{i})}_{i = 1}^{n}$ $\{({u}_i,{v}_i){\}}_{i=1}^n$ , Figures 1b, 2b and 3b.

In order to guarantee some conditions required by the models, we test for stochastic independence and exchangeability. Originally proposed by [7], the independence test applied here, is based on the distance between the empirical copula and the product copula Π(u, v) = uv and is implemented using simulations executed by the indepTest() function from the copula R-package. We test H₀: U and V are independent and, H₀ is rejected with p-value < 0.001 in all the cases. Such evidence supports the search for a model that represents the dependence. A rather desirable property of the dependence is the exchangeability, a condition requested by many families of copulas including the Archimedean and the elliptical ones. A copula is exchangeable if the property C(u, v|θ) = C(v, u|θ) is verified for all (u, v) ∈ [0, 1]² and the test proposed by [8], based on the empirical copula, is implemented using simulations by the function exchTest() of the copula R-package. Then, we test if H₀: U and V are exchangeable, and H₀ is not rejected in all the cases, with p-values much greater than the generally used significance level of 5%. Such evidence supports to use the models given by Archimedean and elliptical copulas. In order to define the appropriate copula in the three situations we use the function fitCopula(), with arguments (a) copula and (b) method being (a) “claytonCopula(dim = 2)”, “frankCopula(dim = 2)”, “gumbelCopula(dim = 2)”, “joeCopula(dim = 2)”, “normalCopula(dim = 2)”, “tCopula(dim = 2)” and being (b) method = “mpl” (maximum pseudo likelihood) which is the maximum log-likelihood (MLL) method evaluated on the pseudo observations.

Given a copula C its density c is computed and the log-likelihood is given by $\ln (\prod_{i = 1}^{n} c (u_{i}, v_{i}))$ $\mathrm{ln}({\prod }_{i=1}^n c({u}_i,{v}_i))$ which is maximized in the underlying parameters to obtain $MLL (C, {(u_{i}, v_{i})}_{i = 1}^{n})$ $\mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)$ , related to the model C and the set ${(u_{i}, v_{i})}_{i = 1}^{n}$ $\{({u}_i,{v}_i){\}}_{i=1}^n$ . Note that five of these models have one parameter while the t-Student copula model has two parameters, so that a penalty is applied to this model in order to promote a fair selection. We consider the Bayesian Information Criterion (BIC) for this purpose, see [6],

$BIC (C, {(u_{i}, v_{i})}_{i = 1}^{n}) ∶ = MLL (C, {(u_{i}, v_{i})}_{i = 1}^{n}) - \frac{1}{2} N \ln (n),$ $\mathrm{BIC}\enspace (C,\left\{\left({u}_i,{v}_i\right){\}}_{i=1}^n\right):=\mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)-\frac{1}{2}{N}\enspace \mathrm{ln}(n),$ (5)where N is the total number of parameters of C and n = 30 is the number of observations in the dataset. The BIC takes into account the trade-off between the maximum log-likelihood, $MLL (C, {(u_{i}, v_{i})}_{i = 1}^{n})$ $\mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)$ , and the penalty imposed by the second term, $\frac{1}{2} N \ln (n)$ $\frac{1}{2}{N}\enspace \mathrm{ln}(n)$ , which is done to enforce parsimony, i.e., simpler models having fewer parameters. According to the BIC, the higher the value taken by the equation (5), the better the model.

In the next subsection, we show the results produced by the selection procedure (selecting the best copulas) and also the estimation of the parameters. For the latter, we show the classical estimator as well as two Bayesian versions.

Estimation

We note that in the three cases, the two best models (copulas) are the Gaussian and the Gumbel–Hougaard (see Tabs. 1 and 2). It is clear that due to the size of the database (n = 30) the reliability of the classical parameter estimates may not be the best possible. This aspect can be addressed in a timely manner, but for now we can get some conclusions about the type of relationship in each case.

Table 1

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(V_T) and ranks(V_F); degrees of t-Student $\hat{η}$ $\widehat{\eta }$ = 3.529 also estimated by MLL using the tCopula() function of the copula R-package.

Table 2

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(dam factor) and ranks(D_max). Left: dam factor = H × V_F, $\hat{η}$ $\widehat{\eta }$ = 12378.75; Right: dam factor = H_f, $\hat{η}$ $\widehat{\eta }$ = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula R-package.

The Gaussian copula confirms that assuming linearity between X₁ and X₂ could not be indicated. If the relationship between these variables were in fact linear, the Gumbel–Hougaard copula could have detected this fact by attributing a very large estimate for θ (in the direction of the M(u, v) copula), see Section Theoretical Background. For all the cases, (1) V_T vs. V_F, (2) H × V_F vs. D_max, and (3) H_f vs. D_max, the estimation of θ is moderate $\hat{θ}$ $\widehat{\theta }$ ∈ (2, 3] showing evidence of the non-applicability of the M copula.

In order to introduce a level of confiability to the parametric estimation process, we conduct a Bayesian estimation. We apply Hamiltonian Monte Carlo (HMC) simulations through the rstan R-package in two Bayesian strategies, described below and in general terms, Non-Informative (NI) and Informative (I). In Table 3, we show the results for each of the three cases and for each of the two best models pointed out by the BIC, see Tables 1 and 2. For the Non-Informative cases of the Gaussian copula, we use an uniform prior distribution for ρ, properly accommodated in the interval [−1, 1], which is the range of possibilities of the ρ parameter. The Informative cases of Gaussian copula are implemented assuming a Beta priori distribution for ρ, accommodated in [−1, 1], where its mode is the one derived from the relation between Kendall’s tau coefficient and the copula association parameter. For instance, by means of the empirical estimation of Kendall’s tau coefficient we can obtain an estimation of the parameter (see iTau() function from copula R package). The NI cases of Gumbel–Hougaard copula are implemented assuming as a prior distribution on θ an improper prior distribution (proportional to 1), while the Informative cases are implemented by assuming a Gamma distribution on θ accommodated in [1, ∞) with mode indicated by the parameter estimation derived from the iTau() function.

Table 3

Summaries of the Bayesian estimation.

While the informative scenario (I) aims to follow the trend of the data, the non-informative (NI) seeks objectivity, without risking in declaring trends. Table 3 shows convergence indicators of the simulations and a descriptive summary of the values taken by the posterior distributions of ρ (or θ) obtained by the simulations. We note that Bayesian estimates (by quadratic and multilinear loss) return values close to the classic estimates reported in Tables 1 and 2, which gives coherence to the adjustments.

In this section, we have determined the best representation of the dependence between X₁ and X₂ in each of the three cases, which happens to be given by the Gaussian copula in the first instance and by the Gumbel–Hougaard copula in the second instance. We also implemented the Bayesian estimation of the parameters involved, determining among other aspects an estimation of the dependence structure, given by $C (u, v | {\hat{ρ}}_{B})$ $C(u,v|{\widehat{\rho }}_B)$ in the case of Gaussian copula (Eq. (4)) and $C (u, v | {\hat{θ}}_{B})$ $C(u,v|{\widehat{\theta }}_B)$ for the Gumbel–Hougaard copula (Example 2.1), where ${\hat{ρ}}_{B}$ ${\widehat{\rho }}_B$ and ${\hat{θ}}_{B}$ ${\widehat{\theta }}_B$ are the Bayesian estimators given by a certain loss function.

Chances for large values of D_max

In this section, we approach the specific problem of calculating the probability that the material released by the dam will exceed a certain distance, given a specific risk represented by dam factor values. To address this issue, we use the copula determined by the BIC and the Bayesian estimate presented in Table 3. Given that the perspective via copulas translates the problem to the ranks of the observations, we first address the questions for the ranks of the observations, in this paper denoted by U and V. Now we operate with the vector (U, V) represented by the pseudo-observations ${(u_{i}, v_{i})}_{i = 1}^{n}$ $\{({u}_i,{v}_i){\}}_{i=1}^n$ coming from the paired values of dam factor and D_max transformed to [0, 1] by marginal scaling ranks. Then, for each threshold v ∈ [0, 1] and interval (a, b] ⊂ [0, 1],

$Prob (V > v | U \in (a, b]) = 1 - Prob (V \leq v | U \in (a, b]) = 1 - \frac{Prob (V \leq v, U \in (a, b])}{Prob (U \in (a, b])} = 1 - \frac{Prob (V \leq v, U \leq b) - Prob (V \leq v, U \leq a)}{Prob (U \leq b) - Prob (U \leq a)} = 1 - \frac{C (b, v)}{b - a} + \frac{C (a, v)}{b - a} .$ $\mathrm{Prob}\enspace (V>v|U\in (a,b])=1-\enspace \mathrm{Prob}\enspace (V\le v|U\in (a,b])=1-\frac{\enspace \mathrm{Prob}\enspace (V\le v,U\in (a,b])}{\enspace \mathrm{Prob}\enspace (U\in (a,b])}=1-\frac{\enspace \mathrm{Prob}\enspace (V\le v,U\le b)-\enspace \mathrm{Prob}\enspace (V\le v,U\le a)}{\enspace \mathrm{Prob}\enspace (U\le b)-\enspace \mathrm{Prob}\enspace (U\le a)}=1-\frac{C(b,v)}{b-a}+\frac{C(a,v)}{b-a}.$ (6)

Under the Gaussian copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of ρ, say ${\hat{ρ}}_{B} .$ ${\widehat{\rho }}_B.$ In the same way, under the Gumbel–Hougaard copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of θ, say ${\hat{θ}}_{B}$ ${\widehat{\theta }}_B$ . Then, the estimators of (6) are,

$\hat{Prob} (V > v | U \in (a, b]) = 1 - \frac{C (b, v | {\hat{ρ}}_{B})}{b - a} + \frac{C (a, v | {\hat{ρ}}_{B})}{b - a},$ $\widehat{\enspace \mathrm{Prob}\enspace }(V>v|U\in (a,b])=1-\frac{C(b,v|{\widehat{\rho }}_B)}{b-a}+\frac{C(a,v|{\widehat{\rho }}_B)}{b-a},$ (7)and,

$\hat{Prob} (V > v | U \in (a, b]) = 1 - \frac{C (b, v | {\hat{θ}}_{B})}{b - a} + \frac{C (a, v | {\hat{θ}}_{B})}{b - a},$ $\widehat{\enspace \mathrm{Prob}\enspace }(V>v|U\in (a,b])=1-\frac{C(b,v|{\widehat{\theta }}_B)}{b-a}+\frac{C(a,v|{\widehat{\theta }}_B)}{b-a},$ (8)respectively. Each one of the estimates (Eqs. (7) and (8)) presents two versions, one under the Non-Informative setting and another under the Informative setting, see Table 3. Tables 4 and 5 record the estimated values according to equations (7) and (8). Table 4 shows the estimates when the dam factor is H × V_F and Table 5 shows the estimates when the dam factor is H_f. We note that the highest probabilities occur with the estimate implemented by the Gumbel–Hougaard copula (Eq. (8)) for the intervals of the extremes (0, 0.25] and (0.75, 1], while for the central intervals, the Gaussian copula (Eq. (7)) offers the highest values. This behavior is displayed for both types of dam factor (Tabs. 4 and 5). The Gumbel–Hougaard copula is well known as having more realistic properties in extreme values (see [9]); in the copula’s framework, this means near to the values 0 or 1.

Table 4

Case H × V_Fvs. D_max (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_i = Prob_i (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

Table 5

Case H_f vs. D_max (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_i = Prob_i (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

In Tables 6 and 7, we associate the [a, b) intervals (a, b ∈ [0, 1]) with dam factor intervals in the originally observed scale. We also associate the v values with the D_max thresholds in the original scale, according to the original data. Since it is a small database of 30 observations, each interval (in Tab. 6) on the original scale corresponds to a small number of cases, on average approximately 30/4.

Table 6

Relation between the intervals (a, b] and the dam factor intervals.

Table 7

Relation between v and the D_max thresholds.

To compute probabilities in real cases, it is enough to identify on the original scale the dam factor interval associated with U ∈ (a, b] and also the threshold in V, v associated with D_max. For instance, consider the case 11 of [1], first line in Table 8. Both versions of dam factor are in the interval (66.5, 2880], which is associated with (a, b] = (0.75, 1], see Table 6. This means that, according to the estimations given by Tables 4 and 5, the probability P(V > 0.75|U ∈ (0.75, 1]) is ≥0.6. While the threshold v = 0.75 is related to D_max = 25 (see Tab. 7). This already indicates that there is a high probability of the released material traveling 25 km or more. Besides, the computation by the Gumbel–Hougaard copula offers estimates that exceed 0.7. In fact, this case has an observed D_max = 120. This simple exercise of using Tables 4 and 5 shows the ability of the copulas to identify risks in real situations, for example, if one wants to calculate the probability of the material released reaching a certain point as a city, a river, etc. The practical impact of these estimates also exposes the need of defining an estimator of the dam factor. That, according to the notions used here H × V_F and H_f = H × $\frac{V_{F}}{V_{T}}$ $\frac{{V}_F}{{V}_T}$ × V_F, implies estimating V_F.

Table 8

Special cases.

The next subsection is intended to estimate V_F, since an adequate estimate of V_F guarantees a reliable estimate of the dam factor. For such, we will take the V_T value as an available information.

Expected Value for V_F given V_T

As proved in [10], the expected value $E (V | U \in (c, d])$ $\mathbb{E}(V|U\in (c,d])$ can be computed by integrating the copula function,

$E (V | U \in (c, d]) = 1 - \frac{1}{d - c} (\int_{0}^{1} C (d, v) d v - \int_{0}^{1} C (c, v) d v) .$ $\mathbb{E}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v)\mathrm{d}v-{\int }_0^1 C(c,v)\mathrm{d}v\right).$ (9)

Under the Gaussian copula we estimate the equation (9) by means of the Bayesian estimator by quadratic loss function of ρ, say ${\hat{ρ}}_{B}$ ${\widehat{\rho }}_B$ (see Tab. 3). The estimator of equation (9) is defined as,

$\hat{E} (V | U \in (c, d]) = 1 - \frac{1}{d - c} (\int_{0}^{1} C (d, v | {\hat{ρ}}_{B}) d v - \int_{0}^{1} C (c, v | {\hat{ρ}}_{B}) d v) .$ $\widehat{\mathbb{E}}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v|{\widehat{\rho }}_B)\mathrm{d}v-{\int }_0^1 C(c,v|{\widehat{\rho }}_B)\mathrm{d}v\right).$ (10)

In the same way, and under the Gumbel–Hougaard copula we estimate the equation (9) using the Bayesian estimator by quadratic loss function of θ, say ${\hat{θ}}_{B}$ ${\widehat{\theta }}_B$ (see Tab. 3). Here is the estimator:

$\hat{E} (V | U \in (c, d]) = 1 - \frac{1}{d - c} (\int_{0}^{1} C (d, v | {\hat{θ}}_{B}) d v - \int_{0}^{1} C (c, v | {\hat{θ}}_{B}) d v) .$ $\widehat{\mathbb{E}}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v|{\widehat{\theta }}_B)\mathrm{d}v-{\int }_0^1 C(c,v|{\widehat{\theta }}_B)\mathrm{d}v\right).$ (11)

In Table 9, we clearly visualize the effect of the copula on the magnitude of the mean values, the Gumbel–Hougaard copula offers the highest values in the extreme intervals and the Gaussian copula does it in the central intervals. As expected, with increasing the U interval (V_T scaled to [0, 1]) the mean of V increases (V_F scaled to [0, 1]).

Table 9

$E_{i}$ ${\mathbb{E}}_i$ , i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), $E_{i}$ ${\mathbb{E}}_i$ , i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.

Table 10

Original scale. $E_{i}$ ${\mathbb{E}}_i$ , i = 1, 2 computed using Gumbel–Hougaard copula, $E_{i}$ ${\mathbb{E}}_i$ , i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.

In Table 10, we record in the first column the ranges of V_F values corresponding to the intervals to the left of Table 9, these last ranges made in the percentages of V_F. Table 10 offers naive estimates for V_F, based on V_T intervals. For example, ${\hat{E}}_{2} (V_{F} | V_{T} \in (c, d])$ ${\widehat{\mathbb{E}}}_2({V}_F|{V}_T\in (c,d])$ is the estimator of V_F when V_T ∈ (c, d] and computed from the Gumbel–Hougaard copula with an Informative setting.

Next, we compare the intervals in which the naive estimates fall, with those reported in Table 8 and, according to Table 6. Regarding line 1 of Table 8: since V_T = 12.34, this value is associated with the interval (7.040, 74.00] of Table 10, which brings us to an average of V_F estimated as ${\hat{E}}_{i}$ ${\widehat{\mathbb{E}}}_i$ (V_F│V_T ∈ (7.04, 74]) = 32. When calculating the dam factor-(i), we obtain H × ${\hat{E}}_{i}$ ${\widehat{\mathbb{E}}}_i$ (V_F|V_T ∈ (7.04, 74]) = 15 × 32 = 480 ∈ (66.5, 2880], note that the observed value 135 (Tab. 8) is also in the interval (66.5, 2880]. Computing the dam factor-(ii), we obtain H × $\frac{{\hat{E}}_{i} (V_{F} | V_{T} \in (7.04,74])}{V_{T}}$ $\frac{{\widehat{\mathbb{E}}}_i({V}_F|{V}_T\in (\mathrm{7.04,74}])}{{V}_T}$ × ${\hat{E}}_{i}$ ${\widehat{\mathbb{E}}}_i$ (V_F|V_T ∈ (7.04, 74]) = 15 × 32/12.34 × 32 = 1244.733 ∈ (13.85, 1675.64], which is in fact the interval in which was found the observed data (98.46). In other words, the estimates and observed values are accommodated in the same risk interval, reported in Table 6. If we consider the Samarco case (second case in Tab. 8), we have that the estimation is ${\hat{E}}_{i}$ ${\widehat{\mathbb{E}}}_i$ (V_F|V_T ∈ (7.04, 74]) = 32, since the observed V_T = 55 ∈ (7.04, 74], and in fact the observed value is V_F = 32. Then, the dam factor estimates are the same as those recorded in Table 8.

Conclusion

In this article, we model the dependence between each version of the dam factor and D_max with dam factor given by the risk factor of a reservoir that depends on its dam’s height (total volume) H (V_T) and the volume of material released V_F after a collapse. D_max is given by the maximum distance reached by the dispensed material. In this way, it is possible to accurately describe the performance of the dependence relationship between each dam factor and D_max. Furthermore, using the BIC criterion we can point out the best representation, given in our case by the Gaussian copula (see Tab. 2). We were also able to delineate the dependence between the volume available from a reservoir V_T and the material’s volume V_F capable of being expelled after a collapse (see Tab. 1).

Through a Bayesian study, we offer scenarios that aim to give flexibility to the representation of the dependence. Under an assumption of lack of prior information, the NI scenario would be the one indicated (see Tab. 3). With the alliance between Bayesian tools and copula models we can determine tail probabilities of D_max values scaled to [0, 1] and conditioned at dam factor intervals (see Tabs. 4 and 5). With them, we determine risks for reservoirs, depending on the dam factor.

We also present in this article a Bayesian and non-parametric method to infer the dam factor of a reservoir (see Sect. Expected Value for VF Given VT). We show its efficiency for some available cases in the literature. Note that such a method could be improved by univariate modeling techniques in future research studies. We see that the Gaussian copula has a competitor, which is the Gumbel–Hougaard copula (Tabs. 1 and 2), which offers slightly different (and higher) probabilities for extreme events (see Tabs. 4 and 5). On the other hand, when using the Gumbel–Hougaard copula in calculating the dam factor (see Tabs. 9 and 10), it produces results comparable to those of the Gaussian copula, which shows that the notion of mean hides certain stochastic characteristics. For a complement of other aspects addressed in this problem, see [11]. We consider this work as the starting point of other approaches enriched with marginal models, since here we treat the marginal effect only from a non-parametric perspective (ranks), which may show restrictions, in the presence of few data.

Acknowledgments

R. Rodrigues de Moraes gratefully acknowledge the partial financial support provided by CAPES and CNPq with fellowships from the Master Program in Statistics of University of Campinas. Also, the authors wish to thank the two referees for their many useful comments and suggestions on an earlier draft of this paper.

¹

https://www.r-project.org.

References

Concha Larrauri P, Lall U (2018), Tailings dams failures: Updated statistical model for discharge volume and runout. Environments 5, 2, 28. [Google Scholar]
Rico M, Benito G, Diez-Herrero A (2008), Floods from tailings dam failures. J Hazard Mater 154, 1–3, 79–87. [Google Scholar]
Nelsen RB (2007), An introduction to copulas. Springer Science & Business Media. [Google Scholar]
Sklar M (1959), Fonctions de repartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8, 229–231. [Google Scholar]
Mikusinski P, Sherwood H, Taylor MD (1991), The Fréchet bounds revisited. Real Anal Exch 17, 2, 759–764. [CrossRef] [Google Scholar]
Schwarz G (1978), Estimating the dimension of a model. Ann Stat 6, 2, 461–464. [Google Scholar]
Deheuvels P (1981), A non parametric test for independence. Publ Inst Stat Univ Paris 26, 29–50. [Google Scholar]
Genest C, Nešlehová J (2012), Tests of symmetry for bivariate copulas. Ann Inst Stat Math 64, 4, 811–834. [Google Scholar]
Genest C, Rivest LP (1989), A characterization of Gumbel’s family of extreme value distributions. Stat Prob Lett 8, 3, 207–211. [CrossRef] [Google Scholar]
González-López VA, Rodrigues de Moraes R (2020), A copula-based quantifying of the relationship between race inequality among neighbourhoods in São Paulo and age at death, 4open 3, 11. [CrossRef] [EDP Sciences] [Google Scholar]
Rodrigues de Moraes R (2020), Eventos Caudais na Prática. Modelagem Bayesiana via Cópulas (Unpublished Master’s Thesis). [Google Scholar]

Cite this article as: Canno Ferreira Fais LM, González-López VA, Rodrigues DS & Rodrigues de Moraes R 2020. A copula based representation for tailings dam failures. 4open, 3, 12

All Tables

Table 1

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(V_T) and ranks(V_F); degrees of t-Student $\hat{η}$ $\widehat{\eta }$ = 3.529 also estimated by MLL using the tCopula() function of the copula R-package.

In the text

Table 2

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(dam factor) and ranks(D_max). Left: dam factor = H × V_F, $\hat{η}$ $\widehat{\eta }$ = 12378.75; Right: dam factor = H_f, $\hat{η}$ $\widehat{\eta }$ = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula R-package.

In the text

Table 3

Summaries of the Bayesian estimation.

In the text

Table 4

Case H × V_Fvs. D_max (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_i = Prob_i (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

In the text

Table 5

Case H_f vs. D_max (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_i = Prob_i (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

In the text

Table 6

Relation between the intervals (a, b] and the dam factor intervals.

In the text

Table 7

Relation between v and the D_max thresholds.

Special cases.

$E_{i}$ ${\mathbb{E}}_i$ , i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), $E_{i}$ ${\mathbb{E}}_i$ , i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.

In the text

Table 10

Original scale. $E_{i}$ ${\mathbb{E}}_i$ , i = 1, 2 computed using Gumbel–Hougaard copula, $E_{i}$ ${\mathbb{E}}_i$ , i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.

In the text

All Figures

	Figure 1 Scatterplot for V_T (×10⁶ m³) vs. V_F (×10⁶ m³) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V.
In the text

	Figure 2 Scatterplot for H × V_F vs. D_max with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_max in kilometres (km).
In the text

	Figure 3 Scatterplot for H_f vs. D_max with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_max in kilometres (km).
In the text

[1] Concha Larrauri P, Lall U (2018), Tailings dams failures: Updated statistical model for discharge volume and runout. Environments 5, 2, 28. [Google Scholar]

[2] Rico M, Benito G, Diez-Herrero A (2008), Floods from tailings dam failures. J Hazard Mater 154, 1–3, 79–87. [Google Scholar]

[3] Nelsen RB (2007), An introduction to copulas. Springer Science & Business Media. [Google Scholar]

[4] Sklar M (1959), Fonctions de repartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8, 229–231. [Google Scholar]

[5] Mikusinski P, Sherwood H, Taylor MD (1991), The Fréchet bounds revisited. Real Anal Exch 17, 2, 759–764. [CrossRef] [Google Scholar]

[6] Schwarz G (1978), Estimating the dimension of a model. Ann Stat 6, 2, 461–464. [Google Scholar]

[7] Deheuvels P (1981), A non parametric test for independence. Publ Inst Stat Univ Paris 26, 29–50. [Google Scholar]

[8] Genest C, Nešlehová J (2012), Tests of symmetry for bivariate copulas. Ann Inst Stat Math 64, 4, 811–834. [Google Scholar]

[9] Genest C, Rivest LP (1989), A characterization of Gumbel’s family of extreme value distributions. Stat Prob Lett 8, 3, 207–211. [CrossRef] [Google Scholar]

[10] González-López VA, Rodrigues de Moraes R (2020), A copula-based quantifying of the relationship between race inequality among neighbourhoods in São Paulo and age at death, 4open 3, 11. [CrossRef] [EDP Sciences] [Google Scholar]

[11] Rodrigues de Moraes R (2020), Eventos Caudais na Prática. Modelagem Bayesiana via Cópulas (Unpublished Master’s Thesis). [Google Scholar]

A copula based representation for tailings dam failures

Introduction

Theoretical background

Example 2.1

Model representation

Estimation

Chances for large values of Dmax

Expected Value for VF given VT

Conclusion

Acknowledgments

References

All Tables

All Figures

Chances for large values of D_max

Expected Value for V_F given V_T