Open Access
Issue
4open
Volume 3, 2020
Article Number 12
Number of page(s) 10
Section Mathematics - Applied Mathematics
DOI https://doi.org/10.1051/fopen/2020011
Published online 25 September 2020

© L.M. Canno Ferreira Fais, et al., Published by EDP Sciences, 2020

Licence Creative CommonsThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Tailings dams are scattered throughout various regions of the world, many of them already inactive, others still operating. Some countries have turned their attention to the conditions of such reservoirs, as entire regions have suffered from the impact of tailings dams breaks. The impacts of such ruptures have been incalculable, taking lives, leaving useless areas, and polluting rivers, projecting damages miles away from the reservoir site. Despite the demand imposed by several entities, the problem of regularizing these dams is complex due to several reasons. Many reservoirs are not used and their maintenance requires specialized equipment/technicians. In other cases, it is not easy to identify those responsible for the reservoirs because they work (or worked) for companies which are no longer in operation.

Several aspects must be taken into account in establishing the reliability of a tailings dam, the construction method, its size, the region where it is located, etc. For example, when talking about the region where it is located, factors such as the proximity of watercourses or cities raises the risk of the tailings dam in case of failure. Moreover, it is very common to find entire cities standing in the vicinity of reservoirs of this type, since the mining activity that surrounds the reservoir generates jobs and as a consequence facilitates the rising of communities that are sustained mainly by this activity. As expected, the databases that can help understand the dynamics of these tailings dams break are small (few cases), or at least those from which comparable information has been collected.

A valuable database of 35 tailings dam breaks reported around the world is presented by [1]. Information such as the volume of the material in a tailings dam (VT) could be used to determine the volume of the material to be released in a dam breaking (VF). The height H, attained by the material contained in the dam, and the volume of the released material could be used to estimate the distance reached by the released material (Dmax). Among the amounts to consider, the following two quantities are usually incorporated into the studies: (i) H × VF (dam factor), see [2], (ii) Hf = H × V F V T $ \frac{{V}_F}{{V}_T}$ × VF (dam factor related to the fractional volume V F V T $ \frac{{V}_F}{{V}_T}$), see [1]. In the literature there are several proposals to determine VF and Dmax. The most recent according to our knowledge is also given in [1], where the authors propose a stochastic linear relationship between log(VF) and log(VT), and also a stochastic linear relationship between Dmax (or log(Dmax)) and the dam factor version (i) or (ii).

Several aspects arise in the statistical perspective used in [1]. The first one is about the estimation of the parameters of the linear relations. A frequentist view as it is used could be compromised by the limited number of cases available in the database, since the law of large numbers (large sample size) is not feasible, which results in a lack of good properties in parameter estimators. To overcome this limitation, a Bayesian approach is then recommended. The second aspect presented in [1] is the imposition of linearity in the relationships to be estimated. This assumption is quite ambitious and, although the authors take the precaution of analyzing the goodness of the fit, the validity of the diagnoses also ends up being compromised by the size of the data set. Using the same data as [1], here we address both issues, making more flexible the relationship between VT and VF and also between dam factor and Dmax using copula models and from a Bayesian perspective.

This article is organized as follows. Section Theoretical Background exposes the theoretical foundations of the work, with a focus on dependence models (see [3]). Inspecting the dependence relationships of the data provided in [1], the model selection procedure and the referred Bayesian parameter estimation are presented in Section Model Representation. Section Chances for Large Values of Dmax shows a dam factor-based prediction study in order to determine the distance that the released material can travel after a breakdown. In this same section, we offer a strategy to identify the reservoir’s dam factor. Finally, Section Conclusion closes the paper with some general remarks and conclusions.

Theoretical background

What really interests us is the relationship between X1 and X2, where X1 is the variable VT (or dam factor) and X2 is the variable VF (or Dmax) corresponding to tailings dams. Figures 1a, 2a and 3a show the scatter plots between the pairs (see [1]), with the most extreme cases highlighted.

thumbnail Figure 1

Scatterplot for VT (×106 m3) vs. VF (×106 m3) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V.

thumbnail Figure 2

Scatterplot for H × VF vs. Dmax with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), Dmax in kilometres (km).

thumbnail Figure 3

Scatterplot for Hf vs. Dmax with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), Dmax in kilometres (km).

We approach this problem using the concept of copula, where if H is the cumulative distribution function of (X1, X2), being those continuous random variables, there is a function C such that for all (x, y) ∈ Image (X1, X2),

H ( x , y ) = C ( F 1 ( x ) , F 2 ( y ) ) , wit h F 1 ( x ) = H ( x , ) and F 2 ( y ) = H ( , y ) . $$ H(x,y)=C({F}_1(x),{F}_2(y)),\enspace \hspace{1em}\mathrm{wit}\mathrm{h}\hspace{1em}{F}_1(x)=H(x,\mathrm{\infty })\hspace{1em}\mathrm{and}\hspace{1em}{F}_2(y)=H(\mathrm{\infty },y). $$(1)

If the function C is the 2-copula of (X1, X2), C(u, v) = Prob (F1(X1) ≤ u, F2(X2) ≤ v), for u, v ∈ [0,1], then, C is the joint distribution of the variables U = $:=$ F1(X1) and V = $:=$ F2(X2). The function C is the one that we want to identify. The result given by (1) – see [4] – allows the decomposition of H into three functions: the marginal distribution F1, the marginal distribution F2, and the copula function C. F1 and F2 are marginally determined, that is, it is enough to inspect the stochastic behavior of X1 to determine F1 and similarly, X2 is enough to determine F2. Already to determine C, we need both variables duly transformed by their marginal distributions. Copula functions cover all dependence types, since ∀(u, v) ∈ [0, 1]2 and for all copula C(u, v),

W ( u , v ) = max { 0 , u + v - 1 } C ( u , v ) min { u , v } = M ( u , v ) . $$ W(u,v)=\mathrm{max}\{0,u+v-1\}\le C(u,v)\le \mathrm{min}\{u,v\}=M(u,v). $$(2)

(X1, X2) has 2-copula M (W) if and only if X2 is a monotone nondecreasing (nonincreasing) funcion of X1 almost surely (see [5]). This property demonstrates that for X1 and X2 verify a linear relationship the 2-copula of (X1, X2) can only be M (Spearman correlation coefficient = 1) or W (Spearman correlation coefficient = −1). In the next example, we show a family of copulas indexed by a parameter θ ∈ [1, ∞). This allows covering the cases in the spectrum considered by the result (2).

Example 2.1

Consider (u, v) ∈ [0, 1]2, θ ∈ [1, ∞), the Gumbel–Hougaard copula is given by,

C ( u , v θ ) = exp ( - ( ( - ln ( u ) ) θ + ( - ln ( v ) ) θ ) 1 θ ) . $$ C\left(u,v,\theta \right):=\mathrm{exp}(-((-\mathrm{ln}(u){)}^{\theta }+(-\mathrm{ln}(v){)}^{\theta }{)}^{\frac{1}{\theta }}). $$

The Gumbel–Hougaard family is a family inside the Archimedean class (see [3]). That being, then there is a function ϕ: [0, 1] → [0, ∞] associated with this family which generates it. In this case, ϕθ(t) = $:=$ (−ln(t))θ and the generation of C occurs since the following analytical expression is allowed:

C ( u , v | θ ) = ϕ θ - 1 ( ϕ θ ( u ) + ϕ θ ( v ) ) , $$ C(u,v|\theta )={\phi }_{\theta }^{-1}({\phi }_{\theta }(u)+{\phi }_{\theta }(v)), $$(3)where ϕ θ - 1 ( z ) = exp ( - z 1 θ ) . $ {\phi }_{\theta }^{-1}(z)=\mathrm{exp}(-{z}^{\frac{1}{\theta }}).$ Note that C(u, v|θ = 1) = uv (independence between the random variables) and C(u, v|θ) → M(u, v) when θ → ∞ (positive and strict dependence between the variables). This means that the Gumbel–Hougaard family allows to model positive dependence types with a spectrum that goes from independence to perfect positive dependence, not being possible for this family the cases from independence to the copula W. Also, the Kendall’s tau coefficient is given by τ θ = θ - 1 θ [ 0 , 1 ] $ {\tau }_{\theta }=\frac{\theta -1}{\theta }\in [0,\enspace 1]$ because θ ∈ [1, ∞). So, the spectrum of τθ summarizes the positive dependence attained by the family. {C(·,·|θ)}θ∈[1,∞) is a positively ordered family since, given θ, θ′ such that 1 ≤ θθ′ < ∞, C(u, v|θ) ≤ C(u, v|θ′), ∀(u,v) ∈ [0, 1]2. Features such as those mentioned for the Gumbel–Hougaard copula serve as a framework to interpret the dependence of random variables. This subject will be central in the next sections when the dependence between the variables of interest is determined.

In the application, we compare several copula models, some of them to be formulated considering the equation (3) (Archimedean copulas) with appropriate generators ϕθ, for which the pseudo-inverse of ϕθ is defined as ϕ θ [ - 1 ] ( s ) = ϕ θ - 1 ( s ) $ {\phi }_{\theta }^{\left[-1\right]}(s):={\phi }_{\theta }^{-1}(s)$ when 0 ≤ sϕθ(0) and ϕ θ [ - 1 ] ( s ) = 0 $ {\phi }_{\theta }^{\left[-1\right]}(s):=0$ if ϕθ(0) ≤ s ≤ ∞. For instance, if ϕθ(t) = 1 θ $:=\frac{1}{\theta }$(t−θ − 1), θ ∈ [−1, ∞)\{0}, the model generated by equation (3) is called Clayton family, if ϕ θ ( t ) = - ln ( e - θ t - 1 e - θ - 1 ) $ {\phi }_{\theta }(t):=-\mathrm{ln}\left(\frac{{e}^{-{\theta t}}-1}{{e}^{-\theta }-1}\right)$, θ ∈ (−∞, ∞)\{0}, the model generated by equation (3) is called Frank family. And, if ϕθ(t) = $:=$ −ln(1 − (1 − t)θ), θ ∈ [1, ∞), the model generated by equation (3) is called Joe family. Clayton and Frank reach the limits W and M, see equation (2). Joe’s family follows the same pattern as the Gumbel–Hougaard, only allowing cases from the independence to M. We also consider two well-known copulas belonging to the family of elliptical copulas with the shape,

C ( u , v | ρ ) = ψ ( ψ - 1 ( u ) , ψ - 1 ( v ) | ρ ) , ( u , v ) [ 0,1 ] 2 , $$ C(u,v|\rho )=\psi ({\psi }^{-1}(u),\hspace{1em}{\psi }^{-1}(v)|\rho ),\hspace{1em}(u,v)\in [\mathrm{0,1}{]}^2, $$(4)for an appropriate function ψ and parameter ρ ∈ [−1, 1]. (i) The Gaussian copula given by ψ(t) = $:=$ Φ(t) which is the usual cumulative standard Gaussian distribution, N(0, 1) and ψ(s, t|ρ) = $:=$ Φ(s, t|ρ) which is the bivariate standard Gaussian distribution centered in zero, N2(0, P) with P = ( 1 ρ ρ 1 ) $ \mathbf{P}=\left(\begin{array}{ll}1& \rho \\ \rho & 1\end{array}\right)$; (ii) the t-Student copula given by ψ(t) = $:=$ Tη(t) which is the cumulative of the univariate t-Student distribution with η degrees and ψ(s, t|ρ) = $:=$ Tη(s, t|ρ) that is the bivariate cumulative t-Student distribution with η degrees of freedom and ρ correlation.

As can be verified by the examples of copulas registered here, there is a huge diversity concerning the possibilities of definition for the relationships between X1 and X2. Many of these are available in statistical libraries (see for example copula package in R project [R-project],1 that in our case is used for the modeling).

With the variety of models introduced previously, we wish to cover a considerable range of dependence types that allow us to determine the best representation of the dependence between X1 and X2. For comparison between the models, we will adopt a model selection criterion (see [6]).

Model representation

In order to proceed with the estimation of the model, the original observations { ( x 1 i , x 2 i ) } i = 1 n $ \{({x}_{1i},{x}_{2i}){\}}_{i=1}^n$ are replaced by their re-scaled marginal ranks to [0, 1], u i = | { j : 1 j n , x 1 j x 1 i } | n + 1 $ {u}_i:=\frac{|\{j:1\le j\le n,\mathrm{\enspace }{x}_{1j}\le {x}_{1i}\}|}{n+1}$ and v i = | { j : 1 j n , x 2 j x 2 i } | n + 1 , $ {v}_i:=\frac{|\{j:1\le j\le n,\mathrm{\enspace }{x}_{2j}\le {x}_{2i}\}|}{n+1},$i = 1, …, n, where |A| denotes the cardinal of the set A. This step has two purposes, namely to neutralize the effect arising from the (possibly) difference in measurement scale of the original observations and to mitigate the problems of evaluating the copula likelihood function in the boundaries of the region [0, 1] × [0, 1]. This latter task is done by dividing the ranks by n + 1 so that the pseudo-observations, { u 1 i } i = 1 n $ \{{u}_{1i}{\}}_{i=1}^n$ (or { v 2 i } i = 1 n ) $ \{{v}_{2i}{\}}_{i=1}^n)$, never attain 0 or 1. Pseudo observations are nonparametric estimates of U = F1(X1) (or V = F2(X2)), furthermore, the function C is the distribution of (U, V), which leads us to infer that the dependence exposed by equation (1) is revealed when exploring the dispersion between the pseudo-observations. See the scatterplot for the three cases: (1) VT vs. VF, (2) H × VF vs. Dmax, (3) Hf vs. Dmax, of the values U vs. V represented by the pseudo-observations { ( u i , v i ) } i = 1 n $ \{({u}_i,{v}_i){\}}_{i=1}^n$, Figures 1b, 2b and 3b.

In order to guarantee some conditions required by the models, we test for stochastic independence and exchangeability. Originally proposed by [7], the independence test applied here, is based on the distance between the empirical copula and the product copula Π(u, v) = uv and is implemented using simulations executed by the indepTest() function from the copula R-package. We test H0: U and V are independent and, H0 is rejected with p-value < 0.001 in all the cases. Such evidence supports the search for a model that represents the dependence. A rather desirable property of the dependence is the exchangeability, a condition requested by many families of copulas including the Archimedean and the elliptical ones. A copula is exchangeable if the property C(u, v|θ) = C(v, u|θ) is verified for all (u, v) ∈ [0, 1]2 and the test proposed by [8], based on the empirical copula, is implemented using simulations by the function exchTest() of the copula R-package. Then, we test if H0: U and V are exchangeable, and H0 is not rejected in all the cases, with p-values much greater than the generally used significance level of 5%. Such evidence supports to use the models given by Archimedean and elliptical copulas. In order to define the appropriate copula in the three situations we use the function fitCopula(), with arguments (a) copula and (b) method being (a) “claytonCopula(dim = 2)”, “frankCopula(dim = 2)”, “gumbelCopula(dim = 2)”, “joeCopula(dim = 2)”, “normalCopula(dim = 2)”, “tCopula(dim = 2)” and being (b) method = “mpl” (maximum pseudo likelihood) which is the maximum log-likelihood (MLL) method evaluated on the pseudo observations.

Given a copula C its density c is computed and the log-likelihood is given by ln ( i = 1 n c ( u i , v i ) ) $ \mathrm{ln}({\prod }_{i=1}^n c({u}_i,{v}_i))$ which is maximized in the underlying parameters to obtain MLL ( C , { ( u i , v i ) } i = 1 n ) $ \mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)$, related to the model C and the set { ( u i , v i ) } i = 1 n $ \{({u}_i,{v}_i){\}}_{i=1}^n$. Note that five of these models have one parameter while the t-Student copula model has two parameters, so that a penalty is applied to this model in order to promote a fair selection. We consider the Bayesian Information Criterion (BIC) for this purpose, see [6],

BIC ( C , { ( u i , v i ) } i = 1 n ) = MLL ( C , { ( u i , v i ) } i = 1 n ) - 1 2 N ln ( n ) , $$ \mathrm{BIC}\enspace (C,\left\{\left({u}_i,{v}_i\right){\}}_{i=1}^n\right):=\mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)-\frac{1}{2}{N}\enspace \mathrm{ln}(n), $$(5)where N is the total number of parameters of C and n = 30 is the number of observations in the dataset. The BIC takes into account the trade-off between the maximum log-likelihood, MLL ( C , { ( u i , v i ) } i = 1 n ) $ \mathrm{MLL}\enspace (C,\{({u}_i,{v}_i){\}}_{i=1}^n)$, and the penalty imposed by the second term, 1 2 N ln ( n ) $ \frac{1}{2}{N}\enspace \mathrm{ln}(n)$, which is done to enforce parsimony, i.e., simpler models having fewer parameters. According to the BIC, the higher the value taken by the equation (5), the better the model.

In the next subsection, we show the results produced by the selection procedure (selecting the best copulas) and also the estimation of the parameters. For the latter, we show the classical estimator as well as two Bayesian versions.

Estimation

We note that in the three cases, the two best models (copulas) are the Gaussian and the Gumbel–Hougaard (see Tabs. 1 and 2). It is clear that due to the size of the database (n = 30) the reliability of the classical parameter estimates may not be the best possible. This aspect can be addressed in a timely manner, but for now we can get some conclusions about the type of relationship in each case.

Table 1

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(VT) and ranks(VF); degrees of t-Student η ̂ $ \widehat{\eta }$ = 3.529 also estimated by MLL using the tCopula() function of the copula R-package.

Table 2

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(dam factor) and ranks(Dmax). Left: dam factor = H × VF, η ̂ $ \widehat{\eta }$ = 12378.75; Right: dam factor = Hf, η ̂ $ \widehat{\eta }$ = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula R-package.

The Gaussian copula confirms that assuming linearity between X1 and X2 could not be indicated. If the relationship between these variables were in fact linear, the Gumbel–Hougaard copula could have detected this fact by attributing a very large estimate for θ (in the direction of the M(u, v) copula), see Section Theoretical Background. For all the cases, (1) VT vs. VF, (2) H × VF vs. Dmax, and (3) Hf vs. Dmax, the estimation of θ is moderate θ ̂ $ \widehat{\theta }$ ∈ (2, 3] showing evidence of the non-applicability of the M copula.

In order to introduce a level of confiability to the parametric estimation process, we conduct a Bayesian estimation. We apply Hamiltonian Monte Carlo (HMC) simulations through the rstan R-package in two Bayesian strategies, described below and in general terms, Non-Informative (NI) and Informative (I). In Table 3, we show the results for each of the three cases and for each of the two best models pointed out by the BIC, see Tables 1 and 2. For the Non-Informative cases of the Gaussian copula, we use an uniform prior distribution for ρ, properly accommodated in the interval [−1, 1], which is the range of possibilities of the ρ parameter. The Informative cases of Gaussian copula are implemented assuming a Beta priori distribution for ρ, accommodated in [−1, 1], where its mode is the one derived from the relation between Kendall’s tau coefficient and the copula association parameter. For instance, by means of the empirical estimation of Kendall’s tau coefficient we can obtain an estimation of the parameter (see iTau() function from copula R package). The NI cases of Gumbel–Hougaard copula are implemented assuming as a prior distribution on θ an improper prior distribution (proportional to 1), while the Informative cases are implemented by assuming a Gamma distribution on θ accommodated in [1, ∞) with mode indicated by the parameter estimation derived from the iTau() function.

Table 3

Summaries of the Bayesian estimation.

While the informative scenario (I) aims to follow the trend of the data, the non-informative (NI) seeks objectivity, without risking in declaring trends. Table 3 shows convergence indicators of the simulations and a descriptive summary of the values taken by the posterior distributions of ρ (or θ) obtained by the simulations. We note that Bayesian estimates (by quadratic and multilinear loss) return values close to the classic estimates reported in Tables 1 and 2, which gives coherence to the adjustments.

In this section, we have determined the best representation of the dependence between X1 and X2 in each of the three cases, which happens to be given by the Gaussian copula in the first instance and by the Gumbel–Hougaard copula in the second instance. We also implemented the Bayesian estimation of the parameters involved, determining among other aspects an estimation of the dependence structure, given by C ( u , v | ρ ̂ B ) $ C(u,v|{\widehat{\rho }}_B)$ in the case of Gaussian copula (Eq. (4)) and C ( u , v | θ ̂ B ) $ C(u,v|{\widehat{\theta }}_B)$ for the Gumbel–Hougaard copula (Example 2.1), where ρ ̂ B $ {\widehat{\rho }}_B$ and θ ̂ B $ {\widehat{\theta }}_B$ are the Bayesian estimators given by a certain loss function.

Chances for large values of Dmax

In this section, we approach the specific problem of calculating the probability that the material released by the dam will exceed a certain distance, given a specific risk represented by dam factor values. To address this issue, we use the copula determined by the BIC and the Bayesian estimate presented in Table 3. Given that the perspective via copulas translates the problem to the ranks of the observations, we first address the questions for the ranks of the observations, in this paper denoted by U and V. Now we operate with the vector (U, V) represented by the pseudo-observations { ( u i , v i ) } i = 1 n $ \{({u}_i,{v}_i){\}}_{i=1}^n$ coming from the paired values of dam factor and Dmax transformed to [0, 1] by marginal scaling ranks. Then, for each threshold v ∈ [0, 1] and interval (a, b] ⊂ [0, 1],

Prob ( V > v | U ( a , b ] ) = 1 - Prob ( V v | U ( a , b ] ) = 1 - Prob ( V v , U ( a , b ] ) Prob ( U ( a , b ] ) = 1 - Prob ( V v , U b ) - Prob ( V v , U a ) Prob ( U b ) - Prob ( U a ) = 1 - C ( b , v ) b - a + C ( a , v ) b - a . $$ \mathrm{Prob}\enspace (V>v|U\in (a,b])=1-\enspace \mathrm{Prob}\enspace (V\le v|U\in (a,b])=1-\frac{\enspace \mathrm{Prob}\enspace (V\le v,U\in (a,b])}{\enspace \mathrm{Prob}\enspace (U\in (a,b])}=1-\frac{\enspace \mathrm{Prob}\enspace (V\le v,U\le b)-\enspace \mathrm{Prob}\enspace (V\le v,U\le a)}{\enspace \mathrm{Prob}\enspace (U\le b)-\enspace \mathrm{Prob}\enspace (U\le a)}=1-\frac{C(b,v)}{b-a}+\frac{C(a,v)}{b-a}. $$(6)

Under the Gaussian copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of ρ, say ρ ̂ B . $ {\widehat{\rho }}_B.$ In the same way, under the Gumbel–Hougaard copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of θ, say θ ̂ B $ {\widehat{\theta }}_B$. Then, the estimators of (6) are,

Prob ̂ ( V > v | U ( a , b ] ) = 1 - C ( b , v | ρ ̂ B ) b - a + C ( a , v | ρ ̂ B ) b - a , $$ \widehat{\enspace \mathrm{Prob}\enspace }(V>v|U\in (a,b])=1-\frac{C(b,v|{\widehat{\rho }}_B)}{b-a}+\frac{C(a,v|{\widehat{\rho }}_B)}{b-a}, $$(7)and,

Prob ̂ ( V > v | U ( a , b ] ) = 1 - C ( b , v | θ ̂ B ) b - a + C ( a , v | θ ̂ B ) b - a , $$ \widehat{\enspace \mathrm{Prob}\enspace }(V>v|U\in (a,b])=1-\frac{C(b,v|{\widehat{\theta }}_B)}{b-a}+\frac{C(a,v|{\widehat{\theta }}_B)}{b-a}, $$(8)respectively. Each one of the estimates (Eqs. (7) and (8)) presents two versions, one under the Non-Informative setting and another under the Informative setting, see Table 3. Tables 4 and 5 record the estimated values according to equations (7) and (8). Table 4 shows the estimates when the dam factor is H × VF and Table 5 shows the estimates when the dam factor is Hf. We note that the highest probabilities occur with the estimate implemented by the Gumbel–Hougaard copula (Eq. (8)) for the intervals of the extremes (0, 0.25] and (0.75, 1], while for the central intervals, the Gaussian copula (Eq. (7)) offers the highest values. This behavior is displayed for both types of dam factor (Tabs. 4 and 5). The Gumbel–Hougaard copula is well known as having more realistic properties in extreme values (see [9]); in the copula’s framework, this means near to the values 0 or 1.

Table 4

Case H × VFvs. Dmax (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, Pi = Probi (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

Table 5

Case Hf vs. Dmax (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, Pi = Probi (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

In Tables 6 and 7, we associate the [a, b) intervals (a, b ∈ [0, 1]) with dam factor intervals in the originally observed scale. We also associate the v values with the Dmax thresholds in the original scale, according to the original data. Since it is a small database of 30 observations, each interval (in Tab. 6) on the original scale corresponds to a small number of cases, on average approximately 30/4.

Table 6

Relation between the intervals (a, b] and the dam factor intervals.

Table 7

Relation between v and the Dmax thresholds.

To compute probabilities in real cases, it is enough to identify on the original scale the dam factor interval associated with U ∈ (a, b] and also the threshold in V, v associated with Dmax. For instance, consider the case 11 of [1], first line in Table 8. Both versions of dam factor are in the interval (66.5, 2880], which is associated with (a, b] = (0.75, 1], see Table 6. This means that, according to the estimations given by Tables 4 and 5, the probability P(V > 0.75|U ∈ (0.75, 1]) is ≥0.6. While the threshold v = 0.75 is related to Dmax = 25 (see Tab. 7). This already indicates that there is a high probability of the released material traveling 25 km or more. Besides, the computation by the Gumbel–Hougaard copula offers estimates that exceed 0.7. In fact, this case has an observed Dmax = 120. This simple exercise of using Tables 4 and 5 shows the ability of the copulas to identify risks in real situations, for example, if one wants to calculate the probability of the material released reaching a certain point as a city, a river, etc. The practical impact of these estimates also exposes the need of defining an estimator of the dam factor. That, according to the notions used here H × VF and Hf = H × V F V T $ \frac{{V}_F}{{V}_T}$ × VF, implies estimating VF.

Table 8

Special cases.

The next subsection is intended to estimate VF, since an adequate estimate of VF guarantees a reliable estimate of the dam factor. For such, we will take the VT value as an available information.

Expected Value for VF given VT

As proved in [10], the expected value E ( V | U ( c , d ] ) $ \mathbb{E}(V|U\in (c,d])$ can be computed by integrating the copula function,

E ( V | U ( c , d ] ) = 1 - 1 d - c ( 0 1 C ( d , v ) d v - 0 1 C ( c , v ) d v ) . $$ \mathbb{E}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v)\mathrm{d}v-{\int }_0^1 C(c,v)\mathrm{d}v\right). $$(9)

Under the Gaussian copula we estimate the equation (9) by means of the Bayesian estimator by quadratic loss function of ρ, say ρ ̂ B $ {\widehat{\rho }}_B$ (see Tab. 3). The estimator of equation (9) is defined as,

E ̂ ( V | U ( c , d ] ) = 1 - 1 d - c ( 0 1 C ( d , v | ρ ̂ B ) d v - 0 1 C ( c , v | ρ ̂ B ) d v ) . $$ \widehat{\mathbb{E}}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v|{\widehat{\rho }}_B)\mathrm{d}v-{\int }_0^1 C(c,v|{\widehat{\rho }}_B)\mathrm{d}v\right). $$(10)

In the same way, and under the Gumbel–Hougaard copula we estimate the equation (9) using the Bayesian estimator by quadratic loss function of θ, say θ ̂ B $ {\widehat{\theta }}_B$ (see Tab. 3). Here is the estimator:

E ̂ ( V | U ( c , d ] ) = 1 - 1 d - c ( 0 1 C ( d , v | θ ̂ B ) d v - 0 1 C ( c , v | θ ̂ B ) d v ) . $$ \widehat{\mathbb{E}}(V|U\in (c,d])=1-\frac{1}{d-c}\left({\int }_0^1 C(d,v|{\widehat{\theta }}_B)\mathrm{d}v-{\int }_0^1 C(c,v|{\widehat{\theta }}_B)\mathrm{d}v\right). $$(11)

In Table 9, we clearly visualize the effect of the copula on the magnitude of the mean values, the Gumbel–Hougaard copula offers the highest values in the extreme intervals and the Gaussian copula does it in the central intervals. As expected, with increasing the U interval (VT scaled to [0, 1]) the mean of V increases (VF scaled to [0, 1]).

Table 9

E i $ {\mathbb{E}}_i$, i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), E i $ {\mathbb{E}}_i$, i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.

Table 10

Original scale. E i $ {\mathbb{E}}_i$, i = 1, 2 computed using Gumbel–Hougaard copula, E i $ {\mathbb{E}}_i$, i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.

In Table 10, we record in the first column the ranges of VF values corresponding to the intervals to the left of Table 9, these last ranges made in the percentages of VF. Table 10 offers naive estimates for VF, based on VT intervals. For example, E ̂ 2 ( V F | V T ( c , d ] ) $ {\widehat{\mathbb{E}}}_2({V}_F|{V}_T\in (c,d])$ is the estimator of VF when VT ∈ (c, d] and computed from the Gumbel–Hougaard copula with an Informative setting.

Next, we compare the intervals in which the naive estimates fall, with those reported in Table 8 and, according to Table 6. Regarding line 1 of Table 8: since VT = 12.34, this value is associated with the interval (7.040, 74.00] of Table 10, which brings us to an average of VF estimated as E ̂ i $ {\widehat{\mathbb{E}}}_i$(VFVT ∈ (7.04, 74]) = 32. When calculating the dam factor-(i), we obtain H × E ̂ i $ {\widehat{\mathbb{E}}}_i$(VF|VT ∈ (7.04, 74]) = 15 × 32 = 480 ∈ (66.5, 2880], note that the observed value 135 (Tab. 8) is also in the interval (66.5, 2880]. Computing the dam factor-(ii), we obtain H × E ̂ i ( V F | V T ( 7.04,74 ] ) V T $ \frac{{\widehat{\mathbb{E}}}_i({V}_F|{V}_T\in (\mathrm{7.04,74}])}{{V}_T}$ × E ̂ i $ {\widehat{\mathbb{E}}}_i$(VF|VT ∈ (7.04, 74]) = 15 × 32/12.34 × 32 = 1244.733 ∈ (13.85, 1675.64], which is in fact the interval in which was found the observed data (98.46). In other words, the estimates and observed values are accommodated in the same risk interval, reported in Table 6. If we consider the Samarco case (second case in Tab. 8), we have that the estimation is E ̂ i $ {\widehat{\mathbb{E}}}_i$(VF|VT ∈ (7.04, 74]) = 32, since the observed VT = 55 ∈ (7.04, 74], and in fact the observed value is VF = 32. Then, the dam factor estimates are the same as those recorded in Table 8.

Conclusion

In this article, we model the dependence between each version of the dam factor and Dmax with dam factor given by the risk factor of a reservoir that depends on its dam’s height (total volume) H (VT) and the volume of material released VF after a collapse. Dmax is given by the maximum distance reached by the dispensed material. In this way, it is possible to accurately describe the performance of the dependence relationship between each dam factor and Dmax. Furthermore, using the BIC criterion we can point out the best representation, given in our case by the Gaussian copula (see Tab. 2). We were also able to delineate the dependence between the volume available from a reservoir VT and the material’s volume VF capable of being expelled after a collapse (see Tab. 1).

Through a Bayesian study, we offer scenarios that aim to give flexibility to the representation of the dependence. Under an assumption of lack of prior information, the NI scenario would be the one indicated (see Tab. 3). With the alliance between Bayesian tools and copula models we can determine tail probabilities of Dmax values scaled to [0, 1] and conditioned at dam factor intervals (see Tabs. 4 and 5). With them, we determine risks for reservoirs, depending on the dam factor.

We also present in this article a Bayesian and non-parametric method to infer the dam factor of a reservoir (see Sect. Expected Value for VF Given VT). We show its efficiency for some available cases in the literature. Note that such a method could be improved by univariate modeling techniques in future research studies. We see that the Gaussian copula has a competitor, which is the Gumbel–Hougaard copula (Tabs. 1 and 2), which offers slightly different (and higher) probabilities for extreme events (see Tabs. 4 and 5). On the other hand, when using the Gumbel–Hougaard copula in calculating the dam factor (see Tabs. 9 and 10), it produces results comparable to those of the Gaussian copula, which shows that the notion of mean hides certain stochastic characteristics. For a complement of other aspects addressed in this problem, see [11]. We consider this work as the starting point of other approaches enriched with marginal models, since here we treat the marginal effect only from a non-parametric perspective (ranks), which may show restrictions, in the presence of few data.

Acknowledgments

R. Rodrigues de Moraes gratefully acknowledge the partial financial support provided by CAPES and CNPq with fellowships from the Master Program in Statistics of University of Campinas. Also, the authors wish to thank the two referees for their many useful comments and suggestions on an earlier draft of this paper.


References

  1. Concha Larrauri P, Lall U (2018), Tailings dams failures: Updated statistical model for discharge volume and runout. Environments 5, 2, 28. [Google Scholar]
  2. Rico M, Benito G, Diez-Herrero A (2008), Floods from tailings dam failures. J Hazard Mater 154, 1–3, 79–87. [Google Scholar]
  3. Nelsen RB (2007), An introduction to copulas. Springer Science & Business Media. [Google Scholar]
  4. Sklar M (1959), Fonctions de repartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8, 229–231. [Google Scholar]
  5. Mikusinski P, Sherwood H, Taylor MD (1991), The Fréchet bounds revisited. Real Anal Exch 17, 2, 759–764. [CrossRef] [Google Scholar]
  6. Schwarz G (1978), Estimating the dimension of a model. Ann Stat 6, 2, 461–464. [Google Scholar]
  7. Deheuvels P (1981), A non parametric test for independence. Publ Inst Stat Univ Paris 26, 29–50. [Google Scholar]
  8. Genest C, Nešlehová J (2012), Tests of symmetry for bivariate copulas. Ann Inst Stat Math 64, 4, 811–834. [Google Scholar]
  9. Genest C, Rivest LP (1989), A characterization of Gumbel’s family of extreme value distributions. Stat Prob Lett 8, 3, 207–211. [CrossRef] [Google Scholar]
  10. González-López VA, Rodrigues de Moraes R (2020), A copula-based quantifying of the relationship between race inequality among neighbourhoods in São Paulo and age at death, 4open 3, 11. [CrossRef] [EDP Sciences] [Google Scholar]
  11. Rodrigues de Moraes R (2020), Eventos Caudais na Prática. Modelagem Bayesiana via Cópulas (Unpublished Master’s Thesis). [Google Scholar]

Cite this article as: Canno Ferreira Fais LM, González-López VA, Rodrigues DS & Rodrigues de Moraes R 2020. A copula based representation for tailings dam failures. 4open, 3, 12

All Tables

Table 1

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(VT) and ranks(VF); degrees of t-Student η ̂ $ \widehat{\eta }$ = 3.529 also estimated by MLL using the tCopula() function of the copula R-package.

Table 2

Parameters estimated by Maximum Log-Likelihood (MLL) and BIC values for the copula between the pseudo-observations ranks(dam factor) and ranks(Dmax). Left: dam factor = H × VF, η ̂ $ \widehat{\eta }$ = 12378.75; Right: dam factor = Hf, η ̂ $ \widehat{\eta }$ = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula R-package.

Table 3

Summaries of the Bayesian estimation.

Table 4

Case H × VFvs. Dmax (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, Pi = Probi (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

Table 5

Case Hf vs. Dmax (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, Pi = Probi (V > v|U ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.

Table 6

Relation between the intervals (a, b] and the dam factor intervals.

Table 7

Relation between v and the Dmax thresholds.

Table 8

Special cases.

Table 9

E i $ {\mathbb{E}}_i$, i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), E i $ {\mathbb{E}}_i$, i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.

Table 10

Original scale. E i $ {\mathbb{E}}_i$, i = 1, 2 computed using Gumbel–Hougaard copula, E i $ {\mathbb{E}}_i$, i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.

All Figures

thumbnail Figure 1

Scatterplot for VT (×106 m3) vs. VF (×106 m3) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V.

In the text
thumbnail Figure 2

Scatterplot for H × VF vs. Dmax with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), Dmax in kilometres (km).

In the text
thumbnail Figure 3

Scatterplot for Hf vs. Dmax with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), Dmax in kilometres (km).

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.