Issue 
4open
Volume 3, 2020



Article Number  12  
Number of page(s)  10  
Section  Mathematics  Applied Mathematics  
DOI  https://doi.org/10.1051/fopen/2020011  
Published online  25 September 2020 
Research Article
A copula based representation for tailings dam failures
^{1}
School of Technology, University of Campinas, Paschoal Marmo 1888, 13484332 Limeira, S.P., Brazil
^{2}
Department of Statistics, University of Campinas, Sergio Buarque de Holanda 651, 13083859 Campinas, S.P., Brazil
^{*} Corresponding author: rafael.moraes@gmx.de
Received:
16
March
2020
Accepted:
13
August
2020
In this article, we model the dependence between dam factor and D _{max}, where dam factor is an indicator of risk of a tailings dam failure, which involves the height H of the tailings dam, the volume of material housed by the tailings dam V_{T} and the volume dispensed by the tailings dam, V_{F}, when the dam breaks. And, D_{max} is the maximum distance traveled by the material released by the tailings dam, after the collapse. With the dependence found via copula models and Bayesian estimation, given a range of dam factor, we estimate the probability of the released material to exceed a certain threshold. Since the dam factor involves the released volume V_{F} (unknown before the dam break), we present a naive way to estimate it using V_{T} and H. In this way, it is possible to estimate the dam factor of a tailings dam and with such a value to identify the probability of the tailings dam to show a D_{max} that exceeds a certain threshold.
Key words: Copula models / Bayesian estimation / Conditional probability
© L.M. Canno Ferreira Fais, et al., Published by EDP Sciences, 2020
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Introduction
Tailings dams are scattered throughout various regions of the world, many of them already inactive, others still operating. Some countries have turned their attention to the conditions of such reservoirs, as entire regions have suffered from the impact of tailings dams breaks. The impacts of such ruptures have been incalculable, taking lives, leaving useless areas, and polluting rivers, projecting damages miles away from the reservoir site. Despite the demand imposed by several entities, the problem of regularizing these dams is complex due to several reasons. Many reservoirs are not used and their maintenance requires specialized equipment/technicians. In other cases, it is not easy to identify those responsible for the reservoirs because they work (or worked) for companies which are no longer in operation.
Several aspects must be taken into account in establishing the reliability of a tailings dam, the construction method, its size, the region where it is located, etc. For example, when talking about the region where it is located, factors such as the proximity of watercourses or cities raises the risk of the tailings dam in case of failure. Moreover, it is very common to find entire cities standing in the vicinity of reservoirs of this type, since the mining activity that surrounds the reservoir generates jobs and as a consequence facilitates the rising of communities that are sustained mainly by this activity. As expected, the databases that can help understand the dynamics of these tailings dams break are small (few cases), or at least those from which comparable information has been collected.
A valuable database of 35 tailings dam breaks reported around the world is presented by [1]. Information such as the volume of the material in a tailings dam (V_{T}) could be used to determine the volume of the material to be released in a dam breaking (V_{F}). The height H, attained by the material contained in the dam, and the volume of the released material could be used to estimate the distance reached by the released material (D_{max}). Among the amounts to consider, the following two quantities are usually incorporated into the studies: (i) H × V_{F} (dam factor), see [2], (ii) H_{f} = H × × V_{F} (dam factor related to the fractional volume ), see [1]. In the literature there are several proposals to determine V_{F} and D_{max}. The most recent according to our knowledge is also given in [1], where the authors propose a stochastic linear relationship between log(V_{F}) and log(V_{T}), and also a stochastic linear relationship between D_{max} (or log(D_{max})) and the dam factor version (i) or (ii).
Several aspects arise in the statistical perspective used in [1]. The first one is about the estimation of the parameters of the linear relations. A frequentist view as it is used could be compromised by the limited number of cases available in the database, since the law of large numbers (large sample size) is not feasible, which results in a lack of good properties in parameter estimators. To overcome this limitation, a Bayesian approach is then recommended. The second aspect presented in [1] is the imposition of linearity in the relationships to be estimated. This assumption is quite ambitious and, although the authors take the precaution of analyzing the goodness of the fit, the validity of the diagnoses also ends up being compromised by the size of the data set. Using the same data as [1], here we address both issues, making more flexible the relationship between V_{T} and V_{F} and also between dam factor and D_{max} using copula models and from a Bayesian perspective.
This article is organized as follows. Section Theoretical Background exposes the theoretical foundations of the work, with a focus on dependence models (see [3]). Inspecting the dependence relationships of the data provided in [1], the model selection procedure and the referred Bayesian parameter estimation are presented in Section Model Representation. Section Chances for Large Values of Dmax shows a dam factorbased prediction study in order to determine the distance that the released material can travel after a breakdown. In this same section, we offer a strategy to identify the reservoir’s dam factor. Finally, Section Conclusion closes the paper with some general remarks and conclusions.
Theoretical background
What really interests us is the relationship between X_{1} and X_{2}, where X_{1} is the variable V_{T} (or dam factor) and X_{2} is the variable V_{F} (or D_{max}) corresponding to tailings dams. Figures 1a, 2a and 3a show the scatter plots between the pairs (see [1]), with the most extreme cases highlighted.
Figure 1
Scatterplot for V_{T} (×10^{6} m^{3}) vs. V_{F} (×10^{6} m^{3}) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. 
Figure 2
Scatterplot for H × V_{F} vs. D_{max} with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_{max} in kilometres (km). 
Figure 3
Scatterplot for H_{f} vs. D_{max} with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_{max} in kilometres (km). 
We approach this problem using the concept of copula, where if H is the cumulative distribution function of (X_{1}, X_{2}), being those continuous random variables, there is a function C such that for all (x, y) ∈ Image (X_{1}, X_{2}),
If the function C is the 2copula of (X_{1}, X_{2}), C(u, v) = Prob (F_{1}(X_{1}) ≤ u, F_{2}(X_{2}) ≤ v), for u, v ∈ [0,1], then, C is the joint distribution of the variables U F_{1}(X_{1}) and V F_{2}(X_{2}). The function C is the one that we want to identify. The result given by (1) – see [4] – allows the decomposition of H into three functions: the marginal distribution F_{1}, the marginal distribution F_{2}, and the copula function C. F_{1} and F_{2} are marginally determined, that is, it is enough to inspect the stochastic behavior of X_{1} to determine F_{1} and similarly, X_{2} is enough to determine F_{2}. Already to determine C, we need both variables duly transformed by their marginal distributions. Copula functions cover all dependence types, since ∀(u, v) ∈ [0, 1]^{2} and for all copula C(u, v),
(X_{1}, X_{2}) has 2copula M (W) if and only if X_{2} is a monotone nondecreasing (nonincreasing) funcion of X_{1} almost surely (see [5]). This property demonstrates that for X_{1} and X_{2} verify a linear relationship the 2copula of (X_{1}, X_{2}) can only be M (Spearman correlation coefficient = 1) or W (Spearman correlation coefficient = −1). In the next example, we show a family of copulas indexed by a parameter θ ∈ [1, ∞). This allows covering the cases in the spectrum considered by the result (2).
Example 2.1
Consider (u, v) ∈ [0, 1]^{2}, θ ∈ [1, ∞), the Gumbel–Hougaard copula is given by,
The Gumbel–Hougaard family is a family inside the Archimedean class (see [3]). That being, then there is a function ϕ: [0, 1] → [0, ∞] associated with this family which generates it. In this case, ϕ_{θ}(t) (−ln(t))^{θ} and the generation of C occurs since the following analytical expression is allowed:
(3)where Note that C(u, vθ = 1) = uv (independence between the random variables) and C(u, vθ) → M(u, v) when θ → ∞ (positive and strict dependence between the variables). This means that the Gumbel–Hougaard family allows to model positive dependence types with a spectrum that goes from independence to perfect positive dependence, not being possible for this family the cases from independence to the copula W. Also, the Kendall’s tau coefficient is given by because θ ∈ [1, ∞). So, the spectrum of τ_{θ} summarizes the positive dependence attained by the family. {C(·,·θ)}_{θ∈[1,∞)} is a positively ordered family since, given θ, θ′ such that 1 ≤ θ ≤ θ′ < ∞, C(u, vθ) ≤ C(u, vθ′), ∀(u,v) ∈ [0, 1]^{2}. Features such as those mentioned for the Gumbel–Hougaard copula serve as a framework to interpret the dependence of random variables. This subject will be central in the next sections when the dependence between the variables of interest is determined.
In the application, we compare several copula models, some of them to be formulated considering the equation (3) (Archimedean copulas) with appropriate generators ϕ_{θ}, for which the pseudoinverse of ϕ_{θ} is defined as when 0 ≤ s ≤ ϕ_{θ}(0) and if ϕ_{θ}(0) ≤ s ≤ ∞. For instance, if ϕ_{θ}(t) (t^{−θ} − 1), θ ∈ [−1, ∞)\{0}, the model generated by equation (3) is called Clayton family, if , θ ∈ (−∞, ∞)\{0}, the model generated by equation (3) is called Frank family. And, if ϕ_{θ}(t) −ln(1 − (1 − t)^{θ}), θ ∈ [1, ∞), the model generated by equation (3) is called Joe family. Clayton and Frank reach the limits W and M, see equation (2). Joe’s family follows the same pattern as the Gumbel–Hougaard, only allowing cases from the independence to M. We also consider two wellknown copulas belonging to the family of elliptical copulas with the shape,
(4)for an appropriate function ψ and parameter ρ ∈ [−1, 1]. (i) The Gaussian copula given by ψ(t) Φ(t) which is the usual cumulative standard Gaussian distribution, N(0, 1) and ψ(s, tρ) Φ(s, tρ) which is the bivariate standard Gaussian distribution centered in zero, N_{2}(0, P) with ; (ii) the tStudent copula given by ψ(t) T_{η}(t) which is the cumulative of the univariate tStudent distribution with η degrees and ψ(s, tρ) T_{η}(s, tρ) that is the bivariate cumulative tStudent distribution with η degrees of freedom and ρ correlation.
As can be verified by the examples of copulas registered here, there is a huge diversity concerning the possibilities of definition for the relationships between X_{1} and X_{2}. Many of these are available in statistical libraries (see for example copula package in R project [Rproject],^{1} that in our case is used for the modeling).
With the variety of models introduced previously, we wish to cover a considerable range of dependence types that allow us to determine the best representation of the dependence between X_{1} and X_{2}. For comparison between the models, we will adopt a model selection criterion (see [6]).
Model representation
In order to proceed with the estimation of the model, the original observations are replaced by their rescaled marginal ranks to [0, 1], and i = 1, …, n, where A denotes the cardinal of the set A. This step has two purposes, namely to neutralize the effect arising from the (possibly) difference in measurement scale of the original observations and to mitigate the problems of evaluating the copula likelihood function in the boundaries of the region [0, 1] × [0, 1]. This latter task is done by dividing the ranks by n + 1 so that the pseudoobservations, (or , never attain 0 or 1. Pseudo observations are nonparametric estimates of U = F_{1}(X_{1}) (or V = F_{2}(X_{2})), furthermore, the function C is the distribution of (U, V), which leads us to infer that the dependence exposed by equation (1) is revealed when exploring the dispersion between the pseudoobservations. See the scatterplot for the three cases: (1) V_{T} vs. V_{F}, (2) H × V_{F} vs. D_{max}, (3) H_{f} vs. D_{max}, of the values U vs. V represented by the pseudoobservations , Figures 1b, 2b and 3b.
In order to guarantee some conditions required by the models, we test for stochastic independence and exchangeability. Originally proposed by [7], the independence test applied here, is based on the distance between the empirical copula and the product copula Π(u, v) = uv and is implemented using simulations executed by the indepTest() function from the copula Rpackage. We test H_{0}: U and V are independent and, H_{0} is rejected with pvalue < 0.001 in all the cases. Such evidence supports the search for a model that represents the dependence. A rather desirable property of the dependence is the exchangeability, a condition requested by many families of copulas including the Archimedean and the elliptical ones. A copula is exchangeable if the property C(u, vθ) = C(v, uθ) is verified for all (u, v) ∈ [0, 1]^{2} and the test proposed by [8], based on the empirical copula, is implemented using simulations by the function exchTest() of the copula Rpackage. Then, we test if H_{0}: U and V are exchangeable, and H_{0} is not rejected in all the cases, with pvalues much greater than the generally used significance level of 5%. Such evidence supports to use the models given by Archimedean and elliptical copulas. In order to define the appropriate copula in the three situations we use the function fitCopula(), with arguments (a) copula and (b) method being (a) “claytonCopula(dim = 2)”, “frankCopula(dim = 2)”, “gumbelCopula(dim = 2)”, “joeCopula(dim = 2)”, “normalCopula(dim = 2)”, “tCopula(dim = 2)” and being (b) method = “mpl” (maximum pseudo likelihood) which is the maximum loglikelihood (MLL) method evaluated on the pseudo observations.
Given a copula C its density c is computed and the loglikelihood is given by which is maximized in the underlying parameters to obtain , related to the model C and the set . Note that five of these models have one parameter while the tStudent copula model has two parameters, so that a penalty is applied to this model in order to promote a fair selection. We consider the Bayesian Information Criterion (BIC) for this purpose, see [6],
(5)where N is the total number of parameters of C and n = 30 is the number of observations in the dataset. The BIC takes into account the tradeoff between the maximum loglikelihood, , and the penalty imposed by the second term, , which is done to enforce parsimony, i.e., simpler models having fewer parameters. According to the BIC, the higher the value taken by the equation (5), the better the model.
In the next subsection, we show the results produced by the selection procedure (selecting the best copulas) and also the estimation of the parameters. For the latter, we show the classical estimator as well as two Bayesian versions.
Estimation
We note that in the three cases, the two best models (copulas) are the Gaussian and the Gumbel–Hougaard (see Tabs. 1 and 2). It is clear that due to the size of the database (n = 30) the reliability of the classical parameter estimates may not be the best possible. This aspect can be addressed in a timely manner, but for now we can get some conclusions about the type of relationship in each case.
Parameters estimated by Maximum LogLikelihood (MLL) and BIC values for the copula between the pseudoobservations ranks(V_{T}) and ranks(V_{F}); degrees of tStudent = 3.529 also estimated by MLL using the tCopula() function of the copula Rpackage.
Parameters estimated by Maximum LogLikelihood (MLL) and BIC values for the copula between the pseudoobservations ranks(dam factor) and ranks(D_{max}). Left: dam factor = H × V_{F}, = 12378.75; Right: dam factor = H_{f}, = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula Rpackage.
The Gaussian copula confirms that assuming linearity between X_{1} and X_{2} could not be indicated. If the relationship between these variables were in fact linear, the Gumbel–Hougaard copula could have detected this fact by attributing a very large estimate for θ (in the direction of the M(u, v) copula), see Section Theoretical Background. For all the cases, (1) V_{T} vs. V_{F}, (2) H × V_{F} vs. D_{max}, and (3) H_{f} vs. D_{max}, the estimation of θ is moderate ∈ (2, 3] showing evidence of the nonapplicability of the M copula.
In order to introduce a level of confiability to the parametric estimation process, we conduct a Bayesian estimation. We apply Hamiltonian Monte Carlo (HMC) simulations through the rstan Rpackage in two Bayesian strategies, described below and in general terms, NonInformative (NI) and Informative (I). In Table 3, we show the results for each of the three cases and for each of the two best models pointed out by the BIC, see Tables 1 and 2. For the NonInformative cases of the Gaussian copula, we use an uniform prior distribution for ρ, properly accommodated in the interval [−1, 1], which is the range of possibilities of the ρ parameter. The Informative cases of Gaussian copula are implemented assuming a Beta priori distribution for ρ, accommodated in [−1, 1], where its mode is the one derived from the relation between Kendall’s tau coefficient and the copula association parameter. For instance, by means of the empirical estimation of Kendall’s tau coefficient we can obtain an estimation of the parameter (see iTau() function from copula R package). The NI cases of Gumbel–Hougaard copula are implemented assuming as a prior distribution on θ an improper prior distribution (proportional to 1), while the Informative cases are implemented by assuming a Gamma distribution on θ accommodated in [1, ∞) with mode indicated by the parameter estimation derived from the iTau() function.
Summaries of the Bayesian estimation.
While the informative scenario (I) aims to follow the trend of the data, the noninformative (NI) seeks objectivity, without risking in declaring trends. Table 3 shows convergence indicators of the simulations and a descriptive summary of the values taken by the posterior distributions of ρ (or θ) obtained by the simulations. We note that Bayesian estimates (by quadratic and multilinear loss) return values close to the classic estimates reported in Tables 1 and 2, which gives coherence to the adjustments.
In this section, we have determined the best representation of the dependence between X_{1} and X_{2} in each of the three cases, which happens to be given by the Gaussian copula in the first instance and by the Gumbel–Hougaard copula in the second instance. We also implemented the Bayesian estimation of the parameters involved, determining among other aspects an estimation of the dependence structure, given by in the case of Gaussian copula (Eq. (4)) and for the Gumbel–Hougaard copula (Example 2.1), where and are the Bayesian estimators given by a certain loss function.
Chances for large values of D_{max}
In this section, we approach the specific problem of calculating the probability that the material released by the dam will exceed a certain distance, given a specific risk represented by dam factor values. To address this issue, we use the copula determined by the BIC and the Bayesian estimate presented in Table 3. Given that the perspective via copulas translates the problem to the ranks of the observations, we first address the questions for the ranks of the observations, in this paper denoted by U and V. Now we operate with the vector (U, V) represented by the pseudoobservations coming from the paired values of dam factor and D_{max} transformed to [0, 1] by marginal scaling ranks. Then, for each threshold v ∈ [0, 1] and interval (a, b] ⊂ [0, 1],
Under the Gaussian copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of ρ, say In the same way, under the Gumbel–Hougaard copula, we estimate the probability (6) using the Bayesian estimator by quadratic loss function of θ, say . Then, the estimators of (6) are,
(8)respectively. Each one of the estimates (Eqs. (7) and (8)) presents two versions, one under the NonInformative setting and another under the Informative setting, see Table 3. Tables 4 and 5 record the estimated values according to equations (7) and (8). Table 4 shows the estimates when the dam factor is H × V_{F} and Table 5 shows the estimates when the dam factor is H_{f}. We note that the highest probabilities occur with the estimate implemented by the Gumbel–Hougaard copula (Eq. (8)) for the intervals of the extremes (0, 0.25] and (0.75, 1], while for the central intervals, the Gaussian copula (Eq. (7)) offers the highest values. This behavior is displayed for both types of dam factor (Tabs. 4 and 5). The Gumbel–Hougaard copula is well known as having more realistic properties in extreme values (see [9]); in the copula’s framework, this means near to the values 0 or 1.
Case H × V_{F}vs. D_{max} (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_{i} = Prob_{i} (V > vU ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.
Case H_{f} vs. D_{max} (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_{i} = Prob_{i} (V > vU ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.
In Tables 6 and 7, we associate the [a, b) intervals (a, b ∈ [0, 1]) with dam factor intervals in the originally observed scale. We also associate the v values with the D_{max} thresholds in the original scale, according to the original data. Since it is a small database of 30 observations, each interval (in Tab. 6) on the original scale corresponds to a small number of cases, on average approximately 30/4.
Relation between the intervals (a, b] and the dam factor intervals.
Relation between v and the D_{max} thresholds.
To compute probabilities in real cases, it is enough to identify on the original scale the dam factor interval associated with U ∈ (a, b] and also the threshold in V, v associated with D_{max}. For instance, consider the case 11 of [1], first line in Table 8. Both versions of dam factor are in the interval (66.5, 2880], which is associated with (a, b] = (0.75, 1], see Table 6. This means that, according to the estimations given by Tables 4 and 5, the probability P(V > 0.75U ∈ (0.75, 1]) is ≥0.6. While the threshold v = 0.75 is related to D_{max} = 25 (see Tab. 7). This already indicates that there is a high probability of the released material traveling 25 km or more. Besides, the computation by the Gumbel–Hougaard copula offers estimates that exceed 0.7. In fact, this case has an observed D_{max} = 120. This simple exercise of using Tables 4 and 5 shows the ability of the copulas to identify risks in real situations, for example, if one wants to calculate the probability of the material released reaching a certain point as a city, a river, etc. The practical impact of these estimates also exposes the need of defining an estimator of the dam factor. That, according to the notions used here H × V_{F} and H_{f} = H × × V_{F}, implies estimating V_{F}.
Special cases.
The next subsection is intended to estimate V_{F}, since an adequate estimate of V_{F} guarantees a reliable estimate of the dam factor. For such, we will take the V_{T} value as an available information.
Expected Value for V_{F} given V_{T}
As proved in [10], the expected value can be computed by integrating the copula function,
Under the Gaussian copula we estimate the equation (9) by means of the Bayesian estimator by quadratic loss function of ρ, say (see Tab. 3). The estimator of equation (9) is defined as,
In the same way, and under the Gumbel–Hougaard copula we estimate the equation (9) using the Bayesian estimator by quadratic loss function of θ, say (see Tab. 3). Here is the estimator:
In Table 9, we clearly visualize the effect of the copula on the magnitude of the mean values, the Gumbel–Hougaard copula offers the highest values in the extreme intervals and the Gaussian copula does it in the central intervals. As expected, with increasing the U interval (V_{T} scaled to [0, 1]) the mean of V increases (V_{F} scaled to [0, 1]).
, i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), , i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.
Original scale. , i = 1, 2 computed using Gumbel–Hougaard copula, , i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.
In Table 10, we record in the first column the ranges of V_{F} values corresponding to the intervals to the left of Table 9, these last ranges made in the percentages of V_{F}. Table 10 offers naive estimates for V_{F}, based on V_{T} intervals. For example, is the estimator of V_{F} when V_{T} ∈ (c, d] and computed from the Gumbel–Hougaard copula with an Informative setting.
Next, we compare the intervals in which the naive estimates fall, with those reported in Table 8 and, according to Table 6. Regarding line 1 of Table 8: since V_{T} = 12.34, this value is associated with the interval (7.040, 74.00] of Table 10, which brings us to an average of V_{F} estimated as (V_{F}│V_{T} ∈ (7.04, 74]) = 32. When calculating the dam factor(i), we obtain H × (V_{F}V_{T} ∈ (7.04, 74]) = 15 × 32 = 480 ∈ (66.5, 2880], note that the observed value 135 (Tab. 8) is also in the interval (66.5, 2880]. Computing the dam factor(ii), we obtain H × × (V_{F}V_{T} ∈ (7.04, 74]) = 15 × 32/12.34 × 32 = 1244.733 ∈ (13.85, 1675.64], which is in fact the interval in which was found the observed data (98.46). In other words, the estimates and observed values are accommodated in the same risk interval, reported in Table 6. If we consider the Samarco case (second case in Tab. 8), we have that the estimation is (V_{F}V_{T} ∈ (7.04, 74]) = 32, since the observed V_{T} = 55 ∈ (7.04, 74], and in fact the observed value is V_{F} = 32. Then, the dam factor estimates are the same as those recorded in Table 8.
Conclusion
In this article, we model the dependence between each version of the dam factor and D_{max} with dam factor given by the risk factor of a reservoir that depends on its dam’s height (total volume) H (V_{T}) and the volume of material released V_{F} after a collapse. D_{max} is given by the maximum distance reached by the dispensed material. In this way, it is possible to accurately describe the performance of the dependence relationship between each dam factor and D_{max}. Furthermore, using the BIC criterion we can point out the best representation, given in our case by the Gaussian copula (see Tab. 2). We were also able to delineate the dependence between the volume available from a reservoir V_{T} and the material’s volume V_{F} capable of being expelled after a collapse (see Tab. 1).
Through a Bayesian study, we offer scenarios that aim to give flexibility to the representation of the dependence. Under an assumption of lack of prior information, the NI scenario would be the one indicated (see Tab. 3). With the alliance between Bayesian tools and copula models we can determine tail probabilities of D_{max} values scaled to [0, 1] and conditioned at dam factor intervals (see Tabs. 4 and 5). With them, we determine risks for reservoirs, depending on the dam factor.
We also present in this article a Bayesian and nonparametric method to infer the dam factor of a reservoir (see Sect. Expected Value for VF Given VT). We show its efficiency for some available cases in the literature. Note that such a method could be improved by univariate modeling techniques in future research studies. We see that the Gaussian copula has a competitor, which is the Gumbel–Hougaard copula (Tabs. 1 and 2), which offers slightly different (and higher) probabilities for extreme events (see Tabs. 4 and 5). On the other hand, when using the Gumbel–Hougaard copula in calculating the dam factor (see Tabs. 9 and 10), it produces results comparable to those of the Gaussian copula, which shows that the notion of mean hides certain stochastic characteristics. For a complement of other aspects addressed in this problem, see [11]. We consider this work as the starting point of other approaches enriched with marginal models, since here we treat the marginal effect only from a nonparametric perspective (ranks), which may show restrictions, in the presence of few data.
Acknowledgments
R. Rodrigues de Moraes gratefully acknowledge the partial financial support provided by CAPES and CNPq with fellowships from the Master Program in Statistics of University of Campinas. Also, the authors wish to thank the two referees for their many useful comments and suggestions on an earlier draft of this paper.
References
 Concha Larrauri P, Lall U (2018), Tailings dams failures: Updated statistical model for discharge volume and runout. Environments 5, 2, 28. [Google Scholar]
 Rico M, Benito G, DiezHerrero A (2008), Floods from tailings dam failures. J Hazard Mater 154, 1–3, 79–87. [Google Scholar]
 Nelsen RB (2007), An introduction to copulas. Springer Science & Business Media. [Google Scholar]
 Sklar M (1959), Fonctions de repartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8, 229–231. [Google Scholar]
 Mikusinski P, Sherwood H, Taylor MD (1991), The Fréchet bounds revisited. Real Anal Exch 17, 2, 759–764. [CrossRef] [Google Scholar]
 Schwarz G (1978), Estimating the dimension of a model. Ann Stat 6, 2, 461–464. [Google Scholar]
 Deheuvels P (1981), A non parametric test for independence. Publ Inst Stat Univ Paris 26, 29–50. [Google Scholar]
 Genest C, Nešlehová J (2012), Tests of symmetry for bivariate copulas. Ann Inst Stat Math 64, 4, 811–834. [Google Scholar]
 Genest C, Rivest LP (1989), A characterization of Gumbel’s family of extreme value distributions. Stat Prob Lett 8, 3, 207–211. [CrossRef] [Google Scholar]
 GonzálezLópez VA, Rodrigues de Moraes R (2020), A copulabased quantifying of the relationship between race inequality among neighbourhoods in São Paulo and age at death, 4open 3, 11. [CrossRef] [EDP Sciences] [Google Scholar]
 Rodrigues de Moraes R (2020), Eventos Caudais na Prática. Modelagem Bayesiana via Cópulas (Unpublished Master’s Thesis). [Google Scholar]
Cite this article as: Canno Ferreira Fais LM, GonzálezLópez VA, Rodrigues DS & Rodrigues de Moraes R 2020. A copula based representation for tailings dam failures. 4open, 3, 12
All Tables
Parameters estimated by Maximum LogLikelihood (MLL) and BIC values for the copula between the pseudoobservations ranks(V_{T}) and ranks(V_{F}); degrees of tStudent = 3.529 also estimated by MLL using the tCopula() function of the copula Rpackage.
Parameters estimated by Maximum LogLikelihood (MLL) and BIC values for the copula between the pseudoobservations ranks(dam factor) and ranks(D_{max}). Left: dam factor = H × V_{F}, = 12378.75; Right: dam factor = H_{f}, = 1479.85. In both cases the degrees η are estimated by MLL using the tCopula() function of the copula Rpackage.
Case H × V_{F}vs. D_{max} (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_{i} = Prob_{i} (V > vU ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.
Case H_{f} vs. D_{max} (see Tab. 3). From left to right: interval (a, b], v lower threshold of V, P_{i} = Prob_{i} (V > vU ∈ (a, b]) following equation (8), i = 1, 2 and equation (7), i = 3, 4, Non Informative settings i = 1, 3 and Informative settings i = 2, 4. In bold, the highest probability per line.
, i = 1, 2 computed using Gumbel–Hougaard copula (Eq. (11)), , i = 3, 4 computed using Gaussian copula (Eq. (10)), i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings. In brackets, the standard deviation of the expected values. In bold, the highest mean expected values per line.
Original scale. , i = 1, 2 computed using Gumbel–Hougaard copula, , i = 3, 4 computed using Gaussian copula, i = 1, 3 from Non Informative settings, i = 2, 4 from Informative settings.
All Figures
Figure 1
Scatterplot for V_{T} (×10^{6} m^{3}) vs. V_{F} (×10^{6} m^{3}) with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. 

In the text 
Figure 2
Scatterplot for H × V_{F} vs. D_{max} with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_{max} in kilometres (km). 

In the text 
Figure 3
Scatterplot for H_{f} vs. D_{max} with (a) original observations and (b) observations transformed to [0, 1] by scaling ranks, U vs. V. H in meters (m), D_{max} in kilometres (km). 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.