Academia.eduAcademia.edu

Estimating the quality of landslide susceptibility models

2006, Geomorphology

We present a landslide susceptibility model for the Collazzone area, central Italy, and we propose a framework for evaluating the model reliability and prediction skill. The landslide susceptibility model was obtained through discriminant analysis of 46 thematic environmental variables and using the presence of shallow landslides obtained from a multi-temporal inventory map as the dependent variable for statistical analysis. By comparing the number of correctly and incorrectly classified mapping units, it is established that the model classifies 77.0% of 894 mapping units correctly. Model fitting performance is investigated by comparing the proportion of the study area in each probability class with the corresponding proportion of landslide area. We then prepare an ensemble of 350 landslide susceptibility models using the same landslide and thematic information but different numbers of mapping units. This ensemble is exploited to investigate the model reliability, including the role of the thematic variables used to construct the model, and the model sensitivity to changes in the input data. By studying the variation of the model's susceptibility estimate, the error associated with the susceptibility assessment for each mapping unit is determined. This result is shown on a map that complements the landslide susceptibility map. Prediction skill of the susceptibility model is then estimated by comparing the forecast with two recent event inventory maps. The susceptibility model is found capable of predicting the newly triggered landslides. A general framework for testing a susceptibility model is proposed, including a scheme for ranking the quality of the susceptibility assessment.

Geomorphology 81 (2006) 166 – 184 www.elsevier.com/locate/geomorph Estimating the quality of landslide susceptibility models Fausto Guzzetti ⁎, Paola Reichenbach, Francesca Ardizzone, Mauro Cardinali, Mirco Galli IRPI CNR, via Madonna Alta 126, 06128 Perugia, Italy Received 18 July 2005; received in revised form 13 November 2005; accepted 19 April 2006 Available online 22 May 2006 Abstract We present a landslide susceptibility model for the Collazzone area, central Italy, and we propose a framework for evaluating the model reliability and prediction skill. The landslide susceptibility model was obtained through discriminant analysis of 46 thematic environmental variables and using the presence of shallow landslides obtained from a multi-temporal inventory map as the dependent variable for statistical analysis. By comparing the number of correctly and incorrectly classified mapping units, it is established that the model classifies 77.0% of 894 mapping units correctly. Model fitting performance is investigated by comparing the proportion of the study area in each probability class with the corresponding proportion of landslide area. We then prepare an ensemble of 350 landslide susceptibility models using the same landslide and thematic information but different numbers of mapping units. This ensemble is exploited to investigate the model reliability, including the role of the thematic variables used to construct the model, and the model sensitivity to changes in the input data. By studying the variation of the model's susceptibility estimate, the error associated with the susceptibility assessment for each mapping unit is determined. This result is shown on a map that complements the landslide susceptibility map. Prediction skill of the susceptibility model is then estimated by comparing the forecast with two recent event inventory maps. The susceptibility model is found capable of predicting the newly triggered landslides. A general framework for testing a susceptibility model is proposed, including a scheme for ranking the quality of the susceptibility assessment. © 2006 Elsevier B.V. All rights reserved. Keywords: Landslide susceptibility; Statistically based model; Discriminant analysis; Quality; Uncertainty; Validation; Landslide prediction; Map 1. Introduction Susceptibility is the propensity of an area to generate landslides. In mathematical form, landslide susceptibility is the probability of spatial occurrence of known slope failures, given a set of geoenvironmental conditions (Guzzetti et al., 2005). Assuming landslides will occur in the future because of the same conditions that produced them in the past (Guzzetti et al., 1999), ⁎ Corresponding author. Tel.: +39 075 5014 413; fax: +39 075 5014 420. E-mail address: [email protected] (F. Guzzetti). 0169-555X/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.geomorph.2006.04.007 susceptibility assessments can be used to predict the geographical location of future landslides (Chung and Fabbri, 1999; Guzzetti et al., 2005). Many methods have been proposed to evaluate landslide susceptibility at the basin scale, including direct geomorphological mapping, heuristic approaches, statistical classification methods and physically based models (Carrara et al., 1995; Soeters and van Westen, 1996; Chung and Fabbri, 1999; Guzzetti et al., 1999, and references therein). Statistical classification methods are particularly suited to determining landslide susceptibility over large and complex areas (e.g., Cardinali et al., 2002). Such methods provide quantitative estimates of “where” F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 landslides are expected, based on detailed information on the distribution of past landslides and a set of thematic environmental information. The former becomes the dependent variable and the latter the independent variables for the statistical modelling. Landslide susceptibility does not forecast “when” or “how frequently” a landslide will occur or how large and destructive the slope failure will be (Guzzetti et al., 2005). In recent years, many landslide susceptibility assessments (often referred to as hazard assessments) have been published. We counted at least 40 papers in major international journals in the 6-year period from 2000 to 2005 discussing landslide susceptibility. The majority of the published papers present a statistically based susceptibility model and discuss the data and the method used to prepare the model, but provide little or no information on the quality of the proposed model. This is a limitation of much previous research (Chung and Fabbri, 2003), including some of our own work (e.g., Carrara et al., 1995; Guzzetti et al., 1999; Cardinali et al., 2002). Any attempt to ascertain landslide susceptibility in a region needs proper validation. Validation should establish the quality (i.e., reliability, robustness, degree of fitting and prediction skill) of the proposed susceptibility estimate. The quality of a landslide susceptibility model can be ascertained using the same landslide data used to obtain the susceptibility estimate, or by using independent landslide information not available to construct the model. The former allows for (i) evaluating the degree of match between the predicted susceptibility levels in a given region, and the distribution and abundance of known landslides in the same region; (ii) evaluating the role of the thematic information in constructing the model; (iii) assessing the ability of the model to cope with variations in the input data; and (iv) determining the error associated with the obtained susceptibility estimate. The latter allows for determining the prediction skill of the model to forecast the location of new or reactivated landslides (Chung and Fabbri, 2003; Guzzetti et al., 2005). In this paper, we provide a comprehensive validation of a landslide susceptibility model prepared through discriminant analysis of thematic information for the Collazzone area in central Umbria (Fig. 1). The landslide susceptibility model is first presented. A set of tests is then performed, aimed at evaluating the quality and robustness of the model. We further test the ability of the model to predict new landslides by comparing the susceptibility estimate against the distribution of slope failures that occurred after the model was prepared. Results obtained are discussed, and 167 Fig. 1. Location of the Collazzone study area in Umbria, central Italy. Shaded relief image shows the hilly morphology of the area. a general framework is proposed for evaluating and ranking the quality of a statistically based landslide susceptibility model. 2. The study area The Collazzone area extends for 78.9 km2 in central Umbria, Italy (Fig. 1). Elevation in the area ranges from 145 m to 634 m above sea level, with an average value of 273 m (standard deviation = 96.1 m). Terrain gradient computed from a 10 m × 10 m DTM ranges from 0° to 63.7° degree, with a mean value of 9.9° and a standard deviation of 6.4°. In the area the terrain is hilly, valleys are asymmetrical, and the lithology and attitude of bedding control the morphology of the slopes. Gravel, sand, clay, travertine, layered sandstone and marl, and thinly layered limestone, Lias to Holocene in age, crop out in the area. Soils range in thickness from a few decimetres to more than 1 m; they have a fine or medium texture and exhibit a xenic moisture regime, typical of the Mediterranean climate. Precipitation is most abundant in October and November; with a mean annual rainfall in the period from 1921 to 2001 of 884 mm. Snow falls on the area on average every 2–3 years. Landslides are abundant in the area, and range in age, type, morphology and volume from very old, partly eroded, large and deep-seated slides to young, shallow slides and flows. Slope failures are triggered chiefly by 168 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 meteorological events, including intense and prolonged rainfall and rapid snow melt. 3. Landslide susceptibility model For the Collazzone area, we prepared a landslide susceptibility model using discriminant analysis of 46 environmental thematic variables, including morphology, hydrology, lithology, structure, bedding attitude and land use. To determine landslide susceptibility, we subdivided the study area into mapping units, i.e., portions of the ground that contain a set of conditions that differ from the adjacent units across distinct boundaries (Guzzetti et al., 1999). To establish the terrain subdivisions, we adopted the approach proposed by Carrara et al. (1991), which has proven to be reliable in predicting landslide susceptibility in Umbria (Carrara et al., 1991, 1995, 1999; Guzzetti et al., 1999; Cardinali et al., 2002). We obtained the terrain subdivision using specialised software that, starting from a 10 m × 10 m DTM, generated drainage and divide lines. The DTM was prepared by automatic interpolation of 10 and 5-m interval contour lines obtained from 1:10,000 scale topographic maps. By combining the drainage and divide lines, the software automatically identified 894 elementary slopes units (i.e., sub-basins) which represent the mapping units of reference for the determination of landslide susceptibility in the Collazzone area. The procedure also calculated a number of morphological and hydrological parameters: some correspond to those acquired by traditional methods (e.g., channel length, stream order, link length, etc.); others were specifically designed to model the spatial distribution of landslides (e.g., slope unit area, slope unit terrain gradient, slope unit aspect, slope unit terrain roughness, etc.) (Carrara et al., 1991, 1995). As the dependent variable for the statistical analysis, we used the presence or absence of shallow landslides that occurred before 1997 (Table 1, Fig. 2) in the 894 mapping units into which the study area was partitioned. The distribution of landslides was obtained from a detailed, multi-temporal landslide inventory map prepared at 1:10,000 scale. The landslide inventory was prepared through the systematic interpretation of five sets of aerial photographs (Table 2), supplemented by field surveys. Two geomorphologists carried out the interpretation of the aerial photographs in the 5-month period from July to November 2002. The investigators looked at each pair of aerial photographs using a mirror stereoscope (4× magnification) and a continue-zoom stereoscope (3× to 20× magnification). Field surveys carried out in the period from 1998 to 2004 were conducted to map new landslides triggered by intense or prolonged rainfall, and to test the inventory obtained through photo-interpretation. The field surveys allowed mapping 230 new or reactivated landslides, most of which occurred in the period from October to December 2004. The field surveys also revealed that shallow landslides are uncommon in forested terrain that covers 23.9% of the study area. In the multi-temporal inventory, landslides were classified according to the type of movement, and the Table 1 Descriptive statistics of landslide data sets for the Collazzone study area Inventory A B C D Multi-temporal landslide inventory prepared through the interpretation of five sets of aerial photographs (Table 2). Landslides are older than 1941 to December 2004. Subset of the multi-temporal inventory showing shallow landslides and used to prepare the susceptibility model shown in Fig. 3. Landslides are older than 1941 to 1996. Snowmelt induced landslides occurred in January 1997 (Fig. 10A). Rainfall-induced landslides occurred in autumn 2004 (Fig. 10B). Type Number Area Total (km2) Percent (%) Minimum (m2) Maximum (m2) All landslides Deep-seated landslides Shallow landslides 2760 363 2397 12.51 7.70 6.53 15.8 9.76 8.28 51 3815 51 173,518 173,518 64,691 Shallow landslides 1759 5.77 7.31 103 43,204 413 7 406 153 1 152 0.78 0.14 0.64 0.38 0.05 0.33 0.98 0.17 0.81 0.48 0.06 0.42 78 10,199 78 51 47,884 51 44,335 44,335 9882 47,884 47,884 12,098 All landslides Deep-seated landslides Shallow landslides All landslides Deep-seated landslides Shallow landslides 169 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Fig. 2. Multi-temporal landslide inventory map showing shallow landslides (see Table 1). Original map scale 1:10,000. estimated age, activity, depth, and velocity. Landslide type was defined according to Cruden and Varnes (1996). Landslide age, activity, depth, and velocity were determined based on the type of movement, the morphological characteristics and appearance of the landslides on the aerial photographs, the local lithological and structural setting, and the date of the aerial photographs. Landslide age was defined as recent, old or very old, despite ambiguity in the definition of the age of a mass movement based on its appearance (McCalpin, 1984). Overall, the multi-temporal inventory map shows 2760 landslides (Table 1A). The subset of shallow landslides used to prepare the susceptibility model includes 1759 landslides, covering 5.77 km2 of the study area (Table 2B). To account for possible cartographic and drafting errors in the production of the multi-temporal inventory map (e.g., landslides erroneously mapped as crossing a divide or a stream line), we established an empirical threshold to decide if a mapping unit contained or was free of landslides. Slope units having less than 2% of the area covered by shallow slope failures were considered free of landslides, whilst slope units having 2% or more of their area covered were considered as containing landslides. Independent variables used in the statistical analysis of the shallow landslides included morphological, hydrological, lithological, structural and land-use information. We obtained 26 variables describing hydrology and morphology from the same DTM used to perform the subdivision of the study area into slope units. Hydrological variables included slope unit drainage channel length, gradient, order and magnitude, and slope unit area and upstream contributing area. Morphological variables included slope unit mean elevation, standard deviation of elevation, mean length, mean terrain gradient and standard deviation of terrain gradient, slope unit aspect (in six classes), slope unit terrain roughness, and mean terrain gradient for the upper, intermediate and lower portions of the slope unit. From the latter three statistics, derivative variables describing the shape of the slope unit profile (concave, convex, irregular, etc.) were obtained. Since most of the morphological variables describe average terrain conditions in a slope unit, local testing of the variables in the field was problematic and was not performed. We compiled lithological and structural data, including the attitude of bedding, through detailed lithological and structural mapping at 1:10,000 scale. The lithological map did not show the distribution and thickness of the soils or the colluvial deposits. We obtained information on land use from a map compiled in 1977 by the Umbria Regional Government, largely revised and updated by interpreting the most recent aerial photographs (Table 2). To determine landslide susceptibility we adopted discriminant analysis, a multivariate technique introduced by Fisher (1936) to classify samples into alternative groups on the basis of a set of measurements (Michie et al., 1994; Brown, 1998; SPSS, 2004). More precisely, the goal of discriminant analysis is to classify cases into one of several mutually exclusive groups based on their values for a set of predictor variables. The grouping variable must be categorical and the predictor variables can be interval or dichotomous. For landslide susceptibility assessment most commonly two groups are established, namely, (i) mapping units free of landslides (G0, stable slopes); and (ii) mapping units having landslides (G1, unstable slopes). The assumption Table 2 Aerial photographs used to prepare the multi-temporal landslide inventory map (1 to 4) (Fig. 2) and the inventory of snowmelt induced landslides (5) (Fig. 10A) for the Collazzone area ID Year Period Type Nominal scale 1 2 3 4 5 1941 1954 1977 1985 1997 Summer Spring–Summer June July April Panchromatic Panchromatic Colour Panchromatic Panchromatic 1:18,000 1:33,000 1:13,000 1:15,000 1:20,000 170 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 is made that the two groups are distinct, and that a mapping unit r pertains only to one group. In the context of landslide susceptibility, the scope of discriminant analysis is to determine the group membership of a mapping unit by finding a linear combination of the environmental variables which maximizes the differences between the populations of stable and unstable slopes with minimal error. To obtain this, consider a set of m environmental variables ν1, ν2, …, νm for each mapping unit, r, by means of which it is desired to discriminate the region between the groups of stable (G0) and unstable (G1) slopes, and let Z be a linear combination of the input variables, such as Z = β1ν1(r) + β2ν2(r) + … + βmνm(r). For discriminant analysis, the task is to determine the β coefficients by means of some criterion that will enable Z to serve as an index for differentiating between members of the two groups. The linear discriminant function Z transforms the original sets of measurements into a discriminant score, which represents the sample position along a line defined by the same discriminant function. To measure how far apart the two groups are along this line, different “distances” can be used (e.g., Euclidean, diagonal or Mahalonobis distances; Michie et al., 1994; Gorsevski et al., 2003). A larger “distance” indicates that it is easy to discriminate between the two groups. Posterior probabilities are then used to express the likelihood of a mapping unit belonging to one group or the other (Brown, 1998). Thus, when probabilities are derived from a discriminant analysis, they represent the likelihood of a mapping unit pertaining to one of the two groups established a priori. The relative contribution of each independent environmental variable to the discriminating function can be evaluated by studying the standardized discriminant function coefficients (SDFC). The SDFC show the relative importance (i.e., the “weight”) of each variable as a predictor of slope instability. Variables with large coefficients (in absolute value) are more strongly associated with the presence or the absence of landslides. The sign of the coefficient indicates whether the variable is positively or negatively correlated to instability within a mapping unit. Using the landslide and environmental data available for the Collazzone area, a discriminant function automatically selected 16 (out of 46) variables as the best predictors of the presence (or absence) of landslides in the 894 slope units in which the study area was partitioned. A step-wise procedure was used. The procedure entered and removed variables in a stepwise fashion, based on a minimum tolerance value of 0.001; the tolerance of a variable candidate for inclusion in the analysis was the proportion of the within-groups variance not accounted for by other variables in the analysis (SPSS, 2004, p. 522). Table 3 lists the 16 selected environmental variables and the associated standardized discriminant function coefficients. For convenience, a threshold of |0.250| for the SDFC was selected to outline the variables more strongly associated with the presence (negative SDFC) or the absence (positive SDFC) of landslides. This threshold is heuristic and its significance for landslide susceptibility must be verified. Fig. 3 portrays the results of the landslide susceptibility model for shallow landslides. The map expresses the probability that each slope unit contains shallow landslides in the multi-temporal inventory map shown in Fig. 2. If a slope unit has a high probability of containing a known shallow landslide, the same mapping unit is classified as landslide prone. On the contrary, if a slope unit has a low probability of having known shallow landslides, the mapping unit is considered stable. Intermediate values of probability indicate the inability of the model to classify the mapping unit with the available thematic information (80 mapping units, 8.95%), and not necessarily conditions of marginal or intermediate stability. The first question to ask when a landslide susceptibility model is prepared through a statistical classification technique is “how well has the model performed in classifying the mapping units?” This involves determining the degree of model fit. A straightforward way of testing model fit consists of counting the number of cases (i.e., the number of mapping units) correctly Table 3 Variables selected by a stepwise discriminant function as the best predictors of landslide occurrence (Fig. 3) Variable description Variable SDFC Slope unit mean terrain gradient Slope unit elevation standard deviation Slope unit length Slope unit terrain gradient (upper portion) Cultivated area Bedding dipping out of the slope Slope unit with convex slope (downslope profile) Travertine Slope unit facing S-SE Slope unit drainage channel order Recent alluvial deposit Gravel and coarse continental sediments Slope unit terrain gradient standard deviation Marl Downslope concave profile Limestone SLO_ANG ELV_STD SLO_LEN ANGLE3 SS FRA CONV −0.398 −0.370 −0.287 −0.282 −0.276 −0.241 −0.135 TRAVERTI TR2 ORDER ALLUVIO GHIAIA ANG_STD MARNE CC CARBO 0.105 0.133 0.140 0.144 0.179 0.219 0.285 0.303 0.833 Values in boldface are variables more strongly associated with the presence (negative SDFC) or the absence (positive SDFC) of landslides. 171 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Fig. 3. Map showing spatial probability of shallow landslide occurrence (landslide susceptibility). Study area subdivided into 894 slope units. Square bracket, class limit included; round bracket, probability class limit not included. See Tables 3 and 4 for model classification results. classified by the model. Table 4 shows the results for the model shown in Fig. 3. The susceptibility model correctly classifies 688 of the 894 mapping units in which the study area was partitioned. The figure represents a measure of the “overall goodness of fit” of the model. Of the 206 misclassified cases, 121 are mapping units free of landslides that were classified as “unstable” by the model, and 85 are mapping units that showed landslides in the inventory map and were attributed to the “stable” group by the model. The former may be the result of errors in the inventory map (e.g., unrecognized landslides, or landslides removed or concealed by erosion, land use changes, ploughing or other human actions). The latter are mapping units that have environmental conditions typical of stable slopes, and where landslides took place owing to local conditions not accounted for by the model (e.g., local structural conditions, particularly thick soil, local land use or surface drainage modifications). Further inspection of Table 4 reveals that the susceptibility model is more efficient in correctly classifying mapping units that have landslides (84.1%), and less efficient in classifying mapping units free of slope failures (66.4%). We attribute the difference to the larger number of mapping units with shallow landslides, when compared to the mapping units free of slope failures. Indeed, in the study area 534 mapping units (59.7%) have shallow landslides and 360 mapping units (40.3%) are free of shallow landslides (Fig. 2). An alternative way of measuring the reliability of a susceptibility model–in terms of its ability to classify known landslides–involves the use of Cohen's Kappa index (Cohen, 1960; Hoehler, 2000). To compute this index we prepared Table 5 that shows the proportion (i.e., the observed probability) of mapping units in each of the four classification cases listed in Table 4. Table 5 also shows the marginal totals obtained by summing the proportions along the table rows and columns. In Table 5 values in parentheses represent the expected proportion on the basis of chance associations, i.e., the joint probability of the marginal proportions. The Kappa index is obtained as: j¼ PC PE 1 PC lVxV1 ð1Þ where, PC is the proportion of mapping units correctly classified as stable or unstable, and PE is the proportion of mapping units for which the agreement is expected by chance. In this case, κ = 0.513. Landis and Kock (1977) have suggested that for 0.41 ≤ κ ≤ 0.60, the strength of Table 4 Comparison between slope units classified as stable or unstable by the statistical model (Fig. 3) and slope units free of and containing shallow landslides in the multi-temporal inventory map shown in Fig. 2 Predicted groups (model) Group 0 Actual Groups (inventory) Group 0 slope units free of shallow landslides in inventory map Group 1 slope units containing shallow landslides in inventory map Total number of slope units = 894. Overall percentage of slope units correctly classified = 77.0%, 688 slope units. Group 1 Stable slope units Unstable slope units 239 (66.4%) 121 (33.6%) 360 (100%) 85 (15.9%) 449 (84.1%) 534 (100%) 172 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Table 5 Comparison between the proportion of slope units classified as stable or unstable by the susceptibility model (Fig. 3) and the proportion of slope units free of and containing shallow landslides in the multi-temporal inventory map (Fig. 2) Predicted groups (model) Group 0 Actual Groups (inventory) Group 0 slope units free of shallow landslides in inventory map Group 1 slope units containing shallow landslides in inventory map Marginal totals Group 1 Marginal totals Stable slope units Unstable slope units 0.267 (0.146) 0.135 (0.257) 0.403 0.095 (0.216) 0.502 (0.381) 0.597 0.362 0.638 1.000 κ = 0.513, moderate agreement. the agreement between the observed and the predicted values is moderate. In addition to Cohen's Kappa index, other indexes can be computed to measure the performance of a statistical classification. Most commonly, the indexes are obtained from figures listed in a contingency table similar to Table 5. Table 6 shows 13 statistical indexes obtained from Table 4. For a discussion of the significance and properties of the individual performance indexes listed in Table 6, see Mason (2003) and references therein. Tables 4 and 5 provide a combined estimate of model fit, but do not provide a detailed description of the model performance of the different susceptibility classes (Chung and Fabbri, 1999, 2003). To determine this, we compare the total area of known landslides in each susceptibility class with the percentage area of the susceptibility class. Fig. 4 shows the percentage of the study area ranked from most to least susceptible (x-axis) against the cumulative percentage of landslide area in each susceptibility class (y-axis). Most of the landslides shown in the multi-temporal inventory (Fig. 2) are in areas classified as susceptible by the model, and only 6.4% of the slope failures are in areas classified as not or weakly susceptible (probability ≤ 0.45) by the model. The latter is in agreement with the reduced number of mapping units (85, 9.51%) having landslides and erroneously attributed to the “stable” group by the model (Table 4). Fig. 4 provides a quantitative indication of the ability of the susceptibility model to match (“fit”) the known distribution of shallow landslides in the Collazzone area (Fig. 2). 4. Uncertainty in the landslide susceptibility model Any landslide susceptibility prediction has a level of uncertainty. Sources of uncertainty include (i) errors and incompleteness in the landslide and thematic Table 6 Statistical indexes measuring the performance of the susceptibility model shown in Fig. 3 Index Value Range Hit rate, or Sensitivity False alarm rate Specificity False alarm ratio Positive predictive value Negative predictive value Proportion correct Proportion correct by chance Cohen's Kappa coefficient Heidke skill score Peirce's skill score Critical success index Yule's Q 0.664 0.159 0.841 0.262 0.738 0.788 0.770 0.527 0.513 0.107 0.505 0.537 0.825 [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [− ∞,1] [− 1,1] [− 1,1] [0,1] [− 1,1] Values computed using figures listed in Table 4. Fig. 4. Analysis of fitting performance of landslide susceptibility model shown in Fig. 3. F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 information available to complete the analysis; (ii) an imperfect understanding of landslide processes and their geographical and temporal evolution; (iii) limitations in the techniques used to determine the susceptibility; and (iv) the inherent natural variability of the landslide phenomena (Carrara et al., 1992, 1999; Ardizzone et al., 2002). Determining the errors associated with the geomorphological, geological and other thematic information is no trivial task. Improving the understanding of the landslide processes is feasible, but requires time and resources often not available to landslide investigators. The characteristics of the methods used to ascertain landslide susceptibility are known (Carrara et al., 1995; Soeters and van Westen, 1996; Chung and Fabbri, 1999; Guzzetti et al., 1999), but their limitations and drawbacks when applied to specific areas, data sets, and landslide types remain poorly investigated. Despite these problems, we argue that determining the errors associated with a landslide susceptibility assessment is of primary importance. Different types of uncertainty contribute to the model error, including: (i) uncertainty in the model classification due to the type, abundance and reliability of the available thematic information; (ii) uncertainty in the classification of individual mapping units; and (iii) uncertainty in the ability of the model to predict future landslides (prediction skill). In the following, we propose a framework to test our susceptibility model. Tests aim at (i) investigating the role of the thematic information in the production of the susceptibility model; (ii) determining the model sensitivity and robustness to variations in the input data; (iii) determining the error associated with the susceptibility prediction obtained for each mapping unit; and (iv) testing the model prediction against independent landslide information. 5. Analysis of model reliability 5.1. Construction of an ensemble of landslide susceptibility models To determine the reliability of the landslide susceptibility assessment shown in Fig. 3, we prepared an ensemble of landslide susceptibility models. The ensemble contains 350 different susceptibility models obtained from the same set of 46 independent thematic variables and the same multi-temporal landslide map (Fig. 2) but using a different number of terrain units, from 268 (30%) to 849 (95% of the 894) units. To obtain the ensemble we adopted the following strategy. First, a subset containing 30% of the mapping units (268 units) was obtained by 173 random selection from the entire set of 894 mapping units. The random selection was repeated 50 times, obtaining a group of 50 different subsets, each containing 268 mapping units. This collection of 50 subsets of mapping units became group G30 for the analysis (30% selected mapping units). The selection process was repeated, changing the number of the selected units heuristically. We obtained collections with 45%, 55%, 65%, 75%, 85%, and 95% mapping units, respectively. These collections, each listing 50 subsets of mapping units, became groups G45, G55, G65, G75, G85 and G95. Overall, the ensemble contains 350 subsets of mapping units, i.e., 7 groups (from G30 to G95) each containing 50 subsets. Landslide susceptibility models were prepared for each subset of the ensemble, obtaining 350 different susceptibility models, i.e., 350 different forecasts of landslide susceptibility for the Collazzone area. The large number of susceptibility forecasts was exploited to study the errors associated with the landslide susceptibility model shown in Fig. 3. 5.2. Role of the independent thematic variables The role of the 46 independent thematic variables used to construct the landslide susceptibility model is first considered. For this purpose, group G85 is used. For this group, Table 7 lists the number and the percentage of the 50 models that selected (or did not select) the 46 variables, and whether the variables were selected as predictors of slope stability (S) or of slope instability (I). Inspection of Table 7 reveals that of the 46 considered variables, 38 (82.6%) were selected in at least one of the 50 models encompassing G85, and 8 (17.4%) variables were never selected as predictors of landslide occurrence. Of the 38 selected variables, 15 (39.5%) were selected by 25 or more models, and 7 (18.4%) were selected by 45 or more models. The 50 stepwise discriminant functions constructed from G85 selected from as few as 11 to as many as 18 variables (mode 14 variables). All the selected variables, with the exception of drainage magnitude (MAGN), were either always selected as negatively (I, in Table 7) or always selected as positively (S, in Table 7) in association with the presence of landslides. We take this as an indication of the consistency of the role of the thematic variables in explaining the known distribution of landslides, which contributes to the reliability of the susceptibility model. Inspection of Table 7 further indicates that more than 75% of the prepared models used the same set of 10 variables. These variables included: four variables describing morphology (ELV_STD, ANG_STD, SLO_ 174 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Table 7 Independent thematic variables selected, or not selected, by 50 discriminant functions as the best predictors of landslide occurrence Variables SDFC Susceptibility models # Slope unit elevation standard deviation Limestone Bedding dipping out of the slope Gravel and coarse continental sediments Marl Slope unit terrain gradient standard deviation Slope unit length Slope unit mean terrain gradient Cultivated area Slope unit facing S-SE Downslope concave slope Slope unit drainage channel order Recent alluvial deposit Slope unit with convex slope (downslope profile) Sandstone Travertine Slope unit terrain gradient (upper portion) Forested area Slope unit area Slope unit drainage channel length Slope unit surface roughness index Slope unit slope (lower portion) Slope unit mean elevation Concave profile downslope Drainage channel mean slope Continental deposit Sand Slope unit drainage channel magnitude Urban area Bedding dipping into the slope Bedding dipping across the slope Slope unit facing N-NE Standard deviation of terrain unit length Slope unit with convex–concave slope (downslope profile) Slope unit with irregular slope (downslope profile) Clay Cultivated area with trees Vineyards Drainage basins total area upstream the slope unit Slope unit terrain gradient (intermediate portion) Concave–convex profile downslope Slope unit rectilinear profile Fruits trees Pasture Slope unit facing S-SW Deposit of ancient landslide ELV_STD CARBO FRA GHIAIA MARNE ANG_STD SLO_LEN SLO_ANG SS TR2 CC ORDER ALLUVIO CONV AREN TRAVERTI ANGLE3 BOSCO SLO_ARE LINK_LEN R ANGLE1 ELV_M CONC LNK_ANG CONTI SABBIA MAGN URB REG TRA TR1 LEN_STD COC_COV IRR ARGILLA SA VIG AREAT_K ANGLE2 COV_COC RET FRUTT PASCOLO TR3 FRA_OLD Predictor % 50 100 I 50 100 S 49 98 I 47 94 S 47 94 S 45 90 S 45 90 I 41 82 I 40 80 I 38 76 S 33 66 S 30 60 S 30 60 S 27 54 I 25 50 S 0.105 23 46 S −0.282 21 42 I 21 42 S 13 26 I 10 20 I 10 20 I 5 10 I 4 8 I 4 8 I 3 6 S 3 6 I 3 6 I 2 4 I/S 2 4 S 2 4 S 2 4 I 2 4 I 1 2 S 1 2 S 1 2 S 1 2 I 1 2 I 1 2 S Variables were never selected as predictors of landslide occurrence −0.370 0.833 −0.241 0.179 0.285 0.219 −0.287 −0.398 −0.276 0.133 0.303 0.134 0.144 −0.135 Group G85 used for the analysis. LEN, SLO_ANG), three variables describing lithology (CARBO, GHIAIA, MARNE), one variable for the attitude of bedding (FRA), one variable describing slope aspect (TR2), and one variable describing a land use type (SS). The 10 variables are also present in Table 3. Comparison of Tables 3 and 7 reveals that, with the exception of AREN, all the 16 variables selected to construct the susceptibility model shown in Fig. 3 are listed in Table 7 as the most selected variables. We take this as further indication of the ability of the selected variables to explain the known distribution of landslides. F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 5.3. Model sensitivity The sensitivity of the susceptibility model to changes in the input data is then investigated. In general, results of a robust (least sensitive) statistical model should not change significantly if the input data are changed within a reasonable range. To investigate the sensitivity of the susceptibility model to changes in the input data, we use the entire ensemble of models, and we study the variation in the overall percentage of mapping units correctly classified by the 350 models. Three cases are considered: (i) mapping units selected by the adopted random selection procedure and classified by the discriminant functions (selected units, i.e., “training” or “modelling” set, Fig. 5A); (ii) mapping units not selected by the random selection procedure and 175 classified by the discriminant functions constructed on the corresponding subset of selected units (non-selected units, i.e., “classification” or “validation” set, Fig. 5B); and (iii) all mapping units, irrespective of the fact that they pertained to the selected (training) or the nonselected (classification) sets (Fig. 5C). In Fig. 5A, the box plots show that an increase in the number of the selected mapping units results in a slight decrease of the median value (50th percentile) of the model fit, and in a significant decrease of the variability (10th to 90th percentile range) of the model fit. This was expected. Given the large number of the available thematic variables (46), a reduced number of cases (268 mapping units for G30) allows for a (apparently) better model classification (mean = 78.36% for G30) at the expense of model variability, which is large (std. dev. = 2.59% for G30). Further inspection of Fig. 5A indicates that a reduction in the percentage of mapping units correctly classified, and in the corresponding scatter in the susceptibility estimates, become negligible for percentages of the considered mapping units exceeding 75%. Thus, susceptibility models obtained using 75% or more of mapping units do not differ significantly–in terms of the number of correctly classified units–from the model obtained using the entire set of 894 terrain units. We take this as indication of the model ability to cope with significant (up to 25%) random variation in the input data. Fig. 5B provides similar results for the non-selected subsets. The overall model fit and its scatter increase with a decreasing number of non-selected units. Comparison of Fig. 5A and B reveals that models prepared using the selected units result in a better classification (i.e., in a larger model classification) when compared to the models obtained using the nonselected units. This was also expected. Any statistical classification provides better results on the training set, and performs less efficiently when applied to the validation set (Michie et al., 1994). Fig. 5C shows the result for the collection of the selected (training) and the non-selected (validation) subsets. The box plots show the cumulative effect of the mapping units correctly classified in the training and in the validation sets. For this reason, the scatter around the median is reduced, particularly for percentages of mapping units exceeding 75%. 5.4. Uncertainty in the susceptibility estimate of individual mapping units Fig. 5. Sensitivity analysis for landslide susceptibility model shown in Fig. 3. (A) Training set. (B) Validation set. (C) All mapping units. Numbers of elements in each group are shown. The adopted approach to ascertain landslide susceptibility provides a unique value for the probability of 176 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 spatial occurrence of known landslides (i.e., of landslide susceptibility) for each mapping unit (Fig. 3). The approach does not provide a measure of the error associated with the probability estimate. This is a limitation. To obtain an assessment of the error associated with the susceptibility assessment we use group G85. This group is selected as a compromise between model sensitivity and a sufficiently reduced number of selected mapping units to account for model variability. For each mapping unit, Fig. 6 shows the comparison between the mean value of the 50 probability estimates obtained using group G85 (x-axis) and the single probability estimate obtained for the model shown in Fig. 3 (y-axes), prepared using the entire set of 894 slope units. The correlation between the two estimates of landslide susceptibility is very high. We take this as indication that the two classifications are virtually identical. Based on this result, Fig. 7 relates, for the 894 mapping units, the probability estimate of landslide spatial occurrence (x-axis), ranked from low (left) to high (right) values, to the variation of the model estimate (y-axis), measured by 2 standard deviations (2σ) of the obtained probability estimate. The measure of 2σ is low (< 0.05) for mapping units classified as highly susceptible (probability > 0.80) and as largely stable (probability < 0.20). The scatter in the model estimate is larger for intermediate values of the probability (i.e., for probability values between 0.45 and 0.55), indicating that for Fig. 6. For 894 mapping units in the Collazzone area, comparison of the mean value for 50 probability estimates obtained from group G85 (x-axis), and the single probability value obtained for the susceptibility model shown in Fig. 3 (y-axis). Fig. 7. Landslide susceptibility model error. For 894 mapping units in the Collazzone area, the graph shows the mean value of 50 probability estimates (x-axis) against two standard deviations (2σ) of the probability estimate (y-axis). Statistics obtained from group G85. Black line shows estimated model error obtained by linear regression fit (least square method). these mapping units not only is the model incapable of satisfactorily classifying the terrain as stable or unstable, but also that the obtained estimate is highly variable, and hence, unreliable. Fig. 7 indicates that the variation in the model estimate can be approximated by the Fig. 8. Map showing estimated model error (2σ) for the landslide susceptibility model shown in Fig. 3. Model error computed using Eq. (2). Square bracket, class limit included; round bracket, class limit not included. Larger values indicate increased uncertainty in the probabilistic estimate of landslide susceptibility. F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 following quadratic equation, obtained by linear regression fit (least square method): y¼ 0:309x2 þ 0:308x 0VxV1 ðr2 ¼ 0:605Þ ð2Þ where, x is the estimated value of the probability of pertaining to an unstable mapping unit (i.e., the landslide susceptibility estimate), and y is 2σ of the model estimate. We consider the value of 2 standard deviations of the model estimate a proxy for the model error. We use Eq. (2) to estimate quantitatively the model error for each mapping unit, based on the computed probability estimate. For each mapping unit, Fig. 8 shows the error associated with the probability estimate (i.e., to landslide susceptibility), computed using the quadratic Eq. (2). Fig. 8 provides a quantitative measure of the error associated with the quantitative landslide susceptibility assessment shown in Fig. 3. To investigate further the relationship between the predicted probability of spatial landslide occurrence and its variation (error), the 894 mapping units were ranked according to the mean value of the computed probability estimates obtained from group G85. In Fig. 9, the rank of the mapping unit (x-axis) was plotted against statistics of the probability estimates (y-axis). The thick line shows the average value of the landslide susceptibility estimates, and the thin lines show ± 2σ of the estimate. The measure of 2 standard deviations, a proxy for model error, varies with the predicted probability of spatial landslide occurrence. The variation is small for mapping units predicted as highly unstable, increases to a maximum value towards the centre of the graph where unclassified mapping units are shown, and decreases again for mapping units predicted as highly stable. Fig. 9. Variation in the estimate of landslide susceptibility. For 894 slope units, ranked from low (left) to high (right) susceptibility values (x-axis), the graph shows the probability of the spatial occurrence of landslides (y-axis). Probability estimates obtained from group G85. 177 6. Analysis of model prediction skill The tests described in the previous section were aimed at determining the (statistical) reliability and robustness of the susceptibility model, and at estimating the error associated with the quantitative forecast. We performed all tests using the same landslide information used to construct the susceptibility model, i.e., the multitemporal shallow landslide inventory map (Fig. 2, Table 1B). The performed tests do not provide insight on the ability of the susceptibility model to predict the occurrence of new or reactivated (i.e., “future”) landslides, which is the goal of any susceptibility assessment (Chung and Fabbri, 1999; Guzzetti et al., 1999; Chung and Fabbri, 2003; Guzzetti et al., 2005). To evaluate the ability of the susceptibility model to predict future landslides, we exploit the spatial distribution of shallow slope failures obtained from two recent landslide event inventories. The first inventory shows 413 landslides triggered by rapid snowmelt in January 1997 (Fig. 10A, Table 1C). Landslides shown in this inventory were mapped at 1:10,000 scale through field surveys and the interpretation of 1:20,000 scale aerial photographs flown 4 months after the event (Cardinali et al., 2000; Guzzetti et al., 2003). The second event inventory shows 153 landslides triggered by heavy rainfall in the period from October to December 2004 (Fig. 10B, Table 1D). The rainfall-induced landslides were mapped directly in the field at 1:10,000 scale. Using the two recent event inventories, three tests are performed to determine the ability of the susceptibility model to predict future landslides. The first test consists of computing the proportion of the event's landslide area in each susceptibility class, and showing the results using cumulative statistics. Fig. 11 shows the percentage of the study area, ranked from most to least susceptible (x-axis), against the cumulative percentage of the area of the triggered landslides in each susceptibility class (yaxis), for the snowmelt-induced landslides in January 1997 (dashed line), and for the rainfall-induced landslides in autumn 2004 (dotted line). Statistics given in Fig. 11 provide a quantitative estimate of the model prediction skill. The forecasting performance of the susceptibility model is better for the 1997 snowmelt-induced landslides, and slightly poorer for the 2004 rainfall-induced landslides. We attribute the difference to the larger number of snowmelt-induced landslides, a function of the different severity of the triggering events. In the study area, rapid snowmelt in January 1997 was a more severe trigger of landslides than the autumn 2004 178 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Fig. 10. Recent landslide event inventory maps. (A) 406 shallow landslides triggered by rapid snowmelt in January 1997. (B) 152 shallow landslides triggered by heavy rainfall in the period from October to December 2004. Original maps at 1:10,000 scale. Fig. 11. Analysis of the prediction skill of landslide susceptibility model shown in Fig. 3. Dashed line, shallow landslides triggered by rapid snowmelt in January 1997 (Fig. 10A). Dotted line, shallow landslides triggered by heavy rainfall in autumn 2004 (Fig. 10B). Continuous line, model fitting performance (Fig. 4). rainfall period. Fig. 11 shows that the prediction performance is similar to (for rainfall-induced landslides) or even higher (for snow melt-induced landslides) than the model fitting performance (Fig. 4, and thin line in Fig. 11). This is surprising, because a model fitting performance is usually higher than its prediction skill (Chung and Fabbri, 2003; Guzzetti et al., 2005). The remaining two tests explore further the relationship between the predicted susceptibility classes and the distribution and abundance of the triggered landslides. Fig. 12A shows that 60.7% of the snowmelt-induced landslide areas in January 1997, and 48.3% of the rainfall-induced landslide areas in autumn 2004 occurred in mapping units classified as highly unstable (probability > 0.80). Further, 88.7% of the snowmeltinduced landslide areas, and 75.1% of the rainfallinduced landslide areas occurred in unstable or highly unstable slope units (probability > 0.55). Conversely, only 2.5% of the snowmelt-induced landslide areas and only 4.2% of the rainfall-induced landslide areas were found in mapping units classified as highly stable (probability ≤ 0.20). Fig. 12B shows similar results, but considers the number of triggered landslides. To obtain these statistics in the GIS, we established the central F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 179 a landslide susceptibility model (Chung and Fabbri, 2003); and (iii) a scheme designed to evaluate (Fig. 7) and map (Fig. 8) the error associated with the susceptibility estimate obtained for individual mapping units. The latter is an improvement over existing modelling approaches, which is especially relevant when landslide susceptibility assessments are used for planning purposes (Guzzetti et al., 2000) or to determine landslide hazard (Guzzetti et al., 2005, 2006). Based on the results obtained in the Collazzone area, and aided by the scarce literature on the validation of landslide susceptibility models (Carrara et al., 1992; Chung and Fabbri, 1999, 2003, 2005; Ardizzone et al., 2002; Fabbri et al., 2003; Remondo et al., 2003), a general framework for establishing the quality of a landslide susceptibility assessment is proposed, including an objective scheme for ranking the quality of the assessment. It is proposed that any landslide susceptibility model be tested to Fig. 12. Analysis of the relationships between the predicted susceptibility classes and the distribution and abundance of the triggered landslides. (A) Cumulative percentage of landslide area in the susceptibility classes. (B) Cumulative percentage of the number of triggered landslides in the susceptibility classes (x-axis). point of each landslide polygon and counted the number of landslide central points in each mapping unit. About 55.8% of the snowmelt-induced landslides and 53.3% of the rainfall-induced landslides occurred in mapping units classified as highly unstable (probability > 0.80). Conversely, only 2.2% of the snowmelt-induced landslides and only 3.3% of the rainfall-induced landslides occurred in mapping units classified as highly stable (probability ≤ 0.20). Fig. 12 confirms the aptitude of the susceptibility model to predict where (i.e., in which mapping unit) the snowmelt-induced landslides occurred in January 1997, and where the rainfall-induced landslides occurred in autumn 2004. 7. Discussion In the previous sections, we have shown how the quality (i.e., reliability, robustness, degree of fitting and prediction skill) of a statistically based, landslide susceptibility model can be assessed quantitatively. The adopted evaluation procedure includes: (i) standard methods used to evaluate the “goodness of fit” of a statistical classification (e.g., Tables 4 and 5); (ii) tests proposed in the literature to determine the degree of model fitting (Fig. 4) and the prediction skill (Fig. 11) of (i) determine the degree of model fit; (ii) establish the aptitude of the thematic information to construct the model, including an assessment of the sensitivity of the model to changes in the landslide and the thematic information used to construct the model; (iii) determine the error associated with the probabilistic estimate obtained for each mapping unit; and (iv) test the skill of the model prediction to forecast “future” landslides. Determining the degree of model fit consists of establishing how well the model describes the known distribution of landslides (Tables 4 and 5). The task is easily performed in a GIS by using the same landslide information used to construct the susceptibility model. For the purpose, contingency tables (Tables 4 and 5) and cumulative statistics of the abundance of landslides in the susceptibility classes (Fig. 4) can be used. For the test to be significant, the landslide information must be representative, accurate, and complete. To evaluate the role of the thematic information in the construction of the landslide susceptibility model (Table 7) and to evaluate the model sensitivity (Fig. 5), we studied the thematic variables entered (and not entered) in a large set of discriminant classification functions constructed on a sub-set of randomly selected mapping units (group G85). In this scheme, the random selection procedure accounted for the variability in the input data. The expected error (i.e., the level of uncertainty) associated with the probabilistic susceptibility estimate 180 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 obtained for each mapping unit was determined by investigating the variability of the estimate in the mapping units. For this purpose, we established two standard deviations (2σ) of the model estimate as a quantitative measure of the model uncertainty, and we modelled the expected error using equation 2 (Fig. 7). To show the geographical distribution of the expected error we prepared the map shown in Fig. 8. Testing the skill of the susceptibility model to forecast “future” landslides can best be accomplished by using landslide information not available to construct the susceptibility model. This study used independent shallow landslide information obtained from two eventinventory maps showing new slope failures triggered by rapid snow melting (Fig. 10A) and by intense rainfall (Fig. 10B). Chung and Fabbri (2003, 2005) obtained a similar result by splitting a multi-temporal inventory into two temporal subsets, i.e., a training set containing landslides that occurred before an established date, and a classification set showing landslides that occurred after that date. The landslide classification set was used to test the forecasting performance of the model constructed from the training set. We argue that our scheme for testing the model prediction performance is superior, given the availability of new and independent landslide information (i.e., the event inventories shown in Fig. 10). In this scheme, to construct the susceptibility model the entire information about past landslides is exploited and not a limited subset. As a potential drawback, this scheme is more limited because a reduced number of landslides (406 snowmelt-induced and 152 rainfallinduced shallow landslides) is used to ascertain the model prediction skill. Chung and Fabbri (2003, 2005) also proposed splitting the study area geographically into two subareas of equal size: a training (modelling) area and a validation (classification) area. In this scheme, the model is constructed in the training area and its prediction performance is evaluated in the validation area. Splitting the study area into two adjacent sub-areas can be problematic. The approach assumes that the independent thematic variables remain the same in the training and the classification areas. A rock type or land use class may be present in an area (e.g., the training area) but may not be represented in another area (e.g., the verification area), making it difficult (or even impossible) to apply the classification function obtained in the training area. Further, the approach assumes that the role of the ensemble of thematic variables in explaining the known distribution of landslides does not change geographically. In places, validity of this assumption may be difficult to establish. Table 8 lists a set of criteria for ranking and comparing the quality of landslide susceptibility assessments. Based on the listed criteria, when no information is available on the quality of a landslide susceptibility model, the resulting zoning map has the lowest possible level of quality (level 0). We consider this level of quality unacceptable for modern, statistical or physically based susceptibility models. When estimates of model fit are available, the susceptibility assessment has the least acceptable quality level (level 1). When the error associated with the predicted susceptibility estimate for each mapping unit is established, the susceptibility assessment has a higher level of quality (level 2). Lastly, when the prediction skill of the model is known, the susceptibility assessment has a still higher quality rank (level 4). The proposed scheme allows for summing the individual quality levels. As an example, a susceptibility assessment for which the fitting performance (level 1) and prediction skill (level 4) were determined is quality level 5. When, for the same susceptibility assessment, the error associated with the predicted susceptibility for each mapping unit is established (level 2), the quality level becomes 7. Adopting the proposed scheme, the landslide susceptibility model prepared for the Collazzone area has the highest quality level (level 7). The established set of tests does not guarantee, as such, the quality of the susceptibility estimate. To obtain this, the results of the tests must be matched against established acceptance thresholds. Defining acceptance thresholds is not an easy task. In the following, based on the experience gained in landslide susceptibility assessments completed in southern (Carrara, 1983), central (Carrara et al., 1991, 1995; Guzzetti et al., 1999, 2006; Cardinali et al., 2002; Carrara et al., 2003) and northern (Ardizzone et al., 2002; Carrara et al., 2003; Guzzetti et Table 8 Proposed criteria and levels of quality for landslide susceptibility models and associated maps Description Level No information available, or no test performed to determine the quality and prediction skill of the landslide susceptibility assessment. Estimates of degree of model fit are available (tests performed using the same landslide information used to obtain the susceptibility estimate). Estimates of the error associated with the predicted susceptibility value in each terrain unit are available (tests performed using the same landslide information used to obtain the susceptibility estimate). Estimates of the model prediction performance are available (tests performed using independent landslide information, not used to obtain the susceptibility model). 0 1 2 4 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 al., 2005) Italy, we propose acceptance thresholds, and then compare the results of our tests to the proposed thresholds. We consider acceptable a susceptibility model with an overall degree of model fit greater than about 75%. If the overall model fit is greater that 80%, we regard the classification as very satisfactory. An extremely high value of the overall model fit (e.g., ≥ 90%) is an indication that the model matches too closely the original landslide inventory map. When such a case arises, the model prediction is virtually indistinct from a prediction made using solely the landslide inventory, making the model suspicious. The case may arise, where the spatial distribution of landslides is “trivial” to forecast or where the number of mapping units is very small compared to the number of the explanatory variables (e.g., Campus et al., 1999). An additional indication of the higher quality of the model consists of a reduced number (e.g., ≤ 15%) of mapping units with landslides erroneously classified as “stable” areas by the model. The overall fit obtained for the susceptibility model prepared for the Collazzone area was 77.0% (Table 4), and the proportion of mapping units with landslides erroneously classified as stable areas was 9.5% (85 units). A statistical model obtained using a reduced number of geomorphologically meaningful explanatory variables is less expensive and, thus, superior to a model which uses a very large number of variables. Further, use of a stable combination of variables provides for a robust model that can cope with uncertainty in the input data. The discriminant function used to construct the susceptibility model shown in Fig. 3 selected 16 of the 46 available thematic variables (34.8%). Our analysis (Table 7) revealed that the selected variables were highly consistent in classifying the mapping units as stable or unstable in a large number of models. We consider this an indication of the robustness of the selected model. Apart from the example discussed in this work (Figs. 7 and 8), we are not aware of any other susceptibility assessment for which the error associated with the probabilistic estimate of landslide occurrence was determined for individual mapping units. Establishing an acceptance threshold is therefore difficult. Inspection of Fig. 7 reveals that most mapping units have an expected error (2σ) lower than 10% of the probability estimate. This figure is taken as a quality acceptance threshold for the model error. In the model presented here (Fig. 3), there are only 21 mapping units (2.35%) that do not match this criterion. Most of the latter units are in the “unclassified” probability range (Fig. 7). 181 To appraise the fitting performance and the prediction skill of a landslide susceptibility model, Chung and Fabbri (2003) proposed comparing the proportion of landslide area in each susceptibility class (AL) with the proportion of the susceptibility class (AS) in the study area. For a successful classification, the “effectiveness ratio” AL/AS should be greater than one in the areas predicted as landslide prone by the model, and less than one in the areas predicted as stable by the model. A very effective prediction class has a ratio close to zero or very large, depending whether the class predicts stability or instability. Where the effectiveness ratio is near one, the proportion of landslides in the susceptibility class is not different from the average landslide density in the study area, and the performance of the susceptibility class in determining the known (“fitting” performance) or the future (“prediction” skill) location of landslides is weak. Chung and Fabbri (2003) considered “effective” a susceptibility class with a ratio larger than at least 3 (unstable areas) or less than at most 0.2 (stable areas), and “significantly effective” a susceptibility class with a ratio larger than at least 6 or less than at most 0.1. We regard these criteria as very hard to match, particularly in complex areas where landslides are large and numerous, and where the landscape exhibits considerable geomorphological variability. We consider “effective” a susceptibility class with an effectiveness ratio larger than 1.5 or smaller than 0.5, corresponding to a 50% increase or a 50% decrease from the expected proportion of landslides in the susceptibility class. Fig. 13 shows the efficacy of the susceptibility model shown in Fig. 3 in describing the known distribution of landslides (“fitting” performance, Fig. 13A), and the location of “future” landslides (“prediction” skill, Fig. 13B and C). Based on the established criteria, 12 of the 20 landslides susceptibility classes are “effective” in explaining the distribution of the known (past) landslides used to construct the model. In Fig. 13A, the black and the white bars exceeding the 1.5 and the 0.5 thresholds, respectively, show these effective classes. In the figure, the three cross-hachured bars represent terrain units classified as unstable (spatial probability in the range from 0.55 to 0.70) where landslides were not abundant in the multi-temporal inventory map. Comparison of Fig. 13B and C indicates that the individual susceptibility classes were better predictors of the presence (black bars) or the absence (white bars) of the snowmelt-induced landslides than of the rainfallinduced landslides. For the latter, the number of “ineffective” classes is also larger. It should be clear that the proposed acceptance thresholds are not absolute or fixed. The proposed limits 182 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 in errors that affect a susceptibility assessment. These factors include (i) imprecision and incompleteness in the landslide information used to construct and test the susceptibility model (Carrara et al., 1992; Ardizzone et al., 2002); (ii) quality, abundance, precision and completeness of the thematic data used to obtain the susceptibility assessment (Carrara et al., 1992; Soeters and van Westen, 1996; Carrara et al., 1999); (iii) characteristics and limitations of the statistical technique used to obtain the classification, including the experience of the investigator in applying the selected statistical tools (Carrara et al., 1992; Michie et al., 1994); and (iv) selection of the appropriate mapping unit (e.g., slope units, unique condition units, grid cells, etc.) (Carrara et al., 1995; Guzzetti et al., 1999). The listed factors require choices from the investigator, which inevitably introduce uncertainty in the susceptibility assessment. For this work we assumed the landslide information used to construct the model (Fig. 2, Table 1B) and to test the model prediction skill (Fig. 10A and B) was accurate and complete. We also assumed the thematic data were correct and complete and relevant to the distribution of landslides in the Collazzone area. We further assumed that the terrain subdivision adopted to ascertain landslide susceptibility was precise in describing the morphology of the area, and apt to explain the size and abundance of landslides in the study area. Finally, landslide susceptibility is just one of the three components of landslide hazard (Guzzetti et al., 1999, 2005, 2006). In addition to landslide susceptibility, to determine landslide hazard one has to ascertain the temporal frequency (i.e., recurrence) of the landslides, and the statistics of landslide size. These probabilistic assessments are affected by errors which should be identified and determined. 8. Conclusions Fig. 13. Analysis of the effectiveness of the landslide susceptibility classification. were selected heuristically, based on the results of landslide susceptibility assessments carried out in Italy in the last 25 years. The acceptance criteria need to be tested in other areas and by different investigators. Depending on the geomorphological setting and the complexity of a study area, other investigators may select different thresholds. The framework discussed for the evaluation of the quality of a landslide susceptibility model considers the most relevant sources of errors in a statistically based susceptibility assessment, but other factors exist resulting Where accurate landslide inventory maps have been prepared, the availability of user-friendly GIS software and of digital cartographic databases containing morphological, geological, land use and other environmental information has made it easy for geomorphologists to obtain digital landslide susceptibility maps. These maps attempt to zone an area based on the propensity of a territory to produce new or reactivated landslides, and they have proven to be very valuable for land use planning, policymaking, and civil defence (Brabb, 1984; Varnes and IAEG, 1984; Guzzetti et al., 2000; Glade et al., 2005). As in any other prediction, a landslide susceptibility assessment requires proper validation to ascertain its quality and prediction skills. Unfortunately, inspection of the literature reveals that F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 this is rarely performed. Forecast without validation is of as little use in science as in common life (Jollifee and Stephenson, 2003). Landslide susceptibility assessments are no exception, and lack of proper verification (or falsification) jeopardizes the use of a susceptibility map. Indeed, if a geomorphologist cannot define the reliability, robustness, degree of fitting and prediction skill (i.e., the quality) of a susceptibility assessment, why should a planner, policy-maker or civil defence manager use the prediction? In this paper, we have proposed a framework to address this problem. The framework is based on a set of tests and related acceptance criteria aimed at establishing and ranking the quality of a landslide susceptibility assessment, including the degree of model fit, the robustness of the model to changes in the input data, the error associated with the probabilistic estimate, and the model prediction skill. The proposed framework has been successfully tested in the Collazzone area. The experiment has demonstrated that adopting a simple, statistical classification method applied to good-quality data allows geomorphologists to prepare and validate a landslide susceptibility model, at least for areas of the size of the Collazzone study area. It remains to be demonstrated if the same set of tests and related acceptance criteria are applicable to larger areas, extending for hundreds or even thousands of square kilometres (e.g., Cardinali et al., 2002). Two noteworthy improvements over existing landslide susceptibility modelling efforts have been obtained. The first improvement consists of providing an estimate of the error associated with the probability of landslide spatial occurrence (i.e., susceptibility) obtained for each mapping unit in which a study area is partitioned. This is particularly significant where landslide susceptibility assessments are prepared to be used for planning purposes or for establishing land use regulations (Guzzetti et al., 2000). The second improvement consists of having established clearly defined criteria and associated acceptance thresholds for determining and ranking the quality of a landslide susceptibility assessment. If adopted, the proposed framework will provide for quantitative comparisons of the results obtained by different investigators working in different areas, and using different methods, to predict landslide susceptibility. Ultimately, this will add to the credibility of our products and the quality of our science. Acknowledgements Work supported by CNR-GNDCI (Publication Number 2893) and CNR-IRPI grants. We thank Earl E. 183 Brabb, the editor, and two anonymous referees for their comments. References Ardizzone, F., Cardinali, M., Carrara, A., Guzzetti, F., Reichenbach, P., 2002. Uncertainty and errors in landslide mapping and landslide hazard assessment. Natural Hazards and Earth System Sciences 2 (1–2), 3–14. Brabb, E.E., 1984. Innovative approach to landslide hazard and risk mapping. Proceedings of the 4th International Symposium on Landslides, Toronto, vol. 1, pp. 307–324. Brown, C.E., 1998. Applied Multivariate Statistics in Geohydrology and Related Sciences. Springer-Verlag. 248 pp. Campus, S., Forlati, F., Pegoraro, C., 1999. Studio propedeutico alla valutazione della pericolosità geologica inerente l'instabilità dei versanti mediante tecniche GIS ed approccio statistico multivariato. In: Piemonte, Regione (Ed.), Eventi Alluvionali in Piemonte, Regione Piemonte, Torino, pp. 288–295. Cardinali, M., Ardizzone, F., Galli, M., Guzzetti, F., Reichenbach, P., 2000. Landslides triggered by rapid snow melting: the December 1996–January 1997 event in Central Italy. In: Claps, P., Siccardi, F. (Eds.), Proceedings 1st Plinius Conference on Mediterranean Storms. Bios Publisher, Cosenza, pp. 439–448. Cardinali, M., Carrara, A., Guzzetti, F., Reichenbach, P., 2002. Landslide hazard map for the Upper Tiber River basin. CNR, Gruppo Nazionale per la Difesa dalle Catastrofi Idrogeologiche, Publication n. 2116, scale 1:100,000. Carrara, A., 1983. A multivariate model for landslide hazard evaluation. Mathematical Geology 15, 403–426. Carrara, A., Cardinali, M., Detti, R., Guzzetti, F., Pasqui, V., Reichenbach, P., 1991. GIS Techniques and statistical models in evaluating landslide hazard. Earth Surface Processes and Landforms 16 (5), 427–445. Carrara, A., Cardinali, M., Guzzetti, F., 1992. Uncertainty in assessing landslide hazard and risk. ITC Journal 2, 172–183. Carrara, A., Cardinali, M., Guzzetti, F., Reichenbach, P., 1995. GIS technology in mapping landslide hazard. In: Carrara, A., Guzzetti, F. (Eds.), Geographical Information Systems in Assessing Natural Hazards. Kluwer Academic Publisher, Dordrecht, The Netherlands, pp. 135–175. Carrara, A., Guzzetti, F., Cardinali, M., Reichenbach, P., 1999. Use of GIS technology in the prediction and monitoring of landslide hazard. Natural Hazards 20 (2–3), 117–135. Carrara, A., Crosta, G.B., Frattini, P., 2003. Geomorphological and historical data in assessing landslide hazard. Earth Surface Processes and Landforms 28 (10), 1125–1142. Chung, C.-J.F., Fabbri, A.G., 1999. Probabilistic prediction models for landslide hazard mapping. Photogrammetric Engineering & Remote Sensing 65 (12), 1389–1399. Chung, C.-J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30 (3), 451–472. Chung, C.-J.F., Fabbri, A.G., 2005. Systematic procedures of landslide hazard mapping for risk assessment using spatial prediction models. In: Glade, T., Anderson, M.G., Crozier, M.J. (Eds.), Landslide Risk Assessment. John Wiley, pp. 139–174. Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46. Cruden, D., Varnes, D., 1996. Landslide types and processes. In: Turner, A., Schuster, R. (Eds.), Landslides: Investigation and 184 F. Guzzetti et al. / Geomorphology 81 (2006) 166–184 Mitigation. Transportation Research Board, National Research Council, vol. 247, pp. 36–75. Fabbri, A.G., Chung, C.-J.F., Cendrero, C., Remondo, J., 2003. Is prediction of future landslides possible with a GIS? Natural Hazards 30 (3), 487–503. Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annales Eugenics 7, 179–188. Glade, T., Anderson, M.G., Crozier, M.J. (Eds.), 2005. Landslide Risk Assessment. John Wiley and Sons. 832 pp. Gorsevski, P.V., Gessler, P.E., Jankowski, P., 2003. Integrating a fuzzy k-means classification and a Bayesian approach for spatial prediction of landslide hazard. Journal of Geographical Systems 5 (3), 223–251. Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: an aid to a sustainable development. Geomorphology 31, 181–216. Guzzetti, F., Cardinali, M., Reichenbach, P., Carrara, A., 2000. Comparing landslide maps: a case study in the upper Tiber River Basin, central Italy. Environmental Management 25 (3), 247–363. Guzzetti, F., Reichenbach, P., Cardinali, M., Ardizzone, F., Galli, M., 2003. Impact of landslides in the Umbria Region, Central Italy. Natural Hazards and Earth System Sciences 3 (5), 469–486. Guzzetti, F., Reichenbach, P., Cardinali, M., Galli, M., Ardizzone, F., 2005. Landslide hazard assessment in the Staffora basin, northern Italian Apennines. Geomorphology 72, 272–299. Guzzetti, F., Galli, M., Reichenbach, P., Ardizzone, F., Cardinali, M., 2006. Landslide hazard assessment in the Collazzone area, Umbria, central Italy. Natural Hazards and Earth System Sciences 6, 115–131. Hoehler, F.K., 2000. Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. Journal of Clinical Epidemiology 53 (5), 499–503. Jollifee, I.T., Stephenson, D.B., 2003. Forecast Verification. A Practitioner's Guide in Atmospheric Science. John Wiley. 240 pp. Landis, J.R., Kock, G.G., 1977. The measurement of observer agreement for categorial data. Biometrics 33, 159–174. Mason, I.B., 2003. Binary events. Chapter 3. In: Jollifee, I.T., Stephenson, D.B. (Eds.), Forecast Verification. A Practitioner's Guide in Atmospheric Science. John Wiley, pp. 37–76. McCalpin, J., 1984. Preliminary age classification of landslides for inventory mapping. 21st Annual Symposium on Engineering Geology and Soils Engineering, April 5–6, 1984, Pocatello, Idaho. 13 pp. Michie, D., Spiegelhalter, D.J., Taylor, C.C. (Eds.) 1994. Machine Learning, Neural and Statistical Classification. Internet version available at https://www.amsta.leeds.ac.uk/~charles/statlog/. Remondo, J., Gonzalez, A., Diaz De Teran, J.R., Cendrero, A., Fabbri, A., Chung, C.-J.F., 2003. Validation of Landslide Susceptibility Maps; Examples and Applications from a Case Study in Northern Spain. Natural Hazards 30, 437–449. Soeters, R., van Westen, C.J., 1996. Slope instability recognition analysis and zonation. In: Turner, A.K., Schuster, R.L. (Eds.), Landslide Investigation and Mitigation, National Research Council. Transportation Research Board Special Report, vol. 247. National Academy Press, Washington, D.C., pp. 129–177. SPSS, 2004. SPSS 13.0 Command Syntax Reference. SPSS Inc. Chicago. 1994 p. Varnes, D.J., and IAEG Commission on Landslides and other MassMovements, 1984. Landslide Hazard Zonation: A Review of Principles and Practice. UNESCO Press, Paris. 63 pp.