Climate Change Data Portal
DOI | 10.1016/j.ecoinf.2019.05.003 |
Title | Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species |
Authors | Zhang, Lei (1); Huettmann, Falk (2); Liu, Shirong (3); Sun, Pengsen (3); Yu, Zhen (4); Zhang, Xudong (1); Mi, Chunrong (5) |
Publication date | 2019 |
ISSN | 1574-9541 |
EISSN | 1878-0512 |
Volume | 52 |
Pages | 46-56 |
English abstract | The random forests (RF) algorithm is a highly effective learner and classifier in machine learning applications. This ensemble model is also one of the most popular algorithms for species distribution models (SDMs) available to date. By default, RF can produce categorical and numerical species distribution maps from its classification tree (CT) and regression tree (RT) algorithms, respectively; CT can also produce numerical predictions in the form of class probabilities. Many real-world applications (e.g. conservation planning) require binary presence-absence outputs, which are obtained by applying a classification threshold to these numerical predictions. However, little is known about how CT and RT differ in model performance in inference settings. Here, within an ensemble modeling framework, 52 forest tree species with presence-only data covering all of China were selected to compare how well CT and RT project the distributions and potential range shifts of these species under current and future climates. Five climatic variables were used to build the CT and RT models, and eight threshold-setting approaches were used to convert numerical predictions into binary predictions. For probabilistic predictions, the relative performance of CT and RT depended on the evaluation criterion chosen. For both RT and CT, the choice of threshold-setting method significantly altered the resulting thresholds, model performance, and, consequently, the projected species range shifts under climate change. The four threshold-selection methods based on composite model accuracy measures (MaxKappa, MaxOA, MaxTSS, and MinROCdist) most often achieved significantly higher model performance than the CT default threshold and the other threshold methods, and they consistently projected range changes under climate change of the same direction and magnitude.
We argue for choosing RT rather than CT as the SDM if model discrimination capacity (the ability to differentiate between occurrences of presence and absence) is considered more important than model reliability (the agreement between predicted relative indexes of occurrence and observed proportions of occurrence), and for CT when the reverse holds. In line with gradient theory, we recommend numerical predictions for species distribution modeling because they convey more information than binary predictions. Binary conversion of model outputs should only be carried out when the application's objective clearly justifies it. The four threshold methods named above are promising objective methods for converting continuous predictions into binary ones when presence-only data are available. This study proposes guidelines on how machine learning can be used for specific applied and theoretical purposes in an SDM context. |
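The four threshold-selection criteria named in the abstract (MaxKappa, MaxOA, MaxTSS, MinROCdist) can be illustrated with a short sketch. It is not code from the paper: it assumes 0/1 labels (for presence-only data, 0 is taken here to mean a pseudo-absence or background point, which the abstract does not specify), treats a site as present when its score is at or above the cut-off, and scans the distinct predicted scores as candidate thresholds. The metric definitions are the standard ones (Cohen's kappa, overall accuracy, true skill statistic, and Euclidean distance to the top-left corner of the ROC plot); the helper names `Confusion`, `confusion_at`, `metrics`, and `select_thresholds` are hypothetical.

```python
"""Sketch: choosing a binary cut-off for continuous SDM output.

Assumptions (not from the paper): labels are 1 = presence, 0 = absence or
pseudo-absence/background; a site is predicted present when score >= t;
candidate thresholds are the distinct predicted scores (input must be non-empty).
"""
from dataclasses import dataclass
from math import sqrt


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_at(scores, labels, t):
    """Count TP/FP/FN/TN when predicting presence for score >= t."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= t
        if pred and y == 1:
            tp += 1
        elif pred:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def metrics(c):
    """Composite accuracy measures for one confusion matrix."""
    n = c.tp + c.fp + c.fn + c.tn
    sens = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    spec = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    oa = (c.tp + c.tn) / n
    # Expected chance agreement used by Cohen's kappa.
    pe = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 0.0
    return {
        "kappa": kappa,
        "oa": oa,
        "tss": sens + spec - 1,
        # Distance from (1 - specificity, sensitivity) to the ROC corner (0, 1).
        "rocdist": sqrt((1 - sens) ** 2 + (1 - spec) ** 2),
    }


def select_thresholds(scores, labels):
    """Return the cut-off chosen by MaxKappa, MaxOA, MaxTSS and MinROCdist.

    Candidates are scanned in ascending order, so ties resolve to the
    lowest candidate threshold (max/min return the first optimum found).
    """
    candidates = sorted(set(scores))
    table = [(t, metrics(confusion_at(scores, labels, t))) for t in candidates]
    return {
        "MaxKappa": max(table, key=lambda r: r[1]["kappa"])[0],
        "MaxOA": max(table, key=lambda r: r[1]["oa"])[0],
        "MaxTSS": max(table, key=lambda r: r[1]["tss"])[0],
        "MinROCdist": min(table, key=lambda r: r[1]["rocdist"])[0],
    }
```

For example, with scores [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1] and labels [1, 1, 1, 1, 0, 0, 0, 0], the classes are perfectly separable and all four criteria select 0.4. On real SDM output they can select different cut-offs, which is why the choice of threshold method affects the projected range shifts.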
WOS research area | Environmental Sciences & Ecology |
Source journal | ECOLOGICAL INFORMATICS |
Document type | Journal article |
Item identifier | http://gcip.llas.ac.cn/handle/2XKMVOVA/99955 |
Author affiliations | 1. Chinese Acad Forestry, Res Inst Forestry, Beijing 100091, Peoples R China; 2. UAF, Dept Biol & Wildlife, Inst Arctic Biol, EWHALE LAB, Fairbanks, AK USA; 3. Chinese Acad Forestry, Res Inst Forest Ecol Environm & Protect, Key Lab Forest Ecol & Environm State Forestry & G, Beijing 100091, Peoples R China; 4. Iowa State Univ Sci & Technol, Dept Ecol Evolut & Organismal Biol, Ames, IA 50011 USA; 5. Chinese Acad Sci, Inst Zool, Beijing 100101, Peoples R China |
Recommended citation (GB/T 7714) | Zhang L, Huettmann F, Liu S, et al. Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species[J]. ECOLOGICAL INFORMATICS, 2019, 52: 46-56. |
APA | Zhang, L., Huettmann, F., Liu, S., Sun, P., Yu, Z., ... & Mi, C. (2019). Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species. Ecological Informatics, 52, 46-56. |
MLA | Zhang, Lei, et al. "Classification and regression with random forests as a standard method for presence-only data SDMs: A future conservation example using China tree species." Ecological Informatics 52 (2019): 46-56. |
Files in this item | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.