Software System Analysis and Design Assignment 6


Random Forest from Scratch (2): Decision Trees

Attention: This blog is a Chinese reading note of Tree - Decision Tree with sklearn source code, which is written in English. Omissions or misunderstandings may be included, so please refer to the original text if needed. We believe the original text contains some misunderstandings about information theory itself, and we correct them in our version.

In the previous post, we introduced the principles of information theory along with some intuition, which form the foundation for building the Decision Tree algorithm. In this post, we will cover classification and regression decision trees, as well as their corresponding implementation in the open-source Scikit-learn library. Since sklearn builds its decision trees with the CART algorithm, we will focus mainly on CART, while also touching on the principles, strengths, and weaknesses of two other common algorithms: ID3 and C4.5.
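As a quick preview, here is a minimal sketch of how a CART-style classification tree is trained through Scikit-learn's DecisionTreeClassifier. This is our own illustration rather than code from the original post; the toy dataset and parameter choices are assumptions made only for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small toy dataset and split it for a quick sanity check.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scikit-learn builds trees with an optimized CART algorithm; "gini" is the
# default impurity criterion, and max_depth limits how deep the tree may grow.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```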

Random Forest from Scratch (1): Information Theory

Attention: This blog is a Chinese reading note of Tree - Information Theory, which is written in English. Omissions or misunderstandings may be included, so please refer to the original text if needed. We believe the original text contains some misunderstandings about information theory itself, and we correct them in our version.

Note: this series mainly focuses on the theory behind classification and regression trees/forests, and on the code implementation of regression trees/forests.

In recent years, ensemble algorithms based on decision trees have been widely used in the machine learning and data mining communities and have been very well received. Random Forest stands out among these algorithms: it achieves high accuracy while also running efficiently and remaining robust to outliers and overfitting. We therefore wrote this series of posts in the hope of systematically introducing the principles behind Random Forest and how to implement it.

Since a random forest is essentially an ensemble of decision trees, we first need to explain the principles behind decision trees. And before that, we need to understand the theoretical foundation they both rely on: information theory. (Note: this post only gives an intuitive view of the theory; for rigorous mathematical proofs, please refer to the original papers.)
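To make that intuition a little more concrete, here is a minimal sketch (our own example, not code from the original post) of Shannon entropy H(X) = -Σ p(x) log₂ p(x), the basic information-theoretic quantity that decision trees use to measure how impure a node is:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# A pure node (all samples share one class) has zero entropy;
# a 50/50 split is maximally uncertain, carrying 1 bit of entropy.
print(entropy([1, 1, 1, 1]))   # 0.0
print(entropy([0, 1, 0, 1]))   # 1.0
```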
