The rapid growth of the Web in the last decade makes it the largest publicly accessible data source in the world. Web mining aims to discover useful information or knowledge from Web hyperlinks, page contents, and usage logs. Based on the primary kinds of data used in the mining process, Web mining tasks can be categorized into three main types: Web structure mining, Web content mining, and Web usage mining. Web structure mining discovers knowledge from hyperlinks, which represent the structure of the Web. Web content mining extracts useful information or knowledge from Web page contents. Web usage mining mines user access patterns from usage logs, which record the clicks made by every user.
The goal of this book is to present these tasks and their core mining algorithms. The book is intended to be a text with comprehensive coverage, and yet, for each topic, sufficient details are given so that readers can gain a reasonably complete knowledge of its algorithms or techniques without referring to any external materials. Four of the chapters, structured data extraction, information integration, opinion mining, and Web usage mining, make this book unique. These topics are not covered by existing books, yet they are essential to Web data mining. Traditional Web mining topics such as search, crawling and resource discovery, and link analysis are also covered in detail in this book.
Although the book is entitled Web Data Mining, it also includes the main topics of data mining and information retrieval, since Web mining uses their algorithms and techniques extensively. The data mining part mainly consists of chapters on association rules and sequential patterns, supervised learning (or classification), and unsupervised learning (or clustering), which are the three most important data mining tasks. The advanced topic of partially (semi-) supervised learning is included as well. For information retrieval, the core topics that are crucial to Web mining are described. This book is thus naturally divided into two parts. The first part, which consists of Chaps. 2-5, covers data mining foundations. The second part, which contains Chaps. 6-12, covers Web-specific mining.
Two main principles have guided the writing of this book. First, the basic content of the book should be accessible to undergraduate students, and yet there should be sufficient in-depth material for graduate students who plan to pursue Ph.D. degrees in Web data mining or related areas. Few assumptions are made in the book regarding the prerequisite knowledge of readers. Anyone with a basic understanding of algorithms and probability concepts should have no problem with this book. Second, the book should examine Web mining technology from a practical point of view. This is important because most Web mining tasks have immediate real-world applications. In the past few years, I was fortunate to have worked directly or indirectly with many researchers and engineers in several search engine and e-commerce companies, and also in traditional companies that are interested in exploiting the information on the Web in their businesses. During the process, I gained practical experience and first-hand knowledge of real-world problems. I try to pass those non-confidential pieces of information and knowledge along in the book. The book, thus, should have a good balance of theory and practice. I hope that it will not only be a learning text for students, but also a valuable source of information, knowledge, and even ideas for Web mining researchers and practitioners.
Preface
Popular reviews of 《Web数据挖掘》 (Web Data Mining)
-
Preface
3 helpful, 0 not helpful · 终南长安 · 2009-10-24
The rapid growth of the Web in the last decade makes it the largest publicly accessible data source in the world. Web mining aims to discover useful i...
-
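Judging from the table of contents, a very substantial book
2 helpful, 2 not helpful · ucherish · 2009-08-05
Part I: Data Mining Foundations. Chapter 1, Introduction: 1.1 What Is the World Wide Web; 1.2 A Brief History of the Web and the Internet; 1.3 Web Data Mining; 1.3.1 What Is Data Mining; 1.3.2 What Is Web Data Mining; 1.4 Summary of Chapters; 1.5 How to Read This Book; Bibliographic Notes. Chapter 2, Association Rules and Sequential Patterns: 2.1 Basic Concepts of Association Rules; 2.2 ...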
-
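The English is terrible...
1 helpful, 0 not helpful · jefflee · 2009-02-20
I read the first 4 pages of Chapter 1, and there are obvious traces of Chinglish: in just two pages I found 4 errors or unclear expressions. The content seems fairly good, though...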
-
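A good book, but the Chinese edition has quite a few typos
1 helpful, 0 not helpful · uNiCorn · 2011-02-18
I mainly read the part on structured data extraction. I found that quite a few people have already studied some of the questions I had been thinking about, and I learned a lot. This is also a very practical book, and it is fairly valuable for readers who are not full-time academics. Looking at the references, several of the data extraction algorithms come from the author's own papers, which explains why they are covered at such length. Also: the many references at the back of the book are really good, and quite a few of the papers are fairly...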
-
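Where in China can I buy the original English edition?
1 helpful, 0 not helpful · 阿杜 · 2011-11-07
I have recently been reading the electronic version of the original edition and have just reached association rules in Chapter 2. The implementation of the MS-Apriori algorithm is a bit hard to understand. Judging from the table of contents, the book as a whole looks quite good. I would like to buy the original edition, since I still prefer paper books, but I have not found anywhere that sells it. Can anyone recommend a seller?...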
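Title: Web数据挖掘 (Web Data Mining)
Author: 刘兵 (Bing Liu)
Publisher: 清华大学出版社 (Tsinghua University Press)
Translator: 俞勇
Publication date: 2009-4
Pages: 375
Price: 49.00 yuan
Binding: Paperback
Series: 世界著名计算机教材精选
ISBN: 9787302193388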