Saturday, April 30, 2005
A Survey of Web Metrics - Web Mining related paper
Similarity Queries - Web Mining related paper
HTML Similarities - Web Mining related paper
Friday, April 29, 2005
Elsevier.com
Author Gateway - Getting Published - LaTeX file guidelines
IEEE Intelligent Systems
AI Reference Shelf
Journal Informations in the Reference List ...
Elsevier Author Gateway
Machine Learning and Natural Language Processing Lab
Machine Learning and Natural Language Processing Lab: "Link Statistical Methods in Medical Research"
Potential Utility of Data-Mining Algorithms for Early Detection of Potentially Fatal/Disabling Adverse Drug Reactions: A Retrospective Evaluation -- H
Elsevier Author Gateway
ScienceDirect - Artificial Intelligence in Medicine - List of Issues
Tuesday, April 12, 2005
Statistical Data Mining Tutorials
Monday, April 11, 2005
EntropyBasedLinkAnalysis.pdf (application/pdf object)
Saturday, April 09, 2005
A sequential algorithm for training text classifiers
Discovering informative content blocks from Web documents
Template detection via data mining and its applications
Friday, April 08, 2005
kdd2003-webNoise.pdf (application/pdf object)
Sunday, April 03, 2005
treeFinderSys.pdf (application/pdf object)
XML Tree Finder System: a First Step towards XML Data Mining
Final Report
Anguo Dong
Supervisor: Dr. Reda Alhajj
Computer Science Department
University of Calgary
April 5, 2004
Abstract
The problem of finding frequent trees in a
collection of tree-structured XML data is
considered. The aim of the XML Tree Finder System
(XTFS) is to find the tree whose exact or perturbed
copies are frequent in a collection of labeled
trees. The definition of a labeled tree will be
given later. Frequent here means that the tree we find
is the Maximal Common Tree of the collection of
labeled trees.
WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model
Hung-Yu Kao, Jan-Ming Ho, Ming-Syan Chen, IEEE
To increase the commercial value and accessibility of pages, most content sites tend to publish their pages with intrasite redundant information, such as navigation panels, advertisements, and copyright announcements. Such redundant information increases the index size of general search engines and causes page topics to drift. In this paper, we study the problem of mining intrapage informative structure in news Web sites in order to find and eliminate redundant information. Note that intrapage informative structure is a subset of the original Web page and is composed of a set of fine-grained and informative blocks. The intrapage informative structures of pages in a news Web site contain only anchors linking to news pages or bodies of news articles. We propose an intrapage informative structure mining system called WISDOM (Web Intrapage Informative Structure Mining based on the Document Object Model) which applies Information Theory to DOM tree knowledge in order to build the structure. WISDOM splits a DOM tree into many small subtrees and applies a top-down informative block searching algorithm to select a set of candidate informative blocks. The structure is built by expanding the set using proposed merging methods. Experiments on several real news Web sites show high precision and recall rates which validates WISDOM's practical applicability.
Index Terms: Intrapage informative structure, DOM, entropy, information extraction.
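The abstract says WISDOM applies Information Theory to DOM subtrees but gives no formulas, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes each page is already tokenized into a list of terms and that a block is scored by the average normalized Shannon entropy of its terms across the pages of one site. The names term_entropy and block_entropy and the sample pages are hypothetical. Under these assumptions, terms that occur evenly on every page (navigation, copyright lines) score near 1, while terms concentrated on a single page score near 0.

```python
import math


def term_entropy(term, pages):
    """Normalized Shannon entropy of a term's distribution across pages.

    `pages` is a list of token lists (an assumed input format).
    Returns a value in [0, 1]: 1 means the term is spread evenly over
    all pages, 0 means it is concentrated on a single page (or absent).
    """
    counts = [page.count(term) for page in pages]
    total = sum(counts)
    if total == 0 or len(pages) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = sum(-p * math.log(p) for p in probs)
    # Dividing by log(number of pages) scales the result to [0, 1].
    return entropy / math.log(len(pages))


def block_entropy(block_terms, pages):
    """Average entropy of the distinct terms in a block.

    Empty blocks are treated as uninformative (score 1.0); this is an
    assumption of the sketch.
    """
    terms = set(block_terms)
    if not terms:
        return 1.0
    return sum(term_entropy(t, pages) for t in terms) / len(terms)


# Hypothetical site with three tokenized pages sharing a navigation bar.
pages = [
    ["home", "news", "quake", "hits"],
    ["home", "news", "election", "results"],
    ["home", "news", "team", "wins"],
]
nav_score = block_entropy(["home", "news"], pages)       # close to 1: redundant
article_score = block_entropy(["quake", "hits"], pages)  # close to 0: informative
```

A block would then be kept as informative when its score falls below some threshold; choosing that threshold is left open here.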
Hung-Yu Kao, Jan-Ming Ho, Ming-Syan Chen, IEEE
To increase the commercial value and accessibility of pages, most content sites tend to publish their pages with intrasite redundant information, such as navigation panels, advertisements, and copyright announcements. Such redundant information increases the index size of general search engines and causes page topics to drift. In this paper, we study the problem of mining intrapage informative structure in news Web sites in order to find and eliminate redundant information. Note that intrapage informative structure is a subset of the original Web page and is composed of a set of fine-grained and informative blocks. The intrapage informative structures of pages in a news Web site contain only anchors linking to news pages or bodies of news articles. We propose an intrapage informative structure mining system called WISDOM (Web Intrapage Informative Structure Mining based on the Document Object Model) which applies Information Theory to DOM tree knowledge in order to build the structure. WISDOM splits a DOM tree into many small subtrees and applies a top-down informative block searching algorithm to select a set of candidate informative blocks. The structure is built by expanding the set using proposed merging methods. Experiments on several real news Web sites show high precision and recall rates which validates WISDOM's practical applicability.
Index Terms- Index Terms- Intrapage informative structure, DOM, entropy, information extraction.
Advanced Data Mining
Advanced Data Mining: "Lecture Notes on Graphical Modeling: Part 2 Directed Graphs"