Modeling Tree Structures, Machine Learning, and Information Extraction
During the last decade, the World Wide Web has evolved into the most important public data store in the world. The data formats currently used on the Web are heterogeneous and still evolving. The web community has a strong interest in adequate information representation, so that information on the Web can be accessed and extracted more easily. A major challenge in this respect is adaptive information extraction that can exploit the tree structure of web documents. Tree structure is available in the current Web formats, HTML and XML, where it encloses textual information. In this project, we want to integrate tree structures and emerging machine learning techniques into adaptive information extraction systems.