Deriving semantic objects from the structured web

Abstract : This thesis focuses on the extraction and analysis of Web data objects, investigated from temporal, structural, and semantic points of view. We first survey strategies and best practices for deriving the temporal aspects of Web pages, together with a more in-depth study of Web feeds, both for this purpose and for other statistics. Next, in the context of Web pages dynamically generated by content management systems, we present two keyword-based techniques that extract articles from such pages. Automatically acquired keywords guide the process of object identification, either at the level of a single Web page (SIGFEED) or across different pages sharing the same template (FOREST). Finally, in the context of the deep Web, we present a generic framework that aims at discovering the semantic model of a Web object (here, a data record). The framework first uses FOREST to extract the objects. It then represents the implicit rdf:type similarities between the object attributes and the entity of the form as relationships that, together with the instances extracted from the objects, form a labeled graph. This graph is then aligned to an ontology such as YAGO to discover the unknown types and relations.
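The abstract only summarizes how automatically acquired keywords guide object identification on a single page. As a rough illustration of that general idea, and not a reproduction of SIGFEED or FOREST (whose actual algorithms are described in the thesis), the following Python sketch makes a few assumptions. The keywords come from something like a feed item's title. Each block-level HTML element is a candidate region. The selected region is the smallest block containing the largest number of distinct keywords. The tag set, stopword list, and the names `BlockCollector` and `select_article_block` are hypothetical, and the parser expects well-formed HTML with explicit closing tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice of block-level tags treated as candidate regions.
BLOCK_TAGS = {"p", "div", "article", "section", "td", "li"}
# Hypothetical minimal stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "with"}


def tokenize(text):
    """Lower-case alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class BlockCollector(HTMLParser):
    """Collects the text of every block-level element (nested text counts for all ancestors)."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished candidate blocks, in closing order
        self.stack = []   # text buffers of the currently open block elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_data(self, data):
        # Text belongs to every enclosing block, so outer blocks see inner text too.
        for buf in self.stack:
            buf.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack:
            text = " ".join(" ".join(self.stack.pop()).split())  # normalize whitespace
            if text:
                self.blocks.append(text)


def select_article_block(html, keywords):
    """Return the smallest block covering the most distinct keywords, or None if none match."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    wanted = set(tokenize(" ".join(keywords)))
    best, best_key = None, None
    for block in parser.blocks:
        hits = len(wanted & set(tokenize(block)))
        key = (hits, -len(block))  # more distinct hits first, then shorter block
        if best_key is None or key > best_key:
            best, best_key = block, key
    if best_key is None or best_key[0] == 0:
        return None
    return best


sample_html = """
<html><body>
<div id="menu"><ul><li>Home</li><li>Sports</li><li>Contact</li></ul></div>
<div id="main">
  <h1>Local team wins championship</h1>
  <p>The local team won the regional championship on Sunday after a close final.</p>
  <p>Fans celebrated in the city centre.</p>
</div>
<div id="footer"><p>Copyright 2012</p></div>
</body></html>
"""

# Keywords assumed to come from a feed item title.
result = select_article_block(sample_html, ["Local team wins regional championship"])
print(result)
```

On this toy page the `main` container wins over its first paragraph because only the container also includes the headline, so it covers all five keywords. The length tie-break keeps the selection from drifting up to ever larger enclosing blocks that add no new keywords.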
Keywords : Deep web
Document type : Theses

Cited literature : 134 references

https://pastel.archives-ouvertes.fr/tel-01124278
Contributor : Abes Star
Submitted on : Friday, March 6, 2015 - 6:37:32 AM
Last modification on : Thursday, October 17, 2019 - 12:36:09 PM
Long-term archiving on : Sunday, June 7, 2015 - 3:25:49 PM

File

2012ENST0060.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01124278, version 1

Citation

Marilena Oita. Deriving semantic objects from the structured web. Other [cs.OH]. Télécom ParisTech, 2012. English. ⟨NNT : 2012ENST0060⟩. ⟨tel-01124278⟩

Metrics

Record views : 323
File downloads : 327