T. Abdessalem, B. Cautis, and N. Derouiche, ObjectRunner, Proc. 36th International Conference on Very Large Data Bases (VLDB), pp.1585-1588, 2010.
DOI : 10.14778/1920841.1921045

N. Derouiche, B. Cautis, and T. Abdessalem, Automatic Extraction of Structured Web Data with Domain Knowledge, 2012 IEEE 28th International Conference on Data Engineering, 2012.
DOI : 10.1109/ICDE.2012.90

T. Abdessalem, B. Cautis, and N. Derouiche, Lightweight, targeted extraction of structured web data, Proc. 26èmes journées Bases de Données Avancées, 2010.

N. Derouiche, B. Cautis, T. Abdessalem, and A. Goel, A Semantic Indexing-based Approach to Top-k Retrieval of Structured Web Sources (article in progress).

S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives, DBpedia : A nucleus for a web of open data

E. Agichtein and L. Gravano, Querying text databases for efficient information extraction, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), pp.113-124, 2003.
DOI : 10.1109/ICDE.2003.1260786

A. Arasu and H. Garcia-Molina, Extracting structured data from Web pages, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.337-348, 2003.
DOI : 10.1145/872757.872799

S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases, 1995.

M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni, Open information extraction from the web, Proceedings of the 20th international joint conference on Artificial intelligence, IJCAI'07, pp.2670-2676, 2007.

L. Blanco, N. Dalvi, and A. Machanavajjhala, Highly efficient algorithms for structural clustering of large websites, Proceedings of the 20th international conference on World wide web, WWW '11, pp.437-446, 2011.
DOI : 10.1145/1963405.1963468

K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, Freebase, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08, pp.1247-1250, 2008.
DOI : 10.1145/1376616.1376746

C. Bizer, The web of linked data: a global public dataspace on the web, Proceedings of the 13th International Workshop on the Web and Databases, WebDB '10, 2010.
DOI : 10.1145/1859127.1859129

R. Balakrishnan and S. Kambhampati, Sourcerank : relevance and trust assessment for deep web sources based on inter-source agreement, Proceedings of the 20th international conference on World wide web, pp.227-236, 2011.

S. Brin, Extracting Patterns and Relations from the World Wide Web, Selected papers from the International Workshop on The World Wide Web and Databases, pp.172-183, 1999.
DOI : 10.1007/10704656_11

S. Chuang, K. C. Chang, and C. Zhai, Context-aware wrapping : synchronized data extraction, Proceedings of the 33rd international conference on Very large data bases, VLDB '07, pp.699-710, 2007.

M. J. Cafarella, D. Downey, S. Soderland, and O. Etzioni, KnowItNow, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pp.563-570, 2005.
DOI : 10.3115/1220575.1220646

J. Cho, H. Garcia-Molina, and L. Page, Efficient crawling through URL ordering, Proceedings of the seventh international conference on World Wide Web 7, WWW7, pp.161-172, 1998.
DOI : 10.1016/S0169-7552(98)00108-1

M. J. Cafarella, A. Halevy, D. Z. Wang, E. Wu, and Y. Zhang, WebTables : exploring the power of tables on the web, Proc. VLDB Endow., vol.1, pp.538-549, 2008.

M. J. Cafarella, A. Y. Halevy, Y. Zhang, D. Z. Wang, and E. Wu, Uncovering the relational web, WebDB, 2008.

C. Chang and S. Kuo, OLERA: Semisupervised Web-Data Extraction with Visual Support, IEEE Intelligent Systems, vol.19, issue.06, pp.56-64, 2004.
DOI : 10.1109/MIS.2004.71

C. Chang and S. Lui, IEPAD, Proceedings of the tenth international conference on World Wide Web, WWW '01, pp.681-688, 2001.
DOI : 10.1145/371920.372182

W. Cohen and A. McCallum, Information extraction and integration : an overview, KDD, 2004.

H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan, GATE : A framework and graphical development environment for robust NLP tools and applications, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.

V. Crescenzi, G. Mecca, and P. Merialdo, RoadRunner : Towards automatic data extraction from large web sites, Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, pp.109-118, 2001.

V. Crescenzi, P. Merialdo, and P. Missier, Clustering Web pages based on their structure, Data & Knowledge Engineering, vol.54, issue.3, pp.279-299, 2005.
DOI : 10.1016/j.datak.2004.11.004

L. J. Chen and Y. Papakonstantinou, Supporting top-k keyword search in XML databases, ICDE, pp.689-700, 2010.

S. Chakrabarti, M. van den Berg, and B. Dom, Focused crawling: a new approach to topic-specific Web resource discovery, Proceedings of the eighth international conference on World Wide Web, WWW '99, pp.1623-1640, 1999.
DOI : 10.1016/S1389-1286(99)00052-3

D. Cai, S. Yu, J. Wen, and W. Ma, Extracting Content Structure for Web Pages Based on Visual Representation, Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications, APWeb'03, pp.406-417, 2003.
DOI : 10.1007/3-540-36901-5_42

M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, Focused crawling using context graphs, Proceedings of the 26th International Conference on Very Large Data Bases, VLDB '00, pp.527-534, 2000.

M. Fernández, I. Cantador, V. López, D. Vallet, P. Castells et al., Semantically enhanced Information Retrieval: An ontology-based approach, Web Semantics: Science, Services and Agents on the World Wide Web, vol.9, issue.4, pp.434-452, 2011.
DOI : 10.1016/j.websem.2010.11.003

R. Fagin, A. Lotem, and M. Naor, Optimal aggregation algorithms for middleware, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '01, pp.102-113, 2001.
DOI : 10.1145/375551.375567

G. Gottlob, C. Koch, R. Baumgartner, M. Herzog, and S. Flesca, The Lixto data extraction project - back and forth between theory and practice, PODS, 2004.

A. Hemnani and S. Bressan, Extracting information from semistructured web documents, OOIS Workshops, 2002.

C. Hsu and M. Dung, Generating finite-state transducers for semi-structured data extraction from the Web, Information Systems, vol.23, issue.8, 1998.
DOI : 10.1016/S0306-4379(98)00027-1

M. A. Hearst, Automatic acquisition of hyponyms from large text corpora, Proceedings of the 14th conference on Computational linguistics, pp.539-545, 1992.

A. Hogue and D. R. Karger, Thresher, Proceedings of the 14th international conference on World Wide Web, WWW '05, 2005.
DOI : 10.1145/1060745.1060762

B. He, M. Patel, Z. Zhang, and K. C. Chang, Accessing the deep web, Commun. ACM, vol.50, pp.94-101, 2007.

J. Hoffart, F. M. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo et al., YAGO2, Proceedings of the 20th international conference companion on World wide web, WWW '11, 2011.
DOI : 10.1145/1963192.1963296
URL : https://hal.archives-ouvertes.fr/inria-00591780

P. G. Ipeirotis, E. Agichtein, P. Jain, and L. Gravano, To search or to crawl? : towards a query optimizer for text-centric tasks, SIGMOD Conference, pp.265-276, 2006.

K. Järvelin and J. Kekäläinen, IR evaluation methods for retrieving highly relevant documents, SIGIR, 2000.

N. Jindal and B. Liu, A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction, SDM, 2010.
DOI : 10.1137/1.9781611972801.81

C. Chang, M. Kayed, M. R. Girgis, and K. F. Shaalan, A survey of web information extraction systems, IEEE TKDE, 2006.

N. Kushmerick, Wrapper induction for information extraction, PhD thesis, University of Washington, 1997.

B. Liu, R. Grossman, and Y. Zhai, Mining data records in Web pages, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, 2003.
DOI : 10.1145/956750.956826

B. Liu, Web Data Mining : Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), 2006.
DOI : 10.1007/978-3-642-19460-3

W. Liu, X. Meng, and W. Meng, ViDE : A vision-based approach for deep web data extraction

WICCAP : From semi-structured data to structured data, ECBS, 2004.

G. Limaye, S. Sarawagi, and S. Chakrabarti, Annotating and searching web tables using entities, types and relationships, Proceedings of the VLDB Endowment, vol.3, issue.1-2, 2010.
DOI : 10.14778/1920841.1921005

H. W. Lauw, R. Schenkel, F. Suchanek, M. Theobald, and G. Weikum, Harvesting knowledge from web data and text, CIKM, 2010.

J. Madhavan, L. Afanasiev, L. Antova, and A. Halevy, Harnessing the deep web : Present and future, 4th Biennial Conference on Innovative Data Systems Research (CIDR), 2009.

L. McDowell and M. J. Cafarella, Ontology-driven, unsupervised instance population, Web Semantics: Science, Services and Agents on the World Wide Web, vol.6, issue.3, 2008.
DOI : 10.1016/j.websem.2008.04.002

I. Muslea, S. Minton, and C. A. Knoblock, A hierarchical approach to wrapper induction, Proceedings of the third annual conference on Autonomous Agents, AGENTS '99, 1999.
DOI : 10.1145/301136.301191

D. R. Morrison, PATRICIA : Practical algorithm to retrieve information coded in alphanumeric, J. ACM, vol.15, pp.514-534, 1968.

F. Menczer, G. Pant, and P. Srinivasan, Topical web crawlers, ACM Transactions on Internet Technology, vol.4, issue.4, pp.378-419, 2004.
DOI : 10.1145/1031114.1031117

S. Raghavan and H. Garcia-Molina, Crawling the hidden web, Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, pp.129-138, 2001.

J. Raposo, A. Pan, M. Álvarez, J. Hidalgo, and Á. Viña, The Wargo system : Semi-automatic wrapper generation in presence of complex data access modes, DEXA, 2002.

S. Sarawagi, Information extraction, Foundations and Trends in Databases, 2008.

F. M. Suchanek, G. Kasneci, and G. Weikum, YAGO : a core of semantic knowledge, WWW, 2007.

K. Simon and G. Lausen, ViPER, Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM '05, 2005.
DOI : 10.1145/1099554.1099672

P. Senellart, A. Mittal, D. Muschick, R. Gilleron, and M. Tommasi, Automatic wrapper induction from hidden-web sources with domain knowledge, Proceeding of the 10th ACM workshop on Web information and data management, WIDM '08, 2008.
DOI : 10.1145/1458502.1458505
URL : https://hal.archives-ouvertes.fr/inria-00337098

S. Soderland, Learning information extraction rules for semi-structured and free text, Machine Learning, 1999.

F. M. Suchanek, M. Sozio, and G. Weikum, SOFIE : a self-organizing framework for information extraction, WWW, pp.631-640, 2009.

W. Su, J. Wang, and F. H. Lochovsky, ODE, ACM Transactions on Database Systems, vol.34, issue.2, 2009.
DOI : 10.1145/1538909.1538914

M. Theobald, R. Schenkel, and G. Weikum, Efficient and self-tuning incremental query expansion for top-k query processing, 2005.

W. Wu, H. Li, H. Wang, and K. Q. Zhu, Towards a probabilistic taxonomy of many concepts, PVLDB, 2011.

J. Wang and F. H. Lochovsky, Data extraction and label assignment for web databases, Proceedings of the twelfth international conference on World Wide Web, WWW '03, 2003.
DOI : 10.1145/775152.775179

W. Yang, Identifying syntactic differences between two programs, Software: Practice and Experience, vol.21, issue.7, pp.739-755, 1991.
DOI : 10.1002/spe.4380210706

R. Yangarber and R. Grishman, NYU : Description of the Proteus/PET system as used for MUC-7, Proceedings of the Seventh Message Understanding Conference, 1998.

Y. Zhai and B. Liu, Web data extraction based on partial tree alignment, Proceedings of the 14th international conference on World Wide Web, WWW '05, 2005.
DOI : 10.1145/1060745.1060761