ObjectRunner, Proc. 36th International Conference on Very Large Data Bases (VLDB), pp.1585-1588, 2010. ,
DOI : 10.14778/1920841.1921045
Automatic Extraction of Structured Web Data with Domain Knowledge, 2012 IEEE 28th International Conference on Data Engineering, 2012. ,
DOI : 10.1109/ICDE.2012.90
Lightweight, targeted extraction of structured web data, Proc. 26èmes journées Bases de Données Avancées, 2010. ,
A Semantic Indexing-based Approach to Top-k Retrieval of Structured Web Sources ,
Dbpedia : A nucleus for a web of open data ,
Querying text databases for efficient information extraction, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), pp.113-124, 2003. ,
DOI : 10.1109/ICDE.2003.1260786
Extracting structured data from Web pages, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, pp.337-348, 2003. ,
DOI : 10.1145/872757.872799
Foundations of Databases, 1995. ,
Open information extraction from the web, Proceedings of the 20th international joint conference on Artificial intelligence, IJCAI'07, pp.2670-2676, 2007. ,
Highly efficient algorithms for structural clustering of large websites, Proceedings of the 20th international conference on World wide web, WWW '11, pp.437-446, 2011. ,
DOI : 10.1145/1963405.1963468
Freebase, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08, pp.1247-1250, 2008. ,
DOI : 10.1145/1376616.1376746
The web of linked data: a global public dataspace on the web, Proceedings of the 13th International Workshop on the Web and Databases, WebDB '10, 2010. ,
DOI : 10.1145/1859127.1859129
Sourcerank : relevance and trust assessment for deep web sources based on inter-source agreement, Proceedings of the 20th international conference on World wide web, pp.227-236, 2011. ,
Extracting Patterns and Relations from the World Wide Web, Selected papers from the International Workshop on The World Wide Web and Databases, pp.172-183, 1999. ,
DOI : 10.1007/10704656_11
Context-aware wrapping : synchronized data extraction, Proceedings of the 33rd international conference on Very large data bases, VLDB '07, pp.699-710, 2007. ,
KnowItNow, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, pp.563-570, 2005. ,
DOI : 10.3115/1220575.1220646
Efficient crawling through URL ordering, Proceedings of the seventh international conference on World Wide Web 7, WWW7, pp.161-172, 1998. ,
DOI : 10.1016/S0169-7552(98)00108-1
Webtables : exploring the power of tables on the web, Proc. VLDB Endow, vol.1, pp.538-549, 2008. ,
Uncovering the relational web, WebDB, 2008. ,
OLERA: Semisupervised Web-Data Extraction with Visual Support, IEEE Intelligent Systems, vol.19, issue.6, pp.56-64, 2004. ,
DOI : 10.1109/MIS.2004.71
IEPAD, Proceedings of the tenth international conference on World Wide Web, WWW '01, pp.681-688, 2001. ,
DOI : 10.1145/371920.372182
Information extraction and integration : an overview, KDD, 2004. ,
GATE : A framework and graphical development environment for robust NLP tools and applications, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002. ,
Roadrunner : Towards automatic data extraction from large web sites, Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, pp.109-118, 2001. ,
Clustering Web pages based on their structure, Data & Knowledge Engineering, vol.54, issue.3, pp.279-299, 2005. ,
DOI : 10.1016/j.datak.2004.11.004
Supporting top-k keyword search in XML databases, ICDE, pp.689-700, 2010. ,
Focused crawling: a new approach to topic-specific Web resource discovery, Proceedings of the eighth international conference on World Wide Web, WWW '99, pp.1623-1640, 1999. ,
DOI : 10.1016/S1389-1286(99)00052-3
Extracting Content Structure for Web Pages Based on Visual Representation, Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications, APWeb'03, pp.406-417, 2003. ,
DOI : 10.1007/3-540-36901-5_42
Focused crawling using context graphs, Proceedings of the 26th International Conference on Very Large Data Bases, VLDB '00, pp.527-534, 2000. ,
Semantically enhanced Information Retrieval: An ontology-based approach, Web Semantics: Science, Services and Agents on the World Wide Web, vol.9, issue.4, pp.434-452, 2011. ,
DOI : 10.1016/j.websem.2010.11.003
Optimal aggregation algorithms for middleware, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '01, pp.614-656, 2003. ,
DOI : 10.1145/375551.375567
The Lixto data extraction project - back and forth between theory and practice, PODS, 2004. ,
Extracting information from semistructured web documents, OOIS Workshops, 2002. ,
Generating finite-state transducers for semi-structured data extraction from the Web, Information Systems, vol.23, issue.8, 1998. ,
DOI : 10.1016/S0306-4379(98)00027-1
Automatic acquisition of hyponyms from large text corpora, Proceedings of the 14th conference on Computational linguistics, pp.539-545, 1992. ,
Thresher, Proceedings of the 14th international conference on World Wide Web, WWW '05, 2005. ,
DOI : 10.1145/1060745.1060762
Accessing the deep web, Commun. ACM, vol.50, pp.94-101, 2007. ,
YAGO2, Proceedings of the 20th international conference companion on World wide web, WWW '11, 2011. ,
DOI : 10.1145/1963192.1963296
URL : https://hal.archives-ouvertes.fr/inria-00591780
To search or to crawl ? : towards a query optimizer for text-centric tasks, SIGMOD Conference, pp.265-276, 2006. ,
IR evaluation methods for retrieving highly relevant documents, SIGIR, 2000. ,
A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction, SDM, 2010. ,
DOI : 10.1137/1.9781611972801.81
A survey of web information extraction systems, IEEE TKDE, 2006. ,
Wrapper induction for information extraction, p.9819266, 1997. ,
Mining data records in Web pages, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, 2003. ,
DOI : 10.1145/956750.956826
Web Data Mining : Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), 2006. ,
DOI : 10.1007/978-3-642-19460-3
Vide : A vision-based approach for deep web data extraction ,
WICCAP : From semi-structured data to structured data, ECBS, 2004. ,
Annotating and searching web tables using entities, types and relationships, Proceedings of the VLDB Endowment, vol.3, issue.1-2, 2010. ,
DOI : 10.14778/1920841.1921005
Harvesting knowledge from web data and text, CIKM, 2010. ,
Harnessing the deep web : Present and future, 4th Biennial Conference on Innovative Data Systems Research (CIDR), 2009. ,
Ontology-driven, unsupervised instance population, Web Semantics: Science, Services and Agents on the World Wide Web, vol.6, issue.3, 2008. ,
DOI : 10.1016/j.websem.2008.04.002
A hierarchical approach to wrapper induction, Proceedings of the third annual conference on Autonomous Agents, AGENTS '99, 1999. ,
DOI : 10.1145/301136.301191
PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric, J. ACM, vol.15, pp.514-534, 1968. ,
Topical web crawlers, ACM Transactions on Internet Technology, vol.4, issue.4, pp.378-419, 2004. ,
DOI : 10.1145/1031114.1031117
Crawling the hidden web, Proceedings of the 27th International Conference on Very Large Data Bases, VLDB '01, pp.129-138, 2001. ,
The wargo system : Semi-automatic wrapper generation in presence of complex data access modes, DEXA, 2002. ,
Information extraction. Foundations and Trends in Databases, 2008. ,
Yago : a core of semantic knowledge, WWW, 2007. ,
ViPER, Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM '05, 2005. ,
DOI : 10.1145/1099554.1099672
Automatic wrapper induction from hidden-web sources with domain knowledge, Proceedings of the 10th ACM workshop on Web information and data management, WIDM '08, 2008. ,
DOI : 10.1145/1458502.1458505
URL : https://hal.archives-ouvertes.fr/inria-00337098
Learning information extraction rules for semi-structured and free text, Machine Learning, 1999. ,
Sofie : a self-organizing framework for information extraction, WWW, pp.631-640, 2009. ,
ODE, ACM Transactions on Database Systems, vol.34, issue.2, 2009. ,
DOI : 10.1145/1538909.1538914
Efficient and self-tuning incremental query expansion for top-k query processing, 2005. ,
Towards a probabilistic taxonomy of many concepts, PVLDB, 2011. ,
Data extraction and label assignment for web databases, Proceedings of the twelfth international conference on World Wide Web, WWW '03, 2003. ,
DOI : 10.1145/775152.775179
Identifying syntactic differences between two programs, Software: Practice and Experience, vol.21, issue.7, pp.739-755, 1991. ,
DOI : 10.1002/spe.4380210706
NYU : Description of the Proteus/PET system as used for MUC-7, Proceedings of the Seventh Message Understanding Conference, 1998. ,
Web data extraction based on partial tree alignment, Proceedings of the 14th international conference on World Wide Web, WWW '05, 2005. ,
DOI : 10.1145/1060745.1060761