Recently, significant progress has been made in research on what we call semantic matching (SM) in Web search, question answering, online advertising, cross-language information retrieval, and other tasks. Advanced technologies based on machine learning have been developed.
Let us take Web search as an example of a problem that also pervades the other tasks. When comparing the textual content of queries and documents, Web search still relies heavily on the term-based approach, where the relevance score between a query and a document is calculated from the degree of matching between query terms and document terms. This simple approach works rather well in practice, partly because Web search has many other signals (hypertext, user logs, etc.) that complement it. However, in the long tail of Web searches it can suffer from data sparseness; e.g., "Trenton" does not match "New Jersey Capital". Query-document mismatches occur when the searcher and the author use different terms (representations), and this phenomenon is prevalent due to the nature of human language.
The fundamental reason for mismatch is that little language analysis is conducted in search. A more realistic approach beyond bag-of-words, referred to as semantic matching (SM), is to conduct deeper query and document analysis to encode text with richer representations, and then to perform query-document matching on those representations by extracting and utilizing the semantic information. SM is expected to solve the query-document mismatch challenge.
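The contrast between term-based matching and matching over a richer representation can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: the function names, the sample document, and the hand-written `TOY_EQUIVALENTS` table are hypothetical and stand in for knowledge that a real SM system would learn from data.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (bag-of-words view)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_overlap(query, document):
    """Fraction of distinct query terms that also occur in the document."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(document))
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)


# Hypothetical, hand-written equivalences (phrase -> entity).
# Assumption for illustration only; not part of any real system.
TOY_EQUIVALENTS = {
    "new jersey capital": "trenton",
}


def expand(text):
    """Append the equivalent entity for every known phrase found in the text."""
    normalized = " ".join(tokenize(text))
    extras = [
        entity
        for phrase, entity in TOY_EQUIVALENTS.items()
        if phrase in normalized  # simple substring test, enough for a toy example
    ]
    return " ".join([normalized] + extras)


def expanded_overlap(query, document):
    """Term overlap computed after enriching both sides with known equivalences."""
    return term_overlap(expand(query), expand(document))


query = "New Jersey Capital"
document = "Trenton city hall opening hours"

print(term_overlap(query, document))      # 0.0: no shared terms
print(expanded_overlap(query, document))  # 0.25: "trenton" now matches
```

Plain term overlap scores the pair at zero even though both texts refer to the same city, while the enriched representation produces a non-zero score. A lookup table is the crudest possible stand-in for semantic analysis and is used here only to keep the sketch short.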
The need for SM is particularly strong when we consider Information Retrieval tasks beyond query-document matching. Successfully answering a complex informational query demands not only retrieving a set of appropriate documents, but also aggregating and synthesizing the information in those documents that is relevant to the query. For instance, Online Reputation Management usually implies finding, understanding, and aggregating thousands of facts, comments, and opinions about an entity in order to understand threats to its reputation at a given point in time. This cannot be done without SM techniques.
SM can be extended to phrases or sentences. Indeed, there are initiatives such as the semantic textual similarity (STS) evaluation campaign, which go beyond term-level matching and aim at capturing the semantic relations between entire phrases as well as those between entire sentences.
The main purpose of the SMIR 2014 workshop is to bring together IR and NLP researchers working on or interested in semantic matching, to share the latest research results, express opinions on related issues, and discuss future directions.
Papers should be submitted electronically via the submission site (https://www.easychair.org/conferences/?conf=smir2014). Submitted papers should follow the ACM Conference style (see the ACM template page) and may not exceed 8 pages. All submissions will be reviewed by at least three members of the program committee. The review is double-blind; please anonymize your submission. Papers accepted at the workshop will be published in the proceedings at CEUR.
Topics of Interest
We solicit submissions on all aspects of semantic matching in information retrieval and natural language processing. Particular areas of interest include, but are not limited to:
09:00~09:10: Introduction to SMIR 2014 workshop (Julio Gonzalo)
09:10~10:00: Keynote speech I: Distributional Semantics for IR (Eduard Hovy)
10:00~10:30: Coffee break
10:30~10:55: Talk I: Semantic Matching in Search (Hang Li)
10:55~11:45: Keynote speech II: Leveraging Cold-Start Knowledge Base Population for Information Access (Douglas W. Oard)
11:45~13:15: Lunch
13:15~13:40: Talk II: Semantic Matching using Kernel Methods (Alessandro Moschitti)
13:40~13:55: Poster booster
13:55~14:55: Poster session
14:55~15:25: Coffee break
15:25~16:15: Keynote speech III: Six Tweets per Second (Maarten de Rijke)
16:15~17:05: Panel (Eduard Hovy, Douglas W. Oard, and Maarten de Rijke)
Organizers
Julio Gonzalo, UNED, Spain
Hang Li, Noah's Ark Lab, Huawei, Hong Kong
Alessandro Moschitti, Qatar Computing Research Institute, Qatar
Jun Xu, Noah's Ark Lab, Huawei, Hong Kong
Program Committee
Apoorv Agarwal, Columbia University
Eneko Agirre, University of the Basque Country
Marco Baroni, University of Trento
Roberto Basili, University of Rome
Chris Biemann, University of Darmstadt
Faisal Chowdhury, IBM Watson Research Center
Danilo Croce, University of Rome
Mona Diab, George Washington University
Wei Gao, Qatar Computing Research Institute
Michael Glass, IBM Watson Research Center
Jiafeng Guo, Chinese Academy of Sciences
Zhengdong Lv, Noah's Ark Lab, Huawei
Walid Magdy, Qatar Computing Research Institute
Donald Metzler, Google Inc.
Rada Mihalcea, University of Michigan
Siddharth Patwardhan, IBM Watson Research Center
Aliaksei Severyn, University of Trento
Yangqiu Song, UIUC
Quan Wang, Chinese Academy of Sciences
Ji-rong Wen, Renmin University
Important Dates
Contact Us