In conjunction with the 37th Annual ACM SIGIR Conference (SIGIR 2014)


Recently, significant progress has been made in research on what we call semantic matching (SM) in Web search, question answering, online advertising, cross-language information retrieval, and other tasks, and advanced technologies based on machine learning have been developed.

Let us take Web search as an example of a problem that also pervades the other tasks. When comparing the textual content of queries and documents, Web search still relies heavily on the term-based approach, in which the relevance score between a query and a document is calculated from the degree of matching between query terms and document terms. This simple approach works rather well in practice, partly because Web search has many other signals (hypertext, user logs, etc.) that complement it. However, for long-tail queries it can suffer from data sparseness: for example, the term "Trenton" does not match the query "New Jersey capital", even though Trenton is the capital of New Jersey. Query-document mismatches occur when the searcher and the author use different terms (representations) for the same thing, and this phenomenon is prevalent due to the nature of human language.
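To make the mismatch concrete, here is a minimal sketch of a purely term-based score. It assumes a naive overlap count between lowercased, whitespace-split query and document terms; this is a simplification for illustration, not a specific retrieval model such as BM25, and the function names are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_overlap_score(query: str, document: str) -> int:
    """Count the distinct query terms that also occur in the document."""
    query_terms = set(tokenize(query))
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms)


query = "New Jersey capital"
doc_lexical = "the capital of New Jersey is Trenton"
doc_semantic = "Trenton city hall opening hours"

print(term_overlap_score(query, doc_lexical))   # 3
print(term_overlap_score(query, doc_semantic))  # 0
```

The second document is about the capital of New Jersey, yet it shares no terms with the query and therefore receives a score of zero. Closing this gap is what semantic matching aims to do.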

The fundamental reason for the mismatch is that little language analysis is conducted in search. Semantic matching goes beyond the bag-of-words approach: it conducts deeper analysis of queries and documents, encodes text with richer representations, and then matches queries against documents using those representations, extracting and exploiting the semantic information they carry. SM is therefore expected to address the query-document mismatch challenge.

The need for SM is particularly strong in information retrieval tasks beyond query-document matching. For instance, successfully answering a complex informational query demands not only retrieving a set of appropriate documents but also aggregating and synthesizing the information in those documents that is relevant to the query. Similarly, online reputation management usually involves finding, understanding, and aggregating thousands of facts, comments, and opinions about an entity in order to understand threats to its reputation at a given point in time, which cannot be done without SM techniques.

SM can also be extended to phrases and sentences. Indeed, initiatives such as the semantic textual similarity (STS) evaluation campaign go beyond term-level matching and aim to capture the semantic relations between entire phrases as well as between entire sentences.

The main purpose of the SMIR 2014 workshop is to bring together IR and NLP researchers working on or interested in semantic matching, to share the latest research results, to exchange opinions on related issues, and to discuss future directions.

Paper Submission

Papers should be submitted electronically via the submission site. Submitted papers should follow the ACM conference style (see the ACM template page) and may not exceed 8 pages. All submissions will be reviewed by at least three members of the program committee. Reviewing is double-blind, so please anonymize your submission. Accepted papers will be published in the workshop proceedings at CEUR.

Topics of Interests

We solicit submissions on all aspects of semantic matching in information retrieval and natural language processing. Particular areas of interest include, but are not limited to:

  • Semantic representation of natural language
  • Semantic similarity between natural language expressions, phrases, sentences and paragraphs
  • Semantic parsing of natural language
  • Semantic matching in search
  • Semantic matching in question answering
  • Semantic matching in online advertising
  • Semantic matching in image retrieval
  • Semantic matching in entity linking
  • Semantic matching in online reputation management
  • Semantic matching in paraphrasing and textual entailment
  • Semantic matching in cross-language retrieval
  • Semantic matching in recommendation systems
  • Supervised and unsupervised machine learning techniques for matching, including deep learning and kernel methods
  • Semantic matching using sentence/text/document structure
  • Semantic matching using linked open data

Accepted Papers

The online proceedings are now available.

Invited Paper Presentations

  •  Jun Araki and Jamie Callan. An Annotation Similarity Model in Passage Ranking for Historical Fact Validation.
  •  Hadas Raviv, Oren Kurland, and David Carmel. Query Performance Prediction for Entity Retrieval.
  •  Milad Shokouhi, Rosie Jones, Umut Ozertem, Karthik Raghunathan, and Fernando Diaz. Mobile Query Reformulations.
  •  Damiano Spina, Julio Gonzalo, and Enrique Amigo. Learning Similarity Functions for Topic Detection in Online Reputation Monitoring.
  •  Qi Zhang, Jihua Kang, Jin Qian, and Xuanjing Huang. Continuous Word Embeddings for Detecting Local Text Reuses at the Semantic Level.
  •  Jiashu Zhao and Jimmy Huang. An Enhanced Context-sensitive Proximity Model for Probabilistic Information Retrieval.

Program (July 11, 2014)

09:00-09:10: Introduction to SMIR 2014 workshop
 Julio Gonzalo

09:10-10:00: Keynote speech I: Distributional Semantics for IR
 Eduard Hovy

10:00-10:30: Coffee break

10:30-10:55: Talk I: Semantic Matching in Search
 Hang Li

10:55-11:45: Keynote speech II: Leveraging Cold-Start Knowledge Base Population for Information Access
 Douglas W. Oard

11:45-13:15: Lunch

13:15-13:40: Talk II: Semantic Matching using Kernel Methods
 Alessandro Moschitti

13:40-13:55: Poster booster

13:55-14:55: Poster session

14:55-15:25: Coffee break

15:25-16:15: Keynote speech III: Six Tweets per Second
 Maarten de Rijke

16:15-17:05: Panel
 Eduard Hovy, Douglas W. Oard, and Maarten de Rijke

Organizers

Julio Gonzalo, UNED, Spain

Hang Li, Noah's Ark Lab, Huawei, Hong Kong

Alessandro Moschitti, Qatar Computing Research Institute, Qatar

Jun Xu, Noah's Ark Lab, Huawei, Hong Kong

Program Committee

Apoorv Agarwal, Columbia University

Eneko Agirre, University of the Basque Country

Marco Baroni, University of Trento

Roberto Basili, University of Rome

Chris Biemann, University of Darmstadt

Faisal Chowdhury, IBM Watson Research Center

Danilo Croce, University of Rome

Mona Diab, George Washington University

Wei Gao, Qatar Computing Research Institute

Michael Glass, IBM Watson Research Center

Jiafeng Guo, Chinese Academy of Sciences

Zhengdong Lv, Noah's Ark Lab, Huawei

Walid Magdy, Qatar Computing Research Institute

Donald Metzler, Google Inc.

Rada Mihalcea, University of Michigan

Siddharth Patwardhan, IBM Watson Research Center

Aliaksei Severyn, University of Trento

Yangqiu Song, UIUC

Quan Wang, Chinese Academy of Sciences

Ji-rong Wen, Renmin University

Important Dates

  • Paper Submission Due: May 10, 2014
  • Author Notification Date: May 30, 2014
  • Camera Ready: June 15, 2014

Contact Us