The fields of artificial intelligence and computational linguistics have seen tremendous advances in recent decades. As a result, once-novel approaches are often quickly superseded by newer innovations.
For this reason, the world’s premier international NLP conference, held by the Association for Computational Linguistics (ACL), annually recognizes seminal work that has had a particularly long-lasting influence and impact.
This week, two researchers from the Allen Institute for Artificial Intelligence (AI2), CEO Oren Etzioni and research scientist Jesse Dodge, received the ACL’s 10-year Test-of-Time Paper Award for two separate papers published in 2012.
Etzioni, along with a team that includes AI2’s Director of Engineering Michael Schmitz, received the award for “Open Language Learning For Information Extraction.” Dodge and his co-authors received their award in the same category for “Midge: Generating Image Descriptions From Computer Vision Detections.” Both works have been heavily cited over the past decade, including hundreds of highly influential citations.
“I’m honored by this award,” said Etzioni. “I’m absolutely delighted because it is for a project that we focused on intensively for more than five years, which introduced the novel idea of Open Information Extraction.”
At the time of the papers’ publication, Etzioni was a computer science professor at the University of Washington, a venture partner at the Madrona Venture Group and the founder of several startups. The following year, he was appointed to head AI2, a then-brand-new AI research institute funded by the late Microsoft co-founder Paul Allen. Etzioni became professor emeritus at UW in October 2020.
In their ground-breaking paper, Etzioni’s team introduced OLLIE, a system that extracts relations in text mediated by nouns, adjectives, and more without requiring a pre-specified vocabulary. This process was followed by a second, context-analysis step that increased precision by including contextual information from the text. This then-new approach allowed OLLIE to leverage the contextual information found in surrounding sentences, utilizing parts of speech previously unused by other information extraction (IE) systems.
The results were a major improvement over then-state-of-the-art IE systems such as ReVerb and WOE, and they transformed the field of information extraction from that point on.
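To make the idea concrete, here is a minimal Python sketch of what an open extraction with attached context might look like. It is a toy illustration only, not OLLIE's actual method: the `Extraction` class, the two regular expressions, and the `extract` function are all hypothetical names introduced for this example, and a couple of hand-written patterns stand in for whatever the real system does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extraction:
    """A (arg1; relation; arg2) tuple plus optional context (hypothetical layout)."""
    arg1: str
    rel: str
    arg2: str
    attributed_to: Optional[str] = None  # filled in by the context step


# Hypothetical hand-written pattern for a noun-mediated relation,
# e.g. "Microsoft co-founder Bill Gates". Not taken from OLLIE.
NOUN_MEDIATED = re.compile(
    r"(?P<org>[A-Z]\w+) (?P<role>[a-z][\w-]*) (?P<person>[A-Z]\w+ [A-Z]\w+)"
)

# Hypothetical pattern for an attribution clause, e.g. "Analysts said that ...".
ATTRIBUTION = re.compile(r"^(?P<source>.+?) (?:believed|said|claimed) that ")


def extract(sentence: str) -> List[Extraction]:
    """Step 1: find noun-mediated relations. Step 2: attach attribution context."""
    attribution = ATTRIBUTION.match(sentence)
    source = attribution.group("source") if attribution else None
    return [
        Extraction(
            arg1=m.group("person"),
            rel=f"be {m.group('role')} of",
            arg2=m.group("org"),
            attributed_to=source,
        )
        for m in NOUN_MEDIATED.finditer(sentence)
    ]


for ex in extract("Analysts said that Microsoft co-founder Bill Gates attended."):
    print(ex)
```

The relation phrase here comes from a noun ("co-founder") rather than a verb, and the second step records that the claim is attributed to a source instead of treating it as a plain fact. No list of allowed relations is defined anywhere in the sketch; the relation name is read off the sentence itself.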
“Papers come and go but only a few are remembered,” said Etzioni. “My goal as a researcher has been to have a lasting impact on the field of AI, and ultimately on the creation of beneficial AI technologies. This research project, more than any other in my career, has achieved some of that impact, so I’m beyond delighted to be part of the team receiving this award.”
This 2012 paper builds on an earlier 2007 paper, “Open Information Extraction from the Web,” which has been cited nearly 3,000 times. The work was done by a team at UW’s computer science department that included Stephen Soderland, who passed away in 2017. Research team member Mausam, who is now a professor of computer science at the Indian Institute of Technology Delhi, accepted the award, dedicating it to Soderland.
The paper published by Dodge’s team presents Midge, an image-captioning system that was one of the first of its kind. Instead of using human-annotated data, Midge generated natural language descriptions from detections by computer vision models. When the paper was published, it was early days for automated image caption generation and therefore difficult to have such work accepted for a major computational linguistics conference. Image captioning would become a far more prestigious subfield years later, due in part to the success of this paper.
Midge takes a fascinating approach to image captioning by using “triples.” The objects, actions, and spatial relationships detected in the image are mapped onto the nouns, verbs and prepositions of the image description.
The system was trained using 700,000 Flickr images along with their associated descriptions. The result was a “fully automatic vision-to-language system that generated syntactically and semantically well-formed descriptions with naturalistic variation.” Again, this approach was a major improvement over other AI methods available at that time.
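The mapping from detections to words can be illustrated with a short Python sketch. This is a simplified stand-in under stated assumptions, not Midge's generation procedure: the `Detection` class, its field names, and the fixed sentence template in `describe` are hypothetical and exist only to show objects becoming nouns, actions becoming verbs, and spatial relationships becoming prepositions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for what a vision model reports about an image."""
    obj: str                        # detected object   -> noun
    action: Optional[str] = None    # detected action   -> verb
    relation: Optional[str] = None  # spatial relation  -> preposition
    landmark: Optional[str] = None  # object the relation refers to -> noun


def describe(d: Detection) -> str:
    """Fill a fixed template; a toy substitute for real language generation."""
    words = ["A", d.obj]
    if d.action:
        words += ["is", d.action]
    if d.relation and d.landmark:
        if not d.action:
            words.append("is")  # "A cat is under the table."
        words += [d.relation, "the", d.landmark]
    return " ".join(words) + "."


print(describe(Detection("dog", action="sitting", relation="on", landmark="sofa")))
print(describe(Detection("cat", relation="under", landmark="table")))
```

A fixed template like this produces the same sentence shape every time, which is exactly the limitation that the "naturalistic variation" in the quoted result is meant to overcome.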
“We really cared about natural language generation, even though at the time research on it wasn’t valued in the community,” Dodge said. “It was really hard to publish papers on generation at the main NLP conferences, but Margaret Mitchell (the first author) did a really great job writing up this work and it was accepted.”
Dodge added: “Margaret drove image captioning research for years, and eventually other people started to work on it too. It was her continuing to work on this topic that led to this award.”
These awards continue a streak of long-term impact honors at AI2 in recent years. In 2021, Senior Researcher and UW professor Yejin Choi and coauthors were honored with an ACL 10-year Test-of-Time Award for “Finding Deceptive Opinion Spam by Any Stretch of the Imagination.” The paper introduced a classifier capable of detecting deceptive opinion spam with very high accuracy.
Another team that included Choi also received the Computer Vision Foundation’s 2021 Longuet-Higgins Prize for the 2011 paper, “Baby talk: Understanding and generating simple image descriptions,” which continued the quest to automatically develop correct, relevant natural language descriptions of images.
Earlier this year, Oren Etzioni and coauthors were honored with the Most Impact Paper Award at IUI 2022 in recognition of their important 2003 publication, “Towards a theory of natural language interfaces to databases.” This paper introduced a theoretical framework for reliable natural language interfaces along with a new system called Precise NLI that reduces the semantic interpretation challenge in NLIs to a graph matching problem. By expertly mapping natural language questions to their corresponding SQL queries for a broad class of natural language questions, this research was foundational to the way we access and interact with databases today.
In another recent honor, the Most Impact Paper Award at IUI 2019 went to Chief Scientist and General Manager of AI2’s Semantic Scholar Dan Weld and coauthor Krzysztof Gajos for their 2004 paper “SUPPLE: Automatically generating user interfaces.” SUPPLE introduced an algorithm that could generate user interfaces adapted to people’s motor and vision abilities, providing revolutionary benefits to users with physical disabilities by automatically modulating software interfaces according to the specific needs of the user.
These are just a few of the two dozen papers that have won awards for AI2. Etzioni reflected on the significant success of the institute.
“At AI2, we have strived for an environment as open and collaborative as the one I was part of at UW Computer Science (now called The Allen School), combined with a relentless focus on impact,” he said. “This has attracted a team that is both collegial and innovative.”
“Open Language Learning For Information Extraction” was published in EMNLP by the team of Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. “Midge: Generating Image Descriptions From Computer Vision Detections” was published in EACL by Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, K. Stratos, Xufeng Han, Alyssa C. Mensch, A. Berg, Tamara L. Berg, and Hal Daumé.