Cognitive Science Program presents

What Big Data Can Tell Us about What Words Mean: Saving Weighted Dictionaries with Word Embeddings

Dustin Stoltz
Department of Sociology and Anthropology

VIRTUAL: https://lehigh.zoom.us/j/92365141692

Much effort has been expended building weighted dictionaries based on hand-coding words' semantic content. Such pre-made dictionaries are used for a variety of tasks, such as sentiment analysis, extracting cognitive content, measuring the abstractness of text, and identifying the sensorimotor norms of words. Although they are intuitive methods for identifying motives, desires, ideas, connotations, and themes, they cannot overcome the problems associated with the long-tail distribution of words in a corpus. Furthermore, even very large weighted dictionaries (such as crowd-sourced dictionaries of about 40 thousand words) will encounter the problem of out-of-vocabulary words. Text analysts can overcome these problems using word embeddings. Embeddings use the co-occurrence of words in large corpora to assign words locations in a multi-dimensional space of meaning. These locations can be used to weight words by measuring their distance from anchors in this space of meaning. I propose a method that combines the strengths of these two approaches by defining and validating these anchors using hand-weighted dictionaries.
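
As a rough illustration of the general idea (not the speaker's actual method), the sketch below builds an anchor from a small hand-weighted dictionary and then scores out-of-vocabulary words by their cosine similarity to that anchor. The embedding values, the dictionary weights, and the names `EMBEDDINGS`, `DICTIONARY`, `build_anchor`, and `score` are all invented for this example; constructing the anchor as a weighted sum of word vectors is one simple assumption among several possible choices.

```python
import math

# Toy 3-dimensional embeddings; the numbers are invented for illustration.
EMBEDDINGS = {
    "joyful": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.2, 0.1],
    "miserable": [-0.8, 0.1, 0.2],
    "sad": [-0.9, 0.2, 0.0],
    "elated": [0.7, 0.3, 0.1],    # not in the dictionary
    "gloomy": [-0.7, 0.2, 0.3],   # not in the dictionary
}

# Hypothetical hand-weighted dictionary: word -> valence in [-1, 1].
DICTIONARY = {"joyful": 1.0, "happy": 0.8, "miserable": -0.9, "sad": -0.8}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_anchor(dictionary, embeddings):
    """Weighted sum of the embeddings of the dictionary words (an assumption)."""
    dim = len(next(iter(embeddings.values())))
    anchor = [0.0] * dim
    for word, weight in dictionary.items():
        if word in embeddings:  # skip dictionary words without a vector
            for i, x in enumerate(embeddings[word]):
                anchor[i] += weight * x
    return anchor


def score(word, anchor, embeddings):
    """Weight a word by its cosine similarity to the anchor."""
    return cosine(embeddings[word], anchor)


anchor = build_anchor(DICTIONARY, EMBEDDINGS)
scores = {w: score(w, anchor, EMBEDDINGS) for w in ("elated", "gloomy")}
print(scores)
```

In this toy setup, "elated" receives a positive score and "gloomy" a negative one even though neither appears in the dictionary, which is the out-of-vocabulary benefit the abstract describes. A validation step could compare the scores of the dictionary words themselves against their hand-coded weights.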

