VOLUME XL | APR 2020

Constructing Semantic Models From Words, Images, and Emojis

Abstract
A number of recent models of semantics combine linguistic information, derived from text corpora, with visual information, derived from image collections, and demonstrate that the resulting multimodal models account for behavioral data better than either of their unimodal counterparts. Empirical work on semantic processing has shown that emotion also plays an important role, especially for abstract concepts; however, models that integrate emotion with linguistic and visual information are lacking. Here, we first improve on visual and affective representations derived from existing state-of-the-art models, by choosing the models that best fit available human semantic data and extending the number of concepts they cover. Crucially, we then assess whether adding affective representations (obtained from a neural network model designed to predict emojis from co-occurring text) improves the fit to semantic similarity/relatedness judgments relative to a purely linguistic model and a linguistic–visual model. We find that, given specific weights assigned to the models, adding both visual and affective representations improves performance: visual representations improve the fit especially for more concrete words, and affective representations especially for more abstract words.
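The kind of evaluation the abstract describes can be illustrated with a minimal sketch. It assumes a simple late-fusion scheme, in which each modality contributes a cosine similarity and the per-modality scores are averaged with fixed weights, and it scores the result with a Spearman rank correlation against human ratings. The abstract does not specify the fusion scheme or the fit measure, so both are assumptions; the function names, the weights, and the toy vectors are hypothetical placeholders and not values from the article.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fused_similarity(word_a, word_b, spaces, weights):
    """Weighted average of per-modality cosine similarities.

    `spaces` maps a modality name to a {word: vector} dictionary;
    `weights` maps the same modality names to non-negative weights.
    This late-fusion scheme is an assumption made for illustration.
    """
    total = sum(weights[name] for name in spaces)
    score = sum(
        weights[name] * cosine(spaces[name][word_a], spaces[name][word_b])
        for name in spaces
    )
    return score / total


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Made-up toy vectors and ratings, purely to show the call pattern.
toy_spaces = {
    "linguistic": {"dog": [1.0, 0.2], "cat": [0.9, 0.3], "joy": [0.1, 1.0]},
    "visual": {"dog": [0.8, 0.1], "cat": [0.7, 0.2], "joy": [0.3, 0.3]},
    "affective": {"dog": [0.4, 0.6], "cat": [0.5, 0.5], "joy": [0.0, 1.0]},
}
toy_weights = {"linguistic": 0.6, "visual": 0.2, "affective": 0.2}
toy_pairs = [("dog", "cat"), ("dog", "joy"), ("cat", "joy")]
toy_human_ratings = [9.0, 2.5, 3.0]

model_scores = [fused_similarity(a, b, toy_spaces, toy_weights) for a, b in toy_pairs]
fit = spearman(model_scores, toy_human_ratings)
```

Comparing `fit` across weight settings, for example with the visual or affective weight set to zero, mirrors the comparison against purely linguistic and linguistic–visual baselines that the abstract describes.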

Farah Al-Mansour

Farah is a Middle Eastern-Canadian sociologist from Ottawa, examining the role of social structures in fostering personal growth. Her passion is highlighting stories of human adaptability, and promoting inclusive group strategies for realizing untapped potential.
