These multimodal networks show tight clustering of related concepts while keeping any two ideas only a few steps apart, a layout that supports quick associations and flexible retrieval. The researchers also find that highly connected concepts are rarer than a perfect power law would predict, pointing to limits on how central any single idea can become. Importantly, these network patterns relate to human behavior: they predict similarity judgments and even reaction times in word tasks, suggesting that the structure of environmental signals helps shape cognitive performance.
Thinking about semantic organization this way has practical implications for learning, accessibility, and technology. If meaning emerges from repeated co-occurrence across senses, educational tools and assistive systems might better scaffold connections by presenting information in complementary modalities. The article invites questions about how different environments—cultures, media, or developmental stages—might produce distinct network fingerprints and how those fingerprints influence growth, inclusivity, and the design of systems that support human potential.
Abstract
Humans organize semantic knowledge into complex networks that encode relations between concepts. The structure of these networks has broad implications for human cognitive processes and for theories of semantic development. Evidence from large lexical networks, such as those derived from word associations, suggests that semantic networks are sparse and highly clustered while maintaining short average paths between concepts, a phenomenon known as a "small-world" network. It has also been argued that these networks are "scale-free," meaning that the number of connections (or degree) per concept follows a power-law distribution, whereby most concepts have few connections while a few have many. However, the scale-free property remains debated, and the extent to which the lexical evidence reflects the naturally occurring semantic regularities of the environment has not been investigated systematically. To address this, we collected and analyzed semantic descriptors, human evaluations, and similarity judgments from four large datasets of naturalistic stimuli spanning three modalities (visual, auditory, and audio-visual), comprising 7,916 stimuli and 610,841 human responses. By connecting concepts that co-occur as descriptors of the same stimulus, we construct "multimodal" semantic networks. We show that these networks exhibit a clear small-world structure with a degree distribution that is best captured by a truncated power law (i.e., the most-connected concepts are less common than a perfect power law would predict). We further show that these networks predict human sensory judgments in these domains, as well as reaction times in an independent lexical decision task. Finally, we show that multimodal networks share overlapping themes with previously analyzed lexical networks, which a more rigorous reanalysis reveals to be truncated as well. Our findings shed new light on the origins of semantic network structure by tying it to the semantic regularities of the environment.
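The construction step described in the abstract — linking concepts that co-occur as descriptors of the same stimulus, then examining clustering, path lengths, and the degree distribution — can be sketched in a few lines. This is a minimal illustration with toy descriptor sets, not the paper's data or code; all stimuli and concept names below are invented for demonstration.

```python
# Sketch: build a co-occurrence semantic network from descriptor sets, then
# compute the two small-world ingredients (clustering, short paths) and degrees.
from collections import defaultdict
from itertools import combinations

# Hypothetical stimuli: each annotated with a set of concept descriptors.
stimuli = [
    {"dog", "bark", "park"},
    {"dog", "leash", "park"},
    {"bark", "tree", "forest"},
    {"tree", "forest", "bird"},
    {"bird", "song", "forest"},
]

# Connect every pair of concepts that co-occur as descriptors of one stimulus.
graph = defaultdict(set)
for descriptors in stimuli:
    for a, b in combinations(sorted(descriptors), 2):
        graph[a].add(b)
        graph[b].add(a)

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def shortest_path(src, dst):
    """Breadth-first search distance (in edges) between two concepts."""
    frontier, seen, dist = {src}, {src}, 0
    while frontier:
        if dst in frontier:
            return dist
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
        dist += 1
    return None  # no path

avg_clustering = sum(clustering(n) for n in graph) / len(graph)
degrees = {n: len(graph[n]) for n in graph}
print(f"concepts: {len(graph)}, mean clustering: {avg_clustering:.2f}")
print(f"path dog -> song: {shortest_path('dog', 'song')} steps")
```

On a full dataset, the resulting degree counts are what one would compare against candidate distributions (power law vs. truncated power law); here the graph is far too small for any such fit, and serves only to show the construction.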