|
Canessa, E., & Chaigneau, S. E. (2020). Mathematical regularities of data from the property listing task. J. Math. Psychol., 97, 19 pp.
Abstract: To study linguistically coded concepts, researchers often resort to the Property Listing Task (PLT). In a PLT, participants are asked to list properties that describe a concept (e.g., for DOG, subjects may list “is a pet”, “has four legs”, etc.), which are then coded into property types (i.e., superficially dissimilar properties such as “has four legs” and “is a quadruped” may be coded as “four legs”). When the PLT is done for many concepts, researchers obtain Conceptual Properties Norms (CPNs), which are used to study semantic content and as a source of control variables. Though the PLT and CPNs are widely used across psychology, there is a lack of a formal model of the PLT, which would provide better analysis tools. Particularly, nobody has attempted analyzing the PLT's listing process. Thus, in the current work we develop a mathematical description of the PLT. Our analyses indicate that several regularities should be found in the observable data obtained from a PLT. Using data from three different CPNs (from 3 countries and 2 different languages), we show that these regularities do in fact exist and generalize well across different CPNs. Overall, our results suggest that the description of the regularities found in PLT data may be fruitfully used in the study of concepts. (C) 2020 Elsevier Inc. All rights reserved.
|
|
|
Canessa, E., Chaigneau, S. E., Lagos, R., & Medina, F. A. (2021). How to carry out conceptual properties norming studies as parameter estimation studies: Lessons from ecology. Behav. Res. Methods, 53, 354–370.
Abstract: Conceptual properties norming studies (CPNs) ask participants to produce properties that describe concepts. From that data, different metrics may be computed (e.g., semantic richness, similarity measures), which are then used in studying concepts and as a source of carefully controlled stimuli for experimentation. Notwithstanding those metrics' demonstrated usefulness, researchers have customarily overlooked that they are only point estimates of the true unknown population values, and therefore, only rough approximations. Thus, though research based on CPN data may produce reliable results, those results are likely to be general and coarse-grained. In contrast, we suggest viewing CPNs as parameter estimation procedures, where researchers obtain only estimates of the unknown population parameters. Thus, more specific and fine-grained analyses must consider those parameters' variability. To this end, we introduce a probabilistic model from the field of ecology. Its related statistical expressions can be applied to compute estimates of CPNs' parameters and their corresponding variances. Furthermore, those expressions can be used to guide the sampling process. The traditional practice in CPN studies is to use the same number of participants across concepts, intuitively believing that practice will render the computed metrics comparable across concepts and CPNs. In contrast, the current work shows why an equal number of participants per concept is generally not desirable. Using CPN data, we show how to use the equations and discuss how they may allow more reasonable analyses and comparisons of parameter values among different concepts in a CPN, and across different CPNs.
|
|
|
Canessa, E., Chaigneau, S. E., Moreno, S., & Lagos, R. (2020). Informational content of cosine and other similarities calculated from high-dimensional Conceptual Property Norm data. Cogn. Process., 21, 601–614.
Abstract: To study concepts that are coded in language, researchers often collect lists of conceptual properties produced by human subjects. From these data, different measures can be computed. In particular, inter-concept similarity is an important variable used in experimental studies. Among possible similarity measures, the cosine of conceptual property frequency vectors seems to be a de facto standard. However, there is a lack of comparative studies that test the merit of different similarity measures when computed from property frequency data. The current work compares four different similarity measures (cosine, correlation, Euclidean and Chebyshev) and five different types of data structures. To that end, we compared the informational content (i.e., entropy) delivered by each of those 4 x 5 = 20 combinations, and used a clustering procedure as a concrete example of how informational content affects statistical analyses. Our results lead us to conclude that similarity measures computed from lower-dimensional data fare better than those calculated from higher-dimensional data, and suggest that researchers should be more aware of data sparseness and dimensionality, and their consequences for statistical analyses.
|
|
|
Canessa, E., Chaigneau, S. E., Moreno, S., & Lagos, R. (2022). CPNCoverageAnalysis: An R package for parameter estimation in conceptual properties norming studies. Behav. Res. Methods, Early Access.
Abstract: In conceptual properties norming studies (CPNs), participants list properties that describe a set of concepts. From CPNs, many different parameters are calculated, such as semantic richness. A generally overlooked issue is that those values are only point estimates of the true unknown population parameters. In the present work, we present an R package that allows us to treat those values as population parameter estimates. Relatedly, a general practice in CPNs is using an equal number of participants who list properties for each concept (i.e., standardizing sample size). As we illustrate through examples, this procedure has negative effects on data's statistical analyses. Here, we argue that a better method is to standardize coverage (i.e., the proportion of sampled properties to the total number of properties that describe a concept), such that a similar coverage is achieved across concepts. When standardizing coverage rather than sample size, it is more likely that the set of concepts in a CPN all exhibit a similar representativeness. Moreover, by computing coverage the researcher can decide whether the CPN reached a sufficiently high coverage, so that its results might be generalizable to other studies. The R package we make available in the current work allows one to compute coverage and to estimate the necessary number of participants to reach a target coverage. We show this sampling procedure by using the R package on real and simulated CPN data.
|
|