Who believes they are good navigators? A machine learning pipeline highlights the impact of gender, commuting time, and education

Published in Machine Learning with Applications, 2022

Recommended citation: Cheng, Y., He, C., Hegarty, M., & Chrastil, E. R. (2022). Who believes they are good navigators? A machine learning pipeline highlights the impact of gender, commuting time, and education. Machine Learning with Applications, 100419.

Large-scale digital data, which are becoming more prevalent, offer the potential to alleviate reproducibility concerns in psychology research. However, such data are not sufficient in and of themselves; they require machine learning (ML) pipelines capable of handling high-dimensional datasets at scale. Such ML-based methodologies enable the analysis of complex relationships and make it possible to account for complicated demographics, a factor that is likely to play a role in the generalizability of research. We introduce a novel ML pipeline and demonstrate its potential on a large-scale digital dataset from Sea Hero Quest, a mobile game with data from nearly 770,000 players (ages 19 to 70, men N = 404,455, women N = 367,173). We analyzed how demographics are related to self-reported navigation ability using exploratory analysis, supervised learning, and unsupervised learning. The results suggest that gender is the most important demographic factor in predicting self-reported navigation ability, followed by daily commuting time, age, and education, such that men (compared to women), long commuters (compared to those whose commuting time is shorter than 1 h), and older people with tertiary education (compared to younger people with secondary education) tended to evaluate themselves as better navigators. The large-scale dataset and ML pipeline capture influential factors, such as daily commuting time and education level, which have often been overlooked and are difficult to investigate with in-laboratory studies that use limited samples and traditional analytical techniques.
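The abstract does not say how the pipeline ranks demographic factors, so the following is only a minimal sketch of one common way to do it, permutation importance, and should not be read as the authors' method. Everything in it is an assumption made for illustration: the binary feature names, the made-up effect sizes in `EFFECTS`, the synthetic data generator, and the simple majority-vote lookup classifier. No Sea Hero Quest data are used.

```python
import random
from collections import Counter, defaultdict

# Hypothetical binary demographic features (illustrative names only).
FEATURES = ["gender", "long_commute", "older", "tertiary_education"]
# Made-up effect sizes for the synthetic generator; NOT estimates from the paper.
EFFECTS = {"gender": 0.40, "long_commute": 0.20, "older": 0.10,
           "tertiary_education": 0.05}
BASE_RATE = 0.12  # assumed baseline probability of a "good navigator" self-rating


def make_synthetic_rows(n, rng):
    """Draw n synthetic respondents and a binary self-rating for each."""
    rows, labels = [], []
    for _ in range(n):
        row = tuple(rng.randint(0, 1) for _ in FEATURES)
        p = BASE_RATE + sum(EFFECTS[f] * v for f, v in zip(FEATURES, row))
        rows.append(row)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def fit_majority_table(rows, labels):
    """Toy classifier: predict the majority label seen for each feature combination."""
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        counts[row][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {row: c.most_common(1)[0][0] for row, c in counts.items()}
    return lambda row: table.get(row, default)


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, rng, repeats=5):
    """Mean drop in accuracy when one feature column is shuffled."""
    baseline = accuracy(model, rows, labels)
    result = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        result[name] = sum(drops) / repeats
    return result


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, labels = make_synthetic_rows(8000, rng)
# Fit on the first half, measure importance on the held-out second half.
model = fit_majority_table(rows[:4000], labels[:4000])
importances = permutation_importance(model, rows[4000:], labels[4000:], rng)
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

Shuffling a column breaks its link to the outcome while leaving the other columns intact, so a large accuracy drop indicates that the model relies on that feature. Because the synthetic effect sizes are invented, the resulting ranking only demonstrates the mechanics and says nothing about the study's findings.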

Download paper here