Tag: Crowdsourcing

Designing for Information Quality in the Era of Repurposable Crowdsourced User-Generated Content

Conventional wisdom holds that expert contributors provide higher quality user-generated content (UGC) than novices. Using the cognitive construct of selective attention, we argue that this may not be the case in some crowdsourcing UGC applications. We argue that crowdsourcing systems that seek participation mainly from contributors who are experienced or have high levels of proficiency in the crowdsourcing task will gather less diverse and therefore less repurposable data. We discuss the importance of the information diversity dimension of information quality for the use and repurposing of UGC and provide a theoretical basis for our position, with the goal of stimulating empirical research.

What Makes a Good Crowd? Rethinking the Relationship between Recruitment Strategies and Data Quality in Crowdsourcing

Conventional wisdom dictates that the quality of data collected in a crowdsourcing project is positively related to how knowledgeable the contributors are. Consequently, numerous crowdsourcing projects implement crowd recruitment strategies that reflect this reasoning. In this paper, we explore the effect of crowd recruitment strategies on the quality of crowdsourced data using classification theory. As these strategies are based on knowledge, we consider how a contributor’s knowledge may affect the quality of data he or she provides. We also build on previous research by considering relevant dimensions of data quality beyond accuracy and predict the effects of available recruitment strategies on these dimensions of data quality.

Do Crowds Go Stale? Exploring the Effects of Crowd Reuse on Data Diversity

Crowdsourcing is increasingly used to engage people to contribute data for a variety of purposes to support decision-making and analysis. A common assumption in many crowdsourcing projects is that experience leads to better contributions. In this research, we demonstrate limits of this assumption. We argue that greater experience in contributing to a crowdsourcing project can lead to a narrowing in the kind of data a contributor provides, causing a decrease in the diversity of data provided. We test this proposition using data from two sources: comments submitted with contributions in a citizen science crowdsourcing project, and three years of online product reviews. Our analysis of comments provided by contributors shows that the length of comments decreases as the number of contributions increases. We also find that the number of attributes reported by contributors decreases as they gain experience. These findings support our prediction, suggesting that the diversity of data provided by contributors declines over time.
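The diversity measures described above lend themselves to a simple operational sketch. The following Python snippet shows one plausible way to test whether comment length declines with experience, where experience is a contributor's cumulative contribution count. The column names (contributor_id, timestamp, comment) and the toy data are our assumptions for illustration, not the paper's actual datasets or analysis code.

```python
# Hypothetical sketch: does comment length fall as a contributor's
# experience (cumulative contribution count) grows?
import pandas as pd
from scipy.stats import spearmanr

def experience_vs_length(df: pd.DataFrame) -> float:
    df = df.sort_values(["contributor_id", "timestamp"]).copy()
    # Experience = how many contributions this person has made so far.
    df["experience"] = df.groupby("contributor_id").cumcount() + 1
    # Comment length in words, a simple diversity-related proxy.
    df["comment_length"] = df["comment"].str.split().str.len()
    rho, _ = spearmanr(df["experience"], df["comment_length"])
    return rho  # a negative rho is consistent with the reported decline

if __name__ == "__main__":
    toy = pd.DataFrame({
        "contributor_id": [1, 1, 1, 2, 2],
        "timestamp": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-03-01",
             "2020-01-15", "2020-02-15"]),
        "comment": ["saw a large adult bird near the nest",
                    "adult bird at nest",
                    "bird seen",
                    "two juveniles foraging in tall grass",
                    "juveniles again"],
    })
    print(experience_vs_length(toy))
```

A rank correlation is used here because it makes no linearity assumption; the paper's actual models may of course differ.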

Can Expertise Impair the Quality of Crowdsourced Data?

It is not uncommon for projects that collect crowdsourced data to be commissioned with incomplete knowledge of data contributors, data consumers, and/or the purposes for which the data collected are going to be used. Such unanticipated uses and users of data form the basis for open information environments (OIEs), and the information collected through systems designed to gather content from users has high quality when it is complete, accurate, current, and provided in an appropriate format. However, because it is assumed that experts provide higher quality information, many types of OIEs have been designed for experts. In this paper, we question the appropriateness of this assumption in the context of citizen science systems, an exemplary category of OIE. We begin by arguing that experts are primarily efficient rule-based classifiers, which implies that they selectively focus only on attributes relevant to their classification task and ignore others. Drawing from existing literature, we posit that experts' focus on only diagnostic features of an entity leads to a learned inattention to non-diagnostic attributes. This may improve the accuracy of the information provided, but at the expense of its completeness, currency, format, and ultimately the novelty (for unanticipated uses) of the information provided. On the other hand, we predict that non-experts and amateurs may use rules to a lesser extent, resulting in less selective attention and leading them to provide more novel information with less trade-off of one dimension of information quality for another. We propose hypotheses derived from this view, and outline two experiments we have designed to test them across four dimensions of information quality. We conclude by discussing the potential implications of this work for the design of crowdsourcing platforms and the recruitment of expert, amateur, or novice data contributors in studies of data quality in crowdsourcing settings.
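To make the selective-attention argument concrete, here is a minimal toy simulation, our own illustration rather than the paper's experimental design. It assumes a made-up attribute model in which experts report only a small diagnostic attribute set while novices sample attributes more broadly, and it measures crowd-level diversity as the number of distinct attributes the whole crowd covers.

```python
# Toy simulation of the selective-attention argument (illustrative only).
# Experts attend only to diagnostic attributes; novices sample broadly.
import random

ALL_ATTRIBUTES = [f"attr_{i}" for i in range(20)]
DIAGNOSTIC = ALL_ATTRIBUTES[:4]  # attributes an expert's rules rely on

def contribution(expert: bool, k: int = 4) -> set[str]:
    if expert:
        return set(DIAGNOSTIC)                    # accurate but narrow
    return set(random.sample(ALL_ATTRIBUTES, k))  # noisier but broad

def crowd_diversity(n: int, expert_share: float) -> int:
    reported: set[str] = set()
    for i in range(n):
        reported |= contribution(expert=(i < n * expert_share))
    # Diversity proxy: distinct attributes covered by the whole crowd.
    return len(reported)

random.seed(0)
for share in (1.0, 0.5, 0.0):
    print(f"expert share {share:.0%}: "
          f"{crowd_diversity(100, share)} distinct attributes")
```

Under these assumptions, an all-expert crowd covers only the four diagnostic attributes, while crowds with more novices cover nearly the full attribute space, which mirrors the predicted trade-off between accuracy-oriented expertise and the completeness and novelty of the collected information.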