
Preprint from:

T.Spinde, C. Kreuter, W. Gaissmaier, F. Hamborg, B. Gipp, H. Giese, "Do You Think It's Biased? How To Ask For The Perception Of Media Bias", in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), DOI: 10.1109/JCDL52503.2021.00018, 2021.

Do You Think It's Biased? How To Ask For The Perception Of Media Bias

Timo Spinde University of Konstanz

Konstanz, Germany Timo.Spinde@uni-konstanz.de

Felix Hamborg University of Konstanz

Konstanz, Germany Felix.hamborg@uni-konstanz.de

Christina Kreuter University of Konstanz

Konstanz, Germany Christina.kreuter@uni-konstanz.de

Bela Gipp University of Wuppertal

Wuppertal, Germany gipp@uni-wuppertal.de

Wolfgang Gaissmaier University of Konstanz

Konstanz, Germany Gaissmaier@uni-konstanz.de

Helge Giese University of Konstanz

Konstanz, Germany Helge.Giese@uni-konstanz.de

Abstract-- Media coverage has a substantial effect on the public perception of events. The way the media frame events can significantly alter the beliefs and perceptions of our society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering the word choice or omitting information, the perception of bias also varies largely depending on a reader's personal background. Therefore, media bias is a very complex construct to identify and analyze. Even though media bias has been the subject of many studies, previous assessment strategies are oversimplified and lack overlap and empirical evaluation. This study therefore aims to develop a scale that can be used as a reliable standard to evaluate article bias. To name an example: when measuring bias in a news article, should we ask, "How biased is the article?" or should we instead ask, "How did the article treat the American president?" We conducted a literature search to find 824 relevant questions about text perception in previous research on the topic. In a multi-iterative process, we summarized and condensed these questions semantically to arrive at a complete and representative set of possible question types about bias. The final set consisted of 25 questions with varying answering formats, 17 questions using semantic differentials, and six ratings of feelings. We tested each of the questions on 190 articles with 663 participants overall to identify how well the questions measure an article's perceived bias. Our results show that 21 final items are suitable and reliable for measuring the perception of media bias. We publish the final set of questions on .

Keywords--news bias, survey creation, perception of bias

I. INTRODUCTION

News media play a fundamental role in the democratic process. Many people consider news articles "a reliable source of information about current events, even though it is also broadly believed and academically confirmed that news outlets are biased" [1]. Given the trust readers put into news articles "and the significant influence of media outlets on society and public opinion, media bias may potentially lead to the adoption of biased views by readers" [2]. The news, therefore, plays an essential part in forming public opinion on political and other current issues [3]. Simultaneously, unrestricted access to unbiased information about any topic is crucial for developing a balanced viewpoint on different events [1]. The severity of biased news coverage is "amplified further by the fact that regular news consumers are typically not fully aware of its degree and scope" [2].

This work was supported by the Hanns-Seidel-Foundation and the Federal Ministry of Education and Research of Germany.

DOI: 10.1109/JCDL52503.2021.00018. © 2021 IEEE.

Even though media bias and its perception are prevalent and relevant issues in society and research, its reliable measurement poses a challenge. Recent computer science research aiming to build automated media bias detection systems reported that building a high-quality bias data set is difficult because readers struggle to agree on which text documents are biased [1], [4]. Many individual factors affect the perception of bias, such as topic knowledge, political ideology, or simply age and education [1]. Phenomena like the Hostile Media Effect (HME, describing the tendency to perceive media coverage of an issue as biased against one's views [5]) might also play a role, making it hard to objectively determine whether and how an article or clip is biased.

Throughout the different studies on media bias perception and identification, various definitions and methods were used to measure media bias. Still, there is a major lack of agreement on how study participants or readers react towards bias depending on how they were asked. Most existing studies focus only on specific aspects, for example, the already mentioned HME [6]–[8]. Some studies asked questions related to particular articles [9], while others chose a more general approach [10]. Some ask about bias directly (e.g., "Regarding the web page that you viewed, would you say the portrayal of the presidential candidates was strictly neutral or biased in favor of one side or the other?" [11]), and some indirectly [2], [12], [13]. Some researchers used experiments [14], while others used surveys [10].

While there is some overlap in questions across multiple studies (for example, questions similar to "Would you say that the content in this article was strictly neutral, or was it biased in favor of one side or the other?" [15] were used in different studies [11], [16], [17]), there is a large variety of methods and definitions in prior research, which limits the comparability of studies on media bias perception. Furthermore, a standard for assessing the media bias of articles as a general construct is essential for training automated classifiers or building data sets: Without a clear measurement of the construct, no classifier in the related areas can reach its full potential. Our project, therefore, aims to develop questions that can be used as a reliable standard to perform new analyses or reevaluate past studies, independent of the research area.

Our primary goal and contribution is to develop a reliable scale to evaluate articles in terms of media bias. We therefore conducted a literature review to find 824 relevant questions about text perception in previous research on the topic, which we summarized and condensed in a multi-iterative process to a final set of 48 questions. We further reduced the number of questions and uncovered commonalities between questions empirically using exploratory factor analysis (EFA), a data reduction approach. We assessed the perception of bias for various articles with a known bias rating, given the different questions. The scale aims to improve data collection on media bias. This paper describes the question testing process and summarizes and transparently visualizes the question set.

We organize the rest of the paper as follows: First, in Section 2, we present a literature review on existing studies about media bias. In Section 3, we describe our methodology, followed by our results in Section 4. Finally, we give an outlook on future work and a summary of the current project in Sections 5 and 6. Note that we use the words "question" and "item" interchangeably throughout this work.

II. RELATED WORK

A. Public bias ratings

Different platforms try to address media bias in news outlets. For instance, the news aggregator Allsides publishes bias ratings for various news outlets1. The bias rating by Allsides represents subjective judgments made by their readers. They are organized in five classes [18]:

Left - Lean Left - Center - Lean Right - Right.

1 See , accessed on 2021-01-08.

Allsides combines different methods to create their ratings. They indicate which outlets and articles have been evaluated with which methods on each source page [18]. Altogether, they use the following methods2:

1. Blind Bias Survey. Allsides gathers readers "from all parts of the political bias spectrum to read and rate articles and headlines blindly -- without telling them the source of the content. (...) To assure that the survey audience reflects the social and political diversity of the US, they then normalize the data" [18].

2. Editorial Review. To some extent, the Allsides editorial staff reviewed "the works of any source. The reviews always include a diversity of individuals covering the full range of political bias from left to right" [18].

3. Third-Party Analysis. The third-party "analysis may include academic research, surveys, or analysis from third parties that have a published and transparent system for evaluating the bias of multiple sources" [18].

4. Independent Review. An AllSides "editor, or multiple editors, reviewed content from this source and came to a general conclusion on its bias; they also investigated what the media and other sources, both partisan and nonpartisan, reported about the political leanings of this source. This method is frequently used for initial bias ratings before more robust methods can be applied, or ratings for which the bias of an outlet is relatively easy to discern" [18].

5. Community Feedback. "For every article posted on Allsides, a user can indicate whether or not he agrees with the ratings. While the ratings are not determined by community votes, they are used to check the performance of the current ratings." [18]

B. Literature about bias

In various research areas, text perception and particularly bias detection have been investigated. For example, the influence of biased reporting on citizens' use of traditional, citizen, and social media has been researched [19]. Other projects focused on hostile media perceptions [20], the influence of user-related variables on the perception of bias [8], [9], [15], [21]–[24], or the perception of bias in particular topics [10]. Topic-dependent text perception [14], user comments [11], [25], and visual features [26] were also main interests in the existing research.

Apart from political or communication studies and psychology, there has been an increasing number of computer science publications on the automated detection of media bias or the related concepts of framing and sentiment analysis [1], [4], [12], [13], [27]–[33].

2 More about the methods can be found on

Independent of the research area, all the research mentioned above questioned either students, experts, or crowdsourcing workers about their perception of bias on a word, sentence, article, or image level. However, almost none reported a detailed process description of how they created the respective evaluation surveys or chose the questions that were handed to the participants. Also, especially in the computer science studies, none except the study by Spinde et al. [1] asked for the participants' personal background. Still, as shown in some of the work from psychology and communication science, the personal background seems to be crucial information needed to understand how to interpret and use the collected feedback annotations. The data sets used in the various computer science approaches and projects did not reflect the complexity of media bias. Instead, they primarily focused on technical approaches. We believe that bias can only be uncovered in an interdisciplinary approach and that data quality and comparability play a crucial role in training any classifier. This makes a common and reliably evaluated question set all the more necessary.

III. METHODOLOGY

A. Literature search

To systematically find items relevant to media bias perception, we conducted an extensive search on PsychInfo and Google Scholar. Mainly, the search term "Perception of Media Bias" was used to identify relevant studies on both literature platforms. We excluded articles in languages other than English and German. We manually screened headlines and keywords for their connection to media, media bias, and media perception. If in doubt, we included articles to avoid missing relevant studies. From an original set of 405 potentially relevant papers, after extensive reading and abstract checking, we excluded all but 107 studies, for which we tried to obtain full texts. We excluded 29 more because the full-text reading showed an insufficient connection to the perception of media bias. We excluded another 17 studies because they did not use any items on the perception of media or media bias. Overall, we included 74 studies in our collection to create our questionnaire on media bias perception3.

B. Item collection and selection

Our paper collection led to a list of items on media bias and related variables that included each item's source, the response format if mentioned, and other important information. If available, we copied the original items from the supplementary material provided by the authors. If no supplementary materials were available, we extracted items from the articles' method and results sections. When the original wording of the item was named, it was added to the list. If not, we used the provided description to reconstruct the wording as well as possible. This process resulted in a list of 824 items, which we then continued to reduce and filter in a process of three iterations. We illustrate the process in Fig. 1. It is based on four main criteria, which we summarize afterward:

3 They are included in the file upload at and in the tree visualization on , both of which we describe in the remainder of this paper.

1. The items relate to media bias.

2. The items cover different aspects of media bias.

3. The items measure media bias on an article level.

4. The items are usable for visual analog scales (VAS4).

At first, in the categorization iteration, we organized the questions into general categories (e.g., Political Background, Demographics, Perception of Media Bias, Influence of Media Bias). We only included items categorized into "Perception of Media Bias" and "Influence of Media Bias" for creating a list of potential items (419). The other categories were revisited later to find relevant background information items, for example, on demographics or political background.

To further reduce the number of items for assessment, we grouped items into the following bias measurement categories: Cause, Existence, Direction, Strength, and Influence. We then grouped semantically and topically similar items to find a construct that fitted as many items as possible without losing any relevant aspects. To name an example, one of the resulting constructs was: "Would you say that the [person/content/outlet] was strictly neutral, biased against, or in favor of [side]?" Overall, 42 constructs and 99 general items without constructs were left after this process. Since 141 items were still too many, we grouped the edited items by their content and chose items to cover every aspect of each content area in a final iteration. If possible, we selected a construct. If a construct did not cover an aspect, we used one of the remaining general items. As a result of this process, we had to exclude some items for the following reasons:

1. We decided on a visual analog scale as the response format for the questionnaire. Most questions could be adapted to fit this format, but we removed questions where this was not possible.

2. Since the questionnaire is supposed to identify bias in an article, some questions were too unspecific or unfit for this questionnaire, for example, questions about media outlets.

3. Many studies did not include the original wording of their items, and in a few cases, it was not possible to create an adequate item out of the description given in the text.

4. Some items were too specific to the issue of their original study and were unfit to be included in a general questionnaire.

4 A Visual Analogue Scale (VAS) is a measurement instrument that tries to measure a characteristic or attitude that is believed to range across a continuum of values and cannot easily be directly measured [34].

5. Various studies used semantic differentials to ask for their respondents' impressions of the articles. In the questionnaire, we only included semantic differentials that at least two different authors used. We applied the same procedure to questions on feelings. We excluded some items because they were only used once.

Fig. 1. Item reduction process in four main phases.

After this selection, exclusion, and merging process, the final questionnaire consisted of 25 items with varying answering formats, 17 semantic differentials, and six ratings of feelings. To cover third-person perception, we included three items twice, once asking about the article's impact on the participant directly and once asking about the impact on others. For the question about others, we used the term "another person" to keep the questionnaire as general as possible, as done in other research [17], [22], [35], [36]. Five items contained a placeholder that was replaced with article-specific information.

We publish the complete set of final questions and original questions, and all other process information at . We also illustrate which questions were merged and excluded in which way in an interactive tree visualization on .

5 , accessed on 2021-01-08.

C. Design

We used the survey platform UniPark 5 for data collection and recruited participants via the recruiting platform Prolific 6 . The study ran on Oct. 20, 2020. Participants were welcomed to the study and given general information on the study's purpose and the data handling. After agreeing to participate, each participant read one of the 190 articles, which was randomly selected. We then asked each participant to rate all 48 items, presented on five pages separated based on differing anchors, on VAS. All VAS in the study ranged from −10 to 10 and recorded only integer numbers. The order in which the pages and the items on each page were presented was randomized. In addition, an item that asked whether participants read the article was mixed in as an attention check.
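As a minimal illustration of this procedure (not part of the original study materials), the following Python sketch shows how such a randomized assignment and presentation order could be generated. The article pool, page grouping, and item names are hypothetical placeholders.

import random

# Hypothetical pools; the real study used 190 articles and 48 items spread over five pages.
articles = [f"article_{i:03d}" for i in range(1, 191)]
pages = {f"page_{p}": [f"item_{p}_{q}" for q in range(1, 11)] for p in range(1, 6)}

def build_session(participant_id: int) -> dict:
    """Assign one random article and a randomized page/item order for one participant."""
    rng = random.Random(participant_id)   # per-participant seed, purely for reproducibility
    article = rng.choice(articles)        # each participant reads one randomly selected article
    page_order = rng.sample(list(pages), k=len(pages))
    item_order = {page: rng.sample(pages[page], k=len(pages[page])) for page in page_order}
    return {"article": article, "page_order": page_order, "item_order": item_order}

print(build_session(participant_id=42)["article"])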

After rating the article, the participants were asked to answer general media bias questions and give demographic and background information. At the end of the study, we asked them whether their data could be used for scientific purposes, and a chance to comment on the study was given.

D. Survey participants

We recruited a sample of 940 American participants, of whom 827 participated in the study. We had to exclude 91 because of missing data. We excluded another 18 participants who indicated that their data could not be trusted and a further 55 participants who indicated that they had not read the article (i.e., who did not rate the attention-check item in the highest quarter of the rating scale). The final sample consisted of 663 participants (53.5 % women, 44.8 % men, 1.7 % other). The mean age was 33.86 (SD = 13.35), ranging from 18 to 80 years. The highest level of education of participants ranged from some high school education (1.1 %), high school graduate (10.9 %), vocational or technical school (1.2 %), some college education (24.3 %), an associate degree (8.6 %), and a bachelor's degree (35.7 %), to graduate work (18.3 %). On average, participants reported spending 2.95 hours per day viewing or reading the news (SD = 3.87). All participants volunteered for the study and gave informed consent. We estimated the duration of the study at 12 minutes. After completing the study, participants received £1.50 as payment. Participants described themselves as tending to be politically interested (M = 2.76, SD = 5.75) and modestly politically involved (M = −0.45, SD = 5.46). The average self-reported political orientation leaned towards liberalism (M = −2.89, SD = 5.43; −10 = very liberal, 10 = very conservative), and there was no clear agreement on the general existence of media bias (M = −0.51, SD = 4.24).
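For illustration, the exclusion steps could be expressed as a small pandas sketch. The file name and the column names "trust_data" and "read_article" are assumptions, and the attention-check cutoff assumes that the highest quarter of the −10 to 10 scale corresponds to ratings of 5 or above.

import pandas as pd

# Hypothetical raw export, one row per participant; all column names are assumptions.
raw = pd.read_csv("participants.csv")

complete = raw.dropna()                              # drop participants with missing data
trusted = complete[complete["trust_data"] == "yes"]  # keep only data approved for scientific use
# "I read the article" was rated on a VAS from -10 to 10; the highest quarter of that
# scale corresponds to ratings of 5 or above.
final = trusted[trusted["read_article"] >= 5]

print(len(raw), len(final))   # in the reported sample: 827 started, 663 remained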

E. Article Selection

Regarding the articles each participant read, we followed the article selection process described in [1] to create a sample that balances the number and extremity of the politically left and right articles included. We chose 190 articles covering different topics, media outlets, authors, and writing styles, and, most importantly, articles that range from unbiased to very biased and lean politically towards different sides.

6 , accessed on 2021-01-08.

To select such a sample, we obtained the articles for this study from the platform Allsides. Out of the various topics that Allsides covers, we chose ten different topics to span a broad spectrum based on two parameters: current issues (e.g., Coronavirus, Elections) versus general topics (e.g., Economy, Racism), and controversial (e.g., Gun Control, Abortion, Immigration) versus less controversial topics (e.g., Arts and Entertainment, Disasters, World News). From each of the ten topics, we chose 17 articles: six articles biased to the left (three left, three lean left), five articles rated center, and six articles biased towards the right (three lean right, three right). In total, we therefore collected 170 articles for this study from Allsides. To extend our data set with rather extreme content, we added another 20 articles, ten extremely left and ten extremely right (two for each topic), directly from alternative news outlets. The extended Allsides ratings of political ideology thus ranged from very liberal (1) to very conservative (7) (M = 4, SD = 1.59; ratings adjusted to include the ten extreme articles of either side). The articles varied among outlets, were published between Oct. 1, 2019, and Oct. 31, 2020, and are all under 1500 words long. To avoid confounding variables, we showed only plain texts. We present a complete list of articles, their ratings, further information, and their issue statements on . We inspected every article manually and checked whether we agreed with the Allsides rating. Still, since the Allsides rating relates to a news outlet rather than a single news article, it might not represent an exact and complete article bias index. We address the possibilities of extending and improving our article set in Section 5.
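The resulting sample size follows directly from these per-topic quotas; a short sketch of the arithmetic, using the quotas stated above:

# Per-topic quotas: 3 left, 3 lean left, 5 center, 3 lean right, 3 right from Allsides,
# plus one extremely left and one extremely right article per topic from alternative outlets.
topics = 10
allsides_quota = {"left": 3, "lean_left": 3, "center": 5, "lean_right": 3, "right": 3}
extreme_quota = {"extreme_left": 1, "extreme_right": 1}

allsides_total = topics * sum(allsides_quota.values())   # 10 * 17 = 170
extreme_total = topics * sum(extreme_quota.values())     # 10 * 2  = 20
print(allsides_total, extreme_total, allsides_total + extreme_total)   # 170 20 190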

F. Measures

Perception of Media Bias in articles. Participants were shown 48 items about the perception of media bias. The included items covered cause (e.g., "Do you think that the article includes different points of view regarding the topic in the article?"), direction (e.g., "This article is... liberal/conservative"), existence (e.g., "This article is biased."), influence (e.g., "How much do you think the news article would influence your view of the issue?") and strength of the bias (e.g., "How biased is the article?"). We measured all answers on VAS with a verbal left and a right anchor.
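For readers who want to reuse these facets, the examples above could be organized as a simple lookup structure. This is only an illustrative sketch; the anchor labels are assumptions rather than the exact anchors used in the study.

# Example items per bias facet, repeating only the examples quoted above; the VAS anchor
# labels are illustrative assumptions.
bias_items = {
    "cause": ("Do you think that the article includes different points of view "
              "regarding the topic in the article?", ("not at all", "very much")),
    "direction": ("This article is...", ("liberal", "conservative")),
    "existence": ("This article is biased.", ("strongly disagree", "strongly agree")),
    "influence": ("How much do you think the news article would influence your view "
                  "of the issue?", ("not at all", "very much")),
    "strength": ("How biased is the article?", ("not at all", "extremely")),
}

for facet, (wording, anchors) in bias_items.items():
    print(f"{facet:10s} | {wording} | anchors: {anchors[0]} / {anchors[1]}")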

Attention check. To ensure that participants were paying attention, we mixed the item "I read the article" in with the questions on article bias. We anchored the VAS from strongly disagree to strongly agree.

Perception of General Media Bias. Six items measured the perception of general media bias on a VAS (−10 = strongly disagree to 10 = strongly agree). The different statements about media in general covered the aspects usually referred to in previous research. As a personal variable, we will analyze the results for the perception of general media bias in a different setting; we did not consider it for the scale construction described here.

G. Exploratory factor analysis

To empirically reduce the 48 questions further and derive a final set of questions that is usable in a single study, we used an exploratory factor analysis (EFA) [37]. An EFA is a statistical technique to reduce data to a smaller set of summary variables and to explore and uncover response patterns in survey items. It identifies latent constructs (factors) that define the interrelationship among items by accounting for common variance [37]. A special case of EFA that is more widely known in computer science is Principal Component Analysis (PCA), which uses a linear combination of a set of variables to create one or more index variables. We, however, use EFA.
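As a minimal sketch of how such an analysis could be run in Python (using the factor_analyzer package), the following lines fit a maximum-likelihood EFA with a promax rotation; the input file of mean article ratings per item is a placeholder, and the six-factor solution mirrors the result reported in Section 4.

import pandas as pd
from factor_analyzer import FactorAnalyzer

# Rows: articles, columns: the 48 bias items (one mean rating per article); placeholder file name.
ratings = pd.read_csv("mean_article_ratings.csv")

# Maximum-likelihood EFA with an oblique promax rotation, mirroring the analysis reported below.
efa = FactorAnalyzer(n_factors=6, method="ml", rotation="promax")
efa.fit(ratings)

loadings = pd.DataFrame(efa.loadings_, index=ratings.columns)
print(loadings.where(loadings.abs() >= 0.3).round(2))   # suppress loadings below .3, as in Table I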

The agreement between the survey participants within an EFA can be described in different ways. One of them, which we use in our study, is the Intraclass Correlation (ICC). The ICC is a descriptive statistic that describes how strongly units in the same group resemble each other and can be interpreted as the fraction of variance shared by all raters [38].

While our factor analysis results will allow us to reduce the number of questions reliably, the sample size is not large enough to perform cross-validation. We will therefore run a second validation study in the future, which we address in Section 5.

IV. ANALYSIS & RESULTS

All articles were rated between one and five times. On average, each article was rated M = 3.49 times (SD = .76). For the factor analysis, we averaged ratings across participants to obtain a mean article rating per item. We computed the agreement between raters per item as the previously described intraclass correlation (ICC) via REML estimates of random intercept models (Table 1) [39].
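A sketch of how the per-item ICC could be computed from such a random-intercept model in Python (statsmodels is used here purely for illustration; the long-format column names "article", "item", and "rating" are assumptions):

import pandas as pd
import statsmodels.formula.api as smf

# Long format: one row per single rating; the column names are assumed for illustration.
long_df = pd.read_csv("ratings_long.csv")

def icc_for_item(df: pd.DataFrame) -> float:
    """ICC(1) from a random-intercept model: between-article variance / total variance."""
    model = smf.mixedlm("rating ~ 1", data=df, groups=df["article"])
    result = model.fit(reml=True)                    # REML estimation, as in the paper
    var_between = float(result.cov_re.iloc[0, 0])    # variance of the random article intercepts
    var_within = float(result.scale)                 # residual (within-article) variance
    return var_between / (var_between + var_within)

iccs = {item: icc_for_item(sub) for item, sub in long_df.groupby("item")}
print(pd.Series(iccs).sort_values(ascending=False).head())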

A. Factor analysis

The factor analysis used maximum likelihood estimators and an oblique promax rotation (κ = 4). Both the KMO measure (.919) and the Bartlett test (χ²(1128) = 9346.38, p < .001) indicated that the selected items were suitable for factor analysis. To determine the number of factors, we used Velicer's MAP criterion [40], which yielded 6 factors; this could also be viewed as confirmed by the scree plot (Figure 2). The Kaiser criterion yielded 7 factors and the parallel test 5 factors.
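The suitability checks and a simple eigenvalue-based scree inspection could look as follows (again only a sketch with factor_analyzer; Velicer's MAP criterion and the parallel test are not part of this package and would require additional code or, for example, R's psych package):

import pandas as pd
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

ratings = pd.read_csv("mean_article_ratings.csv")   # placeholder, as in the sketch above

chi_square, p_value = calculate_bartlett_sphericity(ratings)
kmo_per_item, kmo_total = calculate_kmo(ratings)
print(f"KMO = {kmo_total:.3f}, Bartlett chi2 = {chi_square:.2f}, p = {p_value:.4f}")

# Scree plot of the eigenvalues; the Kaiser criterion keeps factors with eigenvalues above 1.
fa = FactorAnalyzer(rotation=None)
fa.fit(ratings)
eigenvalues, _ = fa.get_eigenvalues()
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.show()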

TABLE I. FACTOR ANALYSIS WITH AGREEMENT BETWEEN RATERS PER ITEM AS ICC WITH SUPPRESSED LOADINGS BELOW .3. THE UNDERLINED ITEMS WERE KEPT FOR THE FINAL QUESTION SET.

[Table columns: Items, ICC, Mean (SD), and rotated factor loadings on the factors Factuality, Influence, Topic Affirmation, Negative Emotions, Political Ideology, and Bias.]
