


---
title: "Assignment 3"
author: "Luke Pawlowski"
date: "April 6, 2019"
output: html_document
---

```{r setup, include = FALSE}
library(tidyverse)
library(ggplot2)
library("fivethirtyeight")
```

## Activity 1

This dataset is intended to take a deeper look at comic book characters, more specifically the rates at which female characters were created as opposed to male ones. The data was pulled from the Wikia pages of the two publishers being compared, Marvel and DC Comics, by former FiveThirtyEight writer Walt Hickey. There are 23,272 different characters (rows) and 16 different variables (columns).

## Activity 2

The categorical variables are:

- publisher - DC or Marvel
- name - name of the character
- urlslug - URL for the character
- id - whether or not they have a secret identity
- eye - eye color
- hair - hair color
- sex - gender of the character
- gsm - if a character is in a sexual/gender minority
- alive - if the character is alive or not
- first_appearance - the month/year that the character was first created
- month - the month that the character was first created
- year - the year that the character was first created
- date - the exact date that the character was first created

```{r, include = FALSE}
# Draft for Question 1: yearly share of new male and female characters.
comic_characters %>%
  filter(year >= 1961) %>%
  group_by(year) %>%
  summarise(
    total_count = n(),
    male_count = sum(sex == "Male Characters", na.rm = TRUE),
    female_count = sum(sex == "Female Characters", na.rm = TRUE)
  ) %>%
  mutate(
    percent_male = male_count / total_count,
    percent_female = female_count / total_count
  ) %>%
  ggplot() +
  geom_line(aes(x = year, y = percent_male), color = "blue") +
  geom_line(aes(x = year, y = percent_female), color = "red")
```

```{r, include = FALSE}
# Draft for Question 2: yearly share of good/bad male/female characters,
# computed separately for each publisher.
comic_characters %>%
  filter(year >= 1961) %>%
  group_by(year) %>%
  summarise(
    dc_total_count = sum(publisher == "DC"),
    dc_male_count_good = sum(publisher == "DC" & sex == "Male Characters" & align == "Good Characters", na.rm = TRUE),
    dc_male_count_bad = sum(publisher == "DC" & sex == "Male Characters" & align == "Bad Characters", na.rm = TRUE),
    dc_female_count_good = sum(publisher == "DC" & sex == "Female Characters" & align == "Good Characters", na.rm = TRUE),
    dc_female_count_bad = sum(publisher == "DC" & sex == "Female Characters" & align == "Bad Characters", na.rm = TRUE),
    marvel_total_count = sum(publisher == "Marvel"),
    marvel_male_count_good = sum(publisher == "Marvel" & sex == "Male Characters" & align == "Good Characters", na.rm = TRUE),
    marvel_male_count_bad = sum(publisher == "Marvel" & sex == "Male Characters" & align == "Bad Characters", na.rm = TRUE),
    marvel_female_count_good = sum(publisher == "Marvel" & sex == "Female Characters" & align == "Good Characters", na.rm = TRUE),
    marvel_female_count_bad = sum(publisher == "Marvel" & sex == "Female Characters" & align == "Bad Characters", na.rm = TRUE)
  ) %>%
  mutate(
    percent_dc_male_good = dc_male_count_good / dc_total_count,
    percent_dc_male_bad = dc_male_count_bad / dc_total_count,
    percent_dc_female_good = dc_female_count_good / dc_total_count,
    percent_dc_female_bad = dc_female_count_bad / dc_total_count,
    percent_marvel_male_good = marvel_male_count_good / marvel_total_count,
    percent_marvel_male_bad = marvel_male_count_bad / marvel_total_count,
    percent_marvel_female_good = marvel_female_count_good / marvel_total_count,
    percent_marvel_female_bad = marvel_female_count_bad / marvel_total_count
  ) %>%
  ggplot() +
  geom_line(aes(x = year, y = percent_dc_male_good), color = "blue") +
  geom_line(aes(x = year, y = percent_dc_male_bad), color = "dark blue") +
  geom_line(aes(x = year, y = percent_dc_female_good), color = "red") +
  geom_line(aes(x = year, y = percent_dc_female_bad), color = "dark red") +
  geom_line(aes(x = year, y = percent_marvel_male_good), color = "blue", linetype = "dashed") +
  geom_line(aes(x = year, y = percent_marvel_male_bad), color = "dark blue", linetype = "dashed") +
  geom_line(aes(x = year, y = percent_marvel_female_good), color = "red", linetype = "dashed") +
  geom_line(aes(x = year, y = percent_marvel_female_bad), color = "dark red", linetype = "dashed")
```

```{r, include = FALSE}
# Draft for Question 3: average appearances per frequently appearing
# character (more than 100 appearances), by publisher and gender.
comic_characters %>%
  filter(year >= 1961, appearances > 100) %>%
  group_by(year) %>%
  summarise(
    dc_male_appearance_average = mean(appearances[publisher == "DC" & sex == "Male Characters"], na.rm = TRUE),
    dc_female_appearance_average = mean(appearances[publisher == "DC" & sex == "Female Characters"], na.rm = TRUE),
    marvel_male_appearance_average = mean(appearances[publisher == "Marvel" & sex == "Male Characters"], na.rm = TRUE),
    marvel_female_appearance_average = mean(appearances[publisher == "Marvel" & sex == "Female Characters"], na.rm = TRUE)
  ) %>%
  ggplot() +
  geom_line(aes(x = year, y = dc_male_appearance_average), color = "blue") +
  geom_line(aes(x = year, y = dc_female_appearance_average), color = "red") +
  geom_line(aes(x = year, y = marvel_male_appearance_average), color = "blue", linetype = "dashed") +
  geom_line(aes(x = year, y = marvel_female_appearance_average), color = "red", linetype = "dashed")
```

## Activity 4

### Question 1

```{r, include = FALSE}
q1colors <- c("Male" = "blue", "Female" = "red")
```

```{r, echo = FALSE}
comic_characters %>%
  filter(year >= 1961) %>%
  group_by(year) %>%
  summarise(
    total_count = n(),
    male_count = sum(sex == "Male Characters", na.rm = TRUE),
    female_count = sum(sex == "Female Characters", na.rm = TRUE),
    # Constant label columns so the color legend can be built from aes().
    male = "Male",
    female = "Female"
  ) %>%
  mutate(
    percent_male = male_count / total_count,
    percent_female = female_count / total_count
  ) %>%
  ggplot() +
  geom_line(aes(x = year, y = percent_male, color = male)) +
  geom_point(aes(x = year, y = percent_male, color = male)) +
  geom_line(aes(x = year, y = percent_female, color = female)) +
  geom_point(aes(x = year, y = percent_female, color = female)) +
  labs(
    x = "Year",
    y = "Percentage",
    title = "Percentage of Male vs. Female Characters in Comic Books: 1961-2014",
    color = "Gender"
  ) +
  theme_minimal() +
  # The values are proportions between 0 and 1; label them as percentages.
  scale_y_continuous(breaks = seq(0, 1, .1), labels = scales::percent) +
  scale_x_continuous(breaks = seq(1960, 2020, 10)) +
  theme(legend.position = "right") +
  scale_color_manual(values = q1colors)
```

### Question 2

```{r, include = FALSE}
q2colors <- c("Good Male" = "blue", "Bad Male" = "dark blue", "Good Female" = "red", "Bad Female" = "dark red")
```

```{r, echo = FALSE}
comic_characters %>%
  filter(year >= 1961) %>%
  group_by(year) %>%
  summarise(
    dc_total_count = sum(publisher == "DC"),
    dc_male_count_good = sum(publisher == "DC" & sex == "Male Characters" & align == "Good Characters", na.rm = TRUE),
    dc_male_count_bad = sum(publisher == "DC" & sex == "Male Characters" & align == "Bad Characters", na.rm = TRUE),
    dc_female_count_good = sum(publisher == "DC" & sex == "Female Characters" & align == "Good Characters", na.rm = TRUE),
    dc_female_count_bad = sum(publisher == "DC" & sex == "Female Characters" & align == "Bad Characters", na.rm = TRUE),
    marvel_total_count = sum(publisher == "Marvel"),
    marvel_male_count_good = sum(publisher == "Marvel" & sex == "Male Characters" & align == "Good Characters", na.rm = TRUE),
    marvel_male_count_bad = sum(publisher == "Marvel" & sex == "Male Characters" & align == "Bad Characters", na.rm = TRUE),
    marvel_female_count_good = sum(publisher == "Marvel" & sex == "Female Characters" & align == "Good Characters", na.rm = TRUE),
    marvel_female_count_bad = sum(publisher == "Marvel" & sex == "Female Characters" & align == "Bad Characters", na.rm = TRUE),
    # Constant label columns so the color legend can be built from aes().
    good_male = "Good Male",
    bad_male = "Bad Male",
    good_female = "Good Female",
    bad_female = "Bad Female"
  ) %>%
  mutate(
    percent_dc_male_good = dc_male_count_good / dc_total_count,
    percent_dc_male_bad = dc_male_count_bad / dc_total_count,
    percent_dc_female_good = dc_female_count_good / dc_total_count,
    percent_dc_female_bad = dc_female_count_bad / dc_total_count,
    percent_marvel_male_good = marvel_male_count_good / marvel_total_count,
    percent_marvel_male_bad = marvel_male_count_bad / marvel_total_count,
    percent_marvel_female_good = marvel_female_count_good / marvel_total_count,
    percent_marvel_female_bad = marvel_female_count_bad / marvel_total_count
  ) %>%
  ggplot() +
  # DC: thin lines with point markers.
  geom_line(aes(x = year, y = percent_dc_male_good, color = good_male)) +
  geom_point(aes(x = year, y = percent_dc_male_good, color = good_male)) +
  geom_line(aes(x = year, y = percent_dc_male_bad, color = bad_male)) +
  geom_point(aes(x = year, y = percent_dc_male_bad, color = bad_male)) +
  geom_line(aes(x = year, y = percent_dc_female_good, color = good_female)) +
  geom_point(aes(x = year, y = percent_dc_female_good, color = good_female)) +
  geom_line(aes(x = year, y = percent_dc_female_bad, color = bad_female)) +
  geom_point(aes(x = year, y = percent_dc_female_bad, color = bad_female)) +
  # Marvel: thick, semi-transparent lines.
  geom_line(aes(x = year, y = percent_marvel_male_good, color = good_male), size = 2, alpha = .6) +
  geom_line(aes(x = year, y = percent_marvel_male_bad, color = bad_male), size = 2, alpha = .6) +
  geom_line(aes(x = year, y = percent_marvel_female_good, color = good_female), size = 2, alpha = .6) +
  geom_line(aes(x = year, y = percent_marvel_female_bad, color = bad_female), size = 2, alpha = .6) +
  labs(
    x = "Year",
    y = "Percentage",
    title = "Male vs. Female: Good Guys vs. Bad Guys: 1961-2014",
    color = "Gender",
    caption = "DC = Thin Line with Points. Marvel = Thick Line"
  ) +
  theme_minimal() +
  # The values are proportions between 0 and 1; label them as percentages.
  scale_y_continuous(breaks = seq(0, 1, .1), labels = scales::percent) +
  scale_x_continuous(breaks = seq(1960, 2020, 10)) +
  theme(legend.position = "right") +
  scale_color_manual(values = q2colors)
```

### Question 3

```{r, include = FALSE}
q3colors <- c("Male" = "blue", "Female" = "red")
```

```{r, echo = FALSE, warning = FALSE}
comic_characters %>%
  filter(year >= 1961, appearances > 100) %>%
  group_by(year) %>%
  summarise(
    dc_male_appearance_average = mean(appearances[publisher == "DC" & sex == "Male Characters"], na.rm = TRUE),
    dc_female_appearance_average = mean(appearances[publisher == "DC" & sex == "Female Characters"], na.rm = TRUE),
    marvel_male_appearance_average = mean(appearances[publisher == "Marvel" & sex == "Male Characters"], na.rm = TRUE),
    marvel_female_appearance_average = mean(appearances[publisher == "Marvel" & sex == "Female Characters"], na.rm = TRUE),
    # Constant label columns so the color legend can be built from aes().
    male = "Male",
    female = "Female"
  ) %>%
  ggplot() +
  # DC: thin lines with point markers.
  geom_line(aes(x = year, y = dc_male_appearance_average, color = male)) +
  geom_point(aes(x = year, y = dc_male_appearance_average, color = male)) +
  geom_line(aes(x = year, y = dc_female_appearance_average, color = female)) +
  geom_point(aes(x = year, y = dc_female_appearance_average, color = female)) +
  # Marvel: thick, semi-transparent lines.
  geom_line(aes(x = year, y = marvel_male_appearance_average, color = male), size = 2, alpha = .6) +
  geom_line(aes(x = year, y = marvel_female_appearance_average, color = female), size = 2, alpha = .6) +
  labs(
    x = "Year",
    y = "Average Appearances per Character",
    title = "Male vs. Female: Average Appearances Throughout the Years: 1961-2014",
    color = "Gender",
    caption = "DC = Thin Line with Points. Marvel = Thick Line"
  ) +
  theme_minimal() +
  # Every average here is above 100, so breaks between 0 and 1 would leave
  # the y axis without ticks; ggplot2's default breaks are used instead.
  scale_x_continuous(breaks = seq(1960, 2020, 10)) +
  theme(legend.position = "right") +
  scale_color_manual(values = q3colors)
```

## Activity 5

Studying and analyzing this dataset offers an interesting look at how gender is represented in comic books. But first, some background information. The data was collected by Walt Hickey, a former FiveThirtyEight culture writer, who scraped it from the Marvel and DC Wikias. The data runs from somewhere in 1931 until August of 2014, but for our purposes we are only using data from 1961 until 2014. When Walt wrote his article in 2014, his purpose was to demonstrate the degree to which comic books were still being made for men, by men, something that the data seems to back up. The data consists of 23,272 different comic book characters, both super and non-super, and uses 16 different variables to paint a picture of how these characters differ, and to see whether those differences point to some level of sexism.

For our purposes we are asking three questions:

1. In each year since 1961 and for both publishers combined, what percent of new characters are male versus female?
2. What proportion of good versus bad characters from each publisher are male versus female, since 1961?
3. In each comic universe, do male characters or female characters, on average, have more appearances per character, just looking at years since 1961 and characters that appear in comics often (more than 100 appearances)?

The graphs above (labeled Question 1, Question 2, and Question 3) show the trends over those 53 years and help answer these questions.

For Question #1, reference the graph titled "Percentage of Male vs. Female Characters in Comic Books: 1961-2014". We can see that as time has gone on, the gender gap in the comic book industry has gotten smaller and smaller, but that it still exists. Fair representation is an important issue today, and we should strive to be better.

For Question #2, reference the graph titled "Male vs. Female: Good Guys vs. Bad Guys: 1961-2014". Using this graph, we can dig a little bit deeper. From Question #1 we know that the gender gap still exists. The point of this question and graph is to paint a better picture of how males and females are represented. The graph shows that males are represented more often than females, something that we already knew, but also that there is a difference in how they are represented. It changes from year to year, but often, males are "bad guys" more than they are "good guys" and women are "good guys" more than they are "bad guys". What this means cannot be determined from this graph alone, but it is an interesting insight into the world of comic books.

For the final question, reference the graph titled "Male vs. Female: Average Appearances Throughout the Years: 1961-2014". The question being asked here is whether male or female characters make more repeat appearances, and the answer is that they are close. There are some spikes, but overall it remains consistent.

Overall it is apparent that, while the comic book industry is doing better regarding gender representation, it can do better still. From the three questions asked above and their corresponding graphs, it is fair to say that the comic book industry could do better.
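
The Question 1 calculation (the yearly share of new characters that are male versus female) can also be sketched in long format, with one row per year and gender instead of one column per gender. This is only an illustrative sketch and not the code used for the graphs above: `toy_characters` is made-up data that borrows the `year` and `sex` column names from `comic_characters`, and `share_by_sex()` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical toy data standing in for comic_characters (same column names).
toy_characters <- data.frame(
  year = c(1961, 1961, 1961, 1961, 1962, 1962, 1962, 1962, 1962),
  sex = c(
    "Male Characters", "Male Characters", "Male Characters", "Female Characters",
    "Male Characters", "Female Characters", "Female Characters", NA, "Male Characters"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of each year's new characters by gender.
share_by_sex <- function(characters) {
  characters %>%
    filter(year >= 1961) %>%
    # One row per year and gender, including rows where sex is missing.
    count(year, sex, name = "n_characters") %>%
    group_by(year) %>%
    # Divide by all characters created that year, so characters with a
    # missing or other sex value still count toward the total.
    mutate(share = n_characters / sum(n_characters)) %>%
    ungroup() %>%
    filter(sex %in% c("Male Characters", "Female Characters"))
}

toy_shares <- share_by_sex(toy_characters)
print(toy_shares)
```

With the data in this shape, a single `geom_line()` with `aes(x = year, y = share, color = sex)` would draw one line per gender and build the legend automatically, which would avoid the constant label columns. Passing `comic_characters` to the helper should give the same yearly denominators as `total_count = n()` in the Question 1 chunk, since both divide by every character created in that year.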