Introduction
Data Wrangling
Data Wrangling is “the art of getting your data into R in a useful form for visualization and modeling” (Wickham and Grolemund 2016)
- We rely on the tidyverse library
- The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
- On-line resources about tidyverse
Tidy data
See Wickham (2014) for a description of good practices in organizing your data.1
- 3 rules to have tidy data
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table.
Source: Wickham and Grolemund (2016)
- Data that are not tidy are untidy!
Example of tidy data
# A tibble: 5 x 3
Name `Height(mt)` `Weight(kg)`
<chr> <dbl> <dbl>
1 Venonat 1.24 46.6
2 Ledyba 1.2 14.8
3 Wingull 0.49 7.05
4 Treeko 0.51 3.9
5 Zubat 0.9 8.66
Example of untidy data
# A tibble: 10 x 3
Name Key Value
<chr> <chr> <dbl>
1 Ledyba Height(mt) 1.2
2 Ledyba Weight(kg) 14.8
3 Treeko Height(mt) 0.51
4 Treeko Weight(kg) 3.9
5 Venonat Height(mt) 1.24
6 Venonat Weight(kg) 46.6
7 Wingull Height(mt) 0.49
8 Wingull Weight(kg) 7.05
9 Zubat Height(mt) 0.9
10 Zubat Weight(kg) 8.66
Data wrangling
Coding Style
- The code we are going to write relies on pipes: %>%
- We use %>% to emphasize a sequence of action
- . is a placeholder
[1] 3025
- Without pipes
[1] 3025
Mutate
- Adds new variables and preserves existing ones
# A tibble: 5 x 2
Sequence Sequence_rev
<int> <int>
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
- You can also overwrite old variables
# A tibble: 5 x 1
Sequence
<int>
1 5
2 4
3 3
4 2
5 1
Select
- Choose only a few variables of the data frame
- We not select one of the variables (e.g., z)
# A tibble: 5 x 2
x y
<dbl> <dbl>
1 0.171 0.961
2 0.545 1.77
3 0.548 1.29
4 0.0527 0.354
5 0.654 0.927
- We explicitly select the variables of interest
# A tibble: 5 x 2
x y
<dbl> <dbl>
1 0.849 0.775
2 0.584 0.140
3 0.876 0.442
4 0.792 0.421
5 0.0208 0.669
Rename
- We may want to change the name of variables, to make them more self-explanatory
# A tibble: 5 x 3
x y z
<dbl> <dbl> <dbl>
1 1.76 1.95 12.1
2 3.14 1.55 18.8
3 1.68 1.54 15.3
4 3.61 1.33 1.92
5 2.15 1.41 17.5
# A tibble: 5 x 3
Length Width Height
<dbl> <dbl> <dbl>
1 1.76 1.95 12.1
2 3.14 1.55 18.8
3 1.68 1.54 15.3
4 3.61 1.33 1.92
5 2.15 1.41 17.5
Arrange
- Sort you data according to values in one columns
- Important to get a first impression on rankings
- You can hierarchically sort according to several columns
- Important to get a first impression on rankings
- According to length
tmp <- tibble(Type=c(LETTERS[1:5],LETTERS[1:5]),x=runif(10,1,4),y=runif(10,1,4))
tmp %>% arrange(Type)
# A tibble: 10 x 3
Type x y
<chr> <dbl> <dbl>
1 A 3.73 1.18
2 A 3.33 3.24
3 B 3.63 3.39
4 B 3.62 3.65
5 C 1.51 2.30
6 C 1.10 1.24
7 D 2.55 2.27
8 D 1.18 2.01
9 E 2.02 3.05
10 E 1.63 3.70
# A tibble: 10 x 3
Type x y
<chr> <dbl> <dbl>
1 A 3.33 3.24
2 A 3.73 1.18
3 B 3.62 3.65
4 B 3.63 3.39
5 C 1.10 1.24
6 C 1.51 2.30
7 D 1.18 2.01
8 D 2.55 2.27
9 E 1.63 3.70
10 E 2.02 3.05
Filter
- To choose rows/cases where conditions are true
# A tibble: 5 x 3
x y z
<dbl> <dbl> <dbl>
1 2.57 1.79 13.8
2 1.99 1.42 14.4
3 2.20 1.36 11.3
4 1.47 1.16 2.18
5 1.42 1.53 8.17
# A tibble: 2 x 3
x y z
<dbl> <dbl> <dbl>
1 2.57 1.79 13.8
2 2.20 1.36 11.3
Join: full
- We may want to put two different data frames in relation
- The two data frames contain a common column
- Values \(x\) and \(y\) for Obs are in table tmp.1
- Values \(x\) and \(y\) for Obs are in table tmp.2
# A tibble: 5 x 3
x y Obs
<dbl> <dbl> <int>
1 1.72 1.12 3
2 1.36 1.46 4
3 3.83 1.62 5
4 3.27 1.86 6
5 1.60 1.49 7
# A tibble: 5 x 3
z w Obs
<dbl> <dbl> <int>
1 2.62 0.230 1
2 2.74 1.94 2
3 3.65 1.59 3
4 7.92 1.06 4
5 2.10 1.33 5
Join: full (ii)
# A tibble: 7 x 5
x y Obs z w
<dbl> <dbl> <int> <dbl> <dbl>
1 1.72 1.12 3 3.65 1.59
2 1.36 1.46 4 7.92 1.06
3 3.83 1.62 5 2.10 1.33
4 3.27 1.86 6 NA NA
5 1.60 1.49 7 NA NA
6 NA NA 1 2.62 0.230
7 NA NA 2 2.74 1.94
- ! Not all Obs are in both tables
Join: partial
- We may want to have all values of the “left” data frame (tmp.1)
# A tibble: 5 x 5
x y Obs z w
<dbl> <dbl> <int> <dbl> <dbl>
1 1.72 1.12 3 3.65 1.59
2 1.36 1.46 4 7.92 1.06
3 3.83 1.62 5 2.10 1.33
4 3.27 1.86 6 NA NA
5 1.60 1.49 7 NA NA
- We may want to have all values of the “right” data frame (tmp.2)
# A tibble: 5 x 5
x y Obs z w
<dbl> <dbl> <int> <dbl> <dbl>
1 NA NA 1 2.62 0.230
2 NA NA 2 2.74 1.94
3 1.72 1.12 3 3.65 1.59
4 1.36 1.46 4 7.92 1.06
5 3.83 1.62 5 2.10 1.33
Gathering
- Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed
# A tibble: 5 x 3
x y Obs
<dbl> <dbl> <int>
1 1.72 1.12 3
2 1.36 1.46 4
3 3.83 1.62 5
4 3.27 1.86 6
5 1.60 1.49 7
# A tibble: 10 x 3
Obs Key Value
<int> <chr> <dbl>
1 3 x 1.72
2 4 x 1.36
3 5 x 3.83
4 6 x 3.27
5 7 x 1.60
6 3 y 1.12
7 4 y 1.46
8 5 y 1.62
9 6 y 1.86
10 7 y 1.49
Spread
- spread spreads a key-value pair across multiple columns (inverse of gather)
# A tibble: 5 x 3
Obs x y
<int> <dbl> <dbl>
1 3 1.72 1.12
2 4 1.36 1.46
3 5 3.83 1.62
4 6 3.27 1.86
5 7 1.60 1.49
An Application to Experimental Data
The Experiment
- We are going to work with a dataframe containing the outcomes of the following experiment
- Participants play an Ultimatum Game
- Two roles: Proposer and Responder
- The Proposer receives an endowment of € 10 and decides how to divide it between herself and the Responder
- The Responder decides whether to accept the offer or not
- The consequence of Responder choice is manipulated in two experimental treatments
- Voice+Punishment
- When the offer is rejected, both the Proposer and the Responder earn 0
- When the offer is accepted, the division made by the Proposer is implemented and payoffs computed accordingly
- Voice
- When the offer is rejected, the division made by the Proposer is implemented and payoffs computed accordingly
- When the offer is accepted, the division made by the Proposer is implemented and payoffs computed accordingly
- Voice+Punishment
Data frame
Session | Treatment | Subject | Role | Group | Offer | Reject |
---|---|---|---|---|---|---|
1 | Voice+Punishment | 1 | Proposer | 1 | 2 | NA |
1 | Voice+Punishment | 2 | Proposer | 2 | 3 | NA |
1 | Voice+Punishment | 3 | Proposer | 3 | 5 | NA |
1 | Voice+Punishment | 4 | Proposer | 4 | 2 | NA |
1 | Voice+Punishment | 5 | Proposer | 5 | 2 | NA |
1 | Voice+Punishment | 6 | Proposer | 6 | 3 | NA |
1 | Voice+Punishment | 7 | Proposer | 7 | 0 | NA |
1 | Voice+Punishment | 8 | Proposer | 8 | 1 | NA |
1 | Voice+Punishment | 9 | Proposer | 9 | 4 | NA |
1 | Voice+Punishment | 10 | Proposer | 10 | 2 | NA |
1 | Voice+Punishment | 11 | Proposer | 11 | 2 | NA |
1 | Voice+Punishment | 12 | Proposer | 12 | 0 | NA |
1 | Voice+Punishment | 13 | Proposer | 13 | 2 | NA |
1 | Voice+Punishment | 14 | Proposer | 14 | 0 | NA |
1 | Voice+Punishment | 15 | Proposer | 15 | 2 | NA |
1 | Voice+Punishment | 16 | Proposer | 16 | 2 | NA |
1 | Voice+Punishment | 17 | Proposer | 17 | 2 | NA |
1 | Voice+Punishment | 18 | Proposer | 18 | 1 | NA |
1 | Voice+Punishment | 19 | Proposer | 19 | 0 | NA |
1 | Voice+Punishment | 20 | Proposer | 20 | 1 | NA |
1 | Voice+Punishment | 21 | Responder | 1 | NA | 1 |
1 | Voice+Punishment | 22 | Responder | 2 | NA | 0 |
1 | Voice+Punishment | 23 | Responder | 3 | NA | 0 |
1 | Voice+Punishment | 24 | Responder | 4 | NA | 0 |
1 | Voice+Punishment | 25 | Responder | 5 | NA | 0 |
1 | Voice+Punishment | 26 | Responder | 6 | NA | 0 |
1 | Voice+Punishment | 27 | Responder | 7 | NA | 1 |
1 | Voice+Punishment | 28 | Responder | 8 | NA | 1 |
1 | Voice+Punishment | 29 | Responder | 9 | NA | 0 |
1 | Voice+Punishment | 30 | Responder | 10 | NA | 0 |
1 | Voice+Punishment | 31 | Responder | 11 | NA | 1 |
1 | Voice+Punishment | 32 | Responder | 12 | NA | 0 |
1 | Voice+Punishment | 33 | Responder | 13 | NA | 0 |
1 | Voice+Punishment | 34 | Responder | 14 | NA | 1 |
1 | Voice+Punishment | 35 | Responder | 15 | NA | 1 |
1 | Voice+Punishment | 36 | Responder | 16 | NA | 0 |
1 | Voice+Punishment | 37 | Responder | 17 | NA | 1 |
1 | Voice+Punishment | 38 | Responder | 18 | NA | 0 |
1 | Voice+Punishment | 39 | Responder | 19 | NA | 1 |
1 | Voice+Punishment | 40 | Responder | 20 | NA | 1 |
2 | Voice+Punishment | 1 | Proposer | 1 | 1 | NA |
2 | Voice+Punishment | 2 | Proposer | 2 | 2 | NA |
2 | Voice+Punishment | 3 | Proposer | 3 | 1 | NA |
2 | Voice+Punishment | 4 | Proposer | 4 | 0 | NA |
2 | Voice+Punishment | 5 | Proposer | 5 | 5 | NA |
2 | Voice+Punishment | 6 | Proposer | 6 | 0 | NA |
2 | Voice+Punishment | 7 | Proposer | 7 | 3 | NA |
2 | Voice+Punishment | 8 | Proposer | 8 | 2 | NA |
2 | Voice+Punishment | 9 | Proposer | 9 | 3 | NA |
2 | Voice+Punishment | 10 | Proposer | 10 | 3 | NA |
2 | Voice+Punishment | 11 | Proposer | 11 | 2 | NA |
2 | Voice+Punishment | 12 | Proposer | 12 | 0 | NA |
2 | Voice+Punishment | 13 | Proposer | 13 | 5 | NA |
2 | Voice+Punishment | 14 | Proposer | 14 | 4 | NA |
2 | Voice+Punishment | 15 | Proposer | 15 | 5 | NA |
2 | Voice+Punishment | 16 | Proposer | 16 | 2 | NA |
2 | Voice+Punishment | 17 | Proposer | 17 | 2 | NA |
2 | Voice+Punishment | 18 | Proposer | 18 | 2 | NA |
2 | Voice+Punishment | 19 | Proposer | 19 | 2 | NA |
2 | Voice+Punishment | 20 | Proposer | 20 | 2 | NA |
2 | Voice+Punishment | 21 | Responder | 1 | NA | 0 |
2 | Voice+Punishment | 22 | Responder | 2 | NA | 0 |
2 | Voice+Punishment | 23 | Responder | 3 | NA | 1 |
2 | Voice+Punishment | 24 | Responder | 4 | NA | 1 |
2 | Voice+Punishment | 25 | Responder | 5 | NA | 0 |
2 | Voice+Punishment | 26 | Responder | 6 | NA | 1 |
2 | Voice+Punishment | 27 | Responder | 7 | NA | 0 |
2 | Voice+Punishment | 28 | Responder | 8 | NA | 1 |
2 | Voice+Punishment | 29 | Responder | 9 | NA | 0 |
2 | Voice+Punishment | 30 | Responder | 10 | NA | 0 |
2 | Voice+Punishment | 31 | Responder | 11 | NA | 1 |
2 | Voice+Punishment | 32 | Responder | 12 | NA | 0 |
2 | Voice+Punishment | 33 | Responder | 13 | NA | 0 |
2 | Voice+Punishment | 34 | Responder | 14 | NA | 0 |
2 | Voice+Punishment | 35 | Responder | 15 | NA | 0 |
2 | Voice+Punishment | 36 | Responder | 16 | NA | 1 |
2 | Voice+Punishment | 37 | Responder | 17 | NA | 1 |
2 | Voice+Punishment | 38 | Responder | 18 | NA | 1 |
2 | Voice+Punishment | 39 | Responder | 19 | NA | 1 |
2 | Voice+Punishment | 40 | Responder | 20 | NA | 1 |
3 | Voice | 1 | Proposer | 1 | 3 | NA |
3 | Voice | 2 | Proposer | 2 | 2 | NA |
3 | Voice | 3 | Proposer | 3 | 0 | NA |
3 | Voice | 4 | Proposer | 4 | 0 | NA |
3 | Voice | 5 | Proposer | 5 | 0 | NA |
3 | Voice | 6 | Proposer | 6 | 0 | NA |
3 | Voice | 7 | Proposer | 7 | 3 | NA |
3 | Voice | 8 | Proposer | 8 | 1 | NA |
3 | Voice | 9 | Proposer | 9 | 0 | NA |
3 | Voice | 10 | Proposer | 10 | 1 | NA |
3 | Voice | 11 | Proposer | 11 | 1 | NA |
3 | Voice | 12 | Proposer | 12 | 0 | NA |
3 | Voice | 13 | Proposer | 13 | 0 | NA |
3 | Voice | 14 | Proposer | 14 | 0 | NA |
3 | Voice | 15 | Proposer | 15 | 0 | NA |
3 | Voice | 16 | Proposer | 16 | 2 | NA |
3 | Voice | 17 | Proposer | 17 | 0 | NA |
3 | Voice | 18 | Proposer | 18 | 1 | NA |
3 | Voice | 19 | Proposer | 19 | 0 | NA |
3 | Voice | 20 | Proposer | 20 | 0 | NA |
3 | Voice | 21 | Responder | 1 | NA | 1 |
3 | Voice | 22 | Responder | 2 | NA | 1 |
3 | Voice | 23 | Responder | 3 | NA | 1 |
3 | Voice | 24 | Responder | 4 | NA | 1 |
3 | Voice | 25 | Responder | 5 | NA | 1 |
3 | Voice | 26 | Responder | 6 | NA | 1 |
3 | Voice | 27 | Responder | 7 | NA | 1 |
3 | Voice | 28 | Responder | 8 | NA | 1 |
3 | Voice | 29 | Responder | 9 | NA | 1 |
3 | Voice | 30 | Responder | 10 | NA | 1 |
3 | Voice | 31 | Responder | 11 | NA | 1 |
3 | Voice | 32 | Responder | 12 | NA | 1 |
3 | Voice | 33 | Responder | 13 | NA | 1 |
3 | Voice | 34 | Responder | 14 | NA | 1 |
3 | Voice | 35 | Responder | 15 | NA | 1 |
3 | Voice | 36 | Responder | 16 | NA | 1 |
3 | Voice | 37 | Responder | 17 | NA | 1 |
3 | Voice | 38 | Responder | 18 | NA | 1 |
3 | Voice | 39 | Responder | 19 | NA | 1 |
3 | Voice | 40 | Responder | 20 | NA | 1 |
4 | Voice | 1 | Proposer | 1 | 0 | NA |
4 | Voice | 2 | Proposer | 2 | 3 | NA |
4 | Voice | 3 | Proposer | 3 | 2 | NA |
4 | Voice | 4 | Proposer | 4 | 2 | NA |
4 | Voice | 5 | Proposer | 5 | 0 | NA |
4 | Voice | 6 | Proposer | 6 | 0 | NA |
4 | Voice | 7 | Proposer | 7 | 1 | NA |
4 | Voice | 8 | Proposer | 8 | 1 | NA |
4 | Voice | 9 | Proposer | 9 | 0 | NA |
4 | Voice | 10 | Proposer | 10 | 0 | NA |
4 | Voice | 11 | Proposer | 11 | 0 | NA |
4 | Voice | 12 | Proposer | 12 | 0 | NA |
4 | Voice | 13 | Proposer | 13 | 0 | NA |
4 | Voice | 14 | Proposer | 14 | 2 | NA |
4 | Voice | 15 | Proposer | 15 | 0 | NA |
4 | Voice | 16 | Proposer | 16 | 0 | NA |
4 | Voice | 17 | Proposer | 17 | 0 | NA |
4 | Voice | 18 | Proposer | 18 | 0 | NA |
4 | Voice | 19 | Proposer | 19 | 0 | NA |
4 | Voice | 20 | Proposer | 20 | 0 | NA |
4 | Voice | 21 | Responder | 1 | NA | 1 |
4 | Voice | 22 | Responder | 2 | NA | 1 |
4 | Voice | 23 | Responder | 3 | NA | 0 |
4 | Voice | 24 | Responder | 4 | NA | 1 |
4 | Voice | 25 | Responder | 5 | NA | 0 |
4 | Voice | 26 | Responder | 6 | NA | 1 |
4 | Voice | 27 | Responder | 7 | NA | 1 |
4 | Voice | 28 | Responder | 8 | NA | 1 |
4 | Voice | 29 | Responder | 9 | NA | 1 |
4 | Voice | 30 | Responder | 10 | NA | 1 |
4 | Voice | 31 | Responder | 11 | NA | 1 |
4 | Voice | 32 | Responder | 12 | NA | 1 |
4 | Voice | 33 | Responder | 13 | NA | 1 |
4 | Voice | 34 | Responder | 14 | NA | 0 |
4 | Voice | 35 | Responder | 15 | NA | 1 |
4 | Voice | 36 | Responder | 16 | NA | 1 |
4 | Voice | 37 | Responder | 17 | NA | 1 |
4 | Voice | 38 | Responder | 18 | NA | 1 |
4 | Voice | 39 | Responder | 19 | NA | 1 |
4 | Voice | 40 | Responder | 20 | NA | 1 |
Mutate
- In each Session, Subject and Group are unique identifiers
- Not across sessions!
- We need a unique identifier for Subjects and Groups across sessions
# A tibble: 160 x 7
Session Treatment Subject Role Group Offer Reject
<dbl> <fct> <dbl> <fct> <dbl> <int> <int>
1 1 Voice+Punishment 101 Proposer 101 2 NA
2 1 Voice+Punishment 102 Proposer 102 3 NA
3 1 Voice+Punishment 103 Proposer 103 5 NA
4 1 Voice+Punishment 104 Proposer 104 2 NA
5 1 Voice+Punishment 105 Proposer 105 2 NA
6 1 Voice+Punishment 106 Proposer 106 3 NA
7 1 Voice+Punishment 107 Proposer 107 0 NA
8 1 Voice+Punishment 108 Proposer 108 1 NA
9 1 Voice+Punishment 109 Proposer 109 4 NA
10 1 Voice+Punishment 110 Proposer 110 2 NA
# … with 150 more rows
Mutate (ii)
- Treatment and Role are of type character
- More appropriate to have them of type factor
Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of 7 variables:
$ Session : num 1 1 1 1 1 1 1 1 1 1 ...
$ Treatment: Factor w/ 2 levels "Voice","Voice+Punishment": 2 2 2 2 2 2 2 2 2 2 ...
$ Subject : num 101 102 103 104 105 106 107 108 109 110 ...
$ Role : Factor w/ 2 levels "Proposer","Responder": 1 1 1 1 1 1 1 1 1 1 ...
$ Group : num 101 102 103 104 105 106 107 108 109 110 ...
$ Offer : int 2 3 5 2 2 3 0 1 4 2 ...
$ Reject : int NA NA NA NA NA NA NA NA NA NA ...
Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of 7 variables:
$ Session : num 1 1 1 1 1 1 1 1 1 1 ...
$ Treatment: Factor w/ 2 levels "Voice","Voice+Punishment": 2 2 2 2 2 2 2 2 2 2 ...
$ Subject : num 101 102 103 104 105 106 107 108 109 110 ...
$ Role : Factor w/ 2 levels "Proposer","Responder": 1 1 1 1 1 1 1 1 1 1 ...
$ Group : num 101 102 103 104 105 106 107 108 109 110 ...
$ Offer : int 2 3 5 2 2 3 0 1 4 2 ...
$ Reject : int NA NA NA NA NA NA NA NA NA NA ...
Select
- We are not interested in Session anymore
- We can infer it from the unique identifier of Subject
- 10x = Session 1, 20x = Session 2, …
- We can infer it from the unique identifier of Subject
- We also want to reorder columns in a meaningful way
# A tibble: 160 x 6
Group Subject Role Treatment Offer Reject
<dbl> <dbl> <fct> <fct> <int> <int>
1 101 101 Proposer Voice+Punishment 2 NA
2 102 102 Proposer Voice+Punishment 3 NA
3 103 103 Proposer Voice+Punishment 5 NA
4 104 104 Proposer Voice+Punishment 2 NA
5 105 105 Proposer Voice+Punishment 2 NA
6 106 106 Proposer Voice+Punishment 3 NA
7 107 107 Proposer Voice+Punishment 0 NA
8 108 108 Proposer Voice+Punishment 1 NA
9 109 109 Proposer Voice+Punishment 4 NA
10 110 110 Proposer Voice+Punishment 2 NA
# … with 150 more rows
Rename
- In our experiment, we want to clearly indicate that Subject and Group are unique identifiers
# A tibble: 160 x 6
Group.ID Subject.ID Role Treatment Offer Reject
<dbl> <dbl> <fct> <fct> <int> <int>
1 101 101 Proposer Voice+Punishment 2 NA
2 102 102 Proposer Voice+Punishment 3 NA
3 103 103 Proposer Voice+Punishment 5 NA
4 104 104 Proposer Voice+Punishment 2 NA
5 105 105 Proposer Voice+Punishment 2 NA
6 106 106 Proposer Voice+Punishment 3 NA
7 107 107 Proposer Voice+Punishment 0 NA
8 108 108 Proposer Voice+Punishment 1 NA
9 109 109 Proposer Voice+Punishment 4 NA
10 110 110 Proposer Voice+Punishment 2 NA
# … with 150 more rows
Filter
- Select only the data that refer to the Proposer and get rid of column Reject which is empty for the Proposer
# A tibble: 80 x 5
Group.ID Subject.ID Role Treatment Offer
<dbl> <dbl> <fct> <fct> <int>
1 101 101 Proposer Voice+Punishment 2
2 102 102 Proposer Voice+Punishment 3
3 103 103 Proposer Voice+Punishment 5
4 104 104 Proposer Voice+Punishment 2
5 105 105 Proposer Voice+Punishment 2
6 106 106 Proposer Voice+Punishment 3
7 107 107 Proposer Voice+Punishment 0
8 108 108 Proposer Voice+Punishment 1
9 109 109 Proposer Voice+Punishment 4
10 110 110 Proposer Voice+Punishment 2
# … with 70 more rows
Filter (ii)
- Select only the data that refer to the Proposer and get rid of column Offer which is empty for the Responder
# A tibble: 80 x 5
Group.ID Subject.ID Role Treatment Reject
<dbl> <dbl> <fct> <fct> <int>
1 101 121 Responder Voice+Punishment 1
2 102 122 Responder Voice+Punishment 0
3 103 123 Responder Voice+Punishment 0
4 104 124 Responder Voice+Punishment 0
5 105 125 Responder Voice+Punishment 0
6 106 126 Responder Voice+Punishment 0
7 107 127 Responder Voice+Punishment 1
8 108 128 Responder Voice+Punishment 1
9 109 129 Responder Voice+Punishment 0
10 110 130 Responder Voice+Punishment 0
# … with 70 more rows
Join
- We want to create a database in which each row collects the observations within a Group
- Both the offer and the reject choice
d.UG.2 <- full_join(
d.UG %>% filter(Role=="Proposer") %>% select(-Reject),
d.UG %>% filter(Role=="Responder") %>% select(-Offer),
by="Group.ID"
) %>%
select(Group.ID,Treatment.x,Offer,Reject) %>%
rename(Treatment=Treatment.x)
d.UG.2
# A tibble: 80 x 4
Group.ID Treatment Offer Reject
<dbl> <fct> <int> <int>
1 101 Voice+Punishment 2 1
2 102 Voice+Punishment 3 0
3 103 Voice+Punishment 5 0
4 104 Voice+Punishment 2 0
5 105 Voice+Punishment 2 0
6 106 Voice+Punishment 3 0
7 107 Voice+Punishment 0 1
8 108 Voice+Punishment 1 1
9 109 Voice+Punishment 4 0
10 110 Voice+Punishment 2 0
# … with 70 more rows
Spread
- The spreading results is untidy data
- However, it can be useful as intermediate step
- e.g., to compute payoffs
# A tibble: 160 x 4
Group.ID Treatment Key Value
<dbl> <fct> <chr> <int>
1 101 Voice+Punishment Offer 2
2 102 Voice+Punishment Offer 3
3 103 Voice+Punishment Offer 5
4 104 Voice+Punishment Offer 2
5 105 Voice+Punishment Offer 2
6 106 Voice+Punishment Offer 3
7 107 Voice+Punishment Offer 0
8 108 Voice+Punishment Offer 1
9 109 Voice+Punishment Offer 4
10 110 Voice+Punishment Offer 2
# … with 150 more rows
Thank you
🦖
To contact me just write me an email
or write me on the forum of the course
Appendix
Assignments
Assignment 1
- Create these two datasets
# A tibble: 5 x 4
Outcome a b c
<int> <int> <int> <dbl>
1 1 1 5 1
2 2 2 4 3
3 3 3 3 5
4 4 4 2 7
5 5 5 1 9
# A tibble: 5 x 4
Outcome d e f
<int> <int> <int> <dbl>
1 1 5 1 1
2 2 4 2 5
3 3 3 3 9
4 4 2 4 13
5 5 1 5 17
Join them in a unique dataset using Outcome
Create a new column sum_ad with the sum of a and f
Create a new dataset looking like this
# A tibble: 3 x 4
Outcome a f sum_ad
<int> <int> <dbl> <dbl>
1 1 1 1 2
2 3 3 9 12
3 5 5 17 22
Assignment 2
- Take the data from the UG (included here) and compute the payoffs in the game
References
Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. " O’Reilly Media, Inc.".