R: Data Wrangling

Data Wrangling and Data Representation in R markdown"

Matteo Ploner

Introduction

Data Wrangling

  • Data Wrangling is “the art of getting your data into R in a useful form for visualization and modeling(Wickham and Grolemund 2016)

  • We rely on the tidyverse library
    • The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

Tidy data

  • See Wickham (2014) for a description of good practices in organizing your data.1

  • 3 rules to have tidy data
    • Each variable forms a column.
    • Each observation forms a row.
    • Each type of observational unit forms a table.


Source: Wickham and Grolemund (2016)

  • Data that are not tidy are untidy!

Example of tidy data

  # A tibble: 5 x 3
    Name    `Height(mt)` `Weight(kg)`
    <chr>          <dbl>        <dbl>
  1 Venonat         1.24        46.6 
  2 Ledyba          1.2         14.8 
  3 Wingull         0.49         7.05
  4 Treeko          0.51         3.9 
  5 Zubat           0.9          8.66

Example of untidy data

  # A tibble: 10 x 3
     Name    Key        Value
     <chr>   <chr>      <dbl>
   1 Ledyba  Height(mt)  1.2 
   2 Ledyba  Weight(kg) 14.8 
   3 Treeko  Height(mt)  0.51
   4 Treeko  Weight(kg)  3.9 
   5 Venonat Height(mt)  1.24
   6 Venonat Weight(kg) 46.6 
   7 Wingull Height(mt)  0.49
   8 Wingull Weight(kg)  7.05
   9 Zubat   Height(mt)  0.9 
  10 Zubat   Weight(kg)  8.66

Data wrangling

Coding Style

  • The code we are going to write relies on pipes: %>%
  • We use %>% to emphasize a sequence of action
    • . is a placeholder
  [1] 3025
  • Without pipes
  [1] 3025

Mutate

  • Adds new variables and preserves existing ones
  # A tibble: 5 x 2
    Sequence Sequence_rev
       <int>        <int>
  1        1            5
  2        2            4
  3        3            3
  4        4            2
  5        5            1
  • You can also overwrite old variables
  # A tibble: 5 x 1
    Sequence
       <int>
  1        5
  2        4
  3        3
  4        2
  5        1

Select

  • Choose only a few variables of the data frame
  • We not select one of the variables (e.g., z)
  # A tibble: 5 x 2
         x     y
     <dbl> <dbl>
  1 0.171  0.961
  2 0.545  1.77 
  3 0.548  1.29 
  4 0.0527 0.354
  5 0.654  0.927
  • We explicitly select the variables of interest
  # A tibble: 5 x 2
         x     y
     <dbl> <dbl>
  1 0.849  0.775
  2 0.584  0.140
  3 0.876  0.442
  4 0.792  0.421
  5 0.0208 0.669

Rename

  • We may want to change the name of variables, to make them more self-explanatory
  # A tibble: 5 x 3
        x     y     z
    <dbl> <dbl> <dbl>
  1  1.76  1.95 12.1 
  2  3.14  1.55 18.8 
  3  1.68  1.54 15.3 
  4  3.61  1.33  1.92
  5  2.15  1.41 17.5
  # A tibble: 5 x 3
    Length Width Height
     <dbl> <dbl>  <dbl>
  1   1.76  1.95  12.1 
  2   3.14  1.55  18.8 
  3   1.68  1.54  15.3 
  4   3.61  1.33   1.92
  5   2.15  1.41  17.5

Arrange

  • Sort you data according to values in one columns
    • Important to get a first impression on rankings
      • You can hierarchically sort according to several columns
  • According to length
  # A tibble: 10 x 3
     Type      x     y
     <chr> <dbl> <dbl>
   1 A      3.73  1.18
   2 A      3.33  3.24
   3 B      3.63  3.39
   4 B      3.62  3.65
   5 C      1.51  2.30
   6 C      1.10  1.24
   7 D      2.55  2.27
   8 D      1.18  2.01
   9 E      2.02  3.05
  10 E      1.63  3.70
  # A tibble: 10 x 3
     Type      x     y
     <chr> <dbl> <dbl>
   1 A      3.33  3.24
   2 A      3.73  1.18
   3 B      3.62  3.65
   4 B      3.63  3.39
   5 C      1.10  1.24
   6 C      1.51  2.30
   7 D      1.18  2.01
   8 D      2.55  2.27
   9 E      1.63  3.70
  10 E      2.02  3.05

Filter

  • To choose rows/cases where conditions are true
  # A tibble: 5 x 3
        x     y     z
    <dbl> <dbl> <dbl>
  1  2.57  1.79 13.8 
  2  1.99  1.42 14.4 
  3  2.20  1.36 11.3 
  4  1.47  1.16  2.18
  5  1.42  1.53  8.17
  # A tibble: 2 x 3
        x     y     z
    <dbl> <dbl> <dbl>
  1  2.57  1.79  13.8
  2  2.20  1.36  11.3

Join: full

  • We may want to put two different data frames in relation
    • The two data frames contain a common column
  • Values \(x\) and \(y\) for Obs are in table tmp.1
  • Values \(x\) and \(y\) for Obs are in table tmp.2
  # A tibble: 5 x 3
        x     y   Obs
    <dbl> <dbl> <int>
  1  1.72  1.12     3
  2  1.36  1.46     4
  3  3.83  1.62     5
  4  3.27  1.86     6
  5  1.60  1.49     7

Join: full (ii)

  # A tibble: 7 x 5
        x     y   Obs     z      w
    <dbl> <dbl> <int> <dbl>  <dbl>
  1  1.72  1.12     3  3.65  1.59 
  2  1.36  1.46     4  7.92  1.06 
  3  3.83  1.62     5  2.10  1.33 
  4  3.27  1.86     6 NA    NA    
  5  1.60  1.49     7 NA    NA    
  6 NA    NA        1  2.62  0.230
  7 NA    NA        2  2.74  1.94
  • ! Not all Obs are in both tables

Join: partial

  • We may want to have all values of the “left” data frame (tmp.1)
  # A tibble: 5 x 5
        x     y   Obs     z     w
    <dbl> <dbl> <int> <dbl> <dbl>
  1  1.72  1.12     3  3.65  1.59
  2  1.36  1.46     4  7.92  1.06
  3  3.83  1.62     5  2.10  1.33
  4  3.27  1.86     6 NA    NA   
  5  1.60  1.49     7 NA    NA
  • We may want to have all values of the “right” data frame (tmp.2)
  # A tibble: 5 x 5
        x     y   Obs     z     w
    <dbl> <dbl> <int> <dbl> <dbl>
  1 NA    NA        1  2.62 0.230
  2 NA    NA        2  2.74 1.94 
  3  1.72  1.12     3  3.65 1.59 
  4  1.36  1.46     4  7.92 1.06 
  5  3.83  1.62     5  2.10 1.33

Gathering

  • Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed
  # A tibble: 5 x 3
        x     y   Obs
    <dbl> <dbl> <int>
  1  1.72  1.12     3
  2  1.36  1.46     4
  3  3.83  1.62     5
  4  3.27  1.86     6
  5  1.60  1.49     7
  # A tibble: 10 x 3
       Obs Key   Value
     <int> <chr> <dbl>
   1     3 x      1.72
   2     4 x      1.36
   3     5 x      3.83
   4     6 x      3.27
   5     7 x      1.60
   6     3 y      1.12
   7     4 y      1.46
   8     5 y      1.62
   9     6 y      1.86
  10     7 y      1.49

Spread

  • spread spreads a key-value pair across multiple columns (inverse of gather)
  # A tibble: 5 x 3
      Obs     x     y
    <int> <dbl> <dbl>
  1     3  1.72  1.12
  2     4  1.36  1.46
  3     5  3.83  1.62
  4     6  3.27  1.86
  5     7  1.60  1.49

An Application to Experimental Data

The Experiment

  • We are going to work with a dataframe containing the outcomes of the following experiment
  • Participants play an Ultimatum Game
    • Two roles: Proposer and Responder
    • The Proposer receives an endowment of € 10 and decides how to divide it between herself and the Responder
    • The Responder decides whether to accept the offer or not
    • The consequence of Responder choice is manipulated in two experimental treatments
      • Voice+Punishment
        • When the offer is rejected, both the Proposer and the Responder earn 0
        • When the offer is accepted, the division made by the Proposer is implemented and payoffs computed accordingly
      • Voice
        • When the offer is rejected, the division made by the Proposer is implemented and payoffs computed accordingly
        • When the offer is accepted, the division made by the Proposer is implemented and payoffs computed accordingly

Data frame

Session Treatment Subject Role Group Offer Reject
1 Voice+Punishment 1 Proposer 1 2 NA
1 Voice+Punishment 2 Proposer 2 3 NA
1 Voice+Punishment 3 Proposer 3 5 NA
1 Voice+Punishment 4 Proposer 4 2 NA
1 Voice+Punishment 5 Proposer 5 2 NA
1 Voice+Punishment 6 Proposer 6 3 NA
1 Voice+Punishment 7 Proposer 7 0 NA
1 Voice+Punishment 8 Proposer 8 1 NA
1 Voice+Punishment 9 Proposer 9 4 NA
1 Voice+Punishment 10 Proposer 10 2 NA
1 Voice+Punishment 11 Proposer 11 2 NA
1 Voice+Punishment 12 Proposer 12 0 NA
1 Voice+Punishment 13 Proposer 13 2 NA
1 Voice+Punishment 14 Proposer 14 0 NA
1 Voice+Punishment 15 Proposer 15 2 NA
1 Voice+Punishment 16 Proposer 16 2 NA
1 Voice+Punishment 17 Proposer 17 2 NA
1 Voice+Punishment 18 Proposer 18 1 NA
1 Voice+Punishment 19 Proposer 19 0 NA
1 Voice+Punishment 20 Proposer 20 1 NA
1 Voice+Punishment 21 Responder 1 NA 1
1 Voice+Punishment 22 Responder 2 NA 0
1 Voice+Punishment 23 Responder 3 NA 0
1 Voice+Punishment 24 Responder 4 NA 0
1 Voice+Punishment 25 Responder 5 NA 0
1 Voice+Punishment 26 Responder 6 NA 0
1 Voice+Punishment 27 Responder 7 NA 1
1 Voice+Punishment 28 Responder 8 NA 1
1 Voice+Punishment 29 Responder 9 NA 0
1 Voice+Punishment 30 Responder 10 NA 0
1 Voice+Punishment 31 Responder 11 NA 1
1 Voice+Punishment 32 Responder 12 NA 0
1 Voice+Punishment 33 Responder 13 NA 0
1 Voice+Punishment 34 Responder 14 NA 1
1 Voice+Punishment 35 Responder 15 NA 1
1 Voice+Punishment 36 Responder 16 NA 0
1 Voice+Punishment 37 Responder 17 NA 1
1 Voice+Punishment 38 Responder 18 NA 0
1 Voice+Punishment 39 Responder 19 NA 1
1 Voice+Punishment 40 Responder 20 NA 1
2 Voice+Punishment 1 Proposer 1 1 NA
2 Voice+Punishment 2 Proposer 2 2 NA
2 Voice+Punishment 3 Proposer 3 1 NA
2 Voice+Punishment 4 Proposer 4 0 NA
2 Voice+Punishment 5 Proposer 5 5 NA
2 Voice+Punishment 6 Proposer 6 0 NA
2 Voice+Punishment 7 Proposer 7 3 NA
2 Voice+Punishment 8 Proposer 8 2 NA
2 Voice+Punishment 9 Proposer 9 3 NA
2 Voice+Punishment 10 Proposer 10 3 NA
2 Voice+Punishment 11 Proposer 11 2 NA
2 Voice+Punishment 12 Proposer 12 0 NA
2 Voice+Punishment 13 Proposer 13 5 NA
2 Voice+Punishment 14 Proposer 14 4 NA
2 Voice+Punishment 15 Proposer 15 5 NA
2 Voice+Punishment 16 Proposer 16 2 NA
2 Voice+Punishment 17 Proposer 17 2 NA
2 Voice+Punishment 18 Proposer 18 2 NA
2 Voice+Punishment 19 Proposer 19 2 NA
2 Voice+Punishment 20 Proposer 20 2 NA
2 Voice+Punishment 21 Responder 1 NA 0
2 Voice+Punishment 22 Responder 2 NA 0
2 Voice+Punishment 23 Responder 3 NA 1
2 Voice+Punishment 24 Responder 4 NA 1
2 Voice+Punishment 25 Responder 5 NA 0
2 Voice+Punishment 26 Responder 6 NA 1
2 Voice+Punishment 27 Responder 7 NA 0
2 Voice+Punishment 28 Responder 8 NA 1
2 Voice+Punishment 29 Responder 9 NA 0
2 Voice+Punishment 30 Responder 10 NA 0
2 Voice+Punishment 31 Responder 11 NA 1
2 Voice+Punishment 32 Responder 12 NA 0
2 Voice+Punishment 33 Responder 13 NA 0
2 Voice+Punishment 34 Responder 14 NA 0
2 Voice+Punishment 35 Responder 15 NA 0
2 Voice+Punishment 36 Responder 16 NA 1
2 Voice+Punishment 37 Responder 17 NA 1
2 Voice+Punishment 38 Responder 18 NA 1
2 Voice+Punishment 39 Responder 19 NA 1
2 Voice+Punishment 40 Responder 20 NA 1
3 Voice 1 Proposer 1 3 NA
3 Voice 2 Proposer 2 2 NA
3 Voice 3 Proposer 3 0 NA
3 Voice 4 Proposer 4 0 NA
3 Voice 5 Proposer 5 0 NA
3 Voice 6 Proposer 6 0 NA
3 Voice 7 Proposer 7 3 NA
3 Voice 8 Proposer 8 1 NA
3 Voice 9 Proposer 9 0 NA
3 Voice 10 Proposer 10 1 NA
3 Voice 11 Proposer 11 1 NA
3 Voice 12 Proposer 12 0 NA
3 Voice 13 Proposer 13 0 NA
3 Voice 14 Proposer 14 0 NA
3 Voice 15 Proposer 15 0 NA
3 Voice 16 Proposer 16 2 NA
3 Voice 17 Proposer 17 0 NA
3 Voice 18 Proposer 18 1 NA
3 Voice 19 Proposer 19 0 NA
3 Voice 20 Proposer 20 0 NA
3 Voice 21 Responder 1 NA 1
3 Voice 22 Responder 2 NA 1
3 Voice 23 Responder 3 NA 1
3 Voice 24 Responder 4 NA 1
3 Voice 25 Responder 5 NA 1
3 Voice 26 Responder 6 NA 1
3 Voice 27 Responder 7 NA 1
3 Voice 28 Responder 8 NA 1
3 Voice 29 Responder 9 NA 1
3 Voice 30 Responder 10 NA 1
3 Voice 31 Responder 11 NA 1
3 Voice 32 Responder 12 NA 1
3 Voice 33 Responder 13 NA 1
3 Voice 34 Responder 14 NA 1
3 Voice 35 Responder 15 NA 1
3 Voice 36 Responder 16 NA 1
3 Voice 37 Responder 17 NA 1
3 Voice 38 Responder 18 NA 1
3 Voice 39 Responder 19 NA 1
3 Voice 40 Responder 20 NA 1
4 Voice 1 Proposer 1 0 NA
4 Voice 2 Proposer 2 3 NA
4 Voice 3 Proposer 3 2 NA
4 Voice 4 Proposer 4 2 NA
4 Voice 5 Proposer 5 0 NA
4 Voice 6 Proposer 6 0 NA
4 Voice 7 Proposer 7 1 NA
4 Voice 8 Proposer 8 1 NA
4 Voice 9 Proposer 9 0 NA
4 Voice 10 Proposer 10 0 NA
4 Voice 11 Proposer 11 0 NA
4 Voice 12 Proposer 12 0 NA
4 Voice 13 Proposer 13 0 NA
4 Voice 14 Proposer 14 2 NA
4 Voice 15 Proposer 15 0 NA
4 Voice 16 Proposer 16 0 NA
4 Voice 17 Proposer 17 0 NA
4 Voice 18 Proposer 18 0 NA
4 Voice 19 Proposer 19 0 NA
4 Voice 20 Proposer 20 0 NA
4 Voice 21 Responder 1 NA 1
4 Voice 22 Responder 2 NA 1
4 Voice 23 Responder 3 NA 0
4 Voice 24 Responder 4 NA 1
4 Voice 25 Responder 5 NA 0
4 Voice 26 Responder 6 NA 1
4 Voice 27 Responder 7 NA 1
4 Voice 28 Responder 8 NA 1
4 Voice 29 Responder 9 NA 1
4 Voice 30 Responder 10 NA 1
4 Voice 31 Responder 11 NA 1
4 Voice 32 Responder 12 NA 1
4 Voice 33 Responder 13 NA 1
4 Voice 34 Responder 14 NA 0
4 Voice 35 Responder 15 NA 1
4 Voice 36 Responder 16 NA 1
4 Voice 37 Responder 17 NA 1
4 Voice 38 Responder 18 NA 1
4 Voice 39 Responder 19 NA 1
4 Voice 40 Responder 20 NA 1

Mutate

  • In each Session, Subject and Group are unique identifiers
    • Not across sessions!
  • We need a unique identifier for Subjects and Groups across sessions
  # A tibble: 160 x 7
     Session Treatment        Subject Role     Group Offer Reject
       <dbl> <fct>              <dbl> <fct>    <dbl> <int>  <int>
   1       1 Voice+Punishment     101 Proposer   101     2     NA
   2       1 Voice+Punishment     102 Proposer   102     3     NA
   3       1 Voice+Punishment     103 Proposer   103     5     NA
   4       1 Voice+Punishment     104 Proposer   104     2     NA
   5       1 Voice+Punishment     105 Proposer   105     2     NA
   6       1 Voice+Punishment     106 Proposer   106     3     NA
   7       1 Voice+Punishment     107 Proposer   107     0     NA
   8       1 Voice+Punishment     108 Proposer   108     1     NA
   9       1 Voice+Punishment     109 Proposer   109     4     NA
  10       1 Voice+Punishment     110 Proposer   110     2     NA
  # … with 150 more rows

Mutate (ii)

  • Treatment and Role are of type character
    • More appropriate to have them of type factor
  Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of  7 variables:
   $ Session  : num  1 1 1 1 1 1 1 1 1 1 ...
   $ Treatment: Factor w/ 2 levels "Voice","Voice+Punishment": 2 2 2 2 2 2 2 2 2 2 ...
   $ Subject  : num  101 102 103 104 105 106 107 108 109 110 ...
   $ Role     : Factor w/ 2 levels "Proposer","Responder": 1 1 1 1 1 1 1 1 1 1 ...
   $ Group    : num  101 102 103 104 105 106 107 108 109 110 ...
   $ Offer    : int  2 3 5 2 2 3 0 1 4 2 ...
   $ Reject   : int  NA NA NA NA NA NA NA NA NA NA ...
  Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of  7 variables:
   $ Session  : num  1 1 1 1 1 1 1 1 1 1 ...
   $ Treatment: Factor w/ 2 levels "Voice","Voice+Punishment": 2 2 2 2 2 2 2 2 2 2 ...
   $ Subject  : num  101 102 103 104 105 106 107 108 109 110 ...
   $ Role     : Factor w/ 2 levels "Proposer","Responder": 1 1 1 1 1 1 1 1 1 1 ...
   $ Group    : num  101 102 103 104 105 106 107 108 109 110 ...
   $ Offer    : int  2 3 5 2 2 3 0 1 4 2 ...
   $ Reject   : int  NA NA NA NA NA NA NA NA NA NA ...

Select

  • We are not interested in Session anymore
    • We can infer it from the unique identifier of Subject
      • 10x = Session 1, 20x = Session 2, …
  • We also want to reorder columns in a meaningful way
  # A tibble: 160 x 6
     Group Subject Role     Treatment        Offer Reject
     <dbl>   <dbl> <fct>    <fct>            <int>  <int>
   1   101     101 Proposer Voice+Punishment     2     NA
   2   102     102 Proposer Voice+Punishment     3     NA
   3   103     103 Proposer Voice+Punishment     5     NA
   4   104     104 Proposer Voice+Punishment     2     NA
   5   105     105 Proposer Voice+Punishment     2     NA
   6   106     106 Proposer Voice+Punishment     3     NA
   7   107     107 Proposer Voice+Punishment     0     NA
   8   108     108 Proposer Voice+Punishment     1     NA
   9   109     109 Proposer Voice+Punishment     4     NA
  10   110     110 Proposer Voice+Punishment     2     NA
  # … with 150 more rows

Rename

  • In our experiment, we want to clearly indicate that Subject and Group are unique identifiers
  # A tibble: 160 x 6
     Group.ID Subject.ID Role     Treatment        Offer Reject
        <dbl>      <dbl> <fct>    <fct>            <int>  <int>
   1      101        101 Proposer Voice+Punishment     2     NA
   2      102        102 Proposer Voice+Punishment     3     NA
   3      103        103 Proposer Voice+Punishment     5     NA
   4      104        104 Proposer Voice+Punishment     2     NA
   5      105        105 Proposer Voice+Punishment     2     NA
   6      106        106 Proposer Voice+Punishment     3     NA
   7      107        107 Proposer Voice+Punishment     0     NA
   8      108        108 Proposer Voice+Punishment     1     NA
   9      109        109 Proposer Voice+Punishment     4     NA
  10      110        110 Proposer Voice+Punishment     2     NA
  # … with 150 more rows

Filter

  • Select only the data that refer to the Proposer and get rid of column Reject which is empty for the Proposer
  # A tibble: 80 x 5
     Group.ID Subject.ID Role     Treatment        Offer
        <dbl>      <dbl> <fct>    <fct>            <int>
   1      101        101 Proposer Voice+Punishment     2
   2      102        102 Proposer Voice+Punishment     3
   3      103        103 Proposer Voice+Punishment     5
   4      104        104 Proposer Voice+Punishment     2
   5      105        105 Proposer Voice+Punishment     2
   6      106        106 Proposer Voice+Punishment     3
   7      107        107 Proposer Voice+Punishment     0
   8      108        108 Proposer Voice+Punishment     1
   9      109        109 Proposer Voice+Punishment     4
  10      110        110 Proposer Voice+Punishment     2
  # … with 70 more rows

Filter (ii)

  • Select only the data that refer to the Proposer and get rid of column Offer which is empty for the Responder
  # A tibble: 80 x 5
     Group.ID Subject.ID Role      Treatment        Reject
        <dbl>      <dbl> <fct>     <fct>             <int>
   1      101        121 Responder Voice+Punishment      1
   2      102        122 Responder Voice+Punishment      0
   3      103        123 Responder Voice+Punishment      0
   4      104        124 Responder Voice+Punishment      0
   5      105        125 Responder Voice+Punishment      0
   6      106        126 Responder Voice+Punishment      0
   7      107        127 Responder Voice+Punishment      1
   8      108        128 Responder Voice+Punishment      1
   9      109        129 Responder Voice+Punishment      0
  10      110        130 Responder Voice+Punishment      0
  # … with 70 more rows

Join

  • We want to create a database in which each row collects the observations within a Group
    • Both the offer and the reject choice
  # A tibble: 80 x 4
     Group.ID Treatment        Offer Reject
        <dbl> <fct>            <int>  <int>
   1      101 Voice+Punishment     2      1
   2      102 Voice+Punishment     3      0
   3      103 Voice+Punishment     5      0
   4      104 Voice+Punishment     2      0
   5      105 Voice+Punishment     2      0
   6      106 Voice+Punishment     3      0
   7      107 Voice+Punishment     0      1
   8      108 Voice+Punishment     1      1
   9      109 Voice+Punishment     4      0
  10      110 Voice+Punishment     2      0
  # … with 70 more rows

Spread

  • The spreading results is untidy data
    • However, it can be useful as intermediate step
    • e.g., to compute payoffs
  # A tibble: 160 x 4
     Group.ID Treatment        Key   Value
        <dbl> <fct>            <chr> <int>
   1      101 Voice+Punishment Offer     2
   2      102 Voice+Punishment Offer     3
   3      103 Voice+Punishment Offer     5
   4      104 Voice+Punishment Offer     2
   5      105 Voice+Punishment Offer     2
   6      106 Voice+Punishment Offer     3
   7      107 Voice+Punishment Offer     0
   8      108 Voice+Punishment Offer     1
   9      109 Voice+Punishment Offer     4
  10      110 Voice+Punishment Offer     2
  # … with 150 more rows

Thank you

🦖

To contact me just write me an email

matteo.ploner@unitn.it

or write me on the forum of the course

Appendix

Assignments

Assignment 1

  1. Create these two datasets
  # A tibble: 5 x 4
    Outcome     a     b     c
      <int> <int> <int> <dbl>
  1       1     1     5     1
  2       2     2     4     3
  3       3     3     3     5
  4       4     4     2     7
  5       5     5     1     9
  # A tibble: 5 x 4
    Outcome     d     e     f
      <int> <int> <int> <dbl>
  1       1     5     1     1
  2       2     4     2     5
  3       3     3     3     9
  4       4     2     4    13
  5       5     1     5    17
  1. Join them in a unique dataset using Outcome

  2. Create a new column sum_ad with the sum of a and f

  3. Create a new dataset looking like this

  # A tibble: 3 x 4
    Outcome     a     f sum_ad
      <int> <int> <dbl>  <dbl>
  1       1     1     1      2
  2       3     3     9     12
  3       5     5    17     22

Assignment 2

  • Take the data from the UG (included here) and compute the payoffs in the game
Download UG.csv

References

Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. " O’Reilly Media, Inc.".


  1. A condensed online version available here.