Model the Data Generation Process

Elijah Galvan

September 1, 2023

Lesson

Goal During this Stage

We want to mathematically model the data generation process - in other words, develop an equation that predicts what all people will decide in each trial.

Warning

Mathematically modeling the data generation process is the most difficult step in the computational modeling process: there is no objectively correct way to go about it. With that said, I believe that this is achievable in a timely manner when given the correct goals and a useful framework.

How to Achieve this Goal

1. Create Construct Value Functions - for each motive you have identified, create a function which takes Decisions, Independent Variables, and Constants as inputs, and outputs a value which reflects the Construct Value of such a motive based on Decisions.

Conceptual Examples of Construct Value Formulations

Here the Decision (the chosen option) is represented as Decision, the option which most closely follows the norm is represented as Norm, and the option which most strongly violates the norm is represented as Maximum Norm Violation.

Note

All differences are set to the absolute value. The logic of this choice is that the direction of the norm violation is not important: violating a norm in either direction results in disutility. If norm violation in one direction could result in positive utility, the absolute value should not be used. However, in most cases it is appropriate (albeit unnecessary from a design perspective, since such deviations are usually not possible).

Raw Norm Violation = (|Norm - Decision|)/(Maximum Norm Violation)

  • If any violation of the norm results in disutility

Unidirectional Norm Violation = (|max((Norm - Decision), 0)|)/(Maximum Norm Violation)

  • If the Norm is violated in one direction but not another

  • Could also use min

  • The second argument (0 here) can be anything, though 0 is usually the most meaningful

Normalized Norm Violation = ((|Norm - Decision|)/(|Maximum Norm Violation - Norm|))

  • If all choices result in a norm being violated to some extent, disutility is not experienced by choosing the closest value to the norm

Squared Normalized Norm Violation = ((Norm - Decision)/(Maximum Norm Violation - Norm))²

  • If smaller norm violations are less significant than a linear relationship would suggest

Square Root Normalized Norm Violation = sqrt((|Norm - Decision|)/(|Maximum Norm Violation - Norm|))

  • If smaller norm violations are more significant than a linear relationship would suggest

Note

Adherence = 1 - Violation

If you plot the output (let’s call this our Construct Values) against Decisions for a few trials (with a few example combinations of Independent Variables) and it makes sense to you, then great job! Otherwise, think about how you can fix it so that it does make sense.
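
As a starting point for that check, the formulations above can be sketched in Python. This is a minimal sketch rather than the author's code: the function and parameter names are hypothetical, `max_violation_size` is assumed to be the magnitude of the largest possible violation, and `max_violation_option` is assumed to be the option that most strongly violates the norm. The example values (norm of 5, worst option of 0) are arbitrary.

```python
import math

def raw_violation(norm, decision, max_violation_size):
    # |Norm - Decision| / Maximum Norm Violation
    return abs(norm - decision) / max_violation_size

def unidirectional_violation(norm, decision, max_violation_size):
    # Only decisions below the norm count as violations
    return max(norm - decision, 0) / max_violation_size

def normalized_violation(norm, decision, max_violation_option):
    # |Norm - Decision| / |Maximum Norm Violation - Norm|
    return abs(norm - decision) / abs(max_violation_option - norm)

def squared_normalized_violation(norm, decision, max_violation_option):
    return normalized_violation(norm, decision, max_violation_option) ** 2

def sqrt_normalized_violation(norm, decision, max_violation_option):
    return math.sqrt(normalized_violation(norm, decision, max_violation_option))

def adherence(violation):
    # Adherence = 1 - Violation
    return 1 - violation

# Arbitrary illustration: the norm is 5 and the worst option is 0
for decision in range(0, 6):
    linear = normalized_violation(5, decision, 0)
    squared = squared_normalized_violation(5, decision, 0)
    root = sqrt_normalized_violation(5, decision, 0)
    print(decision, round(linear, 2), round(squared, 2), round(root, 2))
```

For violations strictly between 0 and 1, the squared values sit below the linear ones and the square-root values sit above them, which is the pattern the bullet points describe.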

2. Identify Relevant Free Parameters for each Construct Value - identify which Free Parameters are relevant for each Construct Value in determining Utility.

3. Propose a Utility Equation - identify how each Construct Value mathematically interacts with the relevant Free Parameters to determine Utility.

General Utility Equation Formulation

Note

In most cases, you should not apply a nonlinear transformation to your Free Parameters. This is because the value of Free Parameters becomes uninterpretable.

Utility =

( Utility Source 1 × Relevant Free Parameters ) +

( Utility Source 2 × Relevant Free Parameters ) + … +

( Utility Source N × Relevant Free Parameters )
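
The general form is a weighted sum, which could be sketched as follows. The function name `weighted_utility` and the example values are hypothetical, and the weights Θ and 1 - Θ are only one possible choice of Relevant Free Parameters.

```python
def weighted_utility(sources, weights):
    # Utility = sum over sources of (Utility Source x Relevant Free Parameters)
    if len(sources) != len(weights):
        raise ValueError("each utility source needs exactly one weight")
    return sum(source * weight for source, weight in zip(sources, weights))

theta = 0.25                      # arbitrary example value of a free parameter
sources = [1.0, 0.5]              # e.g. a payout term and a norm adherence term
weights = [theta, 1 - theta]      # assumed weighting scheme
print(weighted_utility(sources, weights))
```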

Examples of Utility Equations

Note

SN is shorthand for Social Norm - when there are multiple social norms we use numbers to demarcate.

Adh is shorthand for Adherence. Vio is shorthand for Violation.

1 Parameter

Utility = Payout Adh × Θ - (1 - Θ ) × SN Vio

Θ = [0, 1]

Payout Adh = Normalized Norm Adherence = [0, 1]

SN Vio = Squared Normalized Norm Violation = [0, 0.25]

2 Parameter

Utility = Payout Adh × Θ + (1 - Θ ) × ( Φ × SN1 Adh + (1 - Φ ) × SN2 Adh )

Θ = [0, 1]

Φ = [0, 1]

Payout Adh = Squared Normalized Norm Adherence = [0, 1]

SN1 Adh = Squared Normalized Norm Adherence = [0, 1]

SN2 Adh = Squared Normalized Norm Adherence = [0, 1]

Note

This model was used in multiplayer choice. Thus, SN1 Adh and SN2 Adh were computed as 1 - (sum(Norm Violation for Each Player²)/sum(Maximum Norm Violation for Each Player²)).
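
A minimal sketch of that multiplayer computation is shown here. The function name and the example numbers are hypothetical; `violations` and `max_violations` are assumed to hold one entry per player.

```python
def multiplayer_adherence(violations, max_violations):
    # 1 - sum(violation^2) / sum(maximum violation^2), summed over players
    squared_violations = sum(v ** 2 for v in violations)
    squared_maxima = sum(m ** 2 for m in max_violations)
    return 1 - squared_violations / squared_maxima

violations = [1, 2]          # arbitrary per-player norm violations
max_violations = [2, 4]      # arbitrary per-player maximum violations
print(multiplayer_adherence(violations, max_violations))
```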

Utility = Payout Adh × Θ - (1 - Θ ) × ( Φ × SN1 Vio + (1 - Φ ) × SN2 Vio )

Θ = [0, 0.5]

Φ = [0, 1]

Payout Adh = Normalized Norm Adherence = [0, 1]

SN1 Vio = Squared Normalized Norm Violation = [0, 0.25]

SN2 Vio = Squared Normalized Norm Violation = [0, 0.25]

Utility = Payout Adh × Θ - (1 - Θ ) × min( SN1 Vio + Φ , SN2 Vio - Φ )

Θ = [0, 0.5]

Φ = [-0.1, 0.1]

Payout Adh = Normalized Norm Adherence = [0, 1]

SN1 Vio = Squared Normalized Norm Violation = [0, 0.25]

SN2 Vio = Squared Normalized Norm Violation = [0, 0.25]

Tutorials

Note

If you want practice finding the correct model, leave the first two dropdowns alone and skip to the third dropdown to compare your Utility equation to those used in the existing papers.

Otherwise, check the answers in the dropdowns below. Please be aware that no examples are given in the documentation for alternative models.

Tutorial 1 - van Baar, Chang, & Sanfey, 2019

Create Construct Value Functions
Greed

The extent to which one has behaved greedily can be expressed as the proportion of how much they decided to keep for themselves out of how much they could have kept for themselves (i.e. the extent to which they maximized their payout).

In the Trust Game, the maximum amount that the Trustee can keep for themselves is what they received, namely Investment × Multiplier. Therefore, what they Keep is ( Investment × Multiplier ) - Returned.

Thus, the extent to which one has maximized their payout is:

Payout Maximization = Keep / (Investment × Multiplier )

Since Keep can range from 0 to Investment × Multiplier, Payout Maximization ranges from 0 to 1, inclusive.

R:
payout_maximization = function(investment, multiplier, returned){
    return(((investment * multiplier) - returned)/(investment * multiplier))
}

MATLAB:
function value = payout_maximization(investment, multiplier, returned)
    value = ((investment * multiplier) - returned) / (investment * multiplier);
end

Python:
def payout_maximization(investment, multiplier, returned):
    return ((investment * multiplier) - returned) / (investment * multiplier)

Inequity Aversion

Equity is creating an equal division of money in the game. Thus, the extent to which the principle of equity has been violated can be expressed as the difference between perfect equity (the norm) and the actual division.

In the Trust Game, the Trustee’s payout is what they Keep which is ( Investment × Multiplier ) - Returned while the Investor’s payout is what they did not invest which is ( Endowment - Investment ). If the Trustee has half of the money in the game, Keep is half of all of the money in the game - the sum of the multiplied investment ( Investment × Multiplier ) and what the Investor did not invest ( Endowment - Investment ).

Note

There are cases where the Investor does not invest enough for the Trustee to achieve Equity: in the paper they elected for the raw norm violation rather than the normalized norm violation so we’ll do the same (although I can confirm that this doesn’t affect the results). They also chose a squared formulation based on previous literature.

Thus, the extent to which inequity was created (i.e. one violated the principle of equity) is:

Inequity = (0.5 - ( Keep / ( Endowment - Investment + Investment × Multiplier )))²

Since Keep can range from 0 to Investment × Multiplier (when Endowment - Investment = 0), the maximum difference can be 0.5 which when squared is 0.25. Thus, Inequity can range from 0 to 0.25, inclusive.

R:
inequity = function(investment, multiplier, returned, endowment){
    return((((investment * multiplier - returned)/(investment * multiplier + endowment - investment)) - 0.5)**2)
}

MATLAB:
function value = inequity(investment, multiplier, returned, endowment)
    value = (((investment * multiplier - returned)/(investment * multiplier + endowment - investment)) - 0.5)^2;
end

Python:
def inequity(investment, multiplier, returned, endowment):
    return((((investment * multiplier - returned)/(investment * multiplier + endowment - investment)) - 0.5)**2)

Guilt Aversion

Guilt is experienced by violating expectations: in this case, the norm is to give half of what one receives. Thus, the extent to which one has violated the social norm can be expressed as the difference between the expected return on investment and the actual return on investment.

In the experiment, Believed Multiplier was a constant: it was always 4. Let’s adopt the assumption (which was supported in the data) that Trustees believed that Investors expected to receive half of the multiplied investment. Thus, the expectation can be expressed as ( Investment × Believed Multiplier )/2.

Note

Theoretically, giving more than ( Investment × Believed Multiplier )/2 is represented as a disutility caused by an experience of guilt. Of course this seems unreasonable, but let’s play this out further: the magnitude of (( Investment × Believed Multiplier )/2) - Returned can actually be as large as Investment × Believed Multiplier. This could be very problematic: Inequity can only range from 0 to 0.25 but Guilt can range from 0 to 1.

Obviously, this is not a huge problem because the model entirely overlooks the possibility that guilt averse people would give more than half of Investment × Believed Multiplier or that inequity averse people would give more than half of Investment × Multiplier which seems reasonable. But still, let’s think of what an alternative formulation would be.

What’s a reasonable alternative formulation?

The answer would be to apply a unidirectional formulation: max(((( Investment × Believed Multiplier )/2) - Returned ), 0)

What’s wrong with this alternative formulation?

The answer would be that it is nonspecific: any return value greater than or equal to ( Investment × Believed Multiplier )/2 results in the exact same disutility (i.e. 0). Specificity is a highly important feature of these models: you need to ensure that models make distinct predictions as much as possible.
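
To make the loss of specificity concrete, here is a small sketch of the unidirectional formulation. The function name is hypothetical and the Investment of 10 is an arbitrary example value; the Believed Multiplier of 4 comes from the experiment described above.

```python
def unidirectional_guilt(investment, believed_multiplier, returned):
    # max(((Investment x Believed Multiplier) / 2) - Returned, 0)
    expected_return = (investment * believed_multiplier) / 2
    return max(expected_return - returned, 0)

investment, believed_multiplier = 10, 4   # expected return is 20
for returned in (10, 20, 25, 30):
    print(returned, unidirectional_guilt(investment, believed_multiplier, returned))
```

Every return at or above the expectation maps to the same value, so the model cannot distinguish between those decisions.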

Remember that there is often a tradeoff between specificity, parsimony, and theoretical correctness.

We can fix this by changing the denominator from Investment × Believed Multiplier to Investment × Multiplier - guilt can now only range from 0 to 0.25

Thus, with this representation of the norm, the extent to which it was violated is:

Guilt = (((( Investment × Believed Multiplier )/2) - Returned ) /( Investment × Multiplier ))²

R:
guilt = function(investment, believed_multiplier, returned, multiplier){
    # Denominator is investment * multiplier, matching the Guilt equation above
    return((((investment * believed_multiplier * 0.5) - returned)/(investment * multiplier))**2)
}

MATLAB:
function value = guilt(investment, believed_multiplier, returned, multiplier)
    % Denominator is investment * multiplier, matching the Guilt equation above
    value = (((investment * believed_multiplier)/2 - returned) / (investment * multiplier))^2;
end

Python:
def guilt(investment, believed_multiplier, returned, multiplier):
    # Denominator is investment * multiplier, matching the Guilt equation above
    return ((((investment * believed_multiplier)/2 - returned) / (investment * multiplier))**2)

Identify Relevant Free Parameters for each Construct Value
  1. Payout Maximization - D1

  2. Equity Achieved - D1 & D2

  3. Expectation Meeting - D1 & D2

Note

Why do we use (1-D1 ) and (1-D2 )?

Each dimension we have created is mathematically arbitrary: the fact that greed is endorsed at high values of D1 is a consequence of our choice. It could just as reasonably be that greed is endorsed at low values of D1.

The dimensions we created dichotomize one preference against another: thus, we can just as reasonably take the inverse.

Propose a Utility Equation

Utility = Payout_Maximization × Θ - (1 - Θ ) × min( Guilt + Φ , Inequity - Φ )
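
Putting the pieces together, the Utility equation can be sketched in Python as below. This is a sketch under stated assumptions, not the paper's code: Guilt uses Investment × Multiplier as its denominator, following the Guilt equation above, the default Believed Multiplier of 4 follows the text, and the example trial (Investment 10, Multiplier 4, Endowment 10, Θ = 0.2, Φ = 0) is arbitrary.

```python
def payout_maximization(investment, multiplier, returned):
    received = investment * multiplier
    return (received - returned) / received

def inequity(investment, multiplier, returned, endowment):
    keep = investment * multiplier - returned
    total = investment * multiplier + endowment - investment
    return (keep / total - 0.5) ** 2

def guilt(investment, believed_multiplier, returned, multiplier):
    expected_return = (investment * believed_multiplier) / 2
    return ((expected_return - returned) / (investment * multiplier)) ** 2

def utility(investment, multiplier, returned, endowment, theta, phi,
            believed_multiplier=4):
    payout = payout_maximization(investment, multiplier, returned)
    # Phi shifts which of the two norm violations ends up being the smaller one
    norm_violation = min(
        guilt(investment, believed_multiplier, returned, multiplier) + phi,
        inequity(investment, multiplier, returned, endowment) - phi,
    )
    return payout * theta - (1 - theta) * norm_violation

# Utility of each possible return in one arbitrary example trial
for returned in range(0, 41, 10):
    print(returned, round(utility(10, 4, returned, 10, theta=0.2, phi=0.0), 3))
```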

Tutorial 2 - Galvan & Sanfey, 2024

Create Construct Value Functions

Note

As you may have noticed, our conception of redistribution preferences treats redistribution itself (i.e. the selected tax rate) as a means by which people achieve certain outcomes - maximizing payout, producing equality, or producing equity. Thus, we need to write a function called new_value which computes each person’s outcome under a given tax rate, given their Initial Allocation.

First, we take this player’s Initial Allocation and subtract from it the amount that they lose to taxation under a given Tax Rate. Then, we add what they receive as a redistributed amount under that Tax Rate - this is their Outcome. What they receive as a redistributed amount under a given Tax Rate is the Tax Rate times the Number of Tokens in Game, divided by the Number of Players in Game. The Number of Tokens in Game is a constant (100), as is the Number of Players in Game (10). And, since we are only dealing in whole tokens, we need to round the Outcome to the nearest integer.

R:
new_value = function(initial_allocation, tax_rate_decimal, number_tokens_game = 100, number_players_game = 10){
    return(round(initial_allocation - (tax_rate_decimal * initial_allocation) + ((number_tokens_game * tax_rate_decimal)/(number_players_game))))
}

MATLAB:
function new_value = calculate_new_value(initial_allocation, tax_rate_decimal, number_tokens_game, number_players_game)
    if nargin < 3
        number_tokens_game = 100;
    end
    if nargin < 4
        number_players_game = 10;
    end

    new_value = round(initial_allocation - (tax_rate_decimal * initial_allocation) + ((number_tokens_game * tax_rate_decimal) / number_players_game));
end

Python:
def calculate_new_value(initial_allocation, tax_rate_decimal, number_tokens_game=100, number_players_game=10):
    return round(initial_allocation - (tax_rate_decimal * initial_allocation) + ((number_tokens_game * tax_rate_decimal) / number_players_game))

Payout-Maximization

The extent to which one has engaged in Payout-Maximization can be expressed as the proportion of how much they decided to keep for themselves out of how much they could have kept for themselves. We’ll take, as an argument, the potential outcomes for oneself for all possible Tax Rates.

payout_maximization = function(chosen_outcome_self, possible_outcomes_self){
    return(chosen_outcome_self/max(possible_outcomes_self))
}
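
A Python counterpart, shown together with the outcome function so that the example runs on its own, might look like this. The set of available tax rates and the Initial Allocation of 20 are hypothetical illustration values.

```python
def calculate_new_value(initial_allocation, tax_rate_decimal,
                        number_tokens_game=100, number_players_game=10):
    # Outcome = allocation - tax paid + equal share of the redistributed tokens
    return round(initial_allocation
                 - tax_rate_decimal * initial_allocation
                 + (number_tokens_game * tax_rate_decimal) / number_players_game)

def payout_maximization(chosen_outcome_self, possible_outcomes_self):
    # Chosen outcome as a proportion of the best outcome available to oneself
    return chosen_outcome_self / max(possible_outcomes_self)

tax_rates = [0.0, 0.5, 1.0]   # hypothetical set of available tax rates
possible = [calculate_new_value(20, rate) for rate in tax_rates]
chosen = calculate_new_value(20, 0.5)
print(possible, payout_maximization(chosen, possible))
```

Note that Python's built-in `round` sends exact halves to the nearest even integer, so results at .5 boundaries may differ from other languages.
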
Equality-Seeking

The extent to which people have engaged in Equality-seeking is the extent to which they have redistributed with a tax rate of 100%. However, focusing on Outcomes for all players, it is characterized as the extent to which one has minimized inequality for all players. Thus, Perfect Equality Outcomes would be where all players had the same amount - i.e. the Number of Tokens in Game (100) divided by the Number of Players in Game (10). Consequently, produced inequality is formulated as the sum of squared errors between Chosen Outcomes and Perfect Equality Outcomes divided by the sum of squared errors between Perfect Inequality Outcomes (i.e. Initial Allocations) and Perfect Equality Outcomes. Then, we conceive of produced equality as the inverse of produced inequality: if produced inequality is 1 (i.e. the highest possible value) then produced equality is 0, while if produced inequality is 0 (i.e. the lowest possible value) then produced equality is 1 (the highest possible value).

equality = function(chosen_outcomes_all, intial_allocations_all, perfect_equality = 100/10){
    return((1 - sum((chosen_outcomes_all - perfect_equality)**2)/sum((intial_allocations_all - perfect_equality)**2)))
}
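
The same computation could be sketched in Python as follows. The two-player, 20-token example is hypothetical and chosen only so that the equal split is 10 tokens, matching the default.

```python
def equality(chosen_outcomes_all, initial_allocations_all, perfect_equality=100 / 10):
    # Produced inequality: squared distance of the chosen outcomes from an equal
    # split, relative to the squared distance of the initial allocations from it
    produced = sum((c - perfect_equality) ** 2 for c in chosen_outcomes_all)
    maximum = sum((i - perfect_equality) ** 2 for i in initial_allocations_all)
    return 1 - produced / maximum

initial = [18, 2]   # hypothetical Initial Allocations
chosen = [16, 4]    # outcomes under a 25% tax rate in this toy game
print(equality(chosen, initial))
```
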
Equity-Seeking

The extent to which people have engaged in Equity-seeking is the extent to which they have not redistributed. However, focusing on Outcomes for all players, it is characterized as the extent to which one has minimized inequity for all players. Thus, Perfect Inequity Outcomes, which are equivalent to Perfect Equality Outcomes, would be where all players had the same amount - i.e. the Number of Tokens in Game (100) divided by the Number of Players in Game (10). Consequently, produced inequity is formulated as the sum of squared errors between Chosen Outcomes and Perfect Equity (i.e. Initial Allocations) divided by the sum of squared errors between Perfect Equity and Perfect Inequity. Then, we conceive of produced equity as the inverse of produced inequity: if produced inequity is 1 (i.e. the highest possible value) then produced equity is 0, while if produced inequity is 0 (i.e. the lowest possible value) then produced equity is 1 (the highest possible value).

equity = function(chosen_outcomes_all, intial_allocations_all, perfect_inequity = 100/10){
    # The denominator uses the perfect_inequity argument (the equal split)
    return((1 - sum((chosen_outcomes_all - intial_allocations_all)**2)/sum((perfect_inequity - intial_allocations_all)**2)))
}
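
A Python sketch of the same idea is given here, with the equal split passed in as `perfect_inequity`. The two-player, 20-token example is hypothetical.

```python
def equity(chosen_outcomes_all, initial_allocations_all, perfect_inequity=100 / 10):
    # Produced inequity: squared distance of the chosen outcomes from the initial
    # allocations, relative to the squared distance of an equal split from them
    produced = sum((c - i) ** 2
                   for c, i in zip(chosen_outcomes_all, initial_allocations_all))
    maximum = sum((perfect_inequity - i) ** 2 for i in initial_allocations_all)
    return 1 - produced / maximum

initial = [18, 2]   # hypothetical Initial Allocations
chosen = [16, 4]    # outcomes under a 25% tax rate in this toy game
print(equity(chosen, initial))
```
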
Identify Relevant Free Parameters for each Construct Value
  1. Payout-Maximization - D1

  2. Equity-Seeking - D1 & D2

  3. Equality-Seeking - D1 & D2

Propose a Utility Equation

Utility = Payout_Maximization × Θ + (1 - Θ ) × ( Φ × Equality + (1 - Φ ) × Equity )
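
As a rough sketch, this utility can be computed in the two-parameter form given earlier, Payout Adh × Θ + (1 - Θ ) × ( Φ × SN1 Adh + (1 - Φ ) × SN2 Adh ), with Equality standing in for SN1 Adh and Equity for SN2 Adh. That mapping is an assumption, and the input values are arbitrary.

```python
def utility(payout_maximization, equality, equity, theta, phi):
    # theta weighs payout against the norms; phi weighs equality against equity
    norms = phi * equality + (1 - phi) * equity
    return payout_maximization * theta + (1 - theta) * norms

# Arbitrary Construct Values for one candidate tax rate
print(utility(payout_maximization=0.5, equality=0.4375, equity=0.9375,
              theta=0.2, phi=0.5))
```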

Tutorial 3 - Crockett et al., 2014

Create Construct Value Functions
Harm Aversion

Here, the Harm caused by a certain choice is the relative change in the number of shocks experienced.

harm = function(shocksThisChoice, shocksOtherChoice){
    return(shocksThisChoice - shocksOtherChoice)
}
Payout-Maximization

Here, the extent to which one has maximized Payout is the relative change in the amount of money one has received.

payout = function(moneyThisChoice, moneyOtherChoice){
    return(moneyThisChoice - moneyOtherChoice)
}
Identify Relevant Free Parameters for each Construct Value
  1. Harm - D1 & D2

  2. Payout - D1 & D2

Propose a Utility Equation

Utility Difference = ( κ × Payout) - ((1 - κ ) × Harm)

Tutorial 4 - Li et al., 2022

Create Construct Value Functions
Inequality Aversion

Here, Inequality is the absolute value of the difference between Player A’s and Player B’s payouts for each Choice.

inequality = function(a1, b1, a2, b2){
    d_Inequality = abs(a2 - b2) - abs(a1 - b1)
    return(d_Inequality)
}
Harm Aversion

Harm is defined as the amount taken away from the more advantaged Player for each Choice.

harm = function(a0, b0, a1, b1, a2, b2){
    initial = c(a0, b0)
    choice1 = c(a1, b1)
    choice2 = c(a2, b2)
    if (a0 == b0){advantaged = 1} else {advantaged = which(initial == max(initial))}
    d_lossAdvantaged = (initial[advantaged] - choice1[advantaged]) - (initial[advantaged] - choice2[advantaged])
    return(d_lossAdvantaged)
}
Rank Reversal Aversion

Finally, Rank Reversal is defined as whether or not the more initially advantaged Player is less advantaged in the current Choice.

rankReverse = function(a0, b0, a1, b1, a2, b2){
    if (a0 == b0){return(0)}
    d_initial = a0 - b0
    d_choice1 = a1 - b1
    d_choice2 = a2 - b2
    if (d_initial > 0){
        if (d_choice1 < 0){choice1Reversed = 1} else {choice1Reversed = 0}
        if (d_choice2 < 0){choice2Reversed = 1} else {choice2Reversed = 0}
    } else if (d_initial < 0){
        if (d_choice1 > 0){choice1Reversed = 1} else {choice1Reversed = 0}
        if (d_choice2 > 0){choice2Reversed = 1} else {choice2Reversed = 0}
    }
    return(choice1Reversed - choice2Reversed)
}
Identify Relevant Free Parameters for each Construct Value
  1. Inequality Aversion - D1

  2. Harm Aversion - D2

  3. Rank Reversal Aversion - D3

Propose a Utility Equation

Utility Difference = ( α × Inequality) - ( δ × Harm) - ( ρ × Rank Reversal)
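
For readers working in Python, the three Construct Value functions and the Utility Difference could be ported roughly as below. This is a sketch that mirrors the R code above; the snake_case names, the payouts in the example and the parameter values are hypothetical.

```python
def inequality(a1, b1, a2, b2):
    # Change in absolute payout difference between the two choices
    return abs(a2 - b2) - abs(a1 - b1)

def harm(a0, b0, a1, b1, a2, b2):
    initial, choice1, choice2 = (a0, b0), (a1, b1), (a2, b2)
    # Index of the initially advantaged player (player A when tied)
    advantaged = 0 if a0 >= b0 else 1
    loss_choice1 = initial[advantaged] - choice1[advantaged]
    loss_choice2 = initial[advantaged] - choice2[advantaged]
    return loss_choice1 - loss_choice2

def rank_reverse(a0, b0, a1, b1, a2, b2):
    if a0 == b0:
        return 0
    sign = 1 if a0 > b0 else -1
    # A choice reverses rank when the initially advantaged player ends up behind
    choice1_reversed = int(sign * (a1 - b1) < 0)
    choice2_reversed = int(sign * (a2 - b2) < 0)
    return choice1_reversed - choice2_reversed

def utility_difference(a0, b0, a1, b1, a2, b2, alpha, delta, rho):
    return (alpha * inequality(a1, b1, a2, b2)
            - delta * harm(a0, b0, a1, b1, a2, b2)
            - rho * rank_reverse(a0, b0, a1, b1, a2, b2))

# Arbitrary trial: A starts with 10 and B with 2
print(utility_difference(10, 2, 8, 4, 5, 7, alpha=1.0, delta=0.5, rho=2.0))
```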