Test Modulatory Hypotheses
Lesson
Goal During this Stage
To see how Conditions affect Free Parameters.
Note
In experiments where you use multiple Conditions, you will end up recovering Free Parameters for seperate Conditions. Thus, you may wish to use these Free Parameters as a proxy for how preferences change as a function of your Condition - these would be Modulatory Hypotheses (i.e. how Conditions modulate the data generation process).
Why Fitting Per Condition is Less Preferrable
Although there are definitely use cases for fitting a model over multiple different Conditions (what we will learn below), when possible it is typically better to integrate multiple Conditions within the same model. Namely, when it is believed that Conditions will modulate individuals’ Preferences and there are more than two Conditions, I believe that models which internally hypothesize how these Preferences change across Conditions are superior. In general, there are three reasons that it is not desirable to even consider models fit separately per Condition when there are more than two Conditions.
Inability to Correctly Identify the Correct Model
Simulations from a hypothetical Dictator Game experiment are presented here which test the ability of model comparison to correctly identify the model that generated the data. A model split over Conditions often outperforms the model that actually generated the data even though it is more parsimonious under several different circumstances. This means, depending on (a) the assumptions of your Parameter estimation process, (b) the features of your data, and (c) the power in your sample, including a per condition model in your model set may obfuscate the real model.
Lack of Predictive Value
We run experiments to gain insights which, we hope, generalize beyond the laboratory. One key advantage of computational models is that they enable us to generate novel predictions based on known Preferences to new situations within the laboratory - moving us closer to this goal of external validation. However, for models fit seperately per Condition, if we demonstrate that this model is superior via model testing what we have also demonstrated is that Free Parameters do not generalize to new contexts. Thus, the model has no predictive value in new Conditions which further undermines its’ generalizability.
Theoretical Deficits
Similar to its lack of predictive value, a model fit separately between Conditions misses an opportunity to hone theoretical knowledge. A key reason that we assess how Preferences change between Conditions is typically because we hypothesize that Conditions are distinct in articulable ways. Testing these hypothesized differences is a key advantage of computational modeling, but a model fit separately between Conditions does not enable this hypothesis to be tested.
How to Achieve this Goal
Preliminary Validation
Before you jump to testing, there are also a few more new things that we have to check. Let’s take a look at each.
Robust and Reliable Free Parameters
As we said before, we have to prove that our recovery of Free Parameters is robust. This is a step above what we previously did - essentially, fivefold validation allowed us to rule out the idea that our Free Parameters were overfitted meaning that our model wasn’t performing so well because it was just capturing little quirks in the data. We need to take this a step further to use these directly in statistical analyses: we need to show that treating recovered Free Parameters as a continuous scale measure is appropriate. If the following are false, you should be okay to proceed:
Your utility equation applies a nonlinear transformation to your Free Parameters - this means that an increase in one unit of your Free Parameter scale is not equal for all values of the Free Parameter so this analysis is probably inappropriate
The recovery of your Free Parameter that you want to test is independent of the other Free Parameters in your model - if your Free Parameter values only interact with other Free Parameters you will have to apply a transformation to account for this dependency (see tutorial 2 for an example of this)
Remember that assumption of homoscedasticity that we said wasn’t super important? Well, now it is. If your data is heteroscedastic, recovery of Free Parameters could be differentially overfit or underfit at certain values of the Independent Variable which makes these Free Parameters unreliable. You will have to re-estimate your Free Parameters using an alternative estimator (i.e. Robust Maximum Likelihood Estimation or Weighted Least Squares)
Meaningful Condition Differences
So you’ve now shown that your Free Parameters are robust and reliable - what’s left to do other than test? Something really important actually: you have to prove that you are even justified in recovering different Free Parameters in each Condition. Even if you show that Free Parameters are meaningfully different across Conditions, the test results are not valid if you have not proven that the Decisions that Subjects make differ between Conditions.
So, we’re going to go back and create a model which does not differentiate between Conditions - training all of the data at once. Since our demo did not have a design with multiple Conditions, we’ll create a complete example here.
obj_function_aao = function(params, decisions, method = "OLS") {
Parameter1 = params[1]
Parameter2 = params[2]
trialList = #must redefine and also must be of the same length as decisions
predicted_utility = vector('numeric', length(trialList[,1]))
observed_utility = vector('numeric', length(trialList[,1]))
for (k in 1:length(trialList[,1])){
IV = trialList[k, 1]
Constant = trialList[k, 2]
Choices = #something
Utility = vector('numeric', length(Choices))
for (n in 1:length(Choices)){
Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n]))
}
predicted_utility[k] = max(Utility)
observed_utility[k] = Utility[chosen[k]]
}
if (method == "OLS"){
return(sum((predicted_utility - observed_utility)**2))
} else if (method == "MLE"){
return(-1 * sum(dnorm(observed_utility, mean = predicted_utility, sd = sd, log = TRUE)))
}
}
for (i in 1:length(included_subjects)){
datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
df = read.csv2(datafile) #this will have variables called IV, Decisions, Condition, and information about the original order of trials (i.e. trialsTask.thisIndex) - it will also have information about the number of blocks
df$Prediction = vector('numeric', length(df$IV))
Par1_PerCondition = vector('numeric', length(levels(df$Condition)))
Par2_PerCondition = vector('numeric', length(levels(df$Condition)))
SS_PerCondition = vector('numeric', length(levels(df$Condition)))
Deviance_PerCondition = vector('numeric', length(levels(df$Condition))) #to calculate NLL later
for (c in 1:length(levels(df$Condition))){
result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
df = df)
#Just Added
closestPoint = which(as.numeric(freeParameters[,1]) == as.numeric(round(result$par[1])) & as.numeric(freeParameters[,2]) == as.numeric(round(result$par[2])))
Prediction = vector('numeric')
for (k in 1:length(df$Decisions)){
Utility = vector('numeric', length(Choices))
for (n in 1:length(Choices)){
Utility[n] = utility(parameter1 = results$par[1],
parameter2 = results$par[2],
construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]),
construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])),
construct3 = construct3(df$IV[k], df$Constant[k], Choices[n])
}
correct_choice = which(Utility == max(Utility))
if (length(correct_choice) > 1){
correct_choice = correct_choice[sample(correct_choice, 1)]
}
Prediction[k] = Choices[correct_choice]
}
Deviance_PerCondition[c] = dnorm(df$Decision, mean = Prediction)
SS_PerCondition[c] = sum((df$Decision - Prediction)**2)
df$Prediction[which(df$Condition == levels(df$Condition)[c])] = Prediction
}
NLL_PerCondition = -2 * log(sum(Deviance_PerCondition))
result = fmincon(obj_function_aao,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
decisions = df$Decisions)
df$PredictionAAO = vector('numeric')
for (k in 1:length(df$Decisions)){
Utility = vector('numeric', length(Choices))
for (n in 1:length(Choices)){
Utility[n] = utility(parameter1 = results$par[1],
parameter2 = results$par[2],
construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]),
construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])),
construct3 = construct3(df$IV[k], df$Constant[k], Choices[n])
}
correct_choice = which(Utility == max(Utility))
if (length(correct_choice) > 1){
correct_choice = correct_choice[sample(correct_choice, 1)]
}
df$PredictionAAO[k] = Choices[correct_choice]
}
NLL_AAO = -2 * log(sum(dnorm(df$Decision, mean = df$Prediction)))
SS_AAO = sum((df$Decision - df$Prediction)**2)
subjectData[i, ] = c(included_subjects[i], sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO,
Par1_PerCondition, Par2_PerCondition, result$par[1], result$par[2])
start = length(subjectData[, 1]) + 1
end = start + length(df$Decisions)
trialData[start:end, 1] = included_subjects[i]
trialData[start:end, 2] = df$IV
trialData[start:end, 3] = df$Constant
trialData[start:end, 4] = df$Decision
trialData[start:end, 5] = df$Condition
trialData[start:end, 6] = df$Prediction
trialData[start:end, 7] = df$PredictionAAO
}
colnames(subjectData) = c('SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce',
'Parameter1_Condition1', ..., 'Parameter2_Condition1', ..., 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce')
#levels(df$Condition) will always be in the same order for all subjects so conditions will be saved in the same columns
colnames(trailData) = c('SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce')
subjectData$AIC_PerCondition = length(df$IV) * log(subjectData$SS_PerCondition/length(df$IV)) + 2 * 2 * (length(levels(df$Condition)))
subjectData$AIC_AllAtOnce = length(df$IV) * log(subjectData$SS_AllAtOnce/length(df$IV)) + 2 * 2 * (length(levels(df$Condition)))
t.test(subjectData$AIC_PerCondition, subjectData$AIC_AllAtOnce, paired = T)
function obj_function_aao = obj_function(params, decisions, method)
Parameter1 = params(1);
Parameter2 = params(2);
trialList = % must redefine and also must be of the same length as decisions
predicted_utility = zeros(1, length(trialList(:, 1)));
observed_utility = zeros(1, length(trialList(:, 1)));
for k = 1:length(trialList(:, 1))
IV = trialList(k, 1);
Constant = trialList(k, 2);
Choices = % something
Utility = zeros(1, length(Choices));
for n = 1:length(Choices)
Utility(n) = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices(n)), construct2(IV, Constant, Choices(n)), construct3(IV, Constant, Choices(n)));
end
predicted_utility(k) = max(Utility);
observed_utility(k) = Utility(chosen(k));
end
if strcmp(method, 'OLS')
obj_function_aao = sum((predicted_utility - observed_utility).^2);
elseif strcmp(method, 'MLE')
obj_function_aao = -1 * sum(log(normpdf(observed_utility, predicted_utility, sd)));
end
end
for i = 1:length(included_subjects)
datafile = strcat(parentfolder, included_subjects(i), restoffilepath);
df = readtable(datafile);
df.Prediction = zeros(1, length(df.IV));
Par1_PerCondition = zeros(1, length(unique(df.Condition)));
Par2_PerCondition = zeros(1, length(unique(df.Condition)));
SS_PerCondition = zeros(1, length(unique(df.Condition)));
Deviance_PerCondition = zeros(1, length(unique(df.Condition)));
for c = 1:length(unique(df.Condition))
result = fmincon(@obj_function, initial_params, [], [], [], [], lower_bounds, upper_bounds, df);
closestPoint = find(freeParameters(:, 1) == round(result(1)) & freeParameters(:, 2) == round(result(2)));
Prediction = zeros(1, length(df.Decisions));
for k = 1:length(df.Decisions)
Utility = zeros(1, length(Choices));
for n = 1:length(Choices)
Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n)));
end
correct_choice = find(Utility == max(Utility));
if length(correct_choice) > 1
correct_choice = correct_choice(randi(length(correct_choice)));
end
Prediction(k) = Choices(correct_choice);
end
Deviance_PerCondition(c) = normpdf(df.Decision, Prediction);
SS_PerCondition(c) = sum((df.Decision - Prediction).^2);
df.Prediction(df.Condition == unique(df.Condition)(c)) = Prediction;
end
NLL_PerCondition = -2 * sum(log(Deviance_PerCondition));
result = fmincon(@obj_function_aao, initial_params, [], [], [], [], lower_bounds, upper_bounds, df.Decisions);
df.PredictionAAO = zeros(1, length(df.Decisions));
for k = 1:length(df.Decisions)
Utility = zeros(1, length(Choices));
for n = 1:length(Choices)
Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n)));
end
correct_choice = find(Utility == max(Utility));
if length(correct_choice) > 1
correct_choice = correct_choice(randi(length(correct_choice)));
end
df.PredictionAAO(k) = Choices(correct_choice);
end
NLL_AAO = -2 * sum(log(normpdf(df.Decision, df.Prediction)));
SS_AAO = sum((df.Decision - df.Prediction).^2);
subjectData(i, :) = [included_subjects(i), sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result(1), result(2)];
start = size(subjectData, 1) + 1;
endIdx = start + length(df.Decisions) - 1;
trialData(start:endIdx, 1) = included_subjects(i);
trialData(start:endIdx, 2) = df.IV;
trialData(start:endIdx, 3) = df.Constant;
trialData(start:endIdx, 4) = df.Decision;
trialData(start:endIdx, 5) = df.Condition;
trialData(start:endIdx, 6) = df.Prediction;
trialData(start:endIdx, 7) = df.PredictionAAO;
end
subjectData.Properties.VariableNames = {'SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce'};
trailData.Properties.VariableNames = {'SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce'};
subjectData.AIC_PerCondition = length(df.IV) * log(subjectData.modelSS_PerCondition/length(df.IV)) + 2 * 2 * length(unique(df.Condition));
subjectData.AIC_AllAtOnce = length(df.IV) * log(subjectData.modelSS_AllAtOnce/length(df.IV)) + 2 * 2 * length(unique(df.Condition));
ttest(subjectData.AIC_PerCondition, subjectData.AIC_AllAtOnce, 'Paired', true);
def obj_function(params, decisions, method):
Parameter1 = params[0]
Parameter2 = params[1]
trialList = # must redefine and also must be of the same length as decisions
predicted_utility = np.zeros(len(trialList[:, 0]))
observed_utility = np.zeros(len(trialList[:, 0]))
for k in range(len(trialList[:, 0])):
IV = trialList[k, 0]
Constant = trialList[k, 1]
Choices = # something
Utility = np.zeros(len(Choices))
for n in range(len(Choices)):
Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n]))
predicted_utility[k] = max(Utility)
observed_utility[k] = Utility[chosen[k]]
if method == 'OLS':
return np.sum((predicted_utility - observed_utility)**2)
elif method == 'MLE':
return -1 * np.sum(np.log(norm.pdf(observed_utility, loc=predicted_utility, scale=sd)))
for i in range(len(included_subjects)):
datafile = parentfolder + included_subjects[i] + restoffilepath
df = pd.read_csv(datafile)
df['Prediction'] = np.zeros(len(df['IV']))
Par1_PerCondition = np.zeros(len(df['Condition'].unique()))
Par2_PerCondition = np.zeros(len(df['Condition'].unique()))
SS_PerCondition = np.zeros(len(df['Condition'].unique()))
Deviance_PerCondition = np.zeros(len(df['Condition'].unique()))
for c in range(len(df['Condition'].unique())):
result = minimize(obj_function, initial_params, args=(df), bounds=list(zip(lower_bounds, upper_bounds)))
closestPoint = np.where((freeParameters[:, 0] == round(result.x[0])) & (freeParameters[:, 1] == round(result.x[1])))
Prediction = np.zeros(len(df['Decisions']))
for k in range(len(df['Decisions'])):
Utility = np.zeros(len(Choices))
for n in range(len(Choices)):
Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n]))
correct_choice = np.where(Utility == max(Utility))
if len(correct_choice) > 1:
correct_choice = correct_choice[np.random.choice(len(correct_choice))]
Prediction[k] = Choices[correct_choice[0]]
Deviance_PerCondition[c] = norm.pdf(df['Decision'], Prediction)
SS_PerCondition[c] = np.sum((df['Decision'] - Prediction)**2)
df['Prediction'][df['Condition'] == df['Condition'].unique()[c]] = Prediction
NLL_PerCondition = -2 * np.sum(np.log(Deviance_PerCondition))
result = minimize(obj_function_aao, initial_params, args=(df['Decisions'],), bounds=list(zip(lower_bounds, upper_bounds)))
df['PredictionAAO'] = np.zeros(len(df['Decisions']))
for k in range(len(df['Decisions'])):
Utility = np.zeros(len(Choices))
for n in range(len(Choices)):
Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n]))
correct_choice = np.where(Utility == max(Utility))
if len(correct_choice) > 1:
correct_choice = correct_choice[np.random.choice(len(correct_choice))]
df['PredictionAAO'][k] = Choices[correct_choice[0]]
NLL_AAO = -2 * np.sum(np.log(norm.pdf(df['Decision'], df['Prediction'])))
SS_AAO = np.sum((df['Decision'] - df['Prediction'])**2)
subjectData[i, :] = [included_subjects[i], np.sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result.x[0], result.x[1]]
start = subjectData.shape[0] + 1
endIdx = start + len(df['Decisions']) - 1
trialData[start:endIdx, 0] = included_subjects[i]
trialData[start:endIdx, 1] = df['IV']
trialData[start:endIdx, 2] = df['Constant']
trialData[start:endIdx, 3] = df['Decision']
trialData[start:endIdx, 4] = df['Condition']
trialData[start:endIdx, 5] = df['Prediction']
trialData[start:endIdx, 6] = df['PredictionAAO']
subjectData.columns = ['SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce']
subjectData['AIC_PerCondition'] = len(df['IV']) * np.log(subjectData['modelSS_PerCondition'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique())
subjectData['AIC_AllAtOnce'] = len(df['IV']) * np.log(subjectData['modelSS_AllAtOnce'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique())
ttest_rel(subjectData['AIC_PerCondition'], subjectData['AIC_AllAtOnce'])
Testing a Modulatory Hypothesis
Now, if we’ve shown that people do indeed make different Decisions in each Condition and are convinced that our Free Parameters can be trusted as a valid continuous measure of Subjects’ preferences, we can now test our modulatory hypotheses. To reiterate, these are hypotheses about directional, group-level differences: you’ve already shown that preferences change and now you want to show how they specifically change as a function of Condition. Occasionally, if you have more than two Conditions, you might also want to first do an omnibus test for Condition effects and then you might also do post-hoc tests after. We’ll show you how to do both of these.
t.test(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, paired = T) #change in parameter 1 between condition 1 and 2
#now we need to put this data in long format to do a linear mixed effects model
omnibusData = data.frame(c(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, subjectData$Parameter1_Condition3),
rep(c('Condition 1', 'Condition 2', 'Condition 3'), each = length(subjectData$SubjectID)),
rep(subjectData$SubjectID, times = 3))
colnames(omnibusData) = c('Parameter1', 'Condition', 'SubjectID')
ombnibusModulatoryEffect = lmer(data = ombnibusData, Parameter1 ~ Condition + (1 | SubjectID)) #our omnnibus test
summary(ombnibusModulatoryEffect) #if omnibus test is signficiant, proceed to post hoc tests below
library(emmeans)
Parameter1_PostHocModulationEffect = emmeans(ombnibusModulatoryEffect, "Condition")
summary(pairs(Parameter1_PostHocModulationEffect)) #post hoc pairwise test across condition
% t-test
[h, p, ci, stats] = ttest(subjectData.Parameter1_Condition1, subjectData.Parameter1_Condition2, 'paired', true);
% Put data in long format
omnibusData = table([subjectData.Parameter1_Condition1; subjectData.Parameter1_Condition2; subjectData.Parameter1_Condition3], ...
repelem({'Condition 1', 'Condition 2', 'Condition 3'}, length(subjectData.SubjectID)), ...
repmat(subjectData.SubjectID, 1, 3), ...
'VariableNames', {'Parameter1', 'Condition', 'SubjectID'});
% Fit linear mixed effects model
omnibusModulatoryEffect = fitlme(omnibusData, 'Parameter1 ~ Condition + (1|SubjectID)');
% Display summary
disp(omnibusModulatoryEffect);
% Perform post hoc tests
Parameter1_PostHocModulationEffect = emmeans(omnibusModulatoryEffect, 'Condition');
disp(pairs(Parameter1_PostHocModulationEffect));
from statsmodels.formula.api import mixedlm
from pingouin import pairwise_tukey
# t-test
t_stat, p_val = sm.stats.ttest_rel(subjectData['Parameter1_Condition1'], subjectData['Parameter1_Condition2'])
# Put data in long format
data_dict = {
'Parameter1': subjectData['Parameter1_Condition1'].append([subjectData['Parameter1_Condition2'], subjectData['Parameter1_Condition3']]),
'Condition': ['Condition 1', 'Condition 2', 'Condition 3'] * len(subjectData['SubjectID']),
'SubjectID': list(subjectData['SubjectID']) * 3
}
omnibusData = pd.DataFrame(data_dict)
# Fit linear mixed effects model
omnibusModulatoryEffect = mixedlm('Parameter1 ~ Condition', omnibusData, groups=omnibusData['SubjectID']).fit()
# Display summary
print(omnibusModulatoryEffect.summary())
# Perform post hoc tests
posthoc_results = pairwise_tukey(data=omnibusData, dv='Parameter1', between='Condition')
print(posthoc_results)
Using Categorical Clusters to Test Modulatory Hypotheses
Sometimes, treating Free Parameters as an outcome measure doesn’t tell us everything that we want to know about the strategies that people use to make Decisions. This is one time where using a priori clustering can come in handy: rather than saying Free Parameter 1 changed a certain amount between Condition 1 and Condition 2 and that Free Parameter 2 changed a certain amount, we summarize behavioral patterns in a way that accounts for both at once and also tells us something about how the prevalence of each strategy changed betweeen Conditions. To do this, we want to use a Chi-Square Test.
subjectData$Strategy_Condition1 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition2 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition3 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition4 = vector('character', length(subjectData$SubjectID))
for (i in 1:length(subjectData$SubjectID)){
subjectData$Strategy_Condition1[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition1[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition1[i], 2))]
subjectData$Strategy_Condition2[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition2[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition2[i], 2))]
subjectData$Strategy_Condition3[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition3[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition3[i], 2))]
subjectData$Strategy_Condition4[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition4[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition4[i], 2))]
}
conditions = c('condition1', 'condition2', 'condition3', 'condition4')
strategies = c('strategy1', 'strategy2', 'strategy3')
group_by_condition = data.frame()
columns = #columns of subjectData where Condition1-4 are kept
for (i in 1:4){
for (j in 1:3){
group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j])
}
}
colnames(group_by_condition) = strategies
rownames(group_by_condition) = conditions
chisq.test(group_by_condition)
subjectData.Strategy_Condition1 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition2 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition3 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition4 = cell(1, length(subjectData.SubjectID));
for i = 1:length(subjectData.SubjectID)
subjectData.Strategy_Condition1{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition1(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition1(i), 2));
subjectData.Strategy_Condition2{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition2(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition2(i), 2));
subjectData.Strategy_Condition3{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition3(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition3(i), 2));
subjectData.Strategy_Condition4{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition4(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition4(i), 2));
end
conditions = {'condition1', 'condition2', 'condition3', 'condition4'};
strategies = {'strategy1', 'strategy2', 'strategy3'};
group_by_condition = zeros(4, 3);
columns = % columns of subjectData where Condition1-4 are kept
for i = 1:4
for j = 1:3
group_by_condition(i, j) = sum(strcmp(subjectData.(columns{i}), strategies{j}));
end
end
group_by_condition = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions);
chisqtest(group_by_condition)
from scipy.stats import chi2_contingency
subjectData['Strategy_Condition1'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition2'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition3'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition4'] = [None] * len(subjectData['SubjectID'])
for i in range(len(subjectData['SubjectID'])):
subjectData.at[i, 'Strategy_Condition1'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition1'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition1'][i], 2))]
subjectData.at[i, 'Strategy_Condition2'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition2'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition2'][i], 2))]
subjectData.at[i, 'Strategy_Condition3'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition3'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition3'][i], 2))]
subjectData.at[i, 'Strategy_Condition4'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition4'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition4'][i], 2))]
conditions = ['condition1', 'condition2', 'condition3', 'condition4']
strategies = ['strategy1', 'strategy2', 'strategy3']
group_by_condition = pd.DataFrame(0, index=conditions, columns=strategies)
columns = [] # columns of subjectData where Condition1-4 are kept
for i in range(4):
for j in range(3):
group_by_condition.iloc[i, j] = np.sum(subjectData[columns[i]] == strategies[j])
chi2, p, _, _ = chi2_contingency(group_by_condition)
Tutorials
Tutorial 1 - van Baar, Chang, & Sanfey, 2019
Not Applicable to this Data.
Tutorial 2 - Galvan & Sanfey, 2024
Preliminary Validation
Theta is a candidate for such an analysis - it is estimated uniquely with respect to Phi and we did not see any significant violations of homodscedasticity in our data.
Phi, on the other hand, is estimated depednent on Theta - remember that both appearances of Phi in our utility equation are where it interacts with Theta. Thus, we have to account for the fact that Phi is not meaningful at high values of Theta in order to use it in this way.
### Robust and Reliable Free Parameters
subjectData$phiMeritAdjusted = (subjectData$phiMerit - 0.5) #subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData$phiMeritAdjusted = (subjectData$phiMeritAdjusted * (1 - subjectData$thetaMerit)) #the greater theta is, the closer to 0 this adjusted value is
subjectData$phiMeritAdjusted = subjectData$phiMeritAdjusted + 0.5 #returning to initial scale value (i.e. 0 to 1 with 0.5 beingthe indifference point)
subjectData$phiEntitlementAdjusted = ((subjectData$phiEntitlement - 0.5) * (1 - subjectData$thetaEntitlement)) + 0.5
subjectData$phiCorruptionAdjusted = ((subjectData$phiCorruption - 0.5) * (1 - subjectData$thetaCorruption)) + 0.5
subjectData$phiLuckAdjusted = ((subjectData$phiLuck - 0.5) * (1 - subjectData$thetaLuck)) + 0.5
### Meaningful Condition Differences
subjectData$SSAAO = vector('numeric')
for (i in 1:length(included_subjects)){
datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
df = read.csv2(datafile)
df = df[, c(49, 40:48, 33)]
df$redistributionRate = df$redistributionRate/100 #converting to a decimal from a percent
result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
df = df)
df$PredictionAAO = vector('numeric')
df$ObservedOutcome = new_value(df$myself, df$redistributionRate)
for (k in 1:length(df$redistributionRate)){
Utility = vector('numeric', length(Choices))
for (n in 1:length(Choices)){
Utility[n] = utility(theta = result,
phi = phiPerCondition[j],
Equity = equity(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]),
Equality = equality(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]),
Payout = payout(new_value(df[k, 1], choices[n]), df[k, 1], choices[n]))
}
correct_choice = which(Utility == max(Utility))
df$PredictionAAO[k] = new_value(df$myself[k], Choices[correct_choice[sample(length(correct_choice), 1)]])
}
subjectData$SSAAO[i] = sum((df$PredictionAAO - df$ObservedOutcome)**2)
}
N = length(df[, 1])
k = 2
subjectData$AICAAO = N * log(subjectData$SSAAO/N) + 2*2
subjectData$SSCond = subjectData$SSMerit + subjectData$SSEntitlement + subjectData$SSCorruption + subjectData$SSLuck
subjectData$AICCond = N * log(subjectData$SSCond/N) + 2*2*length(conditions)
excluded = which(subjectData$SSCond == 0)
t.test(subjectData$AICCond[-excluded], subjectData$AICAAO, paired = T, alternative = 'less')
% Robust and Reliable Free Parameters
subjectData.phiMeritAdjusted = (subjectData.phiMerit - 0.5); % subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted * (1 - subjectData.thetaMerit); % the greater theta is, the closer to 0 this adjusted value is
subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted + 0.5; % returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point)
subjectData.phiEntitlementAdjusted = ((subjectData.phiEntitlement - 0.5) * (1 - subjectData.thetaEntitlement)) + 0.5;
subjectData.phiCorruptionAdjusted = ((subjectData.phiCorruption - 0.5) * (1 - subjectData.thetaCorruption)) + 0.5;
subjectData.phiLuckAdjusted = ((subjectData.phiLuck - 0.5) * (1 - subjectData.thetaLuck)) + 0.5;
% Meaningful Condition Differences
subjectData.SSAAO = zeros(1, length(included_subjects));
for i = 1:length(included_subjects)
datafile = strcat(parentfolder, included_subjects{i}, restoffilepath); % produces a character vector 'parentfolder/included_subjects[i]**.filetype'
df = readtable(datafile);
df = df(:, [49, 40:48, 33]);
df.redistributionRate = df.redistributionRate / 100; % converting to a decimal from a percent
result = fmincon(@(params) obj_function(params, df), initial_params, [], [], [], [], lower_bounds, upper_bounds);
df.PredictionAAO = zeros(height(df), 1);
df.ObservedOutcome = new_value(df.myself, df.redistributionRate);
for k = 1:height(df.redistributionRate)
Utility = zeros(1, length(Choices));
for n = 1:length(Choices)
Utility(n) = utility(result, phiPerCondition(j), ...
equity(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ...
equality(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ...
payout(new_value(table2array(df(k, 1)), Choices(n)), table2array(df(k, 1)), Choices(n)));
}
correct_choice = find(Utility == max(Utility));
df.PredictionAAO(k) = new_value(df.myself(k), Choices(correct_choice(randi(length(correct_choice)))));
}
subjectData.SSAAO(i) = sum((df.PredictionAAO - df.ObservedOutcome).^2);
end
N = height(df);
k = 2;
subjectData.AICAAO = N * log(subjectData.SSAAO / N) + 2 * 2;
subjectData.SSCond = subjectData.SSMerit + subjectData.SSEntitlement + subjectData.SSCorruption + subjectData.SSLuck;
subjectData.AICCond = N * log(subjectData.SSCond / N) + 2 * 2 * length(conditions);
excluded = find(subjectData.SSCond == 0);
ttest(subjectData.AICCond(~excluded), subjectData.AICAAO, 'Tail', 'left', 'Alpha', alpha);
# Robust and Reliable Free Parameters
subjectData['phiMeritAdjusted'] = (subjectData['phiMerit'] - 0.5) # subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] * (1 - subjectData['thetaMerit']) # the greater theta is, the closer to 0 this adjusted value is
subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] + 0.5 # returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point)
subjectData['phiEntitlementAdjusted'] = ((subjectData['phiEntitlement'] - 0.5) * (1 - subjectData['thetaEntitlement'])) + 0.5
subjectData['phiCorruptionAdjusted'] = ((subjectData['phiCorruption'] - 0.5) * (1 - subjectData['thetaCorruption'])) + 0.5
subjectData['phiLuckAdjusted'] = ((subjectData['phiLuck'] - 0.5) * (1 - subjectData['thetaLuck'])) + 0.5
# Meaningful Condition Differences
subjectData['SSAAO'] = np.zeros(len(included_subjects))
for i in range(len(included_subjects)):
datafile = parentfolder + included_subjects[i] + restoffilepath # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
df = pd.read_csv(datafile)
df = df.iloc[:, [48, 39, 40, 41, 42, 43, 44, 45, 46, 47, 32]]
df['redistributionRate'] = df['redistributionRate'] / 100 # converting to a decimal from a percent
result = minimize(lambda params: obj_function(params, df), initial_params, bounds=list(zip(lower_bounds, upper_bounds)))
df['PredictionAAO'] = np.zeros(len(df))
df['ObservedOutcome'] = new_value(df['myself'], df['redistributionRate'])
for k in range(len(df['redistributionRate'])):
Utility = np.zeros(len(Choices))
for n in range(len(Choices)):
Utility[n] = utility(result.x, phiPerCondition[j],
equity(new_value(df.iloc[k, 0:9], Choices[n]), df.iloc[k, 0:9], Choices[n]),
equality(new_value(df.iloc[k, 0:10], Choices[n]), df.iloc[k, 0:9], Choices[n]),
payout(new_value(df.iloc[k, 0], Choices[n]), df.iloc[k, 0], Choices[n]))
correct_choice = np.argmax(Utility)
df.at[k, 'PredictionAAO'] = new_value(df['myself'][k], Choices[correct_choice[np.random.randint(len(correct_choice))]])
subjectData['SSAAO'][i] = np.sum((df['PredictionAAO'] - df['ObservedOutcome'])**2)
N = len(df)
k = 2
subjectData['AICAAO'] = N * np.log(subjectData['SSAAO'] / N) + 2 * 2
subjectData['SSCond'] = subjectData['SSMerit'] + subjectData['SSEntitlement'] + subjectData['SSCorruption'] + subjectData['SSLuck']
subjectData['AICCond'] = N * np.log(subjectData['SSCond'] / N) + 2 * 2 * len(conditions)
excluded = np.where(subjectData['SSCond'] == 0)[0]
ttest_rel(subjectData['AICCond'][~excluded], subjectData['AICAAO'], alternative='less', nan_policy='omit')
Testing a Modulatory Hypothesis
omnibusData = data.frame(Theta = c(subjectData$thetaMerit, subjectData$thetaEntitlement, subjectData$thetaCorruption, subjectData$thetaLuck),
Phi = c(subjectData$phiMeritAdjusted, subjectData$phiEntitlementAdjusted, subjectData$phiCorruptionAdjusted, subjectData$phiLuckAdjusted),
Condition = rep(conditions, each = length(subjectData$SubjectID)),
SubjectID = rep(subjectData$SubjectID, times = length(conditions)))
library(lme4)
omnibusModulatoryEffectTheta = lmer(data = omnibusData, Theta ~ Condition + (1 | SubjectID))
omnibusModulatoryEffectPhi = lmer(data = omnibusData, Phi ~ Condition + (1 | SubjectID))
Theta = [subjectData.thetaMerit; subjectData.thetaEntitlement; subjectData.thetaCorruption; subjectData.thetaLuck];
Phi = [subjectData.phiMeritAdjusted; subjectData.phiEntitlementAdjusted; subjectData.phiCorruptionAdjusted; subjectData.phiLuckAdjusted];
Conditions = repmat(conditions', length(subjectData.SubjectID), 1);
SubjectID = repmat(subjectData.SubjectID, length(conditions), 1);
omnibusData = table(Theta, Phi, Conditions, SubjectID);
omnibusModulatoryEffectTheta = fitlme(omnibusData, 'Theta ~ Condition + (1|SubjectID)');
omnibusModulatoryEffectPhi = fitlme(omnibusData, 'Phi ~ Condition + (1|SubjectID)');
# Requires 'statsmodels' - you can ``pip install statsmodels u``
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Assuming omnibusData is a pandas DataFrame
omnibusData = pd.DataFrame({
'Theta': np.concatenate([subjectData['thetaMerit'], subjectData['thetaEntitlement'], subjectData['thetaCorruption'], subjectData['thetaLuck']]),
'Phi': np.concatenate([subjectData['phiMeritAdjusted'], subjectData['phiEntitlementAdjusted'], subjectData['phiCorruptionAdjusted'], subjectData['phiLuckAdjusted']]),
'Condition': np.tile(conditions, len(subjectData['SubjectID'])),
'SubjectID': np.repeat(subjectData['SubjectID'], len(conditions))
})
# Fit mixed-effects models
omnibusModulatoryEffectTheta = smf.mixedlm("Theta ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit()
omnibusModulatoryEffectPhi = smf.mixedlm("Phi ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit()
# Print model summaries
print(omnibusModulatoryEffectTheta.summary())
print(omnibusModulatoryEffectPhi.summary())
Using Categorical Clusters to Test Modulatory Hypotheses
subjectData$strategyMerit = as.factor(subjectData$strategyMerit)
subjectData$strategyEntitlement = as.factor(subjectData$strategyEntitlement)
subjectData$strategyCorruption = as.factor(subjectData$strategyCorruption)
subjectData$strategyLuck = as.factor(subjectData$strategyLuck)
strategies = levels(subjectData)
columns = 10:13 #columns of subject data correspond, in order, to the conditions in the conditions vector
for (i in 1:length(conditions)){
for (j in 1:length(strategies)){
group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j])
}
}
colnames(group_by_condition) = strategies
rownames(group_by_condition) = conditions
chisq.test(group_by_condition)
% Assuming subjectData is a table with columns strategyMerit, strategyEntitlement, strategyCorruption, strategyLuck,
% and conditions is a cell array containing condition names
subjectData.strategyMerit = categorical(subjectData.strategyMerit);
subjectData.strategyEntitlement = categorical(subjectData.strategyEntitlement);
subjectData.strategyCorruption = categorical(subjectData.strategyCorruption);
subjectData.strategyLuck = categorical(subjectData.strategyLuck);
strategies = categories(subjectData.strategyMerit); % Assuming all strategy columns have the same set of levels
columns = 10:13; % columns of subject data correspond, in order, to the conditions in the conditions vector
group_by_condition = zeros(length(conditions), length(strategies));
for i = 1:length(conditions)
for j = 1:length(strategies)
group_by_condition(i, j) = sum(subjectData{:, columns(i)} == strategies{j});
end
end
group_by_condition_table = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions);
[p, chi2stat, ~, ~] = chi2test(group_by_condition);
from scipy.stats import chi2_contingency
# Assuming subjectData is a pandas DataFrame
subjectData['strategyMerit'] = subjectData['strategyMerit'].astype('category')
subjectData['strategyEntitlement'] = subjectData['strategyEntitlement'].astype('category')
subjectData['strategyCorruption'] = subjectData['strategyCorruption'].astype('category')
subjectData['strategyLuck'] = subjectData['strategyLuck'].astype('category')
strategies = subjectData['strategyMerit'].cat.categories # Assuming all strategy columns have the same set of levels
columns = list(range(9, 13)) # columns of subject data correspond, in order, to the conditions in the conditions vector
group_by_condition = np.zeros((len(conditions), len(strategies)))
for i, condition in enumerate(conditions):
for j, strategy in enumerate(strategies):
group_by_condition[i, j] = np.sum(subjectData.iloc[:, columns[i]] == strategy)
group_by_condition = pd.DataFrame(group_by_condition, index=conditions, columns=strategies)
chi2, p, _, _ = chi2_contingency(group_by_condition)
print("Chi-squared statistic:", chi2)
print("P-value:", p)
Tutorial 3 - Crockett et al., 2014
Testing a Modulatory Hypothesis
t.test(as.numeric(subjectData$Kappa_Self[-20]), as.numeric(subjectData$Kappa_Other[-20]), paired = T)
mean(as.numeric(subjectData$Kappa_Self[-20]))
mean(as.numeric(subjectData$Kappa_Other[-20]))
Tutorial 4 - Li et al., 2022
Not Applicable to this Data.