Test Modulatory Hypotheses ********** .. _homoscedasticity: https://social-utility-modeling.readthedocs.io/en/latest/2_4_0.html .. _here: https://epgalvan.github.io/integration-simulations/ Lesson ================ .. article-info:: :avatar: UCLA_Suit.jpg :avatar-link: https://www.decisionneurosciencelab.com/elijahgalvan :author: Elijah Galvan :date: September 1, 2023 :read-time: 15 min read :class-container: sd-p-2 sd-outline-muted sd-rounded-1 Goal During this Stage --------------- To see how :bdg-primary:`Conditions` affect :bdg-success:`Free Parameters`. .. Note:: In experiments where you use multiple :bdg-primary:`Conditions`, you will end up recovering :bdg-success:`Free Parameters` for seperate :bdg-primary:`Conditions`. Thus, you may wish to use these :bdg-success:`Free Parameters` as a proxy for how preferences change as a function of your :bdg-primary:`Condition` - these would be Modulatory Hypotheses (i.e. how :bdg-primary:`Conditions` modulate the data generation process). Why Fitting Per Condition is Less Preferrable ----------------- Although there are definitely use cases for fitting a model over multiple different :bdg-primary:`Conditions` (what we will learn below), when possible it is typically better to integrate multiple :bdg-primary:`Conditions` within the same model. Namely, when it is believed that :bdg-primary:`Conditions` will modulate individuals' :bdg-success:`Preferences` **and there are more than two** :bdg-primary:`Conditions`, I believe that models which internally hypothesize *how* these :bdg-success:`Preferences` change across :bdg-primary:`Conditions` are superior. In general, there are three reasons that it is not desirable to **even consider** models fit separately per :bdg-primary:`Condition` when there are more than two :bdg-primary:`Conditions`. .. dropdown:: Inability to Correctly Identify the Correct Model Simulations from a hypothetical Dictator Game experiment are presented `here`_ which test the ability of model comparison to correctly identify the model that generated the data. A model split over :bdg-primary:`Conditions` often outperforms the model that actually generated the data *even though it is more parsimonious* under several different circumstances. This means, depending on (a) the assumptions of your :bdg-success:`Parameter` estimation process, (b) the features of your data, and (c) the power in your sample, including a per condition model in your model set may obfuscate the real model. .. dropdown:: Lack of Predictive Value We run experiments to gain insights which, we hope, generalize beyond the laboratory. One key advantage of computational models is that they enable us to generate novel predictions based on known :bdg-success:`Preferences` to new situations within the laboratory - moving us closer to this goal of external validation. However, for models fit seperately per :bdg-primary:`Condition`, if we demonstrate that this model is superior via model testing what we have also demonstrated is that :bdg-success:`Free Parameters` do not generalize to new contexts. Thus, the model has no predictive value in new :bdg-primary:`Conditions` which further undermines its' generalizability. .. dropdown:: Theoretical Deficits Similar to its lack of predictive value, a model fit separately between :bdg-primary:`Conditions` misses an opportunity to hone theoretical knowledge. A key reason that we assess how :bdg-success:`Preferences` change between :bdg-primary:`Conditions` is typically because we hypothesize that :bdg-primary:`Conditions` are distinct in articulable ways. Testing these hypothesized differences is a key advantage of computational modeling, but a model fit separately between :bdg-primary:`Conditions` does not enable this hypothesis to be tested. How to Achieve this Goal ------------ .. dropdown:: Preliminary Validation Before you jump to testing, there are also a few more new things that we have to check. Let's take a look at each. .. dropdown:: Robust and Reliable :bdg-success:`Free Parameters` As we said before, we have to prove that our recovery of :bdg-success:`Free Parameters` is robust. This is a step above what we previously did - essentially, fivefold validation allowed us to rule out the idea that our :bdg-success:`Free Parameters` were overfitted meaning that our model wasn't performing so well because it was just capturing little quirks in the data. We need to take this a step further to use these directly in statistical analyses: we need to show that treating recovered :bdg-success:`Free Parameters` as a continuous scale measure is appropriate. If the following are false, you should be okay to proceed: 1. Your utility equation applies a nonlinear transformation to your :bdg-success:`Free Parameters` - this means that an increase in one unit of your :bdg-success:`Free Parameter` scale is not equal for all values of the :bdg-success:`Free Parameter` so this analysis is probably inappropriate 2. The recovery of your :bdg-success:`Free Parameter` that you want to test is independent of the other :bdg-success:`Free Parameters` in your model - if your :bdg-success:`Free Parameter` values only interact with other :bdg-success:`Free Parameters` you will have to apply a transformation to account for this dependency (see tutorial 2 for an example of this) 3. Remember that assumption of `homoscedasticity`_ that we said wasn't super important? Well, now it is. If your data is heteroscedastic, recovery of :bdg-success:`Free Parameters` could be differentially overfit or underfit at certain values of the :bdg-primary:`Independent Variable` which makes these :bdg-success:`Free Parameters` unreliable. You will have to re-estimate your :bdg-success:`Free Parameters` using an alternative estimator (i.e. Robust Maximum Likelihood Estimation or Weighted Least Squares) .. dropdown:: Meaningful :bdg-primary:`Condition` Differences .. tab-set:: .. tab-item:: Plain English So you've now shown that your :bdg-success:`Free Parameters` are robust and reliable - what's left to do other than test? Something really important actually: you have to prove that you are even justified in recovering different :bdg-success:`Free Parameters` in each :bdg-primary:`Condition`. Even if you show that :bdg-success:`Free Parameters` are meaningfully different across :bdg-primary:`Conditions`, the test results are not valid if you have not proven that the :bdg-danger:`Decisions` that :bdg-success:`Subjects` make differ between :bdg-primary:`Conditions`. So, we're going to go back and create a model which does not differentiate between :bdg-primary:`Conditions` - training all of the data at once. Since our demo did not have a design with multiple :bdg-primary:`Conditions`, we'll create a complete example here. .. tab-item:: R :: obj_function_aao = function(params, decisions, method = "OLS") { Parameter1 = params[1] Parameter2 = params[2] trialList = #must redefine and also must be of the same length as decisions predicted_utility = vector('numeric', length(trialList[,1])) observed_utility = vector('numeric', length(trialList[,1])) for (k in 1:length(trialList[,1])){ IV = trialList[k, 1] Constant = trialList[k, 2] Choices = #something Utility = vector('numeric', length(Choices)) for (n in 1:length(Choices)){ Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n])) } predicted_utility[k] = max(Utility) observed_utility[k] = Utility[chosen[k]] } if (method == "OLS"){ return(sum((predicted_utility - observed_utility)**2)) } else if (method == "MLE"){ return(-1 * sum(dnorm(observed_utility, mean = predicted_utility, sd = sd, log = TRUE))) } } for (i in 1:length(included_subjects)){ datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype' df = read.csv2(datafile) #this will have variables called IV, Decisions, Condition, and information about the original order of trials (i.e. trialsTask.thisIndex) - it will also have information about the number of blocks df$Prediction = vector('numeric', length(df$IV)) Par1_PerCondition = vector('numeric', length(levels(df$Condition))) Par2_PerCondition = vector('numeric', length(levels(df$Condition))) SS_PerCondition = vector('numeric', length(levels(df$Condition))) Deviance_PerCondition = vector('numeric', length(levels(df$Condition))) #to calculate NLL later for (c in 1:length(levels(df$Condition))){ result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL, lb = lower_bounds, ub = upper_bounds, df = df) #Just Added closestPoint = which(as.numeric(freeParameters[,1]) == as.numeric(round(result$par[1])) & as.numeric(freeParameters[,2]) == as.numeric(round(result$par[2]))) Prediction = vector('numeric') for (k in 1:length(df$Decisions)){ Utility = vector('numeric', length(Choices)) for (n in 1:length(Choices)){ Utility[n] = utility(parameter1 = results$par[1], parameter2 = results$par[2], construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]), construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])), construct3 = construct3(df$IV[k], df$Constant[k], Choices[n]) } correct_choice = which(Utility == max(Utility)) if (length(correct_choice) > 1){ correct_choice = correct_choice[sample(correct_choice, 1)] } Prediction[k] = Choices[correct_choice] } Deviance_PerCondition[c] = dnorm(df$Decision, mean = Prediction) SS_PerCondition[c] = sum((df$Decision - Prediction)**2) df$Prediction[which(df$Condition == levels(df$Condition)[c])] = Prediction } NLL_PerCondition = -2 * log(sum(Deviance_PerCondition)) result = fmincon(obj_function_aao,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL, lb = lower_bounds, ub = upper_bounds, decisions = df$Decisions) df$PredictionAAO = vector('numeric') for (k in 1:length(df$Decisions)){ Utility = vector('numeric', length(Choices)) for (n in 1:length(Choices)){ Utility[n] = utility(parameter1 = results$par[1], parameter2 = results$par[2], construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]), construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])), construct3 = construct3(df$IV[k], df$Constant[k], Choices[n]) } correct_choice = which(Utility == max(Utility)) if (length(correct_choice) > 1){ correct_choice = correct_choice[sample(correct_choice, 1)] } df$PredictionAAO[k] = Choices[correct_choice] } NLL_AAO = -2 * log(sum(dnorm(df$Decision, mean = df$Prediction))) SS_AAO = sum((df$Decision - df$Prediction)**2) subjectData[i, ] = c(included_subjects[i], sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result$par[1], result$par[2]) start = length(subjectData[, 1]) + 1 end = start + length(df$Decisions) trialData[start:end, 1] = included_subjects[i] trialData[start:end, 2] = df$IV trialData[start:end, 3] = df$Constant trialData[start:end, 4] = df$Decision trialData[start:end, 5] = df$Condition trialData[start:end, 6] = df$Prediction trialData[start:end, 7] = df$PredictionAAO } colnames(subjectData) = c('SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', ..., 'Parameter2_Condition1', ..., 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce') #levels(df$Condition) will always be in the same order for all subjects so conditions will be saved in the same columns colnames(trailData) = c('SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce') subjectData$AIC_PerCondition = length(df$IV) * log(subjectData$SS_PerCondition/length(df$IV)) + 2 * 2 * (length(levels(df$Condition))) subjectData$AIC_AllAtOnce = length(df$IV) * log(subjectData$SS_AllAtOnce/length(df$IV)) + 2 * 2 * (length(levels(df$Condition))) t.test(subjectData$AIC_PerCondition, subjectData$AIC_AllAtOnce, paired = T) .. tab-item:: MatLab :: function obj_function_aao = obj_function(params, decisions, method) Parameter1 = params(1); Parameter2 = params(2); trialList = % must redefine and also must be of the same length as decisions predicted_utility = zeros(1, length(trialList(:, 1))); observed_utility = zeros(1, length(trialList(:, 1))); for k = 1:length(trialList(:, 1)) IV = trialList(k, 1); Constant = trialList(k, 2); Choices = % something Utility = zeros(1, length(Choices)); for n = 1:length(Choices) Utility(n) = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices(n)), construct2(IV, Constant, Choices(n)), construct3(IV, Constant, Choices(n))); end predicted_utility(k) = max(Utility); observed_utility(k) = Utility(chosen(k)); end if strcmp(method, 'OLS') obj_function_aao = sum((predicted_utility - observed_utility).^2); elseif strcmp(method, 'MLE') obj_function_aao = -1 * sum(log(normpdf(observed_utility, predicted_utility, sd))); end end for i = 1:length(included_subjects) datafile = strcat(parentfolder, included_subjects(i), restoffilepath); df = readtable(datafile); df.Prediction = zeros(1, length(df.IV)); Par1_PerCondition = zeros(1, length(unique(df.Condition))); Par2_PerCondition = zeros(1, length(unique(df.Condition))); SS_PerCondition = zeros(1, length(unique(df.Condition))); Deviance_PerCondition = zeros(1, length(unique(df.Condition))); for c = 1:length(unique(df.Condition)) result = fmincon(@obj_function, initial_params, [], [], [], [], lower_bounds, upper_bounds, df); closestPoint = find(freeParameters(:, 1) == round(result(1)) & freeParameters(:, 2) == round(result(2))); Prediction = zeros(1, length(df.Decisions)); for k = 1:length(df.Decisions) Utility = zeros(1, length(Choices)); for n = 1:length(Choices) Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n))); end correct_choice = find(Utility == max(Utility)); if length(correct_choice) > 1 correct_choice = correct_choice(randi(length(correct_choice))); end Prediction(k) = Choices(correct_choice); end Deviance_PerCondition(c) = normpdf(df.Decision, Prediction); SS_PerCondition(c) = sum((df.Decision - Prediction).^2); df.Prediction(df.Condition == unique(df.Condition)(c)) = Prediction; end NLL_PerCondition = -2 * sum(log(Deviance_PerCondition)); result = fmincon(@obj_function_aao, initial_params, [], [], [], [], lower_bounds, upper_bounds, df.Decisions); df.PredictionAAO = zeros(1, length(df.Decisions)); for k = 1:length(df.Decisions) Utility = zeros(1, length(Choices)); for n = 1:length(Choices) Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n))); end correct_choice = find(Utility == max(Utility)); if length(correct_choice) > 1 correct_choice = correct_choice(randi(length(correct_choice))); end df.PredictionAAO(k) = Choices(correct_choice); end NLL_AAO = -2 * sum(log(normpdf(df.Decision, df.Prediction))); SS_AAO = sum((df.Decision - df.Prediction).^2); subjectData(i, :) = [included_subjects(i), sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result(1), result(2)]; start = size(subjectData, 1) + 1; endIdx = start + length(df.Decisions) - 1; trialData(start:endIdx, 1) = included_subjects(i); trialData(start:endIdx, 2) = df.IV; trialData(start:endIdx, 3) = df.Constant; trialData(start:endIdx, 4) = df.Decision; trialData(start:endIdx, 5) = df.Condition; trialData(start:endIdx, 6) = df.Prediction; trialData(start:endIdx, 7) = df.PredictionAAO; end subjectData.Properties.VariableNames = {'SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce'}; trailData.Properties.VariableNames = {'SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce'}; subjectData.AIC_PerCondition = length(df.IV) * log(subjectData.modelSS_PerCondition/length(df.IV)) + 2 * 2 * length(unique(df.Condition)); subjectData.AIC_AllAtOnce = length(df.IV) * log(subjectData.modelSS_AllAtOnce/length(df.IV)) + 2 * 2 * length(unique(df.Condition)); ttest(subjectData.AIC_PerCondition, subjectData.AIC_AllAtOnce, 'Paired', true); .. tab-item:: Python :: def obj_function(params, decisions, method): Parameter1 = params[0] Parameter2 = params[1] trialList = # must redefine and also must be of the same length as decisions predicted_utility = np.zeros(len(trialList[:, 0])) observed_utility = np.zeros(len(trialList[:, 0])) for k in range(len(trialList[:, 0])): IV = trialList[k, 0] Constant = trialList[k, 1] Choices = # something Utility = np.zeros(len(Choices)) for n in range(len(Choices)): Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n])) predicted_utility[k] = max(Utility) observed_utility[k] = Utility[chosen[k]] if method == 'OLS': return np.sum((predicted_utility - observed_utility)**2) elif method == 'MLE': return -1 * np.sum(np.log(norm.pdf(observed_utility, loc=predicted_utility, scale=sd))) for i in range(len(included_subjects)): datafile = parentfolder + included_subjects[i] + restoffilepath df = pd.read_csv(datafile) df['Prediction'] = np.zeros(len(df['IV'])) Par1_PerCondition = np.zeros(len(df['Condition'].unique())) Par2_PerCondition = np.zeros(len(df['Condition'].unique())) SS_PerCondition = np.zeros(len(df['Condition'].unique())) Deviance_PerCondition = np.zeros(len(df['Condition'].unique())) for c in range(len(df['Condition'].unique())): result = minimize(obj_function, initial_params, args=(df), bounds=list(zip(lower_bounds, upper_bounds))) closestPoint = np.where((freeParameters[:, 0] == round(result.x[0])) & (freeParameters[:, 1] == round(result.x[1]))) Prediction = np.zeros(len(df['Decisions'])) for k in range(len(df['Decisions'])): Utility = np.zeros(len(Choices)) for n in range(len(Choices)): Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n])) correct_choice = np.where(Utility == max(Utility)) if len(correct_choice) > 1: correct_choice = correct_choice[np.random.choice(len(correct_choice))] Prediction[k] = Choices[correct_choice[0]] Deviance_PerCondition[c] = norm.pdf(df['Decision'], Prediction) SS_PerCondition[c] = np.sum((df['Decision'] - Prediction)**2) df['Prediction'][df['Condition'] == df['Condition'].unique()[c]] = Prediction NLL_PerCondition = -2 * np.sum(np.log(Deviance_PerCondition)) result = minimize(obj_function_aao, initial_params, args=(df['Decisions'],), bounds=list(zip(lower_bounds, upper_bounds))) df['PredictionAAO'] = np.zeros(len(df['Decisions'])) for k in range(len(df['Decisions'])): Utility = np.zeros(len(Choices)) for n in range(len(Choices)): Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n])) correct_choice = np.where(Utility == max(Utility)) if len(correct_choice) > 1: correct_choice = correct_choice[np.random.choice(len(correct_choice))] df['PredictionAAO'][k] = Choices[correct_choice[0]] NLL_AAO = -2 * np.sum(np.log(norm.pdf(df['Decision'], df['Prediction']))) SS_AAO = np.sum((df['Decision'] - df['Prediction'])**2) subjectData[i, :] = [included_subjects[i], np.sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result.x[0], result.x[1]] start = subjectData.shape[0] + 1 endIdx = start + len(df['Decisions']) - 1 trialData[start:endIdx, 0] = included_subjects[i] trialData[start:endIdx, 1] = df['IV'] trialData[start:endIdx, 2] = df['Constant'] trialData[start:endIdx, 3] = df['Decision'] trialData[start:endIdx, 4] = df['Condition'] trialData[start:endIdx, 5] = df['Prediction'] trialData[start:endIdx, 6] = df['PredictionAAO'] subjectData.columns = ['SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce'] subjectData['AIC_PerCondition'] = len(df['IV']) * np.log(subjectData['modelSS_PerCondition'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique()) subjectData['AIC_AllAtOnce'] = len(df['IV']) * np.log(subjectData['modelSS_AllAtOnce'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique()) ttest_rel(subjectData['AIC_PerCondition'], subjectData['AIC_AllAtOnce']) .. dropdown:: Testing a Modulatory Hypothesis .. tab-set:: .. tab-item:: Plain English Now, if we've shown that people do indeed make different :bdg-danger:`Decisions` in each :bdg-primary:`Condition` and are convinced that our :bdg-success:`Free Parameters` can be trusted as a valid continuous measure of :bdg-success:`Subjects`' preferences, we can now test our modulatory hypotheses. To reiterate, these are hypotheses about directional, group-level differences: you've already shown that preferences change and now you want to show how they specifically change as a function of :bdg-primary:`Condition`. Occasionally, if you have more than two :bdg-primary:`Conditions`, you might also want to first do an omnibus test for :bdg-primary:`Condition` effects and then you might also do post-hoc tests after. We'll show you how to do both of these. .. tab-item:: R :: t.test(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, paired = T) #change in parameter 1 between condition 1 and 2 #now we need to put this data in long format to do a linear mixed effects model omnibusData = data.frame(c(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, subjectData$Parameter1_Condition3), rep(c('Condition 1', 'Condition 2', 'Condition 3'), each = length(subjectData$SubjectID)), rep(subjectData$SubjectID, times = 3)) colnames(omnibusData) = c('Parameter1', 'Condition', 'SubjectID') ombnibusModulatoryEffect = lmer(data = ombnibusData, Parameter1 ~ Condition + (1 | SubjectID)) #our omnnibus test summary(ombnibusModulatoryEffect) #if omnibus test is signficiant, proceed to post hoc tests below library(emmeans) Parameter1_PostHocModulationEffect = emmeans(ombnibusModulatoryEffect, "Condition") summary(pairs(Parameter1_PostHocModulationEffect)) #post hoc pairwise test across condition .. tab-item:: MatLab :: % t-test [h, p, ci, stats] = ttest(subjectData.Parameter1_Condition1, subjectData.Parameter1_Condition2, 'paired', true); % Put data in long format omnibusData = table([subjectData.Parameter1_Condition1; subjectData.Parameter1_Condition2; subjectData.Parameter1_Condition3], ... repelem({'Condition 1', 'Condition 2', 'Condition 3'}, length(subjectData.SubjectID)), ... repmat(subjectData.SubjectID, 1, 3), ... 'VariableNames', {'Parameter1', 'Condition', 'SubjectID'}); % Fit linear mixed effects model omnibusModulatoryEffect = fitlme(omnibusData, 'Parameter1 ~ Condition + (1|SubjectID)'); % Display summary disp(omnibusModulatoryEffect); % Perform post hoc tests Parameter1_PostHocModulationEffect = emmeans(omnibusModulatoryEffect, 'Condition'); disp(pairs(Parameter1_PostHocModulationEffect)); .. tab-item:: Python :: from statsmodels.formula.api import mixedlm from pingouin import pairwise_tukey # t-test t_stat, p_val = sm.stats.ttest_rel(subjectData['Parameter1_Condition1'], subjectData['Parameter1_Condition2']) # Put data in long format data_dict = { 'Parameter1': subjectData['Parameter1_Condition1'].append([subjectData['Parameter1_Condition2'], subjectData['Parameter1_Condition3']]), 'Condition': ['Condition 1', 'Condition 2', 'Condition 3'] * len(subjectData['SubjectID']), 'SubjectID': list(subjectData['SubjectID']) * 3 } omnibusData = pd.DataFrame(data_dict) # Fit linear mixed effects model omnibusModulatoryEffect = mixedlm('Parameter1 ~ Condition', omnibusData, groups=omnibusData['SubjectID']).fit() # Display summary print(omnibusModulatoryEffect.summary()) # Perform post hoc tests posthoc_results = pairwise_tukey(data=omnibusData, dv='Parameter1', between='Condition') print(posthoc_results) .. dropdown:: Using Categorical Clusters to Test Modulatory Hypotheses .. tab-set:: .. tab-item:: Plain English Sometimes, treating :bdg-success:`Free Parameters` as an outcome measure doesn't tell us everything that we want to know about the strategies that people use to make :bdg-danger:`Decisions`. This is one time where using a priori clustering can come in handy: rather than saying :bdg-success:`Free Parameter 1` changed a certain amount between :bdg-primary:`Condition 1` and :bdg-primary:`Condition 2` and that :bdg-success:`Free Parameter 2` changed a certain amount, we summarize behavioral patterns in a way that accounts for both at once and also tells us something about how the prevalence of each strategy changed betweeen :bdg-primary:`Conditions`. To do this, we want to use a Chi-Square Test. .. tab-item:: R :: subjectData$Strategy_Condition1 = vector('character', length(subjectData$SubjectID)) subjectData$Strategy_Condition2 = vector('character', length(subjectData$SubjectID)) subjectData$Strategy_Condition3 = vector('character', length(subjectData$SubjectID)) subjectData$Strategy_Condition4 = vector('character', length(subjectData$SubjectID)) for (i in 1:length(subjectData$SubjectID)){ subjectData$Strategy_Condition1[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition1[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition1[i], 2))] subjectData$Strategy_Condition2[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition2[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition2[i], 2))] subjectData$Strategy_Condition3[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition3[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition3[i], 2))] subjectData$Strategy_Condition4[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition4[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition4[i], 2))] } conditions = c('condition1', 'condition2', 'condition3', 'condition4') strategies = c('strategy1', 'strategy2', 'strategy3') group_by_condition = data.frame() columns = #columns of subjectData where Condition1-4 are kept for (i in 1:4){ for (j in 1:3){ group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j]) } } colnames(group_by_condition) = strategies rownames(group_by_condition) = conditions chisq.test(group_by_condition) .. tab-item:: MatLab :: subjectData.Strategy_Condition1 = cell(1, length(subjectData.SubjectID)); subjectData.Strategy_Condition2 = cell(1, length(subjectData.SubjectID)); subjectData.Strategy_Condition3 = cell(1, length(subjectData.SubjectID)); subjectData.Strategy_Condition4 = cell(1, length(subjectData.SubjectID)); for i = 1:length(subjectData.SubjectID) subjectData.Strategy_Condition1{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition1(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition1(i), 2)); subjectData.Strategy_Condition2{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition2(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition2(i), 2)); subjectData.Strategy_Condition3{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition3(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition3(i), 2)); subjectData.Strategy_Condition4{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition4(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition4(i), 2)); end conditions = {'condition1', 'condition2', 'condition3', 'condition4'}; strategies = {'strategy1', 'strategy2', 'strategy3'}; group_by_condition = zeros(4, 3); columns = % columns of subjectData where Condition1-4 are kept for i = 1:4 for j = 1:3 group_by_condition(i, j) = sum(strcmp(subjectData.(columns{i}), strategies{j})); end end group_by_condition = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions); chisqtest(group_by_condition) .. tab-item:: Python :: from scipy.stats import chi2_contingency subjectData['Strategy_Condition1'] = [None] * len(subjectData['SubjectID']) subjectData['Strategy_Condition2'] = [None] * len(subjectData['SubjectID']) subjectData['Strategy_Condition3'] = [None] * len(subjectData['SubjectID']) subjectData['Strategy_Condition4'] = [None] * len(subjectData['SubjectID']) for i in range(len(subjectData['SubjectID'])): subjectData.at[i, 'Strategy_Condition1'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition1'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition1'][i], 2))] subjectData.at[i, 'Strategy_Condition2'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition2'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition2'][i], 2))] subjectData.at[i, 'Strategy_Condition3'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition3'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition3'][i], 2))] subjectData.at[i, 'Strategy_Condition4'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition4'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition4'][i], 2))] conditions = ['condition1', 'condition2', 'condition3', 'condition4'] strategies = ['strategy1', 'strategy2', 'strategy3'] group_by_condition = pd.DataFrame(0, index=conditions, columns=strategies) columns = [] # columns of subjectData where Condition1-4 are kept for i in range(4): for j in range(3): group_by_condition.iloc[i, j] = np.sum(subjectData[columns[i]] == strategies[j]) chi2, p, _, _ = chi2_contingency(group_by_condition) Tutorials ========== Tutorial 1 - van Baar, Chang, & Sanfey, 2019 ---------------------- Not Applicable to this Data. Tutorial 2 - Galvan & Sanfey, 2024 ------------------- .. dropdown:: Preliminary Validation Theta is a candidate for such an analysis - it is estimated uniquely with respect to Phi and we did not see any significant violations of homodscedasticity in our data. Phi, on the other hand, is estimated depednent on Theta - remember that both appearances of Phi in our utility equation are where it interacts with Theta. Thus, we have to account for the fact that Phi is not meaningful at high values of Theta in order to use it in this way. .. tab-set:: .. tab-item:: R :: ### Robust and Reliable Free Parameters subjectData$phiMeritAdjusted = (subjectData$phiMerit - 0.5) #subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point) subjectData$phiMeritAdjusted = (subjectData$phiMeritAdjusted * (1 - subjectData$thetaMerit)) #the greater theta is, the closer to 0 this adjusted value is subjectData$phiMeritAdjusted = subjectData$phiMeritAdjusted + 0.5 #returning to initial scale value (i.e. 0 to 1 with 0.5 beingthe indifference point) subjectData$phiEntitlementAdjusted = ((subjectData$phiEntitlement - 0.5) * (1 - subjectData$thetaEntitlement)) + 0.5 subjectData$phiCorruptionAdjusted = ((subjectData$phiCorruption - 0.5) * (1 - subjectData$thetaCorruption)) + 0.5 subjectData$phiLuckAdjusted = ((subjectData$phiLuck - 0.5) * (1 - subjectData$thetaLuck)) + 0.5 ### Meaningful Condition Differences subjectData$SSAAO = vector('numeric') for (i in 1:length(included_subjects)){ datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype' df = read.csv2(datafile) df = df[, c(49, 40:48, 33)] df$redistributionRate = df$redistributionRate/100 #converting to a decimal from a percent result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL, lb = lower_bounds, ub = upper_bounds, df = df) df$PredictionAAO = vector('numeric') df$ObservedOutcome = new_value(df$myself, df$redistributionRate) for (k in 1:length(df$redistributionRate)){ Utility = vector('numeric', length(Choices)) for (n in 1:length(Choices)){ Utility[n] = utility(theta = result, phi = phiPerCondition[j], Equity = equity(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]), Equality = equality(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]), Payout = payout(new_value(df[k, 1], choices[n]), df[k, 1], choices[n])) } correct_choice = which(Utility == max(Utility)) df$PredictionAAO[k] = new_value(df$myself[k], Choices[correct_choice[sample(length(correct_choice), 1)]]) } subjectData$SSAAO[i] = sum((df$PredictionAAO - df$ObservedOutcome)**2) } N = length(df[, 1]) k = 2 subjectData$AICAAO = N * log(subjectData$SSAAO/N) + 2*2 subjectData$SSCond = subjectData$SSMerit + subjectData$SSEntitlement + subjectData$SSCorruption + subjectData$SSLuck subjectData$AICCond = N * log(subjectData$SSCond/N) + 2*2*length(conditions) excluded = which(subjectData$SSCond == 0) t.test(subjectData$AICCond[-excluded], subjectData$AICAAO, paired = T, alternative = 'less') .. tab-item:: MatLab :: % Robust and Reliable Free Parameters subjectData.phiMeritAdjusted = (subjectData.phiMerit - 0.5); % subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point) subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted * (1 - subjectData.thetaMerit); % the greater theta is, the closer to 0 this adjusted value is subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted + 0.5; % returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point) subjectData.phiEntitlementAdjusted = ((subjectData.phiEntitlement - 0.5) * (1 - subjectData.thetaEntitlement)) + 0.5; subjectData.phiCorruptionAdjusted = ((subjectData.phiCorruption - 0.5) * (1 - subjectData.thetaCorruption)) + 0.5; subjectData.phiLuckAdjusted = ((subjectData.phiLuck - 0.5) * (1 - subjectData.thetaLuck)) + 0.5; % Meaningful Condition Differences subjectData.SSAAO = zeros(1, length(included_subjects)); for i = 1:length(included_subjects) datafile = strcat(parentfolder, included_subjects{i}, restoffilepath); % produces a character vector 'parentfolder/included_subjects[i]**.filetype' df = readtable(datafile); df = df(:, [49, 40:48, 33]); df.redistributionRate = df.redistributionRate / 100; % converting to a decimal from a percent result = fmincon(@(params) obj_function(params, df), initial_params, [], [], [], [], lower_bounds, upper_bounds); df.PredictionAAO = zeros(height(df), 1); df.ObservedOutcome = new_value(df.myself, df.redistributionRate); for k = 1:height(df.redistributionRate) Utility = zeros(1, length(Choices)); for n = 1:length(Choices) Utility(n) = utility(result, phiPerCondition(j), ... equity(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ... equality(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ... payout(new_value(table2array(df(k, 1)), Choices(n)), table2array(df(k, 1)), Choices(n))); } correct_choice = find(Utility == max(Utility)); df.PredictionAAO(k) = new_value(df.myself(k), Choices(correct_choice(randi(length(correct_choice))))); } subjectData.SSAAO(i) = sum((df.PredictionAAO - df.ObservedOutcome).^2); end N = height(df); k = 2; subjectData.AICAAO = N * log(subjectData.SSAAO / N) + 2 * 2; subjectData.SSCond = subjectData.SSMerit + subjectData.SSEntitlement + subjectData.SSCorruption + subjectData.SSLuck; subjectData.AICCond = N * log(subjectData.SSCond / N) + 2 * 2 * length(conditions); excluded = find(subjectData.SSCond == 0); ttest(subjectData.AICCond(~excluded), subjectData.AICAAO, 'Tail', 'left', 'Alpha', alpha); .. tab-item:: Python :: # Robust and Reliable Free Parameters subjectData['phiMeritAdjusted'] = (subjectData['phiMerit'] - 0.5) # subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point) subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] * (1 - subjectData['thetaMerit']) # the greater theta is, the closer to 0 this adjusted value is subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] + 0.5 # returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point) subjectData['phiEntitlementAdjusted'] = ((subjectData['phiEntitlement'] - 0.5) * (1 - subjectData['thetaEntitlement'])) + 0.5 subjectData['phiCorruptionAdjusted'] = ((subjectData['phiCorruption'] - 0.5) * (1 - subjectData['thetaCorruption'])) + 0.5 subjectData['phiLuckAdjusted'] = ((subjectData['phiLuck'] - 0.5) * (1 - subjectData['thetaLuck'])) + 0.5 # Meaningful Condition Differences subjectData['SSAAO'] = np.zeros(len(included_subjects)) for i in range(len(included_subjects)): datafile = parentfolder + included_subjects[i] + restoffilepath # produces a character vector 'parentfolder/included_subjects[i]**.filetype' df = pd.read_csv(datafile) df = df.iloc[:, [48, 39, 40, 41, 42, 43, 44, 45, 46, 47, 32]] df['redistributionRate'] = df['redistributionRate'] / 100 # converting to a decimal from a percent result = minimize(lambda params: obj_function(params, df), initial_params, bounds=list(zip(lower_bounds, upper_bounds))) df['PredictionAAO'] = np.zeros(len(df)) df['ObservedOutcome'] = new_value(df['myself'], df['redistributionRate']) for k in range(len(df['redistributionRate'])): Utility = np.zeros(len(Choices)) for n in range(len(Choices)): Utility[n] = utility(result.x, phiPerCondition[j], equity(new_value(df.iloc[k, 0:9], Choices[n]), df.iloc[k, 0:9], Choices[n]), equality(new_value(df.iloc[k, 0:10], Choices[n]), df.iloc[k, 0:9], Choices[n]), payout(new_value(df.iloc[k, 0], Choices[n]), df.iloc[k, 0], Choices[n])) correct_choice = np.argmax(Utility) df.at[k, 'PredictionAAO'] = new_value(df['myself'][k], Choices[correct_choice[np.random.randint(len(correct_choice))]]) subjectData['SSAAO'][i] = np.sum((df['PredictionAAO'] - df['ObservedOutcome'])**2) N = len(df) k = 2 subjectData['AICAAO'] = N * np.log(subjectData['SSAAO'] / N) + 2 * 2 subjectData['SSCond'] = subjectData['SSMerit'] + subjectData['SSEntitlement'] + subjectData['SSCorruption'] + subjectData['SSLuck'] subjectData['AICCond'] = N * np.log(subjectData['SSCond'] / N) + 2 * 2 * len(conditions) excluded = np.where(subjectData['SSCond'] == 0)[0] ttest_rel(subjectData['AICCond'][~excluded], subjectData['AICAAO'], alternative='less', nan_policy='omit') .. dropdown:: Testing a Modulatory Hypothesis .. tab-set:: .. tab-item:: R :: omnibusData = data.frame(Theta = c(subjectData$thetaMerit, subjectData$thetaEntitlement, subjectData$thetaCorruption, subjectData$thetaLuck), Phi = c(subjectData$phiMeritAdjusted, subjectData$phiEntitlementAdjusted, subjectData$phiCorruptionAdjusted, subjectData$phiLuckAdjusted), Condition = rep(conditions, each = length(subjectData$SubjectID)), SubjectID = rep(subjectData$SubjectID, times = length(conditions))) library(lme4) omnibusModulatoryEffectTheta = lmer(data = omnibusData, Theta ~ Condition + (1 | SubjectID)) omnibusModulatoryEffectPhi = lmer(data = omnibusData, Phi ~ Condition + (1 | SubjectID)) .. tab-item:: MatLab :: Theta = [subjectData.thetaMerit; subjectData.thetaEntitlement; subjectData.thetaCorruption; subjectData.thetaLuck]; Phi = [subjectData.phiMeritAdjusted; subjectData.phiEntitlementAdjusted; subjectData.phiCorruptionAdjusted; subjectData.phiLuckAdjusted]; Conditions = repmat(conditions', length(subjectData.SubjectID), 1); SubjectID = repmat(subjectData.SubjectID, length(conditions), 1); omnibusData = table(Theta, Phi, Conditions, SubjectID); omnibusModulatoryEffectTheta = fitlme(omnibusData, 'Theta ~ Condition + (1|SubjectID)'); omnibusModulatoryEffectPhi = fitlme(omnibusData, 'Phi ~ Condition + (1|SubjectID)'); .. tab-item:: Python :: # Requires 'statsmodels' - you can ``pip install statsmodels u`` import pandas as pd import statsmodels.api as sm import statsmodels.formula.api as smf # Assuming omnibusData is a pandas DataFrame omnibusData = pd.DataFrame({ 'Theta': np.concatenate([subjectData['thetaMerit'], subjectData['thetaEntitlement'], subjectData['thetaCorruption'], subjectData['thetaLuck']]), 'Phi': np.concatenate([subjectData['phiMeritAdjusted'], subjectData['phiEntitlementAdjusted'], subjectData['phiCorruptionAdjusted'], subjectData['phiLuckAdjusted']]), 'Condition': np.tile(conditions, len(subjectData['SubjectID'])), 'SubjectID': np.repeat(subjectData['SubjectID'], len(conditions)) }) # Fit mixed-effects models omnibusModulatoryEffectTheta = smf.mixedlm("Theta ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit() omnibusModulatoryEffectPhi = smf.mixedlm("Phi ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit() # Print model summaries print(omnibusModulatoryEffectTheta.summary()) print(omnibusModulatoryEffectPhi.summary()) .. dropdown:: Using Categorical Clusters to Test Modulatory Hypotheses .. tab-set:: .. tab-item:: R :: subjectData$strategyMerit = as.factor(subjectData$strategyMerit) subjectData$strategyEntitlement = as.factor(subjectData$strategyEntitlement) subjectData$strategyCorruption = as.factor(subjectData$strategyCorruption) subjectData$strategyLuck = as.factor(subjectData$strategyLuck) strategies = levels(subjectData) columns = 10:13 #columns of subject data correspond, in order, to the conditions in the conditions vector for (i in 1:length(conditions)){ for (j in 1:length(strategies)){ group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j]) } } colnames(group_by_condition) = strategies rownames(group_by_condition) = conditions chisq.test(group_by_condition) .. tab-item:: MatLab :: % Assuming subjectData is a table with columns strategyMerit, strategyEntitlement, strategyCorruption, strategyLuck, % and conditions is a cell array containing condition names subjectData.strategyMerit = categorical(subjectData.strategyMerit); subjectData.strategyEntitlement = categorical(subjectData.strategyEntitlement); subjectData.strategyCorruption = categorical(subjectData.strategyCorruption); subjectData.strategyLuck = categorical(subjectData.strategyLuck); strategies = categories(subjectData.strategyMerit); % Assuming all strategy columns have the same set of levels columns = 10:13; % columns of subject data correspond, in order, to the conditions in the conditions vector group_by_condition = zeros(length(conditions), length(strategies)); for i = 1:length(conditions) for j = 1:length(strategies) group_by_condition(i, j) = sum(subjectData{:, columns(i)} == strategies{j}); end end group_by_condition_table = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions); [p, chi2stat, ~, ~] = chi2test(group_by_condition); .. tab-item:: Python :: from scipy.stats import chi2_contingency # Assuming subjectData is a pandas DataFrame subjectData['strategyMerit'] = subjectData['strategyMerit'].astype('category') subjectData['strategyEntitlement'] = subjectData['strategyEntitlement'].astype('category') subjectData['strategyCorruption'] = subjectData['strategyCorruption'].astype('category') subjectData['strategyLuck'] = subjectData['strategyLuck'].astype('category') strategies = subjectData['strategyMerit'].cat.categories # Assuming all strategy columns have the same set of levels columns = list(range(9, 13)) # columns of subject data correspond, in order, to the conditions in the conditions vector group_by_condition = np.zeros((len(conditions), len(strategies))) for i, condition in enumerate(conditions): for j, strategy in enumerate(strategies): group_by_condition[i, j] = np.sum(subjectData.iloc[:, columns[i]] == strategy) group_by_condition = pd.DataFrame(group_by_condition, index=conditions, columns=strategies) chi2, p, _, _ = chi2_contingency(group_by_condition) print("Chi-squared statistic:", chi2) print("P-value:", p) Tutorial 3 - Crockett et al., 2014 ------------------- .. dropdown:: Testing a Modulatory Hypothesis .. tab-set:: .. tab-item:: R :: t.test(as.numeric(subjectData$Kappa_Self[-20]), as.numeric(subjectData$Kappa_Other[-20]), paired = T) mean(as.numeric(subjectData$Kappa_Self[-20])) mean(as.numeric(subjectData$Kappa_Other[-20])) .. tab-item:: MatLab :: .. tab-item:: Python :: Tutorial 4 - Li et al., 2022 ------------------- Not Applicable to this Data.