Test Modulatory Hypotheses

Lesson

Elijah Galvan

September 1, 2023

15 min read

Goal During this Stage

To see how Conditions affect Free Parameters.

Note

In experiments where you use multiple Conditions, you will end up recovering Free Parameters for seperate Conditions. Thus, you may wish to use these Free Parameters as a proxy for how preferences change as a function of your Condition - these would be Modulatory Hypotheses (i.e. how Conditions modulate the data generation process).

Why Fitting Per Condition is Less Preferrable

Although there are definitely use cases for fitting a model over multiple different Conditions (what we will learn below), when possible it is typically better to integrate multiple Conditions within the same model. Namely, when it is believed that Conditions will modulate individuals’ Preferences and there are more than two Conditions, I believe that models which internally hypothesize how these Preferences change across Conditions are superior. In general, there are three reasons that it is not desirable to even consider models fit separately per Condition when there are more than two Conditions.

Inability to Correctly Identify the Correct Model

Simulations from a hypothetical Dictator Game experiment are presented here which test the ability of model comparison to correctly identify the model that generated the data. A model split over Conditions often outperforms the model that actually generated the data even though it is more parsimonious under several different circumstances. This means, depending on (a) the assumptions of your Parameter estimation process, (b) the features of your data, and (c) the power in your sample, including a per condition model in your model set may obfuscate the real model.

Lack of Predictive Value

We run experiments to gain insights which, we hope, generalize beyond the laboratory. One key advantage of computational models is that they enable us to generate novel predictions based on known Preferences to new situations within the laboratory - moving us closer to this goal of external validation. However, for models fit seperately per Condition, if we demonstrate that this model is superior via model testing what we have also demonstrated is that Free Parameters do not generalize to new contexts. Thus, the model has no predictive value in new Conditions which further undermines its’ generalizability.

Theoretical Deficits

Similar to its lack of predictive value, a model fit separately between Conditions misses an opportunity to hone theoretical knowledge. A key reason that we assess how Preferences change between Conditions is typically because we hypothesize that Conditions are distinct in articulable ways. Testing these hypothesized differences is a key advantage of computational modeling, but a model fit separately between Conditions does not enable this hypothesis to be tested.

How to Achieve this Goal

Preliminary Validation

Before you jump to testing, there are also a few more new things that we have to check. Let’s take a look at each.

Robust and Reliable Free Parameters

As we said before, we have to prove that our recovery of Free Parameters is robust. This is a step above what we previously did - essentially, fivefold validation allowed us to rule out the idea that our Free Parameters were overfitted meaning that our model wasn’t performing so well because it was just capturing little quirks in the data. We need to take this a step further to use these directly in statistical analyses: we need to show that treating recovered Free Parameters as a continuous scale measure is appropriate. If the following are false, you should be okay to proceed:

  1. Your utility equation applies a nonlinear transformation to your Free Parameters - this means that an increase in one unit of your Free Parameter scale is not equal for all values of the Free Parameter so this analysis is probably inappropriate

  2. The recovery of your Free Parameter that you want to test is independent of the other Free Parameters in your model - if your Free Parameter values only interact with other Free Parameters you will have to apply a transformation to account for this dependency (see tutorial 2 for an example of this)

  3. Remember that assumption of homoscedasticity that we said wasn’t super important? Well, now it is. If your data is heteroscedastic, recovery of Free Parameters could be differentially overfit or underfit at certain values of the Independent Variable which makes these Free Parameters unreliable. You will have to re-estimate your Free Parameters using an alternative estimator (i.e. Robust Maximum Likelihood Estimation or Weighted Least Squares)

Meaningful Condition Differences

So you’ve now shown that your Free Parameters are robust and reliable - what’s left to do other than test? Something really important actually: you have to prove that you are even justified in recovering different Free Parameters in each Condition. Even if you show that Free Parameters are meaningfully different across Conditions, the test results are not valid if you have not proven that the Decisions that Subjects make differ between Conditions.

So, we’re going to go back and create a model which does not differentiate between Conditions - training all of the data at once. Since our demo did not have a design with multiple Conditions, we’ll create a complete example here.

obj_function_aao = function(params, decisions, method = "OLS") {
    Parameter1 = params[1]
    Parameter2 = params[2]

    trialList = #must redefine and also must be of the same length as decisions

    predicted_utility = vector('numeric', length(trialList[,1]))
    observed_utility = vector('numeric', length(trialList[,1]))

    for (k in 1:length(trialList[,1])){
        IV = trialList[k, 1]
        Constant = trialList[k, 2]
        Choices = #something

        Utility = vector('numeric', length(Choices))
        for (n in 1:length(Choices)){
        Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n]))
        }
        predicted_utility[k] = max(Utility)
        observed_utility[k] = Utility[chosen[k]]
    }
    if (method == "OLS"){
        return(sum((predicted_utility - observed_utility)**2))
    } else if (method == "MLE"){
        return(-1 * sum(dnorm(observed_utility, mean = predicted_utility, sd = sd, log = TRUE)))
    }
}

for (i in 1:length(included_subjects)){
    datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
    df = read.csv2(datafile) #this will have variables called IV, Decisions, Condition, and information about the original order of trials (i.e. trialsTask.thisIndex) - it will also have information about the number of blocks

    df$Prediction = vector('numeric', length(df$IV))
    Par1_PerCondition = vector('numeric', length(levels(df$Condition)))
    Par2_PerCondition = vector('numeric', length(levels(df$Condition)))
    SS_PerCondition = vector('numeric', length(levels(df$Condition)))
    Deviance_PerCondition = vector('numeric', length(levels(df$Condition))) #to calculate NLL later

    for (c in 1:length(levels(df$Condition))){

        result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                        lb = lower_bounds, ub = upper_bounds,
                        df = df)

        #Just Added

        closestPoint = which(as.numeric(freeParameters[,1]) == as.numeric(round(result$par[1])) & as.numeric(freeParameters[,2]) == as.numeric(round(result$par[2])))
        Prediction = vector('numeric')
        for (k in 1:length(df$Decisions)){
            Utility = vector('numeric', length(Choices))
            for (n in 1:length(Choices)){
                Utility[n] = utility(parameter1 = results$par[1],
                                    parameter2 = results$par[2],
                                    construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]),
                                    construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])),
                                    construct3 = construct3(df$IV[k], df$Constant[k], Choices[n])
            }
            correct_choice = which(Utility == max(Utility))
            if (length(correct_choice) > 1){
                correct_choice = correct_choice[sample(correct_choice, 1)]
            }
            Prediction[k] = Choices[correct_choice]
        }

        Deviance_PerCondition[c] = dnorm(df$Decision, mean = Prediction)
        SS_PerCondition[c] = sum((df$Decision - Prediction)**2)
        df$Prediction[which(df$Condition == levels(df$Condition)[c])] = Prediction
    }
    NLL_PerCondition = -2 * log(sum(Deviance_PerCondition))

    result = fmincon(obj_function_aao,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                     lb = lower_bounds, ub = upper_bounds,
                     decisions = df$Decisions)

    df$PredictionAAO = vector('numeric')
    for (k in 1:length(df$Decisions)){
        Utility = vector('numeric', length(Choices))
        for (n in 1:length(Choices)){
            Utility[n] = utility(parameter1 = results$par[1],
                                parameter2 = results$par[2],
                                construct1 = construct1(df$IV[k], df$Constant[k], Choices[n]),
                                construct2 = construct2(df$IV[k], df$Constant[k], Choices[n])),
                                construct3 = construct3(df$IV[k], df$Constant[k], Choices[n])
        }
        correct_choice = which(Utility == max(Utility))
        if (length(correct_choice) > 1){
            correct_choice = correct_choice[sample(correct_choice, 1)]
        }
        df$PredictionAAO[k] = Choices[correct_choice]
    }

    NLL_AAO = -2 * log(sum(dnorm(df$Decision, mean = df$Prediction)))
    SS_AAO = sum((df$Decision - df$Prediction)**2)

    subjectData[i, ] = c(included_subjects[i], sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO,
                         Par1_PerCondition, Par2_PerCondition, result$par[1], result$par[2])

    start = length(subjectData[, 1]) + 1
    end = start + length(df$Decisions)
    trialData[start:end, 1] = included_subjects[i]
    trialData[start:end, 2] = df$IV
    trialData[start:end, 3] = df$Constant
    trialData[start:end, 4] = df$Decision
    trialData[start:end, 5] = df$Condition
    trialData[start:end, 6] = df$Prediction
    trialData[start:end, 7] = df$PredictionAAO

}
colnames(subjectData) = c('SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce',
                          'Parameter1_Condition1', ..., 'Parameter2_Condition1', ..., 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce')
#levels(df$Condition) will always be in the same order for all subjects so conditions will be saved in the same columns
colnames(trailData) = c('SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce')

subjectData$AIC_PerCondition = length(df$IV) * log(subjectData$SS_PerCondition/length(df$IV)) + 2 * 2 * (length(levels(df$Condition)))
subjectData$AIC_AllAtOnce = length(df$IV) * log(subjectData$SS_AllAtOnce/length(df$IV)) + 2 * 2 * (length(levels(df$Condition)))

t.test(subjectData$AIC_PerCondition, subjectData$AIC_AllAtOnce, paired = T)
function obj_function_aao = obj_function(params, decisions, method)
    Parameter1 = params(1);
    Parameter2 = params(2);

    trialList = % must redefine and also must be of the same length as decisions

    predicted_utility = zeros(1, length(trialList(:, 1)));
    observed_utility = zeros(1, length(trialList(:, 1)));

    for k = 1:length(trialList(:, 1))
        IV = trialList(k, 1);
        Constant = trialList(k, 2);
        Choices = % something

        Utility = zeros(1, length(Choices));
        for n = 1:length(Choices)
            Utility(n) = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices(n)), construct2(IV, Constant, Choices(n)), construct3(IV, Constant, Choices(n)));
        end
        predicted_utility(k) = max(Utility);
        observed_utility(k) = Utility(chosen(k));
    end

    if strcmp(method, 'OLS')
        obj_function_aao = sum((predicted_utility - observed_utility).^2);
    elseif strcmp(method, 'MLE')
        obj_function_aao = -1 * sum(log(normpdf(observed_utility, predicted_utility, sd)));
    end
end

for i = 1:length(included_subjects)
    datafile = strcat(parentfolder, included_subjects(i), restoffilepath);
    df = readtable(datafile);

    df.Prediction = zeros(1, length(df.IV));
    Par1_PerCondition = zeros(1, length(unique(df.Condition)));
    Par2_PerCondition = zeros(1, length(unique(df.Condition)));
    SS_PerCondition = zeros(1, length(unique(df.Condition)));
    Deviance_PerCondition = zeros(1, length(unique(df.Condition)));

    for c = 1:length(unique(df.Condition))

        result = fmincon(@obj_function, initial_params, [], [], [], [], lower_bounds, upper_bounds, df);

        closestPoint = find(freeParameters(:, 1) == round(result(1)) & freeParameters(:, 2) == round(result(2)));
        Prediction = zeros(1, length(df.Decisions));
        for k = 1:length(df.Decisions)
            Utility = zeros(1, length(Choices));
            for n = 1:length(Choices)
                Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n)));
            end
            correct_choice = find(Utility == max(Utility));
            if length(correct_choice) > 1
                correct_choice = correct_choice(randi(length(correct_choice)));
            end
            Prediction(k) = Choices(correct_choice);
        end

        Deviance_PerCondition(c) = normpdf(df.Decision, Prediction);
        SS_PerCondition(c) = sum((df.Decision - Prediction).^2);
        df.Prediction(df.Condition == unique(df.Condition)(c)) = Prediction;
    end
    NLL_PerCondition = -2 * sum(log(Deviance_PerCondition));

    result = fmincon(@obj_function_aao, initial_params, [], [], [], [], lower_bounds, upper_bounds, df.Decisions);

    df.PredictionAAO = zeros(1, length(df.Decisions));
    for k = 1:length(df.Decisions)
        Utility = zeros(1, length(Choices));
        for n = 1:length(Choices)
            Utility(n) = utility(result(1), result(2), construct1(df.IV(k), df.Constant(k), Choices(n)), construct2(df.IV(k), df.Constant(k), Choices(n)), construct3(df.IV(k), df.Constant(k), Choices(n)));
        end
        correct_choice = find(Utility == max(Utility));
        if length(correct_choice) > 1
            correct_choice = correct_choice(randi(length(correct_choice)));
        end
        df.PredictionAAO(k) = Choices(correct_choice);
    end

    NLL_AAO = -2 * sum(log(normpdf(df.Decision, df.Prediction)));
    SS_AAO = sum((df.Decision - df.Prediction).^2);

    subjectData(i, :) = [included_subjects(i), sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result(1), result(2)];

    start = size(subjectData, 1) + 1;
    endIdx = start + length(df.Decisions) - 1;
    trialData(start:endIdx, 1) = included_subjects(i);
    trialData(start:endIdx, 2) = df.IV;
    trialData(start:endIdx, 3) = df.Constant;
    trialData(start:endIdx, 4) = df.Decision;
    trialData(start:endIdx, 5) = df.Condition;
    trialData(start:endIdx, 6) = df.Prediction;
    trialData(start:endIdx, 7) = df.PredictionAAO;
end

subjectData.Properties.VariableNames = {'SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce'};
trailData.Properties.VariableNames = {'SubjectID', 'IV', 'Constant', 'Decision', 'Prediction_PerCondition', 'Prediction_AllAtOnce'};

subjectData.AIC_PerCondition = length(df.IV) * log(subjectData.modelSS_PerCondition/length(df.IV)) + 2 * 2 * length(unique(df.Condition));
subjectData.AIC_AllAtOnce = length(df.IV) * log(subjectData.modelSS_AllAtOnce/length(df.IV)) + 2 * 2 * length(unique(df.Condition));

ttest(subjectData.AIC_PerCondition, subjectData.AIC_AllAtOnce, 'Paired', true);
def obj_function(params, decisions, method):
    Parameter1 = params[0]
    Parameter2 = params[1]

    trialList = # must redefine and also must be of the same length as decisions

    predicted_utility = np.zeros(len(trialList[:, 0]))
    observed_utility = np.zeros(len(trialList[:, 0]))

    for k in range(len(trialList[:, 0])):
        IV = trialList[k, 0]
        Constant = trialList[k, 1]
        Choices = # something

        Utility = np.zeros(len(Choices))
        for n in range(len(Choices)):
            Utility[n] = utility(Parameter1, Parameter2, construct1(IV, Constant, Choices[n]), construct2(IV, Constant, Choices[n]), construct3(IV, Constant, Choices[n]))
        predicted_utility[k] = max(Utility)
        observed_utility[k] = Utility[chosen[k]]

    if method == 'OLS':
        return np.sum((predicted_utility - observed_utility)**2)
    elif method == 'MLE':
        return -1 * np.sum(np.log(norm.pdf(observed_utility, loc=predicted_utility, scale=sd)))

for i in range(len(included_subjects)):
    datafile = parentfolder + included_subjects[i] + restoffilepath
    df = pd.read_csv(datafile)

    df['Prediction'] = np.zeros(len(df['IV']))
    Par1_PerCondition = np.zeros(len(df['Condition'].unique()))
    Par2_PerCondition = np.zeros(len(df['Condition'].unique()))
    SS_PerCondition = np.zeros(len(df['Condition'].unique()))
    Deviance_PerCondition = np.zeros(len(df['Condition'].unique()))

    for c in range(len(df['Condition'].unique())):

        result = minimize(obj_function, initial_params, args=(df), bounds=list(zip(lower_bounds, upper_bounds)))

        closestPoint = np.where((freeParameters[:, 0] == round(result.x[0])) & (freeParameters[:, 1] == round(result.x[1])))
        Prediction = np.zeros(len(df['Decisions']))
        for k in range(len(df['Decisions'])):
            Utility = np.zeros(len(Choices))
            for n in range(len(Choices)):
                Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n]))
            correct_choice = np.where(Utility == max(Utility))
            if len(correct_choice) > 1:
                correct_choice = correct_choice[np.random.choice(len(correct_choice))]
            Prediction[k] = Choices[correct_choice[0]]

        Deviance_PerCondition[c] = norm.pdf(df['Decision'], Prediction)
        SS_PerCondition[c] = np.sum((df['Decision'] - Prediction)**2)
        df['Prediction'][df['Condition'] == df['Condition'].unique()[c]] = Prediction
    NLL_PerCondition = -2 * np.sum(np.log(Deviance_PerCondition))

    result = minimize(obj_function_aao, initial_params, args=(df['Decisions'],), bounds=list(zip(lower_bounds, upper_bounds)))

    df['PredictionAAO'] = np.zeros(len(df['Decisions']))
    for k in range(len(df['Decisions'])):
        Utility = np.zeros(len(Choices))
        for n in range(len(Choices)):
            Utility[n] = utility(result.x[0], result.x[1], construct1(df['IV'][k], df['Constant'][k], Choices[n]), construct2(df['IV'][k], df['Constant'][k], Choices[n]), construct3(df['IV'][k], df['Constant'][k], Choices[n]))
        correct_choice = np.where(Utility == max(Utility))
        if len(correct_choice) > 1:
            correct_choice = correct_choice[np.random.choice(len(correct_choice))]
        df['PredictionAAO'][k] = Choices[correct_choice[0]]

    NLL_AAO = -2 * np.sum(np.log(norm.pdf(df['Decision'], df['Prediction'])))
    SS_AAO = np.sum((df['Decision'] - df['Prediction'])**2)

    subjectData[i, :] = [included_subjects[i], np.sum(SS_PerCondition), NLL_PerCondition, SS_AAO, NLL_AAO, Par1_PerCondition, Par2_PerCondition, result.x[0], result.x[1]]

    start = subjectData.shape[0] + 1
    endIdx = start + len(df['Decisions']) - 1
    trialData[start:endIdx, 0] = included_subjects[i]
    trialData[start:endIdx, 1] = df['IV']
    trialData[start:endIdx, 2] = df['Constant']
    trialData[start:endIdx, 3] = df['Decision']
    trialData[start:endIdx, 4] = df['Condition']
    trialData[start:endIdx, 5] = df['Prediction']
    trialData[start:endIdx, 6] = df['PredictionAAO']

subjectData.columns = ['SubjectID', 'modelSS_PerCondition', 'modelNLL_PerCondition', 'modelSS_AllAtOnce', 'modelNLL_AllAtOnce', 'Parameter1_Condition1', 'Parameter2_Condition1', 'Parameter1_Condition2', 'Parameter2_Condition2', 'Parameter1_AllAtOnce', 'Parameter2_AllAtOnce']

subjectData['AIC_PerCondition'] = len(df['IV']) * np.log(subjectData['modelSS_PerCondition'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique())
subjectData['AIC_AllAtOnce'] = len(df['IV']) * np.log(subjectData['modelSS_AllAtOnce'] / len(df['IV'])) + 2 * 2 * len(df['Condition'].unique())

ttest_rel(subjectData['AIC_PerCondition'], subjectData['AIC_AllAtOnce'])
Testing a Modulatory Hypothesis

Now, if we’ve shown that people do indeed make different Decisions in each Condition and are convinced that our Free Parameters can be trusted as a valid continuous measure of Subjects’ preferences, we can now test our modulatory hypotheses. To reiterate, these are hypotheses about directional, group-level differences: you’ve already shown that preferences change and now you want to show how they specifically change as a function of Condition. Occasionally, if you have more than two Conditions, you might also want to first do an omnibus test for Condition effects and then you might also do post-hoc tests after. We’ll show you how to do both of these.

t.test(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, paired = T) #change in parameter 1 between condition 1 and 2

#now we need to put this data in long format to do a linear mixed effects model

omnibusData = data.frame(c(subjectData$Parameter1_Condition1, subjectData$Parameter1_Condition2, subjectData$Parameter1_Condition3),
                         rep(c('Condition 1', 'Condition 2', 'Condition 3'), each = length(subjectData$SubjectID)),
                         rep(subjectData$SubjectID, times = 3))
colnames(omnibusData) = c('Parameter1', 'Condition', 'SubjectID')

ombnibusModulatoryEffect = lmer(data = ombnibusData, Parameter1 ~ Condition + (1 | SubjectID)) #our omnnibus test

summary(ombnibusModulatoryEffect) #if omnibus test is signficiant, proceed to post hoc tests below

library(emmeans)
Parameter1_PostHocModulationEffect = emmeans(ombnibusModulatoryEffect, "Condition")
summary(pairs(Parameter1_PostHocModulationEffect)) #post hoc pairwise test across condition
% t-test
[h, p, ci, stats] = ttest(subjectData.Parameter1_Condition1, subjectData.Parameter1_Condition2, 'paired', true);

% Put data in long format
omnibusData = table([subjectData.Parameter1_Condition1; subjectData.Parameter1_Condition2; subjectData.Parameter1_Condition3], ...
    repelem({'Condition 1', 'Condition 2', 'Condition 3'}, length(subjectData.SubjectID)), ...
    repmat(subjectData.SubjectID, 1, 3), ...
    'VariableNames', {'Parameter1', 'Condition', 'SubjectID'});

% Fit linear mixed effects model
omnibusModulatoryEffect = fitlme(omnibusData, 'Parameter1 ~ Condition + (1|SubjectID)');

% Display summary
disp(omnibusModulatoryEffect);

% Perform post hoc tests
Parameter1_PostHocModulationEffect = emmeans(omnibusModulatoryEffect, 'Condition');
disp(pairs(Parameter1_PostHocModulationEffect));
from statsmodels.formula.api import mixedlm
from pingouin import pairwise_tukey

# t-test
t_stat, p_val = sm.stats.ttest_rel(subjectData['Parameter1_Condition1'], subjectData['Parameter1_Condition2'])

# Put data in long format
data_dict = {
    'Parameter1': subjectData['Parameter1_Condition1'].append([subjectData['Parameter1_Condition2'], subjectData['Parameter1_Condition3']]),
    'Condition': ['Condition 1', 'Condition 2', 'Condition 3'] * len(subjectData['SubjectID']),
    'SubjectID': list(subjectData['SubjectID']) * 3
}
omnibusData = pd.DataFrame(data_dict)

# Fit linear mixed effects model
omnibusModulatoryEffect = mixedlm('Parameter1 ~ Condition', omnibusData, groups=omnibusData['SubjectID']).fit()

# Display summary
print(omnibusModulatoryEffect.summary())

# Perform post hoc tests
posthoc_results = pairwise_tukey(data=omnibusData, dv='Parameter1', between='Condition')
print(posthoc_results)
Using Categorical Clusters to Test Modulatory Hypotheses

Sometimes, treating Free Parameters as an outcome measure doesn’t tell us everything that we want to know about the strategies that people use to make Decisions. This is one time where using a priori clustering can come in handy: rather than saying Free Parameter 1 changed a certain amount between Condition 1 and Condition 2 and that Free Parameter 2 changed a certain amount, we summarize behavioral patterns in a way that accounts for both at once and also tells us something about how the prevalence of each strategy changed betweeen Conditions. To do this, we want to use a Chi-Square Test.

subjectData$Strategy_Condition1 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition2 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition3 = vector('character', length(subjectData$SubjectID))
subjectData$Strategy_Condition4 = vector('character', length(subjectData$SubjectID))
for (i in 1:length(subjectData$SubjectID)){
    subjectData$Strategy_Condition1[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition1[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition1[i], 2))]
    subjectData$Strategy_Condition2[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition2[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition2[i], 2))]
    subjectData$Strategy_Condition3[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition3[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition3[i], 2))]
    subjectData$Strategy_Condition4[i] = freeParameters$Strategy[which(round(freeParameters$Parameter1, 2) == round(subjectData$Parameter1_Condition4[i], 2) & round(freeParameters$Parameter2, 2) == round(subjectData$Parameter2_Condition4[i], 2))]
}

conditions = c('condition1', 'condition2', 'condition3', 'condition4')
strategies = c('strategy1', 'strategy2', 'strategy3')
group_by_condition = data.frame()
columns = #columns of subjectData where Condition1-4 are kept

for (i in 1:4){
    for (j in 1:3){
        group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j])
    }
}
colnames(group_by_condition) = strategies
rownames(group_by_condition) = conditions
chisq.test(group_by_condition)
subjectData.Strategy_Condition1 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition2 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition3 = cell(1, length(subjectData.SubjectID));
subjectData.Strategy_Condition4 = cell(1, length(subjectData.SubjectID));
for i = 1:length(subjectData.SubjectID)
    subjectData.Strategy_Condition1{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition1(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition1(i), 2));
    subjectData.Strategy_Condition2{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition2(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition2(i), 2));
    subjectData.Strategy_Condition3{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition3(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition3(i), 2));
    subjectData.Strategy_Condition4{i} = freeParameters.Strategy(round(freeParameters.Parameter1, 2) == round(subjectData.Parameter1_Condition4(i), 2) & round(freeParameters.Parameter2, 2) == round(subjectData.Parameter2_Condition4(i), 2));
end

conditions = {'condition1', 'condition2', 'condition3', 'condition4'};
strategies = {'strategy1', 'strategy2', 'strategy3'};
group_by_condition = zeros(4, 3);
columns = % columns of subjectData where Condition1-4 are kept

for i = 1:4
    for j = 1:3
        group_by_condition(i, j) = sum(strcmp(subjectData.(columns{i}), strategies{j}));
    end
end

group_by_condition = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions);
chisqtest(group_by_condition)
from scipy.stats import chi2_contingency

subjectData['Strategy_Condition1'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition2'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition3'] = [None] * len(subjectData['SubjectID'])
subjectData['Strategy_Condition4'] = [None] * len(subjectData['SubjectID'])

for i in range(len(subjectData['SubjectID'])):
    subjectData.at[i, 'Strategy_Condition1'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition1'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition1'][i], 2))]
    subjectData.at[i, 'Strategy_Condition2'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition2'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition2'][i], 2))]
    subjectData.at[i, 'Strategy_Condition3'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition3'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition3'][i], 2))]
    subjectData.at[i, 'Strategy_Condition4'] = freeParameters['Strategy'][(np.round(freeParameters['Parameter1'], 2) == np.round(subjectData['Parameter1_Condition4'][i], 2)) & (np.round(freeParameters['Parameter2'], 2) == np.round(subjectData['Parameter2_Condition4'][i], 2))]

conditions = ['condition1', 'condition2', 'condition3', 'condition4']
strategies = ['strategy1', 'strategy2', 'strategy3']
group_by_condition = pd.DataFrame(0, index=conditions, columns=strategies)
columns = []  # columns of subjectData where Condition1-4 are kept

for i in range(4):
    for j in range(3):
        group_by_condition.iloc[i, j] = np.sum(subjectData[columns[i]] == strategies[j])

chi2, p, _, _ = chi2_contingency(group_by_condition)

Tutorials

Tutorial 1 - van Baar, Chang, & Sanfey, 2019

Not Applicable to this Data.

Tutorial 2 - Galvan & Sanfey, 2024

Preliminary Validation

Theta is a candidate for such an analysis - it is estimated uniquely with respect to Phi and we did not see any significant violations of homodscedasticity in our data.

Phi, on the other hand, is estimated depednent on Theta - remember that both appearances of Phi in our utility equation are where it interacts with Theta. Thus, we have to account for the fact that Phi is not meaningful at high values of Theta in order to use it in this way.

### Robust and Reliable Free Parameters

subjectData$phiMeritAdjusted = (subjectData$phiMerit - 0.5) #subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData$phiMeritAdjusted = (subjectData$phiMeritAdjusted * (1 - subjectData$thetaMerit)) #the greater theta is, the closer to 0 this adjusted value is
subjectData$phiMeritAdjusted = subjectData$phiMeritAdjusted + 0.5 #returning to initial scale value (i.e. 0 to 1 with 0.5 beingthe indifference point)

subjectData$phiEntitlementAdjusted = ((subjectData$phiEntitlement - 0.5) * (1 - subjectData$thetaEntitlement)) + 0.5
subjectData$phiCorruptionAdjusted = ((subjectData$phiCorruption - 0.5) * (1 - subjectData$thetaCorruption)) + 0.5
subjectData$phiLuckAdjusted = ((subjectData$phiLuck - 0.5) * (1 - subjectData$thetaLuck)) + 0.5

### Meaningful Condition Differences

subjectData$SSAAO = vector('numeric')

for (i in 1:length(included_subjects)){
    datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
    df = read.csv2(datafile)
    df = df[, c(49, 40:48, 33)]
    df$redistributionRate = df$redistributionRate/100 #converting to a decimal from a percent
    result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                     lb = lower_bounds, ub = upper_bounds,
                     df = df)

    df$PredictionAAO = vector('numeric')
    df$ObservedOutcome = new_value(df$myself, df$redistributionRate)
    for (k in 1:length(df$redistributionRate)){
        Utility = vector('numeric', length(Choices))
        for (n in 1:length(Choices)){
            Utility[n] = utility(theta = result,
                                 phi = phiPerCondition[j],
                                 Equity = equity(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]),
                                 Equality = equality(new_value(df[k, 1:10], choices[n]), df[k, 1:10], choices[n]),
                                 Payout = payout(new_value(df[k, 1], choices[n]), df[k, 1], choices[n]))
        }
        correct_choice = which(Utility == max(Utility))
        df$PredictionAAO[k] = new_value(df$myself[k], Choices[correct_choice[sample(length(correct_choice), 1)]])
    }
    subjectData$SSAAO[i] = sum((df$PredictionAAO - df$ObservedOutcome)**2)
}
N = length(df[, 1])
k = 2

subjectData$AICAAO = N * log(subjectData$SSAAO/N) + 2*2
subjectData$SSCond = subjectData$SSMerit + subjectData$SSEntitlement + subjectData$SSCorruption + subjectData$SSLuck
subjectData$AICCond = N * log(subjectData$SSCond/N) + 2*2*length(conditions)

excluded = which(subjectData$SSCond == 0)
t.test(subjectData$AICCond[-excluded], subjectData$AICAAO, paired = T, alternative = 'less')
% Robust and Reliable Free Parameters

subjectData.phiMeritAdjusted = (subjectData.phiMerit - 0.5); % subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted * (1 - subjectData.thetaMerit); % the greater theta is, the closer to 0 this adjusted value is
subjectData.phiMeritAdjusted = subjectData.phiMeritAdjusted + 0.5; % returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point)

subjectData.phiEntitlementAdjusted = ((subjectData.phiEntitlement - 0.5) * (1 - subjectData.thetaEntitlement)) + 0.5;
subjectData.phiCorruptionAdjusted = ((subjectData.phiCorruption - 0.5) * (1 - subjectData.thetaCorruption)) + 0.5;
subjectData.phiLuckAdjusted = ((subjectData.phiLuck - 0.5) * (1 - subjectData.thetaLuck)) + 0.5;

% Meaningful Condition Differences

subjectData.SSAAO = zeros(1, length(included_subjects));

for i = 1:length(included_subjects)
    datafile = strcat(parentfolder, included_subjects{i}, restoffilepath); % produces a character vector 'parentfolder/included_subjects[i]**.filetype'
    df = readtable(datafile);
    df = df(:, [49, 40:48, 33]);
    df.redistributionRate = df.redistributionRate / 100; % converting to a decimal from a percent

    result = fmincon(@(params) obj_function(params, df), initial_params, [], [], [], [], lower_bounds, upper_bounds);

    df.PredictionAAO = zeros(height(df), 1);
    df.ObservedOutcome = new_value(df.myself, df.redistributionRate);

    for k = 1:height(df.redistributionRate)
        Utility = zeros(1, length(Choices));

        for n = 1:length(Choices)
            Utility(n) = utility(result, phiPerCondition(j), ...
                equity(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ...
                equality(new_value(table2array(df(k, 1:10)), Choices(n)), table2array(df(k, 1:10)), Choices(n)), ...
                payout(new_value(table2array(df(k, 1)), Choices(n)), table2array(df(k, 1)), Choices(n)));
        }

        correct_choice = find(Utility == max(Utility));
        df.PredictionAAO(k) = new_value(df.myself(k), Choices(correct_choice(randi(length(correct_choice)))));
    }

    subjectData.SSAAO(i) = sum((df.PredictionAAO - df.ObservedOutcome).^2);
end

N = height(df);
k = 2;

subjectData.AICAAO = N * log(subjectData.SSAAO / N) + 2 * 2;
subjectData.SSCond = subjectData.SSMerit + subjectData.SSEntitlement + subjectData.SSCorruption + subjectData.SSLuck;
subjectData.AICCond = N * log(subjectData.SSCond / N) + 2 * 2 * length(conditions);

excluded = find(subjectData.SSCond == 0);
ttest(subjectData.AICCond(~excluded), subjectData.AICAAO, 'Tail', 'left', 'Alpha', alpha);
# Robust and Reliable Free Parameters

subjectData['phiMeritAdjusted'] = (subjectData['phiMerit'] - 0.5)  # subtracting 0.5 puts 0 where phi weights on equality-seeking and equity-seeking in the same way (i.e. indifference point)
subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] * (1 - subjectData['thetaMerit'])  # the greater theta is, the closer to 0 this adjusted value is
subjectData['phiMeritAdjusted'] = subjectData['phiMeritAdjusted'] + 0.5  # returning to initial scale value (i.e. 0 to 1 with 0.5 being the indifference point)

subjectData['phiEntitlementAdjusted'] = ((subjectData['phiEntitlement'] - 0.5) * (1 - subjectData['thetaEntitlement'])) + 0.5
subjectData['phiCorruptionAdjusted'] = ((subjectData['phiCorruption'] - 0.5) * (1 - subjectData['thetaCorruption'])) + 0.5
subjectData['phiLuckAdjusted'] = ((subjectData['phiLuck'] - 0.5) * (1 - subjectData['thetaLuck'])) + 0.5

# Meaningful Condition Differences

subjectData['SSAAO'] = np.zeros(len(included_subjects))

for i in range(len(included_subjects)):
    datafile = parentfolder + included_subjects[i] + restoffilepath  # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
    df = pd.read_csv(datafile)
    df = df.iloc[:, [48, 39, 40, 41, 42, 43, 44, 45, 46, 47, 32]]
    df['redistributionRate'] = df['redistributionRate'] / 100  # converting to a decimal from a percent

    result = minimize(lambda params: obj_function(params, df), initial_params, bounds=list(zip(lower_bounds, upper_bounds)))

    df['PredictionAAO'] = np.zeros(len(df))
    df['ObservedOutcome'] = new_value(df['myself'], df['redistributionRate'])

    for k in range(len(df['redistributionRate'])):
        Utility = np.zeros(len(Choices))

        for n in range(len(Choices)):
            Utility[n] = utility(result.x, phiPerCondition[j],
                                equity(new_value(df.iloc[k, 0:9], Choices[n]), df.iloc[k, 0:9], Choices[n]),
                                equality(new_value(df.iloc[k, 0:10], Choices[n]), df.iloc[k, 0:9], Choices[n]),
                                payout(new_value(df.iloc[k, 0], Choices[n]), df.iloc[k, 0], Choices[n]))

        correct_choice = np.argmax(Utility)
        df.at[k, 'PredictionAAO'] = new_value(df['myself'][k], Choices[correct_choice[np.random.randint(len(correct_choice))]])

    subjectData['SSAAO'][i] = np.sum((df['PredictionAAO'] - df['ObservedOutcome'])**2)

N = len(df)
k = 2

subjectData['AICAAO'] = N * np.log(subjectData['SSAAO'] / N) + 2 * 2
subjectData['SSCond'] = subjectData['SSMerit'] + subjectData['SSEntitlement'] + subjectData['SSCorruption'] + subjectData['SSLuck']
subjectData['AICCond'] = N * np.log(subjectData['SSCond'] / N) + 2 * 2 * len(conditions)

excluded = np.where(subjectData['SSCond'] == 0)[0]
ttest_rel(subjectData['AICCond'][~excluded], subjectData['AICAAO'], alternative='less', nan_policy='omit')
Testing a Modulatory Hypothesis
omnibusData = data.frame(Theta = c(subjectData$thetaMerit, subjectData$thetaEntitlement, subjectData$thetaCorruption, subjectData$thetaLuck),
                         Phi = c(subjectData$phiMeritAdjusted, subjectData$phiEntitlementAdjusted, subjectData$phiCorruptionAdjusted, subjectData$phiLuckAdjusted),
                         Condition = rep(conditions, each = length(subjectData$SubjectID)),
                         SubjectID = rep(subjectData$SubjectID, times = length(conditions)))

library(lme4)

omnibusModulatoryEffectTheta = lmer(data = omnibusData, Theta ~ Condition + (1 | SubjectID))
omnibusModulatoryEffectPhi = lmer(data = omnibusData, Phi ~ Condition + (1 | SubjectID))
Theta = [subjectData.thetaMerit; subjectData.thetaEntitlement; subjectData.thetaCorruption; subjectData.thetaLuck];
Phi = [subjectData.phiMeritAdjusted; subjectData.phiEntitlementAdjusted; subjectData.phiCorruptionAdjusted; subjectData.phiLuckAdjusted];
Conditions = repmat(conditions', length(subjectData.SubjectID), 1);
SubjectID = repmat(subjectData.SubjectID, length(conditions), 1);

omnibusData = table(Theta, Phi, Conditions, SubjectID);

omnibusModulatoryEffectTheta = fitlme(omnibusData, 'Theta ~ Condition + (1|SubjectID)');
omnibusModulatoryEffectPhi = fitlme(omnibusData, 'Phi ~ Condition + (1|SubjectID)');
# Requires 'statsmodels' - you can ``pip install statsmodels u``

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assuming omnibusData is a pandas DataFrame
omnibusData = pd.DataFrame({
    'Theta': np.concatenate([subjectData['thetaMerit'], subjectData['thetaEntitlement'], subjectData['thetaCorruption'], subjectData['thetaLuck']]),
    'Phi': np.concatenate([subjectData['phiMeritAdjusted'], subjectData['phiEntitlementAdjusted'], subjectData['phiCorruptionAdjusted'], subjectData['phiLuckAdjusted']]),
    'Condition': np.tile(conditions, len(subjectData['SubjectID'])),
    'SubjectID': np.repeat(subjectData['SubjectID'], len(conditions))
})

# Fit mixed-effects models
omnibusModulatoryEffectTheta = smf.mixedlm("Theta ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit()
omnibusModulatoryEffectPhi = smf.mixedlm("Phi ~ Condition", omnibusData, groups=omnibusData["SubjectID"]).fit()

# Print model summaries
print(omnibusModulatoryEffectTheta.summary())
print(omnibusModulatoryEffectPhi.summary())
Using Categorical Clusters to Test Modulatory Hypotheses
subjectData$strategyMerit = as.factor(subjectData$strategyMerit)
subjectData$strategyEntitlement = as.factor(subjectData$strategyEntitlement)
subjectData$strategyCorruption = as.factor(subjectData$strategyCorruption)
subjectData$strategyLuck = as.factor(subjectData$strategyLuck)
strategies = levels(subjectData)

columns = 10:13 #columns of subject data correspond, in order, to the conditions in the conditions vector

for (i in 1:length(conditions)){
    for (j in 1:length(strategies)){
        group_by_condition[i, j] = sum(subjectData[, columns[i]] == strategies[j])
    }
}

colnames(group_by_condition) = strategies
rownames(group_by_condition) = conditions
chisq.test(group_by_condition)
% Assuming subjectData is a table with columns strategyMerit, strategyEntitlement, strategyCorruption, strategyLuck,
% and conditions is a cell array containing condition names

subjectData.strategyMerit = categorical(subjectData.strategyMerit);
subjectData.strategyEntitlement = categorical(subjectData.strategyEntitlement);
subjectData.strategyCorruption = categorical(subjectData.strategyCorruption);
subjectData.strategyLuck = categorical(subjectData.strategyLuck);

strategies = categories(subjectData.strategyMerit);  % Assuming all strategy columns have the same set of levels

columns = 10:13;  % columns of subject data correspond, in order, to the conditions in the conditions vector

group_by_condition = zeros(length(conditions), length(strategies));

for i = 1:length(conditions)
    for j = 1:length(strategies)
        group_by_condition(i, j) = sum(subjectData{:, columns(i)} == strategies{j});
    end
end

group_by_condition_table = array2table(group_by_condition, 'VariableNames', strategies, 'RowNames', conditions);

[p, chi2stat, ~, ~] = chi2test(group_by_condition);
from scipy.stats import chi2_contingency

# Assuming subjectData is a pandas DataFrame
subjectData['strategyMerit'] = subjectData['strategyMerit'].astype('category')
subjectData['strategyEntitlement'] = subjectData['strategyEntitlement'].astype('category')
subjectData['strategyCorruption'] = subjectData['strategyCorruption'].astype('category')
subjectData['strategyLuck'] = subjectData['strategyLuck'].astype('category')

strategies = subjectData['strategyMerit'].cat.categories  # Assuming all strategy columns have the same set of levels

columns = list(range(9, 13))  # columns of subject data correspond, in order, to the conditions in the conditions vector

group_by_condition = np.zeros((len(conditions), len(strategies)))

for i, condition in enumerate(conditions):
    for j, strategy in enumerate(strategies):
        group_by_condition[i, j] = np.sum(subjectData.iloc[:, columns[i]] == strategy)

group_by_condition = pd.DataFrame(group_by_condition, index=conditions, columns=strategies)

chi2, p, _, _ = chi2_contingency(group_by_condition)

print("Chi-squared statistic:", chi2)
print("P-value:", p)

Tutorial 3 - Crockett et al., 2014

Testing a Modulatory Hypothesis
t.test(as.numeric(subjectData$Kappa_Self[-20]), as.numeric(subjectData$Kappa_Other[-20]), paired = T)
mean(as.numeric(subjectData$Kappa_Self[-20]))
mean(as.numeric(subjectData$Kappa_Other[-20]))

Tutorial 4 - Li et al., 2022

Not Applicable to this Data.