Validate the Best Model
*************

Lesson
================

.. article-info::
    :avatar: UCLA_Suit.jpg
    :avatar-link: https://www.decisionneurosciencelab.com/elijahgalvan
    :author: Elijah Galvan
    :date: September 1, 2023
    :read-time: 17 min read
    :class-container: sd-p-2 sd-outline-muted sd-rounded-1

Goal During this Stage
---------------

We want to see if the model is actually valid: 
there are two things that we need to prove in order for our model to be valid: first, we have to show that our model accurately captures the data generation process and, second, we have to show that our parameter recovery process is robust.

How to Achieve this Goal
------------

.. dropdown:: Assessing Accuracy in Capturing the Data Generation Process

    In order to show that our model accurately captures the data generation process, we have to show that the model predicts behavior equally well at all values. 
    Essentially, we must test four assumptions about how our model predictions relate to the behavioral data that they were trained upon: linearity, normality of error, independence of error, and homoscedasticty. 
    These are also four of the five assumptions of linear regression (the fifth is the independence of :bdg-primary:`Independent Variables`, but we don't rely on this assumption).

    .. Note:: 

        The assumptions of our model are (with one exception) the assumptions of linear regression, but not the assumptions of mixed effects regression despite the fact that we obviously are using repeated measures - why is this?

        Well, the answer is essentially the same answer as why this fifth assumption - the independence of :bdg-primary:`Independent Variables` - is not included. 
        According to our model, all of the ways that :bdg-success:`Subjects` can be different are encapsulated in the :bdg-success:`Free Parameters`. 
        Another way to phrase the assumption of the independence of :bdg-primary:`Independent Variables` is the independence of observations. 
        Linear modeling requires that observations are not related, so if :bdg-primary:`Independent Variables` are collinear, then the :bdg-danger:`Dependent Variable` are predicted by both :bdg-primary:`Independent Variables` which means that the observations are dependent upon each other. 
        Thus since we don't know what actually causes the :bdg-danger:`Dependent Variable` to change neither :bdg-primary:`Independent Variable` can be used to predict the variance explained by both. 
        In a similar vein, linear mixed effects models control for the independence of observations (i.e. produced by the same :bdg-success:`Subject`) using random effects. 

        See, where we're going here? 
        Since our model asserts that all of the ways that :bdg-success:`Subjects` can be different are encapsulated in the :bdg-success:`Free Parameters`, we essentially assert that the way our model predicts :bdg-danger:`Decisions` does not differ across :bdg-success:`Subjects`. 
        We have one :bdg-primary:`Independent Variable` - the model predicted :bdg-danger:`Decisions`. 
        According to our model, *nothing else* can predict :bdg-danger:`Decisions`. 
        However, we're not going to blindly assume this - we'll also verify that this is true.

    .. dropdown:: Model Performance

        .. tab-set::

            .. tab-item:: Plain English

                Before we start checking assumptions, we need to do a simple regression: using the predicted :bdg-danger:`Decisions` of our best model to predict actual :bdg-danger:`Decisions`.
                So, we will have to use the trial-by-trial data to accomplish this.

                What we want to see first is the R-squared: the multiple R-squared should always be essentially equal to the adjusted R-squared since there are going to be a lot of observations. 
                The R-squared tells us how much variance our model explains: any model with an R-squared above 0.70 is very good. 
                Particularly for models without noise parameters, it means that you are really capturing the data generation process and essentially rules out the possibility that you are missing an additionally strategy or additional motive. 
                As long as you don't find any assumptions to be grotesquely violated, you should proceed and feel very confident that your model is a good representation of the data generation process. 

                An R-squared between 0.5 and 0.7 is acceptable. 
                Although this good model performance by objective standards, these tasks are designed to elicit very well-defined, consistent preferences with a very high signal to noise ratio. 
                You might be missing an additional strategy in your model - so use the steps below and see if this is the case!
                If not, you should proceed with apprehension but it does not immediately invalidate any conclusions that you want to draw. 

                An R-squared of less than 0.5 is bad. 
                Either your task-and-experimental procedure was 1) poorly thought out, producing too high of a signal-to-noise ratio, or 2) your model does not account for one or more preferences that govern behavior in your task.
                Ask yourself what the cause is: if it is a problem with the task either consider adding noise or bias parameters, find a different kind of model to analyze your data, or otherwise throw the data out because you cannot analyze it with a utility model. 
                If it is the latter then identify :bdg-success:`Subjects` who are fit particularly poorly by the model, identify the behavioral trend, and try to think about a value-based preference that could lead one to behave in such a way. 
                We can try this on all of these :bdg-success:`Subjects` - if this doesn't offer clear insight then we might want to look at individual subjects.

            .. tab-item:: R

                ::
                    
                    modelPredictions = lm(data = trialData, Decisions ~ Prediction)
                    summary(modelPredictions) # R-squared

                    #we can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
                    qplot(x = subjectData$modelAIC, geom = 'density')

                    #if we're missing something let's identify worst explained quartile of subjects according to our model            
                    worstExplained = which(subjectData$modelAIC > as.numeric(summary(subjectData$modelAIC)[5]))
                    qplot(data = trialData[which(trialData$SubjectID == subjectData$SubjectID[worstExplained]), x = IV, y = DV, group = trialData$SubjectID]) + geom_smooth() #a loess line for all subjects

            .. tab-item:: MatLab

                ::

                    modelPredictions = fitlm(trialData, 'Decisions ~ Prediction');
                    summary(modelPredictions); % R-squared

                    % We can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
                    figure;
                    ksdensity(subjectData.modelAIC);

                    % If we're missing something, let's identify the worst-explained quartile of subjects according to our model            
                    worstExplained = find(subjectData.modelAIC > str2double(summary(subjectData.modelAIC).Variables{5, 1}));
                    figure;
                    for i = 1:length(worstExplained)
                        subjectIndex = worstExplained(i);
                        subplot(length(worstExplained), 1, i);
                        scatter(trialData.IV(trialData.SubjectID == subjectData.SubjectID(subjectIndex)), trialData.DV(trialData.SubjectID == subjectData.SubjectID(subjectIndex)), 'filled');
                        hold on;
                        lsline;
                        hold off;
                    end


            .. tab-item:: Python

                ::

                    import statsmodels.api as sm
                    import seaborn as sns
                    import matplotlib.pyplot as plt

                    modelPredictions = sm.OLS(trialData['Decisions'], sm.add_constant(trialData['Prediction'])).fit()
                    print(modelPredictions.summary())  # R-squared

                    # We can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
                    sns.kdeplot(subjectData['modelAIC'])

                    # If we're missing something, let's identify the worst-explained quartile of subjects according to our model
                    worstExplained = subjectData[subjectData['modelAIC'] > float(subjectData['modelAIC'].describe()['75%'])].index
                    plt.figure(figsize=(8, 6))
                    for i, subjectIndex in enumerate(worstExplained):
                        plt.subplot(len(worstExplained), 1, i + 1)
                        sns.scatterplot(x=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'IV'],
                                        y=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'DV'], 
                                        hue=trialData['SubjectID'])
                        sns.regplot(x=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'IV'],
                                    y=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'DV'], 
                                    scatter=False)
                    plt.show()


    .. dropdown:: Visually Checking Assumptions

        .. dropdown:: Linearity

            .. tab-set::

                .. tab-item:: Plain English

                    We need to ensure that the relationship between model predictions of :bdg-danger:`Decisions` and observed :bdg-danger:`Decisions` is linear. 
                    If this relationship is flat (slope is 0) rather than linear (slope is 1) then our model is doing terribly at predicting :bdg-danger:`Decisions`, essentially making predictions completely at chance-level.
                    Thus our model would not be capturing the data generation process and, most likely, our :bdg-success:`Free Parameters` are fitted to noise at all values.
                    So let's plot a regression line with model predictions on the x-axis and observed :bdg-danger:`Decisions` on the y-axis. 
                    The slope should be essentially 1.

                .. tab-item:: R

                    ::
                        
                        qplot(x = trialData$Prediction, y = trialData$Decision, geom = 'smooth') + 
                            geom_abline(slope = 1, intercept = 0)
                        
                .. tab-item:: MatLab

                    ::

                        scatter(trialData.Prediction, trialData.Decision);
                        hold on;
                        loess_smooth = fitloess(trialData.Prediction, trialData.Decision);
                        plot(loess_smooth);
                        plot(trialData.Prediction, trialData.Prediction, 'k--');
                        hold off;


                .. tab-item:: Python

                    ::

                        plt.scatter(trialData['Prediction'], trialData['Decision'])
                        lowess = sm.nonparametric.lowess(trialData['Decision'], trialData['Prediction'])
                        plt.plot(lowess[:, 0], lowess[:, 1], 'r-')
                        plt.plot(trialData['Prediction'], trialData['Prediction'], 'k--')
                        plt.show()


        .. dropdown:: Normality of Error

            .. tab-set::

                .. tab-item:: Plain English

                    We need to ensure that prediction errors are normally distributed. 
                    If prediction errors are skewed, then our model is making making more underpredictions (negative) or overpredictions (positive) of :bdg-danger:`Decisions`.
                    If the distribution is kurtosed (i.e. too skinny or too fat) then we're more or less likely to have extreme errors in predicting :bdg-danger:`Decisons` compared to a normal distribution. 
                    We can create a density plot of prediction errors (i.e. the difference between model predictions of :bdg-danger:`Decisions` and observed :bdg-danger:`Decisions`) and see if it follows a bell-curve.
                    We can create a density plot with a normal distribution where the standard deviation is the standard deviation of prediction errors to check this.

                .. tab-item:: R

                    ::
                        
                        normvals = rnorm(1000, mean = 0, sd = sd(trialData$Prediction - trialData$Decision))
                        qplot(x = trialData$Prediction - trialData$Decision, geom = 'density', bw = 1, color = 'Actual') + 
                            geom_density(aes(x = normvals, color = 'Predicted'), bw = 1)
                        
                .. tab-item:: MatLab

                    ::

                        rng('default'); % Set random number generator seed for reproducibility
                        normvals = normrnd(0, std(trialData.Prediction - trialData.Decision), 1000, 1);

                        figure;
                        hold on;
                        histogram(trialData.Prediction - trialData.Decision, 'Normalization', 'pdf', 'EdgeColor', 'b', 'FaceColor', 'none');
                        histogram(normvals, 'Normalization', 'pdf', 'EdgeColor', 'r', 'FaceColor', 'none');
                        legend('Actual', 'Predicted');


                .. tab-item:: Python

                    ::

                        np.random.seed(0)  # Set random number generator seed for reproducibility
                        normvals = np.random.normal(0, np.std(trialData['Prediction'] - trialData['Decision']), 1000)

                        plt.hist(trialData['Prediction'] - trialData['Decision'], bins='auto', density=True, color='b', edgecolor='b', alpha=0.5)
                        plt.hist(normvals, bins='auto', density=True, color='r', edgecolor='r', alpha=0.5)
                        plt.legend(['Actual', 'Predicted'])
                        plt.show()


        .. dropdown:: Independence of Error

            .. tab-set::

                .. tab-item:: Plain English

                    Next, we need to ensure that our model's prediction errors of :bdg-danger:`Decisions` are not confounded with values of :bdg-primary:`Independent Variable`. 
                    If our model predicts :bdg-danger:`Decisions` worse or better at certain values of :bdg-primary:`Independent Variable` compared to others, then our recovery of :bdg-success:`Free Parameters` is influenced disproportionately by :bdg-danger:`Decisions` made when :bdg-primary:`Independent Variable` have a certain value. 
                    Thus, our :bdg-success:`Free Parameters` would be overfit for certain values and underfit for other values. 
                    We can check this assumption by creating loess line with our :bdg-primary:`Independent Variable` on the x-axis and the model prediction errors on the y-axis.
                    This slope should be 0 with an intercept of 0.

                .. tab-item:: R

                    ::
                        
                        qplot(x = trialData$IV, y = (trialData$Prediction-trialData$Decision), geom = 'smooth')
                        
                .. tab-item:: MatLab

                    ::

                        x = trialData.IV;
                        y = trialData.Prediction - trialData.Decision;
                        loess = fit(x, y, 'loess');
                        scatter(x, y);
                        hold on;
                        plot(loess, x, y);
                        hold off;

                .. tab-item:: Python

                    ::

                        x = trialData['IV']
                        y = trialData['Prediction'] - trialData['Decision']

                        lowess = sm.nonparametric.lowess(y, x, frac=0.25)  # Adjust the frac parameter as needed
                        plt.scatter(x, y)
                        plt.plot(lowess[:, 0], lowess[:, 1], 'r-')
                        plt.show()

        .. dropdown:: Homoscedasticty

            .. tab-set::

                .. tab-item:: Plain English

                    Finally, we need to ensure that variance in model prediction errors does not change as a function of an :bdg-primary:`Independent Variable`. 
                    Of all of the assumptions, this one is the least problematic if violated - essentially it indicates that our model has less predictive accuracy at certain values of the :bdg-primary:`Independent Variable`. 
                    This is a bigger issue for regression models because it can make Confidence Intervals of regression coefficients which are too narrow - we don't have this issue unless you're doing means testing on your :bdg-success:`Free Parameters` (i.e. testing modulatory hypotheses). 
                    Nonetheless, this could produce unreliable :bdg-success:`Free Parameter` estimates so we have to rely on out-of-sample validation to rule this out.
                    If this assumption is very badly violated, it might make sense to included a noise parameter into your model to scale with the :bdg-primary:`Independent Variable` or to, instead, use a different estimator such as Weighted Least Squares to recover your :bdg-success:`Free Parameters`.
                    To check this, we can create a loess line with a variance cloud with our :bdg-primary:`Independent Variable` on the x-axis and the model prediction errors on the y-axis.
                    The cloud should be a constant width around the loess line. 

                .. tab-item:: R

                    ::
                        
                        qplot(x = trialData$IV, y = (trialData$Prediction-trialData$Decision), geom = 'smooth')
                        
                .. tab-item:: MatLab

                    ::

                        x = trialData.IV;
                        y = trialData.Prediction - trialData.Decision;
                        loessSmoothing = fit(x, y, 'loess');
                        plot(loessSmoothing, x, y);
                        hold on;
                        plot(x, y, 'o', 'MarkerEdgeColor', 'b', 'MarkerFaceColor', 'b', 'MarkerSize', 5);
                        legend('LOESS Smoothed Line', 'Data Points');
                        xlabel('IV');
                        ylabel('Prediction - Decision');
                        hold off;

                .. tab-item:: Python

                    ::
                        x = trialData['IV']
                        y = trialData['Prediction'] - trialData['Decision']

                        lowess = sm.nonparametric.lowess(y, x, frac=0.3)  # Adjust frac as needed
                        smoothed_x, smoothed_y = lowess.T

                        plt.plot(smoothed_x, smoothed_y, label='LOESS Smoothed Line')
                        plt.scatter(x, y, marker='o', edgecolors='b', facecolors='b', s=50, label='Data Points')
                        plt.legend()
                        plt.xlabel('IV')
                        plt.ylabel('Prediction - Decision')
                        plt.show()

    .. dropdown:: Assessing Independence of Observations

        We want to ensure that accounting for :bdg-success:`Subjects`' differences from each other using :bdg-success:`Free Parameters` results in model predictions of :bdg-danger:`Decisions` which are not attributable to individual differences.
        We can accomplish this using the same linear modeling formula, but including random intercepts for the subject. 

        .. tab-set::

            .. tab-item:: Plain English

                First, we can try a model with a random intercept and random slope of model predicted value. 
                This model might have convergence issues or singularity issues - if so, great! Otherwise, don't worry just go to the next model. 

                Next, we can try a model with a only a random intercept. 
                We can estimate an R-squared for this model - the marginal R-squared (i.e. including variance explained by random effects) should be roughly equivalent to the conditional R-squared (i.e. variance only explained by fixed effects - i.e. your model) - within 0.05 is reasonable.

            .. tab-item:: R

                ::
                    
                    ris_model = lmer(data = trialData, Decision ~ Prediction + (1 + Prediction | SubjectID)) 
                    summary(ris_model) #model should have issues
                    ri_model = lmer(data = trialData, Decision ~ Prediction + (1 | SubjectID))
                    summary(ri_model)
                    library(MuMin)
                    r.squaredGLMM(ri_model) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough

            .. tab-item:: MatLab

                ::

                    ris_model = fitlme(trialData, 'Decision ~ Prediction + (1 + Prediction | SubjectID)');
                    disp(ris_model);
                    ri_model = fitlme(trialData, 'Decision ~ Prediction + (1 | SubjectID)');
                    disp(ri_model);
                    disp(ri_model.Rsquared); #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough

            .. tab-item:: Python

                ::

                    import statsmodels.api as sm
                    import r2glmm

                    ris_model = sm.MixedLM.from_formula('Decision ~ Prediction + (1 + Prediction | SubjectID)', data=trialData)
                    result_ris = ris_model.fit()
                    print(result_ris.summary())

                    ri_model = sm.MixedLM.from_formula('Decision ~ Prediction + (1 | SubjectID)', data=trialData)
                    result_ri = ri_model.fit()
                    print(result_ri.summary())

                    r_squared = r2glmm.get_r2(result_ri)
                    print(r_squared) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough


.. dropdown:: Validating Parameter Recovery Process

    In order to show that our parameter recovery process is robust, we have to show that the model can predict behavior that it was not trained on. 
    We accomplish this by performing Fivefold Validation.

    .. dropdown:: Fivefold Validation

        .. tab-set::

            .. tab-item:: Plain English

                We essentially want to prove that our :bdg-success:`Free Parameters` are not overfitting the :bdg-danger:`Decisions` that they are training on. 
                In other words, we want to rule out that our favored model isn't outperforming other models because it's fitting weird quirks in :bdg-success:`Subjects`' :bdg-danger:`Decisions`. 
                It should seem really intuitive that, in order to prove this, we're going to separate :bdg-danger:`Decisions` into a training set and a testing set. 
                Let's talk about how specifically we're going to do this. 

                So, we're going to randomly split each :bdg-success:`Subjects`' :bdg-danger:`Decisions` into one of five groups, called folds. 
                We're going to take one fold and remove those :bdg-primary:`Trials` from our training set - we're going to recover :bdg-success:`Free Parameters` from four-fifths of :bdg-danger:`Decisions` and we're going to test it on this fold - this fifth of :bdg-danger:`Decisions` that we excluded. 
                We're going to save the :bdg-success:`Free Parameters` from this four-fifths and the predicted :bdg-danger:`Decisions` of the withheld one-fifth. 
                We rinse and repeat for the remaining four folds. 
                Once this is done, we can get the model error for the predicted-against-observed :bdg-danger:`Decisions` for all :bdg-primary:`Trials`.
                Now we should have five sets of :bdg-success:`Free Parameters` for each :bdg-success:`Subject` and, combining the model error of all five folds, we should have a model error estimate across all :bdg-primary:`Trials`.

                Now, we want to look at two things: first we want to assess how much larger this model error is compared to the model error of the model trained on all of the data. 
                Here, you should report the change in root mean squared error per trial - you can also do a paired t-test on the MFI of the fivefold predictions compared to the standard model predictions but this is not necessary.
                Second, we also want to assess the similarity of the :bdg-success:`Free Parameters` we recovered withholding each of the five folds to the :bdg-success:`Free Parameters` recovered on the entire data set. 
                You should report the average cosine similarity across all folds for each of the :bdg-success:`Free Parameters`.

            .. tab-item:: R

                ::

                    fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into

                    for (i in 1:length(included_subjects)){
                        datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
                        df = read.csv2(datafile)

                        order = sample(20)
                        Parameter1_ff = vector('numeric', length = 5)
                        Parameter2_ff = vector('numeric', length = 5)
                        SS_ff = 0
                        Prediction_ff = vector('numeric', length(df$Decision))
                        for (z in 1:5){
                            j = (z - 1) * 4 + 1
                            n = z * 4
                            withheld = order[j:n]
                            m = ((i - 1) * 5) + z
                            
                            result_ff = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                                                lb = lower_bounds, ub = upper_bounds,
                                                df = df[-withheld,])
                            
                            Parameter1_ff[m] = result_ff$par[1]
                            Parameter2_ff[m] = result_ff$par[2]
                            for (n in 1:length(withheld)){
                                utility_choices = vector('numeric', length(Choices))
                                for (q in 1:length(Choices)){
                                    utility_choices[q] = utility(result_ff$par[1], result_ff$par[2],
                                                        construct1(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]),
                                                        construct2(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]),
                                                        construct3(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]))
                                }
                                Prediction_ff[withheld[n]] = Choices[which(utility_choices == max(utility_choices))]
                            }
                        }
                        SS_ff = sum((df$Decision - Prediction_ff)**2)
                        fivefold[i, 1:11] = c(SS_ff, Parameter1_ff, Parameter2_ff)
                    }
                    colnames(fivefold) = c('SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', 
                                        'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', SubjectID)

                    sqrt(mean(fivefold$SS)/length(df$IV)) - sqrt(mean(subjectData$modelSS)/length(df$IV)) #the change in root mean squared error, per trial
                    fivefold$AIC = length(df$IV) * log(fivefold$SS/length(df$IV)) + 2 * 2
                    t.test(fivefold$AIC, subjectData$AIC, paired = T) #test fivefold MFI against normal MFI for this model

                    library(lsa)
                    cosines = vector('numeric', length = 10)
                    for (i in 1:5){
                        cosines[i] = cosine(subjectData$Parameter1, fivefold[, (i + 1)]) #to get the correct columns in the fivefold dataframe (2-6)
                        cosines[(i+5)] = cosine(subjectData$Parameter1, fivefold[, (i + 6)]) #to get the correct columns in the fivefold dataframe (7-11)
                    }

                    mean(cosines[1:5]) #cosine similarity of parameter 1
                    mean(cosines[6:10]) #cosine similarity of parameter 2

            .. tab-item:: MatLab

                ::

                    fivefold = [];

                    for i = 1:length(included_subjects)
                        datafile = strcat(parentfolder, included_subjects{i}, restoffilepath);
                        df = readtable(datafile);

                        order = randperm(20);
                        Parameter1_ff = zeros(1, 5);
                        Parameter2_ff = zeros(1, 5);
                        SS_ff = 0;
                        Prediction_ff = zeros(1, length(df.Decision));
                        
                        for z = 1:5
                            j = (z - 1) * 4 + 1;
                            n = z * 4;
                            withheld = order(j:n);
                            m = ((i - 1) * 5) + z;

                            options = optimset('fmincon');
                            options.Display = 'off';
                            result_ff = fmincon(@(params) obj_function(params, df(~ismember(1:size(df, 1), withheld), :)), initial_params, [], [], [], [], lower_bounds, upper_bounds, [], options);
                            
                            Parameter1_ff(m) = result_ff(1);
                            Parameter2_ff(m) = result_ff(2);
                            
                            for n = 1:length(withheld)
                                utility_choices = zeros(1, length(Choices));
                                for q = 1:length(Choices)
                                    utility_choices(q) = utility(result_ff(1), result_ff(2), ...
                                                                 construct1(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)), ...
                                                                 construct2(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)), ...
                                                                 construct3(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)));
                                end
                                [~, max_utility_idx] = max(utility_choices);
                                Prediction_ff(withheld(n)) = Choices(max_utility_idx);
                            end
                        end
                        
                        SS_ff = sum((df.Decision - Prediction_ff).^2);
                        fivefold(i, 1:11) = [SS_ff, Parameter1_ff, Parameter2_ff];
                    end

                    fivefold.Properties.VariableNames = {'SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', ...
                        'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5'};

                    root_mean_squared_error_change = sqrt(mean(fivefold.SS) / length(df.IV)) - sqrt(mean(subjectData.modelSS) / length(df.IV));
                    fivefold.AIC = length(df.IV) * log(fivefold.SS / length(df.IV)) + 2 * 2;
                    ttest_result = ttest(fivefold.AIC, subjectData.AIC, 'paired', true);

                    cosines = zeros(1, 10);
                    for i = 1:5
                        cosines(i) = cosine(subjectData.Parameter1, fivefold{:, i + 1});
                        cosines(i + 5) = cosine(subjectData.Parameter1, fivefold{:, i + 6});
                    end

                    mean_cosine_parameter1 = mean(cosines(1:5));
                    mean_cosine_parameter2 = mean(cosines(6:10));


            .. tab-item:: Python
                
                ::

                    from scipy.stats import ttest_rel
                    from lsa.cosine import cosine

                    fivefold = pd.DataFrame()

                    for i in range(len(included_subjects)):
                        datafile = parentfolder + included_subjects[i] + restoffilepath
                        df = pd.read_csv(datafile, sep=';')

                        order = np.random.permutation(20)
                        Parameter1_ff = np.zeros(5)
                        Parameter2_ff = np.zeros(5)
                        SS_ff = 0
                        Prediction_ff = np.zeros(len(df['Decision']))

                        for z in range(1, 6):
                            j = (z - 1) * 4
                            n = z * 4
                            withheld = order[j:n]
                            m = ((i - 1) * 5) + z

                            result_ff = fmincon(obj_function_ff, x0=initial_params, args=( df[~df.index.isin(withheld)]),
                                                bounds=(lower_bounds, upper_bounds))

                            Parameter1_ff[m - 1] = result_ff[0]
                            Parameter2_ff[m - 1] = result_ff[1]

                            for n in withheld:
                                utility_choices = np.zeros(len(Choices))
                                for q in range(len(Choices)):
                                    utility_choices[q] = utility(result_ff[0], result_ff[1],
                                                                 construct1(df['IV'][n], df['Constant'][n], Choices[q]),
                                                                 construct2(df['IV'][n], df['Constant'][n], Choices[q]),
                                                                 construct3(df['IV'][n], df['Constant'][n], Choices[q]))
                                max_utility_idx = np.argmax(utility_choices)
                                Prediction_ff[n] = Choices[max_utility_idx]

                        SS_ff = sum((df['Decision'] - Prediction_ff)**2)
                        fivefold.loc[i, 0:10] = [SS_ff] + list(Parameter1_ff) + list(Parameter2_ff)

                    fivefold.columns = ['SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5',
                                        'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5']

                    root_mean_squared_error_change = np.sqrt(fivefold['SS'].mean() / len(df['IV'])) - np.sqrt(subjectData['modelSS'].mean() / len(df['IV']))
                    fivefold['AIC'] = len(df['IV']) * np.log(fivefold['SS'] / len(df['IV'])) + 2 * 2
                    ttest_result = ttest_rel(fivefold['AIC'], subjectData['AIC'])

                    cosines = np.zeros(10)
                    for i in range(5):
                        cosines[i] = cosine(subjectData['Parameter1'], fivefold.iloc[:, i + 1])
                        cosines[i + 5] = cosine(subjectData['Parameter1'], fivefold.iloc[:, i + 6])

                    mean_cosine_parameter1 = np.mean(cosines[0:5])
                    mean_cosine_parameter2 = np.mean(cosines[5:10])

Tutorials
==========

Tutorial 1 - van Baar, Chang, & Sanfey, 2019
----------------------

.. dropdown:: Assessing Accuracy in Capturing the Data Generation Process

    .. tab-set::

        .. tab-item:: R

            ::

                ### Model Performance

                modelPredictions = lm(data = trialData, as.numeric(Returned) ~ Prediction)
                summary(modelPredictions) # R-squared

                ### Visually Checking Assumptions

                qplot(x = trialData$Prediction, y = as.numeric(trialData$Returned), geom = 'smooth') + 
                geom_abline(slope = 1, intercept = 0) + labs(x = 'Prediction', y = 'Observed') + lims(x = c(0, 30), y = c(0, 30)) # linearity

                normvals = rnorm(1000, mean = 0, sd = sd(trialData$Prediction - as.numeric(trialData$Returned)))
                qplot(x = trialData$Prediction - as.numeric(trialData$Returned), geom = 'density', bw = 1, color = 'Actual') + 
                geom_density(aes(x = normvals, color = 'Predicted'), bw = 1) + labs(x = 'Prediction - Observed', y = 'Density') #normality

                qplot(x = trialData$Investment, y = (trialData$Prediction-as.numeric(trialData$Returned)), group = as.factor(trialData$Multiplier), color = as.factor(trialData$Multiplier), geom = 'smooth')  + 
                labs(x = 'Investment', y = 'Prediction - Observed', color = 'Multiplier') #independence of error and homoscedasticity

                ### Assessing Independence of Observations

                library(lme4)
                library(MuMIn)

                ris_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 + Prediction | Subject))
                r.squaredGLMM(ris_model)

                ri_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 | Subject))
                r.squaredGLMM(ri_model)

                ric_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 | Subject))
                r.squaredGLMM(ric_model)

        .. tab-item:: MatLab

            ::

                % Model Performance
                modelPredictions = fitlm(trialData, 'Returned ~ Prediction');
                disp('R-squared:');
                disp(modelPredictions.Rsquared.Ordinary);

                % Visually Checking Assumptions
                figure;

                % Linearity
                subplot(2, 2, 1);
                scatter(trialData.Prediction, trialData.Returned, 'filled');
                hold on;
                refline(1, 0);
                xlabel('Prediction');
                ylabel('Observed');
                xlim([0, 30]);
                ylim([0, 30]);
                title('Linearity');

                % Normality
                subplot(2, 2, 2);
                diffResiduals = modelPredictions.Residuals.Raw - trialData.Returned;
                histogram(diffResiduals, 'Normalization', 'pdf', 'EdgeColor', 'none');
                hold on;
                normvals = normrnd(0, std(diffResiduals), 1000, 1);
                histogram(normvals, 'Normalization', 'pdf', 'EdgeColor', 'none');
                xlabel('Prediction - Observed');
                ylabel('Density');
                title('Normality');

                % Independence of Error and Homoscedasticity
                subplot(2, 2, 3);
                gscatter(trialData.Investment, diffResiduals, trialData.Multiplier, [], [], 15);
                xlabel('Investment');
                ylabel('Prediction - Observed');
                title('Independence of Error and Homoscedasticity');

                % Assessing Independence of Observations

                ris_model = fitlme(trialData, 'Returned ~ Prediction + (1 + Prediction | SubjectID)');
                disp(ris_model); 
                disp(r2(ris_model));

                ri_model = fitlme(trialData, 'Returned ~ Prediction + (1 | SubjectID)');
                disp(ri_model)
                disp(r2(ri_model));

                ric_model = fitlme(trialData, 'Returned ~ Prediction + (1 | SubjectID)');
                disp(ric_model); 
                disp(r2(ric_model));


        .. tab-item:: Python

            ::

                import seaborn as sns
                import matplotlib.pyplot as plt
                import statsmodels.api as sm

                # Model Performance
                modelPredictions = sm.OLS(trialData['Returned'], sm.add_constant(trialData['Prediction'])).fit()
                print(modelPredictions.summary())

                # Visually Checking Assumptions
                fig, axs = plt.subplots(2, 2, figsize=(12, 10))

                # Linearity
                axs[0, 0].scatter(trialData['Prediction'], trialData['Returned'], color='blue', alpha=0.5)
                axs[0, 0].plot([0, 30], [0, 30], color='red', linestyle='--')
                axs[0, 0].set_xlabel('Prediction')
                axs[0, 0].set_ylabel('Observed')
                axs[0, 0].set_xlim([0, 30])
                axs[0, 0].set_ylim([0, 30])
                axs[0, 0].set_title('Linearity')

                # Normality
                axs[0, 1].hist(modelPredictions.resid, bins='auto', color='blue', alpha=0.5, density=True)
                sns.histplot(normvals, bins='auto', color='red', alpha=0.5, kde=True, ax=axs[0, 1])
                axs[0, 1].set_xlabel('Prediction - Observed')
                axs[0, 1].set_ylabel('Density')
                axs[0, 1].set_title('Normality')

                # Independence of Error and Homoscedasticity
                axs[1, 0].scatter(trialData['Investment'], modelPredictions.resid, c=trialData['Multiplier'], cmap='viridis', s=15)
                axs[1, 0].set_xlabel('Investment')
                axs[1, 0].set_ylabel('Prediction - Observed')
                axs[1, 0].set_title('Independence of Error and Homoscedasticity')

                plt.tight_layout()
                plt.show()
                
                ### Assessing Independence

                # can do directly in R

.. dropdown:: Validating Parameter Recovery Process

    .. tab-set::

        .. tab-item:: R

            ::

                fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into

                for (i in 1:length(included_subjects)){
                df = trialData[which(included_subjects[i] == trialData$Subject), ]
                order = sample(length(df$Returned))
                Theta_ff = vector('numeric', length = 5)
                Phi_ff = vector('numeric', length = 5)
                SS_ff = 0
                Prediction_ff = vector('numeric', length(df$Returned))
                for (z in 1:5){
                    j = round((z - 1) * (length(df$Returned)/5) + 1)
                    n = round(z * (length(df$Returned)/5))
                    withheld = order[j:n]
                    m = ((i - 1) * 5) + z
                    
                    result_ff = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                                        lb = lower_bounds, ub = upper_bounds,
                                        df = df[-withheld,])
                    
                    Theta_ff[z] = result_ff$par[1]
                    Phi_ff[z] = result_ff$par[2]
                    for (n in 1:length(withheld)){
                    if (df$Investment[withheld[n]] > 10) {
                        Choices = seq(0, (df$Investment[withheld[n]] * df$Multiplier[withheld[n]]), round((df$Investment[withheld[n]]*df$Multiplier[withheld[n]])/10))
                    } else {
                        Choices = seq(0, (df$Investment[withheld[n]] * df$Multiplier[withheld[n]]), 1)
                    }
                    utility_choices = vector('numeric', length(Choices))
                    for (q in 1:length(Choices)){
                        utility_choices[q] = utility(theta = result_ff$par[1], 
                                            phi = result_ff$par[2],
                                            guilt = guilt(df$Investment[withheld[n]], believed_multiplier = 4, Choices[q], df$Multiplier[withheld[n]]),
                                            payout = payout_maximization(df$Investment[withheld[n]], df$Multiplier[withheld[n]], Choices[q]),
                                            inequity = inequity(df$Investment[withheld[n]], df$Multiplier[withheld[n]], Choices[q], endowment = 10))
                    }
                    Prediction_ff[withheld[n]] = Choices[which(utility_choices == max(utility_choices))[1]]
                    }
                }
                SS_ff = sum((as.numeric(df$Returned) - Prediction_ff)**2)
                fivefold[i, 1:12] = c(SS_ff, Theta_ff, Phi_ff, included_subjects[i])
                }
                colnames(fivefold) = c('SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', 
                                    'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', 'SubjectID')

                sqrt(mean(fivefold$SS)/length(df$Investment)) - sqrt(mean(subjectData$modelSS)/length(df$Investment)) #the change in root mean squared error, per trial
                fivefold$AIC = length(df$Investment) * log(fivefold$SS/length(df$Investment)) + 2 * 2
                t.test(fivefold$AIC, subjectData$modelAIC, paired = T) #test fivefold MFI against normal MFI for this model

                library(lsa)
                cosines = vector('numeric', length = 10)
                for (i in 1:5){
                    cosines[i] = cosine(subjectData$Theta, fivefold[, (i + 1)]) #to get the correct columns in the fivefold dataframe (2-6)
                    cosines[(i+5)] = cosine(subjectData$Phi, fivefold[, (i + 6)]) #to get the correct columns in the fivefold dataframe (7-11)
                }

                mean(cosines[1:5]) #cosine similarity of parameter 1
                mean(cosines[6:10]) #cosine similarity of parameter 2

        .. tab-item:: MatLab

            ::

                fivefold = table(); % preallocate for parameters and errors from the fivefold validation to go into

                for i = 1:length(included_subjects)
                    df = trialData(included_subjects(i) == trialData.Subject, :);
                    order = randperm(length(df.Returned));
                    Theta_ff = NaN(1, 5);
                    Phi_ff = NaN(1, 5);
                    SS_ff = 0;
                    Prediction_ff = NaN(1, length(df.Returned));
                    
                    for z = 1:5
                        j = round((z - 1) * (length(df.Returned)/5) + 1);
                        n = round(z * (length(df.Returned)/5));
                        withheld = order(j:n);
                        
                        result_ff = fmincon(@(params) obj_function(params, df(~ismember(1:length(df.Returned), withheld), :)), ...
                            initial_params, [], [], [], [], lower_bounds, upper_bounds);
                        
                        Theta_ff(z) = result_ff(1);
                        Phi_ff(z) = result_ff(2);
                        
                        for n = 1:length(withheld)
                            if df.Investment(withheld(n)) > 10
                                Choices = 0:round((df.Investment(withheld(n)) * df.Multiplier(withheld(n)))/10):(df.Investment(withheld(n)) * df.Multiplier(withheld(n)));
                            else
                                Choices = 0:1:(df.Investment(withheld(n)) * df.Multiplier(withheld(n)));
                            end
                            
                            utility_choices = NaN(1, length(Choices));
                            for q = 1:length(Choices)
                                utility_choices(q) = utility(result_ff(1), result_ff(2), ...
                                    guilt(df.Investment(withheld(n)), 4, Choices(q), df.Multiplier(withheld(n))), ...
                                    payout_maximization(df.Investment(withheld(n)), df.Multiplier(withheld(n)), Choices(q)), ...
                                    inequity(df.Investment(withheld(n)), df.Multiplier(withheld(n)), Choices(q), 10));
                            end
                            
                            [~, maxIndex] = max(utility_choices);
                            Prediction_ff(withheld(n)) = Choices(maxIndex);
                        end
                    end
                    
                    SS_ff = sum((df.Returned - Prediction_ff).^2);
                    fivefold(i, 1:12) = [SS_ff, Theta_ff, Phi_ff, included_subjects(i)];
                end

                fivefold.Properties.VariableNames = {'SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', ...
                    'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', 'SubjectID'};

                sqrt(mean(fivefold.SS)/length(df.Investment)) - sqrt(mean(subjectData.modelSS)/length(df.Investment)); % the change in root mean squared error, per trial

                fivefold.AIC = length(df.Investment) * log(fivefold.SS/length(df.Investment)) + 2 * 2;
                ttest(fivefold.AIC, subjectData.modelAIC); % test fivefold MFI against normal MFI for this model

                cosines = NaN(1, 10);
                for i = 1:5
                    cosines(i) = cosine(subjectData.Theta, fivefold{:, (i + 1)}); % to get the correct columns in the fivefold dataframe (2-6)
                    cosines(i + 5) = cosine(subjectData.Phi, fivefold{:, (i + 6)}); % to get the correct columns in the fivefold dataframe (7-11)
                end

                mean(cosines(1:5)) % cosine similarity of parameter 1
                mean(cosines(6:10)) % cosine similarity of parameter 2

        .. tab-item:: Python

            ::

                fivefold = pd.DataFrame()  # preallocate for parameters and errors from the fivefold validation to go into

                for i in range(1, len(included_subjects) + 1):
                    df = trialData[included_subjects[i - 1] == trialData['Subject']]
                    order = np.random.permutation(len(df['Returned']))
                    Theta_ff = np.zeros(5)
                    Phi_ff = np.zeros(5)
                    SS_ff = 0
                    Prediction_ff = np.zeros(len(df['Returned']))
                    
                    for z in range(1, 6):
                        j = round((z - 1) * (len(df['Returned']) / 5) + 1)
                        n = round(z * (len(df['Returned']) / 5))
                        withheld = order[j-1:n]
                        
                        result_ff = minimize(lambda params: obj_function(params, df.loc[~df.index.isin(withheld), :]), 
                                            initial_params, bounds=list(zip(lower_bounds, upper_bounds)))
                        
                        Theta_ff[z-1] = result_ff.x[0]
                        Phi_ff[z-1] = result_ff.x[1]
                        
                        for n in withheld:
                            if df['Investment'].iloc[n] > 10:
                                Choices = np.arange(0, (df['Investment'].iloc[n] * df['Multiplier'].iloc[n]), 
                                                round((df['Investment'].iloc[n] * df['Multiplier'].iloc[n])/10))
                            else:
                                Choices = np.arange(0, (df['Investment'].iloc[n] * df['Multiplier'].iloc[n]), 1)
                            
                            utility_choices = np.zeros(len(Choices))
                            for q in range(len(Choices)):
                                utility_choices[q] = utility(result_ff.x[0], result_ff.x[1], 
                                                            guilt(df['Investment'].iloc[n], 4, Choices[q], df['Multiplier'].iloc[n]), 
                                                            payout_maximization(df['Investment'].iloc[n], df['Multiplier'].iloc[n], Choices[q]), 
                                                            inequity(df['Investment'].iloc[n], df['Multiplier'].iloc[n], Choices[q], 10))
                            
                            correct_choice = Choices[np.argmax(utility_choices)]
                            Prediction_ff[n] = correct_choice
                    
                    SS_ff = np.sum((df['Returned'].to_numpy() - Prediction_ff)**2)
                    fivefold[i - 1:i] = [SS_ff] + list(Theta_ff) + list(Phi_ff) + [included_subjects[i - 1]]

                sqrt(np.mean(fivefold['SS'])/len(df['Investment'])) - np.sqrt(np.mean(subjectData['modelSS'])/len(df['Investment'])) # the change in root mean squared error, per trial

                fivefold['AIC'] = len(df['Investment']) * np.log(fivefold['SS']/len(df['Investment'])) + 2 * 2
                ttest_rel(fivefold['AIC'], subjectData['modelAIC']) # test fivefold MFI against normal MFI for this model

                cosines = np.zeros(10)
                for i in range(1, 6):
                    cosines[i - 1] = cosine_similarity(subjectData['Theta'].to_numpy().reshape(1, -1), fivefold.iloc[:, i:i+1].to_numpy().T)[0, 0]
                    cosines[i + 4] = cosine_similarity(subjectData['Phi'].to_numpy().reshape(1, -1), fivefold.iloc[:, i+5:i+6].to_numpy().T)[0, 0]

                np.mean(cosines[0:5]) # cosine similarity of parameter 1
                np.mean(cosines[5:10]) # cosine similarity of parameter 2

Tutorial 2 - Galvan & Sanfey, 2024
-------------------

.. dropdown:: Assessing Accuracy in Capturing the Data Generation Process

    .. tab-set::

        .. tab-item:: R

            ::

                ### Model Performance

                taxRatePredictions = lm(data = trialData, observedTaxRate ~ predictedTaxRate)
                summary(taxRatePredictions) # R-squared predicting the subject's tax rate

                payoutPredictions = lm(data = trialData, observedOutcome ~ predictedOutcome)
                summary(payoutPredictions) # R-squared predicting the subject's payout, we're going to focus on this

                ### Visually Checking Assumptions

                qplot(x = trialData$predictedOutcome, y = trialData$observedOutcome, geom = 'smooth') +
                geom_abline(slope = 1, intercept = 0) + labs(x = 'Prediction', y = 'Observed') + lims(x = c(0, 30), y = c(0, 30)) # linearity

                normvals = rnorm(1000, mean = 0, sd = sd(trialData$predictedOutcome - trialData$observedOutcome))
                qplot(x = (trialData$predictedOutcome - trialData$observedOutcome), geom = 'density', bw = 1, color = 'Actual') +
                geom_density(aes(x = normvals, color = 'Predicted'), bw = 1) + labs(x = 'Prediction - Observed', y = 'Density') #normality

                qplot(x = trialData$initialAllocation, y = (trialData$predictedOutcome - trialData$observedOutcome), group = as.factor(trialData$Multiplier), color = as.factor(trialData$Multiplier), geom = 'smooth')  +
                labs(x = 'Initial Allocation', y = 'Prediction - Observed', color = 'Multiplier') #independence of error and homoscedasticity

                ### Assessing Independence

                library(lme4)
                library(MuMin)
                ris_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + (1 + predictedOutcome | SubjectID))
                summary(ris_model) #model should have issues
                r.squaredGLMM(ris_model)

                cri_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + condition + (1 | SubjectID))
                summary(cri_model) #model should have issues
                r.squaredGLMM(cri_model)

                cric_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + condition + (1 + condition | SubjectID))
                summary(cric_model) #model should have issues
                r.squaredGLMM(cric_model)

                ri_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + (1 | SubjectID))
                summary(ri_model)
                r.squaredGLMM(ri_model) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough

        .. tab-item:: MatLab

            ::

                % Model Performance
                taxRatePredictions = fitlm(trialData, 'observedTaxRate ~ predictedTaxRate');
                disp(taxRatePredictions); % Display summary

                payoutPredictions = fitlm(trialData, 'observedOutcome ~ predictedOutcome');
                disp(payoutPredictions); % Display summary

                % Visually Checking Assumptions
                figure;

                % Linearity
                subplot(2, 2, 1);
                scatter(trialData.predictedOutcome, trialData.observedOutcome, 'filled');
                hold on;
                plot([0 30], [0 30], '--k');
                xlabel('Prediction');
                ylabel('Observed');
                xlim([0 30]);
                ylim([0 30]);
                title('Linearity');

                % Normality
                subplot(2, 2, 2);
                histogram(trialData.predictedOutcome - trialData.observedOutcome, 'Normalization', 'probability');
                hold on;
                normvals = normrnd(0, std(trialData.predictedOutcome - trialData.observedOutcome), 1, 1000);
                histogram(normvals, 'Normalization', 'probability');
                xlabel('Prediction - Observed');
                ylabel('Density');
                title('Normality');

                % Independence of Error and Homoscedasticity
                subplot(2, 2, 3);
                gscatter(trialData.initialAllocation, trialData.predictedOutcome - trialData.observedOutcome, trialData.Multiplier, 'br', '.', 10);
                xlabel('Initial Allocation');
                ylabel('Prediction - Observed');
                title('Independence of Error and Homoscedasticity');
                legend('off'); % Turn off automatic legend

                % Assessing Independence

                ris_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + (1 + predictedOutcome | SubjectID)');
                disp(ris_model);
                disp(r2(ris_model)); 

                cri_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + condition + (1 | SubjectID)');
                disp(cri_model); 
                disp(r2(cri_model));

                cric_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + condition + (1 + condition | SubjectID)');
                disp(cric_model); 
                disp(r2(cric_model)); 

                ri_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + (1 | SubjectID)');
                disp(ri_model); 
                disp(r2(ri_model));

        .. tab-item:: Python

            ::

                import seaborn as sns
                import matplotlib.pyplot as plt
                from statsmodels.regression.linear_model import OLS
                import statsmodels.api as sm

                # Model Performance
                tax_rate_predictions = OLS.from_formula('observedTaxRate ~ predictedTaxRate', trialData).fit()
                print(tax_rate_predictions.summary())

                payout_predictions = OLS.from_formula('observedOutcome ~ predictedOutcome', trialData).fit()
                print(payout_predictions.summary())

                # Visually Checking Assumptions

                # Linearity
                plt.subplot(2, 2, 1)
                sns.scatterplot(x='predictedOutcome', y='observedOutcome', data=trialData)
                plt.plot([0, 30], [0, 30], '--k')
                plt.xlabel('Prediction')
                plt.ylabel('Observed')
                plt.title('Linearity')

                # Normality
                plt.subplot(2, 2, 2)
                sns.histplot(trialData['predictedOutcome'] - trialData['observedOutcome'], kde=True, color='blue', label='Actual')
                norm_vals = np.random.normal(0, np.std(trialData['predictedOutcome'] - trialData['observedOutcome']), 1000)
                sns.histplot(norm_vals, kde=True, color='orange', label='Predicted')
                plt.xlabel('Prediction - Observed')
                plt.ylabel('Density')
                plt.title('Normality')
                plt.legend()

                # Independence of Error and Homoscedasticity
                plt.subplot(2, 2, 3)
                sns.lineplot(x='initialAllocation', y='predictedOutcome - observedOutcome', hue='Multiplier', style='Multiplier', markers=True, data=trialData)
                plt.xlabel('Initial Allocation')
                plt.ylabel('Prediction - Observed')
                plt.title('Independence of Error and Homoscedasticity')

                plt.tight_layout()
                plt.show()
                
                ### Assessing Independence

                # can do directly in R

.. dropdown:: Validating Parameter Recovery Process

    .. tab-set::

        .. tab-item:: R

            ::

                fivefold = data.frame()

                for (i in 1:length(included_subjects)){
                    datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
                    fullData = read.csv2(datafile)

                    thetaPerCondition = vector('numeric', length(conditions) * 5)
                    phiPerCondition = vector('numeric', length(conditions) * 5)
                    SSPerCondition = vector('numeric', length(conditions))

                    for (j in 1:length(conditions)){

                        df = fullData[which(fullData$condition == conditions[j]), c(49, 40:48), 33] #49 is subject's initial allocation, 40:48 are players 1:9 initial allocation, 33 is redistribution rate
                        df$redistributionRate = df$redistributionRate/100 #converting to a decimal from a percent

                        order = sample(20)

                        for (z in 1:5){
                            j = (z - 1) * 4 + 1
                            n = z * 4
                            withheld = order[j:n]
                            m = ((j - 1) * 5) + z
                            result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
                                             lb = lower_bounds, ub = upper_bounds,
                                             df = df[-withheld,])  

                            thetaPerCondition[m] = result$par[1]
                            phiPerCondition[m] = result$par[2]   
                            for (k in 1:length(withheld)){
                                y = withheld[k]
                                Utility = vector('numeric', length(Choices))
                                for (n in 1:length(Choices)){
                                    Utility[n] = utility(theta = thetaPerCondition[m],
                                                         phi = phiPerCondition[m],
                                                         Equity = equity(new_value(df[y, 1:10], choices[n]), df[y, 1:10], choices[n]),
                                                         Equality = equality(new_value(df[y, 1:10], choices[n]), df[y, 1:10], choices[n]),
                                                         Payout = payout(new_value(df[y, 1], choices[n]), df[y, 1], choices[n]))
                                }
                                correct_choice = which(Utility == max(Utility))
                                predictedOutcome = new_value(df$myself[y], Choices[correct_choice[sample(length(correct_choice), 1)]])
                                observedOutcome = new_value(df$myself[y], df$redistributionRate)
                                SSPerCondition = SSPerCondition + (observedOutcome - predictedOutcome)**2
                            }
                        }
                    }

                    fivefold[i, 1:45] = c(included_subjects[i], thetaPerCondition, phiPerCondition, SSPerCondition)
                }
                colnames(fivefold) = c('SubjectID', 
                                       'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5',
                                       'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5', 
                                       'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5', 
                                       'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5', 
                                       'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5', 
                                       'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5', 
                                       'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5', 
                                       'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5', 
                                       'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck')

                sqrt(mean(fivefold$SSMerit)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSMerit)/length(df$redistributionRate)) #the change in root mean squared error, per trial in merit condition
                sqrt(mean(fivefold$SSEntitlement)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSEntitlement)/length(df$redistributionRate)) #the change in root mean squared error, per trial in entilement condition
                sqrt(mean(fivefold$SSCorruption)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSCorruption)/length(df$redistributionRate)) #the change in root mean squared error, per trial in corruption condition
                sqrt(mean(fivefold$SSLuck)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSLuck)/length(df$redistributionRate)) #the change in root mean squared error, per trial in luck condition

                fivefold$AICMerit = length(df$redistributionRate) * log(fivefold$SSMerit/length(df$redistributionRate)) + 2 * 2
                fivefold$AICEntitlement = length(df$redistributionRate) * log(fivefold$SSEntitlement/length(df$redistributionRate)) + 2 * 2
                fivefold$AICCorruption = length(df$redistributionRate) * log(fivefold$SSCorruption/length(df$redistributionRate)) + 2 * 2
                fivefold$AICLuck = length(df$redistributionRate) * log(fivefold$SSLuck/length(df$redistributionRate)) + 2 * 2

                t.test(fivefold$AICMerit, subjectData$AIC, paired = T) #test fivefold MFI against normal MFI for this model
                t.test(fivefold$AICEntitlement, subjectData$AICEntitlement, paired = T) #test fivefold MFI against normal MFI for this model
                t.test(fivefold$AICCorruption, subjectData$AICCorruption, paired = T) #test fivefold MFI against normal MFI for this model
                t.test(fivefold$AICLuck, subjectData$AICLuck, paired = T) #test fivefold MFI against normal MFI for this model

                library(lsa)
                cosines = vector('numeric', length = length(conditions) * 2 * 5) #number of conditions times number of parameters times the number of folds
                for (i in 1:5){
                    for (j in 1:length(conditions)){
                        k = ((i - 1) * 5) + j
                        cosines[k] = cosine(subjectData$Parameter1, fivefold[, (k + 1)])
                        cosines[(k+20)] = cosine(subjectData$Parameter1, fivefold[, (k + 21)])
                    }
                }

                mean(cosines[1:5]) #cosine similarity of theta in merit condition
                mean(cosines[6:10]) #cosine similarity of theta in entitlement condition
                mean(cosines[11:15]) #cosine similarity of theta in corruption condition
                mean(cosines[16:20]) #cosine similarity of theta in luck condition
                mean(cosines[21:25]) #cosine similarity of phi in merit condition
                mean(cosines[26:30]) #cosine similarity of phi in entitlement condition
                mean(cosines[31:35]) #cosine similarity of phi in corruption condition
                mean(cosines[36:40]) #cosine similarity of phi in luck condition

                
                mean(cosines[1:20]) #cosine similarity of theta across all conditions
                mean(cosines[21:40]) #cosine similarity of phi across all conditions

        .. tab-item:: MatLab

            ::

                fivefold = table();

                for i = 1:length(included_subjects)
                    datafile = strcat(parentfolder, included_subjects{i}, restoffilepath);
                    fullData = readtable(datafile);

                    thetaPerCondition = zeros(length(conditions) * 5, 1);
                    phiPerCondition = zeros(length(conditions) * 5, 1);
                    SSPerCondition = zeros(length(conditions), 1);

                    for j = 1:length(conditions)
                        df = fullData(fullData.condition == conditions{j}, [49, 40:48, 33]);
                        df.redistributionRate = df.redistributionRate / 100;

                        order = randperm(20);

                        for z = 1:5
                            start_index = (z - 1) * 4 + 1;
                            end_index = z * 4;
                            withheld = order(start_index:end_index);
                            m = ((start_index - 1) * 5) + z;

                            result = fmincon(@obj_function, initial_params, [], [], [], [], lower_bounds, upper_bounds, df(~ismember(1:20, withheld), :));

                            thetaPerCondition(m) = result(1);
                            phiPerCondition(m) = result(2);

                            for k = 1:length(withheld)
                                y = withheld(k);
                                Utility = zeros(length(Choices), 1);

                                for n = 1:length(Choices)
                                    Utility(n) = utility(result(1), result(2), ...
                                        equity(new_value(df{y, 1:10}, choices{n}), df{y, 1:10}, choices{n}), ...
                                        equality(new_value(df{y, 1:10}, choices{n}), df{y, 1:10}, choices{n}), ...
                                        payout(new_value(df{y, 1}, choices{n}), df{y, 1}, choices{n}));
                                end

                                [~, correct_choice] = max(Utility);
                                predictedOutcome = new_value(df.myself(y), Choices(correct_choice(randi(length(correct_choice)))));
                                observedOutcome = new_value(df.myself(y), df.redistributionRate(y));
                                SSPerCondition(j) = SSPerCondition(j) + (observedOutcome - predictedOutcome)^2;
                            end
                        end
                    end

                    fivefold{i, 1:45} = [included_subjects{i}, thetaPerCondition', phiPerCondition', SSPerCondition'];
                end

                fivefold.Properties.VariableNames = {'SubjectID', ...
                    'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5', ...
                    'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5', ...
                    'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5', ...
                    'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5', ...
                    'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5', ...
                    'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5', ...
                    'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5', ...
                    'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5', ...
                    'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck'};

                % Change in root mean squared error per trial in each condition
                disp(sqrt(mean(fivefold.SSMerit) / height(df) - sqrt(mean(subjectData.SSMerit) / height(df))));
                disp(sqrt(mean(fivefold.SSEntitlement) / height(df) - sqrt(mean(subjectData.SSEntitlement) / height(df))));
                disp(sqrt(mean(fivefold.SSCorruption) / height(df) - sqrt(mean(subjectData.SSCorruption) / height(df))));
                disp(sqrt(mean(fivefold.SSLuck) / height(df) - sqrt(mean(subjectData.SSLuck) / height(df))));

                fivefold.AICMerit = height(df) * log(fivefold.SSMerit / height(df)) + 2 * 2;
                fivefold.AICEntitlement = height(df) * log(fivefold.SSEntitlement / height(df)) + 2 * 2;
                fivefold.AICCorruption = height(df) * log(fivefold.SSCorruption / height(df)) + 2 * 2;
                fivefold.AICLuck = height(df) * log(fivefold.SSLuck / height(df)) + 2 * 2;

                % Test fivefold MFI against normal MFI for this model
                disp(ttest(fivefold.AICMerit, subjectData.AIC, 'paired', true));
                disp(ttest(fivefold.AICEntitlement, subjectData.AICEntitlement, 'paired', true));
                disp(ttest(fivefold.AICCorruption, subjectData.AICCorruption, 'paired', true));
                disp(ttest(fivefold.AICLuck, subjectData.AICLuck, 'paired', true));

                cosines = zeros(length(conditions) * 2 * 5, 1);

                for i = 1:5
                    for j = 1:length(conditions)
                        k = ((i - 1) * 5) + j;
                        cosines(k) = cosine(subjectData.Parameter1, fivefold{:, k + 1});
                        cosines(k + 20) = cosine(subjectData.Parameter1, fivefold{:, k + 21});
                    end
                end

                % Cosine similarity of theta and phi across conditions and folds
                disp(mean(cosines(1:5)));
                disp(mean(cosines(6:10)));
                disp(mean(cosines(11:15)));
                disp(mean(cosines(16:20)));
                disp(mean(cosines(21:25)));
                disp(mean(cosines(26:30)));
                disp(mean(cosines(31:35)));
                disp(mean(cosines(36:40)));

                % Cosine similarity of theta across all conditions and folds
                disp(mean(cosines(1:20)));

                % Cosine similarity of phi across all conditions and folds
                disp(mean(cosines(21:40)));

        .. tab-item:: Python

            ::

                fivefold = pd.DataFrame()

                for i in range(len(included_subjects)):
                    datafile = parentfolder + included_subjects[i] + restoffilepath
                    fullData = pd.read_csv(datafile)

                    thetaPerCondition = np.zeros(len(conditions) * 5)
                    phiPerCondition = np.zeros(len(conditions) * 5)
                    SSPerCondition = np.zeros(len(conditions))

                    for j in range(len(conditions)):
                        df = fullData[fullData['condition'] == conditions[j]][[48, *range(39, 48), 32]]
                        df['redistributionRate'] = df['redistributionRate'] / 100

                        order = np.random.permutation(20)

                        for z in range(1, 6):
                            start_index = (z - 1) * 4
                            end_index = z * 4
                            withheld = order[start_index:end_index]
                            m = (start_index * 5) + z - 1  # Corrected indexing

                            result = minimize(obj_function, initial_params, args=(df[~df.index.isin(withheld)],), bounds=list(zip(lower_bounds, upper_bounds)))

                            thetaPerCondition[m] = result.x[0]
                            phiPerCondition[m] = result.x[1]

                            for k in withheld:
                                Utility = np.zeros(len(Choices))

                                for n in range(len(Choices)):
                                    Utility[n] = utility(result.x[0], result.x[1],
                                                        equity(new_value(df.iloc[k, 0:9], choices[n]), df.iloc[k, 0:9], choices[n]),
                                                        equality(new_value(df.iloc[k, 0:9], choices[n]), df.iloc[k, 0:9], choices[n]),
                                                        payout(new_value(df.iloc[k, 0], choices[n]), df.iloc[k, 1], choices[n]))

                                correct_choice = np.argmax(Utility)
                                predictedOutcome = new_value(df['myself'].iloc[k], Choices[np.random.choice(np.where(Utility == max(Utility))[0])])
                                observedOutcome = new_value(df['myself'].iloc[k], df['redistributionRate'].iloc[k])
                                SSPerCondition[j] += (observedOutcome - predictedOutcome) ** 2

                    fivefold.loc[i, 0:44] = [included_subjects[i], *thetaPerCondition, *phiPerCondition, *SSPerCondition]

                fivefold.columns = ['SubjectID', 
                                    'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5',
                                    'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5', 
                                    'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5', 
                                    'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5', 
                                    'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5', 
                                    'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5', 
                                    'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5', 
                                    'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5', 
                                    'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck']

                # Change in root mean squared error per trial in each condition
                print(np.sqrt(np.mean(fivefold['SSMerit']) / len(df) - np.sqrt(np.mean(subjectData['SSMerit']) / len(df))))
                print(np.sqrt(np.mean(fivefold['SSEntitlement']) / len(df) - np.sqrt(np.mean(subjectData['SSEntitlement']) / len(df))))
                print(np.sqrt(np.mean(fivefold['SSCorruption']) / len(df) - np.sqrt(np.mean(subjectData['SSCorruption']) / len(df))))
                print(np.sqrt(np.mean(fivefold['SSLuck']) / len(df) - np.sqrt(np.mean(subjectData['SSLuck']) / len(df))))

                fivefold['AICMerit'] = len(df) * np.log(fivefold['SSMerit'] / len(df)) + 2 * 2
                fivefold['AICEntitlement'] = len(df) * np.log(fivefold['SSEntitlement'] / len(df)) + 2 * 2
                fivefold['AICCorruption'] = len(df) * np.log(fivefold['SSCorruption'] / len(df)) + 2 * 2
                fivefold['AICLuck'] = len(df) * np.log(fivefold['SSLuck'] / len(df)) + 2 * 2

                # Test fivefold MFI against normal MFI for this model
                print(ttest_rel(fivefold['AICMerit'], subjectData['AIC']))
                print(ttest_rel(fivefold['AICEntitlement'], subjectData['AICEntitlement']))
                print(ttest_rel(fivefold['AICCorruption'], subjectData['AICCorruption']))
                print(ttest_rel(fivefold['AICLuck'], subjectData['AICLuck']))

                cosines = np.zeros(len(conditions) * 2 * 5)

                for i in range(5):
                    for j in range(len(conditions)):
                        k = (i * 5) + j
                        cosines[k] = cosine(subjectData['Parameter1'], fivefold.iloc[:, k + 1])
                        cosines[k + 20] = cosine(subjectData['Parameter1'], fivefold.iloc[:, k + 21])

                # Cosine similarity of theta and phi across conditions and folds
                print(np.mean(cosines[0:4]))
                print(np.mean(cosines[5:9]))
                print(np.mean(cosines[10:15]))
                print(np.mean(cosines[15:19]))
                print(np.mean(cosines[20:24]))
                print(np.mean(cosines[25:29]))
                print(np.mean(cosines[30:34]))
                print(np.mean(cosines[35:39]))

                # Cosine similarity of theta across all conditions and folds
                print(np.mean(cosines[0:19]))

                # Cosine similarity of phi across all conditions and folds
                print(np.mean(cosines[20:39]))

Tutorial 3 - Crockett et al., 2014
-------------------

.. dropdown:: Assessing Accuracy in Capturing the Data Generation Process

    .. tab-set::

        .. tab-item:: R

            ::

                ### Assessing Model Performance

                sum(trialData$choseHarm == round(trialData$ProbHarm))/nrow(trialData)

                qplot(subjectData$BIC[-20], geom = 'density')
                worstExplained = which(subjectData$BIC[-20] > as.numeric(summary(subjectData$BIC[-20])[4]))

                qplot(data = trialData[which(trialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = moneyDifference, y = choseHarm - ProbHarm, group = SubjectID) + geom_smooth()
                qplot(data = trialData[which(trialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = shocksDifference, y = choseHarm - ProbHarm, group = SubjectID) + geom_smooth()

                ### Checking Assumptions

                ggplot(data = trialData) + geom_smooth(aes(x = ProbHarm, y = choseHarm, group = isSelf, color = factor(isSelf))) 
                ggplot(data = trialData) + geom_density(aes(x = choseHarm, fill = factor(choseHarm)))
                normvals = rnorm(1000, mean = 0, sd = sd(trialData$ProbHarm - trialData$choseHarm))
                qplot(x = trialData$ProbHarm - trialData$choseHarm, geom = 'density', bw = sd(trialData$ProbHarm - trialData$choseHarm), color = 'Actual') +
                geom_density(aes(x = normvals, color = 'Predicted'), bw = sd(trialData$ProbHarm - trialData$choseHarm))
                qplot(x = trialData$shocksDifference, y = trialData$ProbHarm - trialData$choseHarm, geom = 'jitter') + geom_smooth()
                qplot(x = trialData$moneyDifference, y = trialData$ProbHarm - trialData$choseHarm, geom = 'jitter') + geom_smooth()
                qplot(x = trialData$shocksDifference, y = trialData$ProbHarm - trialData$choseHarm, color = factor(trialData$isSelf), geom = 'smooth') + lims(y = c(-1,1))
                qplot(x = trialData$moneyDifference, y = trialData$ProbHarm - trialData$choseHarm, color = factor(trialData$isSelf), geom = 'smooth') + lims(y = c(-1,1))

                ### Assessing Independence

                library(lme4)
                library(MuMIn)

                ris_model = glmer(data = trialData, choseHarm ~ ProbHarm + (1 + ProbHarm | SubjectID), family = "binomial")
                r.squaredGLMM(ris_model)

                ri_model = glmer(data = trialData, choseHarm ~ ProbHarm + (1 | SubjectID), family = "binomial")
                r.squaredGLMM(ri_model)

                ric_model = glmer(data = trialData, choseHarm ~ ProbHarm + isSelf + (1 | SubjectID), family = "binomial")
                r.squaredGLMM(ric_model)

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::
.. dropdown:: Validating Parameter Recovery Process

    .. tab-set::

        .. tab-item:: R

            ::

                fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into
                trialData$ProbHarm_ff = 0
                optimize = function(df){
                tryCatch({
                    fmincon(obj_function, x0 = initial_params, lb = lower_bounds, ub = upper_bounds, df = df, optimMethod = "MLE")
                }, error = function(e){
                    fmincon(obj_function, x0 = initial_params, lb = lower_bounds, ub = upper_bounds, df = df, optimMethod = "OLS")
                })
                }
                adj = 1
                for (i in 1:length(included_subjects)){
                df = grab_data(included_subjects[i])
                if (is.character(df)){adj = adj + 1; next}
                df$ProbHarm = 0
                
                order = sample(152)
                KSelf_ff = vector('numeric', length = 5)
                KOther_ff = vector('numeric', length = 5)
                BSelf_ff = vector('numeric', length = 5)
                BOther_ff = vector('numeric', length = 5)
                E_ff = vector('numeric', length = 5)
                G_ff = vector('numeric', length = 5)
                Deviance_ff = 0
                df$Pred = 0
                for (z in 1:5){
                    j = round((z - 1) * (152/5) + 1)
                    n = round(z * (152/5))
                    withheld = order[j:n]
                    
                    result_ff = optimize(df[-withheld,])
                    
                    KSelf_ff[z] = result_ff$par[1]
                    KOther_ff[z] = result_ff$par[2]
                    BSelf_ff[z] = result_ff$par[3]
                    BOther_ff[z] = result_ff$par[4]
                    E_ff[z] = result_ff$par[5]
                    G_ff[z] = result_ff$par[6]
                    df[withheld, ] = generate_predictions(df[withheld, ], result_ff)
                }
                Deviance_ff = -2*sum(df$choseHarm * log(df$ProbHarm) + (1 - df$choseHarm) * log(1 - df$ProbHarm))
                fivefold[i, 1:32] = c(included_subjects[i], Deviance_ff, KSelf_ff, KOther_ff, BSelf_ff, BOther_ff, E_ff, G_ff)
                j = (i-adj)*152 + 1; n = (i-adj + 1)*152
                trialData$ProbHarm_ff[j:n] = df$ProbHarm
                }
                colnames(fivefold) = c('SubjectID', 'Deviance', 'KS_F1', 'KS_F2', 'KS_F3', 'KS_F4', 'KS_F5',
                                    'KO_F1','KO_F2','KO_F3','KO_F4','KO_F5', 'BS_F1', 'BS_F2', 'BS_F3', 'BS_F4', 'BS_F5',
                                    'BO_F1','BO_F2','BO_F3','BO_F4','BO_F5', 'E_F1', 'E_F2', 'E_F3', 'E_F4', 'E_F5', 
                                    'G_F1', 'G_F2', 'G_F3', 'G_F4', 'G_F5')

                sum(round(trialData$ProbHarm_ff) == trialData$choseHarm)/nrow(trialData)
                fivefold$BIC = as.numeric(fivefold$Deviance) + log(152) * 6
                t.test(fivefold$BIC, subjectData$BIC, paired = T) #test fivefold MFI against normal MFI for this model

                library(lsa)
                cosines = vector('numeric', length = 60)
                for (i in 1:5){
                cosines[i] = cosine(as.numeric(subjectData$Kappa_Self[-20]), as.numeric(fivefold[-20, (i + 2)]))
                cosines[(i+5)] = cosine(as.numeric(subjectData$Kappa_Other[-20]), as.numeric(fivefold[-20, (i + 7)]))
                cosines[(i+10)] = cosine(as.numeric(subjectData$Beta_Self[-20]), as.numeric(fivefold[-20, (i + 12)]))
                cosines[(i+15)] = cosine(as.numeric(subjectData$Beta_Other[-20]), as.numeric(fivefold[-20, (i + 17)]))
                cosines[(i+20)] = cosine(as.numeric(subjectData$Epsilon[-20]), as.numeric(fivefold[-20, (i + 22)]))
                cosines[(i+25)] = cosine(as.numeric(subjectData$Gamma[-20]), as.numeric(fivefold[-20, (i + 27)]))
                }

                mean(cosines[1:5]) 
                mean(cosines[6:10])
                mean(cosines[11:15]) 
                mean(cosines[16:20]) 
                mean(cosines[21:25]) 
                mean(cosines[26:30])  

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::

Tutorial 4 - Li et al., 2022
-------------------

.. dropdown:: Assessing Accuracy in Capturing the Data Generation Process

    .. tab-set::

        .. tab-item:: R

            ::

                ### Assessing Model Performance

                sum(altTrialData$Chose1 == round(altTrialData$ar_Prob1))/nrow(altTrialData)

                qplot(subjectData$BIC, geom = 'density')
                worstExplained = which(subjectData$BIC > as.numeric(summary(subjectData$BIC)[4]))

                qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = a0-a1, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
                qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = a0-a2, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
                qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = b0-b1, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
                qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = b0-b2, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)


                ### Checking Assumptions

                ggplot(data = altTrialData) + geom_smooth(aes(x = ar_Prob1, y = Chose1, group = a0 < b0, color = factor(a0 < b0))) 
                ggplot(data = altTrialData) + geom_density(aes(x = Chose1, group = a0 < b0, fill = factor(a0 < b0), alpha = 0.5))
                normvals = rnorm(1000, mean = 0, sd = sd(altTrialData$ar_Prob1 - altTrialData$Chose1))
                qplot(x = altTrialData$ar_Prob1 - altTrialData$Chose1, geom = 'density', bw = sd(altTrialData$ar_Prob1 - altTrialData$Chose1), color = 'Actual') +
                geom_density(aes(x = normvals, color = 'Predicted'), bw = sd(altTrialData$ar_Prob1 - altTrialData$Chose1))
                qplot(x = abs(altTrialData$a1 - altTrialData$a2), y = altTrialData$ar_Prob1 - altTrialData$Chose1, geom = 'jitter') + geom_smooth(method = 'loess')
                qplot(x = abs(altTrialData$a1 - altTrialData$a2), y = altTrialData$ar_Prob1 - altTrialData$Chose1, color = factor(altTrialData$a0 > altTrialData$b0), geom = 'jitter') + geom_smooth(method = 'loess')

                ### Assessing Independence

                library(lme4)
                library(MuMIn)

                ris_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + (1 + ar_Prob1 | SubjectID), family = "binomial")
                r.squaredGLMM(ris_model)

                ri_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + (1 | SubjectID), family = "binomial")
                r.squaredGLMM(ri_model)

                ric_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + factor(a0<b0) + (1 | SubjectID), family = "binomial")
                r.squaredGLMM(ric_model)

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::
.. dropdown:: Validating Parameter Recovery Process

    .. tab-set::

        .. tab-item:: R

            ::

                fivefold = data.frame() 
                trialData$Prob1_ff = 0

                for (i in 1:length(included_subjects)){
                df = grab_data(included_subjects[i])
                df$Prob1 = 0
                
                order = sample(nrow(df))
                A_ff = vector('numeric', length = 5)
                R_ff = vector('numeric', length = 5)
                B_ff = vector('numeric', length = 5)
                E_ff = vector('numeric', length = 5)
                G_ff = vector('numeric', length = 5)
                Deviance_ff = 0
                df$Pred = 0
                for (z in 1:5){
                    j = round((z - 1) * (nrow(df)/5) + 1)
                    n = round(z * (nrow(df)/5))
                    withheld = order[j:n]
                    
                    result_ff = optimize(of_ar, 
                                        initial_params[c(1,3:6)], 
                                        lower_bounds[c(1,3:6)], 
                                        upper_bounds[c(1,3:6)], 
                                        df[-withheld,])
                    
                    A_ff[z] = result_ff$par[1]
                    R_ff[z] = result_ff$par[2]
                    B_ff[z] = result_ff$par[3]
                    E_ff[z] = result_ff$par[4]
                    G_ff[z] = result_ff$par[5]
                    pars = c(result_ff$par[1], 0, result_ff$par[2:5])
                    df$Prob1[withheld] = generatePredictions(pars, df[withheld, ])
                }
                Deviance_ff = -2*sum(df$Chose1 * log(df$Prob1) + 
                                        (1 - df$Chose1) * log(1 - df$Prob1))
                fivefold[i, 1:27] = c(included_subjects[i], Deviance_ff, 
                                        A_ff, R_ff, B_ff, E_ff, G_ff)
                trialData$Prob1_ff[which(trialData$SubjectID == included_subjects[i])] = df$Prob1
                }
                colnames(fivefold) = c('SubjectID', 'Deviance', 
                                    'A_F1', 'A_F2', 'A_F3', 'A_F4', 'A_F5',
                                    'R_F1', 'R_F2', 'R_F3', 'R_F4', 'R_F5', 
                                    'B_F1','B_F2','B_F3','B_F4','B_F5', 
                                    'E_F1', 'E_F2', 'E_F3', 'E_F4', 'E_F5', 
                                    'G_F1', 'G_F2', 'G_F3', 'G_F4', 'G_F5')

                sum(round(trialData$Prob1_ff) == trialData$Chose1)/nrow(trialData)
                fivefold$BIC = as.numeric(fivefold$Deviance) + log(65) * 6
                t.test(fivefold$BIC, altSubjectData$BIC_M5, paired = T) #test fivefold MFI against normal MFI for this model

                library(lsa)
                cosines = vector('numeric', length = 60)
                for (i in 1:5){
                cosines[i] = cosine(as.numeric(altSubjectData$Alpha_M5), as.numeric(fivefold[, (i + 2)]))
                cosines[(i+5)] = cosine(as.numeric(altSubjectData$Rho_M5), as.numeric(fivefold[, (i + 7)]))
                cosines[(i+10)] = cosine(as.numeric(altSubjectData$Beta_M5), as.numeric(fivefold[, (i + 12)]))
                cosines[(i+15)] = cosine(as.numeric(altSubjectData$Epsilon_M5), as.numeric(fivefold[, (i + 17)]))
                cosines[(i+20)] = cosine(as.numeric(altSubjectData$Gamma_M5), as.numeric(fivefold[, (i + 22)]))
                }

                mean(cosines[1:5]) 
                mean(cosines[6:10])
                mean(cosines[11:15]) 
                mean(cosines[16:20]) 
                mean(cosines[21:25])

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::