Validate the Best Model
Lesson
Goal During this Stage
We want to see if the model is actually valid: there are two things that we need to prove in order for our model to be valid: first, we have to show that our model accurately captures the data generation process and, second, we have to show that our parameter recovery process is robust.
How to Achieve this Goal
Assessing Accuracy in Capturing the Data Generation Process
In order to show that our model accurately captures the data generation process, we have to show that the model predicts behavior equally well at all values. Essentially, we must test four assumptions about how our model predictions relate to the behavioral data that they were trained upon: linearity, normality of error, independence of error, and homoscedasticty. These are also four of the five assumptions of linear regression (the fifth is the independence of Independent Variables, but we don’t rely on this assumption).
Note
The assumptions of our model are (with one exception) the assumptions of linear regression, but not the assumptions of mixed effects regression despite the fact that we obviously are using repeated measures - why is this?
Well, the answer is essentially the same answer as why this fifth assumption - the independence of Independent Variables - is not included. According to our model, all of the ways that Subjects can be different are encapsulated in the Free Parameters. Another way to phrase the assumption of the independence of Independent Variables is the independence of observations. Linear modeling requires that observations are not related, so if Independent Variables are collinear, then the Dependent Variable are predicted by both Independent Variables which means that the observations are dependent upon each other. Thus since we don’t know what actually causes the Dependent Variable to change neither Independent Variable can be used to predict the variance explained by both. In a similar vein, linear mixed effects models control for the independence of observations (i.e. produced by the same Subject) using random effects.
See, where we’re going here? Since our model asserts that all of the ways that Subjects can be different are encapsulated in the Free Parameters, we essentially assert that the way our model predicts Decisions does not differ across Subjects. We have one Independent Variable - the model predicted Decisions. According to our model, nothing else can predict Decisions. However, we’re not going to blindly assume this - we’ll also verify that this is true.
Model Performance
Before we start checking assumptions, we need to do a simple regression: using the predicted Decisions of our best model to predict actual Decisions. So, we will have to use the trial-by-trial data to accomplish this.
What we want to see first is the R-squared: the multiple R-squared should always be essentially equal to the adjusted R-squared since there are going to be a lot of observations. The R-squared tells us how much variance our model explains: any model with an R-squared above 0.70 is very good. Particularly for models without noise parameters, it means that you are really capturing the data generation process and essentially rules out the possibility that you are missing an additionally strategy or additional motive. As long as you don’t find any assumptions to be grotesquely violated, you should proceed and feel very confident that your model is a good representation of the data generation process.
An R-squared between 0.5 and 0.7 is acceptable. Although this good model performance by objective standards, these tasks are designed to elicit very well-defined, consistent preferences with a very high signal to noise ratio. You might be missing an additional strategy in your model - so use the steps below and see if this is the case! If not, you should proceed with apprehension but it does not immediately invalidate any conclusions that you want to draw.
An R-squared of less than 0.5 is bad. Either your task-and-experimental procedure was 1) poorly thought out, producing too high of a signal-to-noise ratio, or 2) your model does not account for one or more preferences that govern behavior in your task. Ask yourself what the cause is: if it is a problem with the task either consider adding noise or bias parameters, find a different kind of model to analyze your data, or otherwise throw the data out because you cannot analyze it with a utility model. If it is the latter then identify Subjects who are fit particularly poorly by the model, identify the behavioral trend, and try to think about a value-based preference that could lead one to behave in such a way. We can try this on all of these Subjects - if this doesn’t offer clear insight then we might want to look at individual subjects.
modelPredictions = lm(data = trialData, Decisions ~ Prediction)
summary(modelPredictions) # R-squared
#we can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
qplot(x = subjectData$modelAIC, geom = 'density')
#if we're missing something let's identify worst explained quartile of subjects according to our model
worstExplained = which(subjectData$modelAIC > as.numeric(summary(subjectData$modelAIC)[5]))
qplot(data = trialData[which(trialData$SubjectID == subjectData$SubjectID[worstExplained]), x = IV, y = DV, group = trialData$SubjectID]) + geom_smooth() #a loess line for all subjects
modelPredictions = fitlm(trialData, 'Decisions ~ Prediction');
summary(modelPredictions); % R-squared
% We can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
figure;
ksdensity(subjectData.modelAIC);
% If we're missing something, let's identify the worst-explained quartile of subjects according to our model
worstExplained = find(subjectData.modelAIC > str2double(summary(subjectData.modelAIC).Variables{5, 1}));
figure;
for i = 1:length(worstExplained)
subjectIndex = worstExplained(i);
subplot(length(worstExplained), 1, i);
scatter(trialData.IV(trialData.SubjectID == subjectData.SubjectID(subjectIndex)), trialData.DV(trialData.SubjectID == subjectData.SubjectID(subjectIndex)), 'filled');
hold on;
lsline;
hold off;
end
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
modelPredictions = sm.OLS(trialData['Decisions'], sm.add_constant(trialData['Prediction'])).fit()
print(modelPredictions.summary()) # R-squared
# We can identify if we're missing a strategy using a density plot of MFIs - a plot with a group of really high AICs with a group of lower AICs would suggest a missing strategy
sns.kdeplot(subjectData['modelAIC'])
# If we're missing something, let's identify the worst-explained quartile of subjects according to our model
worstExplained = subjectData[subjectData['modelAIC'] > float(subjectData['modelAIC'].describe()['75%'])].index
plt.figure(figsize=(8, 6))
for i, subjectIndex in enumerate(worstExplained):
plt.subplot(len(worstExplained), 1, i + 1)
sns.scatterplot(x=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'IV'],
y=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'DV'],
hue=trialData['SubjectID'])
sns.regplot(x=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'IV'],
y=trialData.loc[trialData['SubjectID'] == subjectData.loc[subjectIndex, 'SubjectID'], 'DV'],
scatter=False)
plt.show()
Visually Checking Assumptions
Linearity
We need to ensure that the relationship between model predictions of Decisions and observed Decisions is linear. If this relationship is flat (slope is 0) rather than linear (slope is 1) then our model is doing terribly at predicting Decisions, essentially making predictions completely at chance-level. Thus our model would not be capturing the data generation process and, most likely, our Free Parameters are fitted to noise at all values. So let’s plot a regression line with model predictions on the x-axis and observed Decisions on the y-axis. The slope should be essentially 1.
qplot(x = trialData$Prediction, y = trialData$Decision, geom = 'smooth') +
geom_abline(slope = 1, intercept = 0)
scatter(trialData.Prediction, trialData.Decision);
hold on;
loess_smooth = fitloess(trialData.Prediction, trialData.Decision);
plot(loess_smooth);
plot(trialData.Prediction, trialData.Prediction, 'k--');
hold off;
plt.scatter(trialData['Prediction'], trialData['Decision'])
lowess = sm.nonparametric.lowess(trialData['Decision'], trialData['Prediction'])
plt.plot(lowess[:, 0], lowess[:, 1], 'r-')
plt.plot(trialData['Prediction'], trialData['Prediction'], 'k--')
plt.show()
Normality of Error
We need to ensure that prediction errors are normally distributed. If prediction errors are skewed, then our model is making making more underpredictions (negative) or overpredictions (positive) of Decisions. If the distribution is kurtosed (i.e. too skinny or too fat) then we’re more or less likely to have extreme errors in predicting Decisons compared to a normal distribution. We can create a density plot of prediction errors (i.e. the difference between model predictions of Decisions and observed Decisions) and see if it follows a bell-curve. We can create a density plot with a normal distribution where the standard deviation is the standard deviation of prediction errors to check this.
normvals = rnorm(1000, mean = 0, sd = sd(trialData$Prediction - trialData$Decision))
qplot(x = trialData$Prediction - trialData$Decision, geom = 'density', bw = 1, color = 'Actual') +
geom_density(aes(x = normvals, color = 'Predicted'), bw = 1)
rng('default'); % Set random number generator seed for reproducibility
normvals = normrnd(0, std(trialData.Prediction - trialData.Decision), 1000, 1);
figure;
hold on;
histogram(trialData.Prediction - trialData.Decision, 'Normalization', 'pdf', 'EdgeColor', 'b', 'FaceColor', 'none');
histogram(normvals, 'Normalization', 'pdf', 'EdgeColor', 'r', 'FaceColor', 'none');
legend('Actual', 'Predicted');
np.random.seed(0) # Set random number generator seed for reproducibility
normvals = np.random.normal(0, np.std(trialData['Prediction'] - trialData['Decision']), 1000)
plt.hist(trialData['Prediction'] - trialData['Decision'], bins='auto', density=True, color='b', edgecolor='b', alpha=0.5)
plt.hist(normvals, bins='auto', density=True, color='r', edgecolor='r', alpha=0.5)
plt.legend(['Actual', 'Predicted'])
plt.show()
Independence of Error
Next, we need to ensure that our model’s prediction errors of Decisions are not confounded with values of Independent Variable. If our model predicts Decisions worse or better at certain values of Independent Variable compared to others, then our recovery of Free Parameters is influenced disproportionately by Decisions made when Independent Variable have a certain value. Thus, our Free Parameters would be overfit for certain values and underfit for other values. We can check this assumption by creating loess line with our Independent Variable on the x-axis and the model prediction errors on the y-axis. This slope should be 0 with an intercept of 0.
qplot(x = trialData$IV, y = (trialData$Prediction-trialData$Decision), geom = 'smooth')
x = trialData.IV;
y = trialData.Prediction - trialData.Decision;
loess = fit(x, y, 'loess');
scatter(x, y);
hold on;
plot(loess, x, y);
hold off;
x = trialData['IV']
y = trialData['Prediction'] - trialData['Decision']
lowess = sm.nonparametric.lowess(y, x, frac=0.25) # Adjust the frac parameter as needed
plt.scatter(x, y)
plt.plot(lowess[:, 0], lowess[:, 1], 'r-')
plt.show()
Homoscedasticty
Finally, we need to ensure that variance in model prediction errors does not change as a function of an Independent Variable. Of all of the assumptions, this one is the least problematic if violated - essentially it indicates that our model has less predictive accuracy at certain values of the Independent Variable. This is a bigger issue for regression models because it can make Confidence Intervals of regression coefficients which are too narrow - we don’t have this issue unless you’re doing means testing on your Free Parameters (i.e. testing modulatory hypotheses). Nonetheless, this could produce unreliable Free Parameter estimates so we have to rely on out-of-sample validation to rule this out. If this assumption is very badly violated, it might make sense to included a noise parameter into your model to scale with the Independent Variable or to, instead, use a different estimator such as Weighted Least Squares to recover your Free Parameters. To check this, we can create a loess line with a variance cloud with our Independent Variable on the x-axis and the model prediction errors on the y-axis. The cloud should be a constant width around the loess line.
qplot(x = trialData$IV, y = (trialData$Prediction-trialData$Decision), geom = 'smooth')
x = trialData.IV;
y = trialData.Prediction - trialData.Decision;
loessSmoothing = fit(x, y, 'loess');
plot(loessSmoothing, x, y);
hold on;
plot(x, y, 'o', 'MarkerEdgeColor', 'b', 'MarkerFaceColor', 'b', 'MarkerSize', 5);
legend('LOESS Smoothed Line', 'Data Points');
xlabel('IV');
ylabel('Prediction - Decision');
hold off;
- ::
x = trialData[‘IV’] y = trialData[‘Prediction’] - trialData[‘Decision’]
lowess = sm.nonparametric.lowess(y, x, frac=0.3) # Adjust frac as needed smoothed_x, smoothed_y = lowess.T
plt.plot(smoothed_x, smoothed_y, label=’LOESS Smoothed Line’) plt.scatter(x, y, marker=’o’, edgecolors=’b’, facecolors=’b’, s=50, label=’Data Points’) plt.legend() plt.xlabel(‘IV’) plt.ylabel(‘Prediction - Decision’) plt.show()
Assessing Independence of Observations
We want to ensure that accounting for Subjects’ differences from each other using Free Parameters results in model predictions of Decisions which are not attributable to individual differences. We can accomplish this using the same linear modeling formula, but including random intercepts for the subject.
First, we can try a model with a random intercept and random slope of model predicted value. This model might have convergence issues or singularity issues - if so, great! Otherwise, don’t worry just go to the next model.
Next, we can try a model with a only a random intercept. We can estimate an R-squared for this model - the marginal R-squared (i.e. including variance explained by random effects) should be roughly equivalent to the conditional R-squared (i.e. variance only explained by fixed effects - i.e. your model) - within 0.05 is reasonable.
ris_model = lmer(data = trialData, Decision ~ Prediction + (1 + Prediction | SubjectID))
summary(ris_model) #model should have issues
ri_model = lmer(data = trialData, Decision ~ Prediction + (1 | SubjectID))
summary(ri_model)
library(MuMin)
r.squaredGLMM(ri_model) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough
ris_model = fitlme(trialData, 'Decision ~ Prediction + (1 + Prediction | SubjectID)');
disp(ris_model);
ri_model = fitlme(trialData, 'Decision ~ Prediction + (1 | SubjectID)');
disp(ri_model);
disp(ri_model.Rsquared); #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough
import statsmodels.api as sm
import r2glmm
ris_model = sm.MixedLM.from_formula('Decision ~ Prediction + (1 + Prediction | SubjectID)', data=trialData)
result_ris = ris_model.fit()
print(result_ris.summary())
ri_model = sm.MixedLM.from_formula('Decision ~ Prediction + (1 | SubjectID)', data=trialData)
result_ri = ri_model.fit()
print(result_ri.summary())
r_squared = r2glmm.get_r2(result_ri)
print(r_squared) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough
Validating Parameter Recovery Process
In order to show that our parameter recovery process is robust, we have to show that the model can predict behavior that it was not trained on. We accomplish this by performing Fivefold Validation.
Fivefold Validation
We essentially want to prove that our Free Parameters are not overfitting the Decisions that they are training on. In other words, we want to rule out that our favored model isn’t outperforming other models because it’s fitting weird quirks in Subjects’ Decisions. It should seem really intuitive that, in order to prove this, we’re going to separate Decisions into a training set and a testing set. Let’s talk about how specifically we’re going to do this.
So, we’re going to randomly split each Subjects’ Decisions into one of five groups, called folds. We’re going to take one fold and remove those Trials from our training set - we’re going to recover Free Parameters from four-fifths of Decisions and we’re going to test it on this fold - this fifth of Decisions that we excluded. We’re going to save the Free Parameters from this four-fifths and the predicted Decisions of the withheld one-fifth. We rinse and repeat for the remaining four folds. Once this is done, we can get the model error for the predicted-against-observed Decisions for all Trials. Now we should have five sets of Free Parameters for each Subject and, combining the model error of all five folds, we should have a model error estimate across all Trials.
Now, we want to look at two things: first we want to assess how much larger this model error is compared to the model error of the model trained on all of the data. Here, you should report the change in root mean squared error per trial - you can also do a paired t-test on the MFI of the fivefold predictions compared to the standard model predictions but this is not necessary. Second, we also want to assess the similarity of the Free Parameters we recovered withholding each of the five folds to the Free Parameters recovered on the entire data set. You should report the average cosine similarity across all folds for each of the Free Parameters.
fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into
for (i in 1:length(included_subjects)){
datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
df = read.csv2(datafile)
order = sample(20)
Parameter1_ff = vector('numeric', length = 5)
Parameter2_ff = vector('numeric', length = 5)
SS_ff = 0
Prediction_ff = vector('numeric', length(df$Decision))
for (z in 1:5){
j = (z - 1) * 4 + 1
n = z * 4
withheld = order[j:n]
m = ((i - 1) * 5) + z
result_ff = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
df = df[-withheld,])
Parameter1_ff[m] = result_ff$par[1]
Parameter2_ff[m] = result_ff$par[2]
for (n in 1:length(withheld)){
utility_choices = vector('numeric', length(Choices))
for (q in 1:length(Choices)){
utility_choices[q] = utility(result_ff$par[1], result_ff$par[2],
construct1(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]),
construct2(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]),
construct3(df$IV[withheld[n]], df$Constant[withheld[n]], Choices[q]))
}
Prediction_ff[withheld[n]] = Choices[which(utility_choices == max(utility_choices))]
}
}
SS_ff = sum((df$Decision - Prediction_ff)**2)
fivefold[i, 1:11] = c(SS_ff, Parameter1_ff, Parameter2_ff)
}
colnames(fivefold) = c('SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5',
'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', SubjectID)
sqrt(mean(fivefold$SS)/length(df$IV)) - sqrt(mean(subjectData$modelSS)/length(df$IV)) #the change in root mean squared error, per trial
fivefold$AIC = length(df$IV) * log(fivefold$SS/length(df$IV)) + 2 * 2
t.test(fivefold$AIC, subjectData$AIC, paired = T) #test fivefold MFI against normal MFI for this model
library(lsa)
cosines = vector('numeric', length = 10)
for (i in 1:5){
cosines[i] = cosine(subjectData$Parameter1, fivefold[, (i + 1)]) #to get the correct columns in the fivefold dataframe (2-6)
cosines[(i+5)] = cosine(subjectData$Parameter1, fivefold[, (i + 6)]) #to get the correct columns in the fivefold dataframe (7-11)
}
mean(cosines[1:5]) #cosine similarity of parameter 1
mean(cosines[6:10]) #cosine similarity of parameter 2
fivefold = [];
for i = 1:length(included_subjects)
datafile = strcat(parentfolder, included_subjects{i}, restoffilepath);
df = readtable(datafile);
order = randperm(20);
Parameter1_ff = zeros(1, 5);
Parameter2_ff = zeros(1, 5);
SS_ff = 0;
Prediction_ff = zeros(1, length(df.Decision));
for z = 1:5
j = (z - 1) * 4 + 1;
n = z * 4;
withheld = order(j:n);
m = ((i - 1) * 5) + z;
options = optimset('fmincon');
options.Display = 'off';
result_ff = fmincon(@(params) obj_function(params, df(~ismember(1:size(df, 1), withheld), :)), initial_params, [], [], [], [], lower_bounds, upper_bounds, [], options);
Parameter1_ff(m) = result_ff(1);
Parameter2_ff(m) = result_ff(2);
for n = 1:length(withheld)
utility_choices = zeros(1, length(Choices));
for q = 1:length(Choices)
utility_choices(q) = utility(result_ff(1), result_ff(2), ...
construct1(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)), ...
construct2(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)), ...
construct3(df.IV(withheld(n)), df.Constant(withheld(n)), Choices(q)));
end
[~, max_utility_idx] = max(utility_choices);
Prediction_ff(withheld(n)) = Choices(max_utility_idx);
end
end
SS_ff = sum((df.Decision - Prediction_ff).^2);
fivefold(i, 1:11) = [SS_ff, Parameter1_ff, Parameter2_ff];
end
fivefold.Properties.VariableNames = {'SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', ...
'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5'};
root_mean_squared_error_change = sqrt(mean(fivefold.SS) / length(df.IV)) - sqrt(mean(subjectData.modelSS) / length(df.IV));
fivefold.AIC = length(df.IV) * log(fivefold.SS / length(df.IV)) + 2 * 2;
ttest_result = ttest(fivefold.AIC, subjectData.AIC, 'paired', true);
cosines = zeros(1, 10);
for i = 1:5
cosines(i) = cosine(subjectData.Parameter1, fivefold{:, i + 1});
cosines(i + 5) = cosine(subjectData.Parameter1, fivefold{:, i + 6});
end
mean_cosine_parameter1 = mean(cosines(1:5));
mean_cosine_parameter2 = mean(cosines(6:10));
from scipy.stats import ttest_rel
from lsa.cosine import cosine
fivefold = pd.DataFrame()
for i in range(len(included_subjects)):
datafile = parentfolder + included_subjects[i] + restoffilepath
df = pd.read_csv(datafile, sep=';')
order = np.random.permutation(20)
Parameter1_ff = np.zeros(5)
Parameter2_ff = np.zeros(5)
SS_ff = 0
Prediction_ff = np.zeros(len(df['Decision']))
for z in range(1, 6):
j = (z - 1) * 4
n = z * 4
withheld = order[j:n]
m = ((i - 1) * 5) + z
result_ff = fmincon(obj_function_ff, x0=initial_params, args=( df[~df.index.isin(withheld)]),
bounds=(lower_bounds, upper_bounds))
Parameter1_ff[m - 1] = result_ff[0]
Parameter2_ff[m - 1] = result_ff[1]
for n in withheld:
utility_choices = np.zeros(len(Choices))
for q in range(len(Choices)):
utility_choices[q] = utility(result_ff[0], result_ff[1],
construct1(df['IV'][n], df['Constant'][n], Choices[q]),
construct2(df['IV'][n], df['Constant'][n], Choices[q]),
construct3(df['IV'][n], df['Constant'][n], Choices[q]))
max_utility_idx = np.argmax(utility_choices)
Prediction_ff[n] = Choices[max_utility_idx]
SS_ff = sum((df['Decision'] - Prediction_ff)**2)
fivefold.loc[i, 0:10] = [SS_ff] + list(Parameter1_ff) + list(Parameter2_ff)
fivefold.columns = ['SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5',
'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5']
root_mean_squared_error_change = np.sqrt(fivefold['SS'].mean() / len(df['IV'])) - np.sqrt(subjectData['modelSS'].mean() / len(df['IV']))
fivefold['AIC'] = len(df['IV']) * np.log(fivefold['SS'] / len(df['IV'])) + 2 * 2
ttest_result = ttest_rel(fivefold['AIC'], subjectData['AIC'])
cosines = np.zeros(10)
for i in range(5):
cosines[i] = cosine(subjectData['Parameter1'], fivefold.iloc[:, i + 1])
cosines[i + 5] = cosine(subjectData['Parameter1'], fivefold.iloc[:, i + 6])
mean_cosine_parameter1 = np.mean(cosines[0:5])
mean_cosine_parameter2 = np.mean(cosines[5:10])
Tutorials
Tutorial 1 - van Baar, Chang, & Sanfey, 2019
Assessing Accuracy in Capturing the Data Generation Process
### Model Performance
modelPredictions = lm(data = trialData, as.numeric(Returned) ~ Prediction)
summary(modelPredictions) # R-squared
### Visually Checking Assumptions
qplot(x = trialData$Prediction, y = as.numeric(trialData$Returned), geom = 'smooth') +
geom_abline(slope = 1, intercept = 0) + labs(x = 'Prediction', y = 'Observed') + lims(x = c(0, 30), y = c(0, 30)) # linearity
normvals = rnorm(1000, mean = 0, sd = sd(trialData$Prediction - as.numeric(trialData$Returned)))
qplot(x = trialData$Prediction - as.numeric(trialData$Returned), geom = 'density', bw = 1, color = 'Actual') +
geom_density(aes(x = normvals, color = 'Predicted'), bw = 1) + labs(x = 'Prediction - Observed', y = 'Density') #normality
qplot(x = trialData$Investment, y = (trialData$Prediction-as.numeric(trialData$Returned)), group = as.factor(trialData$Multiplier), color = as.factor(trialData$Multiplier), geom = 'smooth') +
labs(x = 'Investment', y = 'Prediction - Observed', color = 'Multiplier') #independence of error and homoscedasticity
### Assessing Independence of Observations
library(lme4)
library(MuMIn)
ris_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 + Prediction | Subject))
r.squaredGLMM(ris_model)
ri_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 | Subject))
r.squaredGLMM(ri_model)
ric_model = lmer(data = trialData, as.numeric(Returned) ~ Prediction + (1 | Subject))
r.squaredGLMM(ric_model)
% Model Performance
modelPredictions = fitlm(trialData, 'Returned ~ Prediction');
disp('R-squared:');
disp(modelPredictions.Rsquared.Ordinary);
% Visually Checking Assumptions
figure;
% Linearity
subplot(2, 2, 1);
scatter(trialData.Prediction, trialData.Returned, 'filled');
hold on;
refline(1, 0);
xlabel('Prediction');
ylabel('Observed');
xlim([0, 30]);
ylim([0, 30]);
title('Linearity');
% Normality
subplot(2, 2, 2);
diffResiduals = modelPredictions.Residuals.Raw - trialData.Returned;
histogram(diffResiduals, 'Normalization', 'pdf', 'EdgeColor', 'none');
hold on;
normvals = normrnd(0, std(diffResiduals), 1000, 1);
histogram(normvals, 'Normalization', 'pdf', 'EdgeColor', 'none');
xlabel('Prediction - Observed');
ylabel('Density');
title('Normality');
% Independence of Error and Homoscedasticity
subplot(2, 2, 3);
gscatter(trialData.Investment, diffResiduals, trialData.Multiplier, [], [], 15);
xlabel('Investment');
ylabel('Prediction - Observed');
title('Independence of Error and Homoscedasticity');
% Assessing Independence of Observations
ris_model = fitlme(trialData, 'Returned ~ Prediction + (1 + Prediction | SubjectID)');
disp(ris_model);
disp(r2(ris_model));
ri_model = fitlme(trialData, 'Returned ~ Prediction + (1 | SubjectID)');
disp(ri_model)
disp(r2(ri_model));
ric_model = fitlme(trialData, 'Returned ~ Prediction + (1 | SubjectID)');
disp(ric_model);
disp(r2(ric_model));
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Model Performance
modelPredictions = sm.OLS(trialData['Returned'], sm.add_constant(trialData['Prediction'])).fit()
print(modelPredictions.summary())
# Visually Checking Assumptions
fig, axs = plt.subplots(2, 2, figsize=(12, 10))
# Linearity
axs[0, 0].scatter(trialData['Prediction'], trialData['Returned'], color='blue', alpha=0.5)
axs[0, 0].plot([0, 30], [0, 30], color='red', linestyle='--')
axs[0, 0].set_xlabel('Prediction')
axs[0, 0].set_ylabel('Observed')
axs[0, 0].set_xlim([0, 30])
axs[0, 0].set_ylim([0, 30])
axs[0, 0].set_title('Linearity')
# Normality
axs[0, 1].hist(modelPredictions.resid, bins='auto', color='blue', alpha=0.5, density=True)
sns.histplot(normvals, bins='auto', color='red', alpha=0.5, kde=True, ax=axs[0, 1])
axs[0, 1].set_xlabel('Prediction - Observed')
axs[0, 1].set_ylabel('Density')
axs[0, 1].set_title('Normality')
# Independence of Error and Homoscedasticity
axs[1, 0].scatter(trialData['Investment'], modelPredictions.resid, c=trialData['Multiplier'], cmap='viridis', s=15)
axs[1, 0].set_xlabel('Investment')
axs[1, 0].set_ylabel('Prediction - Observed')
axs[1, 0].set_title('Independence of Error and Homoscedasticity')
plt.tight_layout()
plt.show()
### Assessing Independence
# can do directly in R
Validating Parameter Recovery Process
fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into
for (i in 1:length(included_subjects)){
df = trialData[which(included_subjects[i] == trialData$Subject), ]
order = sample(length(df$Returned))
Theta_ff = vector('numeric', length = 5)
Phi_ff = vector('numeric', length = 5)
SS_ff = 0
Prediction_ff = vector('numeric', length(df$Returned))
for (z in 1:5){
j = round((z - 1) * (length(df$Returned)/5) + 1)
n = round(z * (length(df$Returned)/5))
withheld = order[j:n]
m = ((i - 1) * 5) + z
result_ff = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
df = df[-withheld,])
Theta_ff[z] = result_ff$par[1]
Phi_ff[z] = result_ff$par[2]
for (n in 1:length(withheld)){
if (df$Investment[withheld[n]] > 10) {
Choices = seq(0, (df$Investment[withheld[n]] * df$Multiplier[withheld[n]]), round((df$Investment[withheld[n]]*df$Multiplier[withheld[n]])/10))
} else {
Choices = seq(0, (df$Investment[withheld[n]] * df$Multiplier[withheld[n]]), 1)
}
utility_choices = vector('numeric', length(Choices))
for (q in 1:length(Choices)){
utility_choices[q] = utility(theta = result_ff$par[1],
phi = result_ff$par[2],
guilt = guilt(df$Investment[withheld[n]], believed_multiplier = 4, Choices[q], df$Multiplier[withheld[n]]),
payout = payout_maximization(df$Investment[withheld[n]], df$Multiplier[withheld[n]], Choices[q]),
inequity = inequity(df$Investment[withheld[n]], df$Multiplier[withheld[n]], Choices[q], endowment = 10))
}
Prediction_ff[withheld[n]] = Choices[which(utility_choices == max(utility_choices))[1]]
}
}
SS_ff = sum((as.numeric(df$Returned) - Prediction_ff)**2)
fivefold[i, 1:12] = c(SS_ff, Theta_ff, Phi_ff, included_subjects[i])
}
colnames(fivefold) = c('SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5',
'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', 'SubjectID')
sqrt(mean(fivefold$SS)/length(df$Investment)) - sqrt(mean(subjectData$modelSS)/length(df$Investment)) #the change in root mean squared error, per trial
fivefold$AIC = length(df$Investment) * log(fivefold$SS/length(df$Investment)) + 2 * 2
t.test(fivefold$AIC, subjectData$modelAIC, paired = T) #test fivefold MFI against normal MFI for this model
library(lsa)
cosines = vector('numeric', length = 10)
for (i in 1:5){
cosines[i] = cosine(subjectData$Theta, fivefold[, (i + 1)]) #to get the correct columns in the fivefold dataframe (2-6)
cosines[(i+5)] = cosine(subjectData$Phi, fivefold[, (i + 6)]) #to get the correct columns in the fivefold dataframe (7-11)
}
mean(cosines[1:5]) #cosine similarity of parameter 1
mean(cosines[6:10]) #cosine similarity of parameter 2
fivefold = table(); % preallocate for parameters and errors from the fivefold validation to go into
for i = 1:length(included_subjects)
df = trialData(included_subjects(i) == trialData.Subject, :);
order = randperm(length(df.Returned));
Theta_ff = NaN(1, 5);
Phi_ff = NaN(1, 5);
SS_ff = 0;
Prediction_ff = NaN(1, length(df.Returned));
for z = 1:5
j = round((z - 1) * (length(df.Returned)/5) + 1);
n = round(z * (length(df.Returned)/5));
withheld = order(j:n);
result_ff = fmincon(@(params) obj_function(params, df(~ismember(1:length(df.Returned), withheld), :)), ...
initial_params, [], [], [], [], lower_bounds, upper_bounds);
Theta_ff(z) = result_ff(1);
Phi_ff(z) = result_ff(2);
for n = 1:length(withheld)
if df.Investment(withheld(n)) > 10
Choices = 0:round((df.Investment(withheld(n)) * df.Multiplier(withheld(n)))/10):(df.Investment(withheld(n)) * df.Multiplier(withheld(n)));
else
Choices = 0:1:(df.Investment(withheld(n)) * df.Multiplier(withheld(n)));
end
utility_choices = NaN(1, length(Choices));
for q = 1:length(Choices)
utility_choices(q) = utility(result_ff(1), result_ff(2), ...
guilt(df.Investment(withheld(n)), 4, Choices(q), df.Multiplier(withheld(n))), ...
payout_maximization(df.Investment(withheld(n)), df.Multiplier(withheld(n)), Choices(q)), ...
inequity(df.Investment(withheld(n)), df.Multiplier(withheld(n)), Choices(q), 10));
end
[~, maxIndex] = max(utility_choices);
Prediction_ff(withheld(n)) = Choices(maxIndex);
end
end
SS_ff = sum((df.Returned - Prediction_ff).^2);
fivefold(i, 1:12) = [SS_ff, Theta_ff, Phi_ff, included_subjects(i)];
end
fivefold.Properties.VariableNames = {'SS', 'Par1_fold1', 'Par1_fold2', 'Par1_fold3', 'Par1_fold4', 'Par1_fold5', ...
'Par2_fold1', 'Par2_fold2', 'Par2_fold3', 'Par2_fold4', 'Par2_fold5', 'SubjectID'};
sqrt(mean(fivefold.SS)/length(df.Investment)) - sqrt(mean(subjectData.modelSS)/length(df.Investment)); % the change in root mean squared error, per trial
fivefold.AIC = length(df.Investment) * log(fivefold.SS/length(df.Investment)) + 2 * 2;
ttest(fivefold.AIC, subjectData.modelAIC); % test fivefold MFI against normal MFI for this model
cosines = NaN(1, 10);
for i = 1:5
cosines(i) = cosine(subjectData.Theta, fivefold{:, (i + 1)}); % to get the correct columns in the fivefold dataframe (2-6)
cosines(i + 5) = cosine(subjectData.Phi, fivefold{:, (i + 6)}); % to get the correct columns in the fivefold dataframe (7-11)
end
mean(cosines(1:5)) % cosine similarity of parameter 1
mean(cosines(6:10)) % cosine similarity of parameter 2
fivefold = pd.DataFrame() # preallocate for parameters and errors from the fivefold validation to go into
for i in range(1, len(included_subjects) + 1):
df = trialData[included_subjects[i - 1] == trialData['Subject']]
order = np.random.permutation(len(df['Returned']))
Theta_ff = np.zeros(5)
Phi_ff = np.zeros(5)
SS_ff = 0
Prediction_ff = np.zeros(len(df['Returned']))
for z in range(1, 6):
j = round((z - 1) * (len(df['Returned']) / 5) + 1)
n = round(z * (len(df['Returned']) / 5))
withheld = order[j-1:n]
result_ff = minimize(lambda params: obj_function(params, df.loc[~df.index.isin(withheld), :]),
initial_params, bounds=list(zip(lower_bounds, upper_bounds)))
Theta_ff[z-1] = result_ff.x[0]
Phi_ff[z-1] = result_ff.x[1]
for n in withheld:
if df['Investment'].iloc[n] > 10:
Choices = np.arange(0, (df['Investment'].iloc[n] * df['Multiplier'].iloc[n]),
round((df['Investment'].iloc[n] * df['Multiplier'].iloc[n])/10))
else:
Choices = np.arange(0, (df['Investment'].iloc[n] * df['Multiplier'].iloc[n]), 1)
utility_choices = np.zeros(len(Choices))
for q in range(len(Choices)):
utility_choices[q] = utility(result_ff.x[0], result_ff.x[1],
guilt(df['Investment'].iloc[n], 4, Choices[q], df['Multiplier'].iloc[n]),
payout_maximization(df['Investment'].iloc[n], df['Multiplier'].iloc[n], Choices[q]),
inequity(df['Investment'].iloc[n], df['Multiplier'].iloc[n], Choices[q], 10))
correct_choice = Choices[np.argmax(utility_choices)]
Prediction_ff[n] = correct_choice
SS_ff = np.sum((df['Returned'].to_numpy() - Prediction_ff)**2)
fivefold[i - 1:i] = [SS_ff] + list(Theta_ff) + list(Phi_ff) + [included_subjects[i - 1]]
sqrt(np.mean(fivefold['SS'])/len(df['Investment'])) - np.sqrt(np.mean(subjectData['modelSS'])/len(df['Investment'])) # the change in root mean squared error, per trial
fivefold['AIC'] = len(df['Investment']) * np.log(fivefold['SS']/len(df['Investment'])) + 2 * 2
ttest_rel(fivefold['AIC'], subjectData['modelAIC']) # test fivefold MFI against normal MFI for this model
cosines = np.zeros(10)
for i in range(1, 6):
cosines[i - 1] = cosine_similarity(subjectData['Theta'].to_numpy().reshape(1, -1), fivefold.iloc[:, i:i+1].to_numpy().T)[0, 0]
cosines[i + 4] = cosine_similarity(subjectData['Phi'].to_numpy().reshape(1, -1), fivefold.iloc[:, i+5:i+6].to_numpy().T)[0, 0]
np.mean(cosines[0:5]) # cosine similarity of parameter 1
np.mean(cosines[5:10]) # cosine similarity of parameter 2
Tutorial 2 - Galvan & Sanfey, 2024
Assessing Accuracy in Capturing the Data Generation Process
### Model Performance
taxRatePredictions = lm(data = trialData, observedTaxRate ~ predictedTaxRate)
summary(taxRatePredictions) # R-squared predicting the subject's tax rate
payoutPredictions = lm(data = trialData, observedOutcome ~ predictedOutcome)
summary(payoutPredictions) # R-squared predicting the subject's payout, we're going to focus on this
### Visually Checking Assumptions
qplot(x = trialData$predictedOutcome, y = trialData$observedOutcome, geom = 'smooth') +
geom_abline(slope = 1, intercept = 0) + labs(x = 'Prediction', y = 'Observed') + lims(x = c(0, 30), y = c(0, 30)) # linearity
normvals = rnorm(1000, mean = 0, sd = sd(trialData$predictedOutcome - trialData$observedOutcome))
qplot(x = (trialData$predictedOutcome - trialData$observedOutcome), geom = 'density', bw = 1, color = 'Actual') +
geom_density(aes(x = normvals, color = 'Predicted'), bw = 1) + labs(x = 'Prediction - Observed', y = 'Density') #normality
qplot(x = trialData$initialAllocation, y = (trialData$predictedOutcome - trialData$observedOutcome), group = as.factor(trialData$Multiplier), color = as.factor(trialData$Multiplier), geom = 'smooth') +
labs(x = 'Initial Allocation', y = 'Prediction - Observed', color = 'Multiplier') #independence of error and homoscedasticity
### Assessing Independence
library(lme4)
library(MuMin)
ris_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + (1 + predictedOutcome | SubjectID))
summary(ris_model) #model should have issues
r.squaredGLMM(ris_model)
cri_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + condition + (1 | SubjectID))
summary(cri_model) #model should have issues
r.squaredGLMM(cri_model)
cric_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + condition + (1 + condition | SubjectID))
summary(cric_model) #model should have issues
r.squaredGLMM(cric_model)
ri_model = lmer(data = trialData, observedOutcome ~ predictedOutcome + (1 | SubjectID))
summary(ri_model)
r.squaredGLMM(ri_model) #if conditional Rsq is between 0 and 0.05 lower than the multiple Rsq, that's good enough
% Model Performance
taxRatePredictions = fitlm(trialData, 'observedTaxRate ~ predictedTaxRate');
disp(taxRatePredictions); % Display summary
payoutPredictions = fitlm(trialData, 'observedOutcome ~ predictedOutcome');
disp(payoutPredictions); % Display summary
% Visually Checking Assumptions
figure;
% Linearity
subplot(2, 2, 1);
scatter(trialData.predictedOutcome, trialData.observedOutcome, 'filled');
hold on;
plot([0 30], [0 30], '--k');
xlabel('Prediction');
ylabel('Observed');
xlim([0 30]);
ylim([0 30]);
title('Linearity');
% Normality
subplot(2, 2, 2);
histogram(trialData.predictedOutcome - trialData.observedOutcome, 'Normalization', 'probability');
hold on;
normvals = normrnd(0, std(trialData.predictedOutcome - trialData.observedOutcome), 1, 1000);
histogram(normvals, 'Normalization', 'probability');
xlabel('Prediction - Observed');
ylabel('Density');
title('Normality');
% Independence of Error and Homoscedasticity
subplot(2, 2, 3);
gscatter(trialData.initialAllocation, trialData.predictedOutcome - trialData.observedOutcome, trialData.Multiplier, 'br', '.', 10);
xlabel('Initial Allocation');
ylabel('Prediction - Observed');
title('Independence of Error and Homoscedasticity');
legend('off'); % Turn off automatic legend
% Assessing Independence
ris_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + (1 + predictedOutcome | SubjectID)');
disp(ris_model);
disp(r2(ris_model));
cri_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + condition + (1 | SubjectID)');
disp(cri_model);
disp(r2(cri_model));
cric_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + condition + (1 + condition | SubjectID)');
disp(cric_model);
disp(r2(cric_model));
ri_model = fitlme(trialData, 'observedOutcome ~ predictedOutcome + (1 | SubjectID)');
disp(ri_model);
disp(r2(ri_model));
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
# Model Performance
tax_rate_predictions = OLS.from_formula('observedTaxRate ~ predictedTaxRate', trialData).fit()
print(tax_rate_predictions.summary())
payout_predictions = OLS.from_formula('observedOutcome ~ predictedOutcome', trialData).fit()
print(payout_predictions.summary())
# Visually Checking Assumptions
# Linearity
plt.subplot(2, 2, 1)
sns.scatterplot(x='predictedOutcome', y='observedOutcome', data=trialData)
plt.plot([0, 30], [0, 30], '--k')
plt.xlabel('Prediction')
plt.ylabel('Observed')
plt.title('Linearity')
# Normality
plt.subplot(2, 2, 2)
sns.histplot(trialData['predictedOutcome'] - trialData['observedOutcome'], kde=True, color='blue', label='Actual')
norm_vals = np.random.normal(0, np.std(trialData['predictedOutcome'] - trialData['observedOutcome']), 1000)
sns.histplot(norm_vals, kde=True, color='orange', label='Predicted')
plt.xlabel('Prediction - Observed')
plt.ylabel('Density')
plt.title('Normality')
plt.legend()
# Independence of Error and Homoscedasticity
plt.subplot(2, 2, 3)
sns.lineplot(x='initialAllocation', y='predictedOutcome - observedOutcome', hue='Multiplier', style='Multiplier', markers=True, data=trialData)
plt.xlabel('Initial Allocation')
plt.ylabel('Prediction - Observed')
plt.title('Independence of Error and Homoscedasticity')
plt.tight_layout()
plt.show()
### Assessing Independence
# can do directly in R
Validating Parameter Recovery Process
fivefold = data.frame()
for (i in 1:length(included_subjects)){
datafile = paste(parentfolder, included_subjects[i], restoffilepath, sep = '') # produces a character vector 'parentfolder/included_subjects[i]**.filetype'
fullData = read.csv2(datafile)
thetaPerCondition = vector('numeric', length(conditions) * 5)
phiPerCondition = vector('numeric', length(conditions) * 5)
SSPerCondition = vector('numeric', length(conditions))
for (j in 1:length(conditions)){
df = fullData[which(fullData$condition == conditions[j]), c(49, 40:48), 33] #49 is subject's initial allocation, 40:48 are players 1:9 initial allocation, 33 is redistribution rate
df$redistributionRate = df$redistributionRate/100 #converting to a decimal from a percent
order = sample(20)
for (z in 1:5){
j = (z - 1) * 4 + 1
n = z * 4
withheld = order[j:n]
m = ((j - 1) * 5) + z
result = fmincon(obj_function,x0 = initial_params, A = NULL, b = NULL, Aeq = NULL, beq = NULL,
lb = lower_bounds, ub = upper_bounds,
df = df[-withheld,])
thetaPerCondition[m] = result$par[1]
phiPerCondition[m] = result$par[2]
for (k in 1:length(withheld)){
y = withheld[k]
Utility = vector('numeric', length(Choices))
for (n in 1:length(Choices)){
Utility[n] = utility(theta = thetaPerCondition[m],
phi = phiPerCondition[m],
Equity = equity(new_value(df[y, 1:10], choices[n]), df[y, 1:10], choices[n]),
Equality = equality(new_value(df[y, 1:10], choices[n]), df[y, 1:10], choices[n]),
Payout = payout(new_value(df[y, 1], choices[n]), df[y, 1], choices[n]))
}
correct_choice = which(Utility == max(Utility))
predictedOutcome = new_value(df$myself[y], Choices[correct_choice[sample(length(correct_choice), 1)]])
observedOutcome = new_value(df$myself[y], df$redistributionRate)
SSPerCondition = SSPerCondition + (observedOutcome - predictedOutcome)**2
}
}
}
fivefold[i, 1:45] = c(included_subjects[i], thetaPerCondition, phiPerCondition, SSPerCondition)
}
colnames(fivefold) = c('SubjectID',
'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5',
'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5',
'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5',
'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5',
'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5',
'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5',
'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5',
'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5',
'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck')
sqrt(mean(fivefold$SSMerit)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSMerit)/length(df$redistributionRate)) #the change in root mean squared error, per trial in merit condition
sqrt(mean(fivefold$SSEntitlement)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSEntitlement)/length(df$redistributionRate)) #the change in root mean squared error, per trial in entilement condition
sqrt(mean(fivefold$SSCorruption)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSCorruption)/length(df$redistributionRate)) #the change in root mean squared error, per trial in corruption condition
sqrt(mean(fivefold$SSLuck)/length(df$redistributionRate)) - sqrt(mean(subjectData$SSLuck)/length(df$redistributionRate)) #the change in root mean squared error, per trial in luck condition
fivefold$AICMerit = length(df$redistributionRate) * log(fivefold$SSMerit/length(df$redistributionRate)) + 2 * 2
fivefold$AICEntitlement = length(df$redistributionRate) * log(fivefold$SSEntitlement/length(df$redistributionRate)) + 2 * 2
fivefold$AICCorruption = length(df$redistributionRate) * log(fivefold$SSCorruption/length(df$redistributionRate)) + 2 * 2
fivefold$AICLuck = length(df$redistributionRate) * log(fivefold$SSLuck/length(df$redistributionRate)) + 2 * 2
t.test(fivefold$AICMerit, subjectData$AIC, paired = T) #test fivefold MFI against normal MFI for this model
t.test(fivefold$AICEntitlement, subjectData$AICEntitlement, paired = T) #test fivefold MFI against normal MFI for this model
t.test(fivefold$AICCorruption, subjectData$AICCorruption, paired = T) #test fivefold MFI against normal MFI for this model
t.test(fivefold$AICLuck, subjectData$AICLuck, paired = T) #test fivefold MFI against normal MFI for this model
library(lsa)
cosines = vector('numeric', length = length(conditions) * 2 * 5) #number of conditions times number of parameters times the number of folds
for (i in 1:5){
for (j in 1:length(conditions)){
k = ((i - 1) * 5) + j
cosines[k] = cosine(subjectData$Parameter1, fivefold[, (k + 1)])
cosines[(k+20)] = cosine(subjectData$Parameter1, fivefold[, (k + 21)])
}
}
mean(cosines[1:5]) #cosine similarity of theta in merit condition
mean(cosines[6:10]) #cosine similarity of theta in entitlement condition
mean(cosines[11:15]) #cosine similarity of theta in corruption condition
mean(cosines[16:20]) #cosine similarity of theta in luck condition
mean(cosines[21:25]) #cosine similarity of phi in merit condition
mean(cosines[26:30]) #cosine similarity of phi in entitlement condition
mean(cosines[31:35]) #cosine similarity of phi in corruption condition
mean(cosines[36:40]) #cosine similarity of phi in luck condition
mean(cosines[1:20]) #cosine similarity of theta across all conditions
mean(cosines[21:40]) #cosine similarity of phi across all conditions
fivefold = table();
for i = 1:length(included_subjects)
datafile = strcat(parentfolder, included_subjects{i}, restoffilepath);
fullData = readtable(datafile);
thetaPerCondition = zeros(length(conditions) * 5, 1);
phiPerCondition = zeros(length(conditions) * 5, 1);
SSPerCondition = zeros(length(conditions), 1);
for j = 1:length(conditions)
df = fullData(fullData.condition == conditions{j}, [49, 40:48, 33]);
df.redistributionRate = df.redistributionRate / 100;
order = randperm(20);
for z = 1:5
start_index = (z - 1) * 4 + 1;
end_index = z * 4;
withheld = order(start_index:end_index);
m = ((start_index - 1) * 5) + z;
result = fmincon(@obj_function, initial_params, [], [], [], [], lower_bounds, upper_bounds, df(~ismember(1:20, withheld), :));
thetaPerCondition(m) = result(1);
phiPerCondition(m) = result(2);
for k = 1:length(withheld)
y = withheld(k);
Utility = zeros(length(Choices), 1);
for n = 1:length(Choices)
Utility(n) = utility(result(1), result(2), ...
equity(new_value(df{y, 1:10}, choices{n}), df{y, 1:10}, choices{n}), ...
equality(new_value(df{y, 1:10}, choices{n}), df{y, 1:10}, choices{n}), ...
payout(new_value(df{y, 1}, choices{n}), df{y, 1}, choices{n}));
end
[~, correct_choice] = max(Utility);
predictedOutcome = new_value(df.myself(y), Choices(correct_choice(randi(length(correct_choice)))));
observedOutcome = new_value(df.myself(y), df.redistributionRate(y));
SSPerCondition(j) = SSPerCondition(j) + (observedOutcome - predictedOutcome)^2;
end
end
end
fivefold{i, 1:45} = [included_subjects{i}, thetaPerCondition', phiPerCondition', SSPerCondition'];
end
fivefold.Properties.VariableNames = {'SubjectID', ...
'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5', ...
'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5', ...
'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5', ...
'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5', ...
'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5', ...
'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5', ...
'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5', ...
'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5', ...
'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck'};
% Change in root mean squared error per trial in each condition
disp(sqrt(mean(fivefold.SSMerit) / height(df) - sqrt(mean(subjectData.SSMerit) / height(df))));
disp(sqrt(mean(fivefold.SSEntitlement) / height(df) - sqrt(mean(subjectData.SSEntitlement) / height(df))));
disp(sqrt(mean(fivefold.SSCorruption) / height(df) - sqrt(mean(subjectData.SSCorruption) / height(df))));
disp(sqrt(mean(fivefold.SSLuck) / height(df) - sqrt(mean(subjectData.SSLuck) / height(df))));
fivefold.AICMerit = height(df) * log(fivefold.SSMerit / height(df)) + 2 * 2;
fivefold.AICEntitlement = height(df) * log(fivefold.SSEntitlement / height(df)) + 2 * 2;
fivefold.AICCorruption = height(df) * log(fivefold.SSCorruption / height(df)) + 2 * 2;
fivefold.AICLuck = height(df) * log(fivefold.SSLuck / height(df)) + 2 * 2;
% Test fivefold MFI against normal MFI for this model
disp(ttest(fivefold.AICMerit, subjectData.AIC, 'paired', true));
disp(ttest(fivefold.AICEntitlement, subjectData.AICEntitlement, 'paired', true));
disp(ttest(fivefold.AICCorruption, subjectData.AICCorruption, 'paired', true));
disp(ttest(fivefold.AICLuck, subjectData.AICLuck, 'paired', true));
cosines = zeros(length(conditions) * 2 * 5, 1);
for i = 1:5
for j = 1:length(conditions)
k = ((i - 1) * 5) + j;
cosines(k) = cosine(subjectData.Parameter1, fivefold{:, k + 1});
cosines(k + 20) = cosine(subjectData.Parameter1, fivefold{:, k + 21});
end
end
% Cosine similarity of theta and phi across conditions and folds
disp(mean(cosines(1:5)));
disp(mean(cosines(6:10)));
disp(mean(cosines(11:15)));
disp(mean(cosines(16:20)));
disp(mean(cosines(21:25)));
disp(mean(cosines(26:30)));
disp(mean(cosines(31:35)));
disp(mean(cosines(36:40)));
% Cosine similarity of theta across all conditions and folds
disp(mean(cosines(1:20)));
% Cosine similarity of phi across all conditions and folds
disp(mean(cosines(21:40)));
fivefold = pd.DataFrame()
for i in range(len(included_subjects)):
datafile = parentfolder + included_subjects[i] + restoffilepath
fullData = pd.read_csv(datafile)
thetaPerCondition = np.zeros(len(conditions) * 5)
phiPerCondition = np.zeros(len(conditions) * 5)
SSPerCondition = np.zeros(len(conditions))
for j in range(len(conditions)):
df = fullData[fullData['condition'] == conditions[j]][[48, *range(39, 48), 32]]
df['redistributionRate'] = df['redistributionRate'] / 100
order = np.random.permutation(20)
for z in range(1, 6):
start_index = (z - 1) * 4
end_index = z * 4
withheld = order[start_index:end_index]
m = (start_index * 5) + z - 1 # Corrected indexing
result = minimize(obj_function, initial_params, args=(df[~df.index.isin(withheld)],), bounds=list(zip(lower_bounds, upper_bounds)))
thetaPerCondition[m] = result.x[0]
phiPerCondition[m] = result.x[1]
for k in withheld:
Utility = np.zeros(len(Choices))
for n in range(len(Choices)):
Utility[n] = utility(result.x[0], result.x[1],
equity(new_value(df.iloc[k, 0:9], choices[n]), df.iloc[k, 0:9], choices[n]),
equality(new_value(df.iloc[k, 0:9], choices[n]), df.iloc[k, 0:9], choices[n]),
payout(new_value(df.iloc[k, 0], choices[n]), df.iloc[k, 1], choices[n]))
correct_choice = np.argmax(Utility)
predictedOutcome = new_value(df['myself'].iloc[k], Choices[np.random.choice(np.where(Utility == max(Utility))[0])])
observedOutcome = new_value(df['myself'].iloc[k], df['redistributionRate'].iloc[k])
SSPerCondition[j] += (observedOutcome - predictedOutcome) ** 2
fivefold.loc[i, 0:44] = [included_subjects[i], *thetaPerCondition, *phiPerCondition, *SSPerCondition]
fivefold.columns = ['SubjectID',
'thetaMeritF1', 'thetaMeritF2', 'thetaMeritF3', 'thetaMeritF4', 'thetaMeritF5',
'thetaEntitlementF1', 'thetaEntitlementF2', 'thetaEntitlementF3', 'thetaEntitlementF4', 'thetaEntitlementF5',
'thetaCorruptionF1', 'thetaCorruptionF2', 'thetaCorruptionF3', 'thetaCorruptionF4', 'thetaCorruptionF5',
'thetaLuckF1', 'thetaLuckF2', 'thetaLuckF3', 'thetaLuckF4', 'thetaLuckF5',
'phiMeritF1', 'phiMeritF2', 'phiMeritF3', 'phiMeritF4', 'phiMeritF5',
'phiEntitlementF1', 'phiEntitlementF2', 'phiEntitlementF3', 'phiEntitlementF4', 'phiEntitlementF5',
'phiCorruptionF1', 'phiCorruptionF2', 'phiCorruptionF3', 'phiCorruptionF4', 'phiCorruptionF5',
'phiLuckF1', 'phiLuckF2', 'phiLuckF3', 'phiLuckF4', 'phiLuckF5',
'SSMerit', 'SSEntitlement', 'SSCorruption', 'SSLuck']
# Change in root mean squared error per trial in each condition
print(np.sqrt(np.mean(fivefold['SSMerit']) / len(df) - np.sqrt(np.mean(subjectData['SSMerit']) / len(df))))
print(np.sqrt(np.mean(fivefold['SSEntitlement']) / len(df) - np.sqrt(np.mean(subjectData['SSEntitlement']) / len(df))))
print(np.sqrt(np.mean(fivefold['SSCorruption']) / len(df) - np.sqrt(np.mean(subjectData['SSCorruption']) / len(df))))
print(np.sqrt(np.mean(fivefold['SSLuck']) / len(df) - np.sqrt(np.mean(subjectData['SSLuck']) / len(df))))
fivefold['AICMerit'] = len(df) * np.log(fivefold['SSMerit'] / len(df)) + 2 * 2
fivefold['AICEntitlement'] = len(df) * np.log(fivefold['SSEntitlement'] / len(df)) + 2 * 2
fivefold['AICCorruption'] = len(df) * np.log(fivefold['SSCorruption'] / len(df)) + 2 * 2
fivefold['AICLuck'] = len(df) * np.log(fivefold['SSLuck'] / len(df)) + 2 * 2
# Test fivefold MFI against normal MFI for this model
print(ttest_rel(fivefold['AICMerit'], subjectData['AIC']))
print(ttest_rel(fivefold['AICEntitlement'], subjectData['AICEntitlement']))
print(ttest_rel(fivefold['AICCorruption'], subjectData['AICCorruption']))
print(ttest_rel(fivefold['AICLuck'], subjectData['AICLuck']))
cosines = np.zeros(len(conditions) * 2 * 5)
for i in range(5):
for j in range(len(conditions)):
k = (i * 5) + j
cosines[k] = cosine(subjectData['Parameter1'], fivefold.iloc[:, k + 1])
cosines[k + 20] = cosine(subjectData['Parameter1'], fivefold.iloc[:, k + 21])
# Cosine similarity of theta and phi across conditions and folds
print(np.mean(cosines[0:4]))
print(np.mean(cosines[5:9]))
print(np.mean(cosines[10:15]))
print(np.mean(cosines[15:19]))
print(np.mean(cosines[20:24]))
print(np.mean(cosines[25:29]))
print(np.mean(cosines[30:34]))
print(np.mean(cosines[35:39]))
# Cosine similarity of theta across all conditions and folds
print(np.mean(cosines[0:19]))
# Cosine similarity of phi across all conditions and folds
print(np.mean(cosines[20:39]))
Tutorial 3 - Crockett et al., 2014
Assessing Accuracy in Capturing the Data Generation Process
### Assessing Model Performance
sum(trialData$choseHarm == round(trialData$ProbHarm))/nrow(trialData)
qplot(subjectData$BIC[-20], geom = 'density')
worstExplained = which(subjectData$BIC[-20] > as.numeric(summary(subjectData$BIC[-20])[4]))
qplot(data = trialData[which(trialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = moneyDifference, y = choseHarm - ProbHarm, group = SubjectID) + geom_smooth()
qplot(data = trialData[which(trialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = shocksDifference, y = choseHarm - ProbHarm, group = SubjectID) + geom_smooth()
### Checking Assumptions
ggplot(data = trialData) + geom_smooth(aes(x = ProbHarm, y = choseHarm, group = isSelf, color = factor(isSelf)))
ggplot(data = trialData) + geom_density(aes(x = choseHarm, fill = factor(choseHarm)))
normvals = rnorm(1000, mean = 0, sd = sd(trialData$ProbHarm - trialData$choseHarm))
qplot(x = trialData$ProbHarm - trialData$choseHarm, geom = 'density', bw = sd(trialData$ProbHarm - trialData$choseHarm), color = 'Actual') +
geom_density(aes(x = normvals, color = 'Predicted'), bw = sd(trialData$ProbHarm - trialData$choseHarm))
qplot(x = trialData$shocksDifference, y = trialData$ProbHarm - trialData$choseHarm, geom = 'jitter') + geom_smooth()
qplot(x = trialData$moneyDifference, y = trialData$ProbHarm - trialData$choseHarm, geom = 'jitter') + geom_smooth()
qplot(x = trialData$shocksDifference, y = trialData$ProbHarm - trialData$choseHarm, color = factor(trialData$isSelf), geom = 'smooth') + lims(y = c(-1,1))
qplot(x = trialData$moneyDifference, y = trialData$ProbHarm - trialData$choseHarm, color = factor(trialData$isSelf), geom = 'smooth') + lims(y = c(-1,1))
### Assessing Independence
library(lme4)
library(MuMIn)
ris_model = glmer(data = trialData, choseHarm ~ ProbHarm + (1 + ProbHarm | SubjectID), family = "binomial")
r.squaredGLMM(ris_model)
ri_model = glmer(data = trialData, choseHarm ~ ProbHarm + (1 | SubjectID), family = "binomial")
r.squaredGLMM(ri_model)
ric_model = glmer(data = trialData, choseHarm ~ ProbHarm + isSelf + (1 | SubjectID), family = "binomial")
r.squaredGLMM(ric_model)
Validating Parameter Recovery Process
fivefold = data.frame() #preallocate for parameters and errors from the fivefold validation to go into
trialData$ProbHarm_ff = 0
optimize = function(df){
tryCatch({
fmincon(obj_function, x0 = initial_params, lb = lower_bounds, ub = upper_bounds, df = df, optimMethod = "MLE")
}, error = function(e){
fmincon(obj_function, x0 = initial_params, lb = lower_bounds, ub = upper_bounds, df = df, optimMethod = "OLS")
})
}
adj = 1
for (i in 1:length(included_subjects)){
df = grab_data(included_subjects[i])
if (is.character(df)){adj = adj + 1; next}
df$ProbHarm = 0
order = sample(152)
KSelf_ff = vector('numeric', length = 5)
KOther_ff = vector('numeric', length = 5)
BSelf_ff = vector('numeric', length = 5)
BOther_ff = vector('numeric', length = 5)
E_ff = vector('numeric', length = 5)
G_ff = vector('numeric', length = 5)
Deviance_ff = 0
df$Pred = 0
for (z in 1:5){
j = round((z - 1) * (152/5) + 1)
n = round(z * (152/5))
withheld = order[j:n]
result_ff = optimize(df[-withheld,])
KSelf_ff[z] = result_ff$par[1]
KOther_ff[z] = result_ff$par[2]
BSelf_ff[z] = result_ff$par[3]
BOther_ff[z] = result_ff$par[4]
E_ff[z] = result_ff$par[5]
G_ff[z] = result_ff$par[6]
df[withheld, ] = generate_predictions(df[withheld, ], result_ff)
}
Deviance_ff = -2*sum(df$choseHarm * log(df$ProbHarm) + (1 - df$choseHarm) * log(1 - df$ProbHarm))
fivefold[i, 1:32] = c(included_subjects[i], Deviance_ff, KSelf_ff, KOther_ff, BSelf_ff, BOther_ff, E_ff, G_ff)
j = (i-adj)*152 + 1; n = (i-adj + 1)*152
trialData$ProbHarm_ff[j:n] = df$ProbHarm
}
colnames(fivefold) = c('SubjectID', 'Deviance', 'KS_F1', 'KS_F2', 'KS_F3', 'KS_F4', 'KS_F5',
'KO_F1','KO_F2','KO_F3','KO_F4','KO_F5', 'BS_F1', 'BS_F2', 'BS_F3', 'BS_F4', 'BS_F5',
'BO_F1','BO_F2','BO_F3','BO_F4','BO_F5', 'E_F1', 'E_F2', 'E_F3', 'E_F4', 'E_F5',
'G_F1', 'G_F2', 'G_F3', 'G_F4', 'G_F5')
sum(round(trialData$ProbHarm_ff) == trialData$choseHarm)/nrow(trialData)
fivefold$BIC = as.numeric(fivefold$Deviance) + log(152) * 6
t.test(fivefold$BIC, subjectData$BIC, paired = T) #test fivefold MFI against normal MFI for this model
library(lsa)
cosines = vector('numeric', length = 60)
for (i in 1:5){
cosines[i] = cosine(as.numeric(subjectData$Kappa_Self[-20]), as.numeric(fivefold[-20, (i + 2)]))
cosines[(i+5)] = cosine(as.numeric(subjectData$Kappa_Other[-20]), as.numeric(fivefold[-20, (i + 7)]))
cosines[(i+10)] = cosine(as.numeric(subjectData$Beta_Self[-20]), as.numeric(fivefold[-20, (i + 12)]))
cosines[(i+15)] = cosine(as.numeric(subjectData$Beta_Other[-20]), as.numeric(fivefold[-20, (i + 17)]))
cosines[(i+20)] = cosine(as.numeric(subjectData$Epsilon[-20]), as.numeric(fivefold[-20, (i + 22)]))
cosines[(i+25)] = cosine(as.numeric(subjectData$Gamma[-20]), as.numeric(fivefold[-20, (i + 27)]))
}
mean(cosines[1:5])
mean(cosines[6:10])
mean(cosines[11:15])
mean(cosines[16:20])
mean(cosines[21:25])
mean(cosines[26:30])
Tutorial 4 - Li et al., 2022
Assessing Accuracy in Capturing the Data Generation Process
### Assessing Model Performance
sum(altTrialData$Chose1 == round(altTrialData$ar_Prob1))/nrow(altTrialData)
qplot(subjectData$BIC, geom = 'density')
worstExplained = which(subjectData$BIC > as.numeric(summary(subjectData$BIC)[4]))
qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = a0-a1, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = a0-a2, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = b0-b1, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
qplot(data = altTrialData[which(altTrialData$SubjectID %in% altSubjectData$SubjectID[worstExplained]),], x = b0-b2, y = Chose1 - Prob1, group = SubjectID) + geom_smooth(se=F)
### Checking Assumptions
ggplot(data = altTrialData) + geom_smooth(aes(x = ar_Prob1, y = Chose1, group = a0 < b0, color = factor(a0 < b0)))
ggplot(data = altTrialData) + geom_density(aes(x = Chose1, group = a0 < b0, fill = factor(a0 < b0), alpha = 0.5))
normvals = rnorm(1000, mean = 0, sd = sd(altTrialData$ar_Prob1 - altTrialData$Chose1))
qplot(x = altTrialData$ar_Prob1 - altTrialData$Chose1, geom = 'density', bw = sd(altTrialData$ar_Prob1 - altTrialData$Chose1), color = 'Actual') +
geom_density(aes(x = normvals, color = 'Predicted'), bw = sd(altTrialData$ar_Prob1 - altTrialData$Chose1))
qplot(x = abs(altTrialData$a1 - altTrialData$a2), y = altTrialData$ar_Prob1 - altTrialData$Chose1, geom = 'jitter') + geom_smooth(method = 'loess')
qplot(x = abs(altTrialData$a1 - altTrialData$a2), y = altTrialData$ar_Prob1 - altTrialData$Chose1, color = factor(altTrialData$a0 > altTrialData$b0), geom = 'jitter') + geom_smooth(method = 'loess')
### Assessing Independence
library(lme4)
library(MuMIn)
ris_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + (1 + ar_Prob1 | SubjectID), family = "binomial")
r.squaredGLMM(ris_model)
ri_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + (1 | SubjectID), family = "binomial")
r.squaredGLMM(ri_model)
ric_model = glmer(data = altTrialData, Chose1 ~ ar_Prob1 + factor(a0<b0) + (1 | SubjectID), family = "binomial")
r.squaredGLMM(ric_model)
Validating Parameter Recovery Process
fivefold = data.frame()
trialData$Prob1_ff = 0
for (i in 1:length(included_subjects)){
df = grab_data(included_subjects[i])
df$Prob1 = 0
order = sample(nrow(df))
A_ff = vector('numeric', length = 5)
R_ff = vector('numeric', length = 5)
B_ff = vector('numeric', length = 5)
E_ff = vector('numeric', length = 5)
G_ff = vector('numeric', length = 5)
Deviance_ff = 0
df$Pred = 0
for (z in 1:5){
j = round((z - 1) * (nrow(df)/5) + 1)
n = round(z * (nrow(df)/5))
withheld = order[j:n]
result_ff = optimize(of_ar,
initial_params[c(1,3:6)],
lower_bounds[c(1,3:6)],
upper_bounds[c(1,3:6)],
df[-withheld,])
A_ff[z] = result_ff$par[1]
R_ff[z] = result_ff$par[2]
B_ff[z] = result_ff$par[3]
E_ff[z] = result_ff$par[4]
G_ff[z] = result_ff$par[5]
pars = c(result_ff$par[1], 0, result_ff$par[2:5])
df$Prob1[withheld] = generatePredictions(pars, df[withheld, ])
}
Deviance_ff = -2*sum(df$Chose1 * log(df$Prob1) +
(1 - df$Chose1) * log(1 - df$Prob1))
fivefold[i, 1:27] = c(included_subjects[i], Deviance_ff,
A_ff, R_ff, B_ff, E_ff, G_ff)
trialData$Prob1_ff[which(trialData$SubjectID == included_subjects[i])] = df$Prob1
}
colnames(fivefold) = c('SubjectID', 'Deviance',
'A_F1', 'A_F2', 'A_F3', 'A_F4', 'A_F5',
'R_F1', 'R_F2', 'R_F3', 'R_F4', 'R_F5',
'B_F1','B_F2','B_F3','B_F4','B_F5',
'E_F1', 'E_F2', 'E_F3', 'E_F4', 'E_F5',
'G_F1', 'G_F2', 'G_F3', 'G_F4', 'G_F5')
sum(round(trialData$Prob1_ff) == trialData$Chose1)/nrow(trialData)
fivefold$BIC = as.numeric(fivefold$Deviance) + log(65) * 6
t.test(fivefold$BIC, altSubjectData$BIC_M5, paired = T) #test fivefold MFI against normal MFI for this model
library(lsa)
cosines = vector('numeric', length = 60)
for (i in 1:5){
cosines[i] = cosine(as.numeric(altSubjectData$Alpha_M5), as.numeric(fivefold[, (i + 2)]))
cosines[(i+5)] = cosine(as.numeric(altSubjectData$Rho_M5), as.numeric(fivefold[, (i + 7)]))
cosines[(i+10)] = cosine(as.numeric(altSubjectData$Beta_M5), as.numeric(fivefold[, (i + 12)]))
cosines[(i+15)] = cosine(as.numeric(altSubjectData$Epsilon_M5), as.numeric(fivefold[, (i + 17)]))
cosines[(i+20)] = cosine(as.numeric(altSubjectData$Gamma_M5), as.numeric(fivefold[, (i + 22)]))
}
mean(cosines[1:5])
mean(cosines[6:10])
mean(cosines[11:15])
mean(cosines[16:20])
mean(cosines[21:25])