Compute Model Fit Index
*************

.. article-info::
    :avatar: dnl_plastic.png
    :avatar-link: https://www.decisionneurosciencelab.com/
    :author: Elijah Galvan
    :date: September 1, 2023
    :read-time: 10 min read
    :class-container: sd-p-2 sd-outline-muted sd-rounded-1

Lesson
================

.. article-info::
    :avatar: UCLA_Suit.jpg
    :avatar-link: https://www.decisionneurosciencelab.com/elijahgalvan
    :author: Elijah Galvan
    :date: September 1, 2023
    :read-time: 10 min read
    :class-container: sd-p-2 sd-outline-muted sd-rounded-1

Goal During this Stage
---------------

To compare models in terms of how well they predict :bdg-danger:`Decisions` made by each :bdg-success:`Subjects`, accounting for how parsimonious each model is. 

How to Achieve this Goal
---------------

.. dropdown:: Select a Model Fit Index

    There are a few options that you have in terms of chosing between existing Model Fit Indices (MFIs). 
    The standard choices are between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). 
    Often, these MFIs will produce the same conclusions though sometimes they do not.
    Therefore, it is important to preregister which MFI you will use to select a model and to test your hypothesis.

    While BIC is often the default MFI, both AIC and BIC have their use cases. 
    Specifically, you should use the BIC if you are convinced that you are modeling the true data generation process. 
    The BIC is superior to the AIC at detecting the model which created data. 
    On the other hand, AIC is favorable if you believe that none of the models that you propose are the true data generation process.
    AIC is superior at identifying the best approximation of the data generation process.
    Therefore, you should always use AIC for generalized models (models without :bdg-success:`Free Parameters` characterizing the actual Decision-Making Process such as noise or bias :bdg-success:`Free Parameters` or without non-experimental variables such as self-report measures)

.. dropdown:: Compute Model Fit Index for Each :bdg-success:`Subject`

    .. tab-set::

        .. tab-item:: Plain English

            AIC is calculated as ``NLL + 2k`` while BIC is calculated as ``NLL + ln(N) * k``
            For both NLL is -2 times the log likelihood of our :bdg-success:`Subject`'s data (we calculated this already), k is the number of Free Parameters in our data, and N is the number of observations (i.e. trials).
            
            Alternatively, if model errors are normally distributed, NLL can be replaced with ``N * ln(SS/N)`` where SS is the sum of squared errors between :bdg-danger:`Decisions` predicted by your model and those actually observed. 
            If you are using a generalized model, your conclusions rely on this assumption. 
            This is because if you are *choosing* to overlook aspects of the data generation process by not modeling noise or biases, you must create a design wherein these tendencies produce random, rather than systematic, error. 
            Otherwise, you might very well end up fitting this systematic error which means that your conclusions are completely invalid.

            We've been very focused on making our models generalizable, so we would use the AIC with the latter formulation but we'll compute the BIC in both formulations for an example here. 
            Remember our model has two free parameters.

        .. tab-item:: R

            ::

                N = length(trialList[, 1])
                k = 2
                subjectData$modelAIC = N * log(subjectData$modelSS/N) + 2*k
                subjectData$modelAICStandard = subjectData$modelNLL + 2*k
                subjectData$modelBIC = N * log(subjectData$modelSS/N) + log(N)*k
                subjectData$modelBICStandard = subjectData$modelNLL + log(N)*k

        .. tab-item:: MatLab

            ::

                N = length(trialList(:, 1));
                k = 2;
                subjectData.modelAIC = N * log(subjectData.modelSS/N) + 2*k;
                subjectData.modelAICStandard = subjectData.modelNLL + 2*k;
                subjectData.modelBIC = N * log(subjectData.modelSS/N) + log(N)*k;
                subjectData.modelBICStandard = subjectData.modelNLL + log(N)*k;


        .. tab-item:: Python

            ::

                N = len(trialList[:, 0])
                k = 2
                subjectData['modelAIC'] = N * np.log(subjectData['modelSS']/N) + 2*k
                subjectData['modelAICStandard'] = subjectData['modelNLL'] + 2*k
                subjectData['modelBIC'] = N * np.log(subjectData['modelSS']/N) + np.log(N)*k
                subjectData['modelBICStandard'] = subjectData['modelNLL'] + np.log(N)*k


Tutorials
==========

Tutorial 1 - van Baar, Chang, & Sanfey, 2019
----------------------

.. dropdown:: Select a Model Fit Index

    They used the nonstandard AIC formulation for this (i.e. assuming a normal distribution of error). 
    This was appropriate because the true data generation process for reciprocation decisions might be substantially different: 
    some inequity-averse people might always use a fixed percent of the pot, ignoring how much the investor kept in their decisions and, similarly, 
    some guilt-averse people might do the inverse. 
    More to the point, this formualtion of guilt-aversion did not use self-reported believed expectations which means that they are trying approximate this on average. 
    Thus, AIC is better than BIC since the true data generation process is not (believed to be) in the set of models that we are comparing.

.. dropdown:: Compute Model Fit Index for Each :bdg-success:`Subject`

    .. tab-set::

        .. tab-item:: R

            ::

                N = length(trialList[, 1])
                k = 2
                subjectData$modelAIC = N * log(subjectData$modelSS/N) + 2*k

        .. tab-item:: MatLab

            ::

                N = length(trialList(:, 1));
                k = 2;
                subjectData.modelAIC = N * log(subjectData.modelSS / N) + 2 * k;

        .. tab-item:: Python

            ::

                N = len(trialList.iloc[:, 0])
                k = 2
                subjectData['modelAIC'] = N * np.log(subjectData['modelSS'] / N) + 2 * k

Tutorial 2 - Galvan & Sanfey, 2024
-------------------

.. dropdown:: Select a Model Fit Index

    We used the nonstandard AIC formulation for this (i.e. assuming a normal distribution of error). 
    This was appropriate because the true data generation process for redistribution decisions might be substantially different: 
    our model only focuses on outcomes as a consequence of redistribution, which might not be true: some people might chose redistribution rates based on a fixed number irrespective of outcomes generated. 
    Thus, AIC is better than BIC since the true data generation process is not (believed to be) in the set of models that we are comparing.
    
.. dropdown:: Compute Model Fit Index for Each :bdg-success:`Subject`, Per :bdg-primary:`Condition`

    .. tab-set::

        .. tab-item:: R

            ::

                N = length(df[, 1])
                k = 2
                subjectData$AICMerit = N * log(subjectData$SSMerit/N) + 2*k
                subjectData$AICEntitlement = N * log(subjectData$SSEntitlement/N) + 2*k
                subjectData$AICCorruption = N * log(subjectData$SSCorruption/N) + 2*k
                subjectData$AICLuck = N * log(subjectData$SSLuck/N) + 2*k

        .. tab-item:: MatLab

            ::

                N = size(df, 1);
                k = 2;
                subjectData.AICMerit = N * log(subjectData.SSMerit / N) + 2 * k;
                subjectData.AICEntitlement = N * log(subjectData.SSEntitlement / N) + 2 * k;
                subjectData.AICCorruption = N * log(subjectData.SSCorruption / N) + 2 * k;
                subjectData.AICLuck = N * log(subjectData.SSLuck / N) + 2 * k;

        .. tab-item:: Python

            ::

                N = len(df)
                k = 2
                subjectData['AICMerit'] = N * np.log(subjectData['SSMerit'] / N) + 2 * k
                subjectData['AICEntitlement'] = N * np.log(subjectData['SSEntitlement'] / N) + 2 * k
                subjectData['AICCorruption'] = N * np.log(subjectData['SSCorruption'] / N) + 2 * k
                subjectData['AICLuck'] = N * np.log(subjectData['SSLuck'] / N) + 2 * k

Tutorial 3 - Crockett et al., 2014
-------------------

.. dropdown:: Select a Model Fit Index

    They used the standard formulation of the BIC which is appropriate because they are attempting model both the decision-making process and the decision process itself 
    (i.e. the underlying probabilities).
    
.. dropdown:: Compute Model Fit Index for Each :bdg-success:`Subject`

    .. tab-set::

        .. tab-item:: R

            ::

                subjectData$BIC = as.numeric(subjectData$Deviance) + log(152) * 6

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::

Tutorial 4 - Li et al., 2022
-------------------

.. dropdown:: Select a Model Fit Index

    They used the standard formulation of the BIC which is appropriate because they are attempting model both the decision-making process and the decision process itself 
    (i.e. the underlying probabilities).
    
.. dropdown:: Compute Model Fit Index for Each :bdg-success:`Subject`

    .. tab-set::

        .. tab-item:: R

            ::

                subjectData$BIC = as.numeric(subjectData$Deviance) + log(65) * 6

        .. tab-item:: MatLab

            ::

        .. tab-item:: Python

            ::