Agenda for meeting with Claire (21/1/2022)
Before the meeting we exchanged a few emails:
Ondrej
The questions that I am aiming to answer in this analysis:
- How do covid-related worries and probability estimates change over time as a function of trait measures?
- Is there a difference in the speed of adjustment of covid-related worries and probability estimates following a fall/rise in cases/deaths?
So far, I ran EFA on sessions 1 and 15, and CFA on sess 1, just to get the gist of the analyses and flag any future pitfalls. For now, all temporal analyses are locked to sess1. However, as we discussed, I am planning to make the time-courses more personalized by locking the variables to their peak. Here are some points that I would like to discuss:
- Factor number selection - I used a few indicators such as eigenvalues, parallel analysis etc. (see index.html) and initially selected 8 factors. However, the subsequent CFA was crashing due to identifiability issues, so I reduced it to 4 factors (I find it a bit of a pity because some of the less contributing factors actually represented interesting constructs like "state awareness" or "covid-skepticism"). Is there a generally agreed-upon way to do this though? It feels a bit arbitrary.
- The factors will differ across sessions, what would be the best way to analyse this? So far, I have been exploring the idea of doing EFA on Session 1 and using those factors in a CFA run on all sessions. Alternatively, would it make sense to create a "factor template" (mean loading across all sessions) and then compute the distance of each session from it, to make the result less sess1-dependent? Alternatively II, I saw some work about longitudinal CFA (basically structural equation models accounting for residual covariance) which can provide us with i) general factors across sessions; ii) change in mean for each factor from session to session -> this is kind of what we want, looking at how adjustments are made depending on trait anxiety for example.
C: To your question about the EFA loadings differing across sessions: I would take the weights from time 1 and apply them to the raw data at time 2. This is similar to an approach we have taken with the factors from eLife, where in new datasets we simply apply the weights from eLife. So long as the sample is large enough, the correlation across the weights from independent samples is super high anyway. In your case that might not hold, so I think it is more important to ensure you use the same weights at both times.
- Selecting factors for CFA. It seems that there can't be overlaps between pre-defined factors (i.e. design) going into CFA. Therefore, for now, I assigned the indicators (questions) based on the highest loading across factors. I am not sure if this is the right way because loadings might be standardized within-factor so the magnitude across factors might not be relevant? One idea would be to reflect the EFA-estimated loadings in the input to CFA (a sort of parametric design) instead of just using binary assignment.
- Are different scales an issue? Most variables are scaled 1-7 but some are 1-10000 (number of days till the end of the pandemic). The CFA keeps complaining about the large variance of some variables, but the estimation with 4 factors seems to work well.
C: good question. I think it’s not an issue if I recall, but I’m also pretty sure we looked at this and possibly scaled the items prior to analysis. You could quickly do both methods and correlate the scores across them to check empirically. I suspect it will be the same result.
- I am trying to decide whether to do what feels like a slightly hacky way (EFA->15xCFA) or whether to spend some time building a Bayesian SEM model, which would effectively eliminate the first two points above but would probably increase the complexity of the analysis. In your experience, is this generally a good time investment or do the results often end up being similar?
- To test the relationship between factors and "standardized questionnaires" such as trait anx, depression etc., would one commonly work with Factor Scores like I did above? (As I understand it, these are numbers expressing how strongly a given factor is expressed in an individual.)
C: yes I’d use the scores. You will see that if you multiply the weight for each item by each person’s response to that item, and add them all up, you get the ‘score’ for that person.
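A rough sketch of the score computation described above (weight times response, summed over items), with the same session-1 weights reused for a later session. Everything here is a stand-in: the data are simulated, the weights are random placeholders rather than real EFA output, and `factor_scores`, `ref_mean`, `ref_std` and `trait_anxiety` are hypothetical names. Standardizing each item with the session-1 mean and SD before weighting is an assumption of this sketch, not something we have agreed on.

```python
import numpy as np

def factor_scores(responses, weights, ref_mean, ref_std):
    """Weighted-sum factor scores for one session.

    responses: (n_people, n_items) item responses
    weights:   (n_items, n_factors) weights estimated once (e.g. on session 1)
    ref_mean, ref_std: per-item mean and SD of the reference session, so that
        every session is standardized in the same way (assumption of this sketch)
    """
    z = (responses - ref_mean) / ref_std
    # For each person and factor: sum over items of weight * standardized response
    return z @ weights

rng = np.random.default_rng(0)
n_people, n_items, n_factors = 200, 6, 2

# Simulated stand-ins for real responses and for session-1 EFA weights
sess1 = rng.normal(4.0, 1.5, size=(n_people, n_items))
sess2 = sess1 + rng.normal(0.0, 0.5, size=(n_people, n_items))
weights = rng.normal(0.0, 0.5, size=(n_items, n_factors))

# Reference statistics come from session 1 only
ref_mean = sess1.mean(axis=0)
ref_std = sess1.std(axis=0, ddof=1)

scores1 = factor_scores(sess1, weights, ref_mean, ref_std)
scores2 = factor_scores(sess2, weights, ref_mean, ref_std)  # same weights, new data

# Relating a factor score to a (simulated) trait questionnaire total
trait_anxiety = rng.normal(40.0, 10.0, size=n_people)
r = np.corrcoef(scores1[:, 0], trait_anxiety)[0, 1]
print(f"correlation of factor 1 score with trait anxiety: {r:.3f}")
```

Because the weights and the reference statistics are fixed, a change in a person's score between sessions reflects a change in their responses rather than a change in the factor solution.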
Claire
One big thing that would help me is if you could remind me of the purpose of the FA for your question/hypothesis? Are there dimensions you are hoping to identify or are you just trying to reduce the dimensionality more generally and see what emerges? The latter can be tricky as it is so dependent on the set of items you put into the analysis*
O: This question made me realize that I am actually trying to solve multiple things at once.
- Since we specified the questionnaires somewhat arbitrarily, I wanted to use the FA to check that the constructs we pre-defined are somewhat valid. It is very likely that questions from different sections will form a factor that we didn't foresee - for example, questions about economic impact seem to load together (or, interestingly, questions involving worry about a close person don't cluster with worry about self or avoidance behaviours). I am hoping FA will help us select more meaningful factors.
- I am assuming that there will be some degree of factor stability; for example, a factor including questions about perceived relative safety/threat shows up in all sessions that I looked at. There does seem to be a progression over time though, and the reason I was looking at sess1 and sess15 was to see which indicators become more closely associated with worry (i.e. load together). This is where my attempts at CFA come in.
Have you tried labelling your factors yet? It’s a little hard for me to get a sense of them. Would be helpful to have a list of the full questionnaire item text and 4 columns showing the loadings on each factor.
I have but then I changed the number of factors so it didn't make sense - choosing the number of factors is something I'd like to ask you about. I initially selected 9 and they all seemed to make sense but because CFA didn't work well I reduced it to 4. I will make the script print out the table, that's a great idea!
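On choosing the number of factors: below is a minimal sketch of one common variant of parallel analysis, run on the eigenvalues of the full correlation matrix. It keeps the leading factors whose observed eigenvalue exceeds the 95th percentile of eigenvalues from random normal data of the same shape. This is not necessarily what the script behind index.html does; the function name, the defaults and the simulated two-factor data are all assumptions made for illustration.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Suggest a number of factors by comparing observed eigenvalues
    with eigenvalues obtained from uncorrelated random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Observed eigenvalues of the item correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues of correlation matrices of pure-noise data, same n and p
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]

    threshold = np.percentile(simulated, percentile, axis=0)

    # Count leading eigenvalues above threshold, stopping at the first failure
    exceeds = observed > threshold
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold

# Simulated example: 8 items, two latent factors with 4 items each
rng = np.random.default_rng(1)
n_people = 500
latent = rng.standard_normal((n_people, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
items = latent @ loadings + 0.6 * rng.standard_normal((n_people, 8))

n_factors, observed, threshold = parallel_analysis(items)
print("suggested number of factors:", n_factors)
```

The suggested number is still only one indicator; it can be weighed against interpretability and against whether the CFA remains identifiable.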