Understanding The Basics Of Principal Component Analysis

If you are studying statistics, at some point you will come across the concept of principal component analysis.

Simply put, principal component analysis is a statistical technique that lets you find which items go together because they are the result of something you can’t observe directly. For example, imagine that you’re taking a walk in a forest and you see a pile of leaves. To figure out which leaves come from which tree, you would look at the leaves, see which ones are similar, and then look for the tree they came from. Principal component analysis allows you to do the same thing with data.

Factors

Factors are just the underlying perceptions or concepts that you can’t directly observe. However, you can observe their effects on different surveys and tests. 

Here’s a simple example. You have no idea how other teachers engage their students in class since you can’t observe them directly. However, you can design a survey for teachers made up of items that represent what research says engagement should look like in a class.

Imagine that you made a survey with 20 questions and that the factor you want to measure is student engagement. The survey items represent the effect of engagement. In a scientific sense, you can think of the factor as an independent variable and the items as dependent variables.

Factor Analysis

Factor analysis is a technique that looks for correlations between items. If you had a survey of 6 items, the correlations might look like this:

[Figure: correlation matrix for items x1 to x6]

Notice that items x1 – x3 have high correlations with each other, while x4 – x6 have high correlations with each other but not with x1 – x3. Logically, items x1 – x3 are related and x4 – x6 are related. This means that there are two separate concepts that the items are measuring. Those concepts are the factors.
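
To make this pattern concrete, here is a minimal sketch in Python that simulates a 6-item survey and prints its correlation matrix. Everything in it is an assumption made for illustration: the two unobserved factors, the loading of 0.8, the noise level of 0.6, and the function names are hypothetical and do not come from a real survey.

```python
import random


def simulate_survey(n_respondents, seed=0):
    """Simulate 6 items: x1-x3 driven by factor A, x4-x6 by factor B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        factor_a = rng.gauss(0, 1)  # unobserved factor (assumed)
        factor_b = rng.gauss(0, 1)  # unobserved, independent of factor A
        # Each item = 0.8 * its factor + item-specific noise (assumed values).
        row = [0.8 * factor_a + 0.6 * rng.gauss(0, 1) for _ in range(3)]
        row += [0.8 * factor_b + 0.6 * rng.gauss(0, 1) for _ in range(3)]
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(rows):
    """Correlation of every item (column) with every other item."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


corr = correlation_matrix(simulate_survey(2000))
for name, row in zip(["x1", "x2", "x3", "x4", "x5", "x6"], corr):
    print(name, " ".join(f"{value:5.2f}" for value in row))
```

With these assumed settings, the printed matrix should show two blocks: high correlations within x1 – x3 and within x4 – x6, and correlations near zero between the two groups.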

However, a simple factor analysis that looks only at correlations does not take some things into account, such as the covariance of the items.

Principal Component Analysis

Simply put, principal component analysis is closely related to factor analysis. This type of analysis doesn’t only look at the correlations between the items; it also looks at how much the items vary and how they vary together, that is, at their variances and covariances.

The reality is that the covariance between the items helps explain how the items are related. For example, if x1 and x4 have a low covariance, then changes in x1 do not explain the changes in x4 very well.

Principal component analysis has an advantage over traditional factor analysis because it takes into account whether the variables explain each other and how well they do that. If the variances are related, then it makes sense that the items are related.
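
The link between covariance and shared variance can be sketched for the simplest case of two items. The Python snippet below is only an illustration under stated assumptions: it uses the closed-form eigenvalues of a 2x2 covariance matrix, the function names are hypothetical, and the variances and covariances are made-up numbers. A real analysis with many items would use a statistics library instead.

```python
import math


def principal_variances_2x2(var_x, var_y, cov_xy):
    """Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]], largest first."""
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    return half_trace + spread, half_trace - spread


def first_component_share(var_x, var_y, cov_xy):
    """Share of the total variance captured by the first principal component."""
    first, second = principal_variances_2x2(var_x, var_y, cov_xy)
    return first / (first + second)


# Two items with unit variance and a high covariance (assumed values).
high = first_component_share(1.0, 1.0, 0.8)
# The same two items with no covariance at all.
low = first_component_share(1.0, 1.0, 0.0)

print(f"high covariance: first component explains {high:.0%} of the variance")
print(f"zero covariance: first component explains {low:.0%} of the variance")
```

When the two items covary strongly, a single component accounts for most of the variance in both. When the covariance is zero, the first component explains no more than one item alone would.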


Factor Analysis – How Big Does Your Sample Size Need To Be?

When you are looking to conduct a study, you usually plan a sample size for a data set. This is based on getting reasonable statistical power to ensure that you can run a good analysis. These power calculations figure out how big a sample you need so that a certain width of a confidence interval or p-value will coincide with a scientifically meaningful effect size.
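
As a rough illustration of such a power calculation, the sketch below uses the common normal-approximation formula for comparing two group means with a two-sided test. The function name and the default values (a 5% significance level and 80% power) are assumptions chosen for the example, and an exact t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate cases needed per group to detect a standardized
    mean difference (Cohen's d) with a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} cases per group")
```

The smaller the effect you consider scientifically meaningful, the larger the sample this calculation asks for.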

But that’s not the only issue in sample size, and not every statistical analysis uses p-values. One example is factor analysis.

What Is Factor Analysis?

Simply put, factor analysis is a measurement model of an underlying construct. Ultimately, the focus is on understanding the structure of the relationships among variables.

The specific focus in factor analysis is understanding which variables are associated with which latent constructs. The approach differs slightly depending on whether you’re running an exploratory or a confirmatory model, but this overall focus is the same.

So, how big does your sample need to be in factor analysis? Simply put, big. However, the answer isn’t that simple when it comes to statistics.

The Rules Of Thumb

The truth is that authors disagree on this. For example, some authors use a criterion based on the total sample size:

  • 100 subjects = sufficient if clear structure; more is better
  • 100 subjects = poor; 300 = good; 1000+ = excellent
  • 300 subjects, though fewer works if correlations are high among variables.

Other authors base it on a ratio of the number of cases to the number of variables involved in the factor analysis:

  • 10-15 subjects per variable
  • 10 subjects per variable
  • 5 subjects per variable or 100 subjects, whichever is larger
  • 2 subjects per variable

And then others base it on a ratio of cases to the number of factors:

  • 20 subjects per factor
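
To see how far apart the ratio-based rules can be, here is a small Python sketch that applies them to one hypothetical study. The function name and the example of 20 variables and 4 factors are assumptions made for illustration only.

```python
def rule_of_thumb_minimums(n_variables, n_factors):
    """Minimum sample size suggested by each ratio-based rule of thumb."""
    return {
        "15 subjects per variable": 15 * n_variables,
        "10 subjects per variable": 10 * n_variables,
        "5 per variable or 100, whichever is larger": max(5 * n_variables, 100),
        "2 subjects per variable": 2 * n_variables,
        "20 subjects per factor": 20 * n_factors,
    }


# Hypothetical study: 20 survey items expected to load on 4 factors.
sizes = rule_of_thumb_minimums(n_variables=20, n_factors=4)
for rule, minimum in sizes.items():
    print(f"{rule}: {minimum}")
```

For the same hypothetical study, the suggested minimums range from a few dozen cases to several hundred, which shows how much the rules disagree.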

Remember That Rules Of Thumb Are Not Rules

While there are rules of thumb for factor analysis, the reality is that these aren’t rules. Some recent simulation studies have found that the required sample size depends on a number of issues in the data and in the model, working together.

These issues include everything listed above and a few more. The practical takeaways are:

  • You’re going to need a large sample. That means in the hundreds of cases. More is better.
  • You can get away with fewer observations if the data are well-behaved. If there are no missing data and each variable highly loads on a single factor and not others, you won’t need as many cases. But counting on the data behaving is like counting on the weather behaving during hurricane season. You’ll have a better outcome most of the time if you plan for the worst.
  • The main issue with small data sets is overfitting. (A secondary issue is that if the sample is really small, the model won’t even converge.) It’s a simple concept: when a sample is too small, you can get what looks like good results, but you can’t replicate those results in another sample from the same population.

All the parameter estimates are so customized to this particular sample that they’re not useful for any other sample. This can, and does, happen in any model, not just factor analysis.
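
The replication problem can be illustrated with a small simulation. The Python sketch below is not a factor analysis: it only compares the item correlations, which are the raw material of a factor analysis, across two independent samples drawn from the same population. The population model (two factors, loadings of 0.8), the sample sizes, and the function names are all assumptions made for the example.

```python
import random


def simulate_items(n_cases, rng):
    """Draw n_cases from an assumed population: 6 items, 2 unobserved factors."""
    rows = []
    for _ in range(n_cases):
        factor_a, factor_b = rng.gauss(0, 1), rng.gauss(0, 1)
        row = [0.8 * factor_a + 0.6 * rng.gauss(0, 1) for _ in range(3)]
        row += [0.8 * factor_b + 0.6 * rng.gauss(0, 1) for _ in range(3)]
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def pairwise_correlations(rows):
    """Correlations for every distinct pair of items."""
    cols = list(zip(*rows))
    return [pearson(cols[i], cols[j])
            for i in range(len(cols)) for j in range(i + 1, len(cols))]


def replication_gap(n_cases, rng):
    """Mean absolute difference in item correlations between two
    independent samples of the same size from the same population."""
    first = pairwise_correlations(simulate_items(n_cases, rng))
    second = pairwise_correlations(simulate_items(n_cases, rng))
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


rng = random.Random(42)
gap_small = replication_gap(20, rng)
gap_large = replication_gap(2000, rng)
print(f"n = 20:   correlations differ by {gap_small:.3f} on average")
print(f"n = 2000: correlations differ by {gap_large:.3f} on average")
```

With only 20 cases, two samples from the same population typically disagree much more about the correlations than two samples of 2000 cases do, so any model fitted to the small sample inherits that instability.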