this post was submitted on 11 Oct 2023

The R Project for Statistical Computing

P values?
Do they account solely for sampling error (and are therefore irrelevant when population data is available), OR do they serve to assess the likelihood of something being due to chance in other ways (and are therefore relevant for studies with population data)?

Any links or literature are welcome :)

@rstats @phdstudents @datascience @socialscience @org_studies

[–] [email protected] 1 points 1 year ago (1 children)

@Jey_snow @rstats @phdstudents @datascience @socialscience @org_studies

P-values can be generated from various statistical tests, so a P-value gives no indication of whether the appropriate test was used to analyze the data.

Here are a couple of papers on P-values:

https://pubmed.ncbi.nlm.nih.gov/29566133

https://pubmed.ncbi.nlm.nih.gov/26545564

[–] [email protected] 1 points 1 year ago (1 children)

@MarcusMuench @rstats @phdstudents @datascience @socialscience @org_studies
According to the second article:
"...A p value should be interpreted in terms of what would happen if you repeated the measurement multiple times with different samples..."
If I have a census, I would expect zero difference for repeating measurements due to random sampling. Therefore p values are irrelevant for census data.

Thanks for the references!

[–] [email protected] 1 points 1 year ago

Careful there. If you had a census of ALL the people in your population, then you would not expect any variation, as you wrote. The reason is not random sampling, though: the next time you measure, you would simply measure the exact same people (your whole population). Since the sample stays the same, so do the numbers.

If, however, you truly took a random sample of the population, then the next random sample would include different people, and you would therefore get slightly different numbers. That is where p-values are useful, because they are based on exactly this question: "Well, what if I took another random sample, and then another, and then another, and so on?"
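
If it helps to see the difference, here is a rough sketch in Python (standard library only). Everything in it is made up for illustration: the population of 10,000 values, the sample size of 100, and the helper names `census_mean` and `sample_mean` are my own assumptions, not anything from the linked papers. It just repeats a "census" and a random sample a few times and compares how much the means move.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values (invented for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]


def census_mean(pop):
    """Measure everyone: the 'sample' is the whole population."""
    return statistics.fmean(pop)


def sample_mean(pop, n):
    """Measure a fresh random sample of n units, drawn without replacement."""
    return statistics.fmean(random.sample(pop, n))


# Repeat each kind of measurement five times.
census_runs = [census_mean(population) for _ in range(5)]
sample_runs = [sample_mean(population, 100) for _ in range(5)]

# Spread = largest mean minus smallest mean across the repeats.
census_spread = max(census_runs) - min(census_runs)
sample_spread = max(sample_runs) - min(sample_runs)

print("census means:", [round(m, 3) for m in census_runs])
print("sample means:", [round(m, 3) for m in sample_runs])
print("census spread:", census_spread)
print("sample spread:", round(sample_spread, 3))
```

The census means come out identical on every repeat because the same units are measured each time, while the sample means shift from draw to draw. That draw-to-draw variation is the thing a p-value is trying to quantify.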