Science topic

Sample Size - Science topic

The number of units (persons, animals, patients, specified circumstances, etc.) in a population to be studied. The sample size should be big enough to have a high likelihood of detecting a true difference between two groups. (From Wassertheil-Smoller, Biostatistics and Epidemiology, 1990, p95)
Questions related to Sample Size
  • asked a question related to Sample Size
Question
3 answers
1) What happens when the same 2000 students participated in three different studies, and all three studies meet the inclusion criteria?
2) What about the case where a single large study has been divided into three different studies with the same 2000-person sample, and all three are being considered for review?
This will be a great help for the further development of my study.
Relevant answer
Answer
Thank you for your valuable reply, Donald Griffiths
  • asked a question related to Sample Size
Question
3 answers
I am requesting advice on the procedures for generating a sample size for an Interrupted Time Series design in an Implementation Science study. I would also be grateful if some references on this topic could be shared with me.
Relevant answer
Answer
The attached papers may be of some interest to you. Please note that time series are not sampling studies and thus can be long or short depending on the research questions and available data. Best wishes, David Booth
  • asked a question related to Sample Size
Question
5 answers
If the research design is qualitative, data collection is via semi-structured interviews, and analysis includes thematic analysis and fuzzy, what should the optimal sample size be? Is there any rationale behind the number of interviews, or any reference?
Relevant answer
Answer
Though the concept of saturation has been controversial (Marshall et al., 2013; Sebele-Mpofu, 2020), it is regarded as a gold standard for determining sample sizes in qualitative research (Guest et al., 2006; Morse, 2015). Saturation is the point when collecting new qualitative data yields redundant information or has no significant addition to what has been collected and analyzed. For elaboration on the different types of saturation, you might check out the article by Saunders et al. (2018). You could also go through the useful guidelines discussed by Guest et al. (2006) since they help determine theoretical saturation in interview-based research. Further, you could consider the recent relevant insights rendered by Guest et al. (2020), Hennink et al. (2017), and Robinson (2014).
Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough?  An experiment with data saturation and variability. Field Methods, 18(1), 59–82. https://doi.org/10.1177/1525822X05279903
Guest, G., Namey, E., & Chen, M. (2020). A simple method to assess and report thematic saturation in qualitative research. PLOS ONE, 15(5), e0232076. https://doi.org/10.1371/journal.pone.0232076
Hennink, M. M., Kaiser, B. N., & Marconi, V. C. (2017). Code saturation versus meaning saturation: How many interviews are enough? Qualitative Health Research, 27(4), 591–608. https://doi.org/10.1177/1049732316665344
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does sample size matter in qualitative research? A review of qualitative interviews in is research. Journal of Computer Information Systems, 54(1), 11–22. https://doi.org/10.1080/08874417.2013.11645667
Morse, J. M. (2015). “Data were saturated…” Qualitative Health Research, 25(5), 587–588. https://doi.org/10.1177/1049732315576699
Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality & Quantity, 52(4), 1893–1907. https://doi.org/10.1007/s11135-017-0574-8
Sebele-Mpofu, F. Y. (2020). Saturation controversy in qualitative research: Complexities and underlying assumptions. A literature review. Cogent Social Sciences, 6(1), 1838706. https://doi.org/10.1080/23311886.2020.1838706
Good luck,
  • asked a question related to Sample Size
Question
8 answers
Hey all,
I have a question regarding my optimal sample size. Is the formula that I added in the attachments sufficient to determine my sample size, or should I also account for the number of variables that I include in my regression? Namely, I thought there was a rule of thumb that you should have at least 10 respondents for each variable in your regression. Moreover, I don't know whether you need 10 respondents for each factor or for each variable (e.g., the variable loyalty consists of 5 factors; does that imply 10 or 50 respondents?).
Thanks in advance for answering my question!
Kind Regards,
Bram
Relevant answer
Answer
I find these rules of thumb anachronistic and often too simple (one-dimensional) for the modelling we are actually trying to do. A much better way is to simulate data with known properties and see if your favoured technique (given the number of observations and the number and nature of the variables) can in practice extract the correct signal from the noise; this is often very sobering. For discussion and software that can do this see http://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/mlpowsim-manual.pdf
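The simulation approach described here can be sketched in Python; the effect size, error SD, and predictor count below are illustrative assumptions, not values from the question:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def regression_power(n, n_predictors, beta, sigma, n_sims=1000, alpha=0.05):
    """Estimate, by simulation, the power to detect the first predictor's
    effect in a multiple regression with `n` observations."""
    hits = 0
    for _ in range(n_sims):
        X = rng.normal(size=(n, n_predictors))
        # only the first predictor carries a true effect of size `beta`
        y = beta * X[:, 0] + rng.normal(scale=sigma, size=n)
        A = np.column_stack([np.ones(n), X])          # design with intercept
        coef, res, *_ = np.linalg.lstsq(A, y, rcond=None)
        df = n - A.shape[1]
        se = np.sqrt((res[0] / df) * np.linalg.inv(A.T @ A)[1, 1])
        p = 2 * stats.t.sf(abs(coef[1] / se), df)     # two-sided t-test
        hits += p < alpha
    return hits / n_sims

# e.g. 8 predictors, 100 observations, one modest standardized effect
print(regression_power(n=100, n_predictors=8, beta=0.3, sigma=1.0))
```

Varying `n` here shows directly how fragile a "10 respondents per variable" rule can be: the achieved power depends on the effect size and error variance, not just the counts.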
  • asked a question related to Sample Size
Question
2 answers
I am looking for some valid references for the appropriate sample size for doing a research study based on phenomenology.
Relevant answer
Answer
The guidance as per the following papers and link (see some more recommended publications in the discussions) may further help:
  • Gentles, S. J., Charles, C., Ploeg, J. and McKibbon, K. A. (2015) Sampling in qualitative research: Insights from an overview of the methods literature, The Qualitative Report, 20, 11, pp. 1772-.
  • Groenewald, T. (2004) A Phenomenological Research Design Illustrated, International Journal of Qualitative Methods, 3, 1, pp. 1-26.
  • Morse, J. M. (2000) Editorial: Determining Sample Size, Qualitative Health Research, 10, 1, pp. 2-5.
  • Moser, A. and Korstjens, I. (2018) Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis, European journal of general practice, 24, 1, pp. 9-18.
  • Sim, J., Saunders, B., Waterfield, J. and Kingstone, T. (2018) Can sample size in qualitative research be determined a priori?, International Journal of Social Research Methodology, 21, 5, pp. 619-634.
  • How do you determine the cut off point for sample size in phenomenology?: https://www.researchgate.net/post/How-do-you-determine-the-cut-off-point-for-sample-size-in-phenomenology
  • asked a question related to Sample Size
Question
2 answers
I am calculating the sample size for a case-control study, and I am not sure whether to use the crude odds ratio or the adjusted odds ratio from other studies to estimate my sample size.
Relevant answer
Answer
  1. Kindu Yinges: Define the rates of participants for each outcome group, the minimum clinically significant difference, the expected prevalence of exposure, your significance level (usually p < 0.05), and the power.
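As an illustration of how those ingredients combine, the standard two-proportion calculation for an unmatched case-control design can be sketched in Python (the 20% exposure prevalence among controls and the odds ratio of 2 are assumed placeholder values, not figures from the question):

```python
from math import ceil, sqrt
from scipy.stats import norm

def case_control_n(p0, odds_ratio, alpha=0.05, power=0.80):
    """Per-group sample size for an unmatched case-control study.

    p0: expected exposure prevalence among controls
    odds_ratio: detectable odds ratio (e.g. from a prior study)
    """
    # exposure prevalence among cases implied by the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    pbar = (p0 + p1) / 2
    z_a = norm.ppf(1 - alpha / 2)                     # significance level
    z_b = norm.ppf(power)                             # power
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(num / (p1 - p0) ** 2)

print(case_control_n(p0=0.20, odds_ratio=2.0))  # → 172 per group
```

Plugging in the crude versus the adjusted OR from prior studies simply changes `odds_ratio`, so it is easy to see how sensitive the required n is to that choice.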
  • asked a question related to Sample Size
Question
16 answers
My sample sizes are 41 and 12 respectively; the data are normally distributed, continuous, and randomly selected. However, the means for both samples (I even did a combined sample of 41 and 12) are above the mean score being compared to. Both have a standard deviation of around 20. I am using SPSS. My data: I administered a survey to two groups (two languages); in language one, 41 people replied, and in language two, 12 people replied. Thus, my first sample is statistically significant and my second sample is not.
I am comparing to a mean of 60; the sample of 41 yields a mean of 80 and the sample of 12 yields a mean of 88. When running a one-sample t-test on each sample, my significance is < .05, which means H0 is rejected (that the means equal the compared value). Yet a two-sample t-test yields a significance > .05, which means H0 is accepted, but this would not make sense, since the two-sample t-test gives me means much higher than 60. Any advice on how to proceed with the statistical analysis?
The two-tailed/independent-samples t-test in SPSS tells me, in the "equal variances assumed" row, that the significance is > .05. The row beneath it, "equal variances not assumed", has no value for F or significance. To my understanding, if Levene's test has significance > .05 I use the equal-variances-assumed row, and that same significance is telling me it is not significantly different from the mean value of 60, since it is > .05. This still does not make sense. In this case, what am I concluding with respect to the mean value I am comparing my data to?
Relevant answer
Answer
First thing: forget all decisions on statistical methods based on sample size as the first criterion. It's typically 1) a very, very old habit, dating from the before-computers era, and 2) a very non-rigorous way to analyze data, because it does not deal at all with the really important points: what is the question you want to answer? How do you model your data? How do you use this model to answer this question? None of these points involves the sample size.
If you *know* (reasonably) that your data are (close enough to a) normal (distribution), you do not have to bother with sample sizes, especially with such arbitrary rules as « more than 30 ».
Note that comparing t-tests on each sample individually (« one sample ») with a two-sample t-test is totally meaningless; it shows that you did not really think about « what is your question », because they answer two very, very different questions: the one-sample t-test answers « is the theoretical mean of this sample different from 0 » (so it is not really surprising that the test is significant…) [by default; you can compare to another reference value, it does not change the logic], whereas the two-sample t-test answers « do the two samples have the same theoretical mean » (which is probably your real question, and you can expect either a significant or a non-significant result; please keep in mind that « non-significant » does not mean « H0 is true/accepted », it only means « I cannot prove that H0 is false / reject H0 », which is completely different).
So I would really advise contacting a professional statistician locally to deal with your data, or at least following a basic statistics course.
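The distinction between the two tests can be seen directly with simulated data; the means, SD, and group sizes below mirror the question, but the data themselves are random placeholders:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=80, scale=20, size=41)   # language one respondents
group2 = rng.normal(loc=88, scale=20, size=12)   # language two respondents

# One-sample tests: is each group's mean different from the reference 60?
t1, p1 = stats.ttest_1samp(group1, popmean=60)
t2, p2 = stats.ttest_1samp(group2, popmean=60)

# Two-sample test: do the two groups differ from EACH OTHER?
# (the value 60 plays no role here, which is why the results need not agree)
t12, p12 = stats.ttest_ind(group1, group2)       # equal variances assumed
print(p1, p2, p12)
```

Both one-sample p-values will typically be tiny (both true means are far from 60), while the two-sample p-value can easily exceed .05 because 80 and 88 are close relative to an SD of 20, which is exactly the pattern described in the question.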
  • asked a question related to Sample Size
Question
8 answers
Is there a way of arriving at a sample size without calculation, in a case where the researcher does not know the size of the population?
Relevant answer
Answer
guess
  • asked a question related to Sample Size
Question
1 answer
I am planning to conduct a genetic diversity and population structure study in African zebu cattle. I will use 77k SNP markers to genotype the population. What are your thoughts on the ideal sample size?
Thanks
A. Ali (PhD)
Relevant answer
Answer
Genotyping and Quality Control
ORIGINAL RESEARCH article
Front. Genet., 10 October 2018 | https://doi.org/10.3389/fgene.2018.00438
Samples were genotyped at Geneseek (Neogen Corporation, Nebraska, United States) using the Geneseek Genomic Profiler (GGP) High Density (HD) SNP array consisting of 150,000 SNPs, while SNPs for the reference breeds had been genotyped with the Illumina HD Bovine (777K SNPs) array. The SNPs in the GGP array were optimised for use in dairy cattle, having the most informative SNPs from the Illumina Bovine 50k and 770k chips and additional variants known to have a large effect on disease susceptibility and performance.
Genotype data quality control and checks were carried out using PLINK v 1.9 (Purcell et al., 2007) and included removal of SNPs with less than 90% call rate, less than 5% minor allele frequency (MAF), and samples with more than 10% missing genotypes. Additional removal of SNPs not mapped to any chromosome left a total of 120,591 SNPs for analysis. Of the 299 animals, 12 failed the quality checks outlined above and were removed from the analysis. The total genotyping rate in the remaining samples was 0.991.
The 120,591 SNPs used in the analysis covered 2516.25 Mb with an average distance of 22.67 kb between adjacent SNPs. The mean chromosomal length ranged between 42.8 Mb on BTA 25 and 158.86 Mb on BTA 1. The mean distance between adjacent SNPs per chromosome ranged between 18.67 and 23.89 kb on BTA 14 and BTA 29, respectively. The linkage disequilibrium (LD) across the genome averaged 0.41. Private alleles, defined as variants segregating in only one population when evaluating multiple populations, were identified using a custom script in R. A total of 143 private variants were detected, most of which (132) originated from the Rwanda cattle population.
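Assuming PLINK v 1.9 as in the quoted study, the QC thresholds described above would correspond to an invocation along these lines (the file names are placeholders; this is a sketch, not the authors' exact pipeline):

```shell
# Remove SNPs with call rate < 90% (--geno 0.10 = max 10% missingness per SNP),
# SNPs with minor allele frequency < 5%, and samples missing > 10% of genotypes.
plink --bfile zebu_genotypes \
      --geno 0.10 \
      --maf 0.05 \
      --mind 0.10 \
      --make-bed \
      --out zebu_qc
```

The `--geno`, `--maf`, and `--mind` filters map one-to-one onto the three thresholds quoted in the passage.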
  • asked a question related to Sample Size
Question
3 answers
Sample size calculation
Relevant answer
Answer
Hello Suman,
The sample size calculation tool (formula) is not determined by the statistical software you plan to use, but rather by a variety of features such as the type of study, e.g., comparative versus descriptive research, etc.
I have found the article cited below (Reference) very helpful, I recommend it as a simple guide.
Reference: Eng J. Sample Size Estimation: How Many Individuals Should Be Studied? Radiology 2003; 227:309–313. https://doi.org/10.1148/radiol.2272012051
All the best
  • asked a question related to Sample Size
Question
2 answers
I am trying to estimate the sample size necessary for an RNA-seq study. There are no previous studies that I can refer to for an estimate. In this case, how should I approach planning for sample collection? Thank you for your help.
Relevant answer
Answer
You need to identify the test/rationale behind your analysis. The sample size is determined by the test underlying your objective.
  • asked a question related to Sample Size
Question
4 answers
I am trying to run an analysis in SmartPLS. Every time I try to run the algorithm or bootstrap I get the following error:
"Sample size too small. There must be at least 301 cases/observations."
My sample size is 275. I also checked my data for duplication in SPSS and found 0 duplicate records. Kindly help in this regard.
Thank you
Relevant answer
Answer
Oops, number two: what type of bootstrapping did you specify? That could also be a problem. Google your question and be sure to see the videos as well. Specify your package. Apologies #2, David Booth
  • asked a question related to Sample Size
Question
8 answers
I need to assess healthcare providers' compliance with standard performance through observation, but it is difficult to find a reference for sample size determination.
Relevant answer
Answer
Without knowing your intended calculations, this is impossible to determine. I would refrain from using often-unattainable "rules of thumb" like x participants per variable. You will find not only a number of threads on RG that have dealt with this question, but a mountain of literature on the topic of statistical power. One of the most precise, yet difficult, procedures would be a Monte Carlo simulation; I am struggling with those myself at the moment.
Good luck!
Marcel
  • asked a question related to Sample Size
Question
5 answers
I have a polymer nanocomposite made through an injection molding route. Nanomaterials were added into the polymer; then, by melting and injection molding, the samples were produced to ASTM standards. What are the methods to prepare a solid sample for testing in SEM, FESEM, AFM and TEM? What is the minimum sample size required for these tests on a non-conductive material? I would like to know about this in detail; kindly share some research papers/books (PDF) for reference.
Relevant answer
Answer
Dear Dinesh Babu Rathina, specimen preparation depends on your task. For example, for SEM - ESEM (it's the same thing) and for AFM, specimens can be polished or fractured. Minimum specimen size is usually not a problem; 1 mm square should be enough. Of course, for proper polishing the specimen should be much bigger (at least 1 cm square), or embedded in a resin. Prepared specimens should be coated with a conducting coating, either C, Au/Pd, Au, etc. For polishing techniques you can google "metallographic specimen preparation" or take a look at this site:
For TEM, ultrathin sections should be cut with an ultramicrotome. You cannot do this by yourself; contact your TEM technician.
  • asked a question related to Sample Size
Question
10 answers
Hi,
I am looking at which age group it is more plausible that an individual belongs to, based on which social group he/she trusts the most.
The population that I want my sample to be representative of comprises 8,189,892 people aged between 18 and 100+. I need to calculate the correct sample size (simple random sampling).
Do I pick the confidence interval/margin of error and confidence level freely and calculate the correct sample size based on them? If so, how do I go about it? Or:
Do the confidence interval/margin of error and confidence level need to be calculated first? If so, how do I go about it?
Is there any other variable that needs to be taken into account when calculating a sample size? If so, which, and how do I go about it?
As is probably evident, I am a beginner. Pretty much every solution that I have found online calculates one value from the others, but I don't have any variable (except the total population size), assuming the confidence interval and confidence level cannot simply be picked by free choice (like a 2% margin of error and a 99% confidence level, for example).
Cheers,
Viktor
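For background: with simple random sampling of a proportion, the margin of error and confidence level are indeed chosen by the researcher (5% and 95% are common defaults). The standard calculation, using the population size from the question, can be sketched as follows:

```python
from math import ceil
from scipy.stats import norm

def srs_sample_size(N, margin=0.05, confidence=0.95, p=0.5):
    """Cochran's formula with finite-population correction.

    p = 0.5 is the conservative (worst-case) proportion assumption.
    """
    z = norm.ppf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / margin**2        # infinite-population size
    return ceil(n0 / (1 + (n0 - 1) / N))       # finite-population correction

print(srs_sample_size(N=8_189_892))            # → 385
```

Tightening the inputs (e.g. a 2% margin at 99% confidence) raises the required n sharply, while the finite-population correction barely matters for a population of 8 million.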
  • asked a question related to Sample Size
Question
2 answers
It would be better if I could get an idea of how to calculate the sample size, or what the minimum number of participants to recruit for such an intervention would be.
Relevant answer
Answer
Studies with large numbers of participants provide more reliable results. However, there is no gold standard for sample size, as each study comes with its own inter- and intra-sample variation based on the features collected during the study. Since this is a clinical study, a power analysis on data from a few prominent clinical variables (potential confounders that may explain differences in microbiome composition) collected from the cohorts of interest may provide hints on how many samples to include in a microbiome study to anticipate statistically significant results. Yet this could be a superficial test of how different the study groups are based on certain variables.
  • asked a question related to Sample Size
Question
3 answers
I plan to conduct prospective cohort study but there have not been similar studies with my proposed title.
Relevant answer
Answer
In the equations for estimating sample size, you are required to have the prevalence of the outcome in the exposed and unexposed groups, which you can get from any related study in a similar setting elsewhere; or you can assume an OR of 2. In either case, you need an estimate of the prevalence of the exposure in the population to decide on the unexposed:exposed ratio.
  • asked a question related to Sample Size
Question
2 answers
I need to know who originally formulated the sample size determination formula n = N/(1 + Ne^2). Your insights will be helpful.
Relevant answer
Answer
Yumi Vivien V De Luna (I am not an expert, but...) both Slovin's [n = N/(1 + Ne²)] and Yamane's [n = N/(1 + N(e)²)] appear similar (identical?) and provide a simplified formula for calculating sample sizes. Taro [sometimes Yaro] Yamane is often cited, and some biographical/bibliographic data can be found. Slovin is also often cited, but further information is scant. There are several questions on this topic on ResearchGate. See:
Cheers,
Leo
  • asked a question related to Sample Size
Question
1 answer
Hi,
Does anyone have suggestions for computing a constrained maximum likelihood estimate (CMLE) instead of an ML estimate in a Wald test in Mplus, using the MODEL TEST command? My latent classes are unequal in terms of sample size, and some are small; this method seems better adapted to this type of design.
Here is an example of my syntax for Wald test:
Model Constraint:
New(P1vs2 P1vs3 P2vs3);
P1vs2 = P1 - P2;
P1vs3 = P1 - P3;
P2vs3 = P2 - P3;
Model Test:
0 = P1vs2 - P1vs3;
0 = P1vs2 - P2vs3;
Thanks
Relevant answer
Answer
This is really difficult, I think. To get a start, you might want to look at the Wald test construction in the attached paper. Best wishes, David Booth
  • asked a question related to Sample Size
Question
3 answers
I want to know how to determine the sample size while performing a microbiome study in humans.
Relevant answer
Answer
Thank you very much for the promising question. Microbiome studies, ranging from observational case reports to randomized controlled trials, typically depend on well-defined research hypotheses and the number of samples to be tested. Microbiome diversity and signatures can be significantly affected by the number of samples studied: the higher the sample size, the higher the level of confidence in the generalisations. The sample size can also be affected by the sequencing technique; for example, targeted ribosomal gene (16S rRNA) sequencing is now so cheap that a huge number of samples can be sequenced easily. Conversely, shotgun or whole-metagenome sequencing is far more costly, and thus usually includes fewer samples than amplicon-based sequencing.
  • asked a question related to Sample Size
Question
12 answers
I am performing factor analysis on a dataset with 16 variables and 22 observations. I realize the sample size is small, but I am getting high factor loadings. The problem is that the correlation among variables under one factor is very high, with correlation coefficients of more than 0.7. Also, some of the variables under one factor are highly correlated with variables under another factor. I don't want to drop too many variables, because they are important for the study. I have read that the correlation pattern among variables is not reliable with a small sample size. If that is the case, can I proceed with further analysis given the existing high correlations? Would there be a problem of multicollinearity?
Relevant answer
Answer
I also agree that multicollinearity doesn't seem to be an issue; however, it would be too early to reach any conclusion given that the sample is so small. Provided the loadings are appropriate and under the right factors, the issue of high correlation among factors should also be sorted out with a larger sample. Wishing you the best in your research.
  • asked a question related to Sample Size
Question
4 answers
Hi,
I am going to treat mice with a drug and want to see whether the intervention has any effect compared to sham controls in a mouse model. Do you have any idea about a priori power calculation tools/methods used in animal intervention studies?
Many thanks,
Nirmal
Relevant answer
Answer
George Vassilakos, thanks for noticing. I fixed it; the new link should work.
  • asked a question related to Sample Size
Question
2 answers
Can anyone share a research paper where a power calculation was conducted for the sample size and the Demographic and Health Survey (DHS) was used as the dataset?
The outcome variable is not a big concern; I just need clarification of the methodological section where the power calculation was performed using a DHS dataset.
Thanks in advance!
Relevant answer
Answer
I believe Google search still works. Best wishes, David Booth
  • asked a question related to Sample Size
Question
12 answers
I need the latest recommendations for using Krejcie and Morgan's (1970) formula.
Relevant answer
Answer
Hi Sidra. I think we can still use the Krejcie and Morgan sampling table. In my field, town planning research, many researchers still use this table. I graduated with my PhD (with minor corrections) in 2019, and I used this table to justify my sample size.
  • asked a question related to Sample Size
Question
3 answers
When I have a binary outcome variable and a binary predictor, I am able to calculate the sample size needed using either G*Power (under z tests -- logistic regression) or R (wp.logistic). However, I do not know how to tackle this when the predictor variable has three groups/categories. Does anyone know how to go about these calculations, and using what software? Thanks in advance.
Relevant answer
Answer
There are a few online calculators for this; just google the topic. Most people recommend doing it by simulation. See the attached search for references. Best wishes, David Booth
  • asked a question related to Sample Size
Question
4 answers
I would like to calculate the required sample size for a main study based on the results from my pilot study in a psychological experiment.
The hypotheses indicate a repeated measures multiple mean comparison, hence, a repeated measures ANOVA (one-way). In the pilot study, the normality assumption is violated, therefore, I used a Friedman test instead.
My question: how can I now calculate the required sample size for my main study? I have already found the rule of thumb of adding 15% to the sample size calculated for the corresponding parametric analysis. Yet, because the normality assumption is violated, I cannot trust this parametric analysis. I am therefore looking for an equation to calculate the required sample size from the output of the Friedman test.
Thanks in advance for your help
Relevant answer
Answer
Hello Markus,
The large-sample asymptote of the power-efficiency of the Friedman test relative to the one-way repeated-measures ANOVA is about .955 * j / (j + 1), where j is the number of levels/measures (when all assumptions are satisfied). With small samples, the drop-off is likely worse, as the Friedman test behaves more like a sign test than like the dependent t or Wilcoxon signed-rank test.
You'd either need to: (a) build in a larger cushion than the 15% you mention; or (b) consider an alternative analytic method. Have a look at this blog, which outlines the concerns in more detail (as well as discussing ranked data/parametric method hybrids, as championed by Puri & Sen, as well as Iman & Conover) : https://www.r-bloggers.com/2012/02/beware-the-friedman-test/
Good luck with your work.
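The adjustment implied by that asymptotic relative efficiency can be computed directly. This is a rough planning heuristic, not an exact power calculation, and the starting sample size of 40 is an assumed example:

```python
from math import ceil

def friedman_n(n_anova, j):
    """Inflate an ANOVA-based sample size by the Friedman test's
    asymptotic relative efficiency, about 0.955 * j / (j + 1)."""
    are = 0.955 * j / (j + 1)
    return ceil(n_anova / are)

# e.g. the parametric calculation says 40 subjects, with 3 repeated measures
print(friedman_n(40, j=3))  # → 56
```

Note that for j = 3 this inflation is roughly 40%, noticeably more than the flat 15% rule of thumb mentioned in the question; the gap shrinks as j grows.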
  • asked a question related to Sample Size
Question
3 answers
I'm looking for a way to measure significance and variance homogeneity/heterogeneity.
First of all, I tested my data for normality, and it is not always parametric, so I think I should use different tests for parametric and non-parametric data (figs. 1 and 2).
As I understand it, I can't use repeated-measures ANOVA if I have unequal sample sizes, so any tips would be highly appreciated.
Relevant answer
Answer
Dear Denis,
Actually, you can use ANOVA with unequal sample sizes.
Just consider some rules for choosing post-hoc tests (we use them to find the source of the differences). For your case:
* Non-equal group sizes: Gabriel's or Hochberg's GT2 are the favourites here. Hochberg's GT2 may be better if there is a larger difference in group sizes.
* Equal variances not assumed: Games-Howell is the most commonly used here.
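A minimal sketch of the omnibus step with unequal group sizes (the group means, SDs, and sizes below are arbitrary placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# three groups with deliberately unequal sample sizes
a = rng.normal(10, 2, size=12)
b = rng.normal(11, 2, size=30)
c = rng.normal(13, 2, size=20)

# one-way ANOVA accepts unequal group sizes directly
f, p = stats.f_oneway(a, b, c)
print(f, p)
```

scipy only provides the omnibus test; the post-hoc procedures named above (Gabriel, Hochberg's GT2, Games-Howell) are available in SPSS or in dedicated statistics packages.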
  • asked a question related to Sample Size
Question
3 answers
Dear all,
My goal is to predict night-light intensity from daytime satellite imagery. I am using data from the VIIRS and Landsat 8 sensors. My study area is shown in the attached image, and I am following the methodology of the attached paper. According to the authors, their daytime images are 400*400 pixels. They also classified the nighttime images into 3 categories (low, medium, and high intensity lights). In my study area, very few daytime pixels correspond to high-intensity lights compared to the other 2 categories (e.g., I can find only 40 daytime pixels for category 3, but I can sample far more for the other two categories). How should I sample (select) my daytime images for each category, taking into account that each sample will have different dimensions? These samples are going to be my input to the VGG-16 architecture.
Relevant answer
Answer
Hi Nikolaos Tziokas!
You can randomly choose the size 224x224, conduct an experiment, and evaluate the accuracy of the model. Then choose a smaller image size, conduct another experiment, and evaluate the quality of that model. Compare which image sizes make your model work better. Neural networks involve creativity and a lot of experiments. I also recommend trying VGG-19, ResNet, AlexNet, etc.
  • asked a question related to Sample Size
Question
5 answers
Goodnight!
I have questions regarding the definition of sample size.
The aim of the study is to evaluate whether the mother's diet contributes to the offspring's obesity rates.
The experiment in question has the following design:
Six types of diet and a control diet will be used in the treatments of female mice. After the experiment and the mating of the females, we only need to use the male individuals of the offspring. Therefore, I need to define the number of female mice to be used in each of the treatments, and the number of male individuals to be selected from each offspring, to carry out the index tests.
For ethical reasons, I need to use as few individuals as possible, and I need a statistical basis to define this number. A big problem is that we cannot predict how many males will come from each litter.
Can anybody help me?
Relevant answer
Answer
depends on the study area and topic
  • asked a question related to Sample Size
Question
32 answers
I am writing a qualitative research paper on EFL graduate students' academic writing challenges in a university in Turkey where English is the medium of instruction. The research instrument is a semi-structured interview, and thematic analysis (TA) will be implemented. Based on what should I choose the sample size? What is the best/ideal sample size to reach the principle of saturation?
Relevant answer
Answer
A way to deal with this is to point your committee to some references so they don't have to take your word for it. See the book by Zina O'Leary (2009; I know she has a newer version, but the basics of qualitative research should already be covered in the 2009 edition). You may also consult the book by Jennifer Mason titled Qualitative Researching; that one is my favorite, and it is student-friendly. From the looks of it, your committee is more from the positivist school of thought, which does not sit well with the constructivist orientation of your research.
  • asked a question related to Sample Size
Question
6 answers
Hello,
Does anybody have any statistical suggestions for justifying comparisons between groups with different sample sizes?
I've conducted a latent class analysis, and I would like to compare the classes on a distal variable. Chi-square tests were used to compare classes. The results are good, but I'm looking for references that could help me justify the pertinence of these comparisons even though the classes differ from each other in sample size.
Thanks!
Relevant answer
Answer
Likert scales can never be considered normal, because the numbers are not equally spaced in value: the meaning of the difference between, say, 2 and 4 is not necessarily the same as the difference between, say, 5 and 6. The main information content is ordinal. Therefore, rank-based tests are to be preferred. For instance, a Kruskal-Wallis test does not require equal sample sizes, and it can be followed by pairwise comparisons between treatments using the Wilcoxon rank-sum test with Bonferroni correction.
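That workflow can be sketched with scipy; the Likert-style responses below are simulated placeholders, and `mannwhitneyu` is the two-independent-sample (rank-sum) form of the Wilcoxon test:

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(3)
# illustrative ordinal (Likert-style) responses with unequal group sizes
groups = {
    "A": rng.integers(1, 6, size=15),   # values 1..5
    "B": rng.integers(2, 7, size=25),   # values 2..6 (shifted upward)
    "C": rng.integers(1, 6, size=10),
}

# omnibus rank-based test; unequal group sizes are fine
h, p = stats.kruskal(*groups.values())
print("Kruskal-Wallis p =", p)

# pairwise rank-sum comparisons with Bonferroni correction
pairs = list(combinations(groups, 2))
for g1, g2 in pairs:
    u, p_pair = stats.mannwhitneyu(groups[g1], groups[g2])
    print(g1, g2, "adjusted p =", min(1.0, p_pair * len(pairs)))
```

Multiplying each pairwise p-value by the number of comparisons (capped at 1) is the simple Bonferroni adjustment the answer recommends.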
  • asked a question related to Sample Size
Question
3 answers
I am a medical sonographer and will be performing a clinically based research project. I need to calculate the sample size needed for patient recruitment; however, in my literature review I have not found any previous studies on my research topic to assist with the sample size calculation. I will be able to recruit 200 patients at my clinical site. How do I justify that 200 patients are sufficient to allow data analysis, draw conclusions, etc.?
Relevant answer
Answer
Hello Samantha,
In the absence of extant studies which report on observed effects of the type you wish to investigate, there are several other viable options:
1. Run a small, pilot study to get an idea of what effects may exist. Note, though, that the effects which may exist might or might not be considered worthy of attention (as in the next suggestion).
2. Get a consensus value from a group of domain experts as to what magnitude of effect would be noteworthy, should it exist in the population.
3. Evaluate the effect from an economic basis: At what magnitude of effect would the resultant costs (considering both positive and negative) outweigh those of no treatment/measurement/diagnosis.
Good luck with your work.
  • asked a question related to Sample Size
Question
12 answers
Data sample: 206
Dependent: continuous, normally distributed
Independents: sample sizes differ between categories.
Do I use parametric or non-parametric tests to analyze?
Relevant answer
Answer
What is the statistical hypothesis you want to test?
  • asked a question related to Sample Size
Question
6 answers
What is your opinion on how many patients I need for an early feasibility study (EFS) to assess the in-human functionality of a prototype device?
Relevant answer
Answer
Dear Thales Paulo Batista The first part of the answer to your question is simply: 'how long is a piece of string'? That is, the sample size of a test procedure depends on the accuracy and precision of the procedure. What measures will you take to test this prototype? The classic answer is threefold:
Effectiveness: proportion of correct solutions made by testees
Efficiency: amount of mental energy expended by testees in arriving at correct solutions
Satisfaction: internal mental state of testees after they finish work with the prototype
See the ISO 9241 standard (browse for it!)
Unfortunately, Effectiveness and Efficiency do not have standardised test procedures, although Satisfaction does: see my work at sumi.uxp.ie
The other part of the question is: 'how big is the gap'? That is, to what extent does your prototype exhibit all the behaviour a testee will expect to see in the finally developed app. If your prototype is missing functionalities that will be obvious to the testee, then either 'more work is needed' or you may wish to test with a wire-frame model or a good paper prototype using a Wizard of Oz method. Using these partial methods, you will not wish to rely on quantitative data, but listen carefully to what your testees are telling you. Classically there is no predictive model for sample size in such open-ended research: you carry on sampling until you achieve 'saturation': that is, you start getting the same reactions over and over again.
  • asked a question related to Sample Size
Question
7 answers
Hello,
I'm trying to estimate a minimum sample size and I'm reading Kline's book on structural equation modeling, where he references the N:Q ratio:
"In ML estimation, Jackson (2003) suggested that researchers think about minimum sample size in terms of the ratio of cases (N) to the number of model parameters that require statistical estimates (q)"
I have my model set up in AMOS and it shows the parameters for the weights, covariances, and variances. I understand where they come from, but I'm confused about how to use the N:Q rule because Kline says that it's the parameters that require statistical estimates. Does this mean the parameters that I am interested in for my prediction (i.e. the weights and/or covariances)? Or does it mean the parameters that the entire statistical model requires to make an estimate of fit (i.e. weights + covariances + variances)?
Thanks so much in advance!
Relevant answer
Answer
David L Morgan It is not generally true (in fact, rather rarely) that the residual variances in path, CFA, or SEM models are functions of (determined by) other model parameters. Residual variances are almost always free (independent) model parameters.
As an aside, when parameters (including residual variances) are set equal to other parameters, they count only as one free parameter.
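As an arithmetic illustration of the N:q rule (the parameter counts below are hypothetical, not from any particular model): q is the total number of freely estimated parameters, i.e. free weights plus free covariances plus free variances, and Jackson (2003) suggested aiming for an N:q ratio of 20:1, with 10:1 often cited as a lower bound:

```python
# Hypothetical counts as listed in AMOS's "parameter summary"
weights = 12       # free regression weights / loadings
covariances = 3    # free covariances
variances = 10     # free variances (including residual variances)

q = weights + covariances + variances    # all freely estimated parameters
target_ratio = 20                        # Jackson's (2003) ideal N:q of 20:1
minimum_ratio = 10                       # commonly cited lower bound

print("q =", q)                          # 25
print("ideal N   =", target_ratio * q)   # 500
print("minimum N =", minimum_ratio * q)  # 250
```

So the ratio is computed against all free parameters, not just the weights you are substantively interested in.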
  • asked a question related to Sample Size
Question
5 answers
I want to model the mode choice of movement-challenged persons (MCPs) using a random forest decision tree. My sample size is 400. It is very difficult to reach MCPs and collect more samples. Will 400 samples be enough for a random forest?
Is there any specific minimum sample size for applying a random forest to a dataset?
Relevant answer
Answer
The 400 instances will be sufficient for DT algorithms. I wish to know the number of attributes considered.
  • asked a question related to Sample Size
Question
6 answers
Hello everyone, I have a question regarding sample size and response rate. My population size is around 22,000 students. I sent a survey to all of them via email, and around 6% of them (n = 1,300) participated. My question is: since I have a large sample size, will a low response rate like 6% create an issue for my research?
Relevant answer
Answer
It is also important to consider the purpose of the study. Is it to describe "typical/mean" attitudes in the population (statistical inference), or is it to test a theoretical model? If the latter, the prototypicality of the sample is the most important aspect: if the sample (the collected data) is adequate for testing the theoretical model, the low response rate might not be a major problem. And as for power, your n is large enough anyway.
  • asked a question related to Sample Size
Question
4 answers
Hello all, I have a question about sample size calculation and want to ask for help. My project is about "prevalence and associated factors of burnout among end-of-life care nurses in the North East of England", which is a cross-sectional study, and I'm going to use linear regression to analyse the data. However, this is my first time conducting research and I have little experience and knowledge of statistics... Now I'm stuck on the power calculation. My supervisor asked me to figure out first how many respondents I actually need, but I'm quite confused about it. I tried reading books from the university library and searching for videos on YouTube... but all the information I found is really messy and unorganised. May I ask for advice and suggestions on how to work out the sample size and power calculation? I just need some pointers, like where I can find good resources to learn from the very beginning (I have limited time now), or the steps I need to take to work through this. Could you please give me some help? Thanks a million for any advice and instruction.
Relevant answer
Answer
It depends on the design, the number of researchers, and the funding.
There are also many formal mathematical approaches to calculating the sample size.
  • asked a question related to Sample Size
Question
5 answers
I have sample a of 138 observations (cross sectional data) and running OLS regression with 6 independent variables.
My adjusted R2 comes out negative even if I include only one independent variable in the model. All the beta coefficients as well as the regression models are insignificant, and the value of R2 is close to zero.
My queries are:
(a) Is negative adjusted R2 possible? If yes how should I justify it in my study and any references that can be quoted to support my results.
(b) Please suggest what should I do to improve my results? It is not possible to increase the sample size and have already checked my data for any inconsistencies.
Relevant answer
Answer
@David, could you suggest a book or published research paper I can refer to? I couldn't find any authoritative source on Google that can be cited to support my results.
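For what it's worth, the algebra itself shows why a negative adjusted R2 is possible: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), so when R2 is near zero the penalty factor pushes the result below zero. A quick check with the poster's n = 138 and k = 6 (the R2 value is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With R^2 close to zero, the (n - 1)/(n - k - 1) penalty factor exceeds 1,
# so the adjusted value drops below zero
print(adjusted_r2(0.01, 138, 6))
```

So a negative adjusted R2 simply means the model explains less than would be expected by chance given the number of predictors; it is not a computational error.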
  • asked a question related to Sample Size
Question
10 answers
Hello,
I have multiple choice items in my survey questionnaire and I have a problem with figuring out which statistical analysis is the best since I don't have a hypothesis because my questionnaire is exploratory.
I have 4 devices and I have 9 feelings (the same feelings repeated for each device) and I want to discover which feeling is the most prominent across all devices.
Example of the item: (below example for the same subject)
Device A: picked feelings 1,4,9
Device B: picked feelings 5
Device C: picked feelings 4,5,7,8
Device D: picked feelings 5
The other item is I have 5 choices of what they expect of specific training. The choices varied between one subject to another.
Example of the item: (below example for different subjects)
Subject 1: picked expectation 1, 3, 2
Subject 2: picked expectation 1,2,3,4,5
Subject 3: picked expectation 3
I have used Friedman's test to rank each feeling item and expectations item, however, I want to know if there's a statistical significance of these items.
My sample size is 101. However, since these items are multiple choice they are not equal. For example Device A has 200 responses while Device B has 109 responses.
Thank you in advance.
Relevant answer
Answer
In the first place, I think you should carry out a "semantic reduction", combining the different answers received in each item into similar or homogeneous semantic frames (options with the same sense), whether the options are open or not. Then calculate the frequency of each semantic frame or option obtained for each item, rank them by the number of subjects who chose them, and report the resulting mode (the semantic frame or option in each item chosen by the most subjects).
Another possibility would be to do a Delphi-type analysis of the responses obtained, as is done with brainstorming.
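The frequency-and-mode tabulation suggested above can be sketched as follows; the picks are hypothetical:

```python
from collections import Counter

# Hypothetical multiple-choice picks: each respondent may select several feelings
device_a_picks = [[1, 4, 9], [5], [4, 5, 7, 8], [5], [4, 9]]

counts = Counter(f for picks in device_a_picks for f in picks)
n_respondents = len(device_a_picks)

# Frequency of each feeling, ranked by number of respondents who chose it
for feeling, count in counts.most_common():
    print(f"feeling {feeling}: {count}/{n_respondents} respondents")

# The mode: the most frequently picked feeling for this device
mode_feeling = counts.most_common(1)[0][0]
```

Because each respondent can pick several options, report frequencies as a share of respondents rather than of total picks; that also sidesteps the unequal totals across devices.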
  • asked a question related to Sample Size
Question
5 answers
Hi,
I want to conduct a study examining the adverse effects (our primary outcomes) of an old drug used for another disease. There is no previous experience of treating this disease with the old drug in a specific cohort (such as a highly severe group). In this study, we plan to recruit 20-40 participants, give them the treatment, and follow them for x months (3-6). We will then assess the adverse effects (A, B and C, all binary outcomes, of which A is the most important) and other outcomes (possibly continuous) at two time points (perhaps first at 3 months and again at 6 months). For the control group, we plan to match controls from our medical database based on patient characteristics.
However, I am not sure if it is feasible to conduct this study as a matched case control study. And if it’s okay, how can I determine the sample size and matching ratio between case and control groups. If it’s not correct, what type of study do I need to conduct. Any thoughts would be very appreciated!
Thanks in advance!
Relevant answer
Answer
Mark R Speechley, you are absolutely correct! It's going to consist of a prospective treatment group and a retrospective control group. I will look into the problem following your suggestions and rethink the study design.
Thanks for organizing the question in such a clear way! I really appreciate it.
Chen
  • asked a question related to Sample Size
Question
7 answers
I would like to conduct a multigroup moderation analysis using AMOS. I have two samples: the first n = 73, the second n = 216. Are these two sample sizes appropriate for such an analysis?
It should be noted that my overall (total) sample size is 289. The two samples are categorized according to the quality of ties with organizational leadership. The smaller sample represents the group with negative ties to leadership; negative ties are usually much less numerous than positive ties.
  • asked a question related to Sample Size
Question
7 answers
Hi,
I am using the Demographic and Health Survey (DHS) data of a country. My purpose is to find out the prevalence and risk factors of intimate partner violence. Now, one of my supervisors has asked me for the 'power calculation'. But I don't know how to reply to him, since I did not need to estimate the sample size on my own.
Actually, as far as I know, the design of DHS samples is determined by many factors, including criteria for the standard errors of estimates of the main indicators within the sample strata, which are usually combinations of level-1 administrative units and urban/rural residence.
So, 'how can I reply to my supervisor about this power calculation in terms of DHS dataset', can anyone help me to find it out?
Relevant answer
Answer
Sabiha Khatoon Thank you very much for your file. It was really helpful.
And, David Eugene Booth I am really grateful for your suggestion. I am going to compute the power, and will come back to you if I need any suggestions regarding the Power Calculation.
  • asked a question related to Sample Size
Question
10 answers
Dear colleagues,
I applied the Granger Causality test in my paper and the reviewer wrote me the following: the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...
What can I respond in this case?
P.S. I had a small sample size and serious data limitation.
Best
Ibrahim
Relevant answer
Answer
Ibrahim Niftiyev, probably the reviewer wants to see not only whether one variable affects another (i.e. the results of the Granger causality tests), but also to what extent (the magnitude and temporality of the dynamic relationship, something you can obtain from the IRFs of a VAR model). If you want to apply a VAR but have a small sample size/data limitations, you may want to consider a Bayesian VAR. Bayesian VARs are very popular, and Bayesian methods are valid in small samples.
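As a minimal sketch of what a VAR adds beyond Granger tests (simulated data, plain least squares, no VAR library): a VAR(1) is y_t = A y_{t-1} + e_t, and its impulse responses at horizon h are simply A^h:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1): y_t = A @ y_{t-1} + e_t  (small sample, T = 40)
A_true = np.array([[0.5, 0.2],
                   [0.1, 0.4]])
T = 40
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(scale=0.1, size=2)

# OLS estimate of A: regress y_t on y_{t-1}
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

# Impulse responses at horizon h are A_hat^h for a VAR(1);
# these show the magnitude and timing of each variable's effect on the other
irf = [np.linalg.matrix_power(A_hat, h) for h in range(6)]
print(np.round(A_hat, 2))
```

The off-diagonal entries of the IRF matrices are exactly the "to which extent" information the reviewer is asking for; a Bayesian VAR would add priors on A to stabilise the small-sample estimate.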
  • asked a question related to Sample Size
Question
6 answers
Hi everyone,
I tested an SEM model with 2 IV, 4 mediators and 1 DV on a sample of 1000 participants (see attached figure). Could you please help me to find an estimation for a good sample size using power analysis for this multiple-mediator model.
Best,
Robin
Relevant answer
Answer
You could refer to Jak et al. (2020) and Wang and Rhemtulla (2021) for recent insights. Here are the full citations.
Jak, S., Jorgensen, T. D., Verdam, M. G. E., Oort, F. J., & Elffers, L. (2020). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods. https://doi.org/10.3758/s13428-020-01479-0
Wang, Y. A., & Rhemtulla, M. (2021). Power analysis for parameter estimation in structural equation modeling: A discussion and tutorial. Advances in Methods and Practices in Psychological Science, 4(1), 251524592091825. https://doi.org/10.1177/2515245920918253
Good luck,
  • asked a question related to Sample Size
Question
3 answers
My research aims to determine the prevalence of hypoglycemia due to ADRs, using a total sampling method. Why does total sampling need a minimum sample size?
Relevant answer
Answer
If you want your computed prevalence to be representative and applicable for others to cite and use, you have to ensure a minimum sample size sufficient to estimate it with adequate precision. This applies to all studies, not just prevalence studies.
  • asked a question related to Sample Size
Question
3 answers
What was the study power in this sample size?
I received this comment from a reviewer. I used a questionnaire to collect the data from bio-medical students (790) to know their study experience during covid-19. So, do I need to calculate study power? I was not doing any medical trials etc. So, please suggest to me how to deal with this issue?
Relevant answer
Answer
Although power analyses are often conducted prior to undertaking a study (in order to determine a sample size), they can also be used "post-hoc" to determine your ability to detect an effect, given your actual sample size.
Since you have a relatively large sample, you should not hesitate to use G*Power or some other online tool to assess your power.
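A post-hoc power check need not involve special software. Under a normal approximation for a two-sided, two-sample comparison, power is roughly Phi(d * sqrt(n/2) - z_{1-alpha/2}); the effect size and group size below are illustrative, not taken from the poster's study:

```python
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test
    for standardized effect size d with n per group (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(d * (n_per_group / 2) ** 0.5 - z_alpha)

# Illustrative: power to detect a small effect (d = 0.2) with 395 per group
print(round(power_two_sample(0.2, 395), 3))
```

With a sample of 790 split into two groups of 395, even a small standardized effect is detectable with roughly 80% power, which is the kind of statement the reviewer is after.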
  • asked a question related to Sample Size
Question
3 answers
I am conducting a diagnostic accuracy study which is being held in two different study sites within a particular region in Ghana.
Using the formula comprehensively explained by Negida et. al. (2019) https://europepmc.org/article/pmc/6683590, I would need prevalence (from a previous study) to arrive at my final sample size.
My question is should I look for a single prevalence from a study held in the region or do I have to look for individual prevalence specific to each study site?
Relevant answer
Answer
It depends on whether the prevalence is likely to vary dramatically between the study sites, and on the precision you want. You should calculate sample sizes for a range of assumptions about the likely prevalence and see the effect.
E.g., the attached extract shows that the sample size does not vary much, for a precision of +/- 0.05, when estimating a true prevalence of 0.3-0.7.
(From dos Santos Silva .Cancer epidemiology; principles and methods. IARC 1999)
If the prevalence is likely to vary more between sites , do a separate sample size calculation for each site.
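The point about sample size being insensitive to prevalence in the 0.3-0.7 range can be checked directly with the usual formula n = z^2 * p * (1 - p) / d^2:

```python
from math import ceil
from scipy.stats import norm

def n_for_prevalence(p, d, conf=0.95):
    """n = z^2 * p * (1 - p) / d^2  (simple random sampling, normal approximation)."""
    z = norm.ppf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

# Precision of +/- 0.05 at 95% confidence, across a range of assumed prevalences
for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(p, n_for_prevalence(p, 0.05))
```

The required n peaks at p = 0.5 and is symmetric around it, which is why a single conservative calculation often suffices when the site prevalences are expected to fall in this middle range.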
  • asked a question related to Sample Size
Question
5 answers
How do I determine the sample size for a study looking at the treatment outcomes of mental health patients in a community house that is a step-down from an inpatient unit?
The study will look at outcome measures from when participants first enter the house as well as at discharge(up to 14 days) and at 3 months(after community follow-up).
It will also look at the client satisfaction survey results completed at these time frames.
We will be using a convenience sample.
Currently there are 8 people in the houses at a time.
Also, if anyone knows: how would you determine whether someone is too unwell and should be excluded from the study? Most patients are fairly stable in the house but require additional support prior to discharge to the community. I've read a number of studies but no one clearly defines how they make that decision.
Thank you for taking the time to help
Kind regards Margaret
  • asked a question related to Sample Size
Question
7 answers
I need citations from previous studies about including the pilot study sample in the final sample size.
thank you
Relevant answer
Answer
Hi. Generally it is not recommended. The purpose of a pilot study is to try out the instrument, whether an interview guide (qualitative research) or a survey questionnaire/experiment etc. (quantitative research), so that after the pilot study the instrument can be improved before being used in the final data collection.
  • asked a question related to Sample Size
Question
7 answers
I am planning a cross-over design study (RCT) on effect of a certain supplement/medicine on post-exercise muscle pain. There hasn't been any similar study to recent date on the effect of this medicine (or similar medicines) on post-exercise muscle pain. However, some studies have been conducted for effect of this medicine on certain conditions such as hypertension.
The formulas I have found for estimating sample size all require information (such as a standard deviation, mean, or effect size) from similar studies conducted before.
Is there any way to estimate a sample size for my RCT under the aforementioned conditions?
Relevant answer
Answer
For a power calculation you design a study to detect an effect size that you would like to be able to detect (sometimes termed the smallest effect size of interest, or SESOI), not the effect you think exists. So the question becomes: what would be a meaningful change in pain on the scale/measure you are using (probably in terms of clinical benefit)? That gives you the mean difference you are looking for. Then you can look at other studies that have used the same outcome measure in a similar population to the one you are sampling and look at the typical ranges of SDs. That suggests a range of standardised mean differences that you can then work with. (It is best to consider ranges of plausible values, as the SD in particular is not precisely measured in most studies.) After that, factor in things like likely drop-out or the impact of covariates (e.g., collinearity).
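The advice above reduces to simple arithmetic once a SESOI and plausible SDs are chosen. A sketch using the normal-approximation formula for n per group, with an illustrative SESOI of 10 points on the pain scale and a range of plausible SDs (all numbers hypothetical):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sd, power=0.8, alpha=0.05):
    """n per group for a two-sided, two-sample comparison (normal approximation):
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2"""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * z ** 2 * sd ** 2 / delta ** 2)

# SESOI of 10 points, over a range of plausible SDs from the literature
for sd in (15, 20, 25):
    print(sd, n_per_group(10, sd))
```

Running the calculation over the SD range, rather than a single point value, shows directly how sensitive the required n is to that poorly measured quantity.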
  • asked a question related to Sample Size
Question
3 answers
Hi, so I conducted a 2*3 factorial experiment with pretest and mediators. the sample size is 422. I ran the model in AMOS and it returns the following error message:
An error occurred while attempting to fit the model.
The sample moment matrix is not positive definite. It could fail to be positive definite for any of the following reasons:
1. The sample covariance matrix or the sample correlation matrix contains a data entry error.
2. The observed variables are linearly dependent (perhaps because the sample size is too small).
3. The sample covariance matrix or sample correlation matrix was computed from incomplete data using the method of
“pairwise deletion”.
4. The sample correlation matrix contains correlation coefficients other than product moment correlations (such as tetrachoric correlations).
For maximum likelihood estimation only, it may be appropriate to check “allow non-positive definite sample covariance matrices” in the “Analysis Properties” window, or to use the non-positive definite method.
I was using raw data so 1 and 4 should not be a problem. I also checked the correlation matrix - there's no correlation higher than .85, so I assumed it was fine.
I added factors listwise and finally located one factor - the posttest measurement of a mediator (3-items). Adding this factor yielded the error message. The other factors (including its pretest) are fine.
I guess it is probably because the model is underpowered.
I was wondering if I can change the latent mediator into a manifest item, for both of its pre- and post-tests, and include it into the model? Would there be any articles that I can read or find a solution?
Thank you very much in advance!
Relevant answer
Answer
That is an interesting approach.
  • asked a question related to Sample Size
Question
2 answers
We are planning a matched case-control study to identify determinants of fetal macrosomia among delivered neonates. We are trying to determine the maximum sample size for this study.
  • asked a question related to Sample Size
Question
9 answers
In my master's thesis, I am doing a mixed method research where I have a quantitative analysis of a survey with 21 items measuring a total of 8 elements of a theoretical model. The sample size is approximately 77.
I have been told that, due to my mixed methods approach, I can get enough analysis by sticking to some descriptives and potentially a regression.
My question is therefore: can I combine items that are intended to measure the same construct into one variable and use it as the dependent variable, without having run some kind of factor analysis first? I do not want the quantitative part to dominate my research and would therefore prefer sticking to only a few quantitative analyses, but of course I do not want to run tests that ignore critical prerequisites.
Relevant answer
Answer
In a mixed-methods design, one could do exploration first with personal interviews. This method will give you new variables that are not present in past studies. Then you do a causal study using SEM, in which you include both these new variables and the old ones. This way you can also add to the theory in the past literature.
  • asked a question related to Sample Size
Question
10 answers
Hi Researchers,
I am doing a computer science dissertation on the topic ''An automated text tool to analyse reflective writing''.
The hypothesis set is 'To what extent is the model valid for assessing reflective writing?' I want to use the questionnaire (closed-ended questions and one open question) to validate the proposed model.
I have used a 5-point Likert scale for analysing the data, with the options strongly agree, agree, neutral, disagree, strongly disagree. The sample size is 10 participants. I have chosen my participants based on their experience, career and knowledge of reflective writing.
1) Which statistical analysis tool shall I use to analyse the 10-participant sample to validate the model? Please show me step by step how to analyse the data.
2) What would be the associated hypothesis?
3) Can I use the Content Validity Index with 10 participants on questionnaires using a 5-point Likert scale?
4) Is this step in my research a qualitative or a quantitative method? Why?
I would also welcome any suggestions on my hypothesis, the sample size and the analysis tool.
Thank you in advance !
Relevant answer
Answer
But I have a worry: how will you increase your population? Are you going to change your area of study? I have a similar case with a population of just 13 teachers. How do I test the reliability of the questionnaires?
  • asked a question related to Sample Size
Question
4 answers
I need to determine sample size for my research by using an online power calculator. Can anyone recommend a best and easy to use calculator? I am a beginner at statistics.
Relevant answer
Answer
It depends on what type of sampling you are doing, e.g. simple random, stratified, systematic, or cluster, and on what you are estimating, e.g. the population mean, population total, etc.
  • asked a question related to Sample Size
Question
3 answers
Hi all, I am currently doing a study on cognitive behavioural therapy (CBT) in POI patients, and my hypothesis is that CBT improves QOL in these patients. My primary outcome would be the general health (GH) portion of the SF-36 questionnaire. Based on previous research, the standard deviation for GH in PCOS patients was 21.74, and the minimal clinically important difference is 5 points. Assuming 80% power and a 5% significance level, do I have enough information to calculate the sample size? I've read through multiple papers but all of them use different methods of calculation. Do I have to take into account an intraclass correlation or a variance inflation factor?
Relevant answer
Answer
Well, I have heard male residents say cystitis doesn't need treatment and have heard female residents say you'd treat it if you had it. So,
a larger sample is likely better than a small one. For your calculation you might Google G*Power and the relevant literature. But a pilot sample may be really helpful. May the force be with you. David Booth
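With the numbers already quoted in the question (SD 21.74, MCID 5 points, 80% power, two-sided 5% alpha), the standard two-group formula gives the answer directly; an intraclass correlation or variance inflation factor only enters for clustered designs, which does not appear to be the case here:

```python
from math import ceil
from scipy.stats import norm

sd, mcid, alpha, power = 21.74, 5.0, 0.05, 0.80

# n per group = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / mcid^2
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n = ceil(2 * z ** 2 * sd ** 2 / mcid ** 2)   # per group, individual randomization
print(n)
```

A pilot sample would refine the SD of 21.74, which is the most uncertain input here; note that required n scales with the square of the SD.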
  • asked a question related to Sample Size
Question
7 answers
When I calculate the sample size to estimate a mean in a cross-sectional study, I was recommended to use the formula: n = (Z^2 × σ^2)/d^2
Z = 1.96
σ = standard deviation (from a previous study)
Can you help me explain d and how to determine it?
Relevant answer
Answer
Thanks for your sharing
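To spell out d: in n = (Z^2 × σ^2)/d^2, σ is the standard deviation of the outcome (usually taken from a previous study) and d is the absolute margin of error, i.e. how far from the true mean you are willing for your estimate to be. You determine d from clinical or practical considerations, not from the data. A sketch with illustrative numbers:

```python
from math import ceil

z = 1.96       # 95% confidence
sigma = 12.0   # SD of the outcome, taken from a previous study (illustrative)
d = 2.0        # margin of error: estimate the mean to within +/- 2 units

n = ceil(z ** 2 * sigma ** 2 / d ** 2)
print(n)   # halving d roughly quadruples n
```

Because n grows with 1/d^2, the choice of d dominates the calculation; it is worth tabulating n over several candidate values of d before committing.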
  • asked a question related to Sample Size
Question
4 answers
Many statistical tests require approximate normality (an approximately normal distribution). On the other hand, normality tests such as Kolmogorov-Smirnov and Shapiro-Wilk are sensitive to the smallest departure from a normal distribution and are generally not suitable for large sample sizes; they cannot show approximate normality (Source: Applied Linear Statistical Models). In this case, a Q-Q plot can show approximate normality.
Based on what is written in the book "Applied Linear Statistical Model", a severe departure from normality is only considered, in this case, parametric tests can no longer be used. But if severe departure is not seen, parametric tests can be used.
What method do you know to detect approximate normality in addition to using a Q-Q plot?
Relevant answer
Answer
Neda Ravankhah
Thank you.
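One numerical companion to eyeballing the Q-Q plot is its correlation coefficient: scipy.stats.probplot returns the r of the straight-line fit to the Q-Q points, and values very close to 1 indicate approximate normality. A sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_data = rng.normal(size=500)
skewed_data = rng.exponential(size=500)

# probplot returns the Q-Q point coordinates and a fit (slope, intercept, r);
# r close to 1 means the Q-Q plot is close to a straight line
_, (_, _, r_normal) = stats.probplot(normal_data, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_data, dist="norm")
print(round(r_normal, 3), round(r_skewed, 3))
```

Unlike Shapiro-Wilk's p-value, this r does not become hypersensitive at large n, so it better matches the "approximate normality" question.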
  • asked a question related to Sample Size
Question
4 answers
I am doing a retrospective quantitative cohort study. Need assistance in calculation a sample size.
Relevant answer
A sample is representative if it represents more than 10% of the studied population. To be reliable at a 95% confidence level and a 0.05 significance level, you must find the proportion of the event: P = 1915/8592
P = 22%
Therefore you need a sample equal to or greater than 256 to have a sample within the confidence level range you need.
  • asked a question related to Sample Size
Question
7 answers
This study will rate some (N number) of clinical photographs into 3 categories by different raters (n)? How do I calculate the sample size? Will it be based on N or n. Is there an online sample size calculator? Can anyone help me in working out a solution
Relevant answer
Hello David Morse. The respectable size is nothing other than the confidence level you want to reach; for the three categories of the raters, you can treat them as strata in this spreadsheet.
  • asked a question related to Sample Size
Question
7 answers
For my research, I have to determine the sample size using G*Power. As I am not interested in conducting "pilot" research to see what sample size will be needed, I have to do a priori analysis. However, I am not used to G*Power, and I am a little bit confused. Firstly, my model consists of one IV with two levels & a moderator. Should I consider the total sample size that G*Power comes up with as the sample size per level (group) of the IV or the total for the two levels? Also, is there a "universal" effect size d that I can use as there is no previous research related to mine that can suggest an effect size?
I really appreciate any help you can provide.
Relevant answer
Answer
@David Morse is entirely correct in everything that he said in his posts. I would point out the use of a pilot sample is often to get a reasonable estimate of the population variance to be used in the sample size calculation. If you have no idea of that variance then a pilot sample may be needed unless say a paper exists that gives you an idea of the needed variance. Best wishes, David Booth
  • asked a question related to Sample Size
Question
5 answers
Can anyone please recommend any technique or a good reading to estimate sample size for quantitative survey based study in the field of organizational psychology. I need to collect matching data from employees and their supervisors and planning to use Structural Equation Modeling (SEM) for analysis.
Relevant answer
Answer
It depends on various factors: the population, the desired confidence level and interval, etc. Usually it's a 95% confidence level and a 5% confidence interval. If you know the population size, you can use an online sample size calculator with those settings.
You may also use a census method if population is small. In that case there is no need for sampling.
  • asked a question related to Sample Size
Question
5 answers
Hello! I'm running a Friedman two-way analysis because my sample is not normally distributed.
I've performed the analyses on different groups, paired over two periods. One of them, although showing a considerable difference between the two periods, is not significant (Friedman 1.3 on 1 degree of freedom). I wonder if this is because this group is smaller (n = 22) than the others.
I've been looking for evidence on the robustness of Friedman's test with respect to sample size, but I haven't found anything substantial.
Thanks!
Relevant answer
Answer
First, David Morse is correct in everything he said. I would like to add that the supposed robustness of nonparametric tests relative to similar parametric tests is oftentimes just not true. You might want to look at the attached Google search for some studies on the topic. Best wishes, David Booth
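One practical note: with only two paired periods, Friedman's test carries no more information than a paired rank test, and the Wilcoxon signed-rank test is the usual choice; it runs fine at n = 22. A sketch on simulated paired data (the shift and spread are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
period1 = rng.normal(10, 2, size=22)
period2 = period1 + rng.normal(0.5, 2, size=22)   # modest shift, n = 22

# Paired two-period comparison: Wilcoxon signed-rank test
stat, p = stats.wilcoxon(period1, period2)
print(round(p, 3))
```

A "considerable" raw difference that is non-significant at n = 22 is entirely consistent with low power rather than any failure of the test itself.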
  • asked a question related to Sample Size
Question
1 answer
I am about to use this formula for a discrete choice experiment (DCE) with 4 attributes, each with 3 levels. Thus, I have some questions that I cordially request answers to: - How do I fix the true proportion p as the true choice proportion of the relevant population? - What relation can be established between the DCE parameters (attributes, profiles ..) and the sample size calculation formula, in terms of fixing the minimum sample size required for the full study?
Relevant answer
Answer
You seem to be confused. The standard way is to choose p = 0.5, the value at which p(1 - p) is largest; this gives the biggest variance and thus yields the largest (most conservative) n. See any statistics book to follow such a calculation for the case of sampling proportions. Best wishes, David Booth
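The conservativeness of p = 0.5 is easy to verify: p(1 - p) peaks at 0.5, so plugging it into n = z^2 * p(1 - p)/d^2 gives the largest n any true proportion could require:

```python
# p * (1 - p) is largest at p = 0.5, so p = 0.5 gives the most conservative n
variances = {p: round(p * (1 - p), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(variances)   # peaks at p = 0.5

# Required n at 95% confidence, +/- 0.05 precision, for a few values of p
n_at = {p: round(1.96 ** 2 * p * (1 - p) / 0.05 ** 2) for p in (0.3, 0.5, 0.7)}
print(n_at)        # 323, 384, 323
```

Any prior information that p lies away from 0.5 can only shrink the required n, which is why p = 0.5 is the safe default when the true choice proportion is unknown.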
  • asked a question related to Sample Size
Question
6 answers
Dear All,
I am working on data with cost of care as the DV. The data are genuinely skewed, reflecting the socioeconomic gap, and therefore the healthcare financing gap, in the population of a developing country. Because of this skewness, the data violated the normality assumption and were therefore reported using the median and IQR. But I would like to analyze predictors of cost of care among these patients.
I need to know if I can go ahead and use MLR or are there alternatives?
The sample size is 1,320 and I am thinking of invoking the Central Limit Theorem.
Thanking you for your anticipated answers.
Dimeji
Relevant answer
Answer
I am conducting research on the socioeconomic effects of GSM masts on residents living in close proximity.
Please can you help me with relevant literature on the subject matter?
If you don't have any, do you know anyone who can help me with related literature?
I am an undergraduate student of Environmental Management Technology.
Thanks
  • asked a question related to Sample Size
Question
8 answers
I would like to compare the food security status of two groups of people. Here, I will use the household hunger scale. The sample size is large. Which statistics will be most suitable?
Relevant answer
Answer
  • asked a question related to Sample Size
Question
4 answers
I have two samples (both relatively small), n = 4 and n = 19. What statistical tests can I use to compare them effectively, accounting for the gap between the sample sizes?
Relevant answer
Answer
You don't say what you want to compare, and all you describe is the sample sizes, so it must mean that you want to compare 19 and 4. There are many ways, the difference (19 - 4) and the odds (19/4) being two of the most common. If you meant something else, delete this question and write what you want.
  • asked a question related to Sample Size
Question
8 answers
I was initially running a multinomial logistic regression with multiple predictors. However, the standard errors turned out to be huge for the parameters. So I ran a simple logistic regression with just one predictor, but the standard error was still huge. The possible reasons, in my opinion, are that they are bad predictors for the outcome, or that the sample size is too small.
Relevant answer
Answer
By way of introduction, sorry about the length of the following...
First, you can check for the problems mentioned above: 1) for collinearity, try fitting the interaction between suspect variables with inflated SEs, with and without main effects, which should result in a small interaction SE and little change in model fit; 2) fit the model with a Gaussian distribution, which will shrink the SEs to 0 if the problem is identifiability; 3) as a brute-force check on sample size, rerun the model on a duplicate of your dataset appended to itself (rows 1-104 = 105-208), which will substantially reduce the SEs if sample size is the problem, while the parameter estimates remain identical.
What sample size "ought" to be is relative, dependent on your needs, data, and question, all of which determine your model and inferences. What do you consider "high", and why is it a problem? What is a reasonable expectation for the probabilities you are estimating?
Mathematically, the only requirement is that you have sufficient observations to estimate the population probabilities. For two treatments without an effect (p = 0.5 for both), you only need 6 observations. If p = 0.1 for a treatment, then you need at least 10 observations for that treatment; for p = 0.01, n > 100, etc. This all assumes that in each treatment, Y contains both 0's and 1's. So what is reasonable for your research?
There is a sort of rule of thumb oft quoted that you need n>10 per parameter, but this requirement is fluid and comes with many caveats. If in doubt and your data are all categorical, it can be appropriate and informative to check your results against an exact test or chi square.
As for "high standard errors", model ML SE is the reliability of parameter estimates based upon the data, not a measure of the reliability of your data per se. The question is if the interpretation of the output makes sense given your hypothesis and understanding of the subject.
Bear in mind that logistic regression estimates the logit of the mean, with the ML null hypothesis that p=0.5, i.e., the linear predictor=0. It's only on that scale that the SE has direct meaning relative to the values of your X variables or number of treatments.
Recall what logistic does: estimate probabilities. Logistic regression requires variation to discriminate between groups. Model predictions of 0 or 1 are statements that an event NEVER or ALWAYS occurs. In either case, there can be no variation, which leads to degeneracy; essentially, the algorithm is trying to divide by zero. Model likelihoods become zero and the log likelihood would be log(0), which does not exist.
In the case where SE gets into the thousands, there is near-perfect discrimination among your set of variables. There cannot be variation in the cases for explanatory variables where {Y} in one or more levels of a factor is all 0's or 1's or a continuous variable is split between 0's and 1's at some threshold. This forces parameter estimates to go to 0 or infinity, either way SE's exponentially grow.
Collinearity is another problem with similar consequences of imposing restraints of equivalency of parameter estimates, thus degeneracy. Parameters for highly collinear variables are ill defined and cancel each other numerically, thus fitting them is another form of infinite estimation (e.g., if X1={1, 2, ...} and X2=X1 +/- (0.1), logit(Y)=X1+X2 will produce equivalent slopes but 1 will arbitrarily be negative).
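As a self-contained illustration of the duplicate-dataset check described above (point 3), here is a minimal Newton-Raphson logistic fit in Python on simulated, separation-free data; the data and effect sizes are hypothetical:

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), X])          # add intercept
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (X * W[:, None])                     # Fisher information
        beta += np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))            # SEs from inverse information
    return beta, se

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.random(200) < 1 / (1 + np.exp(-(0.5 + 1.0 * x)))).astype(float)

b1, se1 = logit_fit(x, y)
b2, se2 = logit_fit(np.concatenate([x, x]), np.concatenate([y, y]))  # dataset duplicated

# Estimates identical; SEs shrink by sqrt(2) -- the signature of a pure sample-size effect
print(np.allclose(b1, b2), np.allclose(se1 / se2, np.sqrt(2), atol=1e-3))
```

If, by contrast, the SEs stayed enormous after duplication, the culprit would be separation or collinearity rather than sample size.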
  • asked a question related to Sample Size
Question
2 answers
While using the Propensity Score Matching approach for an impact assessment of a programme intervention, what is the acceptable ratio of the sample sizes for participants and non-participants? Some have advocated a 1:3 ratio of participants to non-participants to allow for easy matching. I need your input please. The sample size of programme participants in my study is 2,300. Thanks
Relevant answer
Answer
PS I forgot to mention that ease of analysis is of little or no importance to serious research. Making a useful contribution is. D. Booth
  • asked a question related to Sample Size
Question
8 answers
Hello everyone,
I am validating an existing questionnaire (it has been validated in different countries and languages) in my country. Unfortunately, the data that we have gathered are unequal; 80% are females and 20% are males. Does this affect the validation process or am I good to proceed with data analysis?
Thanks,
Sara
Relevant answer
Answer
I don't think this is a problem. We usually do not consider the applicability of a validation procedure being dependent on a specific demographic. There is no requirement for questionnaire validation to use a stratified sample. If that was the case then we would need to stratify the sample about every possible demographic, not just gender. However, the re-use of the instrument which has been validated with one population in a different population may require an additional re-validation. If all you are planning to do is to use the questionnaire to analyse a research question with the same sample that you used to validate it then there is no problem.
  • asked a question related to Sample Size
Question
3 answers
The study would have N examiners who will be provided with a set of x photographs. They will be asked to identify whether the corneal ulcer depicted in each photograph is caused by fungi, bacteria, or a parasite. Only one response is allowed. How many examiners (N) will I need so that the study has 80% power and an alpha of 0.05 to show that the answers provided are not due to chance? This is probably related to Fleiss' kappa, which I should take as 0.6. Therefore, for the sample size, do I need to determine the number of examiners or the number of photographs, given K = 3 categories?
Relevant answer
Answer
Interesting topic.
  • asked a question related to Sample Size
Question
6 answers
As I understand it, with a large sample size the chi-square test almost always shows statistical significance, so we should not base our acceptance/rejection decisions on this fit index. What about the chi-square to degrees of freedom ratio (χ2/df)? Are there circumstances where its use is advisable and circumstances where it is not?
Please do not hesitate to suggest very technical literature on this. I am curious about it and want to learn.
Relevant answer
Answer
Yes, with a large sample size, Chi-square will be statistically significant, and we should expect that the ratio X2/df will be outside of the "normally acceptable range", depending on whose guide you're reading.
This is where you look at the various fit indices as Sumit Mishra has advised.
  • asked a question related to Sample Size
Question
3 answers
I'm part of a team designing an observational study in which we're going to compare 2 models of intraocular lenses. The main outcome is contrast sensitivity.
I'm having some difficulty in understanding how we are going to handle this variable in statistical analysis and looking at the literature hasn't helped me much. It's also important because, being the main outcome, our sample size calculation depends on that.
Has anyone worked with something similar and can give me some pointers?
Relevant answer
Answer
Hi,
From the available literature, get the statistics comparing the two IOLs: statistics on the various measures of contrast sensitivity and the factors influencing them. Get hold of the comparative tests, standard deviations, standard errors, and confidence intervals, and find out whether ethnic or other factors are important in the study. If your study is mostly similar to those studies, put those values from the literature into the sample size formula and calculate the number per group for a given significance level and power. Then plan and execute the study.
Alternatively, as a part of the pilot project get the numbers and calculate the sample size.
There is sample size calculation software available on Google that you can use.
Here are some references on the topic under discussion:
Crnej A, Buehl W, Greslechner R, Hirnschall N, Findl O. Effect of an aspheric intraocular lens on the ocular wave-front adjusted for pupil size and capsulorhexis size. Acta Ophthalmol. 2014 Aug;92(5):e353-7. doi: 10.1111/aos.12344.
Ferrer-Blasco T, García-Lázaro S, Belda-Salmerón L, Albarrán-Diego C, Montés-Micó R. Intra-eye visual function comparison with and without a central hole contact lens-based system: potential applications to ICL design. J Refract Surg. 2013 Oct;29(10):702-7. doi: 10.3928/1081597X-20130919-03.
Hida WT, Motta AF, Kara-José Júnior N, Costa H, Tokunaga C, Cordeiro LN, Gemperli D, Nakano CT. Estudo comparativo do desempenho visual e análise de frente de onda entre as lentes intra-oculares multifocais difrativas Tecnis ZM900 e AcrySof ResTor SN60D3 [Comparison between OPD-Scan results and visual outcomes of Tecnis ZM900 and Restor SN60D3 diffractive multifocal intraocular lenses]. Arq Bras Oftalmol. 2008 Nov-Dec;71(6):788-92. Portuguese. doi: 10.1590/s0004-27492008000600004.
Avitabile T, Marano F. Multifocal intra-ocular lenses. Curr Opin Ophthalmol. 2001 Feb;12(1):12-16. doi: 10.1097/00055735-200102000-00004
  • asked a question related to Sample Size
Question
5 answers
We plan to use a Likert scale survey to explore the impact of an intervention on staff attitudes. How do calculate the sample size for paired responses (before and after implementation)? I assume we will use Wilcoxon signed rank test instead of paired T-test to analyse the difference as the data tend to be skewed. Please share any online calculators that I can use or information that I need to calculate the sample size, thanks!
Relevant answer
Answer
You can calculate it using a sample size table.
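One common shortcut for a Wilcoxon signed-rank sample size is to compute the paired t-test n and then inflate it by the test's asymptotic relative efficiency (ARE) — dividing by 0.955 (ARE under normality) or, most conservatively, by 0.864. A minimal Python sketch (the effect size below is illustrative, not from the thread):

```python
import math
from statistics import NormalDist

def paired_t_n(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation n for a paired t-test; d = mean difference / SD of differences."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return (za + zb) ** 2 / effect_size ** 2

def wilcoxon_n(effect_size, are=0.864, **kw):
    """Inflate the t-test n by the worst-case ARE of the Wilcoxon signed-rank test."""
    return math.ceil(paired_t_n(effect_size, **kw) / are)

print(wilcoxon_n(0.5))  # → 37 pairs for a medium standardized effect (d = 0.5)
```

For paired Likert-scale responses you would estimate the standardized effect size (or take it from a pilot) before plugging it in.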
  • asked a question related to Sample Size
Question
4 answers
Hydro chemical data analysis for the significance test.
Relevant answer
Answer
The short answer is no. Look up the paired t-test in any usable statistics book. D. Booth
  • asked a question related to Sample Size
Question
1 answer
I need help from the experts. My research will be conducted in health facilities that deal with tuberculosis patients; the study population is TB patients newly diagnosed in those facilities. I plan to conduct a quasi-experimental study to assess the effectiveness of an intervention in improving adherence to anti-tuberculosis medication, where the outcome variable is dichotomous (adherent/non-adherent) based on the calculated percentage of medication ingested. The intervention will be conducted over 8 weeks, with pre- and post-test evaluation. How do I calculate the sample size?
Relevant answer
Answer
You could compare difference in proportion adhering (%) before and after intervention. Then to detect a desired difference at 5% alpha significance with power 80% (say) use the standard normal approximation method given in any elementary statistics textbook.
An online calculator is given in the attached reference.
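As a sketch of that normal-approximation method, assuming independent groups (a strictly paired pre/post design would call for a McNemar-type formula instead), with illustrative proportions:

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """n per group to detect p1 vs p2 via the standard normal-approximation formula."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((za + zb) ** 2 * var / (p1 - p2) ** 2)

# e.g. adherence hypothetically improving from 60% to 80% after the intervention
print(n_two_proportions(0.60, 0.80))  # → 79 per group
```

The baseline proportion would come from the literature or a pilot, and the second proportion is the smallest improvement worth detecting.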
  • asked a question related to Sample Size
Question
3 answers
I have visualized the alteration of the cell wall during drought stress with the confocal microscope. Now, I want to measure the cell size at different time points. Is there anyone who knows how many cells I should measure?
I have studied Arabidopsis at 7 time points (days 0,1,2,4,7,15)
Relevant answer
Answer
Dear @Maryam Zekri The number and size of samples matter the most: taking larger and more numerous samples will greatly reduce bias in the statistics applied. Moreover, I do agree with the suggestions given by @J.D. Franco-Navarro.
Best wishes, AKC
  • asked a question related to Sample Size
Question
4 answers
Considering the cut-off, removing items with low loadings may be the best strategy when conducting the Exploratory Factor Analysis. But, is the "Scale purification” ( the process of eliminating items from multi-item scales) sensitive to the sample size and the sample characteristics?
Relevant answer
Answer
Hello Fatemeh,
1) Eliminating items solely based on their EFA loadings is generally dangerous, as it depends on the factor model being the appropriate data-generating process. If the process is different (e.g., a network structure or a simple aggregate), then the loadings simply don't mean anything. Second, it is not uncommon to have highly valid indicators that, however, measure their underlying latent variable in isolation (i.e., there are no other indicators measuring this latent variable). Such a latent factor cannot be identified by the EFA, and these items will be misallocated to the factor with which the true factor is most strongly correlated. Their loading on this factor will be much lower, as it is the product of the correlation between the two factors and the true loading. If this is not clear, I can provide an example and a simulation.
2) In small samples, any estimated parameter shows stronger variation. Hence, a good item may have a bad loading just due to sampling error.
3) The sample matters insofar as you have to have a sample of the population for which the measurement model holds. That is, as in any other moderation situation, only that this time a group/population characteristic moderates the effect of the latent variable on the item response (assuming the latent variable exists in both groups/populations, which does not have to be the case).
HTH
--Holger
  • asked a question related to Sample Size
Question
6 answers
Hi everybody,
here is my problem:
I have several regions which are represented by different isotope values and I would like to detect the variability between these regions and the values. Basically, the data looks like
a b c
0.788 0.759 0.797
0.786 0.756 0.798
...
I have performed the Bonett.Seier.test and it gives me a p-value for a,b. So, I assume this test can be used for any distribution and is not strictly tied to normal distribution, right?
Any suggestions what would be (more) useful?
thanks in advance!
Michael
_____
PS: here is the simply code:
#package
library(intervcomp)
#data
data <- read.csv("mydata.csv", header=TRUE)
#test
Bonett.Seier.test(data$a, data$b, "two.sided", 0.05)
# output:
$Statistic
[1] -1.676137
$p.value
[1] 0.09371146
$Estimate
[1] 2.212373
[1] 0.8741855
[1] 5.599036
Relevant answer
Answer
Your logic is flawed. Being "significantly different" only means that you have sufficient data to demonstrate that the whole of your data is not well explained by a model assuming equal variances. Failing to see a "significant difference" only means that you don't have enough data to make a conclusion. But you take this to conclude that this-or-that workflow is ok. This is not correct. This is the same misconception typically made when testing assumptions for other tests (like normal distribution or homoscedasticity for linear models) and interpreting a "non-significant result" as a confirmation that the assumptions are ok.
  • asked a question related to Sample Size
Question
4 answers
I'm planning to conduct a Pilot Feasibility Study of a digital health intervention for individuals with low back pain. However, I wonder how I have to calculate the sample size, as feasibility will be the primary outcome.
Any help will be really appreciated!
Relevant answer
Thank you for sharing this paper. It's really helpful and clear!
All the best!
  • asked a question related to Sample Size
Question
18 answers
I just read here that normality tests like the Kolmogorov-Smirnov test and Shapiro-Wilks test are basically useless: https://www.spss-tutorials.com/spss-shapiro-wilk-test-for-normality/
"Limited Usefulness of Normality Tests
The Shapiro-Wilk and Kolmogorov-Smirnov test both examine if a variable is normally distributed in some population. But why even bother? Well, that's because many statistical tests -including ANOVA, t-tests and regression- require the normality assumption: variables must be normally distributed in the population. However, the normality assumption is only needed for small sample sizes of -say- N ≤ 20 or so. For larger sample sizes, the sampling distribution of the mean is always normal, regardless how values are distributed in the population. This phenomenon is known as the central limit theorem. And the consequence is that many test results are unaffected by even severe violations of normality.
So if sample sizes are reasonable, normality tests are often pointless. Sadly, few statistics instructors seem to be aware of this and still bother students with such tests. And that's why I wrote this tutorial anyway.
Hey! But what if sample sizes are small, say N < 20 or so? Well, in that case, many tests do require normally distributed variables. However, normality tests typically have low power in small sample sizes. As a consequence, even substantial deviations from normality may not be statistically significant. So when you really need normality, normality tests are unlikely to detect that it's actually violated. Which renders them pretty useless."
Is this true? Thanks!
Relevant answer
Answer
On the question, I'll vote, yes, using tests of normality to check the assumptions of parametric analysis is not desirable.
1. For small samples, these tests aren't particularly powerful. And for large samples, they might be too powerful --- that is, they might find significant deviations from the normal distribution even if these deviations are small. This creates all kinds of havoc for people following this procedure.
1b. You can run a couple of simulations iteratively, in R or at https://rdrr.io/snippets/ .
The following generates a sample of only six observations from a log-normal distribution. Interestingly, even with this small sample, the Shapiro-Wilk test will report a significant (non-normal) result maybe 30% of the time. I don't know, I think that's pretty powerful for a sample size of six, but not a particularly reliable result.
A = rlnorm(6, 1, 1)
hist(A)
shapiro.test(A)
The following generates a sample of 1000 observations, that isn't too far off from a normal distribution. The Shapiro-Wilk test consistently reports a significant (non-normal) result. I probably wouldn't be worried about most of these distributions being meaningfully different from normal. So in this case the test probably isn't very useful.
B = rnorm(1000, 10, 4)
C = B^1.1
hist(C)
shapiro.test(C)
2. Using plots to assess a normal distribution is more useful. q-q plots and histograms (especially if a normal distribution is super-imposed) of residuals are useful. Looking at these plots might also make any outlier observations obvious, and other plots of residuals will make meaningful deviations from homoscedasticity apparent.
3. Ideally, you would have some idea about what (conditional) distribution your data would follow. Some things in the world are normally distributed. Others are log-normally distributed. If you have sense of the likely distribution of the data you might use e.g. Gamma regression or beta regression, or something appropriate for count data like negative binomial regression.
If you have some sense of the likely distribution of your data, the residuals for your specific data may be less important than the likely distribution of the conditional data or the likely distribution of the test statistic.
  • asked a question related to Sample Size
Question
7 answers
Hi,
For my master's thesis, I am running an OLS regression (sample size 75, with 4 independent variables). There seems to be a heteroskedasticity problem, and I tried to fix this with robust standard errors, but afterwards my F-statistic was no longer significant, whereas before it was.
Any tips to fix this?
Relevant answer
Answer
Hi,
Here is a reference for the resolution of Heteroskedasticity
This particular video might be helpful.
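Separately from that reference, a self-contained numpy sketch of HC1 (heteroskedasticity-robust) standard errors may help make the mechanics concrete; the simulated data below are purely illustrative:

```python
import numpy as np

def ols_hc1(X, y):
    """OLS point estimates with classical and HC1 (robust, sandwich) standard errors."""
    X = np.column_stack([np.ones(len(y)), X])         # add intercept
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se_classic = np.sqrt(sigma2 * np.diag(XtX_inv))
    meat = (X * (resid**2)[:, None]).T @ X            # sum of u_i^2 * x_i x_i'
    cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
    return beta, se_classic, np.sqrt(np.diag(cov_hc1))

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, size=75)
y = 1 + 0.5 * x + rng.normal(scale=0.3 * (1 + x), size=75)  # error variance grows with x
beta, se_c, se_r = ols_hc1(x, y)
print(se_c, se_r)  # robust SEs differ from classical ones under heteroskedasticity
```

Note that robust SEs re-estimate the sampling variability rather than "fixing" the data: if the F-statistic loses significance under robust SEs, that is arguably the more honest inference, not an error to be repaired.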
  • asked a question related to Sample Size
Question
5 answers
I have 28 parameters and therefore I figure that I'll need 280 participants at a minimum? Any thoughts appreciated. Thanks
Relevant answer
Answer
There is no fixed rule that applies to all situations. Also, your sample size may be large enough to obtain unbiased parameter estimates, but you may not have enough power to detect the effect sizes (paths) of interest (especially if the paths are small in size). The best way to study sample size requirements for path models and SEM/CFA is by running a Monte Carlo simulation through which you can study bias in fit statistics, parameter estimates, standard errors specifically for your model, target sample size, and expected effect sizes. Simulations also allow you to determine statistical power empirically.
I offer a free mini-workshop on sample size planning/power analysis with Mplus that uses path analysis as an example:
See also:
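Independent of those materials, the core logic of a Monte Carlo power study can be sketched for a single standardized path (a deliberately minimal stand-in for a full SEM simulation; the effect size and n below are illustrative):

```python
import numpy as np
from statistics import NormalDist

def simulated_power(n, beta=0.3, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo power for a single standardized path y = beta*x + e."""
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(scale=np.sqrt(1 - beta**2), size=n)
        r = np.corrcoef(x, y)[0, 1]
        z = np.arctanh(r) * np.sqrt(n - 3)   # Fisher z-test of the estimated path
        if abs(z) > crit:
            hits += 1
    return hits / reps

power = simulated_power(100)
print(power)  # roughly 0.85 for a standardized path of 0.3 with n = 100
```

The same loop structure generalizes to a full path model: simulate data from the hypothesized model, fit it each replication, and count how often the target parameter is significant.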
  • asked a question related to Sample Size
Question
9 answers
Please guide me with references on the minimum sample required to conduct a bibliometric analysis.
Relevant answer
Answer
It is obvious from basic statistical theory that larger samples lead to analytical outcomes that are likely to be more ‘accurate’ in the sense of providing a result that is closer to the true population mean. However, a sample of 1000 is actually acceptable.
  • asked a question related to Sample Size
Question
6 answers
Hi,
I'm running simulations to compute the required sample size for different pre-determined proportions (sensitivities) and different marginal errors (i.e., the half-width of the confidence interval). Pre-determined proportions range from 0.65 to 0.99, and marginal errors from 0.05 to 0.20.
For example, for a pre-determined proportion of 0.70, with a marginal error of 0.10, the required sample size will be: [1.96²x0.70x(1-0.70)]/0.10²=81
My issue is with high proportions and high marginal errors. For example, if I want to calculate the required sample size for a pre-determined proportion of 0.99 with a marginal error of 0.20, the lower confidence limit would be 0.79 and the upper one 1.19, which does not make sense (a proportion cannot exceed 1).
I think there is a specific formula for computing the required sample size in this situation, but I have not found it yet.
Do you have any ideas on how to do this?
Thank you for your advice.
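The worked example above can be checked in a couple of lines, which also makes the failure mode explicit:

```python
import math

def wald_n(p, margin, z=1.96):
    """Required n from the Wald formula n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(wald_n(0.70, 0.10))  # → 81, matching the worked example above

# For p = 0.99 with margin 0.20, the implied Wald interval is 0.99 +/- 0.20,
# i.e. (0.79, 1.19): the upper bound exceeds 1, which is exactly the problem described.
```

Interval methods that respect the [0, 1] range, such as the Wilson score or the exact Clopper-Pearson interval, are the usual basis for sample size formulas when the proportion is near 0 or 1.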
Relevant answer