Statistical Analysis - Science topic

Explore the latest questions and answers in Statistical Analysis, and find Statistical Analysis experts.
Questions related to Statistical Analysis
  • asked a question related to Statistical Analysis
Question
3 answers
Hello,
We are conducting a longitudinal study on resilience in children. We have taken initial measures using a single scale that results in a continuous value and will continue taking measurements each school year for the next five years. What would be the most appropriate statistical analysis for this study in which I have nominal variables (grade, subject ID, time point) and a single continuous variable (resilience score)?
Thank you!
Relevant answer
Answer
Probably Repeated Measures ANOVA, but I might not be understanding your design fully. The resilience score is the repeated measure, repeated over successive time points. If by 'grade' you mean the year in school, that would be your independent variable. (There is a lexical ambiguity between grade or year in school on the one hand and earned grades on the other.)
  • asked a question related to Statistical Analysis
Question
20 answers
Hi everyone,
We have implemented four metaheuristic algorithms to solve an optimization problem. Each algorithm is repeated 30 times for an instance of the problem, and we have stored the best objective function values for 30 independent runs for each algorithm.
We want to compare these four algorithms. Apart from maximum, minimum, average, and standard deviation, is there any statistical measure for comparison?
Alternatively, we have four independent samples each of size 30, and we want to test the null hypothesis that the means (or, medians) of these four samples are equal against an alternative hypothesis that they are not. What kind of statistical test should we perform?
Regards,
Soumen Atta
Relevant answer
Answer
In the following article, the performance of a new metaheuristic algorithm named QANA was compared with ten comparative algorithms using three statistical measures: the Wilcoxon signed-rank test, analysis of variance (ANOVA), and mean absolute error (MAE).
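For the four-samples-of-30 setup described in the question, one-way ANOVA tests equality of means, and the Kruskal-Wallis test is its rank-based alternative when normality is doubtful. A minimal sketch with SciPy, using made-up run results (the group means here are invented for illustration):

```python
import numpy as np
from scipy import stats

# Made-up best-objective values from 30 independent runs of four algorithms
rng = np.random.default_rng(42)
runs = [rng.normal(loc=mu, scale=1.0, size=30) for mu in (10.0, 10.2, 11.5, 10.1)]

# One-way ANOVA: tests equality of the four means (assumes normal errors)
f_stat, p_anova = stats.f_oneway(*runs)
# Kruskal-Wallis: rank-based test of equality of the four distributions/medians
h_stat, p_kw = stats.kruskal(*runs)
print(p_anova, p_kw)
```

If either test rejects, pairwise post-hoc comparisons (e.g., rank-sum tests with a multiplicity correction) can identify which algorithms differ.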
  • asked a question related to Statistical Analysis
Question
4 answers
Hello,
I am currently writing my dissertation and have a few questions regarding reporting my results. I am not very confident in statistics, so please bear with me if this doesn't make any sense.
I am using a two-sample Wilcoxon test for my statistical analysis, which I am doing in R. I am struggling with how to report the results, as I am only given a W and a p-value. However, I have seen that most people report their results with additional U and Z values. Is it OK to report only the W and p-value?
Additionally, I have read conflicting advice on whether to report the mean or median values! Could someone please help? My data doesn't follow a similar distribution pattern, so I am unsure if reporting medians would be viable. However, I have been told that I shouldn't report means and standard deviations on non-parametric tests. I am a little confused about what the best thing is to do.
Please can someone help!
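For what it's worth, R's two-sample `wilcox.test` statistic W is the same quantity usually called the Mann-Whitney U, and a Z value can be recovered from the normal approximation. A sketch with SciPy on made-up data (the tie correction is omitted here for simplicity):

```python
import numpy as np
from scipy import stats

x = [3.1, 4.5, 2.8, 5.0, 3.9, 4.2]   # made-up group 1
y = [5.5, 6.1, 4.9, 6.3, 5.8, 5.2]   # made-up group 2

res = stats.mannwhitneyu(x, y, alternative="two-sided")
U = res.statistic                     # R's two-sample "W" is this same U statistic
n1, n2 = len(x), len(y)
# Normal approximation without tie correction: Z = (U - mean_U) / sd_U
mean_U = n1 * n2 / 2
sd_U = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (U - mean_U) / sd_U
print(U, z, res.pvalue)
```

Since a rank-based test does not compare means, reporting medians alongside the test statistic and p-value is the usual choice.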
  • asked a question related to Statistical Analysis
Question
7 answers
The construct was measured via different scales in order to use age-appropriate instruments.
Relevant answer
Answer
Hi Kate! I've just come up against this problem. Best practice may have changed since 2013 but I'm interested in hearing what approach you took.
  • asked a question related to Statistical Analysis
Question
6 answers
In short, my MSc thesis involves providing recommendations to industry and these recommendations have been put to a series of experts in the form of a survey in order to validate the recommendations.
For each recommendation, four questions are asked relating to:
1. Whether the expert would advise clients to implement the recommendation
2. Whether businesses can generally afford to implement the recommendation
3. Whether businesses would understand the recommendation
4. Whether businesses could implement the recommendation without external support
Each of the four questions uses a 5-point Likert scale ranging from very unlikely to very likely.
I have never undertaken statistical analysis before and have historically been more of a words person than a numbers person. I'm intending to use SPSS for data analysis but if anyone were able to point me in the right direction of appropriate tests to run I would be exceptionally grateful.
  • asked a question related to Statistical Analysis
Question
6 answers
I have detected microsatellite SSR markers in patients with Plasmodium.
For the statistical analysis, I have used GenAlEx 6.5.
I still need to confirm the results with another software package.
Can anyone help me with this point?
Relevant answer
Answer
Arlequin and GenePop are both good software packages.
  • asked a question related to Statistical Analysis
Question
14 answers
I have 2 groups that I tested before and after the experiment, comparing HR, BP, and other anthropometric measures before and after in the control and experiment groups. I used the Wilcoxon signed-rank test to compare before and after results within each group. Which method is best for comparing the control and experiment groups before and after? The number of subjects is between 5 and 10 in each group.
Relevant answer
Answer
Rolando Gonzales , good point to mention Bayesian analysis. This requires and uses even more assumptions (or prior "knowledge") than the "standard" (parametric) analyses, and the information provided by a small sample adds only a bit to the amount of information that is coded in the model and in the prior. I just want to note that Bayesian inference using small samples is only as good as the prior information that is used. I mean to say: it pays off only if there is such information. Using "non-informative" or "flat" priors won't bring much of an advantage.
Thank you for mentioning JASP. This is a cool tool, not only for Bayesian analysis. I can add JAMOVI (https://www.jamovi.org/), which is quite similar to JASP (I am also not related to these software in any way, I just think these are good free alternatives to the proprietary and costly programs out there).
  • asked a question related to Statistical Analysis
Question
1 answer
I am conducting a study on multiple marginalized identities and mental health outcomes.
What would be the best data analysis to assess the interaction between different variables and the way they intersect with each other?
Thank you
Relevant answer
Answer
I think that a mixed-methods study which includes a qualitative analysis could be better for assessing that intersectionality. I did it in my master's research and found very positive results.
You can access it at this link:
  • asked a question related to Statistical Analysis
Question
4 answers
Hi
As you know, to use nonlinear regression in statistical software like SPSS or Minitab, or in code, you need to determine a start point (start value or initial guess) for the regression's parameters. In fact, you need to choose an optimal start value for the parameters to achieve the best nonlinear equation.
How can we determine the optimal start value?
Is there another way (other software) to use nonlinear regression regardless of the parameters' start values?
Relevant answer
Answer
No. But a numerical analyst would make plots and look for approximate values of the regression coefficients. As my old numerical analysis professor used to say, graphs tell you lots and lots of cool stuff.
David Booth
PS: If you had an optimal start, you wouldn't need anything else, would you?
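A sketch of the plot-then-guess workflow David describes, using SciPy's `curve_fit` on made-up data for an assumed exponential model: the starting values are read off the rough shape of the curve (intercept, plateau, timescale), not found by any optimality criterion.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed model: y = a * exp(-b * x) + c
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)
y = model(x, 2.5, 0.6, 1.0) + 0.05 * rng.normal(size=x.size)  # made-up data

# Starting values eyeballed from a plot:
# c ~ the plateau y settles at, a ~ y(0) - plateau, b ~ 1 / visible decay timescale
p0 = [2.0, 0.5, 1.0]
popt, pcov = curve_fit(model, x, y, p0=p0)
print(popt)
```

Rough starting values are usually enough; the optimizer refines them, and a plot of the fitted curve over the data confirms convergence to a sensible solution.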
  • asked a question related to Statistical Analysis
Question
7 answers
I feel like maybe this question is so easy that it's hard and I keep doubting every choice I make, so I thought I would just finally ask online. I have one dependent variable with three levels: the percentage of caches a bird makes in one of three different types of substrates (sand, gravel, or other). My independent variable has two levels (the conditions the birds are in whilst caching).
I pretty much know my pattern of results and I have figures drawn up and everything (I can post a small subset of the data if that would be helpful to people in answering the question), but I cannot, for the life of me, figure out if I have chosen the right test to analyse this data and I keep having doubts every time I make any progress.
I ran a similar study not too long ago that was almost identical, except the dependent variable only had two levels (gravel and sand). Therefore, I could just run a Friedman ANOVA (since the data violated normality assumptions) for the percentage of caches in one or the other substrate.
Right now the only solution I have been able to come up with is running multiple Friedman ANOVAs, but, as mentioned, I am really starting to doubt this. This is primarily because, unlike when I only had 2 levels in my DV, I cannot just run a test for one of the levels (e.g. in my first experiment, if 25% of the caches were in sand, you automatically knew the other 75% were in gravel; however, in this experiment, if 25% are in sand, that only means that 75% are in gravel or "other", and you do not know how it is divided between these two).
I know there are more advanced forms of analysis as well, like mixed models, but I've been specifically advised against doing those in this particular instance.
It's just that I keep writing and writing, then I get freaked out about my stats, so I do a bunch of research, then I try to re-calculate and re-write….and it's getting to the point where I can't do that anymore. So, I kind of need to know now if I need to change my method of analysis.
Could anyone give me advice?
Relevant answer
Answer
Good advice.
  • asked a question related to Statistical Analysis
Question
4 answers
Doing this reduces uncertainty on the flat part of the trend, closer to the present day, because the small values at the early part of the trend have the largest uncertainty, and this carries through to the present-day uncertainty.
Relevant answer
Answer
Maybe you can consider the recursive least squares algorithm (RLS) with a forgetting factor (RLS-FF). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamic application of LS to time series acquired in real time. In the RLS-FF algorithm, acquired data are weighted according to their age, with increased weight given to the most recent data.
― I have applied the RLS-FF algorithm to estimate the parameters of the KLa correlation, used to predict the O2 gas-liquid mass transfer, hence giving increased weight to the most recent data:
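A minimal self-contained sketch of the RLS-FF recursion described above, using the standard gain/covariance update (the data, forgetting factor, and initialization are made up for illustration):

```python
import numpy as np

def rls_ff(X, y, lam=0.98, delta=1000.0):
    """Recursive least squares with forgetting factor lam (0 < lam <= 1)."""
    n = X.shape[1]
    theta = np.zeros(n)
    P = delta * np.eye(n)                     # large initial covariance = vague prior
    for x, yi in zip(X, y):
        k = P @ x / (lam + x @ P @ x)         # gain vector
        theta = theta + k * (yi - x @ theta)  # correct estimate with the new point
        P = (P - np.outer(k, x @ P)) / lam    # discount old information by 1/lam
    return theta

# Made-up linear system: y = 2.0 - 1.5 * u + noise
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([2.0, -1.5]) + 0.1 * rng.normal(size=200)
theta = rls_ff(X, y)
print(theta)
```

With lam = 0.98 the effective memory is roughly 1/(1 − lam) = 50 samples, which is what lets the estimate track slowly drifting parameters.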
  • asked a question related to Statistical Analysis
Question
10 answers
I have flowering data for about 5,500 trees across 4 consecutive flowering events. I need more ideas on what statistical analysis I can do. Any suggestions?
  • asked a question related to Statistical Analysis
Question
3 answers
I am trying to analyse the Job Demands-Resources (JD-R) model with one mediating variable using the Hayes model.
However, I am not sure how I can analyse one big model with both demands AND resources (multiple independent variables) and outcomes (dependent variables). Motivation is the mediating variable.
Relevant answer
Answer
@Kristina Attached below is our paper based on the JD-R model with a mediation effect. I hope you find it helpful.
  • asked a question related to Statistical Analysis
Question
9 answers
I believe perspective research is all about publishing new ideas or arguments in terms of statistical analysis, optimizing data in innovative ways, or simply an opinion/perspective on existing research on an interesting topic.
I am looking for journals/editorials in materials science that would either accept our proposal or allow us to publish a perspective research article in their journal.
Relevant answer
Answer
Thank you, Prof. Madhukar Baburao Deshmukh
  • asked a question related to Statistical Analysis
Question
3 answers
For example, if I use a biochemical marker at admission as a predictor of long-term mortality, what statistical analysis would be best suited?
Relevant answer
Answer
The first two responses can be summarized by asking: what is your research question? Perhaps the attached would be helpful to you.
Best wishes, David Booth
  • asked a question related to Statistical Analysis
Question
4 answers
I have measured the RMSE for both groups for different dentures, but I have an issue with the statistical test that I should be using. Can I use a paired t-test? What test should I use to combine all RMSE values of all dentures in each group, and how do I compare the two models for statistical significance?
Relevant answer
Answer
Hi Eman
RMSE is the standard deviation of the errors; therefore, its square is the variance of the errors (MSE). You can divide the error MS of one model (i.e., its MSE) by the error MS of another model and compare the resulting F with the F table, based on the corresponding error degrees of freedom for both models.
That simple.
Best
Hani
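A sketch of the variance-ratio comparison Hani describes, with made-up per-denture RMSE values and assumed equal error degrees of freedom (replace `df1`/`df2` with your models' actual error DFs):

```python
import numpy as np
from scipy import stats

# Made-up RMSE values per denture for the two groups/models
rmse_a = np.array([0.42, 0.38, 0.45, 0.40, 0.44])
rmse_b = np.array([0.55, 0.58, 0.52, 0.60, 0.57])

mse_a, mse_b = np.mean(rmse_a**2), np.mean(rmse_b**2)  # RMSE^2 = error variance
F = max(mse_a, mse_b) / min(mse_a, mse_b)              # larger variance on top
df1 = df2 = len(rmse_a) - 1   # assumed error DFs; use the real model DFs instead
p = 2 * stats.f.sf(F, df1, df2)                        # two-sided p-value
print(F, p)
```

This is the classical variance-ratio (F) test, which assumes roughly normal errors; with the few values per group shown here it has little power, so treat it as illustrative.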
  • asked a question related to Statistical Analysis
Question
10 answers
I need to defend PCA against structural equation models...
Relevant answer
Answer
In essence, PCA "decomposes" the signals that are the most important for your work and allows easy "reconstruction" of an approximation of the full set of signals, or data.
After performing PCA on a matrix with MxN elements, if you select a subset of mxn components (m<M, n<N), you end up with the "most important" information in your matrix of signals. For example, if you have 1000x1000 data and, after decomposing it with PCA, you keep only the most important components, you will still be able to recognise the data (an audio file, for example), because the principal components are retained.
I hope that I explained the practical aspect of PCA.
In other words, PCA removes redundant information. It is also somewhat self-calibrating, because it "normalizes" the data so they can easily be compared; and you don't really need to compare all the data (say 1000x1000) if a small selection of principal components "describes" the data well.
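The keep-the-top-components idea above can be sketched with a truncated SVD; the low-rank "signal" here is made up so that a handful of components carry nearly all the information:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: a rank-5 signal plus a little noise
signal = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))
data = signal + 0.01 * rng.normal(size=(100, 80))

# PCA via SVD of the centered matrix; keep only the k strongest components
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k] + data.mean(axis=0)

# Relative reconstruction error: tiny, because 5 components carry the signal
err = np.linalg.norm(data - approx) / np.linalg.norm(data)
print(err)
```

The same reconstruction with k far below the true signal rank would degrade visibly, which is exactly the trade-off PCA makes explicit.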
  • asked a question related to Statistical Analysis
Question
5 answers
Hi
I have a dataset and I need to check the goodness of fit for Pearson type3 for them.
How can I do it?
Any software? Any MATLAB code?
Thanks
Relevant answer
Answer
Amirhossein Haghighat, the definition of the chi-squared statistic for a sample is the sum of (x-E(x))^2/s^2, where s^2 is the variance of the data. When the variable is Poisson-distributed, you are in the particular case where s^2 = E(x).
Good luck
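One option outside MATLAB: SciPy ships a Pearson type III distribution (`scipy.stats.pearson3`), so you can fit it by maximum likelihood and run a Kolmogorov-Smirnov test against the fitted parameters. A sketch on made-up data (note the KS p-value is optimistic when the parameters were estimated from the same data):

```python
from scipy import stats

# Made-up data drawn from a Pearson type III distribution
data = stats.pearson3.rvs(skew=0.8, loc=10, scale=2, size=300, random_state=7)

# Maximum-likelihood fit: returns (skew, loc, scale)
params = stats.pearson3.fit(data)
# Kolmogorov-Smirnov goodness-of-fit test against the fitted distribution
ks_stat, p = stats.kstest(data, "pearson3", args=params)
print(params, ks_stat, p)
```

A quantile-quantile plot of the data against the fitted distribution is a useful visual complement to the formal test.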
  • asked a question related to Statistical Analysis
Question
5 answers
I am new to statistical analysis. I came across the two terms correlation and causation. They seem similar, but they are not the same thing. Sometimes it is confusing to determine whether the relationship between two or more phenomena is causal or correlational. What are the ways to tell the difference?
For example: Do rainfall and flooding have correlation or causation? How can we evaluate?
Relevant answer
Answer
Establishing that two variables are correlated is relatively easy; however, it is much harder to establish causality. Causal discovery is not a new topic, but it is one that has been gaining traction in recent years (see here).
  • asked a question related to Statistical Analysis
Question
4 answers
I am doing research on the impact of social media on labour productivity, with LP broken into 3 categories. A questionnaire was administered to employees comprising a mix of open-ended questions as well as 5-point Likert-scale items.
But I am at a loss with which statistical analysis to use to analyse the responses obtained.
Relevant answer
Answer
logistic regression
  • asked a question related to Statistical Analysis
Question
11 answers
How do I know whether the data I collected follow a normal distribution or not in SPSS? And if they do not follow a normal distribution, how do I transform them so that parametric tests can be performed?
Relevant answer
Answer
Here is an easy and good video for learning whether collected data follow an approximately normal distribution in SPSS:
(Normality test using SPSS: How to check whether data are normally distributed)
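Outside SPSS, the same check is a one-liner. A Shapiro-Wilk sketch on made-up data, with a log transform shown as one common fix for right skew:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_like = rng.normal(50, 5, size=100)   # made-up, roughly normal sample
skewed = rng.exponential(5, size=100)       # made-up, strongly right-skewed sample

w1, p1 = stats.shapiro(normal_like)
w2, p2 = stats.shapiro(skewed)
print(p1, p2)   # p < 0.05 suggests a departure from normality

# A log transform often pulls right-skewed (positive) data closer to normality
w3, p3 = stats.shapiro(np.log(skewed))
```

Formal tests become oversensitive with large samples, so pairing them with a histogram or Q-Q plot is the usual advice.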
  • asked a question related to Statistical Analysis
Question
5 answers
Hi
I have 4 IV (categorical data) - age, gender, ethnicity, job position
and 1 DV (continuous data) - muscle strength
What is the most suitable statistical analysis? I want to find which factor (age, gender, ethnicity, job position) contributed the most to muscle strength.
Thanks!
Relevant answer
Answer
Timo Van Canegem is totally right; in any case, age can be used as a continuous variable. With respect to dummy binary variables (any categorical variable with a number p of categories can be dissected into p yes/no variables), they behave in the same way as continuous variables in regression modeling, as aptly demonstrated by Cox (see attached).
  • asked a question related to Statistical Analysis
Question
4 answers
I am trying to measure the level of awareness (obtained through Likert-scale and total scores categorized as Low, Moderate, High) of participants regarding a certain topic.
Relevant answer
Answer
Your original scale seems to provide the equivalent of an interval-level dependent variable, so you can use ordinary regression. If for some reason you do need to break your continuous dependent variable into three categories, then ordinal logistic regression is appropriate -- but note that going from a continuous variable to a three-category variable involves a considerable "loss of information."
  • asked a question related to Statistical Analysis
Question
10 answers
Hi Researchers,
I am doing a computer science dissertation on the topic ''Automated text tool to analyse reflective writing''.
The research question set is 'To what extent is the model valid for assessing reflective writing?' I just want to use the questionnaire (closed-ended questions and one open question) to validate the proposed model.
I have used a 5-point Likert scale for analysing the data, with the options strongly agree, agree, neutral, disagree, strongly disagree. The sample size is 10 participants. I chose my participants based on their experience, career, and knowledge of reflective writing.
1) Which statistical analysis tool shall I use to analyse 10 sample size to validate the model? Please show me step by step on how to analyse the data?
2) What would be the associated hypothesis?
3) Can I use Content Validity Index with 10 sample size participants on the questionnaires using 5 point likert scale?
4) this step on my research Is it qualitative method or quantitative method?why?
If you have any suggestion on my hypothesis, the sample size and the tool I need to analyse?
Thank you in advance !
Relevant answer
Answer
But I have a worry: how will you increase your population? Are you going to change your area of study? I have a similar case with a population of just 13 teachers; how do I test for the reliability of the questionnaires?
  • asked a question related to Statistical Analysis
Question
4 answers
I have hedonic sensory data generated using 2 samples and the same 30 panelists. Should I use a statistical method for independent samples (Mann-Whitney U) or for dependent samples (Wilcoxon test)?
Note: the data aren't normally distributed.
Thank you
Relevant answer
Answer
Your data is not independent. The assumption of independence can never be met in hedonic test (and sensory test in general) involving evaluation of multiple samples in the same session.
It is, in my view, much more important to think about this when preparing the experiment (e.g., use warm-up samples, randomize the serving order, allow sufficient recovery between samples, etc.). If the experiment is properly conducted, you will probably reach the same conclusion regardless of which of the two tests you use. You could probably even use a paired-sample t-test, which is sufficiently robust against non-normality.
Off topic, N= 30 is a very low sample size for hedonic data. It may also be the reason why the data are not normal.
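A sketch of the two paired options mentioned above, on made-up scores for 30 panelists who each rate both samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sample_a = rng.integers(4, 10, size=30).astype(float)   # made-up hedonic ratings
sample_b = np.clip(sample_a + rng.integers(-1, 3, size=30), 1, 9).astype(float)

# Dependent (paired) tests on the within-panelist differences:
w, p_wilcoxon = stats.wilcoxon(sample_a, sample_b)   # rank-based signed-rank test
t, p_ttest = stats.ttest_rel(sample_a, sample_b)     # paired t-test, fairly robust here
print(p_wilcoxon, p_ttest)
```

Both tests operate on the per-panelist differences, which is exactly what makes them the dependent-samples choice for this design.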
  • asked a question related to Statistical Analysis
Question
28 answers
A few articles using a qualitative method have used descriptive statistical analysis, and a statistics professor said that this is correct. My question is: if you use statistics in that research, won't it be a mixed study?
Relevant answer
To answer this valuable question, let me repeat what I have written in my recording sheet during my bachelor study:
  • While Quantitative data collection methods use mathematical calculations to produce numbers, Qualitative data collection methods concern with words and produce descriptions.
  • While Quantitative methods are more structured and allow for aggregation and generalization, Qualitative methods are more open and provide for depth and richness.
  • Quantitative and qualitative each has their strengths and weaknesses. Sometimes numbers are more useful; other times, narrative (qualitative data) are more useful. Oftentimes, a mix of quantitative and qualitative data provides the most useful information. However, arriving at the correct-manageable mixture of them is not an easy task.
Now, the following points can help you to choose between the two methodologies:
  • The purpose of your evaluation. The knowledge gap in an area that needs to be investigated and the goal you seek to achieve have a great effect in determining which methodologies is more convenient. The research problem itself identifies what is problematic about a given topic.
  • The respondents (i.e. audience): how can they be reached?
  • Resources available: The choice between them ultimately depends upon the available resources like time, money, and analytical tools.
  • The needed types of information.
Furthermore, there are other factors that have a great effect like cultural considerations.
  • asked a question related to Statistical Analysis
Question
6 answers
What does a standard deviation of 25 tell us?
Relevant answer
Answer
There are several indices that quantify how much the data deviate from the point they concentrate around. These indices take non-negative values: a value of zero means that all the data are equal, and the larger the index, the farther the data are from each other. Variance, standard deviation, range, difference of bounds, and difference of quartiles are among these criteria.
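A small numeric illustration of the point, on made-up samples with the same mean but different spreads:

```python
import numpy as np

rng = np.random.default_rng(6)
tight = rng.normal(100, 5, size=1000)    # SD ~5: values cluster near the mean
spread = rng.normal(100, 25, size=1000)  # SD ~25: values scatter far more widely

print(round(float(np.std(tight, ddof=1)), 1),
      round(float(np.std(spread, ddof=1)), 1))
# With roughly normal data, about 95% of values fall within mean +/- 2*SD,
# so an SD of 25 around a mean of 100 means typical values span about 50 to 150.
```

Whether an SD of 25 is "large" therefore depends entirely on the mean and the measurement scale.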
  • asked a question related to Statistical Analysis
Question
4 answers
I am conducting confirmatory factor analysis using lavaan in R with several both first- and second-order factors in the model (20 latent constructs and 97 items in total; n = 511; no missing values). After running the CFA, the model fit is not sufficient, but some low factor loadings, a couple Heywood cases and several indications from the modification index give some pointers for model improvement. Deleting some items from the model and re-structuring two factors based on an EFA eliminates Heywood cases and results in acceptable model fit (Chi-square=4117, DF=2646, CFI=.94, RMSEA=.038, SRMR=.052), however, the CFA of the adjusted model now returns the following error message: "covariance matrix of latent variables is not positive definite". There aren't any negative values in the covariance matrix though and also not in the correlation matrix.
I'm fairly new to this kind of statistical analysis and have no clue what causes this error and how to fix the issue. I'd appreciate any help. Please find attached the original and adjusted CFA measurement model as well as the covariance and correlation matrices of the adjusted model causing the NPD error.
Thanks a lot for your input!
Relevant answer
Answer
Sometimes a message like this occurs when several factors are highly (but not necessarily perfectly or > |1|) correlated. High correlations among factors can cause linear dependencies (e.g., one factor being perfectly predictable from a set of other factors) without there being negative variance estimates or correlation estimates > |1|.
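A tiny numeric illustration of that point: in the made-up factor correlation matrix below, every pairwise correlation is below |1| and no variance is negative, yet the third factor is nearly a linear combination of the other two, so the matrix has a negative eigenvalue and is not positive definite.

```python
import numpy as np

# Made-up latent-factor correlation matrix: all |r| < 1, but factor 3 is
# almost perfectly predictable from factors 1 and 2
R = np.array([
    [1.00, 0.60, 0.95],
    [0.60, 1.00, 0.90],
    [0.95, 0.90, 1.00],
])
eigvals = np.linalg.eigvalsh(R)
print(eigvals)   # the smallest eigenvalue is negative -> not positive definite
```

Checking the eigenvalues of the estimated factor correlation matrix (rather than scanning for negative entries) is thus the direct way to locate the offending dependency.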
  • asked a question related to Statistical Analysis
Question
4 answers
In RStudio, which statistical analysis should I do for moisture content in soil?
Please guide me.
Thanks in advance.
Relevant answer
Answer
Thanks for responding, Uchenna. Basically, I collected soil samples from two plots as well as soil nutrient data (pH, OM, TP, TN). Now I want to see how the soil nutrients influence the soil water content. That's my question.
  • asked a question related to Statistical Analysis
Question
4 answers
I want to make a stacked bar graph showing each data value as a dot. I tried with GraphPad Prism version 6 but could not succeed. Has anyone tried making such graphs before?
Relevant answer
Answer
I have been using Prism 8, which has this plotting feature. But I have heard that we can plot scattered dot plots using Prism 6 or higher. We can even do it with Prism 5, but it is manual and time-consuming.
  • asked a question related to Statistical Analysis
Question
6 answers
Since I am using a Likert scale for both the dependent and independent variables.
Relevant answer
Answer
SEM, CFA, EFA
  • asked a question related to Statistical Analysis
Question
4 answers
(My research is about the development of an FR intumescent coating by the addition of additives. My goal is a sample that is more thermally resistive than the control sample without neglecting time (aim: a lower average temperature, i.e. a better, negatively related, temperature-vs-time relationship). I performed a horizontal burner fire test on 8 different samples, recording the temperature every minute for 1 hour for each sample. What statistical tool/method should I use on the gathered data (time vs. temperature) to conclude that sample A is better than the control sample? I am not quite sure whether to use correlation, and I am not knowledgeable enough about other tests. Thank you.)
Relevant answer
Answer
Do you have several readings (Y) per sample over time (T)? For example, you put a sample in the fire and measure every minute over one hour, so you have about 60 readings per sample.
If so:
Is there some theory specifying the functional relationship between Y and T (e.g. should it be an exponential or a linear relationship)?
Plot Y against T for each sample and see what the relationship in your sample data might be, or whether it is in accordance with theory.
Fit the relationship per sample, using a regression model. The model should include some parameter of interest, e.g. a slope or a half-time. Collect the fitted values of this parameter.
You can finally test hypotheses about the difference or the ratio of this parameter depending on the additive. For instance, if the regression model is a simple linear regression and the relevant parameter is the slope of the regression line, you can use a t-test. If the regression model is exponential and the parameter of interest is the half-time, you can use a t-test on the logarithm of the half-time.
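The fit-per-sample-then-compare recipe above, sketched with an assumed simple linear model on made-up heating curves (4 samples per group, one reading per minute for an hour):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t = np.arange(60.0)   # minutes

def heating_curves(slope, n_samples):
    """Made-up temperature traces: linear rise plus noise."""
    return [30 + slope * t + rng.normal(0, 3, t.size) for _ in range(n_samples)]

control = heating_curves(4.0, 4)   # steeper heating: control coating
treated = heating_curves(3.2, 4)   # additive slows the temperature rise

# Steps 1-2: fit a regression per sample and collect the parameter of interest
slopes_control = [stats.linregress(t, y).slope for y in control]
slopes_treated = [stats.linregress(t, y).slope for y in treated]
# Step 3: test whether the additive changes the slope
t_stat, p = stats.ttest_ind(slopes_control, slopes_treated)
print(p)
```

The same pattern carries over to a nonlinear model: fit per sample, extract one parameter per sample, then compare the parameter across groups.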
  • asked a question related to Statistical Analysis
Question
7 answers
Hello!
So, here is the story. I was given this Likert-scale data for analysis, and I just can't get how I should deal with it. It is a 1-7 scale with answers ranging from 1 being "extremely worse" to 7 being "extremely better". But here is the problem: 4 is "same as it was before", and the questions introduce the changes as an effect of a different variable, which is work from home (for example, "Compared to work from the office, how much has your ability to plan your work so that it was done on time changed when working at home?").
The questions are separated into groups to form variables, and the mean should probably show each person's opinion on the change, right? But it just seems too strange to me to work with just 1 parameter and not go through a full comparison of now vs. before as 2 different constructs.
If you have any works or insight on the topic, can you please help me?
All the best and take care!
Relevant answer
Answer
I agree with David L Morgan; Likert-scored items are ordinal. Don't worry about 4 (same as before): it is the neutral option on the Likert scale. Based on the data distribution, you can use different statistical tests.
  • asked a question related to Statistical Analysis
Question
6 answers
Hello, thank you in advance for any answer!
A version of the instrument, the CTS-2 (Revised Conflict Tactics Scales), adapted to my country (Portugal), was used in the study I am working on, in order to find out the tactics people use to resolve conflict in an intimate relationship. It is comprised of 39 items, which are answered on a Likert scale ranging from 1 (one time in the previous year) to 8 (it never happened). Every item is answered twice: once to specify whether the individual has done what the item says (perpetration), and once to specify whether it was ever done to them (victimization), which makes a total of 78 items. It has five sub-scales (negotiation; physical abuse; psychological aggression; sexual coercion; injury) and each sub-scale has three levels (minor; severe; total), with the items divided across these levels and sub-scales (e.g. items 11 and 71 belong to the Injury sub-scale, minor level).
As specified by both the original author and the authors which adapted the instrument, to obtain the global prevalence of either victimization or perpetration, being that perpetration is the focus of my study, one must transform every item into a dichotomous variable (1 through 7 is 1, indicating it has happened and 8 is 0, indicating it never happened). This is where my question comes in. I have transformed every item into a dichotomous variable, but I am completely lost at to what the SPSS processes are to obtain the percentages of perpetration for the levels and sub-scales from this dichotomous items.
Thank you very much
Relevant answer
Answer
Your question is somewhat unclear. If possible, explain more to get better answers.
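For the recode step itself, here is a sketch with made-up data (the item positions and the any-item prevalence rule are illustrative assumptions; follow the scoring instructions' actual item lists):

```python
import numpy as np

# Made-up responses: 10 respondents x 4 perpetration items, codes 1-8
# (1-7 = happened at least once in the past year, 8 = never happened)
rng = np.random.default_rng(4)
items = rng.integers(1, 9, size=(10, 4))

binary = (items <= 7).astype(int)   # the described recode: 1-7 -> 1, 8 -> 0

# Prevalence for a (hypothetical) two-item subscale: a respondent counts as a
# case if ANY of its items occurred at least once
subscale = binary[:, :2]
prevalence = 100 * subscale.max(axis=1).mean()
print(prevalence)
```

In SPSS the equivalent steps would roughly be a RECODE (1 thru 7 = 1)(8 = 0), a COMPUTE taking the MAX across the subscale's recoded items, and FREQUENCIES on the result.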
  • asked a question related to Statistical Analysis
Question
3 answers
I'm part of a team designing an observational study in which we're going to compare 2 models of intraocular lenses. The main outcome is contrast sensitivity.
I'm having some difficulty in understanding how we are going to handle this variable in statistical analysis and looking at the literature hasn't helped me much. It's also important because, being the main outcome, our sample size calculation depends on that.
Has anyone worked with something similar and can give me some pointers?
Relevant answer
Answer
Hi,
From the available literature, get the statistics comparing the two IOLs: statistics on the various measures of contrast sensitivity and the factors influencing them. Get hold of the comparative tests, standard deviations, standard errors, and confidence intervals, and find out whether ethnicity or other factors are important in the study. If your study is mostly similar to those studies, put those values from the literature into the sample-size calculation formulae and calculate the number for each group for a given level of significance and power. Then plan and execute the study.
Alternatively, as a part of the pilot project get the numbers and calculate the sample size.
There is sample size calculation software available on Google that you can use.
Here are some references on the topic under discussion:
Crnej A, Buehl W, Greslechner R, Hirnschall N, Findl O. Effect of an aspheric intraocular lens on the ocular wave-front adjusted for pupil size and capsulorhexis size. Acta Ophthalmol. 2014 Aug;92(5):e353-7. doi: 10.1111/aos.12344.
Ferrer-Blasco T, García-Lázaro S, Belda-Salmerón L, Albarrán-Diego C, Montés-Micó R. Intra-eye visual function comparison with and without a central hole contact lens-based system: potential applications to ICL design. J Refract Surg. 2013 Oct;29(10):702-7. doi: 10.3928/1081597X-20130919-03.
Hida WT, Motta AF, Kara-José Júnior N, Costa H, Tokunaga C, Cordeiro LN, Gemperli D, Nakano CT. Estudo comparativo do desempenho visual e análise de frente de onda entre as lentes intra-oculares multifocais difrativas Tecnis ZM900 e AcrySof ResTor SN60D3 [Comparison between OPD-Scan results and visual outcomes of Tecnis ZM900 and Restor SN60D3 diffractive multifocal intraocular lenses]. Arq Bras Oftalmol. 2008 Nov-Dec;71(6):788-92. Portuguese. doi: 10.1590/s0004-27492008000600004.
Avitabile T, Marano F. Multifocal intra-ocular lenses. Curr Opin Ophthalmol. 2001 Feb;12(1):12-16. doi: 10.1097/00055735-200102000-00004
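As a sketch of the sample-size calculation described above, assuming a two-group comparison of contrast sensitivity with a hypothetical standardized effect size of d = 0.5 taken from prior IOL studies (the actual value must come from your literature review), statsmodels can solve for the per-group n:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical inputs: effect size d = 0.5 (from prior studies),
# 5% two-sided alpha, 80% power, equal group sizes.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   ratio=1.0, alternative='two-sided')
print(n_per_group)  # round up to get the per-group sample size
```

Plugging in the effect size, alpha, and power from your own pilot data or literature values would give the n for each IOL group.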
  • asked a question related to Statistical Analysis
Question
5 answers
What feature analysis techniques or new approaches can be applied to analyzing cardiac ultrasound images for detection of a defect?
Relevant answer
Answer
Some suggestions; perhaps expanding the reach of the question to more related areas might bring more answers:
1. Comparison with certified ground-truth images of healthy and defective hearts.
2. Building on 1, identify measures that distinguish defects by comparison; one possibility is the Euclidean distance measure (EDM).
3. Shape, texture, or higher-order statistical moments.
4. The Fiedler vector for spectral partitioning using SVD.
Cheers
  • asked a question related to Statistical Analysis
Question
6 answers
Good day, everyone!
I'm doing research on Personality Traits, Reading Habit, and Writing Achievement. My data were collected using Likert-scale questionnaires (for personality traits and reading habit) and a writing test. My aim is to find the correlations between personality traits and reading habit, personality traits and writing, and reading habit and writing.
Does anyone know what kind of statistical analysis I should use? Thank you, have a nice day!
Relevant answer
Answer
Siti Khairunnisa, for people to be able to help you effectively, I think you need to provide information about how you were measuring personality and reading habits. For example, did you use multi-item scales for each of those variables, and, if so, were there subscales on those scales?
Also, what did scores on the writing test look like? Were they pass/fail, or percentages, or what?
  • asked a question related to Statistical Analysis
Question
5 answers
I am a Master's student and for my graduation I must perform some statistical analysis. I created the questionnaire myself: it begins (section 1) by asking about characteristics of the respondent without using the Likert scale (public/private; previous collaborations; number of partners...). The questionnaire is then divided into 3 further sections. The second section is divided into 9 themes, and each theme contains between 1 and 4 statements (some negative and some positive - actually, will this be problematic?) to which respondents answer using the Likert scale, saying whether they experienced this "theme" during collaboration. In the third section, they are asked about their satisfaction in relation to the nine themes (1 statement per theme) as a measure of effectiveness. Finally, in the fourth section, which is very small, I ask whether the benefits outweigh the drawbacks and whether they see the collaboration as helping them achieve the sustainable development goals (SDGs), as a further measure of effectiveness.
1. How is it possible for me to look at whether their characteristics (independent variables, section 1) affect their perception of these 9 themes (section 2)?
2. How can I see if their perception of these nine themes (section 2) is correlated with their satisfaction (per theme, section 3)? And whether their separate characteristics (section 1) are correlated with their satisfaction (section 3)?
3. How can I see if their satisfaction (section 3) is correlated with their perception of whether the collab is helping achieve SDGs (section 4) and their overall thoughts on benefits/drawbacks (still section 4). I.e. if effectiveness can be measured by satisfaction, as well as via the perception of whether the collab helps achieve SDGs
Relevant answer
Answer
Lots here. Let's start with the first one.
"1. How is it possible for me to look at whether their characteristics (independent variables, section 1) affect their perception of these 9 themes (section 2)?"
First, you have phrased this as a causal question: how one set affects another. Without further information about the study it is difficult to know if/how you can address this. But if you want to know how one set of variables is associated with another, this is possible, depending on what exactly you mean. Are you interested in how each individual item in set A is associated with each individual item in set B, or are you interested in, for example, the highest correlation between some summary of the items in set A and some summary of the items in set B? Which of these you are interested in will lead you down a different analytic path.
I won't look at the other questions you have yet. Worth taking steps one at a time!
  • asked a question related to Statistical Analysis
Question
5 answers
Hi esteemed colleagues,
I am seeking assistance on computing regions of significance following a significant interaction. My model is as follows: Condition (Three levels - predictor), word count (predictor), WC*Condition (interaction), and depression ratings (outcome), and T1 depression ratings (covariate).
As noted above, the interaction is significant and now I would like to determine at what regions of word count do conditions significantly differ. For instance, the treatment only differs from controls when participants write > 2,000 words... that kind of thing.
I was thinking of using the Johnson-Neyman Technique, but I'm unsure how I would compare conditions with that approach, as the technique typically evaluates simple slopes of continuous variables... though maybe that is inaccurate. I'm most comfortable with SPSS, though am learning R as well.
Thanks in advance!
Relevant answer
Answer
Hello William A. Schraegle. When you say you are considering using the Johnson-Neyman technique, I imagine a graph like figure 2 here:
Y = the simple slope for one variable at a range of values of the other variable (i.e., the one shown on the X-axis). There is also a horizontal line at Y (the simple slope) = 0. As I understand it, most people identify a region of significance (for the simple slope) as the region on the X-axis for which the 95% confidence region around the simple slope excludes 0 (i.e., does not cross the horizontal line at Y = 0). In Fig. 2, for example, the region of significance for the simple slope is at X < 6.04 and X > 7.06.
For a couple reasons, I have often wondered if a better approach would be to determine the smallest simple slope of interest (or that would be considered practically important), and then ask if that minimally important simple slope is statistically significant. This, it seems to me, would mitigate some of the multiplicity issues implicit in the Johnson-Neyman technique; and it would serve as a good reminder of the important distinction between statistical significance and practical importance.
HTH.
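As a rough illustration of the Johnson-Neyman idea with just two conditions (not the three-level design in the question), the sketch below fits y = b0 + b1·g + b2·w + b3·g·w by ordinary least squares on simulated data and tests the condition difference b1 + b3·w at chosen word counts. All data, names, and settings are hypothetical; with three conditions you would repeat this for each pairwise dummy contrast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 400
w = rng.uniform(0, 3000, n)              # word count
g = rng.integers(0, 2, n)                # 0 = control, 1 = treatment
y = 0.001 * w * g + rng.normal(0, 1, n)  # condition effect grows with word count

# OLS fit of y = b0 + b1*g + b2*w + b3*g*w
X = np.column_stack([np.ones(n), g, w, g * w])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - X.shape[1]
sigma2 = resid @ resid / df
cov = sigma2 * np.linalg.inv(X.T @ X)    # coefficient covariance matrix

def significant_at(wc, alpha=0.05):
    """Test the condition difference b1 + b3*wc at word count wc."""
    slope = beta[1] + beta[3] * wc
    se = np.sqrt(cov[1, 1] + wc**2 * cov[3, 3] + 2 * wc * cov[1, 3])
    return abs(slope / se) > stats.t.ppf(1 - alpha / 2, df)

print(significant_at(3000))  # is the treatment-control gap significant at 3,000 words?
```

Scanning `significant_at` over a grid of word counts recovers the Johnson-Neyman region of significance for that contrast.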
  • asked a question related to Statistical Analysis
Question
8 answers
Hello,
My friend is seeking a collaborator for psychology-related statistics. Current projects include personality traits and their relations to other variables (e.g., age). You will be responsible for data analysis for potential publications. Preferably you should have some knowledge of statistics and be familiar with analysis software (e.g., MATLAB, R, SPSS). 10 hours a week is required. Leave your email address if interested.
Relevant answer
Answer
Psychological counselling data of patients can be analysed statistically - https://www.ncbi.nlm.nih.gov/books/NBK425420/
  • asked a question related to Statistical Analysis
Question
4 answers
I want to study the hypothesis that higher levels of all 3 IV would lead to higher levels of the 2 DV.
I also want to compare both levels of the 3 IV and 2 DV between multiple ethnic groups.
Is a three-way ANOVA all I need?
Relevant answer
Answer
Multivariate analysis of variance (MANOVA) is used to assess multiple dependent variables (DVs) concurrently. MANOVA is an extension of the analysis of variance (ANOVA), which is used for only one DV. The following could be a good read.
Frost, J. (2018, November 13). Multivariate ANOVA (MANOVA) benefits and when to use it. Statistics By Jim. https://statisticsbyjim.com/anova/multivariate-anova-manova-benefits-use/
Good luck,
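A minimal MANOVA sketch in Python with statsmodels, following the suggestion above; all data, variable names, and group labels below are made up for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 60
group = np.repeat(['a', 'b', 'c'], n // 3)       # e.g., three ethnic groups
shift = np.repeat([0.0, 0.5, 1.0], n // 3)       # simulated group differences
df = pd.DataFrame({
    'group': group,
    'dv1': shift + rng.normal(0, 1, n),          # first dependent variable
    'dv2': 0.5 * shift + rng.normal(0, 1, n),    # second dependent variable
})

# Test both DVs against the grouping factor simultaneously
mv = MANOVA.from_formula('dv1 + dv2 ~ group', data=df)
res = mv.mv_test()
print(res)  # Wilks' lambda, Pillai's trace, etc. for the group effect
```

With three continuous IVs as well, the formula would be extended on the right-hand side (e.g., `dv1 + dv2 ~ iv1 + iv2 + iv3 + group`).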
  • asked a question related to Statistical Analysis
Question
4 answers
Dear Colleagues ... Greetings
I would like to compare some robust regression methods based on the bootstrap technique. The comparison will use Monte-Carlo simulation (the regression coefficients are known), so I wonder how I can bootstrap the coefficient of determination and the MSE.
Thanks in advance
Huda
Relevant answer
Answer
Here you can follow these steps:
1. Draw a bootstrap sample (size m).
2. Estimate the coefficients you need from this sample using the robust method.
3. Find the MSE for the coefficients.
4. Repeat the above bootstrap process B times; averaging over the replicates gives the bootstrap estimate of the MSE of the coefficients.
5. This whole loop can be repeated N times in the Monte-Carlo simulation.
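The steps above can be sketched as follows. For brevity, plain OLS stands in where the robust estimator would go (a robust fit such as Huber regression would replace the `fit` function in step 2), and all simulation settings (true coefficients, n, B) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte-Carlo setup: known true coefficients (intercept 2, slope 3)
n, B = 100, 500
true_beta = np.array([2.0, 3.0])
x = rng.uniform(0, 10, n)
y = true_beta[0] + true_beta[1] * x + rng.normal(0, 1, n)

def fit(xs, ys):
    """Step 2: estimate coefficients (OLS standing in for the robust method)."""
    X = np.column_stack([np.ones(len(xs)), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return beta

# Steps 1-4: draw B bootstrap samples (resampling pairs), refit each time
boot_betas = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, n, n)              # resample with replacement
    boot_betas[b] = fit(x[idx], y[idx])

# Bootstrap MSE per coefficient, relative to the known true values
boot_mse = ((boot_betas - true_beta) ** 2).mean(axis=0)
print(boot_mse)
```

In the full Monte-Carlo study (step 5) this whole block would sit inside an outer loop over N simulated datasets, once per robust method being compared.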
  • asked a question related to Statistical Analysis
Question
6 answers
I am trying to understand whether there is a latitudinal cline in the degree of melanization of a frog, but I am not quite sure which statistical analysis would be best to perform.
Relevant answer
Answer
Selection of appropriate statistical method depends on the following three things: Aim and objective of the study, Type and distribution of the data used, and Nature of the observations (paired/unpaired). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6639881/
  • asked a question related to Statistical Analysis
Question
7 answers
I am looking for a biostatistician who is interested in collaborating with us on various research projects; they should have experience with meta-analysis and systematic reviews.
If you know someone, please recommend them to me. Thanks.
Relevant answer
Answer
Did you do a previous systematic review?
If I can help you with something, contact me.
Marlene
  • asked a question related to Statistical Analysis
Question
3 answers
I want to optimize the experimental conditions for a qualitative (pass / fail) type of test. Kindly suggest what type of
1. design of experiments can be used to optimize such tests
2. statistical analysis can be use to check validity of the test results
Examples/literature references in the replies will be highly appreciated.
Relevant answer
Answer
If you explain more, you will get better answers.
  • asked a question related to Statistical Analysis
Question
4 answers
I am conducting a program evaluation plan for an outpatient pulmonary rehab program and am having trouble determining the statistical analysis methods to use. The following process and outcome evaluation questions I have are:
  1. How many days, on average, does it take a participant to enroll in class following referral/evaluation?
  2. What was the average number of initial evaluations conducted per week over the last year?
  3. What was the maximum class size over the last year?
  4. What percentage of program participants attended the pulmonary rehab program over the last year?
  5. What percentage of program participants are readmitted to the hospital within three months of completing the program?
  6. In what ways do the participants feel they are benefiting from program participation based on feedback from the feedback surveys?
Any help would be appreciated in determining the appropriate statistical test to use as well as the phrasing of questions. Thank you!
Relevant answer
Answer
Hello Arriana,
Most of the questions, as posted, sound to me as if ordinary descriptive statistics (and not a hypothesis test) would apply:
1. Days to action is frequently a skewed distribution. I'd suggest the Tukey five-number summary (minimum, lower hinge/quartile, median, upper hinge/quartile, maximum).
2. Mean & SD would suffice, unless the distribution is odd (as might be the case if these were rare events), in which case, 5-no. summary again.
3. Just report the maximum value; it is what is is! However, some sense of the distribution would be helpful, so 5-no. summary or histogram.
4. Simple descriptive value of the computed proportion.
5. Ditto.
6. How to summarize this depends on the survey structure. It could be as simple as reporting percents of cases who chose option "X"; it could be as complex as thematic analysis of open-ended or semi-structured interview responses.
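The Tukey five-number summary suggested in point 1 can be computed directly with NumPy; the example data below are hypothetical days-to-enrollment values with the right-skew typical of days-to-action variables:

```python
import numpy as np

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    return np.percentile(values, [0, 25, 50, 75, 100])

days_to_enroll = [3, 4, 5, 5, 6, 8, 9, 14, 30]   # hypothetical, skewed
print(five_number_summary(days_to_enroll))
```

The same function covers questions 1-3 above; the percentages in questions 4-5 are just counts divided by the relevant denominator.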
If that still leaves you puzzled, I'd recommend trying to reach out to a local expert in evaluation/statistics to consult in deeper detail so that any features that you didn't mention in your query, but which may be germane to appropriate choice of analysis/reporting could be addressed.
Good luck with your work.
  • asked a question related to Statistical Analysis
Question
6 answers
Warm greetings to all.
Does anyone have an idea of what to do for the statistical analysis of my research? I have a sample of 135 patients; for each patient I have 8 categorical variables and 8 quantitative variables. The 135 patients are grouped into three groups (Class I, Class II, Class III) by a parameter called ANB, into three groups (short, average, and long face) by another parameter called FMA, and into two groups by gender (male and female). I ran normality tests on the 8 quantitative variables; some were statistically significant and others were not. My first question: how can we conduct a normality test for categorical variables, especially since I defined them as string variables? Do I need to recode them as numeric?
Second question: do I need to run different tests for the variables that were significant versus non-significant on the normality test, i.e., parametric versus non-parametric tests? Which should I choose according to the p-value?
And if I need to conduct a non-parametric test, how can I do it when my data consist of three groups?
I'm sorry for my English, and I hope anyone with a helpful source will post it here.
Thank you all.
Relevant answer
Dear Ahmad,
1) You cannot do any analysis with string-type data; you should assign numbers to the categories (simply add a new variable and code the categories numerically). As for normality, categorical data are by definition not normally distributed, so do not run a normality test on them.
2) Some papers run normality tests and some do not; if you do, simply choose a parametric test for the normally distributed variables and a non-parametric test for the others.
3) It depends on your research aims. For example, if you have three groups and want to look for a significant difference between their mean values, you should choose one-way ANOVA (parametric) or Kruskal-Wallis (non-parametric), because you have three groups.
If you can be a little clearer about your research framework, I'll be able to help you more efficiently.
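Point 3 can be sketched with SciPy on three simulated groups (all data hypothetical); the same call pattern applies to any three-group comparison of a quantitative variable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(0.0, 1, 30)   # e.g., Class I
g2 = rng.normal(1.0, 1, 30)   # e.g., Class II
g3 = rng.normal(2.0, 1, 30)   # e.g., Class III

# Parametric: one-way ANOVA across the three groups
f_stat, p_anova = stats.f_oneway(g1, g2, g3)

# Non-parametric alternative: Kruskal-Wallis H-test
h_stat, p_kw = stats.kruskal(g1, g2, g3)

print(p_anova, p_kw)
```

Whichever test is chosen, a significant result would then be followed by pairwise post-hoc comparisons.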
  • asked a question related to Statistical Analysis
Question
7 answers
Dear Experts,
For my research, I collected temperature values at 15 locations in an urban area (6 readings per point) and repeated the data collection 6 times. I calculated the mean value of each point at each time (15 mean values × 6 times), then the mean of means (across the 6 times), finally obtaining 15 values. I used these final values for statistical analyses such as the t-test, Pearson correlation, and regression. My results are good and meaningful for my objectives. Is it proper to report these results in a research paper? I am not confident about whether using mean values for statistical analysis instead of the original values is appropriate.
Thank you for your time and guide.
Relevant answer
Answer
Generally, no, it's not a good idea. You risk committing the ecological fallacy unless your goal is inference about those means rather than the underlying locations. In balanced cases the means might summarise the underlying locations, but not in unbalanced cases; they also ignore error variance and hence give misleading estimates of uncertainty, and of statistics such as correlations that depend on the uncertainty for scaling.
  • asked a question related to Statistical Analysis
Question
3 answers
Dear Experts,
It is stated that for 15 pairs of data, at the 5% significance level, R2 values should be higher than 0.2601.
How can I find the corresponding R2 value for 540 pairs, or for any other number of pairs?
Is this possible in SPSS?
Thank you
Relevant answer
Answer
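The quoted cut-off can be reproduced from the t distribution of the correlation coefficient: with df = n - 2, the critical r satisfies r² = t²/(t² + df), where t is the two-tailed critical value. A sketch in Python (assuming α = .05, two-tailed; the small difference from the quoted 0.2601 likely comes from tables that round the critical r to 0.51):

```python
from scipy import stats

def critical_r2(n_pairs, alpha=0.05):
    """Smallest R^2 that is significant for a given number of (x, y) pairs."""
    df = n_pairs - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit**2 / (t_crit**2 + df)

print(round(critical_r2(15), 4))   # close to the 0.2601 quoted for 15 pairs
print(round(critical_r2(540), 4))  # the corresponding value for 540 pairs
```

As expected, the critical R2 shrinks rapidly as the number of pairs grows, so for 540 pairs even a very small R2 is statistically significant.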
  • asked a question related to Statistical Analysis
Question
8 answers
I'm looking at the effect of a drug on lung function. What test should I perform for pre and post control vs pre and post treatment? The control group had measurements taken before and after receiving a placebo, and likewise for the treatment group, except they were administered the drug. Different subjects were used in the control and treatment groups.
Any help would be much appreciated.
Relevant answer
Answer
Hello Sandra,
It depends on how your DV (lung function) is quantified. If you can consider it as a continuous variable of interval strength or better, then it sounds to me as if a one-way (treatment vs. control), ancova would be a logical option. Pre-measures would be used as the covariate; post-measures as the DV.
Good luck with your work.
  • asked a question related to Statistical Analysis
Question
7 answers
I want to develop a relationship between induction hardening process parameters with case depth. Keeping all other variables constant such as quench delay and quench time, I want to analyze the effect of voltage percentage and heat time on case depth. Any recommendation of any statistical analysis will be highly appreciated.
Relevant answer
Answer
What are the units of measurement for the variables you use? The approach suggested by Alessandro Giuliani seems to be appropriate, but in order to obtain linear relationships you may need to apply data transformations. Most likely, voltage percentage should be transformed using logit-transformation and heat time and case depth using log-transformation, but some experiments are needed.
  • asked a question related to Statistical Analysis
Question
4 answers
I have two dependent variables: "Starting Reading Level" (measured before starting the program) and "Ending Reading Level" (measured after) - a pre-test and post-test scenario.
I also have one independent variable, "start date", with three groups: Cohort September 2019, Cohort February 2018, and Cohort March 2018.
I would like to know which test is best for comparing the mean starting and ending reading levels across the cohorts.
Relevant answer
Answer
I would use the ANCOVA, i.e. a regression:
post = pre + groups (in that order)
ANCOVA in R:
ANCOVA in SPSS:
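A minimal version of the post ~ pre + group regression in Python with statsmodels, on simulated data (all names, cohort labels, and gain values below are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 90
group = np.repeat(['sep2019', 'feb2018', 'mar2018'], n // 3)
gain = {'sep2019': 2.0, 'feb2018': 1.0, 'mar2018': 0.5}   # simulated cohort effects
pre = rng.normal(50, 10, n)
post = pre + np.array([gain[g] for g in group]) + rng.normal(0, 2, n)
df = pd.DataFrame({'pre': pre, 'post': post, 'group': group})

# ANCOVA as a regression: post adjusted for pre, with group as a factor
fit = smf.ols('post ~ pre + C(group)', data=df).fit()
print(fit.params)
```

The coefficients on the `C(group)` dummies are the adjusted cohort differences in ending reading level, holding starting level constant.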
  • asked a question related to Statistical Analysis
Question
10 answers
Hi
We have two time-series datasets, for example mean daily temperature from two stations. Each has 30 values for a month.
T1={ t1 , t2 ,... , t30 }
T2={ t1 , t2 ,... , t30 }
Now we want to calculate the similarity between them. Some people may suggest the correlation coefficient for this task, but I think we use that when we consider the relation between datasets, not their similarity.
Is there an index for measuring similarity?
Thanks
Relevant answer
Answer
Cross recurrence and joint recurrence are the most general metrics for correlating time series. The first is phase dependent, the other is not.
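Two simple complements to the correlation coefficient that do capture closeness of the actual values are the root-mean-square difference and the mean absolute difference; a sketch in NumPy (the series values are hypothetical):

```python
import numpy as np

def similarity_report(t1, t2):
    """Compare two equal-length series on co-variation and on closeness."""
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    return {
        'pearson_r': np.corrcoef(t1, t2)[0, 1],    # co-variation only
        'rmse': np.sqrt(np.mean((t1 - t2) ** 2)),  # closeness of actual values
        'mae': np.mean(np.abs(t1 - t2)),
    }

# Two series can correlate perfectly yet differ by a constant offset:
report = similarity_report([1, 2, 3, 4], [2, 3, 4, 5])
print(report)
```

Here the correlation is 1.0 even though the series are never equal, while RMSE and MAE expose the 1-degree offset; this is exactly the relation-vs-similarity distinction raised in the question.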
  • asked a question related to Statistical Analysis
Question
3 answers
Hi all. First of all, I'm sorry if this question isn't phrased right or posted correctly. I am quite new to research of this calibre and need help with the statistical analysis part of it.
Let me give a brief introduction to the research that I am conducting. Earlier research has shown that lean inventory management could have a beneficial impact on the financial performance of firms. Now I want to research whether firms that practise lean inventory management have been less affected by the pandemic than those that do not practise it.
My dataset, therefore, is the financial data of US manufacturing firms from 2016-2020, with the last being the year of focus.
I have made some simple graphs (as can be seen in the attachments: Industry Plot) of the aggregated Net Sales per year to show which firms have been negatively impacted by the pandemic.
Now I need to get to the bulk of the research, the actual statistical analysis and this is where I am absolutely lost. Firm performance will be measured in ROA or ROS and Inventory Leanness by the empirical leanness indicator (as proposed by Eroglu and Hofer in Lean, Leaner, Too lean? (2011)). Total assets and percentage growth per year will be used as control variables.
How do I go to conducting this research? What type of model is best to use for this?
Relevant answer
Answer
Besides the rate, you can analyze the net value (original data) too.
  • asked a question related to Statistical Analysis
Question
3 answers
Hi,
I have a dataset with 25 outcome variables (derived from a 25-item scale), and am hoping to do some ordinal regressions with it. To test the proportional odds assumption beforehand I need to create 3 dichotomous split categories for each outcome variable, however this split format will be uniform across each of the 25 outcome items. Is there a way to do this in SPSS so I can 'bulk' apply the cumulative split format to each of the outcome variables, so I don't have to do it manually 25 times over?
I'm not the most proficient at SPSS, so would really appreciate some advice on this, thank you.
Relevant answer
Answer
First, Bruce Weaver is a very knowledgeable SPSS programmer and I have no doubt that his procedure will work. However, 25 ordinal outcome variables seems like an awful lot to me. How many IVs do you have? Are you proposing a multivariate ordinal regression with all 25 DVs? If not, how are you parsing them among the IVs? This seems a giant data set for such a statistical procedure, and it feels overwhelming to me. I don't do this type of survey work myself, so I don't follow the details, and perhaps this is standard in your field, but I have never had 25 DVs in a data set before, and I wonder how all of this will go and whether it will provide interpretable results. Could you please explain and give a research question or two, so we can see what you intend with all these variables? For example, what size is your data matrix and the studied sample? If this is just my lack of experience, please forgive me. In any case, I wish you the best in your research. David Booth
  • asked a question related to Statistical Analysis
Question
11 answers
Dear colleagues,
From the screenshot, you can see my OLS estimations between institutional variables and oil-related predictor variables. My main hypothesis was that oil-related variables have a negative impact on institutional quality (according to resource curse theory); however, my estimations produced mixed results, giving both positive and negative coefficients. In this case, what should I do? How do I accept or reject the alternative hypothesis that I have already stated? Thank you beforehand.
Best
Ibrahim
Relevant answer
Answer
As you can see from the screenshot, I have mixed results, which is exactly what confuses me: some of my independent variables have a positive impact and some a negative one.
Best
Ibrahim
  • asked a question related to Statistical Analysis
Question
6 answers
Hi everyone! What statistical analysis method would you use if you, like me, had repeated an experiment 3 times? The experiment consists of a control and groups treated with drug A, B, or C, with 3 replicates per treatment group (3×control, 3×drug A, etc.). The experiment was repeated two more times, so in total I have the results of three similar experiments (same setup). Would you go for a multivariable ANOVA?
Thanks for the help!
Relevant answer
Answer
Hi,
As far as I understand, you have 4 groups (1 control + 3 treatments) to compare, with 3 observations for each?
Please let me know,
Greetings,
Teresa
  • asked a question related to Statistical Analysis
Question
2 answers
What methods are best used in the statistical analysis of ultrathin cell sections for transmission electron microscopy? I need to estimate the number of vesicles 50-100 nm in size in different samples. Unfortunately, other microscopic methods cannot be used here.
  • asked a question related to Statistical Analysis
Question
6 answers
1. What statistical analysis can we use for a low sample size (n=3): between 2 groups and between more than 2 groups
2. If we take two samples and run them in technical replicates of say 6, can my n be considered as n=12 (2 samples x 6 = 12)
3. What is the best method to calculate sample size, in case of rare samples
Relevant answer
Answer
Hello again Elizabeth,
If measuring the same cases on multiple occasions does not affect the response variable in any way (e.g., through fatigue, participant reluctance to "do that again," or some combinatorial effect of time period x events followed by time period y events), then you can use a repeated measures approach.
However, because the resulting responses are no longer independent (e.g., a participant who measures/scores high on the DV at time x is more likely to score high, comparatively, at time y), you'll need to use a RM paradigm for evaluating the relative power of your design. G*Power allows for this (https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower). Alternatively, you could run simulations to determine the relative power (https://webpower.psychstat.org/wiki/, or https://aaroncaldwell.us/SuperpowerBook/).
Good luck with your work.
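The simulation route mentioned above can be sketched in a few lines: simulate the planned design many times and count how often the test rejects. The sketch below uses a simple two-group comparison rather than a repeated-measures design, and all values (effect size d, per-group n) are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def simulated_power(n_per_group, effect_d, n_sims=2000, alpha=0.05):
    """Fraction of simulated two-group experiments whose t-test rejects H0."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0, 1, n_per_group)
        b = rng.normal(effect_d, 1, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

power_est = simulated_power(10, 1.0)  # analytic power here is about 0.56
print(power_est)
```

For a true repeated-measures power simulation, the data-generating step would draw correlated observations per subject (e.g., via a multivariate normal) and the test would be the corresponding RM analysis.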
  • asked a question related to Statistical Analysis
Question
8 answers
We are conducting a study on the correlation of COVID-19 pseudoscience theories with the proliferation of COVID-19 positive cases. Which statistical test should we use to correlate the data we will gather from the COVID-19 pseudoscience theories scale (4-point Likert scale) with the number of positive cases? (Data on COVID-19 positive cases will come from the health department's tally.) Thank you in advance to anyone who can help us. ☺️
  • asked a question related to Statistical Analysis
Question
4 answers
The plan is to create a 2x2 between subject research design
1. Separate the subjects into two groups based on the measurement of the independent variable (group identification: high vs low).
2. Take the same measurements of the 2 dependent variables (pretest) for both groups.
3. Within each group, subjects are randomly assigned to one of 2 different treatments (T1 vs T2).
4. After receiving the treatment, all subjects take the same measurements of the 2 dependent variables (posttest).
So I have 4 cells based on group identification (high vs low) × treatment (T1 vs T2).
Basically, what I want to know:
1. Is there a difference/change between pretest and posttest within each cell?
2. How do the scores on both dependent variables compare between cells?
I'm wondering how to do the statistical analysis. Thanks
Relevant answer
Answer
Hello Agrandaiz,
I agree with Jos Feys ' recommendation that ancova is a strong candidate for the analysis. I would modify the model, however, to: post ~ pre + group + treatment + group x treatment, given your description of the study design.
The other alternative is to treat pre & post scores as a third factor (a repeated measures factor), but in this approach the test of interest has to do with whether there is an interaction between group & measure (pre, post), treatment & measure, and/or group x treatment & measure (and not so much the other effects).
As Jos indicates, there are assumptions (including normality, homogeneity, independence of observations, homogeneity of regression) that should be reasonable for the data set in order to go this route.
Good luck with your work.
  • asked a question related to Statistical Analysis
Question
2 answers
The center offers various activities from crisis counseling to resume development. I am proposing a qualitative evaluation that uses unstructured interviews to investigate which of the services offered at the center the teens believe helped them obtain employment or make progress in school. Someone is thinking a much better methodology would be to use statistical analyses to see which activities correlate more strongly with the desired outcomes. Which would be the best choice of design?
Relevant answer
Answer
How can I monitor C4D in refugee camps?
  • asked a question related to Statistical Analysis
Question
6 answers
I will be running an experiment on a prohormone's effect on BMR.
I will be measuring BMR once a week for 6 weeks in a control group and a group receiving the drug. How should I most accurately go about running the statistical analysis? A t-test?
Relevant answer
Answer
This is a longitudinal analysis of a variable that is not normally distributed, and you will presumably have covariables to adjust for. I strongly suggest co-operating with a local statistician.
  • asked a question related to Statistical Analysis
Question
4 answers
Dear all,
I want to write a paper on statistical analysis of water contamination, so please share your comments on which software and which methods I can use for this topic.
Also, which parameters should be the main focus of such a study?
Relevant answer
Answer
It's not clear what you are trying to do. There is a big literature out there on things like MPN calculation, dealing with censored values above and below limits of detection etc etc.
What are you trying to do?
  • asked a question related to Statistical Analysis
Question
3 answers
I have collected data on the Hoolock gibbon population and on environmental variables in their respective habitat.
Relevant answer
Answer
I agree with David; more information on your research questions would be helpful. If you want to look at population changes depending on environmental factors, anything from MANOVA to general linear models would work, but again, narrowing down your question would help.
  • asked a question related to Statistical Analysis
Question
8 answers
I am doing statistical analysis on some existing data, but my significance result seems to be different from the existing ones. What could possibly be the reason?
Relevant answer
Answer
The difference may derive from the accuracy (rounding) of the calculations.
Since a t-test involves no complex calculations, at most a negligible difference may occur.
  • asked a question related to Statistical Analysis
Question
17 answers
Hi, everyone
In relation with the statistical power analysis, the relationship between effect size and sample size has crucial aspects, which bring me to a point that, I think, most of the time, this sample size decision makes me feel confusing. Let me ask something about it! I've been working on rodents, and as far as I know, a prior power analysis based on an effect size estimate is very useful in deciding of sample size. When it comes to experimental animal studies, providing the animal refinement is a must for researchers, therefore it would be highly anticipated for those researchers to reduce the number of animals for each group, just to a level which can give adequate precision for refraining from type-2 error. If effect size obtained from previous studies prior to your study, then it's much easier to estimate. However, most of the papers don't provide any useful information neither on means and standard deviations nor on effect sizes. Thus it makes it harder to make an estimate without a plot study. So, in my case, when taken into account the effect size which I've calculated using previous similar studies, sample size per group (4 groups, total sample size = 40 ) should be around 10 for statistical power (0.80). In this case, what do you suggest about the robustness of checking residuals or visual assessments using Q-Q plots or other approaches when the sample size is small (<10) ?
Kind regards,
Relevant answer
Answer
Some assorted comments:
Your sample size is 40, not 10.
Having a look at residual diagnostic plots is generally a good idea. They help to spot possible problems and may help to refine future experiments/analyses.
That authors do not publish the data is a considerable problem (particularly today, when any kind of supplementary material can and must be made available by the journals!). In my opinion, publishing research also means publishing the data, not just summary statistics. Not making the original data available hinders re-analysis, re-interpretation, and re-use (e.g. to build priors for informed models, etc.) and thus hinders scientific progress. See also https://en.wikipedia.org/wiki/FAIR_data. But science seems to be more and more about business and careers. Then, of course, publishing data is mostly something that shouldn't be done...
However, effect sizes are typically not known for new research. You can anyway only make a crude guess - as educated or informed as possible, though. More interesting is the expected variance (the noise). That is notoriously difficult to estimate (it requires large samples to get a good impression of the noise), but it should be comparable for comparable experimental setups/measurement principles, so there should be data available (which in practice often is not, as you recognized). A "pilot study" to find effect sizes and variances is not recommended. In Germany you don't get an ok for a pilot study that is performed to estimate anything. Pilot studies are feasibility studies, and there is a strict distinction from a "comparative study" that aims to estimate or test effects (usually by "comparing groups", hence the name).
If you base your guess of the effect size on published studies, then be careful: there is publication bias. Smaller effects are more likely to be statistically non-significant and go unpublished. You will see almost only the "outliers" - the larger effect sizes that were more likely to reach statistical significance.
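As a rough sanity check of such a sample-size guess, power can be estimated by simulation. This is a hedged sketch, not the poster's actual design: the effect size d = 1.3, n = 10 per group, and the critical t of 2.101 (two-tailed, alpha = .05, df = 18) are illustrative assumptions, not values from this thread.

```python
import random
import statistics

# Monte Carlo sketch of two-sample t-test power.
# Assumed illustration values (not from this thread): effect size d = 1.3,
# n = 10 per group, alpha = .05 two-tailed; 2.101 is the critical t for df = 18.
random.seed(1)

def pooled_t(x, y):
    """Equal-n pooled-variance t statistic."""
    n = len(x)
    sp2 = (statistics.variance(x) + statistics.variance(y)) / 2
    return (statistics.mean(x) - statistics.mean(y)) / (2 * sp2 / n) ** 0.5

reps, n, d = 2000, 10, 1.3
hits = 0
for _ in range(reps):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(d, 1) for _ in range(n)]
    if abs(pooled_t(treated, control)) > 2.101:  # reject H0
        hits += 1

power = hits / reps
print(round(power, 2))  # roughly 0.8 for these settings
```

Varying d in such a simulation quickly shows how sensitive the n = 10 decision is to the (publication-biased) effect-size guess.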
  • asked a question related to Statistical Analysis
Question
7 answers
I have the results of a survey that I have been asked to put on the same scale for statistical analysis. The problem is, they don't quite all fit on the same scale. For instance, some questions have the typical Likert-scale style of response like "Not much" to "A lot". Others have more of a Bipolar Likert-scale style like "It has increased" to "It has not changed" to "It has decreased". Others are Yes/No. I'm at a loss as to how to incorporate all of these on the same scale, or if it's even possible. It's been several years since I've taken a class in this, and I'm a little out of my element.
Relevant answer
Answer
Hello Alana,
Yes, composite scores can be made from sets of variables having different quantification. As David Eugene Booth has indicated, you may elect to put them on a common scale before combining. That at least allows for giving equal nominal weights.
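A minimal sketch of the common-scale step, with z-scoring as one option for equal nominal weights (all numbers invented for illustration):

```python
import statistics

# Sketch: put items measured on different scales onto a common (z-score)
# scale before combining. All numbers are invented for illustration.
likert_item = [1, 3, 4, 2, 5, 3]  # 1-5 "Not much" ... "A lot"
yes_no_item = [0, 1, 1, 0, 1, 0]  # yes/no coded 1/0

def zscores(xs):
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

# Composite: mean of the standardized items, i.e. equal nominal weights.
composite = [statistics.mean(pair)
             for pair in zip(zscores(likert_item), zscores(yes_no_item))]
print([round(c, 2) for c in composite])
```

The composite is itself centered at zero by construction, which makes respondents directly comparable across items regardless of each item's original scale.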
The bigger question I see in your query is, should the variables be combined in the first place? You might wish to consider looking at factor analysis of the survey items first, as a way of indicating which ones appear to affiliate empirically.
Good luck with your work.
  • asked a question related to Statistical Analysis
Question
4 answers
I'm working on my thesis right now about the relationship between sport heritage values and supporters' sense of place (of the city they live in). I had the idea that I should first measure the general sense of place using a Likert-scale system of several statements similar to those of Williams and Vaske (2003) and Jorgensen and Stedman (2001).
Then I had the idea to compose statements (also in likert scale format) regarding the sports club (the ''sense of the sports club'' as it were) that I based on sport heritage factors/values I explored in the theoretical framework.
So in short: I want to measure two different sets of likert scale data from the SAME respondent group and then I want to measure if there is a correlation between the sport heritage data outcome and the sense of place data outcome. As in for example: does a high 'sense of sportsclub' result in a higher sense of place compared to a lower sense of sportsclub?
Now come my questions: Is this a good idea? And if so, how should I measure it? Which statistical analysis should I use, or would it be better to make sport heritage statements with only yes/no answers? Or am I tackling it the wrong way? (Statistical analysis isn't my strong suit.)
Any advice is welcome!
Relevant answer
Answer
You could use item analysis with Cronbach's alpha for reliability on both 'sense of sport club' Likert type items and 'sense of place' Likert type items to define how to calculate 'test' scores for both, and then calculate the Pearson correlations between the 2 test scores.
see:
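A minimal sketch of both steps described above, with invented responses (the item values are illustrative, not from any real survey):

```python
import statistics

# Invented responses: 3 "sense of sports club" items for 5 respondents.
club_items = [[4, 5, 3, 4, 2],
              [5, 4, 3, 5, 2],
              [4, 4, 2, 5, 1]]

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    return k / (k - 1) * (1 - sum(statistics.variance(i) for i in items)
                          / statistics.variance(totals))

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

print(round(cronbach_alpha(club_items), 2))  # 0.94 for these invented data
```

If alpha is acceptable for both item sets, sum (or average) each respondent's items into the two test scores and feed those into `pearson`.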
  • asked a question related to Statistical Analysis
Question
5 answers
I am proposing a parallel-group RCT assessing the effect of diet on diabetes symptoms. The control group follows a keto diet, the intervention group an organic keto diet, and I'm measuring participants' HbA1c, waist-to-hip ratio and weight. What is the best statistical test to analyse the results, please?
Relevant answer
Answer
Several approaches are available: ANOVA or a t-test.
  • asked a question related to Statistical Analysis
Question
3 answers
Is it important to select between paired and unpaired tests on the basis of gender also?
Relevant answer
Answer
I agree with Jos. "Paired" means that you have meaningful, defined, unambiguous pairs of values. It does not matter by what criterion this pairing is done, but it must be unambiguous. As soon as you may choose which value "to pair" with another, then it is not paired, and a paired analysis is not adequate.
However, the data may have some correlation structure, meaning that there are subgroups of values that might be more similar to others in their subgroup than to values in other subgroups. The experimental factor you are interested in is obviously and hopefully one that defines such subgroups. But there might be other (uncontrolled) factors that could also be used to divide the data into groups that are systematically different. If you don't consider this, it may be the reason for some variance in the data that you cannot explain, making it more difficult to "see" the systematic difference between the groups defined by your experimental factor.
Example:
you are interested in the effect of a treatment, and you have data from a group of people being either treated or untreated (controls). You want the systematic difference between these group means, and you may test that for instance with a t-test.
Now consider you have additional knowledge that gender* has a huge impact on the variable you are analyzing, possibly as much as or even more than the treatment. If you have data from female and male people, their values will vary a lot, not only depending on the treatment (which you are interested in) but also depending on their gender. The variation introduced by the gender effect will result in a bad signal-to-noise ratio and larger p-values when testing the treatment effect.
This may be improved if you use a model that is able to consider both, the treatment and the gender effect. This would then be a two-way ANOVA model. And there can be an additional complication: it may be that the treatment effect itself differs between males and females. This is called an interaction of treatment and gender, and this can also be considered in such a model.
If the differences between genders are really not of interest, one may also choose a model that assumes that the gender means are themselves only random samples from a distribution with its own mean and variance - i.e., model the gender effect as a random effect in a mixed or hierarchical model. This is a bit weird for a factor like gender with levels male and female, but it may be more appropriate if gender is really coded as a social feature with levels like trans, two-spirit, cis, non-binary, fluid, neutral...
It is possible to account for several such covariables in one statistical model.
---
*Gender is a social feature (used as gender role, gender identity or gender expression), whereas sex is the biological property; which one is more appropriate to use depends on the field of study.
  • asked a question related to Statistical Analysis
Question
1 answer
Hello. I hope someone can help me with figuring out the latest approaches to doing a survey study. I have a couple of questions as following:
1. What would be a more advanced level of doing analysis for a survey study beyond descriptive statistical analysis and correlational analysis?
In particular, how would one decide between regression, factor analysis, decision tree analysis, and/or cluster analysis? Or something else?
2. What if the data set has many missing values? I have read threads with recommendations, but I lack the knowledge, and I hope to find the easiest way possible without losing N size. In fact, I'm tempted to use mean substitution despite its bad reputation.
Ultimately, if the data set can't be managed properly, I guess I should just report the descriptive analysis and the correlational findings...Would this still be publishable, if this is the new-kind topic? Any recommendation for books to read will be a great help. Alternately, if any gold-standard survey study written as an example could be useful too.
Any thoughts would be much appreciated.
Thank you so very much!
Relevant answer
Answer
1. What is your research question?
2. A book that might help you is 'Using Multivariate Statistics' by Tabachnick and Fidell.
  • asked a question related to Statistical Analysis
Question
6 answers
Hello all,
Although the total effect (.18) and the indirect effects through mediators are positive, we have a negative direct effect of X on Y. How can it be possible?
Note 1: The analysis was run via PROCESS model 4.
Note 2: All variables are continuous.
Relevant answer
Answer
It is certainly possible for a variable to have a negative direct effect X --> Y and at the same time a positive effect X --> M on one or more mediator variables that then in turn positively affect the outcome Y. However, in your case, the direct effect (.09) seems small (assuming you're showing standardized path/regression coefficients in the graph). Is the direct effect X --> Y statistically significant? If not, then this may simply indicate that the effect of X on Y is fully mediated (in that case, the direct effect may simply be zero in the population and negative in your sample only due to random sampling error).
Otherwise, you should ask yourself whether a negative direct effect X --> Y is substantively plausible in your study and whether all the other direct paths (X --> M and M --> Y) are also plausible in terms of their sign based on your expectations/theory.
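The decomposition is simple arithmetic: total effect = direct effect + sum of indirect effects. A toy check with invented (standardized) path values - though the resulting total happens to match the .18 mentioned in the question:

```python
# Toy path values (invented): the total effect is the direct effect plus the
# product of the indirect paths, so a negative direct effect can coexist with
# a positive total effect when a*b is positive and larger in magnitude.
a, b = 0.60, 0.45    # X -> M and M -> Y paths
direct = -0.09       # X -> Y direct effect
indirect = a * b     # 0.27
total = direct + indirect
print(round(indirect, 2), round(total, 2))  # 0.27 0.18
```

This pattern (indirect effect larger than, and opposite in sign to, the direct effect) is sometimes called inconsistent mediation or a suppression effect.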
  • asked a question related to Statistical Analysis
Question
5 answers
I once read if i have a single independent variable and two+ dependent variables, i should use multivariate analysis.
But then i read somewhere that multivariate analysis = inferential statistics (where the analysis results generalizing the whole population)
Is it possible to use statistical analysis that won't generalize the results?
Relevant answer
Answer
Yes, it is possible to use inferential statistics without generalising your results. This is most common in situations where data were collected from samples that weren't selected randomly. If samples were selected purposively, then generalisation is made to the respondents and not the population. The argument is that a biased sample may not contain a statistic (without 's') that reflects the population parameter; hence you don't generalise to the entire population in such situations.
  • asked a question related to Statistical Analysis
Question
4 answers
Following a previous question I posted about statistical analysis for a table (image included), I conducted a chi-square test - χ2 (6, N = 317) = 28.78, p < .0001.
As these results were for a 4x3 table, I conducted a post hoc test to determine where any relationship existed. However, I've not conducted a post hoc test before so am uncertain as to how to interpret their residuals (image included from R).
Any help would be greatly appreciated.
Relevant answer
Answer
See the article in the attachment and an R script with post hoc tests and a residuals plot (at the bottom of the script).
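For readers without the attachment, a minimal sketch (in Python, not the attached R script) of the quantity such a post hoc typically inspects, the adjusted standardized residual per cell, with invented counts:

```python
import math

# Adjusted standardized residual per cell: (obs - exp) / sqrt(exp * (1 - row
# proportion) * (1 - column proportion)). Cells with |residual| > ~2 (or a
# Bonferroni-adjusted cutoff) are the ones driving the overall association.
table = [[30, 10],
         [20, 40]]
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

residuals = []
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n
        adj = (obs - exp) / math.sqrt(exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
        residuals.append(round(adj, 2))
print(residuals)  # [4.08, -4.08, -4.08, 4.08]
```

In a 2x2 table all adjusted residuals have the same magnitude; in your 4x3 table they will differ, and the largest-magnitude cells indicate where the relationship lies.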
  • asked a question related to Statistical Analysis
Question
3 answers
Hi. I conducted a non-probability survey to collect information on retirement planning among workers about to retire and those who had already retired. I now want to compare the two populations' planning strategies. My variables are categorical and include aspects of financial planning, housing, living arrangements, health and so on. Please suggest a suitable method to do this.
Relevant answer
Answer
First, the use of non-probability sample means that any statistical analyses you do will be limited to your own sample and not generalizable to a larger population. So, you might characterize your results as "suggestive" or "exploratory."
With regard to statistics, the fact that you have a two-category comparison means that you can do t-tests whenever you are dealing with an interval variable or another two-category variable (which would be a t-test of proportions).
Alternatively, if all your other variables are categorical, then you can represent the data as cross-tabulations and use chi-square.
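A minimal sketch of the chi-square computation on such a cross-tabulation (counts invented for illustration):

```python
# Pearson chi-square for a 2x3 cross-tabulation of group (about to retire vs
# already retired) by one categorical planning variable. Counts are invented.
table = [[25, 15, 10],   # about to retire
         [10, 20, 20]]   # already retired
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# Sum of (observed - expected)^2 / expected over all cells.
chi2 = sum((obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
           for i, row in enumerate(table) for j, obs in enumerate(row))
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi2, 2), df)  # 10.48 2
```

The statistic is then compared against the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom; any statistics package will return the p-value.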
  • asked a question related to Statistical Analysis
Question
8 answers
I designed a factorial experiment involving 2 qualitative explanatory variables (A and B). Because I couldn't meet the assumptions of a parametric model, I used kruskal.test on the variable to be explained (VAR) for A and B, like: kruskal.test(VAR ~ A, data = data) and kruskal.test(VAR ~ B, data = data).
But, I was also interested in the effect of “A and B interaction” on VAR. So, does anybody know if it is right to perform a kruskal-Wallis test on interactions? Here, what I did it with R:
interAB<-interaction(data$A, data$B)
kruskal.test(VAR ~ interAB, data = data)
Moreover, in order to assess which levels of each variable are significantly different from each other, I used as a post-hoc test after kruskal.test: pairwise.wilcox.test(data$VAR, data$A, p.adjust.method = "holm", exact = FALSE, paired = FALSE). The pairwise test didn't work on the variable interAB, and I was wondering what method I should use as a post-hoc test for each variable A and B and for the interaction interAB.
Any Idea please?
Relevant answer
Answer
Obviously this is an older thread, but it popped up for me, so I thought I would add a few thoughts.
First, there are methods to conduct a nonparametric analysis for a factorial design. One is the Scheirer–Ray–Hare Test. Another is the aligned ranks anova.
Second, I probably wouldn't recommend pairwise Wilcoxon-Mann-Whitney tests as a post-hoc test. For Kruskal-Wallis, I like the Dunn (1964) test as a post-hoc. For other options, see the tests with functions beginning with "kw" at the following: https://cran.r-project.org/web/packages/PMCMRplus/PMCMRplus.pdf .
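For intuition, the Kruskal-Wallis H statistic on the combined A x B groups can be computed by hand; this tie-free toy example (shown in Python, with invented values) mirrors the interaction(data$A, data$B) approach from the question:

```python
# Kruskal-Wallis H (no tie correction) on groups formed by crossing the levels
# of A and B, mirroring interaction(data$A, data$B). Invented, tie-free values.
groups = {"a1.b1": [1, 2, 3],
          "a1.b2": [4, 5, 6],
          "a2.b1": [7, 8, 9]}

pooled = sorted(v for vals in groups.values() for v in vals)
rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks 1..N (values unique)
n = len(pooled)

# H = 12 / (N(N+1)) * sum over groups of (rank sum)^2 / group size - 3(N+1)
h = 12 / (n * (n + 1)) * sum(
    sum(rank[v] for v in vals) ** 2 / len(vals) for vals in groups.values()
) - 3 * (n + 1)
print(round(h, 2))  # 7.2
```

Note that testing the combined groups this way mixes main effects and interaction into one omnibus test, which is why the Scheirer-Ray-Hare or aligned-ranks approaches are preferable when the interaction itself is of interest.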
  • asked a question related to Statistical Analysis
Question
9 answers
Hello, I conducted a pilot study with a limited number of participants (n=6); I originally planned 20 but got fewer participants due to COVID. I was checking A1C before and after a telephone-based intervention (two-tailed t-test).
For this small sample should I even check the normality?
Also, I did not see any statistical significance, probably due to the small sample size. Should I use descriptive statistics only?
Relevant answer
Answer
Interested
  • asked a question related to Statistical Analysis
Question
10 answers
Is it necessary to test for normality and homoscedasticity of variances before comparison of means in statistical analysis? If yes, how can I do?
Relevant answer
Answer
It depends what you mean by test. Null hypothesis significance tests for checking normality or other assumptions are generally a bad idea. What matters is not whether the assumption is violated (it usually is, to some degree) but the degree to which it is violated. Looking at descriptive statistics (e.g., the variances or SDs of the groups to see if they are similar) or graphical methods (histograms, residual plots etc.) is a good idea.
To elaborate on the point about tests: with low statistical power, the test may not be able to detect violations of the assumptions that are large enough to matter. With high statistical power, the tests can detect minor problems that won't have a material impact on the performance of ANOVA. Last, but not least, hypothesis tests of assumptions also have assumptions. These are rarely checked, and in many cases it is likely that they are less robust than ANOVA itself.
So yes, check assumptions but don't test them (at least using significance tests).
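A minimal sketch of the "look, don't test" advice (invented data; the 2:1 SD ratio used here is only a rough conventional rule of thumb, not a hard threshold):

```python
import statistics

# Compare the group SDs directly instead of running a significance test on
# them. A rough, commonly cited rule of thumb is to worry only if the largest
# SD is more than about twice the smallest. All data invented.
groups = {"control": [5.1, 4.8, 5.5, 5.0, 4.9],
          "treated": [6.2, 5.9, 7.1, 6.5, 6.0]}
sds = {name: statistics.stdev(vals) for name, vals in groups.items()}
ratio = max(sds.values()) / min(sds.values())
print({k: round(v, 2) for k, v in sds.items()}, round(ratio, 2))
```

The same "describe and plot" logic applies to normality: a histogram or Q-Q plot of the residuals is more informative than a Shapiro-Wilk p-value.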
  • asked a question related to Statistical Analysis
Question
3 answers
If we have two unpaired data sets, one of which is normally distributed and the other not, which statistical test should be applied to analyze the data?
Relevant answer
Answer
Praveen Kumar K S, non-parametric methods, which, as the name suggests, do not rely on the (parametric) distribution of your data. See for example:
  • asked a question related to Statistical Analysis
Question
8 answers
I have a question about whether statistical analysis to conduct in SPSS.
I have a factor that includes two levels. Also, I have multiple dependent variables with repeated measures.
Which statistical analysis is more suitable? A one-way repeated measures ANOVA with the independent factor (between-subjects) and time (within-subjects) or something else?
Relevant answer
Answer
Olusegun Emmanuel Ogundele how do you come to this conclusion? As Bruce Weaver already pointed out, the decision depends on what the goal of the study is. How can you be sure what he "has to" use?
  • asked a question related to Statistical Analysis
Question
5 answers
Hello!
I have successfully developed and implemented ANFIS in R with the help of FRBS package. Just one thing that is remaining is to visualize the ANFIS network.
Currently due to some constraints because of COVID, I don't have any access to Matlab while working from home. So I was wondering if there is any way to implement it in R.
Relevant answer
Answer
Ritesh Pabari
No, I couldn't. I used Matlab instead for ANFIS. It was quite robust: you could view the architecture, rules, etc. quite easily, and you could also customize membership functions with ease.
  • asked a question related to Statistical Analysis
Question
4 answers
Hello,
I have a question regarding statistical analysis of the thickness of thermally evaporated organic thin films. I need to deposit a rather thick organic film (300 nm) through a slow thermal evaporation process. After a few repetitions, it becomes clear that there is always a mismatch between the target thickness and the measured thickness, and the deviation had a normal distribution. In order to estimate the standard deviation of the thickness and run capability analysis, I needed to have a large sample size of these deposited films. In the interest of saving time and material, I have done my repeated qualification runs at a smaller target thickness of 50 nm. While I now have the mean and standard deviation of the process with a target thickness of 50 nm, I'm wondering how to derive the corresponding data for a target thickness of 300 nm. I assume I can simply multiply the mean by 6. But how about standard deviation? Can I simply scale that linearly as well?
Thanks,
Pouyan
Relevant answer
Answer
Dear Motamedi: You cannot simply scale the result you obtained at 50 nm to get the expected result at 300 nm; you need a regression coefficient to estimate that value. If you are not well acquainted with this simple statistic, please check a statistics book or a text on design of experiments.
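One hedged way to frame the original question: whether the 50 nm SD can be scaled to 300 nm depends on the error model, which is an assumption to verify, not a given. If run-to-run error is proportional to thickness (constant coefficient of variation), the SD scales with the mean; if the error is additive (e.g. rate-monitor or shutter-timing noise), it does not. A sketch under the proportional-error assumption, with invented qualification data:

```python
import statistics

# Under an assumed proportional (constant relative) error model, the SD scales
# with the mean, so the coefficient of variation (CV) estimated from the 50 nm
# runs can be projected to a 300 nm target. If the error is additive, this
# projection does NOT hold. Invented 50 nm qualification data.
runs_50nm = [48.5, 51.2, 49.8, 50.6, 49.1, 51.8]
cv = statistics.stdev(runs_50nm) / statistics.mean(runs_50nm)

projected_sd_300 = cv * 300  # valid only if the proportional-error model holds
print(round(cv * 100, 1), round(projected_sd_300, 2))  # 2.5 7.55
```

A few confirmation runs at (or near) 300 nm would show which error model is closer to reality before committing to the projected SD in a capability analysis.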
  • asked a question related to Statistical Analysis
Question
3 answers
I am conducting a systematic review and metaanalysis of the prevalence of exocrine pancreatic insufficiency after pancreatic surgery.
I have used MetaXL software to calculate a pooled prevalence with 95% confidence interval assuming a random effects model.
Later, I have done a subgroup analysis, obtaining a pooled prevalence with 95%CI for each subgroup. Now I wish to compare those prevalences (more than two) to evaluate if the difference between the prevalence of each subgroup is statistically significant (p < 0.05), but I don't know how to do it.
The software MetaXL doesn't seem to have an option for comparing prevalences.
What is the statistical analysis I have to do? Can you recommend me any software? (I'm a student, so I'd appreciate if the software is open access).
Thank you.
Relevant answer
Answer
You have to do meta-regression to examine if the difference between the prevalence of each subgroup is statistically significant - see the link from the Cochrane handbook:
The Stata software allows one to do meta-regression easily, but unfortunately it is not free. R - completely free and open source - will also allow you to do so, but it is a little complex to understand. You can see the following links to read and understand how to do meta-regression using R:
  • asked a question related to Statistical Analysis
Question
5 answers
I'm doing a paper and need to find a negative or positive relationship between attitudes (5-option Likert scale) and level of education (5 options). Should I use chi-squared, Spearman's rho or Kendall's tau? Please help.
Relevant answer
Answer
The first thing to clarify is your research question.
  • asked a question related to Statistical Analysis
Question
7 answers
I am wanting to calculate the average trend in maximum annual NDVI in Iceland from 2010-2020 using MODIS MYD13Q1 V6. How would I do this?
I have currently inserted the NDVI bands from the MODIS (which I downloaded from the earth explorer website) into ArcMap for each year and used the cell statistics tool to calculate the maximum NDVI and used the raster calculator to remove any negative values, leaving me with maximum NDVI layers for 2010 through to 2020 but I do not know how to calculate the average trend for this period.
Please can someone advise me on how to do this and also any statistical analysis I could use - I am familiar with R so preferably using RStudio to do this statistical analysis, thanks!
Relevant answer
Answer
Yes, I agree with Lakshmi N. Kantakumar. You can perform your desired analysis (the average trend of maximum NDVI) with less effort, saving your valuable time. There are sample JavaScript codes available at the following link.
I hope this will give you an idea for your analysis.
Thanks.
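As a language-agnostic complement (sketched in Python with invented NDVI values; the same logic translates directly to R's lm() for use in RStudio), the per-pixel trend is simply an OLS slope of annual maximum NDVI on year, which can then be averaged across pixels:

```python
# Per-pixel OLS slope of annual maximum NDVI on year, averaged over pixels.
# Invented NDVI values for two pixels, 2010-2020.
years = list(range(2010, 2021))

def slope(y, x):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

pixel_series = [
    [0.60, 0.61, 0.63, 0.62, 0.64, 0.66, 0.65, 0.67, 0.69, 0.68, 0.70],
    [0.40, 0.41, 0.40, 0.42, 0.43, 0.42, 0.44, 0.45, 0.44, 0.46, 0.47],
]
slopes = [slope(series, years) for series in pixel_series]
mean_trend = sum(slopes) / len(slopes)
print(round(mean_trend, 4))  # NDVI units per year
```

In practice you would run the slope fit over every pixel of the stacked annual maximum rasters (e.g. with R's raster/terra packages or Google Earth Engine) rather than over lists.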
  • asked a question related to Statistical Analysis
Question
6 answers
Hi all,
I am working with a sample size of 749, and testing relationship between a continuous DV and multiple ordinal predictors. The predictors are education level ( 4 categories) and income level (4 categories). Additionally, I might use gender (nominal) as a predictor as well. In this scenario, which regression model would be appropriate?
Relevant answer