# SPSS - Science topic

Explore the latest questions and answers in SPSS, and find SPSS experts.
Questions related to SPSS
Question
I have a questionnaire whose questions measure one variable, in Likert-scale format. I want to check the convergent and discriminant validity of the questionnaire. How can I carry this out?
Would anybody please point me to a reference on acceptable coefficients for discriminant validity?
Question
research study on students mental health
You can use Pearson's correlation to find the correlation, and ANOVA to compare groups. You could also perform a paired t-test on the anxiety and depression questionnaire scores, grouped by respondents' age, to examine anxiety and depression among respondents with respect to their age.
Question
When one particular continuous independent variable is included, among others, R reports:
algorithm did not converge
fitted probabilities numerically 0 or 1 occurred
Doing the same in SPSS gives p-values of 0.99 or 1 for all variables and also for the constant.
But when the variable is removed, this gives reasonable-seeming p-values for all other variables included (categorical and continuous). The p-values and estimates are the same in R and SPSS in this case.
The relationship of this particular independent variable with the dependent variable is:
Accuracy=0.898
Sensitivity=0.769
Specificity=1
p-value<0.0001
It appears that 2 variables contain the same information. I'm attaching a recently accepted paper of ours that had a similar problem and tells how we dealt with it. Best wishes, David Booth
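The symptoms described ("algorithm did not converge", "fitted probabilities numerically 0 or 1") are the classic signature of (quasi-)complete separation. A minimal Python sketch of the underlying idea, with hypothetical data: if the predictor's ranges for the two outcome groups do not overlap, maximum-likelihood logistic regression is driven to infinite coefficients.

```python
# Illustrative check for complete separation of a binary outcome by a
# continuous predictor: if the predictor ranges for y=0 and y=1 do not
# overlap, logistic regression cannot converge and the fitted
# probabilities go to 0 or 1. Data below are hypothetical.

def separates(x, y):
    """Return True if x completely separates the two outcome groups."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)

x = [1.2, 1.9, 2.4, 3.1, 5.0, 5.8, 6.3, 7.7]
y = [0,   0,   0,   0,   1,   1,   1,   1]
print(separates(x, y))  # True -> expect "algorithm did not converge"
```

A quick check like this (or a simple scatter/box plot of the predictor by outcome) tells you whether the warning reflects separation rather than a software problem.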
Question
Hello,
I am currently doing an analysis in SPSS using Hayes' PROCESS model 21. I have one multicategorical moderator (W) and one continuous moderator (Z). Even though my output gives significant results for the moderation interactions X*W and M*Z, it does not show an index for the moderated moderated mediation.
Hayes himself states: “If your model has more than one moderator, an indirect effect may be a function of two moderators simultaneously, in which case no index is provided.”
But how can I then be sure that the moderated moderated mediation worked out if the index is not provided?
BR
Ann-Kathrin
Dear @Meyer Below is a link to a tutorial on moderated moderated mediation using model 21 in the PROCESS macro for SPSS. I hope it helps your cause.
Question
This is the final model.
Log (odds of discontinuing exclusive breastfeeding) = -4.259+0.850 superior support+0.802 sufficient duration to express breastmilk.
Thank you.
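As a sanity check on a fitted logit equation like the one above, the log-odds can be converted to predicted probabilities. A small Python sketch using the coefficients from the equation (the 0/1 predictor values are hypothetical):

```python
import math

# Convert the fitted logit model to predicted probabilities:
# p = 1 / (1 + exp(-log_odds)). Coefficients are from the equation
# above; the 0/1 predictor values are hypothetical.
def p_discontinue(superior_support, sufficient_duration):
    log_odds = -4.259 + 0.850 * superior_support + 0.802 * sufficient_duration
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(p_discontinue(0, 0), 4))  # baseline: log-odds = -4.259
print(round(p_discontinue(1, 1), 4))  # both predictors present
```

Each unit increase in a predictor multiplies the odds by exp(b), e.g. exp(0.850) for superior support.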
You may be interested in taking this free online course; it includes logit modelling
Modules
1. Using quantitative data in research (watch video introduction)
2. Introduction to quantitative data analysis (watch video introduction)
3. Multiple regression
4. Multilevel structures and classifications (watch video introduction)
5. Introduction to multilevel modelling
6. Regression models for binary responses
7. Multilevel models for binary responses
8. Multilevel modelling in practice: Research questions, data preparation and analysis
9. Single-level and multilevel models for ordinal responses
10. Single-level and multilevel models for nominal responses
11. Three-level multilevel models
12. Cross-classified multilevel models
13. Multiple membership multilevel models
14. Missing Data
15. Multilevel Modelling of Repeated Measures Data
Question
Hi everyone,
This is my first time attempting to run a survival analysis. I have data on patients with end-stage kidney disease (ESKD). I want to estimate survival probability by treatment method (e.g. conservative therapy vs dialysis), duration of diagnosis, etc.
Thank you
@dorcas Before you start your analysis, always make sure the Kaplan-Meier assumptions are met by your data.
Best
MB
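For intuition about what SPSS computes in Analyze > Survival > Kaplan-Meier, here is a minimal product-limit estimator written out in plain Python. The times and event flags are made up for illustration.

```python
# Minimal Kaplan-Meier sketch (illustration only; in SPSS use
# Analyze > Survival > Kaplan-Meier). Times/event flags are hypothetical.

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each event time. events: 1=event, 0=censored."""
    data = sorted(zip(times, events))
    n = len(data)
    s = 1.0
    curve = []
    at_risk = n
    i = 0
    while i < n:
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)   # events at time t
        m = sum(1 for tt, _ in data if tt == t)   # all leaving at time t
        if d > 0:
            s *= (1 - d / at_risk)                # product-limit step
            curve.append((t, s))
        at_risk -= m
        i += m
    return curve

times  = [2, 3, 3, 5, 7, 8, 8, 9]   # months of follow-up
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = death, 0 = censored
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Running this per treatment group and comparing the curves (a log-rank test in SPSS) is the usual next step.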
Question
HI all,
I've got two independent sample groups (firm ownership, a dummy) and a DV (pay level, continuous); the significance of the t-test is 0.055. I would like to improve the results by introducing controls and moderators into the model.
I wanted to ask how I can check for a possible moderation effect in SPSS (the moderators are size, performance, etc.) for those groups, and whether it would still make sense to try if they already showed no significance as control variables in an ANCOVA? Thank you
Question
I am working with data to establish norms for a 7-point Likert scale. I was advised to use z-norms, but when computing the standard scores in SPSS I got values of +2.15 and -1.96 (instead of ±3), and checking the data showed that they do not follow a normal distribution. In this case, how do I set the norms for interpretation?
More can be found by looking at David L. Morgan's work in the attached screenshot. Morgan can really be trusted here. Good luck, David Booth
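On the ±2.15/−1.96 point: z-scores only rescale the data, so the observed minimum and maximum z depend entirely on the sample's range, and standardizing does not make a non-normal distribution normal. A small stdlib-only Python sketch with hypothetical Likert scores:

```python
import statistics

# Standard (z) scores just rescale: z = (x - mean) / sd. The observed
# min/max z depend on the sample, so values like +2.15 / -1.96 instead
# of +/-3 are expected; standardizing does not normalize a non-normal
# distribution. Scores below are hypothetical 7-point Likert responses.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
m = statistics.mean(scores)
s = statistics.stdev(scores)
z = [(x - m) / s for x in scores]
print(round(min(z), 2), round(max(z), 2))
```

With markedly non-normal data, percentile-based norms are the usual alternative to z-norms.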
Question
My sample sizes are 41 and 12 respectively and are normally distributed, continuous data, and randomly selected. However the means for both sample sizes (I even did a combined sample of 41 and 12) are above the mean score that is being compared to. Both have a standard deviation of around 20. I am using SPSS. My data: administered a survey to two groups (two languages) and language one, 41 people replied and language two, 12 people replied. Thus, my first sample size is statistically significant and my second sample size is not.
I am comparing to a mean of 60 and the sample size of 41 yields a mean of 80 and the sample size of 12 yields a mean of 88. When running a one sample t test respectively on both sample sizes, my significance is < .05 which means H0 is rejected (means are the same in comparison to the compared value). Yet doing a two sample t test yields a significance that is > .05 which means H0 is accepted but this would not make sense since a two sample t test gives me a mean that is much higher than 60. Any advice on how to proceed with statistical analysis?
the two-tailed/independent-samples t-test in SPSS tells me, in the "equal variances assumed" row, that the significance is > .05. The row beneath it is "equal variances not assumed" and there is no value for F or significance. To my understanding, if Levene's test says significance > .05 I use the equal-variances-assumed row, and that same significance is telling me that it is not significantly different from the mean value of 60, since it is > .05. This still does not make sense. In this case, what am I concluding with respect to the mean value I am comparing my data to?
First thing: forget all decisions about statistical methods based on sample size as the first criterion. That is typically 1) a very old habit, dating from the pre-computer era, and 2) a very non-rigorous way to analyze data, because it does not deal at all with the really important points: what is the question you want to answer? How do you model your data? How do you use this model to answer the question? None of these points involves the sample size.
If you *know* (reasonably) that your data are (close enough to) normally distributed, you do not have to bother with sample sizes, especially with such arbitrary rules as « more than 30 ».
Note that comparing t-tests on each sample individually (« one sample ») with a two-sample t-test is meaningless; it shows that you did not really think about « what is your question », because the tests answer two very different questions. The one-sample t-test answers « is the theoretical mean of this sample different from the reference value? » (so it is not really surprising that the test is significant…), whereas the two-sample t-test answers « do the two samples have the same theoretical mean? » (which is probably your real question, and you can expect either a significant or a non-significant test; please keep in mind that « non-significant » does not mean « H0 is true/accepted », it only means « I cannot reject H0 », which is completely different).
So I would really advise contacting a professional statistician locally to deal with your data, or at least following a basic statistics course.
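The distinction between the two tests can be made concrete with a short sketch (simulated data; 60 is the reference value from the question, and the group means/SDs mimic those reported):

```python
import numpy as np
from scipy import stats

# The one-sample and two-sample t-tests answer different questions, as
# explained above. Data are simulated to mimic the question's numbers.
rng = np.random.default_rng(42)
group1 = rng.normal(80, 20, 41)   # n=41, mean near 80, sd near 20
group2 = rng.normal(88, 20, 12)   # n=12, mean near 88, sd near 20

# One-sample: "is the mean of this group different from 60?"
t1, p1 = stats.ttest_1samp(group1, popmean=60)
# Two-sample: "do the two groups have the same mean?"
t2, p2 = stats.ttest_ind(group1, group2, equal_var=True)

print(f"one-sample vs 60: p = {p1:.4g}")
print(f"two-sample:       p = {p2:.4g}")
```

Both results can be "correct" at once: each group can differ from 60 while the two groups do not demonstrably differ from each other.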
Question
Hello everyone,
I am running an ANOVA analysis including main effects and interaction effects. The output shows that SPSS excludes all main effects from the analysis. How come?
This suggests that you do not have enough information to estimate the excluded parameters; that is, they are aliased. Do a cross-tabulation of the predictors and see how many observations there are in each cell of the full tabulation.
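The suggested cross-tabulation is quick to sketch; here is a hypothetical example in Python (pandas) where two factors are perfectly confounded, so their main effects cannot be estimated separately:

```python
import pandas as pd

# Cross-tabulate the predictors, as suggested above: empty cells mean
# some effects are aliased and cannot all be estimated. In this
# hypothetical data, A and B carry exactly the same information.
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a1", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b1", "b2"],
})
tab = pd.crosstab(df["A"], df["B"])
print(tab)
# The a1/b2 and a2/b1 cells are empty (count 0): A and B are aliased.
```

In SPSS, Analyze > Descriptive Statistics > Crosstabs gives the same table; any empty cell in the full factorial tabulation is a candidate cause for dropped effects.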
Question
Hello all,
I've noticed that there is a lot of material on how to run GLMMs in R, yet not a lot for SPSS.
Can anyone shed any light or material that could be helpful?
I need help on running it, analysing and reporting on SPSS - this is for my MSc dissertation
Many thanks.
GLMMs can be quite complex. It would be easier for you to keep it as a linear mixed-effects model with minor transformations. I suggest considering an LMM with splines if your outcome is continuous.
Question
Respected members,
kindly assist me in writing and interpreting results of MODERATION (gender, age etc.) .
I used SPSS Split file option and there after used FREE STATISTICS CALCULATORS ( https://www.danielsoper.com/statcalc/default.aspx) to get T values.
Although I have the results but I do not know how to write and interpret for research paper/ thesis.
Kindly give your valuable suggestions on the same.
Thank you
Hello Antriksha,
You will need to elaborate your query in order to get constructive recommendations. It would be very helpful if you were to describe:
1. The variables involved and how they were measured/quantified;
2. The results of any tests that you ran with your data.
Question
I want to conduct a linear regression in SPSS with 4 IVs and 1 DV, but I also have 2 other variables besides the IVs that I want to use as adjustment variables: age and gender. Since I collected categorical data for age and gender, I created dummy variables for each category of age and did the same for gender. What are the steps in SPSS for conducting a linear regression and adding in these dummy variables?
@Baneen You're welcome.
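For what the dummy coding should look like before it goes into the regression, here is a small sketch in Python (pandas), with hypothetical categories. The key point, which applies equally in SPSS, is that one category per variable is dropped as the reference level:

```python
import pandas as pd

# Dummy coding sketch mirroring what you'd prepare for SPSS: one dummy
# per category, dropping a reference level so the design matrix is not
# collinear with the intercept. Categories below are hypothetical.
df = pd.DataFrame({
    "age_group": ["18-29", "30-49", "50+", "18-29", "50+"],
    "gender":    ["F", "M", "F", "M", "F"],
})
X = pd.get_dummies(df, columns=["age_group", "gender"], drop_first=True)
print(list(X.columns))  # reference levels "18-29" and "F" are dropped
```

In SPSS you would then enter these dummy columns, together with the 4 IVs, as independents in Analyze > Regression > Linear (optionally in a separate block to see the adjusted change in R²).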
Question
I have extended the UTAUT2 model and I have already developed my model hypotheses. I read about various methods/techniques used such as (Factor Analysis, Partial Least Square (PLS), Structure Equation Model (SEM), Regression Analysis) and tools such as (SPSS, Smart-PLS, Mplus, R, PLS-graph, AMOS). But I am not sure which one to use.
Please I would like some advice about recommended techniques and tools. I am really looking to find out what you consider to be the most efficient method and your rationale as cost and time are limited factors.
Many thanks
Smart-PLS does better than SPSS, based on my experience.
Question
I am working on abusive supervision as the X variable, job stress as the Y variable, work-family conflict as a mediator, and procedural justice as a moderator. In SPSS, regression analysis on my data set shows a significant negative effect of the moderator, and my hypothesis is acceptable. But when cross-validating with PROCESS it shows different results, and the moderation appears insignificant. I am attaching the results output of both SPSS and PROCESS for reference.
Different models, different results. Even if moderation and mediation are independent phenomena, the number of terms is different in each model. Hence, in the 'PROCESS' file your moderator is being adjusted for one additional variable. This is what I think...
Try to construct a multiple linear regression model with the 10 terms, and check if you can cross-validate it.
Question
I am trying to test the model in the figures below. That is, a linear relationship between X and Y, with two moderator variables (X and W) in which there is an influence of the joint effect of the two moderators together.
I could use the SPSS macro PROCESS, but I am more interested in developing a hierarchical regression in Stata by multiplying the different (previously centered) variables.
Has anyone worked with similar models before, or can recall a similar study I can use as a basis to develop the methodological part of this kind of moderation effect correctly?
You could refer to Hayes’s (2018) book, referenced below.
Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (2nd ed.). Guilford Press. https://www.guilford.com/books/Introduction-to-Mediation-Moderation-and-Conditional-Process-Analysis/Andrew-Hayes/9781462534654
Good luck,
Question
Dear
I need to convert three independent variables (V1, V2, V3) into categories of a new variable (Vn: Cat1 (V1); Cat2 (V2); Cat3 (V3)). I have studied the Count and Recode procedures, but they do not work. Could you please tell me whether it is possible to create the new three-category variable in SPSS?
Thank you
You appear to be saying that V1, V2 and V3 are dichotomous (Yes/No) variables, and that for each person, only one of them can be a Yes. Is that right? If so, how are they coded? E.g., 0=No, 1=Yes; or 1=Yes, 2=No? Thank you for clarifying.
Assuming you do have one of those two coding schemes, one of the following bits of code might help.
*** Start of syntax ***.
* The following code assumes 0=No, 1=Yes coding,
* that there is only 1 Yes per person, and that
* there is no missing data.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / V1 to V3 (3F1).
BEGIN DATA
1 0 0
0 1 0
0 0 1
END DATA.
COMPUTE religiosity = SUM(V1*1,V2*2,V3*3).
FORMATS religiosity(F1).
LIST.
* The following code assumes 1=Yes, 2=No coding,
* that there is only 1 Yes per person, and that
* there is no missing data.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / V1 to V3 (3F1).
BEGIN DATA
1 2 2
2 1 2
2 2 1
END DATA.
COMPUTE religiosity = SUM((V1=1)*1,(V2=1)*2,(V3=1)*3).
FORMATS religiosity(F1).
LIST.
*** End of syntax ***.
Question
Let us start discussing the uses of survival analysis, particularly in medicine and the health sciences.
To what extent is it helpful?
And its application using SPSS.
Question
Hi, in SPSS (v17) I am trying to create a before-after plot with lines for all cases, colored based on 3 categories. The data are organised as two rows per case (before and after, labeled 1 and 2), with each case labeled as belonging to one of 3 categories (labeled 1, 2 and 3). Using line plots I have managed to produce a before-after plot for all cases, but only with a separate color for each case. I can also create a before-after plot with 3 lines, each combining all cases per category. Using scatter plots, I can get something resembling a before-after plot, only without the connecting lines between the points, but with the separate cases colored by category 1, 2 or 3. However, as described above, I want a before-after plot with lines colored by the three categories. I feel this should not be hard to do, but I'm just not managing. Many thanks in advance.
Use the "Graphboard Template Chooser" in SPSS. There are many graph options according to your variables.
Question
I have a data set of particulate concentration (A) and corresponding emissions from cars (B), factories (C) and soil (D). I have 100 observations of A and corresponding B, C and D. Let's say no factors other than B, C and D contribute to the particulate concentration (A). Correlation analysis shows A has a linear relationship with B, an exponential relationship with C, and a logarithmic relationship with D. I want to know which factor contributes most to the concentration of A (the predominant factor). I also want to know whether a model can be built, like the following equation, from the data set I have:
A = m*B + n*exp(C) + p*log(D), where m, n and p are constants.
Maybe you can consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken in account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real-time. As with LS, there may be several correlation equations with the corresponding set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS-FF algorithm to estimate the parameters of the KLa correlation used to predict O2 gas-liquid mass transfer, hence giving increased weight to the most recent data. Estimates were improved by imposing sinusoidal disturbances on air flow and agitation speed (the manipulated variables). The power dissipated by agitation was measured by a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first-order delay. This investigation was reported in (MSc Thesis):
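One thing worth noting: given the stated relationships (A linear in B, exponential in C, logarithmic in D), a model of the form A = m·B + n·exp(C) + p·log(D) is linear in the parameters m, n, p, so ordinary least squares on transformed predictors is enough. A sketch with simulated data:

```python
import numpy as np

# A = m*B + n*exp(C) + p*log(D) is linear in m, n, p, so ordinary
# least squares on transformed predictors fits it directly.
# Data below are simulated for illustration (true values 2.0, 0.5, 3.0).
rng = np.random.default_rng(0)
B = rng.uniform(1, 10, 100)
C = rng.uniform(0, 2, 100)
D = rng.uniform(1, 5, 100)
A = 2.0 * B + 0.5 * np.exp(C) + 3.0 * np.log(D) + rng.normal(0, 0.1, 100)

# Transform the predictors, then solve the ordinary least-squares problem.
X = np.column_stack([B, np.exp(C), np.log(D)])
(m, n, p), *_ = np.linalg.lstsq(X, A, rcond=None)
print(round(m, 2), round(n, 2), round(p, 2))  # should be close to 2.0, 0.5, 3.0
```

Comparing standardized coefficients (or the drop in R² when each term is removed) then indicates which factor is predominant.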
Question
Hello, I'm looking to calculate the interquartile range (IQR) for responses on a 7-point Likert scale (sample size = 26). For some questions, I'm noticing a difference in the median and IQR when I calculate in Excel compared with SPSS. Any explanations? I think SPSS may be using a 95% CI and removing outliers. Is this possible?
Note: I checked a few sources and can confirm I calculated it correctly in Excel.
PS. If you upload the data as Bruce Weaver advises, and in R do
for (i in 1:9) print(IQR(x,type=i))
you'll see some of the varieties.
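The same point can be made without R: different packages use different quantile conventions, and that alone (not outlier removal) can change the quartiles. Excel's QUARTILE.INC corresponds to the "inclusive" method, while SPSS's default weighted-average percentile definition behaves like the "exclusive" method for many samples; this is an assumption about defaults, so check your version's documentation. Python's stdlib exposes both conventions:

```python
from statistics import quantiles

# Two common quantile conventions applied to the same data give
# different quartiles and hence a different IQR. Data are hypothetical
# Likert responses.
data = [1, 2, 2, 3, 4, 5, 5, 6, 7]

q_inc = quantiles(data, n=4, method="inclusive")   # Excel-style
q_exc = quantiles(data, n=4, method="exclusive")   # SPSS-like
print("inclusive:", q_inc, "IQR =", q_inc[2] - q_inc[0])
print("exclusive:", q_exc, "IQR =", q_exc[2] - q_exc[0])
```

The median here agrees, but Q3 (and hence the IQR) differs between the two conventions, which matches the discrepancy described in the question.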
Question
Hello everyone,
I am working on a journal revision. The reviewers ask me to do a mixed procedures analysis because my experiment was a multiple-period task in which a participant repeated a task over several periods, and all periods observations were used in the analysis.
The reviewers also provided a reference for rerunning the analysis. When I was reading the reference paper, the results table is reported as in the picture attached.
My question is how do I conduct an ANOVA or mixed procedure and report results similar to the table attached. Specifically, do I need to conduct two ANOVA analyses to report one for between subjects and one for within-subjects? Or one analysis is enough. If so, how do I find the two Errors (one for between subjects and one for within-subjects)?
BTW, I use SPSS.
Thank you very much for your help!
There is a terminology issue here (and the terminology is confusing). The reviewers might have meant:
1) a mixed ANOVA with a combination of within (repeated measures) and between (independent measures factors). This is relatively easy to run via the repeated measures ANOVA commands in SPSS (e.g., see https://www.discoveringstatistics.com/repository/mixed_2020.pdf ). This is a single model that handles all the error terms etc.
2) They could have meant a linear mixed model (or multilevel model). If you have a completely balanced design with no missing cells and no time-varying covariates, then this is essentially equivalent to the mixed ANOVA (assuming a single random factor for participants and a nested design). Generally this approach is useful when you have imbalance or complex random effects that you want to model (the mixed ANOVA is a special case of this kind of model). The "mixed" term here refers to a mixture of random and fixed effects being modelled. It's a highly flexible approach. You can run this in SPSS, but it isn't straightforward if you are new to these models.
The table you show appears to be from a mixed ANOVA. Your description isn't sufficient to tell what the reviewers meant.
Question
How can I run weighted logistic regression in SPSS?
Question
It's been a while since I was in here. Does anyone have a good reference for the chi-square test of independence using SPSS? Specifically, an article from a business/management/marketing journal.
Question
I am doing my dissertation on eating disorders and body dissatisfaction. I want to examine the associations between eating disorders, body dissatisfaction and self-esteem, and also compare these with BMI. Which method should I use?
You can fit a linear model or a GLM.
Did you look at pairwise correlations as a start?
Question
Hi
I need to use regression models for my research. I used SPSS for linear regression but I want to use univariate and multivariate power regression such as:
Y = a·X^b
Y = a·X^b·Z^c
Y = a·(X·Z)^b
and...
where:
a,b,c: model parameters
Y: dependent variable
X,Z: independent variables
Is there any user friendly statistical software to do it?
(I know about SAS and R, but I think they perform regression through programming)
Thanks
R provides flexible solutions to those problems. E.g., you can simply use the nls() function, which will fit a nonlinear least-squares model.
In case of the nls() function, you provide a formula in the form y ~ a*x + b (in case of a simple linear regression) and some starting values as a list.
So you'll have for example
nls(y ~ a*x + b, start = list(a = 1, b = 1))
for a simple linear model, but you can use nonlinear formulas, too:
nls(y ~ a*x^b, start = list(a = 1, b = 1))
nls(y ~ a*x^b*z^c, start = list(a = 1, b = 1, c = 1))
nls(y ~ a*(x*z)^b, start = list(a = 1, b = 1))
and so on. For very complex models and "extreme" values for the parameters a, b, c, you may have to adjust the starting values.
When people have little experience with programming, it may sound intimidating to "perform regression by programming", but it is far more straightforward than searching the toolbars of programs such as Excel. Doing things "by programming" also has the advantage that you can script the complete workflow and apply it to new sets of data.
Another programmatic solution would be to use functions from the scikit-learn module in Python (which offers seemingly unlimited possibilities), but R is easier to just jump into, in my opinion, and widely used in statistical analyses.
I'd encourage you to give R a try, if you haven't done so yet...
Question
Hi everyone,
I wish to analyse an experiment with a dichotomous independent variable, a continuous dependent variable, and two moderators measured on 5-point Likert scales in SPSS. I have little experience with experiments in statistics. I have performed an independent t-test to analyse the direct effect of the independent variable on the dependent variable, but am now unsure how to proceed with the analysis of the two moderator effects.
My understanding is that to analyse a moderator in SPSS I would need to create a new variable in which I multiply the independent variable by the moderator variable to get their combined effect. However, I'm not sure how this works when the independent variable is dichotomous.
Another approach I thought of would be to dichotomise the moderator variables (e.g. scores 1-3 become category 0 and scores 4-5 become category 1) and run further independent t-tests on the dependent variable within each moderator category.
Can anyone point me in the right direction?
You can run this as a moderated multiple regression problem with the IV and moderators as predictors. Enter the product terms between each moderator and the IV. You can also include the three-way product of moderator1*moderator2*IV.
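The moderated-regression approach above works fine with a dichotomous IV coded 0/1; the product term is formed exactly as with a continuous IV. A sketch with simulated data (in SPSS you would COMPUTE the product variable and enter all predictors in Analyze > Regression > Linear):

```python
import numpy as np

# Moderated regression sketch: dichotomous IV (0/1), a Likert moderator
# (mean-centered), and their product term. Data are simulated; the true
# interaction coefficient is 0.8.
rng = np.random.default_rng(1)
n = 200
iv = rng.integers(0, 2, n)                  # 0/1 condition
mod = rng.integers(1, 6, n).astype(float)   # 1-5 Likert moderator
mod_c = mod - mod.mean()                    # center to ease interpretation
y = 1.0 + 0.5 * iv + 0.3 * mod_c + 0.8 * iv * mod_c + rng.normal(0, 1, n)

# Design matrix: intercept, IV, centered moderator, product term.
X = np.column_stack([np.ones(n), iv, mod_c, iv * mod_c])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0..b3:", np.round(b, 2))  # b3 estimates the moderation effect
```

Keeping the moderator continuous (rather than dichotomising it at 3/4) preserves statistical power, which is why the regression approach is usually preferred over the extra t-tests.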
Question
I'm looking for some advice on how to run a mediation analysis with multiple predictor variables in SPSS. I have downloaded Hayes' PROCESS macro to assist with the mediation analysis, but it appears that it can only accommodate one predictor variable at a time. I know there was a MEDIATE macro available before PROCESS, but I don't know where to find resources/guides for the MEDIATE macro for SPSS, as PROCESS has taken its place now. Should I run three separate mediation analyses (one for each predictor variable), or is there a way to run a single mediation analysis with the three predictor variables included?
Question
This is a call for PhD student volunteers to participate in a Qualitative Interview. Please write to [email protected] or just make a comment if you can participate. I will send further details in a direct email.
Brief: Doctoral students require a high level of cognitive, emotional, and personal competencies to complete the required degree qualifications. This research is about student perspectives on the research process and the impact of software tools such as Mendeley, ATLAS.ti, NVivo, SPSS, or any other tools, on individual productivity.
A PhD is not an easy task, but it is worth doing. You will meet ups and downs in the process, but I believe they strengthen us along the journey. Having good modelling skills with different software is a plus for the PhD student.
Question
Hi
As you know, to use nonlinear regression in statistical software like SPSS or Minitab, or in code, you need to set a start point (start value, or initial guess) for the regression parameters. You actually need to choose a good start value for the parameters to reach the best nonlinear fit.
How can we determine the optimum start value?
Is there another way (or other software) to use nonlinear regression regardless of the parameters' start values?
No. But a numerical analyst would make plots and look for approximate values of the regression coefficients. As my old numerical analysis professor used to say, graphs tell you lots and lots of cool stuff.
David Booth
PS: if you had an optimum start, you wouldn't need anything else, would you?
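One practical complement to the plotting advice: if the model can be linearized, fit the linearized form first and use those estimates as starting values for the nonlinear fit. For a power model Y = a·X^b, taking logs gives log Y = log a + b·log X, a straight line. A sketch with simulated data:

```python
import numpy as np

# Starting values for a power-law fit Y = a * X**b via linearization:
# log Y = log(a) + b * log X is a straight line, so an ordinary linear
# fit on the logs yields good initial guesses. Data are simulated with
# true values a = 3.0, b = 1.5 and small multiplicative noise.
rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 50)
y = 3.0 * x ** 1.5 * np.exp(rng.normal(0, 0.05, 50))

slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
a0, b0 = np.exp(intercept), slope   # starting values for the NLS routine
print(round(a0, 2), round(b0, 2))   # near 3.0 and 1.5
```

These values of a0 and b0 can then be handed to the software's nonlinear routine (SPSS NLR, Minitab, or R's nls) as the start point.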
Question
Hi!
I'm currently working on my dissertation - it's on the gender differences in moral reasoning behind moral dilemmas associated with cybercrime. Each moral dilemma has 3-4 open questions where each participant had to write an answer and their reasoning behind the answer. I have coded these answers via themes/insights and have calculated the percentages behind these themes.
It is worth noting, that each question allowed multiple answers, therefore leading to participants having two or more codes associated with their answer.
My supervising professor asked me to analyze these percentages to look for statistical differences. My only lead is a chi-square test; however, it probably isn't the right choice because:
1. Each question is not limited to a yes/no answer, there are answers like: I don't know, or answers providing alternative solutions, thus omitting the aforementioned moral dilemma.
2. Participants often answered with multiple answers, that is they could answer yes and provide a reason but later on in their response answer no and also provide a reason.
What I need to know is whether there is a way to analyze this data statistically, and moreover, whether it's even possible to analyze the percentages I've calculated from this qualitative data.
Nkululeko Fuyane and Michael Felix Noel: The question of whether "quantizing" qualitative data counts as mixed methods has seen quite a bit of debate. For better or worse, all the major qualitative data programs now describe their ability to export codes as spreadsheets as a form of mixed methods, which creates still more confusion.
Question
I want to compare the performance of two diagnostic tests (binary: yes or no) on the same population. First, I want to test whether there is a significant difference in their ability to predict my specific outcome/disease, and second, whether there is a significant difference in the distribution of various characteristics (both numeric and binary) between the positive results of the two tests. Can this be done in SPSS, and if so, with which test?
For the first question, it is mainly a matter of true and false positive rates. Have a look at the paper in the link as an example, but you can search for more specific papers.
For the second question, check the previous answer.
Good luck
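To make the "true and false positive rates" suggestion concrete, here is a small sketch computing sensitivity and specificity for the two tests from hypothetical counts:

```python
# Sensitivity and specificity for two diagnostic tests applied to the
# same patients, from their confusion-matrix counts. All counts below
# are hypothetical (100 patients, 40 with the disease).
def sens_spec(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

se1, sp1 = sens_spec(tp=34, fn=6, tn=54, fp=6)    # Test 1
se2, sp2 = sens_spec(tp=30, fn=10, tn=58, fp=2)   # Test 2
print(f"Test 1: sensitivity {se1:.2f}, specificity {sp1:.2f}")
print(f"Test 2: sensitivity {se2:.2f}, specificity {sp2:.2f}")
```

For a formal paired comparison on the same patients, McNemar's test on the discordant pairs is the usual choice; in SPSS it is available under Analyze > Descriptive Statistics > Crosstabs > Statistics (McNemar) or via the nonparametric two-related-samples tests.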
Question
I am using the PROCESS Macro to conduct a moderation analysis. To visualise the significant interaction, I copied and pasted the necessary text and created a new syntax. The scatterplot was produced but I am unable to add fit lines at the subgroups (please see screenshot of error message attached).
Does anyone know why this might be? I have had a browse online and have seen a lot of mention about re-coding categorical variables when faced with this problem. However, in my case, the variables included in the creation of the scatterplot are all continuous.
You may wish to look at the material from the attached search. The plots in R should be fairly easy if that's all you need to do. Best wishes, David Booth
Question
I recently updated to SPSS 27 on Mac OS X 11.4 and I am having difficulty installing the custom dialogue box for Andrew Hayes' PROCESS Macro. I also recently installed the fix pack from IBM to 27.0.1.0. When I attempt to install, I get the attached error message.
I saw that others identified an error in the code for the custom dialogue box (https://www.ibm.com/support/pages/process-macro-v35-custom-dialog-returns-error-about-invalid-character-ibm-spss-statistics-27-or-subscription-release), but I'm not even able to open the code to examine it without getting an error.
I'm assuming this is an SPSS issue rather than a PROCESS issue. Has anyone else encountered this?
Susan Murphy wrote:
I have to say the answers telling you to go to R are not helpful.
If I understand you, Susan, you're saying that something like the following (made up) exchange is not helpful? :-o
Q. How can I perform task X using stats package Y (where Y is not R)?
A. Use R.
I suppose we must try to be patient with R-evangelists. They are just trying to save our statistical souls, after all. ;-)
Question
I feel like maybe this question is so easy that it's hard, and I keep doubting every choice I make, so I thought I would just finally ask online. I have one dependent variable with three levels: the percentage of caches a bird makes in each of three different types of substrate (sand, gravel, or other). My independent variable has two levels (the conditions the birds are in whilst caching).
I pretty much know my pattern of results and I have figures drawn up and everything (I can post a small subset of the data if that would be helpful to people in answering the question), but I cannot, for the life of me, figure out if I have chosen the right test to analyse this data and I keep having doubts every time I make any progress.
I'd run a similar study not too long ago that was almost identical, except the dependent variable only had two levels (gravel and sand). Therefore, I could just run a Friedman ANOVA (since the data violated normality assumptions) for the percentage of caches in one or the other substrate.
Right now the only solution I have been able to come up with is running multiple Friedman ANOVAs, but, as mentioned, I am really starting to doubt this. This is primarily because (unlike when I only had two levels in my DV) I cannot just run a test for one of the levels (e.g. in my first experiment, if 25% of the caches were in sand, you automatically knew the other 75% were in gravel; however, for this experiment, if 25% are in sand, that only means that 75% are in gravel or "other", and you do not know how it is divided up between these two).
I know there are more advanced forms of analysis as well, like mixed models, but I've been specifically advised against doing those in this particular instance.
It's just that I keep writing and writing, then I get freaked out about my stats, so I do a bunch of research, then I try to re-calculate and re-write….and it's getting to the point where I can't do that anymore. So, I kind of need to know now if I need to change my method of analysis.
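For reference, the Friedman test itself is quick to run outside SPSS as a cross-check. A sketch with made-up data (one row of percentages per bird, substrates as the related conditions); whether a rank test is the right tool for compositional percentages that sum to 100 is a separate question worth discussing with an advisor:

```python
from scipy import stats

# Friedman test sketch: percentage of caches per substrate for the same
# six birds (each bird is a block). All values are made up for
# illustration; they are not from the study described.
sand   = [25, 30, 20, 35, 28, 22]
gravel = [50, 45, 55, 40, 47, 53]
other  = [25, 25, 25, 25, 25, 25]

stat, p = stats.friedmanchisquare(sand, gravel, other)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

In SPSS the equivalent is Analyze > Nonparametric Tests > Legacy Dialogs > K Related Samples, with the three substrate percentages as the related variables.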
Question
In my research, I've set up the following equation (with a demographic dummy variable Czech = 1, Dutch = 0): DV * Czech = b0 + b1*Czech + b2*Czech ... b10*Czech.
I can simply interpret b1...b10 for the Czech demographic; however, how can I interpret the variables for the Dutch population (Dutch = 0)? Would I need to use only the intercept? Or is there a mistake in the model equation (should the DV not be part of an interaction term)?
In DV = b0 + b1*Czech, b0 is the mean of DV for Dutch, and b1 is the mean difference in DV between Czech and Dutch.
Czech*DV on the left-hand side of your equation makes no sense. The other terms (b2*Czech, b3*Czech, ...) make no sense either. This whole part of your right-hand side could be written as b0 + (b1+b2+b3+...)*Czech, and the sum of the coefficients can be (and must be) treated as a single model coefficient (i.e. as the mean difference in the DV between Dutch and Czech).
An interaction between two different predictors would be modelled by their product. In your example I see only Czech as a predictor. Another one might be, say, Age. Then the model
DV = b0 + b1*Czech + b2*Age + b3*(Czech*Age)
would be a model where:
b0 is the mean DV of Dutch at age 0,
b1 is the mean difference between Czech and Dutch at age 0,
b2 is the mean change in the DV per year of age for Dutch, and
b3 is the interaction of Czech and Age, i.e. the difference in the mean change per year between Czech and Dutch.
In this example, the coefficients b0-b2 are not particularly useful practically, but they are required by the model to estimate the interaction. If the variable Age is centered at age a, then b0 and b1 would refer to subjects at age a.
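To make these coefficient interpretations concrete, here is a minimal sketch with hypothetical numbers (plain least squares in NumPy rather than SPSS): the outcome is built exactly from known coefficients, and the fit recovers them.

```python
import numpy as np

# Hypothetical data: three Czech (dummy = 1) and three Dutch (dummy = 0)
# subjects observed at ages 20, 30 and 40
czech = np.array([1, 1, 1, 0, 0, 0], dtype=float)
age = np.array([20, 30, 40, 20, 30, 40], dtype=float)

# DV built exactly from known coefficients b0=10, b1=2, b2=0.5, b3=0.1
dv = 10 + 2 * czech + 0.5 * age + 0.1 * czech * age

# Design matrix: intercept, Czech, Age, Czech*Age (the interaction)
X = np.column_stack([np.ones_like(age), czech, age, czech * age])
b, *_ = np.linalg.lstsq(X, dv, rcond=None)
print(b)  # recovers approximately [10, 2, 0.5, 0.1]

# b[0]: mean DV for Dutch at age 0;   b[1]: Czech-Dutch gap at age 0
# b[2]: per-year change for Dutch;    b[3]: extra per-year change for Czech
```

Centering Age before forming the product term changes only which age b0 and b1 refer to, as described above.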
Question
What is the best method of checking normality through SPSS before conducting a regression?
Normality can be checked with a goodness-of-fit test, e.g. the Kolmogorov-Smirnov test (or, for modest sample sizes, the often-preferred Shapiro-Wilk test). When the data are not normally distributed, a non-linear transformation (e.g. a log transformation) might fix the issue. Note that for regression it is the normality of the residuals, not of the raw variables, that matters.
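A minimal sketch of the goodness-of-fit idea on simulated (hypothetical) data, using Shapiro-Wilk from scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=50, scale=10, size=200)   # roughly normal
skewed_sample = rng.exponential(scale=10, size=200)      # clearly right-skewed

# Small p-value = evidence against normality
print(stats.shapiro(normal_sample).pvalue)   # typically well above .05
print(stats.shapiro(skewed_sample).pvalue)   # far below .05

# A log transformation often pulls a right-skewed variable toward symmetry
log_skewed = np.log(skewed_sample)
```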
Question
Independent variables:
1) no =0, yes =1
2) no =0, yes =1
3) no =0, yes =1
4) no =0, yes =1
Mediators:
1) no =0, yes =1
2) no =0, yes =1
Dependent variable:
1) no =0, yes =1
Using SPSS, I want to test the mediation effect with the help of multivariate logistic regression.
Is there any method other than the one proposed by Baron & Kenny?
As there are no latent variables, I'm not sure the PROCESS approach is necessary. It gets you a CI for the indirect effect, but you can get all the path coefficients straight from SPSS logistic regression. A significance test of the indirect effect is simply the joint test of significance for the a and b paths.
i.e., a x b = indirect effect, and if the a path and b path are both p < .05 then the indirect effect is significant (with alpha = .05); if either path is non-significant, the overall indirect effect is logically non-significant.
Also as you have no latent variables then you could do this with piecewise SEM. This would be pretty easy to set up in R. See
I'm not sure if there is an easy way to do it in SPSS.
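The joint-significance logic above can be sketched numerically. The path estimates and standard errors below are hypothetical stand-ins for coefficients you would read off the two logistic regressions:

```python
import math
from scipy import stats

# Hypothetical path estimates (log-odds scale) from two logistic regressions
a, se_a = 0.80, 0.25   # X -> M
b, se_b = 0.60, 0.20   # M -> Y, controlling for X

indirect = a * b       # product-of-coefficients indirect effect

# Joint-significance test: the indirect effect is declared significant
# only if BOTH constituent paths are significant
p_a = 2 * (1 - stats.norm.cdf(abs(a / se_a)))
p_b = 2 * (1 - stats.norm.cdf(abs(b / se_b)))
print(p_a < 0.05 and p_b < 0.05)   # True for these made-up numbers

# Sobel standard error, an older (often conservative) alternative
se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
z_sobel = indirect / se_ab
```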
Question
For my thesis research in SPSS, I asked respondents to evaluate a company on five items: affordability, quality, sustainability, trendiness and ethical behavior (5-point Likert scale, where 1 = low and 5 = high). I would like to ask whether it is better to run a factor analysis on the items or to compute a mean value from them. Any help is welcome! Thank you!
Factor analysis is used to reduce the dimensionality of a problem with a large number of variables. In your case you have only five variables, so I think factor analysis is not suitable for your data.
Question
Does the assumption of normality need to be met for a mediation analysis in PROCESS MACRO in SPSS?
As the creator of the PROCESS macro himself notes, "when we learn a new analytic strategy, it changes the way we approach and reflect on theoretical questions, and also the way we think about how to test or contrast hypotheses. And all of this because we suddenly have analytic methods that open up a range of previously unknown possibilities. Many of the scientific advances of recent decades arose more as a consequence of methodological innovations than as a result of innovations in theory."
Question
I am unable to find methods for importing, saving and imputing a labelled SPSS file in R.
Daniel Wright, yes, you are right, Dr Wright. I'm so sorry; I just wanted to help with my friend's problem. I'm a master's student and could not express what I meant very well.
Thank you for correcting my mistake, professor.
Question
I used binary logistic regression to model drivers' stop-or-go decisions in the dilemma zone. Now I have to estimate the elasticity of the variables (the corresponding change in outcome probability per one-unit change in X). My variables are both continuous and categorical. Any help would be highly appreciated.
P.S. I have uploaded a screenshot from one of the papers in which the author calculated the elasticities.
FGS, try the videos or even z-library for some examples.
Best, D Booth
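For a continuous predictor in a binary logit, the point elasticity is commonly computed as beta * x * (1 - P), evaluated at chosen values of x (often the sample means). A minimal sketch with entirely hypothetical coefficients and values:

```python
import math

# Point elasticity in a binary logit for a continuous X:
#   elasticity = beta * x * (1 - P), evaluated at chosen x
# All numbers below are hypothetical.
beta = 0.35                 # fitted coefficient for, say, approach speed
x = 60.0                    # evaluation point (e.g., the mean speed)
linear_predictor = -20.0 + beta * x
p = 1 / (1 + math.exp(-linear_predictor))   # predicted P(stop)

elasticity = beta * x * (1 - p)
print(elasticity)

# For a 0/1 categorical variable, a pseudo-elasticity is used instead:
# the relative change in P when the dummy flips from 0 to 1.
```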
Question
We are investigating the impact of online learning environment on sustained attention and student engagement among undergraduate students (Year 1-3).
We had the participants watch a lecture video and answer multiple choice questions related to the content of the video. We also had them fill an engagement questionnaire which had questions about their involvement with peers and various university related activities.
Hypothesis I - 1st year students are able to focus more over a long period of time than 2nd and 3rd year students.
Hypothesis II - 1st year students actively participate and are more engaged with peers than 2nd and 3rd year students.
My questions are:
1. What statistical test would be most appropriate?
2. How do I set a value to establish significance? For example, there were 13 MCQ questions; how do I know how many questions a student needs to have answered correctly in order to say they paid more attention?
I'm very lost; I'd appreciate your help and any remarks that could provide some clarity.
Are you familiar with the expression "putting the cart before the horse"? I'm afraid that's what you're facing. If so, it's not an impossible situation, but it can be a daunting one. Deciding what statistical analysis you will use is normally part of the research design, so that you can be sure the data you gather can be analyzed in a way that answers your research questions; that is normally done before you begin any data collection.
I agree with what most of the others have recommended, but you cannot apply ANOVA, for example, unless you have quantitative data, and you did not describe the data you received from the surveys. If you solicited qualitative answers to the survey questions, you cannot apply quantitative statistical analysis to them. You will have to quantify that data or apply qualitative data-analysis methods to it and have a blended study. There is nothing wrong with a blended study, but it is more complex than a purely qualitative or quantitative one. You may need someone on site who can actually see what you did to help you sort it out. I wish I were there; I'd love the challenge!
Question
Good day everyone.
I need guidance on how to calculate the correlation coefficient of two continuous variables, one being the predictor and the other the outcome variable.
Looking forward to a quick response.
Thank you!
The Pearson sample correlation coefficient is usually only defined for two one-dimensional, discrete samples of the same length. However, we implemented an m-dimensional relative continuous Pearson sample correlation coefficient, and you can find it in our technical report "Technical Reports Compilation: Detecting the Fire Drill anti-pattern using Source Code" on page 84 :) You'll find the implementation in R there as well.
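For the standard two-variable case, the Pearson coefficient is one line in most packages; a minimal sketch with hypothetical predictor and outcome values:

```python
import numpy as np
from scipy import stats

# Hypothetical predictor (x) and outcome (y) measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])

r, p = stats.pearsonr(x, y)
print(round(r, 3))   # near 1: strong positive linear association
```

In SPSS the equivalent is Analyze > Correlate > Bivariate.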
Question
I am trying to do an analysis of the Job Demands and Resources Model (JDR) with a one mediating variable using the Hayes model.
However, I am not sure how I can do an analysis of one big model of both demands AND resources (Multiple independent variables) and Outcomes (dependent variables). Motivation is the mediating variable.
@Kristina Attached below is our paper based on the JD-R model with a mediation effect. I hope you find it helpful.
Question
Hi, I'm currently writing my masters disso and I thought this community may be able to give me a few different perspectives!
I'm using secondary data from the UKHLS and doing a cross-sectional panel analysis to measure how age/sex/ethnicity and personality affect an individual's subjective well-being over different points in time. The same respondents are included in each wave. Because my IVs are a mixture of time-variant and time-invariant variables, how would I run a regression analysis in SPSS? (We have to use SPSS.)
So far I've changed my panel data to wide format and created dummy variables for each of my IVs.
I would appreciate any advice on what the best way of doing this would be, my supervisor also mentioned clustered standard errors?!
Many thanks!
My two cents: why use SPSS? Stata and R are probably better choices, and R is free. There are a ton of books in the z-library on panel data in these two packages, with lots of examples. Go for it: it's your research. Bruce Weaver is an excellent SPSS programmer, but he tells me he uses Stata. Download R; it's free. I believe Stata has a student version you can download free. Get a couple of books with lots of examples and go for it, and don't forget there are lots of videos available by Google search as well.
Best wishes, David Booth
Question
I am using SPSS version 22.
Hello Reine,
The answer is a qualified "yes." Have a look at this straightforward discussion by Paul Allison: https://statisticalhorizons.com/iia
Question
Hi,
For my thesis project, I want to fit a quadratic model to each individual separately. For example, I have variables X1 and X2 (e.g., price 1 and price 2) and variables Y1 and Y2 (e.g., reaction time 1 and reaction time 2), and 75 participants. Each participant completed both X1 and X2, and it is expected that the RT distribution is quadratic. Moreover, the expectation is that the peak of the distribution in X2 is in a different spot than in X1.
I want to estimate for each participant their RT distribution on X1 and X2 separately, and compare the outcome parameters with a statistical test.
Does anyone know how to analyze this problem in SPSS or in R?
Thank you very much!
I am sorry, but your goal doesn't make sense: sample sizes of one don't work in regression. Consult with your advisor, unless s/he suggested this; in that case, consult a statistical consultant. Best wishes, David Booth
Question
I have 7 variables of different food samples, and I have 2 CFU values of each, for Day One and Day Two for each sample.
What SPSS tests should I apply to test them?
Hello Fatima,
So, does "food sample" mean different food types (e.g., poultry vs. egg), or does it mean you have seven total samples of one food type?
If different food types, how many sample batches/replications for each type did you create and monitor for the two days?
One food type/seven samples: dependent t-test would allow you to determine whether there was a significant change in measured CFUs from day 1 to day 2. However, with as few as seven samples, the statistical power of this comparison will not be very good, unless the magnitude of the effect is large.
Seven food types/unknown number of samples: a one-between (food type), one-within (day) repeated-measures ANOVA would allow you to check whether food type mattered and whether day mattered (both as main effects) and, more importantly (as long as you had at least two samples per food type), whether there was a food type by day interaction. The same consideration about number of samples per food type and statistical power applies.
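The one-food-type scenario (dependent t-test on day 1 vs. day 2) can be sketched as follows; the CFU values below are hypothetical, and counts are assumed to have been log-transformed first:

```python
import numpy as np
from scipy import stats

# Hypothetical log10 CFU values for seven samples of one food type
day1 = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2])
day2 = np.array([4.0, 3.6, 4.4, 3.9, 3.7, 4.1, 4.2])

# Dependent (paired) t-test: did CFUs change from day 1 to day 2?
t, p = stats.ttest_rel(day1, day2)
print(p < 0.05)   # True for this made-up, strongly shifted data
```

In SPSS the equivalent is Analyze > Compare Means > Paired-Samples T Test.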
Question
Hi
I am currently researching the effect of service quality on customer satisfaction and loyalty. I am at the stage of planning the analysis of results using SPSS, but I am a little confused about regression analysis. What would be the best steps to follow to present the data (see file attached) to show a regression analysis of the relationship between service quality and customer satisfaction?
When I am in SPSS, I am not sure where to put control variables, as the linear regression dialog only has fields for dependent/independent variables. I am also not sure what model 1/2 means.
Hello Martin,
The simple answer is, model 1 includes just the so-called control variables as IVs (independent variables). Model 2 adds the target IVs to that set.
The extent to which model 2 (and therefore the target IVs) substantively add to the explanatory power of model 1 (just the control variables) may be determined by the difference in R-squared (proportion of variance in the DV that is explained/accounted for by differences in the IVs for a model) between model 2 and model 1. A "big" difference would suggest that the target IVs help to explain differences in the DV above and beyond what the control variables are able to explain/account for.
This approach is sometimes called hierarchical regression.
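The R-squared-change logic can be sketched with simulated data (effect sizes and sample size below are hypothetical; plain least squares in NumPy stands in for SPSS's hierarchical blocks):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
control = rng.normal(size=n)     # model 1: control variable only
target = rng.normal(size=n)      # model 2 adds this target IV
y = 0.3 * control + 0.8 * target + rng.normal(size=n)

def r_squared(predictors, y):
    """R-squared of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid.var() / y.var()

r2_model1 = r_squared(control, y)
r2_model2 = r_squared(np.column_stack([control, target]), y)
print(r2_model2 - r2_model1)   # R-squared change credited to the target IV
```

In SPSS, entering the control variables in Block 1 and the target IVs in Block 2 (with "R squared change" ticked under Statistics) produces this same comparison.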
Question
I am studying the impacts of receiving a certain treatment (kind or mean) on the motivation. I expect that the treatment influences the motivation through three mediation variables. So my conceptual framework looks somewhat like this: X -> Z1, Z2, Z3 -> Y.
To explain it with an easier example: I want to know the effects your gender (male/female) has on your life expectancy, but I study it through the mediating variables height, weight and smoking (all on a likert scale). I expect that the gender affects the mediating variables, but they in their turn have an effect on the life expectancy.
(A little more detailed: I have an experimental design where I use vignettes to reboot the type of treatment someone will receive (the respondents either get the soft treatment or the hard treatment, not both and it is randomly assigned). Then I test my mediation and Y variables through survey questions with a 5 point Likert-scale.)
Which test do I use for this in SPSS or does anybody have an idea how to format my conceptual framework better?
To determine which statistical test to use, you need to know whether your data meet certain assumptions.
Question
What if the Cronbach's alpha of a 4-item scale measuring a control variable is between .40 and .50 in your research, while the same scale received a Cronbach's alpha of .73 in previous research?
Do you have to make adjustments to the scale, or can you use it because previous research showed it is reliable?
What do you think?
Hello Lisa,
as I suspected: these are clearly not indicators of a common underlying factor. Hence alpha, and every other internal-consistency approach to reliability, is inappropriate. For its control function, however, the scale will do its job, as it can be regarded as a composite of specific facets. And yes, each of the facets won't be a perfectly error-free indicator of its underlying attribute, but that should not hurt much.
All the best,
Holger
Question
I am currently working on a project that addresses educational communication and technology (ECT) barriers in online education and distance learning in the Philippines. However, according to my internet research, SPSS can only handle 1500 cases or respondents. My project requires over 3000 student respondents with more than 150 variables, which includes sub-variables. Is there any statistical software that can meet my requirements?
Virtually all statistical software packages can handle data sets of that size.
The better questions are how fast each package is and how well it presents results.
Question
Hi,
I am testing whether a person's food choice is influenced by their personality traits (Big 5: conscientiousness, extraversion, agreeableness, openness and neuroticism). I have 26-question food choice questionnaires in which respondents say what is important to them when choosing food to eat on a daily basis. The answers fall into categories, e.g. health, weight control, natural content, familiarity, mood, ethics, price. The scale is a 4-point Likert scale: 1) not at all important, 2) a little important, 3) moderately important, 4) very important. Each category of the food choice questionnaire is a dependent variable.
The independent variable is also ordinal: each personality trait is split into 5 levels, e.g. for conscientiousness, 1) very low, 2) low, 3) mid, 4) high and 5) very high.
My theory is that, for example, people placing high importance on health and weight control are also high in conscientiousness; similarly, those who place higher importance on improving mood might be higher on the neuroticism scale and lower in conscientiousness.
What I do not understand is, if I have a Likert scale on both variables, how do I interpret my results?
I have run a test using the importance of weight control as the dependent variable, ordered 1) not at all important, 2) a little important, 3) moderately important and 4) very important. The independent variable that has shown significance is the personality trait conscientiousness, specifically level 4, high conscientiousness (level 5, very high, is the reference).
My p-value is <0.002, with an Exp(B) of 0.071, CI 0.013-0.385 (the B coefficient is -2.642).
I just don't understand whether an Exp(B) below 1 means the likelihood of being in a higher level of the dependent variable is higher or lower. It seems like a very low Exp(B). Could it mean that the likelihood of being in the lower level for weight control is lower?
Thank you
Louise Kjlj, if you have scored the Big 5 Personality Inventory in the usual way, each of your 5 scores is the mean (or sum) of 10 items (some of them with reverse coding). See this document, for example:
I would be surprised if you converted your 5 means (or sums) into discrete 5-point variables.
Regarding how to deal with an ordinal explanatory variable, you could use the TEST sub-command for PLUM to generate whatever contrasts are of interest to you. E.g., if you have a 5-level ordinal variable and you want to contrast each level with the next, you could do this (with OV = ordinal variable):
PLUM Y BY OV /PRINT = FIT PARAMETER SUMMARY /TEST OV -1 1 0 0 0; OV 0 -1 1 0 0; OV 0 0 -1 1 0; OV 0 0 0 -1 1.
Or if you prefer, you could generate the codes for Helmert contrasts, etc.
HTH.
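On the Exp(B) question in this thread: assuming SPSS PLUM's usual proportional-odds parameterization, an Exp(B) below 1 means the odds of being in a higher category of the ordinal DV are lower for that group than for the reference. A quick numerical check of the reported coefficient:

```python
import math

# Coefficient reported in the question: B = -2.642 for "high" conscientiousness
# versus the "very high" reference category
b = -2.642
exp_b = math.exp(b)
print(round(exp_b, 3))   # 0.071, matching the reported Exp(B)

# Exp(B) < 1: the odds of being in a HIGHER category of the DV (importance
# of weight control) are multiplied by ~0.07 for this group relative to the
# reference, i.e. they are substantially lower, not higher.
```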
Question
How can I insert statistical significance (i.e. F test P value < 0.05) annotations on top of my column bars on SPSS ?
Question
I am conducting a study into the effects of evaluation anxiety on working-memory task performance. I have an independent variable, the task the individual completes (observed vs. unobserved; the same people complete both tasks), a dependent variable, the score on the memory task in each condition, and a covariate, the score on the Liebowitz Social Anxiety Scale. I did also measure memory, but that is not something I want in the main analysis, just to compare the means with the anxiety score afterwards. I want to compare the difference between the two memory scores with the anxiety score. Which test should I use? I have been told by my supervisor that a one-way within-subjects ANCOVA is the correct way, but I am not sure. Clarification would be greatly appreciated. Below I include the analysis I have run. Thank you in advance.
Do you have pre and post test in the study?
Question
Recently, I did a path analysis in SPSS. Before the path analysis, Pearson correlations were computed. Factors A and B are correlated, r = .428, p < .001. However, when I used AMOS to run the path analysis, it showed a non-significant effect.
Could it be possible?
Did I do anything wrong?
Why is the correlation significant in the statistical analysis but non-significant in the path analysis?
Because these are addressing different research questions. What are your research questions? Be specific.
Question
Dear everyone,
I do not know how to figure this analysis out. For my thesis I have the following data:
Independent (categorical) : diet choice
- vegetarian
- pesco-pollo vegetarian
- non vegetarian
Moderator (continuous) : environmental concern
Dependent (continuous) : happiness
Does anyone know how to analyse this in SPSS? I am very lost.
1 = vegetarian (vegetarian + pesco-pollo)
0 = non- vegetarian.
In addition to Andrew Hayes' PROCESS macro (available for SAS, SPSS, and R), you could:
1. Use ordinary multiple linear regression. You'll need to dummy code the IV into two 0/1 variates (e.g., 1, 0 could signal vegetarian on these dummy variates; 0, 1 could signal pesco-pollo veg; and 0, 0 would signal non-vegetarian). Then add the moderator as a third IV, then form two interaction terms (dummy1 X moderator; dummy2 X moderator) and use the happiness score as the DV.
2. Run a simple path analysis model (following the ideas in #1, above).
3. Use ancova (the moderator would be the covariate); again, you'd need to incorporate the interaction terms.
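Option 1 above can be sketched with simulated data (group sizes, effect sizes and the 0.6 interaction below are hypothetical; plain least squares stands in for SPSS regression) to show how the dummy and product terms are laid out:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 90
diet = rng.integers(0, 3, size=n)   # 0 = non-veg, 1 = vegetarian, 2 = pesco-pollo
concern = rng.normal(size=n)        # continuous moderator

d1 = (diet == 1).astype(float)      # dummy 1: vegetarian
d2 = (diet == 2).astype(float)      # dummy 2: pesco-pollo

# Simulated happiness with a built-in vegetarian-by-concern interaction (0.6)
happiness = (5 + 0.5 * d1 + 0.2 * d2 + 0.3 * concern
             + 0.6 * d1 * concern + rng.normal(scale=0.5, size=n))

# Moderation model: intercept, two dummies, moderator, two product terms
X = np.column_stack([np.ones(n), d1, d2, concern, d1 * concern, d2 * concern])
b, *_ = np.linalg.lstsq(X, happiness, rcond=None)
print(b[4])   # estimate of the vegetarian x concern interaction (near 0.6)
```

A joint test of the two product terms then answers whether environmental concern moderates the diet-happiness relationship overall.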
Question
I am working on a follow-up data set in SPSS in wide format for which ~ 400 cases were lost to follow-up since baseline. I am looking to insert the lost cases and their variable information as system missing based on the patient ID.
E.g.
PID0001
PID0003
PID0004
PID0005
PID0007
etc.
to
PID0001
PID0002
PID0003
PID0004
PID0005
PID0006
PID0007
etc.
I am hoping there is a simple solution using syntax so I don't have to do it manually.
Hello Sophia,
Just add the formerly "lost" cases to the bottom of your data set (whether manually or by merging files by "adding cases"), then sort the full data set by ID number. (In spss: Data/Sort cases.../ and use ID as the sorting variable.) Be sure to save the final, sorted set.
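If the data ever pass through Python/pandas, the same merge-and-sort idea is a one-liner with `reindex`; the IDs and scores below are hypothetical:

```python
import pandas as pd

# Hypothetical follow-up data with cases lost since baseline
followup = pd.DataFrame({
    "pid": ["PID0001", "PID0003", "PID0004", "PID0005", "PID0007"],
    "score": [12, 9, 15, 11, 8],
})

# Full baseline roster of patient IDs
all_ids = pd.Index([f"PID{i:04d}" for i in range(1, 8)], name="pid")

# Reindex on pid: lost cases reappear with missing (NaN) values
full = followup.set_index("pid").reindex(all_ids).reset_index()
print(full["score"].isna().sum())   # 2 inserted missing cases
```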
Question
Is it possible to run two-way anova in SPSS with secondary data? I only have the mean, standard deviations, median, and mode. I also have the sample size. However, I don't have the original data that was used to determine the mean and the rest.
@Saad Below link discusses how to compute t-test and F ratios using only summary statistics when not having access to raw data. Hopefully, it might be helpful to you.
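For the two-group case, scipy can run a t-test directly from summary statistics via `ttest_ind_from_stats`; the means, SDs and group sizes below are hypothetical:

```python
from scipy import stats

# Hypothetical published summary statistics for two groups
res = stats.ttest_ind_from_stats(
    mean1=72.0, std1=8.0, nobs1=40,
    mean2=66.0, std2=9.0, nobs2=35,
)
print(res.pvalue < 0.05)   # True for these made-up summaries
```

The F ratios for a two-way ANOVA can be reconstructed analogously from cell means, SDs and ns, as described in the linked resource.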
Question
Hi everyone,
Can anyone help me choose a test to compare the change in frequencies of a nominal variable over two separate measurements, for example the change in the ventilation device used at time 0 versus day 3?
Hello Farah Al Souheil. Is "used ventilation device" a dichotomous Yes/No variable? If so, look up the McNemar test, aka., the McNemar Change Test or McNemar's Chi-square. You can see an example using SPSS here:
HTH.
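scipy has no dedicated McNemar function, but the exact version of the test is just a binomial test on the discordant pairs; the counts below are hypothetical:

```python
from scipy import stats

# Hypothetical paired yes/no data: device used at time 0 vs. day 3
#                day 3: yes   day 3: no
# time 0: yes        30           12      <- b = 12 switched yes -> no
# time 0: no          3           25      <- c = 3 switched no -> yes
b, c = 12, 3

# Exact McNemar test: under H0 the discordant pairs split 50/50
res = stats.binomtest(b, b + c, 0.5)
print(round(res.pvalue, 4))   # 0.0352: a significant change at alpha = .05
```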
Question
I am using SPSS to run moderation analysis with categorical moderators. How to interpret the output with reference to the levels of the moderator categories?
Moderation is identical to testing for interaction effects, so you can create a set of dummy variables and multiply each of them by the independent variable it is supposed to moderate.
Question
Dear all.
I want to run a moderation analysis using SPSS, where I have 2-IVs, 1-MV, and 1-DV.
I am new here and do not have much knowledge of moderation. I would appreciate guidance on writing the model, the steps involved in SPSS, and the interpretation.
@Ahsen, you can use model 1 in the PROCESS macro for SPSS to run a moderation analysis; however, it works with only one IV and one DV at a time. You can also use JAMOVI or JASP for moderation analysis.
Question
Hi everyone,
It's well known that an OR and a p-value are easily determined for binary (two-modality) variables.
If one of my qualitative variables is a severity scale, how should I assess the association and determine the p-value?
Thanks.
Question
I made the question to check people's knowledge and how aware they actually are of the terms, so I included only two correct options; the rest are wrong. Now I have 200 responses, and many respondents selected all options, some selected wrong ones, and some selected both (e.g. one right and one wrong), so I have no idea how to analyze this. Please help.
Hello Nelly,
Unfortunately, hindsight is so much clearer than foresight, more often than we would like!
Unless you want to restructure the survey and collect more data, here's one option you could try:
Create an ersatz score, which could range from 0-8 for each respondent. If a respondent does not check an incorrect response option, you award 1 point. If a respondent does check a correct response option, you award 1 point. A "perfect" score of 8 would accrue for someone who checked only the two correct options.
If that's too complex, then simplify the above plan so that only perfect scores are given credit (perhaps, "1") and all others are given no credit (e.g., "0").
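The ersatz score above is easy to compute programmatically; a minimal sketch with hypothetical option labels (A-H, with A and C as the two correct options):

```python
# Hypothetical option labels A-H; A and C are the two correct options
CORRECT = {"A", "C"}
ALL_OPTIONS = set("ABCDEFGH")

def knowledge_score(checked):
    """1 point per correct option checked, 1 per incorrect option left blank."""
    checked = set(checked)
    hits = len(checked & CORRECT)
    correct_rejections = len((ALL_OPTIONS - CORRECT) - checked)
    return hits + correct_rejections   # ranges 0..8

print(knowledge_score({"A", "C"}))   # perfect respondent: 8
print(knowledge_score(ALL_OPTIONS))  # "selected everything": 2
```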
Question
Hi,
The Kruskal-Wallis test gives a two-tailed significance value only, and according to this value SPSS decides whether or not to continue and calculate the pairwise comparisons automatically.
Since I have a directional hypothesis, I can divide the significance value by two; but in that case SPSS won't run the post-hoc analysis, so I would like to know how to run only the Dunn-Bonferroni test separately.
Many thanks
Daniel Wright's and Stefano Nembrini's answers are absolutely on point, and I missed it due to my struggle with the "divide by 2" problem. The KW test "only" tests whether one of the groups is stochastically dominating, so there is no two-sided testing: either there is at least one dominating group or there is not. As in an ANOVA, either at least one group differs from the others or not. Therefore, it is already a one-sided question. And Stefano's point is worth thinking about: is it really what you want to show?
If you are trying to make statements about differences in medians between the groups, this would only be valid if the shapes of the distributions would be equal/similar across the groups (like MW test).
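Dunn's test itself is not in scipy, but a commonly used stand-in for the same workflow is the overall Kruskal-Wallis test followed by pairwise Mann-Whitney tests at a Bonferroni-adjusted alpha; the group scores below are hypothetical:

```python
from scipy import stats

# Hypothetical scores for three independent groups
g1 = [12, 15, 14, 10, 13]
g2 = [22, 25, 24, 20, 23]
g3 = [11, 14, 13, 12, 10]

h, p = stats.kruskal(g1, g2, g3)
print(p < 0.05)   # overall test: at least one group differs

# Pairwise follow-up at a Bonferroni-adjusted alpha (.05 / 3 comparisons)
adjusted_alpha = 0.05 / 3
for a, b in [(g1, g2), (g1, g3), (g2, g3)]:
    u, pu = stats.mannwhitneyu(a, b, alternative="two-sided")
    print(pu < adjusted_alpha)
```

Note this is a pragmatic substitute, not Dunn's rank-based z procedure; dedicated implementations exist in add-on packages.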
Question
Noman Soomro, forgive me if I appear disagreeable, but you did four things in your analysis, on page 7, that you refer to as exploratory factor analysis that are not regarded as best practice.
First, you used principal components analysis (PCA), which is not exploratory factor analysis (EFA). Many people use PCA because it is the default extraction method in SPSS under the Data Reduction tab. The statisticians whom I respect recommend using either principal axis factoring (PAF) or maximum likelihood (ML) as the extraction method - within EFA.
Second, you used the Kaiser criterion (eigenvalues > 1) to determine the number of factors to extract. For more than 20 years, this method has been criticised as likely to produce too many factors. The scree test or, better, parallel analysis are two methods recommended as being preferable.
Third, you used varimax rotation. Orthogonal rotations such as that are often criticised as being inappropriate in almost all research involving humans - and, in my experience, they often produce results that are much less interpretable than are results produced by oblique rotations such as promax.
Fourth, by having only 52 participants, I think you have far too few people with whom to run either PCA or EFA.
Again, I don't want to be disagreeable. It's just that I frequently see researchers doing exactly the kinds of things you've done - and obtaining results that are unimpressive and untrustworthy. I think that many researchers simply do what they have seen other researchers do - and that's not necessarily advisable.
In case it's helpful, here are some references relating to the above.
Beavers, A. S., Lounsbury, J. W., Richards, J. K., Huck, S. W., Skolits, G. J., & Esquivel, S. L. (2013). Practical considerations for using exploratory factor analysis in educational research. Practical Assessment, Research, and Evaluation, 18(1), 6.
Briggs, S. R., & Cheek, J. M. (1986). The role of factor analysis in the development and evaluation of personality scales. Journal of Personality, 54(1), 106–148. https://doi.org/10.1111/j.1467-6494.1986.tb00391.x
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10, 1–9.
de Winter, J., Dodou, D., & Wieringa, P. A. (2009). Exploratory factor analysis with small sample sizes. Multivariate Behavioral Research, 44(2), 147–181. https://doi.org/10.1080/00273170902794206
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. https://doi.org/10.1037/1082-989X.4.3.272
Gaskin, C. J., & Happell, B. (2014). On exploratory factor analysis: A review of recent evidence, an assessment of current practice, and recommendations for future use. International Journal of Nursing Studies, 51(3), 511–521. https://doi.org/10.1016/j.ijnurstu.2013.10.005
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191–205. https://doi.org/10.1177/1094428104263675
Matsunaga, M. (2010). How to factor-analyze your data right: Do’s, don’ts, and how-to’s. International Journal of Psychological Research, 3(1), 97–110. https://doi.org/10.21500/20112084.854
Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2(1), 13–43. https://doi.org/10.1207/S15328031US0201_02
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), 432–442. https://doi.org/10.1037/0033-2909.99.3.432
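Parallel analysis, recommended above, is straightforward to sketch: retain a component only if its eigenvalue exceeds the average eigenvalue obtained from random data of the same shape. The data below are simulated with one built-in factor (loadings and sample size are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 300, 6

# Simulated responses: six items loading ~0.7 on a single common factor
factor = rng.normal(size=(n, 1))
data = 0.7 * factor @ np.ones((1, k)) + rng.normal(size=(n, k))

def sorted_eigenvalues(x):
    """Eigenvalues of the item correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

observed = sorted_eigenvalues(data)

# Parallel analysis: average eigenvalues of many same-sized random data sets
random_eigs = np.array(
    [sorted_eigenvalues(rng.normal(size=(n, k))) for _ in range(200)]
)
threshold = random_eigs.mean(axis=0)

n_factors = int(np.sum(observed > threshold))
print(n_factors)   # only the real factor's eigenvalue beats random data
```

Production versions typically use the 95th percentile of the random eigenvalues rather than the mean, but the logic is identical.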
Question
I have a sample of 138 observations (cross-sectional data) and am running an OLS regression with 6 independent variables.
My adjusted R2 always comes out negative, even if I include only one independent variable in the model. All the beta coefficients, as well as the regression models, are non-significant, and the value of R2 is close to zero.
My queries are:
(a) Is a negative adjusted R2 possible? If yes, how should I justify it in my study, and are there any references that can be quoted to support my results?
(b) Please suggest what I should do to improve my results. It is not possible to increase the sample size, and I have already checked my data for inconsistencies.
@David Can you suggest any book or published research paper I can refer to? I couldn't find any authentic source on Google that can be cited to support my results.
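Independently of a citation, the defining formula shows directly why a negative adjusted R2 is possible whenever R2 is smaller than the degrees-of-freedom penalty:

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With n = 138 and k = 6 predictors (the numbers from this question),
# an R2 near zero is overwhelmed by the penalty and goes negative:
print(adjusted_r2(0.02, 138, 6))   # below zero
print(adjusted_r2(0.30, 138, 6))   # clearly positive
```

So a negative adjusted R2 is simply the penalized statistic telling you the predictors explain essentially nothing, which matches the non-significant coefficients reported.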
Question
Hello,
I am currently running a binary logistic regression on my data in SPSS. However, I want SPSS to provide me with the AIC (Akaike's Information Criterion) as well. Can anyone help me out with this and provide me with the steps that I need to take in SPSS, as I am not very experienced with SPSS.
Thank you!
Hi Tom,
NOMREG will provide this information in SPSS. Your results will be identical to SPSS Logistic Regression as long as your dependent variable is dichotomous. See the following technical note from SPSS:
Sach
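As noted, NOMREG reports AIC directly. If you stay with the binary Logistic Regression dialog, AIC can also be computed by hand from the "-2 Log likelihood" value in the Model Summary table; the numbers below are hypothetical:

```python
# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
def aic(neg2_log_likelihood, n_params):
    return neg2_log_likelihood + 2 * n_params

# SPSS binary logistic regression prints "-2 Log likelihood" directly,
# so it can be plugged in as-is (values below are hypothetical)
neg2ll = 210.4   # from the Model Summary table
k = 5            # intercept + 4 predictors
print(aic(neg2ll, k))   # 210.4 + 2*5 = 220.4
```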
Question
Hello,
I am currently analyzing data for my master's thesis.
One question was a ranking question where participants ranked 4 different options based on their perceived effectiveness.
In SPSS I now have 4 different variables for each participant: variable 1 shows the first-ranked option, variable 2 the second-ranked option, etc.
How can I analyze this data to see which option was overall perceived as most effective, second most effective, third, and least effective?
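One simple descriptive approach is to compare mean ranks across the options, after restructuring the data so that each option has its own rank variable (rather than each rank position holding an option name). The data below are hypothetical; a Friedman test could follow as the inferential check:

```python
import pandas as pd

# Hypothetical restructured data: one column per option, holding the rank
# (1 = most effective) each of five participants assigned to it
ranks = pd.DataFrame({
    "option_A": [1, 2, 1, 1, 3],
    "option_B": [2, 1, 3, 2, 1],
    "option_C": [3, 4, 2, 4, 2],
    "option_D": [4, 3, 4, 3, 4],
})

# Lower mean rank = perceived as more effective overall
mean_ranks = ranks.mean().sort_values()
print(mean_ranks)   # option_A best, option_D worst for these data
```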
Question
I have 2 independent groups with approx 40 variables, measured over 4 time periods.
Outcomes are not required to be normal, and neither are predictors; the only normality requirement concerns the residuals.
What matters is the type of the outcome variable: first the categorical vs. continuous distinction, and then the variation within each.
ALT, for example, is a continuous variable and may be skewed. Follow-up day is a count, so its link function and modelling will differ from those for ALT.
Question
Hi, I am running a path analysis with latent variables. My model fit indices are good, but some of the factor loadings are negative, and some of the standardized estimates exceed 1: for example, the loading of chemical on N2O is 1.60 and that of topographical on N2O is -1.03.
Is it alright to have negative loadings in the attached path diagram? How can I correct it?
Thanks
Hello Waqar,
beyond what Christian already said, I have some trouble understanding the latent variables. It seems that in your model these are categories rather than underlying common causes.
If a (measurement) model is substantially misspecified, you'll get bogus effect estimates.
HTH
--Holger
Question
Hi there,
I am currently investigating whether male nurses are more likely than female nurses to be brought before a fitness-to-practice hearing. I have tried a one-sample binomial test in SPSS, but it tests against an expected 50/50 gender split, whereas the actual proportion of male nurses is 11%. Given that, is it possible to test whether the proportion in the sample differs significantly from 11% with a binomial approach, or is another test needed?
Regards
TK
Why not use a z test to compare the two proportions? This approach takes the sample size of each group into account.
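Alternatively, the exact binomial test can be run against the true null proportion of 0.11 rather than 0.5. A minimal sketch using only the standard library; the counts (40 male nurses out of 200 hearings) are hypothetical:

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): upper-tail probability."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical numbers: 40 of 200 hearings involved male nurses; the null
# proportion is 0.11 (share of male nurses in the workforce).
p_value = binom_sf(40, 200, 0.11)  # one-sided: more males than expected?
print(p_value)
```

In SPSS the same test is available by changing the test proportion of the one-sample binomial test from 0.5 to 0.11.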
Question
Hello,
I'm researching the effect of a self-compassion intervention on well-being. There are 2 groups (intervention and control) and 3 time points (pre, post and follow-up).
Since I have multiple dependent variables (life satisfaction, positive affect, psychological well-being, optimism, negative affect, depression, stress) I wanted to run a Mixed MANOVA instead of a Mixed ANOVA but can't seem to find how to include multiple dependent variables in Mixed Models in SPSS.
Is this the correct study design and is it possible to run the Mixed MANOVA in SPSS or do I have to run multiple Mixed ANOVAS?
Bruce Weaver, thank you for bringing up Thom Baguley's post in the other thread! These are really crucial points, and they are why I do not really like the MANOVA approach. I cannot come up with an example from my area where I am interested in the compound variable in the first place rather than the univariate analyses.
Question
I have a file of lab data in which every patient has multiple values from measurements taken on different days. It looks like this:
BP1 BP2 BP3 BP4 BP5 date1 date2 date3 date4 date5 (BP1 was measured on date1, etc.)
I'm interested in the date on which the lowest BP value was measured. I have been able to find the lowest BP value and which variable holds it; for example, for a specific patient the lowest BP was 75 and that value is in BP3. But I want SPSS to copy the date from date3 to a new variable whenever BP3 holds the lowest value, so that I end up with a list of the dates on which the lowest values were measured.
Hello Thom,
SPSS offers a simple function, MIN, which captures the minimum value from a set of variables, e.g., COMPUTE lowest_value = MIN(var1, var2, var3, var4, var5). To pull out the matching date, you can then pair the BP and date variables in a DO REPEAT loop and copy the date wherever the BP equals that minimum.
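The same "find the minimum, then take its paired value" logic in a language-neutral sketch (Python, with hypothetical readings):

```python
# Hypothetical record: paired BP readings and measurement dates.
bp = [80, 92, 75, 88, 90]
dates = ["2021-01-03", "2021-02-10", "2021-03-15", "2021-04-20", "2021-05-02"]

# Index of the lowest BP, then the date paired with that index
i_min = min(range(len(bp)), key=bp.__getitem__)
lowest_date = dates[i_min]
print(lowest_date)  # -> 2021-03-15
```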
Question
How do I know through SPSS whether the data I collected follow a normal distribution? And if they do not, how do I transform them so that parametric tests can be performed?
Here is an easy and helpful video on checking in SPSS whether collected data are approximately normally distributed:
(Normality test using SPSS: How to check whether data are normally distributed)
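Besides formal tests, skewness and kurtosis are often inspected directly (SPSS reports both in its descriptive output). A rough population-moment sketch; the data values are hypothetical, and one common rule of thumb treats coefficients within about +/-2 as showing no severe departure from normality:

```python
import math

def skewness(x):
    """Population skewness (third standardized moment)."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return sum((v - m) ** 3 for v in x) / (n * s ** 3)

def excess_kurtosis(x):
    """Population excess kurtosis (fourth standardized moment minus 3)."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 4 for v in x) / (n * s2 ** 2) - 3

# Hypothetical sample
data = [4.1, 5.0, 4.8, 5.2, 4.9, 5.1, 4.7, 5.3, 4.6, 5.0]
print(round(skewness(data), 2), round(excess_kurtosis(data), 2))
```

(Note that SPSS uses small-sample-corrected formulas, so its coefficients will differ slightly from these population moments.)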
Question
SPSS software training: looking for recommendations for online training, specifically for legal (EU law) analysis.
Dear Dunja Duic ,
Question
Hi
I have 4 IV (categorical data) - age, gender, ethnicity, job position
and 1 DV (continuous data) - muscle strength
What is the most suitable statistical analysis? I want to find which factor (age, gender, ethnicity, job position) contributes the most to muscle strength.
Thanks!
Timo Van Canegem is totally right. In any case, age can be used as a continuous variable. As for dummy binary variables (any categorical variable with p categories can be decomposed into p yes/no variables), they behave the same way as continuous variables in regression modelling, as aptly demonstrated by Cox (see attached)
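To make the dummy-coding idea concrete, here is a small hypothetical sketch that expands a p-category variable into p-1 indicator columns relative to a reference category (the job titles are invented):

```python
def dummy_code(values, reference):
    """Expand a categorical variable into 0/1 indicator columns,
    leaving out the reference category."""
    levels = sorted(set(values) - {reference})
    return [{lvl: int(v == lvl) for lvl in levels} for v in values]

# Hypothetical job-position data, "nurse" as the reference category
jobs = ["nurse", "admin", "nurse", "manager"]
print(dummy_code(jobs, reference="nurse"))
```

Each non-reference category gets its own column; a row of all zeros identifies the reference category, whose effect is absorbed into the intercept.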
Question
I am trying to measure participants' level of awareness of a certain topic (obtained through a Likert scale, with total scores categorized as Low, Moderate, or High).
Your original scale seems to provide the equivalent of an interval-level dependent variable, so you can use ordinary regression. If for some reason you do need to break your continuous dependent variable into three categories, then ordinal logistic regression is appropriate -- but note that going from a continuous variable to a three-category variable involves a considerable "loss of information."
Question
I'm doing research about The changes in dietary habits among undergraduate students before and during COVID-19. My research objective is to determine dietary habits before and during COVID-19. I want to know if there are possible changes between the two periods. Does anyone know how to determine this using logistic regression in SPSS?
Several things: 1. Multivariate means more than one DV. 2. For your question: add a 0/1 dummy as an IV in your regression, with 0 for the pre period and 1 for the during period. If this new regression coefficient is significantly different from zero, there is a difference between the two periods. Best wishes, David Booth
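To see why the dummy works, note that in a model containing only the period dummy, its coefficient equals the difference in group means; the dietary-habit scores below are hypothetical:

```python
# In a regression y = b0 + b1 * dummy, the coefficient b1 of the 0/1
# period dummy equals mean(during) - mean(before).
before = [3.1, 2.8, 3.4]   # hypothetical scores, pre-COVID (dummy = 0)
during = [2.5, 2.2, 2.9]   # hypothetical scores, during COVID (dummy = 1)

b1 = sum(during) / len(during) - sum(before) / len(before)
print(round(b1, 2))  # -> -0.57
```

The regression framework then lets you test this difference while adjusting for other predictors.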
Question
Hello RG researchers,
I am a bit confused due to different questions and comments.
Well, I have a single factor containing 11 items (Likert rating). For the EFA, I am using SPSS (maximum likelihood) and I use lavaan and Amos for the CFA. I've got three questions:
1. The KMO and Bartlett's test criteria are met, while the normality tests (Kolmogorov-Smirnov and Shapiro-Wilk) are not (both are significant). Can I proceed with the EFA, or do I need to apply Satorra-Bentler or Yuan-Bentler corrections (and if so, which software should I use)?
2. Should I check normality for each item, or is checking the variable's normality enough?
3. For divergent validity, I use two other variables besides my main questionnaire. Do they also need to be normally distributed?
Sara
1. To test normality, I recommend interpreting the skewness and kurtosis coefficients instead of statistical tests. In this case, if there are normality problems, parameter estimations can be made with unweighted least squares in SPSS.
2. Multivariate normality test is sufficient.
3. Depends on the statistic to be used. If normality is an assumption of the statistic to use, yes.
Good luck
Question
How do we calculate predictive R-squared using SPSS to check whether a regression is overfitting?
Bruce Weaver, sorry, I didn't connect predictive R**2 with the PRESS statistic. Mea culpa. Thanks for setting it right. Best, David Booth
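For reference, predictive R^2 is built from the PRESS statistic, which for linear regression can use the leave-one-out shortcut e_i / (1 - h_ii) instead of refitting the model n times. A minimal sketch for simple linear regression on hypothetical data:

```python
def predictive_r2(x, y):
    """Predictive R^2 = 1 - PRESS / SST for simple linear regression,
    using the leave-one-out residual shortcut e_i / (1 - h_ii)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    press = sum(((yi - (b0 + b1 * xi))
                 / (1 - (1 / n + (xi - xbar) ** 2 / sxx))) ** 2
                for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - press / sst

# Hypothetical, nearly linear data: predictive R^2 should be high
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3]
print(round(predictive_r2(x, y), 3))
```

A predictive R^2 far below the ordinary R^2 is the classic sign of overfitting.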
Question