Questions related to Regression
I have a sample of 138 participants. Only 6 of them reported living alone (4.3%), while the remaining 132 share a household with others (family/partner/housemate, etc.)
I am trying to decide whether I can add "living alone" as a dichotomous variable in my hierarchical regression. What worries me is the very low percentage of individuals living alone in the sample. Do you think this would be problematic?
Thank you so much for your answer!
For instance, when using OLS, the objective of the analysis could be to determine the effect of A on B. Could this kind of objective hold when using threshold regression?
This is the final model.
Log(odds of discontinuing exclusive breastfeeding) = -4.259 + 0.850 × superior support + 0.802 × sufficient duration to express breastmilk.
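To sanity-check a fitted logit equation like this, the log-odds can be converted back to a probability. A minimal sketch, assuming both predictors are 0/1 indicators (an assumption, since their coding is not stated in the question):

```python
import math

def predicted_probability(superior_support, sufficient_duration):
    """Convert the fitted log-odds to a probability (hypothetical 0/1 predictors)."""
    log_odds = -4.259 + 0.850 * superior_support + 0.802 * sufficient_duration
    return 1 / (1 + math.exp(-log_odds))

# Baseline: neither factor present
p0 = predicted_probability(0, 0)
# Both factors present
p1 = predicted_probability(1, 1)
print(p0, p1)
```

The large negative intercept means the baseline probability of discontinuing is small, and each positive coefficient multiplies the odds by exp(b).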
I have 27 features and I'm trying to predict continuous values. When I calculated the VIF (Variance Inflation Factor), only 8 features have values less than 10, and the remaining features range from 10 to 250. Therefore, I am facing a multicollinearity issue.
My work is guided by two aims:
1- To predict the values using regression algorithms (ML models).
2- To determine the importance of features (interpreting the ML models).
A variety of machine learning algorithms have been applied, including Ridge, Lasso, Elastic Net, Random Forest Regressor, Gradient Boosting Regressor, and Multiple Linear Regression.
Random Forest Regressor and Gradient Boosting Regressor show the best performance (lowest RMSE) while using only 10 features (out of 27), selected based on the feature importance results.
As I understand it, if I face multicollinearity issues, I can address them using regularized regression models like Lasso. When I applied Lasso to my model, the evaluation results were not as good as those of Random Forest Regressor and Gradient Boosting Regressor. However, none of my coefficients became zero when I inspected the feature importances.
Moreover, I want to analyse which feature is affecting my target value and I do not want to omit my features.
I was wondering if anyone could help me determine which of these algorithms would be good to use and why?
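For anyone wanting to reproduce the diagnostic step above, the VIF of each feature is just 1/(1 − R²) from regressing that feature on all the others. A numpy-only sketch on simulated data (the thresholds and data here are illustrative, not the questioner's):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n x p)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly collinear columns plus one independent column
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a + 0.01 * rng.normal(size=100), rng.normal(size=100)])
v = vif(X)
print(v)  # first two VIFs are huge, third is near 1
```

Note that tree ensembles (Random Forest, Gradient Boosting) are largely insensitive to multicollinearity for prediction, but their feature importances split credit among correlated features, which matters for the interpretability aim.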
For my master's thesis I am conducting research to determine the influence of certain factors (ghost games, lack of fan interaction, esports) on fan loyalty. For my statistical analyses, I will first conduct confirmatory factor analysis to validate which items (e.g., purchased merchandise) belong to which latent factor (e.g., behavioral loyalty).
However, I am unsure about my next step. Can I use multiple linear regression with my latent variables to identify the relationship between the factors and loyalty? The data was collected through a survey of mainly 7-point Likert-scale questions. Can I use linear regression, or is ordinal regression a must with Likert-scale data?
Thanks in advance for answering!
The correlation between team purpose/trust and work-family synergy is .10 (ns) in my sample (N = 319). In my AMOS structural model, the standardized regression coefficient for these variables is -.23 (p < .001). How do I explain this apparent anomaly?
I want to measure the development during conflict of two countries and ultimately compare the results, however, I am struggling to determine an appropriate model. My data is annual and I have the data for both my dependent variable and independent variables for both countries.
How do I know if it is panel data or time-series data? I personally think it may be simple time-series data, would I then be able to use ARIMA or ARMA models for the regression?
I am running regressions on one country's interest rate spreads vs another, and I am thinking of adding inflation rates as an additional independent variable.
Do I have to transform these into logarithms? And which is more accurate: log (base 10) or natural log?
Hi - I would be grateful for some advice. I have conducted a regression analysis using HP and OP as predictor variables of overall affect.
A non-significant model was noted, F(2, 109) = 2.13, p = .124. The model explains 2.0% of the variance in post-playing/singing overall affect (adjusted R² = .020).
However, looking at the regression coefficients, I have the results illustrated in the attachment.
I'm confused as to how I can have a non-significant model when one of the predictor variables is significant?
My data has some outliers, so I preferred to use quantile regression.
Do I have to remove the outliers from the data before running the quantile regression?
Please let me know
thanks in advance
As part of my research, I have to analyse 10 years of time-series financial data for four public limited companies through multiple linear regression. First I analysed each company separately. The adjusted R-squared value is above 95% and the VIFs are well within limits, but the Durbin-Watson statistic is 2.4, 2.6, 1.1, etc., which signifies either positive or negative autocorrelation. Then I tried the model with the combined data of all four companies. This results in a much lower adjusted R-squared value (35%) and again positive autocorrelation (Durbin-Watson of 0.94). As I am doing a DuPont analysis, where the dependent variable is Return on Equity and the independent variables are Net Profit Margin, Total Asset Turnover and Equity Multiplier, which are fixed, I cannot change the independent variables to reduce the autocorrelation. Please suggest what I should do.
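The Durbin-Watson statistic can be computed directly from the residuals; a minimal numpy sketch on simulated residuals (not the questioner's data) shows why values near 2 indicate no first-order autocorrelation while values near 0 indicate positive autocorrelation:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no first-order autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
e = rng.normal(size=5000)

# Independent residuals give a value near 2
dw_iid = durbin_watson(e)

# Positively autocorrelated (AR(1), rho = 0.8) residuals push it toward 0
ar = np.empty(5000)
ar[0] = e[0]
for t in range(1, 5000):
    ar[t] = 0.8 * ar[t - 1] + e[t]
dw_ar = durbin_watson(ar)

print(round(dw_iid, 2), round(dw_ar, 2))
```

With the AR(1) approximation DW ≈ 2(1 − ρ), a DW of 0.94 corresponds to residual autocorrelation around 0.5, which is usually handled by adjusting the standard errors (e.g., HAC) or modeling the error dynamics rather than by changing the regressors.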
Generally, when it comes to assessing the performance of an ANN, the most reliable approach is using a test set. However, as you progress further into the future there will be no more test data to use, as your model will be providing the estimated values. In this regard, if you want to retrain your ANN using newly incoming data, you cannot test the adjusted model anymore. On what basis should you choose between two trained models without a test set: validation error, loss, etc.? I look forward to any suggestions.
Hi my peers,
I have a study with panel data from 2009 to 2018 for 104 companies (1,040 observations). My study examines a causal relationship. I need to highlight the temporal effect on the regression results... How can I do that?
I tried two methods but each produced different result to some extent.
The first was creating year dummies through the command tab year, gen(yr_), which produces ten year dummies.
The second method was using i.year, which produced only 9 dummies, leaving out the base year, i.e., 2009. The regression results are to a great extent unchanged, but fewer years are significant with the i.year specification.
I don't know which of these is correct to follow, or whether there is another way to show the temporal effects on the study findings.
Thank you in advance for your help
Given an independent variable (x) and two moderating variables (M1 and M2), if we want to plot a three-way interaction diagram, which interaction terms need to be computed? After obtaining the estimated parameters of these interaction terms, which software or plug-in can be used to draw the three-way interaction diagram?
Thanks & Regards
Thank you for your help in advance.
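For a three-way interaction, the model needs all lower-order products (x·M1, x·M2, M1·M2) plus the three-way product x·M1·M2. The plotted lines are then the "simple slopes" of x at chosen moderator values. A small sketch with made-up coefficients (all values here are hypothetical, just to show the arithmetic behind the diagram):

```python
# Hypothetical coefficients for the x-related terms of a model that also
# contains the intercept and the main effects of M1 and M2
b = {"x": 0.5, "x:M1": 0.2, "x:M2": -0.3, "x:M1:M2": 0.1}

def simple_slope_of_x(m1, m2):
    """Slope of x at chosen values of the moderators M1 and M2."""
    return b["x"] + b["x:M1"] * m1 + b["x:M2"] * m2 + b["x:M1:M2"] * m1 * m2

# Evaluate at +/- 1 SD of each moderator (assuming standardized moderators);
# these four slopes are the four lines of the usual three-way interaction plot
for m1 in (-1, 1):
    for m2 in (-1, 1):
        print(f"M1={m1:+d}, M2={m2:+d}: slope of x = {simple_slope_of_x(m1, m2):.2f}")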
I am working on a journal revision. The reviewers ask me to do a mixed-procedures analysis because my experiment was a multiple-period task in which a participant repeated a task over several periods, and all periods' observations were used in the analysis.
The reviewers also provided a reference for rerunning the analysis. When I was reading the reference paper, the results table is reported as in the picture attached.
My question is how to conduct an ANOVA or mixed procedure and report results similar to the attached table. Specifically, do I need to conduct two ANOVA analyses, one for between-subjects and one for within-subjects? Or is one analysis enough? If so, how do I find the two error terms (one for between-subjects and one for within-subjects)?
BTW, I use SPSS.
Thank you very much for your help!
My dependent variable has four categories, and I already have the regression results from mprobit. I then ran the syntax:
mfx, predict(p outcome(4)) varlist(_all), but I got results with values like 0.00000083. Is this result fine? Or is there another command in Stata?
Please help me!!!
I need to use regression models for my research. I used SPSS for linear regression, but I want to fit univariate and multivariate power regression models such as:
a,b,c: model parameters
Y: dependent variable
X,Z: independent variables
Is there any user friendly statistical software to do it?
(I know about SAS and R, but I think they require programming to do regression.)
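Assuming the power model has the usual form Y = a·X^b·Z^c (an assumption, since the equation itself is in the attachment), taking logs turns it into a linear regression that any OLS tool can fit. A numpy sketch on simulated data:

```python
import numpy as np

# Simulate data from Y = a * X^b * Z^c (hypothetical true parameters)
rng = np.random.default_rng(2)
X = rng.uniform(1, 10, 200)
Z = rng.uniform(1, 10, 200)
a_true, b_true, c_true = 2.0, 1.5, -0.7
Y = a_true * X**b_true * Z**c_true * np.exp(0.01 * rng.normal(size=200))

# log Y = log a + b log X + c log Z  -> ordinary least squares
D = np.column_stack([np.ones_like(X), np.log(X), np.log(Z)])
coef, *_ = np.linalg.lstsq(D, np.log(Y), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]
print(a_hat, b_hat, c_hat)
```

Because the log-transform makes the model linear, SPSS itself can fit it (regress ln Y on ln X and ln Z), with a = exp(intercept).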
I built an SVR model in MATLAB. I want to use this model to find the optimal experimental parameters (not the SVR model parameters), for example using PSO or GA, but I don't know the SVR model's regression function (objective function). Please help me.
I am testing the Fama-French three- and five-factor models for Japan. I have done all the regression part; however, I am struggling with the GRS test.
Can anybody help me with that, please? How can I run the GRS test in Stata?
I know that the command is grstest2, but I do not know how to use it or how to read the results.
Highly appreciate the help
I am using probit regression to model the mortality rate of larvae after exposure to various bacterial isolates, and my total sample number is 16, including one control. I transformed the mortality rates to probits using Finney's table.
In my research, I've made a following equation (with a demographic dummy variable Czech = 1, Dutch = 0): DV * Czech = b0 + b1*Czech + b2*Czech ... b10*Czech.
I can simply interpret b1...b10 for the Czech demographic; however, how can I interpret the variables for the Dutch population (Dutch = 0)? Would I need to use only the intercept? Or is there a mistake in the model equation (should the DV not be an interaction term)?
Thank you for your response!
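The mechanics of a 0/1 dummy can be seen in a tiny simulated example (the group means below are hypothetical): with the dummy coded Czech = 1, Dutch = 0, the intercept is the fitted value for the Dutch group and the dummy coefficient is the Czech-Dutch difference.

```python
import numpy as np

# Hypothetical outcome for Dutch (dummy = 0) and Czech (dummy = 1) respondents
rng = np.random.default_rng(8)
dutch = 5.0 + rng.normal(scale=1.0, size=200)
czech = 6.5 + rng.normal(scale=1.0, size=200)
y = np.concatenate([dutch, czech])
d = np.concatenate([np.zeros(200), np.ones(200)])

# OLS of y on an intercept and the dummy
X = np.column_stack([np.ones_like(d), d])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 equals the Dutch group mean; b0 + b1 equals the Czech group mean
print(round(b0, 2), round(b0 + b1, 2))
```

So the Dutch group is read off the intercept (and the other coefficients at Czech = 0), with no separate "Dutch" terms needed.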
Hi there. I'm a medical student working on survival prediction models, but I've encountered some difficult problems with feature selection. The original sequencing data was huge, so I relied on univariate Cox regression to obtain a subset (subset A), and I'd like to perform Lasso regression to further select features for constructing the final survival prediction model (using multivariate Cox regression). However, subset A was still larger than I expected. Can I further reduce it by limiting the range of the hazard ratio (HR) before the Lasso regression? Or could I apply a Random Survival Forest to obtain subsets from subset A for the final model construction? Is there anything I need to pay special attention to during these processes?
No previous study has used large-scale data from many countries on a topic; can this be a strong rationale for a study? For instance, suppose I argue that because there is no previous cross-national study on a topic, it cannot be said whether it is a worldwide phenomenon or a region-specific one. Also, a cross-national study will give the results stronger generalization power, which might be of interest to international practitioners like the WHO.
- Can these statements be a strong rationale and justification for a study? Please let me know. Thank you.
I have non-stationary time-series data for variables such as Energy Consumption, Trade, Oil Prices, etc and I want to study the impact of these variables on the growth in electricity generation from renewable sources (I have taken the natural logarithms for all the variables).
I performed a linear regression, which gave me spurious results (R-squared > 0.9).
After testing these time series for unit roots using the Augmented Dickey-Fuller test, all of them were found to be non-stationary, hence the spurious regression. However, the first differences of some of them, and the second differences of the others, were found to be stationary.
Now, when I run new linear regressions with the proper order of integration for each variable (in order to have a stationary model), the statistical results are not good (high p-values for some variables and a low R-squared of 0.25).
My question is: how should I proceed now? Should I change my variables?
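The intuition behind the unit-root testing described above can be reproduced with the basic Dickey-Fuller regression (Δy on lagged y, without the augmentation lags that statsmodels' adfuller adds): the slope sits near 0 under a unit root and is clearly negative for a stationary series. A numpy-only sketch on simulated series:

```python
import numpy as np

def df_coefficient(y):
    """Slope of Delta y_t on y_{t-1} (Dickey-Fuller regression, no lags).
    Near 0 suggests a unit root; clearly negative suggests stationarity."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    lag = y[:-1]
    D = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(D, dy, rcond=None)
    return beta[1]

rng = np.random.default_rng(3)
e = rng.normal(size=2000)
random_walk = np.cumsum(e)   # unit root: non-stationary
stationary = e               # white noise: stationary

print(df_coefficient(random_walk))  # close to 0
print(df_coefficient(stationary))   # strongly negative
```

Note that differencing discards the long-run relationship; if the levels are cointegrated, a cointegration framework (e.g., Engle-Granger or an error-correction model) keeps that information instead of throwing it away.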
I am working on nonlinear monetary policy rules using time-series analysis, and I need to estimate a threshold variable and then use it in a threshold regression.
I want to center (subtract the mean from) my data in order to bring my intercept as close as possible to zero. However, as I also have categorical data, which I don't know how to center, I can't seem to lower my intercept.
The thing is that my dummy variables drive up the intercept... so I don't know how to fix the issue.
When I enter only continuous variables that are centered, the intercept is close to zero.
I thank you in advance!
I am looking for suggestions on statistical methods for the following study design.
- We have one control group and four treatment groups of mice.
- Sample size is 6 mice per group. The number of variables (i.e. metabolites) is about 200.
- Each treatment aims to increase the lifespan of mice. However, we do not know how long each mouse lived. **We only know on average how much longer each treatment lives than control.** For example, for treatment group 1, we know on average the mice lived 30% longer than the control, while treatment group 2 80%, etc.
What puzzles me is that this is neither a group comparison (i.e., using ANOVA) nor a regression, because we do not know exactly how much longer EACH mouse lives. Instead, what we know is the average lifespan of each group. Let's say we would like to perform a univariate analysis, namely analyzing variables one by one.
Here are my thoughts so far
- Conduct simple linear regression anyway, i.e., assign each mouse the average lifespan of its treatment group and then perform linear regression. Or
- Use ordinal regression, e.g., the proportional odds model or the baseline odds model, because the groups are ordered by lifespan.
I feel neither is optimal. For the ordinal regression, we are not able to distinguish 1% vs 10% from 1% vs 99%, i.e., we are using the continuous information inefficiently.
Are these valid methods? Or what statistical method would you recommend for this analysis? Thank you!
Many references explain that kriging and GP regression are the same. However, their formulations differ from each other. Can you explain the difference between these two methods from a statistical standpoint?
Actually, I need to compare different kriging methods; for example, simple kriging assumes that the mean of the random function is known and constant, as in GP regression, whereas ordinary kriging assumes the mean of the function is known and changes locally (E(x)).
I want to do a comprehensive study of errors-in-variables methods from both a numerical-analysis and a statistical viewpoint, and compare the results with regression for selected parameter estimation problems in my domain, where errors-in-variables is expected to perform better in terms of accuracy. These problems are of the linear and non-linear regression type. I want to check whether the method under study is an improvement over generalized least squares. I am including multiple factors, such as accuracy, computational efficiency, robustness and sensitivity, under different combinations of stochastic models. What kind of statistical analysis, experimental design, metric or hypothesis test is required for a study of this nature to establish the superiority of one method over another (i.e., to recommend one method over another for a particular class of problems)?
I’m analyzing how multiple emojis change the perception of messages. In the questionnaire, participants had to rate neutral sentences, and the same sentences enriched with one positive, negative or neutral emoji, or with the same emoji twice or three times, on a 7-point Likert scale from very negative to very positive (DV). The IV is the number of emojis used in the sentences (0, 1, 2 and 3). I also want to control for how adding emojis influences the perception of the sentences across gender and digital generations.
To analyze the effect, I conducted an ordinal logistic regression, because the DV is ordinal. I also split the IV to calculate different models contrasting 0 vs 1 emoji, 1 vs 2 emojis and 2 vs 3 emojis. Now I want to control this effect for gender and digital generation.
- Is there another way than splitting the data set, e.g., by gender, and running the regression again on the separate data sets, one containing only men, one only women and one only other genders?
- Is there another method than ordinal logistic regression, to check whether the effects derived from the collected data are significant?
I trust this email finds you well.
I am currently testing a theoretical model using PLS-SEM and SmartPLS. The aim is to assess the influence of certain soft skills (empathy) on client relationship outcomes (e.g., customer loyalty).
I would like to assess the effect of certain control variables on some of my dependent variables.
The issue that I am encountering is the following.
I have 5 control variables:
· Gender (w/ 2 groups: male female)
· Salary (w/ 5 groups: below 20k, 21-30k, 31-50k, 51-70k, +70k)
· Education (w/ 5 groups: no diploma, high school diploma, bachelor, master, PhD)
· Visit frequency (w/ 5 groups: once a year, once every 6 months, etc.)
If I understood things correctly, to yield significant results and interpret findings correctly I will need to use dummy variables, which means that, based on the above, there will be a total of 17 control variables.
Based on your experience, is there a way to simplify the above? I was thinking of reducing the number of groups for each variable. For instance, instead of having 5 salary categories, reduce them to 2, such as:
- Salary: above and below the national average salary.
- Education: those who have at least a master, those who do not.
What do you think of the above?
Of course, if you have any information / resources / material that may allow me to address the above issue, it will be appreciated.
I thank you once more for your assistance and wish you a nice day.
Good day Scholars
I am currently working on a time-series regression with an endogeneity problem. I just want to know whether the Cochrane-Orcutt estimator performs better than the two-stage estimator in a single-equation regression with an endogeneity problem.
Thanks for always being there
I have 3 Likert-scale IVs and 1 DV, which is also Likert-scale data. I want to do a moderation analysis. Can I use regression for this data? Please suggest ways to do it.
I have 60 samples of whole-genome sequence files (BAM, VCF). Now I want to do an SNP analysis (I am trying to find positions that differ between case and control).
I did this with Python and found some positions based on a 70% difference between case and control.
While googling, I found that many researchers use PLINK and other tools for linear (logistic) regression and for finding p-values.
If anybody has notes, information, or links regarding VCF-file-to-PLINK regression, please share.
I also watched some YouTube videos, but they were insufficient.
I have attached some screenshots of my data:
1. A sample of the merged 60-sample VCF file.
2. A sample of a single VCF file.
Thank You all
#plink #vcf #plinkToVcf #SNP #SNP_Regression #WGS
I have a sample of 138 observations (cross-sectional data) and am running an OLS regression with 6 independent variables.
My adjusted R² always comes out negative, even if I include only 1 independent variable in the model. All the beta coefficients as well as the regression models are insignificant, and the value of R² is close to zero.
My queries are:
(a) Is a negative adjusted R² possible? If yes, how should I justify it in my study, and are there any references that can be quoted to support my results?
(b) Please suggest what I should do to improve my results. It is not possible to increase the sample size, and I have already checked my data for inconsistencies.
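Regarding (a): yes, a negative adjusted R² is possible, and the formula shows why. Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), so when R² is close to zero the degrees-of-freedom penalty pushes it below zero. A quick check with the numbers from the question:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors;
    goes negative whenever the penalty outweighs a small R-squared."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The questioner's setup: n = 138, 6 predictors, R2 near zero -> negative
print(adjusted_r2(0.02, 138, 6))

# A model with real explanatory power stays positive
print(adjusted_r2(0.5, 138, 6))
```

It simply indicates the model explains less than a model with no predictors would after the degrees-of-freedom adjustment, which is consistent with the insignificant coefficients reported.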
I am preparing to run some analyses on two variables that seemingly have a nonlinear relationship. I say this because when I fit curves to the scatterplot, R-squared was higher for the cubic equation.
Any tips on this? I'm trying to use X at one time point to predict Y at a later time point. What analysis would be appropriate for this case, given what the scatterplot suggests?
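The comparison described (linear vs cubic fit to a scatterplot) is just polynomial regression, which remains a linear model in the coefficients and so keeps the usual inference machinery. A numpy sketch on simulated data with a cubic relationship (the data here are made up for illustration):

```python
import numpy as np

# Hypothetical data with a cubic X-Y relationship plus noise
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 200)
y = x**3 - x + rng.normal(scale=0.3, size=x.size)

def r2_of_polyfit(x, y, degree):
    """R-squared of a polynomial fit of the given degree."""
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2_lin = r2_of_polyfit(x, y, 1)
r2_cub = r2_of_polyfit(x, y, 3)
print(r2_lin, r2_cub)  # the cubic fit is clearly higher here
```

For the prediction question, regressing later-Y on X, X² and X³ (or on a transformed X suggested by theory) is a standard option; comparing models by adjusted R² or cross-validated error guards against overfitting the curvature.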
I hope someone can help!
I am doing a fractional outcomes regression (logit) for my thesis and can't find information on what assumptions the model makes. I suppose I would need to test that those assumptions are met in my sample in order to conduct the analysis. Furthermore, I wanted to know if there is any possibility of doing a robustness test on such a model.
Additionally, in a fractional outcomes regression, do my independent variables have to be between 0 and 1 as well?
Thank you very much,
I am doing my research on the determinants of bank capital structure, using information on banks’ consolidated balance sheets and income statements from the CapitalIQ database. My original dataset has 1712 observations; after accounting for missing values, it reduces to 1074 observations. I have the following problem: when I run my OLS regressions on the complete-cases dataset with heteroskedasticity- and cluster-robust standard errors (coeftest in R), most of my variable coefficients become insignificant. This does not happen when using the original dataset containing missing values, or the complete-cases one without robust standard errors. Do missing values or attrition affect the heteroskedasticity in the data and therefore the significance of the coefficients when accounting for heteroskedasticity and clustering?
I will be glad to get any input or clarification.
I estimated an autoregressive model in EViews. I got a parameter estimate for one additional variable which I had not included in the model; the variable is labelled 'SIGMASQ'.
What is that variable and how do I interpret it?
I am attaching the results of the autoregressive model.
Thanks in advance.
- What is the difference between Least Square Regression and Robust Regression?
- How can we interpret the results of the regression model in both cases?
- If the variables in the data set have not shown proper correlation, can we use these techniques?
- Any R script references?
I am conducting my dissertation, which involves a quantitative data analysis process, and, as I have never done this before, I need your help to understand the required steps.
The basic model will require three separate regressions to confirm the relationships between constructs. The data has been collected with a Likert-style questionnaire where multiple questions measure each variable.
Before I start running the regressions, which steps should I implement?
I believe I would need to check reliability first: do I compute Cronbach's alpha for each question, or for the items of each variable together?
Similarly for the regression, do I take the Likert average of each variable, or do I test each question separately?
Thank you for your help! If anyone would also be available for a video call it would be of amazing support.
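On the reliability step: Cronbach's alpha is computed per scale, i.e., over the group of items that measure one construct together (not per individual question). A numpy sketch on simulated questionnaire data (the 4-item scale here is hypothetical):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an n_respondents x n_items array (one scale at a time)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 4-item scale: items driven by one common factor plus noise,
# so the scale should be internally consistent (alpha well above 0.7)
rng = np.random.default_rng(6)
factor = rng.normal(size=300)
items = factor[:, None] + 0.5 * rng.normal(size=(300, 4))
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Once each scale's alpha is acceptable, averaging the items of each scale into a single score and regressing on those averages is the common practice the question is circling around.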
I have a lot of data on two independent variables. I am asking how to get a fitted curve to predict the behaviour of the two variables. I wonder if it would be beneficial to use multiple regression modelling. I checked linear regression, but it was not valid.
I am looking into the substitution between accrual-based and real earnings management for my master's thesis. Since there is a joint relation between them, I understand I need to run 2SLS for the simultaneous equations. However, while I am clear from the previous literature about the exogenous variables they use, I am not quite clear on the instrumental variables they use for the first stage of the regressions.
From my understanding, I believe they use accruals (AM) as an instrumental variable for the real earnings management (RM) equation, and RM as an instrumental variable for the AM equation.
I would really appreciate some insights on the soundness of my understanding on this. Also would be great if I could get some ideas on what IV variables to use.
Hi there. I have a question about comparing path estimates. Basically, I have 8 models with the same outcome variable across the 8 models, but different predictors in each model. There are 5 time points of data for both variables in each model. I have constrained the cross-lagged path from the predictor to the outcome variable to be the same at each time point, and so I essentially have one cross-lagged path estimate of interest from each model.
In summary, I have 8 cross-lagged path estimates from 8 different models for the same outcome variable, and I want to compare them. Could you explain how best to go about doing so? I’ve seen two approaches in the literature, neither of which actually seems to be widely used: 1) the Cumming approach of testing for 50% overlap in the confidence intervals of the standardized regression coefficients, and 2) the Clogg et al. (1995) approach of calculating z-scores from the standardized regression coefficients and their standard errors.
Would you recommend either of these? Thanks!
Also, for a more qualitative comparison between path estimates, surely the standardized regression coefficient is superior to the unstandardized one?
I have a balanced panel of firm data, in total 39 firms throughout 21 quarters. These are a subset of S&P100 firms from the period 2013-2020 and were selected based on earnings call transcript availability and respective newspaper coverage. The regression model has cumulative abnormal returns following quarterly earnings calls as the dependent variable and as independent variables: a variable representing economic sentiment, book-to-market, leverage, firm size, EPS surprise and volatility prior to the earnings call.
I know that clustered standard errors rely on asymptotic arguments, therefore it might not be reliable to draw inference on those since the number of clusters along both dimensions is less than the generally recommended number (40-50 approximately). Nevertheless, I could argue that there well might be unobserved components of the error term that clustering would account for.
The case is similar with fixed effects.
So my question is: what is the recommended procedure? Should I just report regression results with and without clustered standard errors and describe the above concerns, or are there any other arguments I could use as to why/why not to cluster and/or use fixed effects in this case? I know the definitions and what these are used for, I'm rather looking for arguments for or against in this specific case.
I am running a binomial regression in R and I'm not getting all my variables back. How best can I solve this?
From the picture, age group 1 and one diagnosis type are missing.
Command for including a trend in a seemingly unrelated regression in Stata.
I ran a regression and I have trouble understanding the interpretation of a coefficient when some of the independent variables are ratios. I ran an FE regression, btw, so I know the interpretation would be something like "for a given fund, as X varies across time by one unit, Y changes by β2 units".
The dependent variable in my regression is excess returns, e.g., "2.5" (or -1.7, etc.). As an independent variable I have, e.g., the members-to-beneficiaries ratio, expressed as "0.77" or "3.5" or even "125.8". So, for a given fund, as the members-to-beneficiaries ratio varies across time by one unit, the excess return changes by, say, -0.345 units. Does this mean that if, in my first example, the ratio changes from 0.77 to 1.77, the excess return changes from 2.5 to 2.155?
Any help on this would be greatly appreciated! Thanks a lot in advance.
I would like to know if it is possible to extract a correlation coefficient (r) from a coefficient of determination (r²). Is there any calculation that allows this?
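For simple (one-predictor) regression the answer is yes: |r| = √r², and the sign is the sign of the slope. (For multiple regression, √R² is the multiple correlation, not a single Pearson r.) A one-liner sketch:

```python
import math

def r_from_r2(r2, slope):
    """Pearson r recovered from r-squared in simple regression:
    magnitude is sqrt(r2), sign comes from the fitted slope."""
    return math.copysign(math.sqrt(r2), slope)

print(r_from_r2(0.64, -1.3))  # -0.8: r2 of 0.64 with a negative slope
```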
In adsorption, a reviewer suggested that I not use linearized models in the age of computers, when it is possible to perform more precise non-linear regression.
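The reviewer's point can be followed with any nonlinear least-squares routine. A sketch fitting the Langmuir isotherm in its original nonlinear form to simulated data (the isotherm choice and the parameter values are illustrative assumptions, not from the question):

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(C, qm, K):
    """Langmuir isotherm q = qm*K*C / (1 + K*C), fitted without linearization."""
    return qm * K * C / (1 + K * C)

# Hypothetical equilibrium data generated from qm = 10, K = 0.5 with 1% noise
rng = np.random.default_rng(7)
C = np.linspace(0.1, 20, 30)
q = langmuir(C, 10, 0.5) * (1 + 0.01 * rng.normal(size=C.size))

params, _ = curve_fit(langmuir, C, q, p0=[5, 1])
print(params)
```

Unlike the linearized (e.g., reciprocal) forms, this fit minimizes error on the original scale, which is exactly the precision gain the reviewer is referring to.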
I have found an association between social distancing attitudes and the willingness to practice social distancing, studied on a Likert-type scale (Never to Always). There is also an association between the type of work respondents have and the same response.
My question is how I can isolate the influence of city from the influence of line of work. I wonder if there is a technique for nominal/ordinal variables similar to adjustment/control variables in regression. I would be interested in anything from AI or statistics, preferably usable in R or Excel.
And I am open to collaboration if anyone wants to co-author or similar.
I am doing a moderated mediation regression in PROCESS (model 7). I have covariates with a lot of levels (e.g., 5) and have created dummies.
Now, when I run the PROCESS model, I get the error: NOTE: Due to estimation problems, some bootstrap samples had to be replaced. The number of times this happened was: 2740.
When I run the model with the categorical variables (no dummies), I do not get the error, but then I do not have dummies. What would be the simplest option for now?
Thanks in advance.
I am doing research on regression test case prioritization with a neuro-fuzzy system.
Are there any suggestions on how I can implement this in MATLAB or on any other platform?
EDIT: According to the literature suggested in the answers, IT IS NOT POSSIBLE, because all of these methods require at least some calibration data, which, in my case, are not available.
I am looking for a technique/function to estimate soil temperature from meteorological data only, for soils covered with crops.
In particular, I need to estimate soil temperature for a field with herbaceous crops at mid-latitudes (north Italy), but the models I found in literature are fitted for snow-covered and/or high-latitude soils.
I have daily values of air temperature (minimum, mean and maximum), precipitation, relative humidity (minimum, mean and maximum), solar radiation and wind speed.
Thank you very much
Hello, I would like to regress the dependent variable GDPpc (GDP per capita growth) on the independent variable GINI (Gini Index). For GDPpc to become stationary, its first difference had to be taken; GINI, however, was already stationary.
I tried two different regressions: one with GDPpc and GINI both in first differences, and one with only GDPpc in its first difference.
The first regression (d1_GDPpc, d1_GINI) yields a really high p-value, while the second (d1_GDPpc, GINI) yields a p-value low enough for the effect to be statistically significant.
My questions are the following:
Is it possible to regress variables in their first difference with variables in their level? Which regression model should I choose?
This question may seem absurd or look like a kind of p-hacking, but let me explain.
I am currently working on mangrove ecosystems, and I have a range of calibrated 14C dates that I use to estimate the elevation of sea level over a certain period (380 to 6850 BP). I performed my regression analysis, obtained my trends, and now I would like to compare my results with those of other studies. I found a good article with lots of data that I could use in my discussion. However, although the authors drew a dot plot with trends and estimated elevation rates of mean sea level (MSL), they did not report confidence intervals for the estimated rates. Since I have the dataset they used, I would like to obtain those intervals to see whether my elevation rates fall within them. The problem is that when I use their data and perform the regressions on the mentioned intervals, I cannot reproduce the trends they found, so I wonder whether there is a way to specify the regression coefficient and let R find the subset of data that fits it.
I also wonder if there is another way to do this, other than with regression analysis and CIs.
Thank you in advance.
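Rather than searching for the subset that reproduces the published coefficient (which would indeed shade into p-hacking), one option is simply to refit the trend on their full dataset and compute the confidence intervals yourself. A sketch in R, where the data-frame and variable names (`published_data`, `msl_elevation`, `cal_age_bp`) are assumptions:

```r
# Linear trend of mean sea level on calibrated age
fit <- lm(msl_elevation ~ cal_age_bp, data = published_data)
summary(fit)                 # slope = estimated rate of MSL change
confint(fit, level = 0.95)   # 95% CI for intercept and rate

# Same thing restricted to one age window
sub <- subset(published_data, cal_age_bp >= 380 & cal_age_bp <= 6850)
confint(lm(msl_elevation ~ cal_age_bp, data = sub))
```

You can then check whether your own rate estimates fall inside these intervals, which is the comparison you actually need.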
I am writing a thesis paper on the relationship between income inequality and economic growth. I want to use educational attainment as a control variable, but it does not have values for some years. Specifically, my paper's time span is from 1961 to 2018, and the variable has missing values for only two years: 1961 and 1963. Will there be any issues with using this variable in the regression, given that only two values are missing?
I've established that the relationship between customer satisfaction and e-WOM intention is U-shaped (quadratic regression). In my model, Triggers moderates this relationship, but I don't know how to test it, because PROCESS in SPSS only handles linear relationships.
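PROCESS is not strictly required here: moderation of a quadratic effect can be tested with an ordinary regression that includes the squared term and its interaction with the moderator. A sketch in R (the variable names `satisfaction`, `triggers`, `ewom` and the data frame `d` are assumptions; the same model can be specified in SPSS by computing the product terms by hand):

```r
d$sat_c <- d$satisfaction - mean(d$satisfaction)  # center the predictor
fit <- lm(ewom ~ sat_c + I(sat_c^2) + triggers +
            sat_c:triggers + I(sat_c^2):triggers, data = d)
summary(fit)  # the I(sat_c^2):triggers coefficient tests whether
              # Triggers moderates the U-shaped effect
```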
I am running a linear regression with three-way interaction in R
lm(A~X*Y*Z), where A is a numerical variable and X, Y, Z are all categorical variables (factors).
X has 5 levels, Y has 2, and Z has 4. Every time I run the regression, the three-way interaction for the last level is missing. For example, if I relevel the Z factor, the last level gets dropped from the three-way interaction.
Coefficients: (8 not defined because of singularities). (This is mentioned in the R output)
I have tried removing the intercept, but it did not make any difference:
lm(A~0+X*Y*Z) or lm(A~X*Y*Z-1) and all other possible combinations.
I need three-way interaction results to make a conclusion about my data.
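"Not defined because of singularities" with a 5 x 2 x 4 factorial usually means that some combinations of X, Y and Z have no observations at all, so the corresponding interaction terms cannot be estimated; removing the intercept cannot fix that. A quick diagnostic sketch (data frame name `d` is an assumption):

```r
# Count observations in every X-by-Y-by-Z cell; any empty cell makes the
# matching three-way interaction coefficient inestimable (NA)
with(d, table(X, Y, Z))

fit <- lm(A ~ X * Y * Z, data = d)
alias(fit)   # lists the aliased (linearly dependent) terms
```

If empty cells are confirmed, collapsing sparse factor levels or collecting observations for those cells is the usual remedy.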
I am creating 5 different regression models to measure a single phenomenon, tuition discounts. Tuition discounts are my independent variable in each regression. The Dependent variables are revenue, SAT scores, and percentage of underrepresented races. Each regression also has control variables such as net tuition price, endowment, total enrollment. My question is: can I use percentage of race as a control variable for my regression measuring the relationship between tuition discounts and SAT scores? Percentage of race is a good predictor for SAT scores so it makes sense to use as a control, but it is a dependent variable in one of my other regressions. It seems a little cannibalistic to use the variable in two places.
Excuse me, I know this may sound silly, but I'm having a hard time finding out how to calculate the CC50 of a compound using Prism. I know I have to transform the concentrations tested into X = log(X), normalize the measured absorbances into percentage values, and then fit a non-linear regression curve, but which equation do I choose when setting the parameters for the non-linear regression?
It's easy to calculate the IC50, since I'm addressing inhibition and there are clear equation options for inhibition, but when I'm trying to assess cytotoxicity, I guess the idea is a bit different, and I can't find anything elucidating this issue. Could anyone help me with this, please?
I use random-effects Tobit regression for my data with 131,511 observations. The coefficient is 0.002, significant at the 5% level. The reviewer thinks the results are not very strong and suggested I use t-statistics adjusted for a large sample. Can anyone give me some comments on this? Thank you very much!
I am using the GMM technique for panel data. I have eight variables in my model. For some variables, the correlation and regression coefficients have different signs: for example, one of my variables has a positive coefficient in the regression model, but in the correlation matrix it shows a negative correlation with the dependent variable. I need to know whether this is a point of concern for the model, and if so, what the solution might be.
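For what it's worth, such a sign flip is not by itself an error: once other regressors are controlled for, a partial (regression) coefficient can legitimately carry the opposite sign from the raw correlation (a suppression effect). A self-contained toy illustration in Python, with made-up numbers:

```python
from statistics import mean

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    M = [XtX[i][:] + [Xty[i]] for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))  # partial pivoting
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, k):
            f = M[r][i] / M[i][i]
            for c in range(i, k + 1):
                M[r][c] -= f * M[i][c]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (M[i][k] - sum(M[i][c] * b[c] for c in range(i + 1, k))) / M[i][i]
    return b

def corr_sign(x, y):
    """Sign of the simple correlation between x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 1 if cov > 0 else -1

# x2 is (nearly) the mirror image of x1, and y depends positively on both
x1 = [0.0, 1.0, 2.0, 3.0]
x2 = [3.0, 2.0, 1.0, 1.0]
y = [a + 5 * b for a, b in zip(x1, x2)]

b0, b1, b2 = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)
# simple correlation of x1 with y is negative, yet b1 is positive
```

Whether the flip is a problem in a given model depends on whether the conditioning makes substantive sense, not on the sign disagreement itself.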
I'm a beginner in the field of econometrics. I'm currently working on a Difference-in-difference regression, and I have a (probably a very basic) question in this regard:
- In what way are the coefficients normalized in a static difference-in-differences regression?
My results are statistically significant, but the parameter estimates are negative. I have attached my results. Both my independent and dependent variables are Likert scales from 1 (strongly disagree) to 5 (strongly agree), and are therefore scored 1-5.
Can anyone please answer? It would be a great help.
For a current research project, I am dealing with a large acquisition dataset which in many cases includes multiple events/acquisitions per firm. The goal is to calculate the Buy-and-Hold Abnormal Returns (BHAR) of the companies' stock for a 2-year period following the event and check for statistical significance of the BHAR in connection with financial and non-financial variables.
When running a regression with all events as stand-alone data points (i.e., counting the same firm multiple times, once per event), I do not get any meaningful results. When, however, I use a pivot function to summarise the results on a per-company basis (i.e., take the average BHAR per company and count each firm only once), I obtain statistically significant regressions.
I have been reading the relevant literature but have not found any views on whether this approach is scientifically permitted. Does anyone have a view on whether averaging event-study results on a per-company basis is scientifically accepted?
My question is this: if I have data on employee satisfaction at time 1 and time 2, and I want to predict say leaving intention at time 2, could response surface analysis with polynomial regression be used?
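That is indeed the standard setup for response surface analysis: regress the time-2 outcome on the second-order polynomial of the two component measures. A sketch in R (the names `sat_t1`, `sat_t2`, `intent_t2` and the data frame `d` are assumptions; both predictors are usually centered first):

```r
fit <- lm(intent_t2 ~ sat_t1 + sat_t2 + I(sat_t1^2) +
            sat_t1:sat_t2 + I(sat_t2^2), data = d)
summary(fit)
# The five slope coefficients define the response surface; its slopes
# along the congruence (x = y) and incongruence (x = -y) lines can then
# be derived from them, e.g. with the RSA package.
```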
If in a multivariate model we have several continuous variables and some categorical ones, we have to change the categoricals to dummy variables containing either 0 or 1.
Now to put all the variables together to calibrate a regression or classification model, we need to scale the variables.
Scaling a continuous variable is a meaningful process, but doing the same with columns containing only 0s and 1s does not seem ideal: the dummies will not have their "fair share" of influence on the calibrated model.
Is there a solution to this?
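One common convention is to standardize only the continuous columns and pass the 0/1 dummies through untouched; their coefficients then read as plain group shifts on the outcome scale. A minimal stdlib sketch with made-up data:

```python
from statistics import mean, stdev

def scale_mixed(rows, continuous_cols):
    """Z-score the continuous columns; pass 0/1 dummy columns through."""
    cols = list(zip(*rows))  # column-major view of the data
    scaled = []
    for j, col in enumerate(cols):
        if j in continuous_cols:
            m, s = mean(col), stdev(col)
            scaled.append([(v - m) / s for v in col])
        else:                      # dummy column: leave as 0/1
            scaled.append(list(col))
    return [list(r) for r in zip(*scaled)]

# column 0 is continuous, column 1 is a dummy
data = [[10.0, 1], [20.0, 0], [30.0, 1]]
out = scale_mixed(data, continuous_cols={0})
# continuous column is now z-scored, dummy column is unchanged
```

Tree-based models and unpenalized OLS do not need scaling at all, and for penalized models some practitioners do scale dummies too, so this is a convention rather than a rule.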
I want to build an empirical forecast model on the basis of hypotheses or other statistical approaches such as ordinary linear regression and OG regression. If someone has material on this, please send it to me. I want to learn the methodology because I am not a statistics student.
I am currently working on a research project with the title "How does internet use affect trust? An intergenerational comparison." I'm essentially running an ordered logit model, with the dependent variable trust being made up of 11 categories. The main independent variable is also split into 4 categories of daily internet use and there are also 4 age generation categories and a range of other controls.
I've run the regression model and done a preliminary analysis using odds ratios. However, I essentially want to estimate the marginal effect of higher internet use, compared to lower internet use, on the probability of a high-trust outcome, P(yi > 5), and whether this varies between generations.
Thank you, and please let me know if any further details are required or if this is unclear!
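If the model is being fit in Stata, one route is `margins` after `ologit`, taking derivatives of the predicted probability of each outcome category; the variable names below (`trust`, `internet`, `generation`, `$controls`) are assumptions standing in for your own:

```stata
ologit trust i.internet i.generation $controls
* marginal effect of internet use on P(trust = 6), by generation
margins, dydx(internet) predict(outcome(6)) over(generation)
* repeat for outcomes 7..10 and sum the effects to obtain the
* effect on P(trust > 5)
```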
I hope all is well.
Thank you very much for your help
I am used to STATA and very new to using R package.
Stata has lasso inference for linear and logistic regression. However, it doesn't have lasso features for Cox regression.
I wonder if I can use R to do lasso inference for a Cox regression model?
I am very new to R and would appreciate it if you could help me with the syntax for my model.
I am sorry that I am very naive in R.
If I were using Stata, I would do the following to produce the Cox model:
1) stset PTIME, failure(PSTATUS)
2) stcox i.sex BMI_TCR COLD_ISCH_KI SERUM_CREAT END_CPRA i.ETHCAT AGE AMIS BMIS i.STEROIDS_MAINT AGE_DON i.DIAB i.dgf
3) estat phtest (to test the proportional-hazards assumption)
I wonder what is the syntax to do the same in R ?
Also, what is the syntax to use this model to perform lasso inference for Cox hazard regression?
Finally, how do I do the post-estimation tests after fitting the lasso for Cox hazard regression?
Thank you very much for your help
Looking forward to hearing back from you
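For reference, the three Stata steps have close analogues in the `survival` package, and penalized (lasso) Cox models are available in `glmnet`. The variable names below are copied from the Stata call, while the data-frame name `dat` is an assumption:

```r
library(survival)
library(glmnet)

# 1)-2) stset + stcox equivalent
fit <- coxph(Surv(PTIME, PSTATUS) ~ factor(sex) + BMI_TCR + COLD_ISCH_KI +
               SERUM_CREAT + END_CPRA + factor(ETHCAT) + AGE + AMIS + BMIS +
               factor(STEROIDS_MAINT) + AGE_DON + factor(DIAB) + factor(dgf),
             data = dat)

# 3) estat phtest equivalent: Schoenfeld-residual test of proportional hazards
cox.zph(fit)

# Lasso for the Cox model: glmnet with family = "cox",
# penalty chosen by cross-validation
x <- model.matrix(~ factor(sex) + BMI_TCR + COLD_ISCH_KI + SERUM_CREAT +
                    END_CPRA + factor(ETHCAT) + AGE + AMIS + BMIS +
                    factor(STEROIDS_MAINT) + AGE_DON + factor(DIAB) +
                    factor(dgf), data = dat)[, -1]
cvfit <- cv.glmnet(x, Surv(dat$PTIME, dat$PSTATUS), family = "cox")
coef(cvfit, s = "lambda.min")  # selected covariates (nonzero coefficients)
```

Note that `glmnet` performs lasso selection and estimation; formal post-selection inference for the Cox setting (the analogue of Stata's lasso inference commands) requires dedicated methods, so the cross-validated coefficients above should be read as a selection step rather than as inference.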
I want to generate a new variable, the standard error of forecasting (stdf), after VAR and/or SVAR. I can easily generate stdp (standard error of prediction) and stdf after regress, and stdp after VAR. What about stdf after VAR? Could anyone please help me?
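After `var`, Stata's forecasting machinery is `fcast` rather than `predict`; as far as I recall, `fcast compute` stores forecast standard errors alongside the point forecasts, which is worth confirming in `help fcast`. A sketch with placeholder variable names `y1` and `y2`:

```stata
var y1 y2, lags(1/2)
fcast compute f_, step(8)
* creates f_y1, f_y2 (dynamic forecasts) together with companion
* variables holding the forecast SEs and confidence bounds
fcast graph f_y1 f_y2
```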
I ran a multiple linear regression model with one dependent variable and four independent variables. The R-squared of the model was very high (95%), but when I used the model to predict some cases, there was a significant difference between the calculated values and my research results.
According to the model, only one predictor has a very large impact on the output, while the other three have minimal impact. (This should not be the case, since they all affect the result.)
Is there anything else I should consider when using regression? How can I improve the influence of the predictors? More data?
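One thing worth checking before collecting more data: if the four predictors are on very different scales, their raw coefficients are not comparable, and one of them can look dominant simply because of its units. Standardized (beta) coefficients, which rescale each slope by sd(x)/sd(y), put them on a common footing. A stdlib sketch (the slope 2.0 and the data are made up):

```python
from statistics import stdev

def standardized_coef(b_raw, x, y):
    """Rescale a raw OLS slope into a standardized (beta) coefficient."""
    return b_raw * stdev(x) / stdev(y)

# toy example: raw slope 2.0, but x varies far less than y
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 14.0, 18.0, 22.0]
beta = standardized_coef(2.0, x, y)  # noticeably smaller than 2.0
```

Checking multicollinearity (VIFs) and out-of-sample error, rather than in-sample R-squared, would also help explain why a 95% R-squared model predicts new cases poorly.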