Statistical Analysis

Statistical analysis is an important component of research because it provides tests that help researchers formulate and answer research questions. Using these tests correctly, and drawing accurate conclusions from them, requires careful formulation and analysis: the researcher must choose a test that is appropriate, unbiased, and efficient for the research problem at hand. Choosing the right test also requires an understanding of statistical terms such as independent and dependent variables. This paper discusses several hypothesis tests, namely Pearson correlation, the chi-square goodness-of-fit test, the independent t-test, and the chi-square test of independence. It also explains the difference between correlation and regression.

In statistics, a research question is answered by stating it as a hypothesis and performing an appropriate test, which allows the researcher to draw conclusions about the issue under study. A statistical test can be used effectively to examine relationships in the data only when each variable has been correctly identified as either independent or dependent. The dependent variable is then treated as the outcome variable, while the independent variable is treated as the determining (explanatory) variable. The type of outcome variable, whether continuous or categorical, also determines which tests are appropriate.

Pearson Correlation

According to Gravetter & Wallnau (2010), Pearson correlation analysis is used to measure the linear relationship between two continuous variables, such as a continuous outcome variable and a continuous determining variable. The resulting coefficient indicates both the direction and the strength of that linear relationship. Both variables must therefore be measured on a continuous scale, for example a child's weight-for-age index (the dependent variable) measured against family income (the independent variable).

Additionally, Gravetter & Wallnau (2010) point out that the variables being compared should come from a population with a bivariate normal distribution. This assumption underlies the significance test of Pearson's correlation coefficient, r, which is used to decide whether the observed correlation differs from zero at a given significance level. The value of r ranges from -1 to +1: a value of 0 indicates no linear correlation, while a value of -1 or +1 indicates a perfect negative or perfect positive linear correlation. Because of these requirements, Pearson correlation analysis is appropriate only for continuous data and is not suited to discrete data.
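
As a reference point, the sample coefficient is usually computed with the standard product-moment formula below, and, under the bivariate normality assumption, the null hypothesis of no correlation (ρ = 0) is commonly tested with a t statistic on n - 2 degrees of freedom. These are standard textbook formulas rather than expressions quoted from Gravetter & Wallnau (2010); x̄ and ȳ denote the sample means of the n paired observations.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1

t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad df = n - 2
```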

Gravetter & Wallnau (2010) further explain that the sign of the coefficient gives the direction of the relationship. A positive correlation means that the two variables move in the same direction: as one increases, the other tends to increase, and as one decreases, the other tends to decrease. A negative correlation means that the variables move in opposite directions, so that an increase in one is accompanied by a decrease in the other, and vice versa. The strength of the relationship is given by the size of the coefficient rather than its sign, with values close to -1 or +1 indicating a strong relationship. Moreover, the authors note that even a strong correlation does not signify causality, because correlation describes the nature of a relationship rather than proving cause and effect.
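
To make the sign and size of r concrete, the following minimal Python sketch computes the coefficient directly from its definition. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration; in practice a statistics package would normally be used.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: weekly study hours and two sets of test scores
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # tends to rise with hours
score_down = [70, 66, 60, 57, 50]  # tends to fall as hours rise

print(f"r (scores rising)  = {pearson_r(hours, score_up):.3f}")
print(f"r (scores falling) = {pearson_r(hours, score_down):.3f}")
```

Both hypothetical data sets give coefficients close to 1 in magnitude, one positive and one negative: the sign reports the direction, the magnitude reports the strength, and neither says anything about causation.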

Correlation and Regression

According to Chatterjee & Hadi (2006), correlation is used to determine whether there is a relationship between variables. Regression, by contrast, produces a prediction equation that describes how one variable depends on another, so that the value of the dependent variable can be estimated from the independent variable. Both analyses examine relationships between variables, but their coefficients are on different scales: the correlation coefficient is unit-free and lies between -1 and +1, whereas a regression coefficient is expressed in units of the dependent variable per unit of the independent variable.

Chatterjee & Hadi (2006) further point out that the correlation coefficient, r, is a standardized measure of linear association computed from paired observations, while a regression coefficient measures the rate of change in the dependent variable for a unit change in an independent variable. This makes regression analysis the preferred approach when the researcher wants to keep the continuous dependent variable in its original units and test its association with a combination of continuous and categorical independent variables. The authors describe categorical variables as data that can be divided into groups, such as sex or race. Regression analysis is also preferred over correlation analysis when more than one independent variable is involved, because a linear regression model can use several independent variables together to predict the value of the dependent variable. For instance, regression analysis can be used to predict the total annual income of a businessman (the dependent variable) from several independent variables, such as age. Correlation analysis, by contrast, measures the relationship between only two variables at a time.
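
The difference in scale can be stated precisely. For simple linear regression fitted by least squares, the slope m and intercept c of the line y = mx + c are related to r through the sample standard deviations s_x and s_y and the sample means, and the model extends to k independent variables as shown in the second line. These are standard least-squares results rather than formulas quoted from Chatterjee & Hadi (2006); the subscripted m_1, ..., m_k simply extend the m and c notation used in this paper.

```latex
m = r\,\frac{s_y}{s_x}, \qquad c = \bar{y} - m\,\bar{x}

y = c + m_1 x_1 + m_2 x_2 + \cdots + m_k x_k
```

Because s_y/s_x carries the units of y per unit of x, rescaling a variable (for example, recording income in dollars instead of thousands of dollars) changes m but leaves r unchanged.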

Additionally, Chatterjee & Hadi (2006) point out that once the data are plotted on a graph, a regression line is drawn through them. This line of best fit is usually found by least squares, which minimizes the sum of the squared vertical distances between the points and the line. It is described by the regression equation y = mx + c, where the slope m (the regression coefficient) gives the change in the dependent variable y for a one-unit change in the independent variable x, and the constant c is the y-intercept, the point where the regression line crosses the axis of the dependent variable.
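
A minimal Python sketch of the least-squares calculation for one independent variable follows. The function `fit_line` and the years-in-business and income figures are hypothetical illustrations. The slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept is chosen so that the line passes through the point of means.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the slope is undefined when x has no variation")
    m = sxy / sxx            # change in y for a one-unit change in x
    c = mean_y - m * mean_x  # y-intercept: the line passes through the means
    return m, c


# Hypothetical data: years in business vs annual income (thousands)
years = [1, 2, 3, 4, 5]
income = [32, 35, 41, 44, 50]

m, c = fit_line(years, income)
print(f"y = {m:.2f}x + {c:.2f}")
print(f"predicted income after 6 years: {m * 6 + c:.1f}")
```

With these figures the fitted line is y = 4.5x + 26.9, so each additional year is associated with about 4.5 thousand more in predicted income.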

Chi-Square Goodness-Of-Fit Test and Independent T-Test

According to Winn (2009), the chi-square goodness-of-fit test compares observed values with expected, or theoretical, values in order to measure how well the observed data fit the theoretical distribution. Observed values are obtained empirically through direct observation in the field, while expected values are derived from a particular hypothesis. For instance, in 50 flips of a fair coin a researcher would expect 25 heads and 25 tails, but might actually record 32 heads and 18 tails. A chi-square goodness-of-fit test can then be used to judge whether the difference between the observed and expected counts is too large to attribute to chance, and therefore whether the coin is fair.
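
The coin example can be worked through in a short Python sketch, assuming 32 heads and 18 tails so that the counts sum to the 50 flips. The statistic is the sum of (observed - expected)^2 / expected over the categories, with degrees of freedom equal to the number of categories minus one. The helper names are hypothetical, and the p-value shortcut `math.erfc(math.sqrt(x / 2))` is valid only for exactly one degree of freedom.

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("need matching counts for at least two categories")
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must be equal")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumes no parameters were estimated from the data
    return stat, df


def chi_square_p_1df(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    # A chi-square variable with 1 df is the square of a standard normal Z,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


observed = [32, 18]  # heads, tails (assumed counts from 50 flips)
expected = [25, 25]  # a fair coin: half of the 50 flips each
stat, df = chi_square_gof(observed, expected)
p = chi_square_p_1df(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The statistic of 3.92 exceeds the 0.05 critical value of 3.841 for one degree of freedom (p is about 0.048), so the fair-coin hypothesis would be rejected at the 5% level, though not at the 1% level, where the critical value is 6.635.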

Winn (2009) explains that the independent t-test is used to compare the means of a continuous variable between two independent groups. Winn notes that the two tests differ in several ways. First, the independent t-test deals with continuous variables, whereas the chi-square goodness-of-fit test deals with discrete (count) data. Second, the independent t-test compares the means of two independent groups, whereas the chi-square goodness-of-fit test assesses how well observed data fit a theoretical distribution. Third, the t-test assumes that the observations in each group are approximately normally distributed, and its test statistic follows a t distribution, whereas the goodness-of-fit test makes no normality assumption and its statistic approximately follows a chi-square distribution.
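
For comparison, the independent t statistic can be sketched in Python with the standard-library `statistics` module. This sketch assumes Student's pooled-variance form, which presumes roughly equal variances in the two groups (Welch's t-test is the usual alternative when they differ); the function name and the scores are hypothetical.

```python
import math
import statistics


def independent_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and df."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pool the two sample variances (n - 1 denominators), weighted by their df
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / df
    # Standard error of the difference between the two means
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df


# Hypothetical scores for two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]

t, df = independent_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```

Here the group means differ by 2 and t = -2.0 on 8 degrees of freedom. Because |t| is below the two-tailed 0.05 critical value of about 2.306 for 8 df, these hypothetical groups would not be declared significantly different.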

Winn (2009) notes that an independent t-test should be used instead of a chi-square goodness-of-fit test when the data are continuous and approximately normally distributed and the research question requires comparing the means of two independent groups. Conversely, the chi-square goodness-of-fit test is appropriate when the data are discrete counts in categories, including cases with only two possible outcomes, such as heads and tails in the coin example.

Chi-Square Test of Independence

According to Ryan (2006), the chi-square test of independence is used when two categorical variables are measured on the same population, in order to determine whether the two variables are related. It differs from the chi-square goodness-of-fit test in what it compares. The test of independence compares the observed frequencies in each cell of a contingency table with the frequencies that would be expected if the two variables were independent, whereas the goodness-of-fit test compares the observed frequencies of a single categorical variable with those expected under a hypothesized theoretical distribution. Accordingly, the test of independence uses a contingency table, while the goodness-of-fit test uses a single list of category counts. Both tests, however, are evaluated against the chi-square distribution using the same significance levels, such as 0.05, 0.01, or 0.001; figures such as 0.95 or 0.99 are the corresponding cumulative probabilities (one minus the significance level) shown in some chi-square tables, not a separate set of levels belonging to one of the tests.

The two tests also differ in their null hypotheses, although their assumptions are largely shared. Independence of the two variables is not an assumption of the chi-square test of independence but its null hypothesis, the claim that the test evaluates; the goodness-of-fit test instead takes a hypothesized distribution as its null hypothesis. Neither test assumes normally distributed variables: both analyze category counts, and the chi-square distribution serves only as an approximation to the sampling distribution of the test statistic, which is usually considered adequate when the expected frequency in each category or cell is not too small (a common rule of thumb is at least 5). Both tests also assume that the sampling was done randomly and that the individual observations are independent of one another.

As pointed out by Ryan (2006), categorical data, which call for nonparametric tests, are typically counted and placed into categories to examine the relationship between different groups. When presented with such data, the researcher arranges the frequencies in a contingency table to compare the proportions in each group; in this case, the chi-square test of independence is used. When the researcher instead wants to test whether the distribution of a single categorical variable matches a hypothesized distribution, the chi-square goodness-of-fit test is used.
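
The contingency-table calculation behind the test of independence can be sketched as follows. Each expected count is the row total times the column total divided by the grand total, which is what independence of the two variables would predict, and the degrees of freedom are (rows - 1)(columns - 1). The function and the 2 by 2 counts are hypothetical, and this sketch omits Yates's continuity correction, which some software applies by default to 2 by 2 tables.

```python
def chi_square_independence(table):
    """Chi-square test-of-independence statistic and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are two groups, columns are two response categories
table = [
    [10, 20],
    [20, 10],
]
stat, df = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Every expected count here is 15, giving a statistic of about 6.67 on 1 degree of freedom. This exceeds the 0.01 critical value of 6.635, so these hypothetical counts would indicate that the two variables are related.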

Conclusion

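This paper has shown that Pearson correlation, regression, the chi-square goodness-of-fit test, the independent t-test, and the chi-square test of independence each suit particular types of data and research questions. Researchers need to understand these differences in order to choose the appropriate test, apply it correctly, and draw accurate conclusions from their analyses.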