
Pearson Correlation Coefficient

Revision as of 19:34, 16 July 2019 by User

Pearson’s correlation coefficient is the test statistic that measures the statistical relationship, or association, between two continuous variables. It is widely regarded as the standard method for measuring the association between variables of interest because it is based on covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.[1]


Pearson Correlation Coefficient Calculator[2]

The Pearson correlation coefficient measures the strength of the linear association between two variables: r = 1 means a perfect positive correlation and r = -1 means a perfect negative correlation. For example, you could use it to find out whether people's height and weight are correlated (they are likely to be, since taller people tend to be heavier).

Requirements for Pearson's correlation coefficient

  • Scale of measurement should be interval or ratio
  • Variables should be approximately normally distributed
  • The association should be linear
  • There should be no outliers in the data
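The outlier requirement is easy to underestimate. The sketch below uses made-up numbers: nine points that lie exactly on a line give r = 1, and adding a single extreme point reverses the sign of r. The helper name `pearson_r` and all data values are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares for x
    syy = sum(b * b for b in dy)              # sum of squares for y
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2 * xi + 1 for xi in x]
print(round(pearson_r(x, y), 3))  # 1.0

# Add one extreme point that breaks the pattern.
x_out = x + [10]
y_out = y + [-30]
print(round(pearson_r(x_out, y_out), 3))  # -0.17
```

A single point out of ten is enough to turn a perfect positive correlation into a weak negative one, which is why the data should be screened (for example on a scatter plot) before r is reported.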

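Equation

$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} $$

where $x_i$ and $y_i$ are the paired observations, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of pairs.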
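The equation translates directly into code. This is a minimal sketch, assuming two plain lists of paired numeric observations; the function name `pearson_r` and the sample values are illustrative only. On Python 3.10 and later, the standard library's `statistics.correlation` computes the same quantity.

```python
import math

def pearson_r(x, y):
    """Sum of cross-products of deviations, divided by the square root
    of the product of the sums of squared deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    den_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    if den_x == 0 or den_y == 0:
        # A constant variable has no variation to correlate with.
        raise ValueError("r is undefined when either variable is constant")
    return num / (den_x * den_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Here the cross-products sum to 6 and the sums of squares are 10 and 6, so r = 6 / √60 ≈ 0.7746.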


Values of Pearson's Correlation Coefficient[3]

The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity; the correlation coefficient should not be calculated if the relationship is not linear. For correlation-only purposes, it does not matter on which axis each variable is plotted. Conventionally, however, the independent (or explanatory) variable is plotted on the x-axis (horizontally) and the dependent (or response) variable on the y-axis (vertically). The closer the points lie to a straight line, the stronger the association between the variables. The measurement units used do not matter either. Pearson's correlation coefficient (r) for continuous (interval level) data ranges from -1 to +1:

  • r = -1: perfect negative linear correlation
  • -1 < r < 0: negative correlation
  • r = 0: no linear correlation
  • 0 < r < +1: positive correlation
  • r = +1: perfect positive linear correlation

Positive correlation indicates that both variables increase or decrease together, whereas negative correlation indicates that as one variable increases, so the other decreases, and vice versa.
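The claims above (units and axis choice do not affect r, and reversing the direction of one variable flips its sign) can be checked numerically. This is a sketch with hypothetical height and weight values for six people; `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x, y):
    # Deviation-based formula: cross-products over the square root
    # of the product of the two sums of squares.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg).
height_cm = [152, 160, 165, 171, 178, 185]
weight_kg = [50, 56, 61, 66, 72, 80]
r = pearson_r(height_cm, weight_kg)

# Changing units (cm -> inches, kg -> pounds) leaves r unchanged.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_units = pearson_r(height_in, weight_lb)

# Swapping which variable goes on which axis leaves r unchanged.
r_swapped = pearson_r(weight_kg, height_cm)

# Reversing the direction of one variable flips the sign of r.
r_negated = pearson_r(height_cm, [-w for w in weight_kg])

print(math.isclose(r, r_units), math.isclose(r, r_swapped), math.isclose(r, -r_negated))
# True True True
```

Unit changes are positive rescalings, which cancel between the numerator and denominator of the formula; the formula is also symmetric in x and y.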


A Brief History of Pearson's Mathematical Development of Correlation and Regression[4]

In 1896, Pearson published his first rigorous treatment of correlation and regression in the Philosophical Transactions of the Royal Society of London (Pearson, K. (1896), “Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia”). In this paper, Pearson credited Bravais (1846) with ascertaining the initial mathematical formulae for correlation. Pearson noted that Bravais happened upon the product-moment (that is, the “moment” or mean of a set of products) method for calculating the correlation coefficient but failed to prove that this provided the best fit to the data. Using an advanced statistical proof (involving a Taylor expansion), Pearson demonstrated that optimum values of both the regression slope and the correlation coefficient could be calculated from the product-moment, $\frac{\sum xy}{n}$, where x and y are deviations of observed values from their respective means and n is the number of pairs. For example, in linear regression, if the slope is calculated from the product-moment, then the observed x values predict the observed y values with the minimum possible sum of squared errors of prediction, $\sum (y - \hat{y})^2$.
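The minimization claim can be checked with one line of calculus. The worked sketch below uses the same deviation scores x and y and minimizes the sum of squared errors S(b) for the prediction ŷ = bx. It also makes explicit a condition the text leaves implicit: the least-squares slope equals the product-moment exactly when x is scaled so that Σx²/n = 1 (standardized deviation scores), and when y is standardized as well, that slope is r itself.

```latex
\begin{aligned}
S(b) &= \sum (y - bx)^2, \\
\frac{dS}{db} &= -2 \sum x\,(y - bx) = 0 \;\Longrightarrow\; b = \frac{\sum xy}{\sum x^2}, \\
\frac{d^2 S}{db^2} &= 2 \sum x^2 > 0 \quad \text{(so this } b \text{ is a minimum)}, \\
b &= \frac{\sum xy}{n} \quad \text{when } \sum x^2 = n .
\end{aligned}
```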

A simpler proof than Pearson's for the product-moment method appeared in Ghiselli (1981). Although neither Pearson's nor Ghiselli's proof is likely to enhance the flow of a typical introductory statistics class, a simple numerical re-creation of Ghiselli's proof can illuminate the important point about the optimal prediction of y from x. Such an example appears in Table 1. This table uses pairs of deviation scores to help demonstrate that the expression $\hat{y} = bx$ minimizes squared prediction errors when b is calculated as the product-moment. The first three columns demonstrate the calculation of b as the mean (moment) of the products of the deviation scores. Both x and y vectors are centered on 0 so that each score is itself a deviation score. The rightmost four columns of Table 1 show that adding a small offset to the value of b or subtracting a small offset from b enlarges the sum of the squared errors of prediction. Both Ghiselli's and Pearson's proofs demonstrate more generally that any departure from the product-moment worsens the prediction of y from x.

Table 1.
Numeric Example Demonstrating that Adding or Subtracting an Offset from the Product-Moment Worsens the Prediction of Y from X.
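The same kind of numerical re-creation can be sketched in a few lines. The data below are hypothetical and are not the values in Table 1. Both variables are standardized (centered on 0 with Σx²/n = 1), which is the scaling under which the product-moment b is the least-squares slope; the offset of 0.05 is an arbitrary illustrative choice.

```python
import math

# Hypothetical paired observations (not the values in Table 1).
x_raw = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0]
y_raw = [1.0, 3.0, 6.0, 6.0, 8.0, 12.0]

def standardize(values):
    """Center on 0 and scale so that sum(z**2) / n == 1 (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

x = standardize(x_raw)
y = standardize(y_raw)
n = len(x)

# Product-moment: the mean of the products of the deviation scores.
b = sum(xi * yi for xi, yi in zip(x, y)) / n

def sse(slope):
    """Sum of squared errors when predicting y as slope * x."""
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

offset = 0.05  # arbitrary small offset, chosen for illustration
for slope in (b - offset, b, b + offset):
    print(f"b = {slope:.4f}  SSE = {sse(slope):.4f}")
```

With standardized x, the error sum grows by exactly n times the squared offset in either direction, because the cross term vanishes at the product-moment; the printed rows should show the smallest SSE in the middle.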


Pearson Correlation Coefficient - Practical Uses in Investing[5]

For an investor who wishes to diversify a portfolio, the Pearson coefficient can be useful. Pearson coefficients calculated from the historical returns (often visualized as scatter plots) of pairs of asset classes such as equities-bonds, equities-commodities and bonds-real estate, or of more specific assets such as large cap equities, small cap equities and debt-emerging market equities, can help the investor assemble a portfolio based on risk and return parameters. Note, however, that a Pearson coefficient measures correlation, not causation. If large cap and small cap equities have a coefficient of 0.8, the coefficient alone does not reveal what caused the relatively high strength of association.
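A minimal sketch of that calculation, assuming hypothetical month-end prices for three assets: prices are first converted to simple period returns, and Pearson's r is then computed for every pair of return series. The asset names, prices, and helper functions are all illustrative; real use would require aligned return series of equal length over the same dates.

```python
import math

def pearson_r(x, y):
    # Deviation-based Pearson correlation of two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_returns(prices):
    """Period-over-period returns: p[t] / p[t-1] - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

# Hypothetical month-end prices (illustrative only).
prices = {
    "large_cap": [100, 102, 101, 105, 107, 106, 110],
    "small_cap": [50, 51.5, 50.2, 53.0, 54.6, 53.8, 56.5],
    "bonds": [200, 199, 201, 200, 199, 201, 200],
}
returns = {name: simple_returns(p) for name, p in prices.items()}

# Pairwise correlation matrix, keyed by (asset, asset).
names = list(returns)
matrix = {(a, b): pearson_r(returns[a], returns[b]) for a in names for b in names}

for a in names:
    print(a.ljust(10), " ".join(f"{matrix[(a, b)]:+.2f}" for b in names))
```

Correlating returns rather than raw prices avoids the spurious association that two assets trending upward over the same period would otherwise show.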


See Also

Big Data
Predictive Analytics
Statistical Analysis
Statistics
Data Mining
Data Analysis
Data Analytics
Machine Learning
Quantum Computing
Bayes' Theorem
Decision Tree
Tree Diagram


References

  1. Definition - What Does Pearson Correlation Coefficient Mean? (Statistics Solutions)
  2. Pearson Correlation Coefficient Calculator (Social Science Statistics)
  3. Values of Pearson's Correlation Coefficient (UWE)
  4. A Brief History of Pearson's Mathematical Development of Correlation and Regression (ASA)
  5. The Practical Uses of Pearson Correlation Coefficient in Investing (Investopedia)


Further Reading

  • Pearson Correlation - Online Statistics Education: An Interactive Multimedia Course of Study [Rice University (Lead Developer), University of Houston Clear Lake, and Tufts University]
  • An Introduction to Pearson's Correlation (Stats Tutor): http://www.statstutor.ac.uk/resources/uploaded/pearsons.pdf
  • A comparison of the Pearson and Spearman correlation methods (Minitab)