Pearson's Correlation

This is my first article on my Correlation blog. 

Hope you guys will like it. 😃😃

Correlation coefficients are used to measure how strong a relationship is between two variables. There are several types of correlation coefficient, but the most popular is Pearson's.

Pearson’s correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear regression. The full name is the Pearson Product Moment Correlation (PPMC).

If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.

A correlation coefficient helps us find how strong the relationship between two variables is. It gives both the direction and the strength of the relationship. Pearson's correlation works well when the relationship in the data is linear.

Below is the formula for Pearson's Correlation :-
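
For reference, a common way to write Pearson's correlation for a sample of n paired observations is given here. The symbols (x_i and y_i for the paired values, x-bar and y-bar for their means) are notation assumed for this sketch and may differ from other presentations.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables move together around their means, and the denominator rescales that quantity so the result has no units.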




We can also write the above formula in the form of Z score.
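
As a sketch of the Z score form, assume each value is standardized with its sample mean and its sample standard deviation (the version with n - 1 in the denominator). The symbols z_{x_i}, z_{y_i}, s_x and s_y are notation introduced here for illustration.

```latex
z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \qquad
z_{y_i} = \frac{y_i - \bar{y}}{s_y}, \qquad
r = \frac{1}{n - 1} \sum_{i=1}^{n} z_{x_i}\, z_{y_i}
```

Under that assumption, Pearson's correlation is roughly the average product of the paired Z scores.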




The formula returns a value between -1 and 1, where:

  • 1 indicates a perfect positive linear relationship.
  • -1 indicates a perfect negative linear relationship.
  • A result of zero indicates no linear relationship at all.
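
To make the range of values concrete, here is a minimal Python sketch that computes the coefficient directly from deviations around the mean. The function name `pearson_r` and the small datasets are made up for illustration; a statistics library would normally be used in practice.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the mean (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations (numerator).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (under the square root in the denominator).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                                 # made-up data
r_up = pearson_r(x, [2 * v + 1 for v in x])         # exact increasing line
r_down = pearson_r(x, [10 - 3 * v for v in x])      # exact decreasing line
r_curve = pearson_r(x, [(v - 3) ** 2 for v in x])   # symmetric U-shaped curve

print(r_up, r_down, r_curve)
```

The third case is worth noting: the curve depends on x completely, yet the coefficient comes out as zero because there is no straight-line trend. A zero here means no linear relationship, not necessarily no relationship of any kind.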






  • Key Points from the above image :-

    • A correlation coefficient of 1 means that for every increase in one variable, there is an increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.
    • A correlation coefficient of -1 means that for every increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.
    • Zero means that an increase in one variable is not matched by any consistent increase or decrease in the other. The two just aren’t linearly related.



    Relationship between Pearson's Correlation and Covariance :-
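
Pearson's correlation is the covariance of the two variables divided by the product of their standard deviations: r = cov(X, Y) / (s_X * s_Y). The sketch below illustrates this with the sample versions (n - 1 in the denominator). The helper names and the hours/score data are hypothetical, chosen only to show that covariance changes with the units while the correlation does not.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator (illustrative helper)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def r_from_covariance(xs, ys):
    """Pearson's r as covariance divided by the product of standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

hours = [1, 2, 3, 4, 5]            # hypothetical study time in hours
score = [52, 55, 61, 70, 72]       # hypothetical test scores
minutes = [h * 60 for h in hours]  # same data, different unit

cov_hours = sample_covariance(hours, score)
cov_minutes = sample_covariance(minutes, score)
r_hours = r_from_covariance(hours, score)
r_minutes = r_from_covariance(minutes, score)

# Covariance grows by the unit factor (60x); the correlation stays the same.
print(cov_hours, cov_minutes)
print(r_hours, r_minutes)
```

This is why correlation is often described as a standardized covariance: dividing by the standard deviations removes the units and confines the result to the range -1 to 1.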







    Potential problems with Pearson correlation :-


    Pearson's correlation cannot tell the difference between dependent and independent variables. For example, if you are trying to find the correlation between a high calorie diet and diabetes, you might find a high correlation of 0.8. However, you would get exactly the same result with the variables switched around. In other words, the number alone would equally support the claim that diabetes causes a high calorie diet. That obviously makes no sense. Therefore, as a researcher you have to be aware of the data you are plugging in. In addition, the PPMC will not give you any information about the slope of the line; it only tells you how strong the linear relationship is and in which direction it runs.
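
    Both limitations can be checked with a short Python sketch. The data and the helper names `pearson_r` and `ls_slope` are invented for illustration: swapping the two variables leaves the coefficient unchanged, and making the line ten times steeper changes the fitted slope but not the coefficient.

```python
import math

def _deviation_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of deviation products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    """Pearson's r from deviation sums (illustrative helper)."""
    sxy, sxx, syy = _deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (illustrative helper)."""
    sxy, sxx, _ = _deviation_sums(xs, ys)
    return sxy / sxx

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up values, roughly y = 2x
y_steep = [10 * v for v in y]    # same pattern, ten times steeper

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)           # roles swapped: same coefficient
r_steep = pearson_r(x, y_steep)  # steeper line: same coefficient

print(r_xy, r_yx, r_steep)
print(ls_slope(x, y), ls_slope(x, y_steep))  # the slopes differ by a factor of 10
```

    To learn the slope or to argue for a direction of cause, you need a regression model and knowledge of the study design, not the correlation coefficient alone.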

    Even so, it is one of the most widely used correlation methods in statistics.



