corr
Linear or rank correlation
Description
[rho,pval] = corr(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntaxes. For example, 'Type','Kendall' specifies computing Kendall's tau correlation coefficient.
Examples
Find Correlation Between Two Matrices
Find the correlation between two matrices and compare it to the correlation between two column vectors.
Generate sample data.
rng('default')
X = randn(30,4);
Y = randn(30,4);
Introduce correlation between column two of the matrix X and column four of the matrix Y.
Y(:,4) = Y(:,4)+X(:,2);
Calculate the correlation between columns of X and Y.
[rho,pval] = corr(X,Y)
rho = 4×4
-0.1686 -0.0363 0.2278 0.3245
0.3022 0.0332 -0.0866 0.7653
-0.3632 -0.0987 -0.0200 -0.3693
-0.1365 -0.1804 0.0853 0.0279
pval = 4×4
0.3731 0.8489 0.2260 0.0802
0.1045 0.8619 0.6491 0.0000
0.0485 0.6039 0.9166 0.0446
0.4721 0.3400 0.6539 0.8837
As expected, the correlation coefficient between column two of X and column four of Y, rho(2,4), is the highest, and it represents a high positive correlation between the two columns. The corresponding p-value, pval(2,4), is zero to the four digits shown. Because the p-value is less than the significance level of 0.05, it indicates rejection of the hypothesis that no correlation exists between the two columns.
Calculate the correlation between X and Y using corrcoef.
[r,p] = corrcoef(X,Y)
r = 2×2
1.0000 -0.0329
-0.0329 1.0000
p = 2×2
1.0000 0.7213
0.7213 1.0000
The MATLAB® function corrcoef, unlike the corr function, converts the input matrices X and Y into column vectors, X(:) and Y(:), before computing the correlation between them. Therefore, the correlation introduced between column two of matrix X and column four of matrix Y no longer appears, because those two columns occupy different sections of the converted column vectors and their elements are no longer paired.
The value of the off-diagonal elements of r, which represents the correlation coefficient between X and Y, is low. This value indicates little to no correlation between X and Y. Likewise, the value of the off-diagonal elements of p, which represents the p-value, is much higher than the significance level of 0.05. This value indicates that not enough evidence exists to reject the hypothesis of no correlation between X and Y.
Test Alternative Hypotheses for Correlation
Test alternative hypotheses for positive, negative, and nonzero correlation between the columns of two matrices. Compare values of the correlation coefficient and p-value in each case.
Generate sample data.
rng('default')
X = randn(50,4);
Y = randn(50,4);
Introduce positive correlation between column one of the matrix X and column four of the matrix Y.
Y(:,4) = Y(:,4)+0.7*X(:,1);
Introduce negative correlation between column two of X and column two of Y.
Y(:,2) = Y(:,2)-2*X(:,2);
Test the alternative hypothesis that the correlation is greater than zero.
[rho,pval] = corr(X,Y,'Tail','right')
rho = 4×4
0.0627 -0.1438 -0.0035 0.7060
-0.1197 -0.8600 -0.0440 0.1984
-0.1119 0.2210 -0.3433 0.1070
-0.3526 -0.2224 0.1023 0.0374
pval = 4×4
0.3327 0.8405 0.5097 0.0000
0.7962 1.0000 0.6192 0.0836
0.7803 0.0615 0.9927 0.2298
0.9940 0.9397 0.2398 0.3982
As expected, the correlation coefficient between column one of X and column four of Y, rho(1,4), has the highest positive value, representing a high positive correlation between the two columns. The corresponding p-value, pval(1,4), is zero to the four digits shown, which is lower than the significance level of 0.05. These results indicate rejection of the null hypothesis that no correlation exists between the two columns and lead to the conclusion that the correlation is greater than zero.
Test the alternative hypothesis that the correlation is less than zero.
[rho,pval] = corr(X,Y,'Tail','left')
rho = 4×4
0.0627 -0.1438 -0.0035 0.7060
-0.1197 -0.8600 -0.0440 0.1984
-0.1119 0.2210 -0.3433 0.1070
-0.3526 -0.2224 0.1023 0.0374
pval = 4×4
0.6673 0.1595 0.4903 1.0000
0.2038 0.0000 0.3808 0.9164
0.2197 0.9385 0.0073 0.7702
0.0060 0.0603 0.7602 0.6018
As expected, the correlation coefficient between column two of X and column two of Y, rho(2,2), is the negative value with the largest magnitude (-0.86), representing a high negative correlation between the two columns. The corresponding p-value, pval(2,2), is zero to the four digits shown, which is lower than the significance level of 0.05. Again, these results indicate rejection of the null hypothesis and lead to the conclusion that the correlation is less than zero.
Test the alternative hypothesis that the correlation is not zero.
[rho,pval] = corr(X,Y)
rho = 4×4
0.0627 -0.1438 -0.0035 0.7060
-0.1197 -0.8600 -0.0440 0.1984
-0.1119 0.2210 -0.3433 0.1070
-0.3526 -0.2224 0.1023 0.0374
pval = 4×4
0.6654 0.3190 0.9807 0.0000
0.4075 0.0000 0.7615 0.1673
0.4393 0.1231 0.0147 0.4595
0.0120 0.1206 0.4797 0.7964
The p-values pval(1,4) and pval(2,2) are both zero to the four digits shown. Because the p-values are lower than the significance level of 0.05, the correlation coefficients rho(1,4) and rho(2,2) are significantly different from zero. Therefore, the null hypothesis is rejected; the correlation is not zero.
Input Arguments
X
— Input matrix
matrix
Input matrix, specified as an n-by-k matrix. The rows of X correspond to observations, and the columns correspond to variables.
Example: X = randn(10,5)
Data Types: single | double
Y
— Input matrix
matrix
Input matrix, specified as an n-by-k2 matrix when X is specified as an n-by-k1 matrix. The rows of Y correspond to observations, and the columns correspond to variables.
Example: Y = randn(20,7)
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: corr(X,Y,'Type','Kendall','Rows','complete') returns Kendall's tau correlation coefficient using only the rows that contain no missing values.
Type
— Type of correlation
'Pearson' (default) | 'Kendall' | 'Spearman'
Type of correlation, specified as the comma-separated pair consisting of 'Type' and one of these values.
Value | Description |
---|---|
'Pearson' | Pearson's Linear Correlation Coefficient |
'Kendall' | Kendall's Tau Coefficient |
'Spearman' | Spearman's Rho |
corr computes the p-values for Pearson's correlation using a Student's t distribution for a transformation of the correlation. This distribution is exact when X and Y come from a normal distribution. corr computes the p-values for Kendall's tau and Spearman's rho using either the exact permutation distributions (for small sample sizes) or large-sample approximations.
Example: 'Type','Spearman'
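The Pearson p-value can be reproduced by hand. The sketch below assumes that the transformation is the standard statistic t = r*sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom; this page does not name the transformation, so treat that as an assumption. The variable names x, y, t, and pManual are illustrative only.

```matlab
% Sketch: reproduce the two-tailed Pearson p-value with a Student's t distribution.
% Assumption: t = r*sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom.
rng('default')
x = randn(30,1);
y = 0.5*x + randn(30,1);
n = numel(x);

[r,p] = corr(x,y);                 % Pearson correlation and its p-value

t = r*sqrt((n-2)/(1-r^2));         % transformation of the correlation
pManual = 2*tcdf(-abs(t),n-2);     % two-tailed p-value from the t distribution

disp([p pManual])                  % the two values should agree
```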
Rows
— Rows to use in computation
'all' (default) | 'complete' | 'pairwise'
Rows to use in computation, specified as the comma-separated pair consisting of 'Rows' and one of these values.
Value | Description |
---|---|
'all' | Use all rows of the input regardless of missing values (NaNs). |
'complete' | Use only rows of the input with no missing values. |
'pairwise' | Compute rho(i,j) using rows with no missing values in column i or j. |
The 'complete' value, unlike the 'pairwise' value, always produces a positive definite or positive semidefinite rho. Also, the 'complete' value generally uses fewer observations to estimate rho when rows of the input (X or Y) contain missing values.
Example: 'Rows','pairwise'
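A small sketch of how the three 'Rows' settings differ when the data contains missing values. The matrix and the positions of the NaN entries are made up for illustration.

```matlab
% Sketch: compare 'all', 'complete', and 'pairwise' on data with missing values.
rng('default')
X = randn(20,3);
X(3,1) = NaN;                              % missing value in column 1
X(7,2) = NaN;                              % missing value in column 2

rhoAll      = corr(X);                     % NaN propagates into affected entries
rhoComplete = corr(X,'Rows','complete');   % drops rows 3 and 7 for every entry
rhoPairwise = corr(X,'Rows','pairwise');   % drops rows per pair of columns

% For the pair (1,3), 'pairwise' discards only row 3.
keep = ~isnan(X(:,1)) & ~isnan(X(:,3));
rho13 = corr(X(keep,1),X(keep,3));
disp([rhoPairwise(1,3) rho13])             % the two values should agree
```

Here 'complete' estimates every entry from the same 18 rows, while 'pairwise' uses 19 rows for the pairs (1,3) and (2,3).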
Tail
— Alternative hypothesis
'both' (default) | 'right' | 'left'
Alternative hypothesis, specified as the comma-separated pair consisting of 'Tail' and one of the values in the table. 'Tail' specifies the alternative hypothesis against which to compute p-values for testing the hypothesis of no correlation.
Value | Description |
---|---|
'both' | Test the alternative hypothesis that the correlation is not 0. |
'right' | Test the alternative hypothesis that the correlation is greater than 0. |
'left' | Test the alternative hypothesis that the correlation is less than 0. |
corr computes the p-values for the two-tailed test by doubling the more significant of the two one-tailed p-values.
Example: 'Tail','left'
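The doubling rule can be checked directly. In the alternative-hypotheses example on this page, pval(1,1) is 0.3327 for the right-tailed test and 0.6654 for the two-tailed test. The sketch below repeats that check on a pair of illustrative vectors, using the default Pearson type.

```matlab
% Sketch: the two-tailed p-value is twice the smaller one-tailed p-value.
rng('default')
x = randn(40,1);
y = 0.3*x + randn(40,1);

[~,pRight] = corr(x,y,'Tail','right');   % alternative: correlation > 0
[~,pLeft]  = corr(x,y,'Tail','left');    % alternative: correlation < 0
[~,pBoth]  = corr(x,y);                  % default 'Tail' is 'both'

pDoubled = 2*min(pRight,pLeft);          % double the more significant tail
disp([pBoth pDoubled])                   % the two values should agree
```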
Output Arguments
rho
— Pairwise linear correlation coefficient
matrix
Pairwise linear correlation coefficient, returned as a matrix.
If you input only a matrix X, rho is a symmetric k-by-k matrix, where k is the number of columns in X. The entry rho(a,b) is the pairwise linear correlation coefficient between column a and column b in X.
If you input matrices X and Y, rho is a k1-by-k2 matrix, where k1 and k2 are the number of columns in X and Y, respectively. The entry rho(a,b) is the pairwise linear correlation coefficient between column a in X and column b in Y.
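A short sketch of the two output shapes, using illustrative matrices with 3 and 5 columns.

```matlab
% Sketch: size and layout of rho for one-input and two-input calls.
rng('default')
X = randn(25,3);
Y = randn(25,5);

rhoX  = corr(X);      % 3-by-3, symmetric
rhoXY = corr(X,Y);    % 3-by-5

% rhoXY(a,b) pairs column a of X with column b of Y.
rho24 = corr(X(:,2),Y(:,4));
disp(size(rhoX))
disp(size(rhoXY))
disp([rhoXY(2,4) rho24])   % the two values should agree
```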
pval
— p-values
matrix
p-values, returned as a matrix. Each element of pval is the p-value for the corresponding element of rho.
If pval(a,b) is small (less than 0.05), then the correlation rho(a,b) is significantly different from zero.
More About
Pearson's Linear Correlation Coefficient
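Pearson's linear correlation coefficient is the most commonly used linear correlation coefficient. For column $X_a$ in matrix X and column $Y_b$ in matrix Y, having means $\bar{X}_a = \sum_{i=1}^{n} X_{a,i}/n$ and $\bar{Y}_b = \sum_{j=1}^{n} Y_{b,j}/n$, Pearson's linear correlation coefficient rho(a,b) is defined as:
$$\mathrm{rho}(a,b) = \frac{\sum_{i=1}^{n} (X_{a,i} - \bar{X}_a)(Y_{b,i} - \bar{Y}_b)}{\left\{ \sum_{i=1}^{n} (X_{a,i} - \bar{X}_a)^2 \sum_{j=1}^{n} (Y_{b,j} - \bar{Y}_b)^2 \right\}^{1/2}}$$
where n is the length of each column.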
Values of the correlation coefficient can range from -1 to +1. A value of -1 indicates perfect negative correlation, while a value of +1 indicates perfect positive correlation. A value of 0 indicates no correlation between the columns.
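As a sketch, Pearson's coefficient can be computed by hand as the sum of products of the centered columns divided by the square root of the product of their sums of squares. The vectors x and y and the helper names dx, dy, and rhoManual are illustrative only.

```matlab
% Sketch: Pearson's coefficient from centered columns, compared with corr.
rng('default')
x = randn(30,1);
y = x + randn(30,1);

dx = x - mean(x);                                     % centered column of X
dy = y - mean(y);                                     % centered column of Y
rhoManual = sum(dx.*dy)/sqrt(sum(dx.^2)*sum(dy.^2));

rho = corr(x,y);                                      % default type is 'Pearson'
disp([rho rhoManual])                                 % the two values should agree
```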
Kendall's Tau Coefficient
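Kendall's tau is based on counting the number of (i,j) pairs, for i<j, that are concordant, that is, for which $X_{a,i} - X_{a,j}$ and $Y_{b,i} - Y_{b,j}$ have the same sign. The equation for Kendall's tau includes an adjustment for ties in the normalizing constant and is often referred to as tau-b.
For column $X_a$ in matrix X and column $Y_b$ in matrix Y, Kendall's tau coefficient is defined as:
$$\tau = \frac{2K}{n(n-1)},$$
where
$$K = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \xi^{*}(X_{a,i}, X_{a,j}, Y_{b,i}, Y_{b,j})$$
and
$$\xi^{*}(X_{a,i}, X_{a,j}, Y_{b,i}, Y_{b,j}) = \begin{cases} 1 & \text{if } (X_{a,i} - X_{a,j})(Y_{b,i} - Y_{b,j}) > 0 \\ 0 & \text{if } (X_{a,i} - X_{a,j})(Y_{b,i} - Y_{b,j}) = 0 \\ -1 & \text{if } (X_{a,i} - X_{a,j})(Y_{b,i} - Y_{b,j}) < 0 \end{cases}$$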
Values of the correlation coefficient can range from -1 to +1. A value of -1 indicates that one column ranking is the reverse of the other, while a value of +1 indicates that the two rankings are the same. A value of 0 indicates no relationship between the columns.
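A minimal sketch of the pair-counting idea, assuming the data contains no ties (continuous random data, as here). Under that assumption, tau is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs n(n-1)/2. The tie adjustment mentioned above is not implemented in this sketch.

```matlab
% Sketch: Kendall's tau by counting concordant and discordant pairs.
% Assumption: no ties in x or y, so no tie adjustment is needed.
rng('default')
x = randn(15,1);
y = x + randn(15,1);
n = numel(x);

K = 0;
for i = 1:n-1
    for j = i+1:n
        % +1 for a concordant pair, -1 for a discordant pair
        K = K + sign((x(i)-x(j))*(y(i)-y(j)));
    end
end
tauManual = 2*K/(n*(n-1));

tau = corr(x,y,'Type','Kendall');
disp([tau tauManual])              % the two values should agree
```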
Spearman's Rho
Spearman's rho is equivalent to Pearson's Linear Correlation Coefficient applied to the rankings of the columns $X_a$ and $Y_b$.
If all the ranks in each column are distinct, the equation simplifies to:
$$\mathrm{rho}(a,b) = 1 - \frac{6 \sum d^2}{n(n^2 - 1)},$$
where d is the difference between the ranks of the two columns, and n is the length of each column.
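Both statements can be checked numerically. The sketch below uses tiedrank to rank each column, then compares Pearson's coefficient of the ranks and the simplified rank-difference formula against corr with 'Type','Spearman'. The data is continuous, so the ranks are assumed to be distinct.

```matlab
% Sketch: Spearman's rho as Pearson's coefficient of the ranks.
% Assumption: all ranks are distinct, so the simplified formula applies.
rng('default')
x = randn(20,1);
y = x + randn(20,1);
n = numel(x);

rx = tiedrank(x);                            % ranks of x
ry = tiedrank(y);                            % ranks of y

rhoRanks   = corr(rx,ry);                    % Pearson applied to the ranks
d          = rx - ry;                        % rank differences
rhoFormula = 1 - 6*sum(d.^2)/(n*(n^2-1));    % simplified formula

rhoSpearman = corr(x,y,'Type','Spearman');
disp([rhoSpearman rhoRanks rhoFormula])      % all three values should agree
```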
Tips
The difference between corr(X,Y) and the MATLAB® function corrcoef(X,Y) is that corrcoef(X,Y) returns a matrix of correlation coefficients for two column vectors X and Y. If X and Y are not column vectors, corrcoef(X,Y) converts them to column vectors.
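As a sketch of this difference, the off-diagonal element returned by corrcoef(X,Y) for matrix inputs should match corr applied to the vectorized inputs X(:) and Y(:). The matrices here are illustrative.

```matlab
% Sketch: corrcoef(X,Y) on matrices behaves like corr on X(:) and Y(:).
rng('default')
X = randn(30,4);
Y = randn(30,4);

r      = corrcoef(X,Y);       % 2-by-2 matrix for the vectorized inputs
rhoVec = corr(X(:),Y(:));     % scalar correlation between the two vectors
rhoMat = corr(X,Y);           % 4-by-4 matrix of column-by-column correlations

disp([r(1,2) rhoVec])         % the two values should agree
disp(size(rhoMat))
```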
References
[1] Gibbons, J.D. Nonparametric Statistical Inference. 2nd ed. M. Dekker, 1985.
[2] Hollander, M., and D.A. Wolfe. Nonparametric Statistical Methods. Wiley, 1973.
[3] Kendall, M.G. Rank Correlation Methods. Griffin, 1970.
[4] Best, D.J., and D.E. Roberts. "Algorithm AS 89: The Upper Tail Probabilities of Spearman's rho." Applied Statistics, 24:377-379.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
This function supports tall arrays for out-of-memory data with the limitation:
Only the 'Pearson' type is supported.
For more information, see Tall Arrays for Out-of-Memory Data.
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced before R2006a
See Also
corrcoef | partialcorr | corrcov | tiedrank