What Should I Use for Dot Product and Matrix Multiplication?: NumPy multiply vs. dot vs. matmul vs. @
When I first implemented gradient descent from scratch a few years ago, I was confused about which method to use for dot products and matrix multiplications: `np.multiply`, `np.dot`, or `np.matmul`? And after a few years, it turns out that… I am still confused! So, I decided to investigate all the options in Python and NumPy (`*`, `np.multiply`, `np.dot`, `np.matmul`, and `@`), come up with the best approach to take, and document the findings here.
TL;DR: Use `np.dot` for dot product. For matrix multiplication, use `@` for Python 3.5 or above, and `np.matmul` for earlier versions.
Table of contents
- What are dot product and matrix multiplication?
- What's available for NumPy arrays?
  (1) element-wise multiplication: `*` and `sum`
  (2) element-wise multiplication: `np.multiply` and `sum`
  (3) dot product: `np.dot`
  (4) matrix multiplication: `np.matmul`
  (5) matrix multiplication: `@`
- So.. what's with np.dot vs. np.matmul (@)?
- Summary
- Reference
1. What are dot product and matrix multiplication?
If you are not familiar with dot product or matrix multiplication yet or if you need a quick recap, check out the previous blog post: What are dot product and matrix multiplication?
In short, the dot product is the sum of the products of values in two same-sized vectors, and matrix multiplication is a matrix version of the dot product with two matrices. The output of the dot product is a scalar, whereas the output of matrix multiplication is a matrix whose elements are the dot products of pairs of vectors from each matrix.
Dot product:

\[[a_1 \ a_2] \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = a_1b_1 + a_2b_2\]

Matrix multiplication:

\[\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}\]

2. What's available for NumPy arrays?
So, there are multiple options you can use to perform dot product or matrix multiplication:
- basic element-wise multiplication: `*` or `np.multiply` along with `np.sum`
- dot product: `np.dot`
- matrix multiplication: `np.matmul`, `@`
We will go through different scenarios depending on the dimensions of the vectors/matrices and understand the pros and cons of each method. To run the code in the following sections, we first need to import NumPy.
import numpy as np
(1) element-wise multiplication: * and sum
First, we can try the fundamental approach using element-wise multiplication based on the definition of dot product: multiply corresponding elements in two vectors and then sum all the output values. The downside of this approach is that you need separate operations for product and sum and it is slower than other methods we will discuss later.
Here is an example of dot product with two 1D arrays.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
>>> a*b
array([ 4, 10, 18])
>>> sum(a*b)
32
Can we use the same `*` and `sum` operations for matrix multiplication? Let's check it out.
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([1, 1, 1])
>>> c*d
array([[1, 2, 3],
[4, 5, 6]])
>>> sum(c*d)
array([5, 7, 9])
Wait, it looks different from what we would get from our own calculation below!
\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ \end{bmatrix} = \begin{bmatrix} 1 \times 1 + 2 \times 1 + 3 \times 1 \\ 4 \times 1 + 5 \times 1 + 6 \times 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 15 \end{bmatrix}\]

So, it turns out that we need to be careful when we apply `sum` after the `*` operation.
Let's look at it step by step. Here is what happened at `c*d`: each row of the 2D array `c` is paired with the second array `d` for element-wise multiplication, which is exactly what we want. But when we then apply Python's built-in `sum`, it iterates over the first axis of the array and adds the rows together element-wise, which gives $[1, 2, 3] + [4, 5, 6] = [5, 7, 9]$. What we want instead is to sum only the elements within each row. So we need to find an alternative to `sum`.
Here comes `np.sum` to the rescue. When we pass the parameter `axis=1`, it sums the elements across columns within the same row. Note that the default, `axis=None`, sums all the elements in the array into a single scalar ($1+2+\dots+6=21$ here), so we need to make sure we pass the `axis=1` parameter.
>>> np.sum(c*d, axis=1)
array([ 6, 15])
Yes! This is what we expected.
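To make the `axis` behavior concrete, here is a quick comparison of the three variants with the same `c` and `d`:
>>> np.sum(c*d) # default axis=None: sums all elements
21
>>> np.sum(c*d, axis=0) # sums down each column
array([5, 7, 9])
>>> np.sum(c*d, axis=1) # sums across each row, which is what we want
array([ 6, 15])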
(2) element-wise multiplication: np.multiply and sum
Okay, then what about `np.multiply`? What does it do, and is it different from `*`? `np.multiply` is basically the same as `*`: it is NumPy's version of element-wise multiplication instead of Python's native operator.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
>>> c = np.multiply(a, b)
>>> c
array([ 4, 10, 18])
>>> np.sum(c)
32
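And for the matrix-multiplication case from section (1), `np.multiply` broadcasts exactly the same way `*` does, so the same `axis=1` trick applies:
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([1, 1, 1])
>>> np.sum(np.multiply(c, d), axis=1)
array([ 6, 15])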
(3) dot product: np.dot
Is there any option that lets us avoid the additional line of `np.sum`? Yes, `np.dot` in NumPy! You can use either `np.dot(a, b)` or `a.dot(b)`, and it takes care of both the element-wise multiplication and the sum. Simple and easy.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
>>> np.dot(a, b)
32
Great! Dot product in just one line of code. If the arrays are 2D or higher, make sure that the number of columns of the first array matches the number of rows of the second array.
a = np.array([[1, 2, 3]]) # shape (1, 3)
b = np.array([[4, 5, 6]]) # shape (1, 3)
>>> np.dot(a, b)
# ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
To make the above example work, you need to transpose the second array so that the shapes are aligned: (1, 3) x (3, 1). Note that this will return (1, 1), which is a 2D array.
a = np.array([[1, 2, 3]]) # shape (1, 3)
b = np.array([[4, 5, 6]]) # shape (1, 3)
>>> np.dot(a, b.T)
array([[32]])
As a side note, if you transpose the first array instead, you will get a (3 x 3) array, which is the outer product rather than the inner product (dot product). So make sure you transpose the right one.
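Here is a quick check of that with the same arrays:
>>> np.dot(a.T, b) # (3, 1) x (1, 3) -> (3, 3), the outer product
array([[ 4,  5,  6],
       [ 8, 10, 12],
       [12, 15, 18]])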
Now let's try a 2D x 2D example as well. Will it work even though it's called dot product?
c = np.array([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
d = np.array([[1], [1], [1]]) # shape (3, 1)
>>> np.dot(c, d)
array([[ 6],
[15]])
It works! Even though it is called dot, which by definition suggests 1D vector inputs and a scalar output, it works for 2D or higher-dimensional matrices as if it were a matrix multiplication.
So, should we use `np.dot` for both dot product and matrix multiplication?
Technically yes, but it is not recommended to use `np.dot` for matrix multiplication because the name dot product has a specific meaning, and it can be confusing to readers, especially mathematicians! (Reference) Also, it is not recommended for high-dimensional arrays (3D or above) because `np.dot` behaves differently from normal matrix multiplication there. We will discuss this later in this post.
So, `np.dot` works for both dot product and matrix multiplication but is recommended for dot product only.
(4) matrix multiplication: np.matmul
The next option is `np.matmul`. It is designed for matrix multiplication, and even the name comes from it (MATrix MULtiplication). Although the name says matrix multiplication, it also works on 1D arrays and can compute the dot product just like `np.dot`.
# 1D arrays
a = np.array([1, 2, 3]) # shape (3,)
b = np.array([4, 5, 6]) # shape (3,)
>>> np.matmul(a, b)
32
# 2D arrays with values along one axis
a = np.array([[1, 2, 3]]) # shape (1, 3)
b = np.array([[4, 5, 6]]) # shape (1, 3)
>>> np.matmul(a, b.T)
array([[32]])
# two 2D arrays
c = np.array([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
d = np.array([[1], [1], [1]]) # shape (3, 1)
>>> np.matmul(c, d)
array([[ 6],
       [15]])
Nice! So, this means both `np.dot` and `np.matmul` work perfectly for dot product and matrix multiplication. However, as we said before, it is recommended to use `np.dot` for dot product and `np.matmul` for 2D or higher matrix multiplication.
(5) matrix multiplication: @
Here comes our last but not least option, `@`! `@`, pronounced "at", is a Python operator that was introduced in Python 3.5 and whose name comes from mATrices. It is basically the same as `np.matmul`, designed to perform matrix multiplication. But why do we need a new infix operator if we already have `np.matmul`, which works perfectly fine?
The major motivation for adding a new operator to the language was that matrix multiplication is such a common operation that it deserves its own infix. For example, the operator `//` is much less common than matrix multiplication but still has its own infix. To learn more about the background of this addition, check out PEP 465.
# 1D arrays
a = np.array([1, 2, 3]) # shape (3,)
b = np.array([4, 5, 6]) # shape (3,)
>>> a @ b
32
# 2D arrays with values along one axis
a = np.array([[1, 2, 3]]) # shape (1, 3)
b = np.array([[4, 5, 6]]) # shape (1, 3)
>>> a @ b.T
array([[32]])
# two 2D arrays
c = np.array([[1, 2, 3], [4, 5, 6]]) # shape: (2, 3)
d = np.array([[1], [1], [1]]) # shape: (3, 1)
>>> c @ d
array([[ 6],
       [15]])
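One caveat worth knowing: `@` is just syntax that dispatches to the `__matmul__` method, so it works on NumPy arrays but not on plain Python lists:
>>> [1, 2, 3] @ [4, 5, 6]
# TypeError: unsupported operand type(s) for @: 'list' and 'list'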
So, `@` works exactly the same as `np.matmul`. But which one should you use, `np.matmul` or `@`? Although it is your preference, `@` looks cleaner than `np.matmul` in code. Let us see a case where we have three matrices $x, y, z$ to multiply.
# `np.matmul` version
np.matmul(np.matmul(x, y), z)
# `@` version
x @ y @ z
As you can see, `@` is much cleaner and more readable. However, as it is only available in Python 3.5+, you have to use `np.matmul` if you use an earlier Python version.
3. So.. what’s with np.dot vs. np.matmul (@)?
In the above section, I mentioned that `np.dot` is not recommended for high-dimensional arrays. What do I mean by that?
There was an interesting question on Stack Overflow about the different behaviors of `np.dot` and `@`. Let's look at it.
# define input arrays
a = np.random.rand(3,2,2) # 2 rows, 2 columns, in 3 layers
b = np.random.rand(3,2,2) # 2 rows, 2 columns, in 3 layers
# perform matrix multiplication
c = np.dot(a, b)
d = a @ b # Python 3.5+
>>> c.shape # np.dot
(3, 2, 3, 2)
>>> d.shape # @
(3, 2, 2)
With the same inputs, we get completely different outputs: a 4D array for `np.dot` and a 3D array for `@`. What happened? This is because of the way `np.dot` and `@` are designed. Based on their definitions:
=======================
For `matmul`:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For `np.dot`:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
=======================
Long story short, in the normal matrix multiplication situation, where we want to treat an N-D array as a stack of matrices residing in the last two indexes, we should use `matmul`.
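To convince yourself, here is a minimal check with the same `a`, `b`, `c`, and `d` from above: `matmul` multiplies the stacked 2x2 matrices pairwise, while `np.dot` combines every stack of `a` with every stack of `b`.
# matmul: d[i] equals a[i] @ b[i] for each of the 3 stacks
>>> all(np.allclose(d[i], a[i] @ b[i]) for i in range(3))
True
# np.dot: c[i, :, k, :] equals a[i] @ b[k] for every pair of stacks
>>> np.allclose(c[0, :, 1, :], a[0] @ b[1])
True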
4. Summary
- `*` == `np.multiply` != `np.dot` != `np.matmul` == `@`
- `*` and `np.multiply` need `np.sum` to perform dot product. Not recommended for dot product or matrix multiplication.
- `np.dot` works for dot product and matrix multiplication. However, it is recommended to avoid using it for matrix multiplication due to the name.
- `np.matmul` and `@` are the same thing, designed to perform matrix multiplication.
- `@` was added in Python 3.5+ to give matrix multiplication its own infix.
- `np.dot` and `np.matmul` generally behave similarly, with two exceptions: 1) `matmul` doesn't allow multiplication by a scalar (see the snippet below), 2) the calculation is done differently for N > 2 dimensions. Check the documentation for whichever one you intend to use.
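A quick illustration of the scalar exception:
>>> np.dot(3, [1, 2, 3]) # np.dot accepts a scalar argument
array([3, 6, 9])
>>> np.matmul(3, [1, 2, 3])
# ValueError: matmul: Input operand 0 does not have enough dimensions ...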
One line summary:
- For dot product, use `np.dot`. For matrix multiplication, use `@` for Python 3.5 or above, and `np.matmul` for earlier Python versions.