NumPy: Joining Arrays Together

NumPy provides several functions that allow us to join arrays, including stack, concatenate, vstack, hstack, dstack, column_stack, and block. In this article, we will:

describe these joining array functions from the perspective of shape, ndim and axis
give more formal descriptions and more examples
compare them to each other

1. np.stack

numpy.stack(arrays, axis=0, out=None): Join a sequence of arrays along a new axis.

So the number of dimensions increases by 1, and all input arrays must have the same shape. Given input arrays a_0, a_1, ..., a_(n-1), each with the same shape (s_0, s_1, ..., s_(m-1)) and the same number of dimensions m, stacked_array = np.stack((a_0, a_1, ..., a_(n-1)), axis=d) will have the following shape

for 0 <= d <= m (the new axis can be inserted at any of m+1 positions),

if d == 0: shape = (n, s_0, ..., s_(m-1)), ndim = m + 1

if d == i (0 < i < m): shape = (s_0, ..., s_(i-1), n, s_i, ..., s_(m-1)), ndim = m + 1

if d == m: shape = (s_0, ..., s_(m-1), n), ndim = m + 1

For example:

>>> shape = (2, 3, 4, 5)
>>> a1 = np.arange(120).reshape(shape)
>>> a2 = np.arange(120, 120*2).reshape(shape)
>>> a3 = np.arange(120*2, 120*3).reshape(shape)
>>> a4 = np.arange(120*3, 120*4).reshape(shape)
>>> a5 = np.arange(120*4, 120*5).reshape(shape)
>>> a6 = np.arange(120*5, 120*6).reshape(shape)
>>> arrays = [a1, a2, a3, a4, a5, a6]
>>> r0 = np.stack(arrays, axis=0)
>>> r1 = np.stack(arrays, axis=1)
>>> r2 = np.stack(arrays, axis=2)
>>> r3 = np.stack(arrays, axis=3)
>>> r4 = np.stack(arrays, axis=4)
>>> r0.shape, r1.shape, r2.shape, r3.shape, r4.shape
((6, 2, 3, 4, 5),
 (2, 6, 3, 4, 5),
 (2, 3, 6, 4, 5),
 (2, 3, 4, 6, 5),
 (2, 3, 4, 5, 6))
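
The shape rule can also be checked programmatically. The following is a minimal sketch: expected_stack_shape is a hypothetical helper written for this illustration, not a NumPy function. It also shows one way to think about np.stack, namely as inserting a length-1 axis into every input with np.expand_dims and then concatenating along that axis.

```python
import numpy as np

def expected_stack_shape(shape, n, d):
    """Hypothetical helper: shape of np.stack over n arrays of `shape` at axis d."""
    return shape[:d] + (n,) + shape[d:]

shape = (2, 3, 4, 5)
arrays = [np.full(shape, k) for k in range(6)]

# The new axis can be placed at any position from 0 to len(shape) inclusive
stack_shapes = [np.stack(arrays, axis=d).shape for d in range(len(shape) + 1)]
rule_shapes = [expected_stack_shape(shape, len(arrays), d) for d in range(len(shape) + 1)]
print(stack_shapes == rule_shapes)  # True

# stack == expand_dims on each input, then concatenate along the new axis
via_concat = np.concatenate([np.expand_dims(a, axis=2) for a in arrays], axis=2)
same_as_stack = np.array_equal(via_concat, np.stack(arrays, axis=2))
print(same_as_stack)  # True

# Negative axes count from the end of the result, so axis=-1 appends the new axis
last_axis_shape = np.stack(arrays, axis=-1).shape
print(last_axis_shape)  # (2, 3, 4, 5, 6)
```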

2. np.concatenate

np.concatenate(arrays, axis=0, out=None): Join a sequence of arrays along an existing axis.

So the number of dimensions does not increase. All arrays must have the same shape, except in the dimension corresponding to axis. If axis=None, the arrays are flattened before they are joined, so the result has exactly 1 dimension.

To compute concatenated_array = np.concatenate((a_0, ..., a_(n-1)), axis=d, out=None), where d is an integer and every input has m dimensions, the input arrays a_0, ..., a_(n-1) must have the following shapes

for 0 <= d <= m-1 and for all 0 <= j <= n-1: 

if d = 0, a_j.shape = (t_j, s_1, ..., s_(m-1))
if d = i(0<i<m-1), a_j.shape = (s_0, ..., s_(i-1), t_j, s_(i+1), ..., s_(m-1))
if d = m-1, a_j.shape = (s_0, ..., s_(m-2), t_j)

Then concatenated_array keeps the same shape as the inputs except along axis d, whose length is the sum of the inputs' lengths along that axis, that is

for 0 <= d <= m-1, 

if d = 0, concatenated_array.shape = (t, s_1, ..., s_(m-1))
if d = i(0<i<m-1), concatenated_array.shape = (s_0, ..., s_(i-1), t, s_(i+1), ..., s_(m-1))
if d = m-1, concatenated_array.shape = (s_0, ..., s_(m-2), t)

where t = t_0 + t_1 + ... + t_(n-1)

For example:

>>> a0 = np.arange(120).reshape(2, 3, 4, 5)
>>> a1 = np.arange(120, 150).reshape(2, 3, 1, 5)
>>> a2 = np.arange(150, 330).reshape(2, 3, 6, 5)
>>> concatenate_array = np.concatenate((a0, a1, a2), axis=2)
>>> concatenate_array.shape, concatenate_array.ndim
((2, 3, 11, 5), 4) # 11 = 4+1+6
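
Two edge cases of the rule are worth seeing in code: axis=None, and inputs whose shapes disagree on an axis other than the one being joined. The sketch below uses small arrays chosen only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 10).reshape(2, 2)

# Shapes agree on axis 0 and differ only on axis 1, so axis=1 works
joined = np.concatenate((a, b), axis=1)
print(joined.shape)  # (2, 5)

# axis=None flattens every input first, so any combination of shapes is accepted
flat = np.concatenate((a, b), axis=None)
print(flat.shape, flat.ndim)  # (10,) 1

# Along axis 0 the remaining dimension differs (3 vs 2), so NumPy raises ValueError
try:
    np.concatenate((a, b), axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```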

3. np.vstack

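np.vstack(tup): Stack arrays in sequence vertically (row-wise). This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1, N).

np.vstack requires the input arrays to have the same shape along all but the first axis. 1-D arrays must have the same length. Therefore, for arrays with 2 or more dimensions, np.vstack(arrays) == np.concatenate(arrays, axis=0). For 1-D arrays the reshape to (1, N) happens first, so the result is 2-D rather than the 1-D result that np.concatenate would give on the original inputs.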

For example:

>>> a0 = np.arange(120).reshape(4, 2, 3, 5)
>>> a1 = np.arange(120, 150).reshape(1, 2, 3, 5)
>>> a2 = np.arange(150, 330).reshape(6, 2, 3, 5)
>>> r0 = np.vstack((a0, a1, a2))
>>> r1 = np.concatenate((a0, a1, a2), axis=0)
>>> np.allclose(r0, r1)
True
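
The 1-D case behaves differently from plain concatenation, which is easy to overlook. A short sketch with two illustrative vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.vstack((a, b))            # each (3,) input is treated as (1, 3)
flat = np.concatenate((a, b), axis=0)  # no reshape: the result stays 1-D
explicit = np.concatenate((a[np.newaxis, :], b[np.newaxis, :]), axis=0)

print(stacked.shape)                      # (2, 3)
print(flat.shape)                         # (6,)
print(np.array_equal(stacked, explicit))  # True
```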

4. np.hstack

np.hstack(tup): Stack arrays in sequence horizontally (column-wise). This is equivalent to concatenation along the second axis, except for 1-D arrays, where it concatenates along the first axis.

Therefore,

1-D arrays:
np.hstack(arrays) == np.concatenate(arrays, axis=0) == np.concatenate(arrays, axis=None)

arrays >= 2-D:
np.hstack(arrays) == np.concatenate(arrays, axis=1)

For example,

# 1-D arrays
>>> a = np.array([1,2,3])
>>> b = np.array([2,3,4])
>>> hstacked = np.hstack((a,b))
>>> concatenated_0 = np.concatenate((a, b), axis=0)
>>> concatenated_1 = np.concatenate((a, b), axis=None)
>>> np.allclose(hstacked, concatenated_0)
True
>>> np.allclose(hstacked, concatenated_1)
True

# arrays >= 2-D
>>> a0 = np.arange(120).reshape(2, 4, 3, 5)
>>> a1 = np.arange(120, 150).reshape(2, 1, 3, 5)
>>> a2 = np.arange(150, 330).reshape(2, 6, 3, 5)
>>> r0 = np.hstack((a0, a1, a2))
>>> r1 = np.concatenate((a0, a1, a2), axis=1)
>>> np.allclose(r0, r1)
True
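
The two cases can be folded into one small function. This is only a sketch of the rule described above: hstack_sketch is a hypothetical name, and real code should call np.hstack directly.

```python
import numpy as np

def hstack_sketch(arrays):
    """Hypothetical re-implementation for illustration; use np.hstack in real code."""
    arrays = [np.atleast_1d(a) for a in arrays]
    # 1-D inputs are joined along axis 0, everything else along axis 1
    axis = 0 if arrays[0].ndim == 1 else 1
    return np.concatenate(arrays, axis=axis)

v1, v2 = np.array([1, 2, 3]), np.array([4, 5])
m1, m2 = np.zeros((2, 3)), np.ones((2, 1))

print(hstack_sketch((v1, v2)).shape)  # (5,)
print(hstack_sketch((m1, m2)).shape)  # (2, 4)
```

Note that 1-D inputs to hstack may have different lengths, because the joined axis is the only axis they have.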

5. np.dstack

np.dstack(tup): Stack arrays in sequence depth-wise (along the third axis).

This is equivalent to concatenation along the third axis after 1-D arrays with shape (N,) have been reshaped to (1, N, 1) and 2-D arrays with shape (M, N) have been reshaped to (M, N, 1). After the reshape, every input has at least 3 dimensions, so axis=2 is valid. For arrays that are 3-D or higher, no reshape is needed and np.dstack(arrays) == np.concatenate(arrays, axis=2).

Therefore,

1-D arrays with shape (N,):
np.dstack(arrays) == np.concatenate([a.reshape(1, N, 1) for a in arrays], axis=2)

2-D arrays with shape (M, N):
np.dstack(arrays) == np.concatenate([a.reshape(M, N, 1) for a in arrays], axis=2)

>= 3-D arrays:
np.dstack(arrays) == np.concatenate(arrays, axis=2)

For example:

# 1-D arrays
>>> a0 = np.arange(6)
>>> a1 = np.arange(6, 12)
>>> a2 = np.arange(12, 18)
>>> r0 = np.concatenate((a0.reshape(1, 6, 1), a1.reshape(1, 6, 1), a2.reshape(1, 6, 1)), axis=2)
>>> r1 = np.dstack((a0, a1, a2))
>>> np.allclose(r0, r1)
True

# 2-D arrays
>>> a0 = np.arange(6).reshape(2, 3)
>>> a1 = np.arange(6, 12).reshape(2, 3)
>>> a2 = np.arange(12, 18).reshape(2, 3)
>>> r0 = np.concatenate((a0.reshape(2, 3, 1), a1.reshape(2, 3, 1), a2.reshape(2, 3, 1)), axis=2)
>>> r1 = np.dstack((a0, a1, a2))
>>> np.allclose(r0, r1)
True

# arrays >= 3-D
>>> a0 = np.arange(120).reshape(2, 3, 4, 5)
>>> a1 = np.arange(120, 150).reshape(2, 3, 1, 5)
>>> a2 = np.arange(150, 330).reshape(2, 3, 6, 5)
>>> r0 = np.concatenate((a0, a1, a2), axis=2)
>>> r1 = np.dstack((a0, a1, a2))
>>> np.allclose(r0, r1)
True
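
The reshapes for 1-D and 2-D inputs are the same ones that np.atleast_3d performs, so the manual reshape calls can be replaced by it. A minimal sketch with illustrative inputs:

```python
import numpy as np

v = np.arange(4)                # shape (4,)
m = np.arange(6).reshape(2, 3)  # shape (2, 3)

print(np.atleast_3d(v).shape)  # (1, 4, 1)
print(np.atleast_3d(m).shape)  # (2, 3, 1)

vs = (v, v + 10, v + 20)
ms = (m, m + 10)

d1 = np.dstack(vs)
d2 = np.dstack(ms)
print(d1.shape)  # (1, 4, 3)
print(d2.shape)  # (2, 3, 2)

# dstack gives the same result as promoting to 3-D and concatenating on axis 2
same_1d = np.array_equal(d1, np.concatenate([np.atleast_3d(x) for x in vs], axis=2))
same_2d = np.array_equal(d2, np.concatenate([np.atleast_3d(x) for x in ms], axis=2))
print(same_1d, same_2d)  # True True
```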

6. np.column_stack

np.column_stack(tup): Stack 1-D arrays as columns into a 2-D array. This is equivalent to concatenation along the second axis after 1-D arrays with shape (N,) have been reshaped to (N, 1).

Therefore,

1-D arrays:
np.column_stack(arrays) == np.concatenate([a.reshape(-1, 1) for a in arrays], axis=1)

>= 2-D arrays:
np.column_stack(arrays) == np.concatenate(arrays, axis=1) == np.hstack(arrays)

For example:

# 1-D arrays
>>> a = np.array([1,2,3])
>>> b = np.array([2,3,4])
>>> r0 = np.column_stack((a, b))
>>> r1 = np.concatenate((a.reshape(-1, 1), b.reshape(-1, 1)), axis=1)
>>> np.allclose(r0, r1)
True

# arrays >= 2-D
>>> a0 = np.arange(120).reshape(2, 4, 3, 5)
>>> a1 = np.arange(120, 150).reshape(2, 1, 3, 5)
>>> a2 = np.arange(150, 330).reshape(2, 6, 3, 5)
>>> r0 = np.column_stack((a0, a1, a2))
>>> r1 = np.concatenate((a0, a1, a2), axis=1)
>>> r2 = np.hstack((a0, a1, a2))
>>> np.allclose(r0, r1), np.allclose(r0, r2)
(True, True)
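
The difference from hstack only shows up with 1-D inputs, and column_stack also accepts a mix of 1-D and 2-D arrays. A short sketch with illustrative values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
m = np.arange(6).reshape(3, 2)

cols = np.column_stack((a, b))   # 1-D inputs become columns
flat = np.hstack((a, b))         # 1-D inputs are joined end to end
mixed = np.column_stack((a, m))  # (3,) is treated as (3, 1), then joined with (3, 2)

print(cols.shape)     # (3, 2)
print(flat.shape)     # (6,)
print(mixed.shape)    # (3, 3)
print(cols.tolist())  # [[1, 4], [2, 5], [3, 6]]
```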

7. np.block

np.block(arrays): Assemble an ndarray from nested lists of blocks.

Blocks in the innermost lists are concatenated along the last dimension (-1), then these are concatenated along the second-to-last dimension (-2), and so on until the outermost list is reached.

Therefore, np.block(arrs) first computes r_0_i = np.concatenate(innermost_list_i, axis=-1) for each innermost list, then computes r_1_j = np.concatenate([..., r_0_i, ...], axis=-2) one level up, and so on until the outermost list is reached.

For example:

>>> A11 = np.ones((2, 3, 1))
>>> A12 = np.zeros((2, 3, 5))
>>> A21 = np.ones((2, 3, 4))
>>> A22 = np.zeros((2, 3, 2))

>>> B11 = np.zeros((1, 3, 2))
>>> B12 = np.ones((1, 3, 4))
>>> B21 = np.zeros((1, 3, 1))
>>> B22 = np.ones((1, 3, 5))

# A11 and A12 must have the same shape except along the last axis
>>> A1 = np.concatenate((A11, A12), axis=-1)
>>> A2 = np.concatenate((A21, A22), axis=-1)
>>> A1.shape, A2.shape
((2, 3, 6), (2, 3, 6))

# A1 and A2 must have the same shape except along the second-to-last axis
>>> A_con = np.concatenate((A1, A2), axis=-2)
>>> A_block = np.block([
...     [[A11, A12], [A21, A22]]
... ])
>>> np.allclose(A_con, A_block)
True
>>> A_block.shape
(2, 6, 6)

>>> B1 = np.concatenate((B11, B12), axis=-1)
>>> B2 = np.concatenate((B21, B22), axis=-1)
>>> B_con = np.concatenate((B1, B2), axis=-2)
>>> B_block = np.block([
...     [[B11, B12], [B21, B22]]
... ])

# A_con and B_con must have the same shape except along the third-to-last axis
>>> result_con = np.concatenate((A_con, B_con), axis=-3)
>>> result_block = np.block([
...     [[A11, A12], [A21, A22]],
...     [[B11, B12], [B21, B22]],
... ])

>>> np.allclose(result_con, result_block)
True
>>> result_block.shape
(3, 6, 6)
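
The most common use of np.block is assembling a 2-D block matrix, where the nesting is only two levels deep. The sketch below uses illustrative blocks A, B, C and D and repeats the same comparison against np.concatenate.

```python
import numpy as np

A = np.ones((2, 2))
B = np.zeros((2, 3))
C = np.zeros((1, 2))
D = np.ones((1, 3))

M = np.block([[A, B],
              [C, D]])

top = np.concatenate((A, B), axis=-1)     # shape (2, 5)
bottom = np.concatenate((C, D), axis=-1)  # shape (1, 5)
M_manual = np.concatenate((top, bottom), axis=-2)

print(M.shape)                      # (3, 5)
print(np.array_equal(M, M_manual))  # True

# 1-D blocks in a depth-2 list get a leading axis of length 1 first
v = np.array([1, 2, 3])
rows = np.block([[v], [v]])
print(rows.shape)  # (2, 3)
```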

8. Summary

The difference between stack and concatenate is easy to see: they differ mainly in the shape requirement on the input arrays and in the number of dimensions of the returned array.

Table1: Compare Stack With Concatenate

Table1	stack	concatenate
arrays’ shape requirement	exactly same shape	same shape except the corresponding axis
axis values	int	int or None
ndim	+1	unchanged, or 1 when axis=None
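
The rows of Table1 can be reproduced on a pair of small arrays. This is a minimal sketch with illustrative inputs:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

print(np.stack((a, b), axis=0).shape)           # (2, 2, 3)
print(np.concatenate((a, b), axis=0).shape)     # (4, 3)
print(np.concatenate((a, b), axis=None).shape)  # (12,)

# stack rejects inputs whose shapes differ in any dimension
c = np.ones((1, 3))
try:
    np.stack((a, c), axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)                             # True

# concatenate accepts them, because they differ only along the joined axis
print(np.concatenate((a, c), axis=0).shape)     # (3, 3)
```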

vstack, hstack, dstack and column_stack differ mainly in the shape requirement on the input arrays and in the axis along which they concatenate. All of them can be implemented through concatenate. Table2 describes how to implement them through concatenate.

Table2: Implement Through Concatenate

Table2	vstack	hstack	dstack	column_stack	block
along axis for >= 2-D arrays	0	1	2	1	from last to first
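along axis for 1-D arrays	0 (after reshape)	0 or None	2 (after reshape)	1 (after reshape)	from last to first
reshape required for 2-D arrays	No	No	Yes	No	only if the list nesting is deeper than 2
reshape required for 1-D arrays	Yes	No	Yes	Yes	only if the list nesting is deeper than 1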
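
To see the rows of Table2 side by side, each function can be run on the same pair of 1-D inputs and the same pair of 2-D inputs. The sketch below only compares result shapes; the dictionary of functions is a convenience for this illustration.

```python
import numpy as np

v = np.arange(3)                # 1-D, shape (3,)
m = np.arange(6).reshape(2, 3)  # 2-D, shape (2, 3)

joiners = {
    "vstack": np.vstack,
    "hstack": np.hstack,
    "dstack": np.dstack,
    "column_stack": np.column_stack,
    "block": np.block,
}

shapes_1d = {}
shapes_2d = {}
for name, fn in joiners.items():
    # np.block needs a list rather than a tuple, so a flat list is passed to every function
    shapes_1d[name] = fn([v, v]).shape
    shapes_2d[name] = fn([m, m]).shape
    print(f"{name:13s} 1-D: {shapes_1d[name]}  2-D: {shapes_2d[name]}")
```

With a flat list, np.block joins along the last axis only, so it matches hstack here; deeper nesting brings the earlier axes into play.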