Technology Apr 28, 2026 · 2 min read

arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

When working with arrays in ClickHouse, arrayJoin feels straightforward. Until your query suddenly returns far more rows than expected. The Use Case Let’s say you have a table like this: CREATE TABLE events ( user_id UInt32, actions Array(String) ) ENGINE = MergeTree ORDER...

DE
DEV Community
by Mohamed Hussain S
arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

When working with arrays in ClickHouse, arrayJoin feels straightforward.

Until your query suddenly returns far more rows than expected.

The Use Case

Let’s say you have a table like this:

CREATE TABLE events (
    user_id UInt32,
    actions Array(String)
) ENGINE = MergeTree
ORDER BY user_id;

Example row:

user_id: 1
actions: ['click', 'scroll', 'purchase']

Now you want each action as a separate row.

The Tool: arrayJoin

SELECT user_id, arrayJoin(actions) AS action
FROM events;

Output:

1   click
1   scroll
1   purchase

So far, everything looks correct.

Where Things Go Wrong

Now let’s say you write:

SELECT user_id,
       arrayJoin(actions) AS action,
       arrayJoin(actions) AS action2
FROM events;

You might expect:

  • 3 rows

But you actually get:

  • 9 rows

Why This Happens

arrayJoin doesn’t just flatten arrays.

It expands rows.

Each element in the array creates a new row.

So when you use it multiple times:

  • First arrayJoin → expands rows
  • Second arrayJoin → expands again

Result:

3 elements → 3 × 3 = 9 rows

This is effectively a cartesian multiplication of rows.

The Hidden Impact

This becomes a real problem when:

  • Arrays are large
  • Multiple arrayJoins are used
  • You don’t expect row multiplication

Result:

  • Incorrect output
  • Sudden increase in row count
  • Slower queries

The Better Approach

1. Use a single arrayJoin when possible

SELECT user_id,
       arrayJoin(actions) AS action
FROM events;

2. Use ARRAY JOIN syntax (cleaner and explicit)

SELECT user_id, action
FROM events
ARRAY JOIN actions AS action;

3. Use arrayZip to avoid unintended multiplication

If you’re working with multiple arrays:

SELECT user_id,
       arrayJoin(arrayZip(actions, actions)) AS zipped
FROM events;

This ensures elements are paired instead of multiplied.

Why This Matters

arrayJoin is powerful-but easy to misuse.

If used without understanding:

  • Row count can explode
  • Queries become expensive
  • Results can be misleading

Real-World Use Cases

  • Event tracking pipelines
  • Flattening nested JSON
  • Working with semi-structured logs
  • Exploding arrays into rows for analysis

One Important Gotcha

Every arrayJoin multiplies rows.

If your result size looks unexpectedly large, this is one of the first things to check.

Final Thoughts

arrayJoin is one of the most useful tools in ClickHouse.

But its behavior is not always intuitive.

In many cases, the issue is not the data itself-but how the query expands it.

Understanding this early can save a lot of debugging time.

DE
Source

This article was originally published by DEV Community and written by Mohamed Hussain S.

Read original article on DEV Community
Back to Discover

Reading List