Skip to content

Conversation

@vigneshsiva11
Copy link

Which issue does this PR close?

Rationale for this change

ArrowBytesViewMap was previously accounting for the logical number of null bits when reporting memory usage. This under-reported memory consumption is because NullBufferBuilder may allocate more memory than is currently used.

Memory accounting in DataFusion is expected to reflect allocated memory rather than logical usage to ensure accurate memory tracking.

What changes are included in this PR?

  • Update ArrowBytesViewMap::size to use NullBufferBuilder::allocated_size instead of calculating size from the number of used null bits.

Are these changes tested?

  • Yes. Existing tests were run:
    • cargo test -p datafusion-physical-expr-common

Are there any user-facing changes?

  • No. This change only affects internal memory accounting and does not alter query behavior or public APIs.

Copilot AI review requested due to automatic review settings January 30, 2026 15:51
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jan 30, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to improve memory accounting for ArrowBytesViewMap by changing from calculating null buffer size based on logical length (self.nulls.len() / 8) to using the allocated size (self.nulls.allocated_size()). The intent is to account for allocated memory rather than just the used memory, which aligns with DataFusion's memory accounting goals.

Changes:

  • Modified ArrowBytesViewMap::size() method to use NullBufferBuilder::allocated_size() for calculating null buffer memory usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @vigneshsiva11 🙏

let in_progress_size = self.in_progress.capacity();
let completed_size: usize = self.completed.iter().map(|b| b.len()).sum();
let nulls_size = self.nulls.len() / 8;
let nulls_size = self.nulls.allocated_size() / 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocated size is in bytes (not bits) https://docs.rs/arrow/latest/arrow/array/struct.NullBufferBuilder.html#method.allocated_size

Return the allocated size of this builder, in bytes, useful for memory accounting.

So I don't think we should still be dividing by 8

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification and the link! I’ve updated the code to use allocated_size() directly without dividing by 8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code still appears to divide by 8 🤔

Copy link
Contributor

@Dandandan Dandandan Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment in datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:66, "NullBufferBuilder::allocated_size returns capacity in bits". Other places in the codebase that use NullBufferBuilder directly (such as MaybeNullBufferBuilder::allocated_size() at datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:67

Copilot is confused by this comment elsewhere in the source code (it's there).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve memory accounting for ArrowBytesViewMap

3 participants