@kumarUjjawal
Contributor

Which issue does this PR close?

Rationale for this change

chr currently routes scalar inputs through make_scalar_function(chr, vec![]), which performs a scalar → size-1 array → scalar roundtrip. This adds unnecessary overhead for constant folding / scalar evaluation.

This PR adds a match-based scalar fast path and removes reliance on make_scalar_function for chr, while keeping the array behavior unchanged.
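The difference between the two paths can be sketched with the standard library alone. Everything here is a stand-in assumed for illustration: `Value`, `Output`, `chr_one`, `chr_array`, `chr_roundtrip`, and `chr_fast` are hypothetical names, not DataFusion's `ColumnarValue`, `ScalarValue`, or `make_scalar_function`, and the sketch does not try to reproduce any extra edge-case handling the real `chr` may have.

```rust
// Hypothetical stand-ins for ColumnarValue / ScalarValue; not DataFusion types.
#[derive(Debug, PartialEq)]
enum Value {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

#[derive(Debug, PartialEq)]
enum Output {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

// Convert one code point to a one-character string, rejecting values
// that are not Unicode scalar values (negative, surrogate, out of range).
fn chr_one(code_point: i64) -> Result<String, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .map(|c| c.to_string())
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

// Array kernel: nulls stay null, every other element is converted.
fn chr_array(values: &[Option<i64>]) -> Result<Vec<Option<String>>, String> {
    values.iter().map(|v| v.map(chr_one).transpose()).collect()
}

// Roundtrip style: scalar -> 1-element array -> kernel -> scalar.
fn chr_roundtrip(value: Value) -> Result<Output, String> {
    match value {
        Value::Scalar(v) => {
            let mut out = chr_array(&[v])?;
            Ok(Output::Scalar(out.pop().flatten()))
        }
        Value::Array(values) => Ok(Output::Array(chr_array(&values)?)),
    }
}

// Fast-path style: scalars are handled directly, arrays use the same kernel.
fn chr_fast(value: Value) -> Result<Output, String> {
    match value {
        Value::Scalar(Some(code_point)) => Ok(Output::Scalar(Some(chr_one(code_point)?))),
        Value::Scalar(None) => Ok(Output::Scalar(None)),
        Value::Array(values) => Ok(Output::Array(chr_array(&values)?)),
    }
}
```

Both functions return the same results; the fast path only skips building and unpacking the temporary one-element array for scalar inputs.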

What changes are included in this PR?

  • Refactor ChrFunc::invoke_with_args to:
      • Handle ColumnarValue::Scalar(Int64) directly (scalar fast path)
      • Handle ColumnarValue::Array(Int64Array) using the existing conversion logic
  • Add scalar benchmark to benches/chr.rs (chr/scalar) outside any size loop

| Type | Before | After | Speedup |
|---|---|---|---|
| chr/scalar | 342.05 ns | 87.339 ns | 3.92x |
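
To illustrate what "outside any size loop" means for the benchmark layout, here is a rough standard-library sketch. It is not the code in benches/chr.rs and does not use that file's benchmark harness; `mean_time`, `run_benches`, the array sizes, and the `chr/array/{size}` names are all assumptions made up for this example. Only the `chr/scalar` name comes from the PR.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the scalar conversion under test.
fn chr_one(code_point: i64) -> Result<String, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .map(|c| c.to_string())
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

// Crude timer: mean wall-clock time per call over `iters` calls (iters > 0).
fn mean_time(iters: u32, mut f: impl FnMut()) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn run_benches() -> Vec<(String, Duration)> {
    let mut results = Vec::new();

    // A scalar input has no size, so it is measured once, outside the size loop.
    results.push((
        "chr/scalar".to_string(),
        mean_time(10_000, || {
            let _ = black_box(chr_one(black_box(65)));
        }),
    ));

    // Array-style cases are parameterized by input size (sizes are illustrative).
    for size in [1024usize, 4096] {
        let input: Vec<i64> = (0..size as i64).map(|i| 65 + i % 26).collect();
        results.push((
            format!("chr/array/{size}"),
            mean_time(100, || {
                for &code_point in &input {
                    let _ = black_box(chr_one(black_box(code_point)));
                }
            }),
        ));
    }
    results
}
```

Registering the scalar case inside the size loop would only repeat the same measurement under different names, which is why it sits outside.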

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 30, 2026
@kumarUjjawal
Contributor Author

[image not preserved]

I'm getting this in CI.

Member

@martin-g left a comment

LGTM!

It would be good to add some SLT tests, though.
There are only array-based unit tests and no SLT tests.
The Spark ones are empty - https://github.com/apache/datafusion/blob/f0de02fd664afcc4aad61fd8d13503533ed1e8d5/datafusion/sqllogictest/test_files/spark/string/chr.slt

let return_type = args.return_field.data_type();
let [arg] = take_function_args(self.name(), args.args)?;

match arg {
Contributor

match arg {
    ColumnarValue::Scalar(ScalarValue::Int64(Some(code_point))) => {
        if let Ok(u) = u32::try_from(code_point)
            && let Some(c) = core::char::from_u32(u)
        {
            Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
                c.to_string(),
            ))))
        } else {
            exec_err!("invalid Unicode scalar value: {code_point}")
        }
    }
    ColumnarValue::Scalar(ScalarValue::Int64(None)) => {
        Ok(ColumnarValue::Scalar(ScalarValue::Utf8(None)))
    }
    ColumnarValue::Array(array) => {
        Ok(ColumnarValue::Array(chr(&[array])?))
    }
    _ => internal_err!("..."),
}
  • It's easier to fold everything into a single match like this, since we accept only one data type; it's also unnecessary to double-check the array's data type
  • It's also worth looking into refactoring the chr() function; there's no reason to pass in a slice when we can just pass the array
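
A minimal sketch of the second suggestion, using plain vectors rather than Arrow arrays. The names `Int64Values`, `chr_one`, `chr_from_slice`, and `chr_from_array` are hypothetical, and the "before" shape is only an assumption about what a slice-taking helper typically looks like, not a copy of the existing chr().

```rust
// Hypothetical stand-in for an Int64 array; not the Arrow type.
type Int64Values = Vec<Option<i64>>;

// Convert one code point, rejecting anything that is not a Unicode scalar value.
fn chr_one(code_point: i64) -> Result<String, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .map(|c| c.to_string())
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

// "Before" shape: takes a slice of argument arrays, so it has to check
// at runtime that exactly one argument was supplied.
fn chr_from_slice(args: &[Int64Values]) -> Result<Vec<Option<String>>, String> {
    let [array] = args else {
        return Err(format!("chr expects exactly one argument, got {}", args.len()));
    };
    chr_from_array(array)
}

// "After" shape: takes the single array directly, so the arity is
// enforced by the signature and no runtime check is needed.
fn chr_from_array(array: &[Option<i64>]) -> Result<Vec<Option<String>>, String> {
    array.iter().map(|v| v.map(chr_one).transpose()).collect()
}
```

With the array-taking signature, the caller in the match arm passes the array itself instead of wrapping it in a one-element slice.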

Contributor Author

Thanks!

@kumarUjjawal
Contributor Author

> LGTM!
>
> It would be good to add some SLT tests though. There are only array based unit tests and there are no SLT tests. The Spark ones are empty - https://github.com/apache/datafusion/blob/f0de02fd664afcc4aad61fd8d13503533ed1e8d5/datafusion/sqllogictest/test_files/spark/string/chr.slt

Thank you for the feedback.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 31, 2026
Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

3 participants