-
Notifications
You must be signed in to change notification settings - Fork 1.9k
perf: Optimize scalar path for chr function #20073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
martin-g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It would be good to add some SLT tests though.
There are only array based unit tests and there are no SLT tests.
The Spark ones are empty - https://github.com/apache/datafusion/blob/f0de02fd664afcc4aad61fd8d13503533ed1e8d5/datafusion/sqllogictest/test_files/spark/string/chr.slt
| let return_type = args.return_field.data_type(); | ||
| let [arg] = take_function_args(self.name(), args.args)?; | ||
|
|
||
| match arg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match arg {
ColumnarValue::Scalar(ScalarValue::Int64(Some(code_point))) => {
if let Ok(u) = u32::try_from(code_point)
&& let Some(c) = core::char::from_u32(u)
{
Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
c.to_string(),
))))
} else {
exec_err!("invalid Unicode scalar value: {code_point}")
}
}
ColumnarValue::Scalar(ScalarValue::Int64(None)) => {
Ok(ColumnarValue::Scalar(ScalarValue::Utf8(None)))
}
ColumnarValue::Array(array) => {
Ok(ColumnarValue::Array(chr(&[array])?))
}
_ => internal_err!("..."),
}- Easier to fold into match like this since we accept only one datatype; also unnecessary to double check the array datatype
- Also worth looking into refactoring
chr()function; no reason to pass in a slice when we can just pass the array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thank you for the feedback. |


Which issue does this PR close?
Rationale for this change
chrcurrently routes scalar inputs throughmake_scalar_function(chr, vec![]), which performs a scalar → size-1 array → scalar roundtrip. This adds unnecessary overhead for constant folding / scalar evaluation.This PR adds a match-based scalar fast path and removes reliance on
make_scalar_functionforchr, while keeping the array behavior unchanged.What changes are included in this PR?
ChrFunc::invoke_with_argsto:ColumnarValue::Scalar(Int64)directly (scalar fast path)ColumnarValue::Array(Int64Array)using the existing conversion logicbenches/chr.rs(chr/scalar) outside any size loopchr/scalarAre these changes tested?
Yes
Are there any user-facing changes?
No