perf: optimise right for byte access and StringView #20069
base: main
cc @Jefffrey
/// Calculate the byte length of the substring of last `n` chars from string `string`
/// (or all but first `|n|` chars if n is negative)
fn right_byte_length(string: &str, n: i64) -> usize {
I haven't looked too closely, but I feel we can deduplicate right + left implementation code as the main difference is this byte length function? In that it flips which side it looks from?
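To make the symmetry concrete, here is a minimal std-only sketch of the two boundary computations side by side. The names `right_start` and `left_end` are hypothetical and are not taken from the PR; the sketch assumes the usual semantics (positive `n` keeps `n` chars, negative `n` drops `|n|` chars from the opposite side) and returns a byte offset rather than a length.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: byte offset at which `right(string, n)` starts.
fn right_start(string: &str, n: i64) -> usize {
    // Saturate instead of truncating when usize is narrower than u64.
    let k = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
    match n.cmp(&0) {
        // n == 0: the result is empty, so it starts at the end of the string.
        Ordering::Equal => string.len(),
        // n > 0: keep the last n chars, so start at the n-th char from the back.
        Ordering::Greater => string
            .char_indices()
            .nth_back(k - 1)
            .map_or(0, |(i, _)| i),
        // n < 0: drop the first |n| chars, so start at char index |n|.
        Ordering::Less => string
            .char_indices()
            .nth(k)
            .map_or(string.len(), |(i, _)| i),
    }
}

/// Hypothetical mirror image: byte offset at which `left(string, n)` ends.
fn left_end(string: &str, n: i64) -> usize {
    let k = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
    match n.cmp(&0) {
        Ordering::Equal => 0,
        // n > 0: keep the first n chars, so end at char index n.
        Ordering::Greater => string
            .char_indices()
            .nth(k)
            .map_or(string.len(), |(i, _)| i),
        // n < 0: drop the last |n| chars, so end at the |n|-th char from the back.
        Ordering::Less => string
            .char_indices()
            .nth_back(k - 1)
            .map_or(0, |(i, _)| i),
    }
}
```

The two functions differ only in which arm uses `nth` versus `nth_back` and in the fallback value, which suggests a single helper parameterised by direction could serve both; whether that fits the actual kernels in the PR is for the author to judge.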
},
(Some(string), Some(n)) => {
let byte_length = right_byte_length(string, n);
// println!(
Commented code accidentally added here
args: args.clone(),
arg_fields: arg_fields.clone(),
number_rows: size,
return_field: Field::new("f", DataType::Utf8, true)
Shouldn't this use Utf8View when is_string_view == true?
Ordering::Equal => string.len(),
Ordering::Greater => string
.char_indices()
.nth_back(n.unsigned_abs() as usize - 1)
nit: This may truncate on 32-bit machines
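A small illustration of the concern, using `u32` as a stand-in for a 32-bit `usize` so the effect is visible on any host. Both function names are hypothetical; the point is only the difference between an `as` cast and a checked conversion.

```rust
/// Mimics `n.unsigned_abs() as usize` on a 32-bit target:
/// the high bits are silently dropped.
fn truncating_count(n: i64) -> u32 {
    n.unsigned_abs() as u32
}

/// Checked conversion that saturates instead of wrapping, so an
/// oversized `n` still means "more chars than any string can hold".
fn saturating_count(n: i64) -> u32 {
    u32::try_from(n.unsigned_abs()).unwrap_or(u32::MAX)
}
```

With `n = 2^32 + 1`, the truncating version yields 1, so a `right` call on such a target would keep a single char instead of the whole string; the saturating version keeps the intended meaning.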
Which issue does this PR close?
right for byte access and StringView #20068.
Rationale for this change
Similar to issue #19749 and the optimisation of left in #19980, it's worth doing the same for right
What changes are included in this PR?
Improve efficiency of the function by making fewer memory allocations and going directly to bytes, based on char boundaries
Provide a specialisation for StringView with buffer zero-copy
Use arrow_array::buffer::make_view for low-level view manipulation (we still need to know about a magic constant 12 for a buffer layout)
Benchmark - up to 90% performance improvement
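For readers unfamiliar with where the constant 12 comes from: in the Arrow view layout each string is described by a 16-byte view, and values of at most 12 bytes are stored inline after the 4-byte length. The std-only sketch below mimics that layout to show why a suffix can be taken without copying string data. `sketch_view` and `suffix_view` are hypothetical names for illustration and are not the PR's code or the arrow-rs API.

```rust
/// Values up to this many bytes are stored inline in the 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical encoder for a little-endian 16-byte view:
/// [len: 4][inline data: 12]                      when len <= 12
/// [len: 4][prefix: 4][buffer index: 4][offset: 4] otherwise
fn sketch_view(bytes: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut raw = [0u8; 16];
    // Assumes the length fits in 32 bits, as the view format requires.
    raw[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= MAX_INLINE_LEN {
        raw[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        raw[4..8].copy_from_slice(&bytes[0..4]);
        raw[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        raw[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(raw)
}

/// Hypothetical suffix operation: `value` lives at `offset` in data buffer
/// `buffer_index`; the result describes `value[start..]`. The data buffer is
/// untouched: only the view changes, unless the suffix is short enough to be
/// inlined, in which case its bytes move into the view itself.
fn suffix_view(value: &[u8], buffer_index: u32, offset: u32, start: usize) -> u128 {
    sketch_view(&value[start..], buffer_index, offset + start as u32)
}
```

The inline threshold is the reason the function cannot simply bump the offset and shrink the length in every case: a long value whose suffix drops to 12 bytes or fewer has to switch to the inline form.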
Are these changes tested?
Existing unit tests for right
Added more unit tests
Added bench similar to right.rs
Existing SLTs pass
Are there any user-facing changes?
No