@karuppuchamysuresh

Which issue does this PR close?

Rationale for this change

The documentation in data_types.md was outdated: it showed Utf8 as the default mapping for character types (CHAR, VARCHAR, TEXT, STRING), but the current implementation defaults to Utf8View. This confused readers because the documentation did not match the actual behavior.

Additionally, the "Supported Arrow Types" section at the end was redundant since arrow_typeof now supports all Arrow types, making the comprehensive list unnecessary.

What changes are included in this PR?

  1. Updated Character Types table: Changed the Arrow DataType column from Utf8 to Utf8View for CHAR, VARCHAR, TEXT, and STRING types
  2. Added configuration note: Documented the datafusion.sql_parser.map_string_types_to_utf8view setting that allows users to switch back to Utf8 if needed
  3. Removed outdated section: Deleted the "Supported Arrow Types" section (39 lines) as it's no longer necessary
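
The mapping described in items 1 and 2 can be modeled with a small standalone sketch. This is not DataFusion's code: the enum `StringArrowType` and the function `map_character_type` are hypothetical names used only for illustration, and the boolean parameter stands in for the `datafusion.sql_parser.map_string_types_to_utf8view` setting.

```rust
/// Hypothetical stand-in for the two Arrow string types discussed in this PR.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StringArrowType {
    Utf8,
    Utf8View,
}

/// Returns the Arrow type that a SQL character type name maps to, or `None`
/// if the name is not one of the character types listed in this PR.
/// `map_string_types_to_utf8view` models the configuration setting:
/// `true` is the documented default, `false` restores the old Utf8 mapping.
fn map_character_type(
    sql_type: &str,
    map_string_types_to_utf8view: bool,
) -> Option<StringArrowType> {
    match sql_type.to_ascii_uppercase().as_str() {
        "CHAR" | "VARCHAR" | "TEXT" | "STRING" => Some(if map_string_types_to_utf8view {
            StringArrowType::Utf8View
        } else {
            StringArrowType::Utf8
        }),
        // Anything else is outside the Character Types table.
        _ => None,
    }
}
```

With the flag set to `true`, `map_character_type("VARCHAR", true)` yields `Utf8View`; passing `false` yields `Utf8`, which corresponds to switching the setting off.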

Are these changes tested?

This is a documentation-only change. The documentation accurately reflects the current behavior of DataFusion:

  • The default mapping to Utf8View is the current implementation behavior
  • The datafusion.sql_parser.map_string_types_to_utf8view configuration option exists and works as documented

Are there any user-facing changes?

Yes, documentation changes only. Users will now see accurate information about:

  • The correct default Arrow type mappings for character types
  • How to configure the string type mapping behavior if they need the old Utf8 behavior

- Update Character Types section to show Utf8View as default mapping
- Add note about map_string_types_to_utf8view configuration option
- Remove "Supported Arrow Types" section as arrow_typeof now works with all Arrow types

Fixes apache#18314

Co-Authored-By: Claude (claude-sonnet-4.5) <noreply@anthropic.com>
@github-actions github-actions bot added the documentation label (Improvements or additions to documentation) Jan 30, 2026
Contributor

@Jefffrey Jefffrey left a comment

Thanks @karuppuchamysuresh

Would you also be able to go through the page to double-check that the other parts are accurate? For example, for decimal I believe the type can be Decimal256 if the precision is high enough, so that is worth amending.

@karuppuchamysuresh karuppuchamysuresh force-pushed the fix/update-data-types-doc-18314 branch from b27cf1d to 7a45244 Compare January 30, 2026 15:06
@karuppuchamysuresh
Author

Thanks for the review, @Jefffrey! I've addressed your feedback about Decimal256 and reviewed the entire page for accuracy.

Changes Made

Added explicit documentation for Decimal256 in the Numeric Types table:

I've split the DECIMAL mapping into two separate table rows to make it clear that both Decimal128 and Decimal256 are supported:

| DECIMAL(precision, scale) where precision ≤ 38 | Decimal128(precision, scale) |
| DECIMAL(precision, scale) where precision > 38 | Decimal256(precision, scale) |

The maximum supported precision for DECIMAL types is 76.

This is based on the implementation in datafusion/sql/src/utils.rs:310-316, which uses:

  • Decimal128 for precision ≤ 38 (DECIMAL128_MAX_PRECISION)
  • Decimal256 for precision 39-76 (DECIMAL256_MAX_PRECISION)

The split into two rows makes it immediately visible to users that both types are supported, rather than requiring them to read a note below the table.
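
As a rough illustration of the precision rule above, here is a minimal standalone sketch. It does not reproduce the code in datafusion/sql/src/utils.rs: the enum `DecimalArrowType` and the function `map_decimal` are hypothetical, the constants are redefined locally with the values quoted in this comment, and rejecting a precision of 0 is an assumption.

```rust
/// Precision limits quoted in this comment, redefined locally for the sketch.
const DECIMAL128_MAX_PRECISION: u8 = 38;
const DECIMAL256_MAX_PRECISION: u8 = 76;

/// Hypothetical stand-in for the two Arrow decimal types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum DecimalArrowType {
    Decimal128 { precision: u8, scale: i8 },
    Decimal256 { precision: u8, scale: i8 },
}

/// Picks the Arrow decimal type for `DECIMAL(precision, scale)`.
/// Returns `None` when the precision is above 76 or (by assumption) zero.
fn map_decimal(precision: u8, scale: i8) -> Option<DecimalArrowType> {
    if precision == 0 || precision > DECIMAL256_MAX_PRECISION {
        return None;
    }
    if precision <= DECIMAL128_MAX_PRECISION {
        // Precision 1..=38 fits in Decimal128.
        Some(DecimalArrowType::Decimal128 { precision, scale })
    } else {
        // Precision 39..=76 needs Decimal256.
        Some(DecimalArrowType::Decimal256 { precision, scale })
    }
}
```

The boundary sits between 38 and 39: `map_decimal(38, 10)` selects `Decimal128`, while `map_decimal(39, 10)` selects `Decimal256`.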

Verification of Other Type Mappings

I also verified the accuracy of all other type mappings in the document:

✅ Numeric Types: All integer and float mappings are correct
✅ Date/Time Types: Verified against datafusion/sql/src/planner.rs:

  • DATE → Date32 (line 711)
  • TIME → Time64(Nanosecond) (lines 712-721)
  • TIMESTAMP → Timestamp(Nanosecond, None) (lines 688-710)
  • INTERVAL → Interval(MonthDayNano) (line 738)

✅ Binary Types: BYTEA → Binary (line 733)
✅ Boolean Types: BOOLEAN → Boolean (line 637)
✅ Character Types: Already updated in this PR to Utf8View

All mappings now accurately reflect the current implementation.
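
For quick reference, the verified mappings listed above can be written down as a lookup. This is only a sketch: `verified_arrow_type` is a hypothetical helper, and the Arrow types are plain strings rather than real Arrow `DataType` values.

```rust
/// Returns the Arrow type name (as text) for the SQL type names verified in
/// this comment, or `None` for any name not covered by the list.
fn verified_arrow_type(sql_type: &str) -> Option<&'static str> {
    match sql_type.to_ascii_uppercase().as_str() {
        // Date/Time types
        "DATE" => Some("Date32"),
        "TIME" => Some("Time64(Nanosecond)"),
        "TIMESTAMP" => Some("Timestamp(Nanosecond, None)"),
        "INTERVAL" => Some("Interval(MonthDayNano)"),
        // Binary and Boolean types
        "BYTEA" => Some("Binary"),
        "BOOLEAN" => Some("Boolean"),
        _ => None,
    }
}
```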

Adjusted table column alignment to match prettier@2.7.1 formatting
requirements. This fixes the CI formatting check failure.

Development

Successfully merging this pull request may close these issues.

Update data_types.md page