feat: Cast numeric (non int) to timestamp #3559
coderfender wants to merge 8 commits into apache:main from
Conversation
```scala
 * Uses except (difference) to find differences without using collect()
 * Checks cometDF and sparkDF including schemas
 */
protected def assertDataFrameEquals(
```
In my recent PR, I had to cast back the generated values. I resorted to this route because we have no native Comet support for casting Timestamp -> Float, which caused operator / native Comet check assertion failures. I tried casting back to String and Long, but in either case the cast is incompatible or unsupported on the native side.
@andygrove I moved `assertDataFrameEquals` inside the cast test to leverage the existing Try and ANSI checks. This should help us gain confidence in the native code and perhaps merge it.

Test failure:

Thank you @andygrove. There seems to be a failure per Spark version after I fixed the error-message handling. I am making changes to fix the tests with the right error message.
It seems like there is some regression in Spark's cast from Float / Double to Timestamp in terms of handling extreme values. Investigating this.

Only the Spark 4 test is failing, and it seems like we might not be using the right exception parsing.

Would you like to wait for https://github.com//pull/3580 ?

@parthchandra sure!

@andygrove, marking it as a draft so we don't accidentally merge until @parthchandra's PR is reviewed.

@coderfender I've marked this as ready for review. Could you rebase and resolve the CI failures?

Sure, thank you.

It seems like the errors are transient, @parthchandra. Can you please rerun the failed actions whenever you get a chance?
```rust
let value_256 = i256::from_i128(value);
let micros_256 = value_256 * i256::from_i128(MICROS_PER_SECOND as i128);
let ts = micros_256 / i256::from_i128(scale_factor);
builder.append_value(ts.as_i128() as i64);
```
This doesn't look right. Casting down from i256 to i128 and then to i64 will silently truncate too many bits. We should probably check for overflow here before the cast.
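A minimal sketch of the checked downcast suggested here. The real code operates on arrow's `i256`; this sketch uses `i128` so it is self-contained, and `narrow_to_i64` is a hypothetical helper name, not the PR's actual function:

```rust
// Hedged sketch: replace the silent `as i64` narrowing with a checked
// conversion. `i64::try_from` fails for out-of-range values instead of
// silently dropping the upper bits.
fn narrow_to_i64(value: i128) -> Option<i64> {
    i64::try_from(value).ok()
}

fn main() {
    // In-range values pass through unchanged.
    assert_eq!(narrow_to_i64(42), Some(42));
    // One past i64::MAX is rejected by the checked conversion...
    assert_eq!(narrow_to_i64(i64::MAX as i128 + 1), None);
    // ...whereas the silent cast would wrap around to i64::MIN.
    assert_eq!((i64::MAX as i128 + 1) as i64, i64::MIN);
}
```

Depending on `eval_mode`, a `None` here could map to `append_null()` or to a Spark-style overflow error.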
You'll probably need to pass eval_mode here to decide whether to return null or throw an error on overflow. Or you could just restrict this to legacy mode (probably easier).
@parthchandra, thank you for the comment. Spark doesn't apply the eval mode for Decimal and Int to Timestamp casts (while it does check overflow for Float and Double) and instead produces incorrect values.
Refer to: https://github.com/apache/spark/blob/972897433082b1a7136b877b4fa37970961169d0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L791
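To illustrate the kind of silent wrap-around this unchecked seconds-to-microseconds path implies, here is a small self-contained sketch; it uses plain `i64` wrapping arithmetic as a stand-in for the unchecked multiplication, not Spark's actual code:

```rust
// Hedged illustration: multiplying a large seconds value by
// MICROS_PER_SECOND in wrapping 64-bit arithmetic silently yields an
// incorrect result, mirroring the "produces incorrect values" behavior
// described above.
const MICROS_PER_SECOND: i64 = 1_000_000;

fn main() {
    let seconds = i64::MAX;
    let micros = seconds.wrapping_mul(MICROS_PER_SECOND);
    // (2^63 - 1) * 10^6 mod 2^64, interpreted as signed, is -10^6:
    // a nonsense timestamp rather than an error or a null.
    assert_eq!(micros, -1_000_000);
}
```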
```rust
} else {
    // Path 2: Multiply then check overflow - error says BIGINT
    let micros = val * MICROS_PER_SECOND as f64;
    if micros.floor() <= i64::MAX as f64 && micros.ceil() >= i64::MIN as f64 {
```
There may be a boundary condition issue here, but I am not sure if there is a better way.
`i64::MAX as f64` is actually greater than `i64::MAX`, so this check has a gap where we might get some incorrect results.
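The gap can be demonstrated directly: `i64::MAX` (2^63 - 1) is not exactly representable in `f64`, so the conversion rounds up to exactly 2^63, one past the maximum. A float equal to 2^63 therefore passes the `<= i64::MAX as f64` check while being out of range for `i64`:

```rust
fn main() {
    // i64::MAX rounds UP when converted to f64, to exactly 2^63.
    let bound = i64::MAX as f64;
    assert_eq!(bound, 9_223_372_036_854_775_808.0_f64); // 2^63

    // This value passes the floor() <= bound check from the PR...
    let micros = 9_223_372_036_854_775_808.0_f64;
    assert!(micros.floor() <= bound);

    // ...yet it exceeds i64::MAX, so the subsequent narrowing is lossy.
    assert!((micros as i128) > i64::MAX as i128);
}
```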
```rust
    builder.append_null();
} else {
    let value = arr.value(i);
    // Note: spark's big decimal
```
Which issue does this PR close?
Closes #3560.
Rationale for this change
Adds more native cast support.
This should more or less close out the cast matrix in terms of support (barring other unplanned casts such as Timestamp -> Int, etc.).
What changes are included in this PR?
How are these changes tested?
`CometCastSuite.scala` was updated, and new tests were added (along with benchmarking scripts) on the Rust side to exercise casts at all eval modes.