flattenSchema(df) with Spark 4.0 #847

@pinakigit

Description

Describe the bug

Currently we are using `val dfFlat = SparkUtils.flattenSchema(df)` to flatten files that have an OCCURS clause. Everything works fine in Spark 3.5.2, but after upgrading to Spark 4 it throws the error below.

    SparkArrayIndexOutOfBoundsException: [INVALID_ARRAY_INDEX] The index 10 is out of bounds.
    The array has 2 elements. Use the SQL function get() to tolerate accessing element at
    invalid index and return NULL instead. SQLSTATE: 22003
    == SQL (line 1, position 1) ==
    'BKP_SEG_AREA'[10].'BKP_SEG_ID'
    ^^^^^^^^^^^^^^^^^^^^^^^

When we tried

    spark.conf.set("spark.sql.ansi.strictIndexOperator", "false")

it throws an error saying this configuration is not available above Spark 3.4.

Code snippet that caused the issue

  val dfFlat = SparkUtils.flattenSchema(df)
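A possible angle, assuming the out-of-bounds access comes from ANSI array indexing (which Spark 4.0 enables by default) rather than from `flattenSchema` itself — this is a sketch of two workarounds, not a confirmed fix:

```scala
// Assumption: the error is triggered by ANSI-mode array indexing in Spark 4,
// where spark.sql.ansi.enabled defaults to true.

// Option 1: disable ANSI mode for the session, restoring the pre-4.0
// behaviour where an out-of-bounds index returns NULL instead of failing.
spark.conf.set("spark.sql.ansi.enabled", "false")
val dfFlat = SparkUtils.flattenSchema(df)

// Option 2: keep ANSI mode and access array elements with the SQL get()
// function, which tolerates invalid indexes and returns NULL, e.g.:
//   spark.sql("SELECT get(BKP_SEG_AREA, 10).BKP_SEG_ID FROM records")
// (table name "records" is hypothetical, for illustration only)
```

Option 1 changes behaviour globally for the session, so it may mask other ANSI violations; option 2 would require the flattening logic itself to emit `get()` calls instead of bracketed indexes.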
