Compare schema of dataframe with schema of other dataframe

Question

I have the schemas of two datasets read from HDFS paths, defined as below:

    val df = spark.read.parquet("/path")
    df.printSchema()

Accepted Answer

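Since your schema file seems to be a CSV, read it into a map of column name to type name, then iterate over the schema fields of your DataFrame and compare each one against the map:

    // Read the schema file and convert it into a Map (column name -> type name)
    val csvSchemaDf = spark.read.csv("/testschemafile")
    val schemaMap = csvSchemaDf.rdd.map(x => (x(0).toString.trim, x(1).toString.trim)).collectAsMap

    var isSchemaMatching = true

    // Iterate through the schema fields of your df and compare
    for (field <- df.schema.toList) {
      if (!(schemaMap.contains(field.name) &&
            field.dataType.toString.equals(schemaMap.get(field.name).get))) {
        // Mismatch: the column is missing from the file or its type differs
        isSchemaMatching = false
      }
    }

Use isSchemaMatching for further logic. Note that the comparison relies on the type column of the CSV holding the same strings that dataType.toString produces (for example StringType).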
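If both schemas are available as DataFrames rather than as a CSV file, the same idea can be applied to the two StructType objects directly. The following is only a minimal sketch under that assumption: schemaDiff is a hypothetical helper name, not part of Spark, and it compares column names and data types while ignoring nullability and column order. The two small schemas at the end are made-up examples for illustration.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): returns one message per difference
// between two schemas. Nullability and column order are ignored.
def schemaDiff(left: StructType, right: StructType): Seq[String] = {
  val leftTypes: Map[String, DataType]  = left.fields.map(f => f.name -> f.dataType).toMap
  val rightTypes: Map[String, DataType] = right.fields.map(f => f.name -> f.dataType).toMap

  // Columns present on one side only
  val missingInRight = (leftTypes.keySet -- rightTypes.keySet).toSeq.sorted
    .map(n => s"column '$n' is missing from the second schema")
  val missingInLeft = (rightTypes.keySet -- leftTypes.keySet).toSeq.sorted
    .map(n => s"column '$n' is missing from the first schema")

  // Columns present on both sides but with different data types
  val typeMismatches = (leftTypes.keySet intersect rightTypes.keySet).toSeq.sorted
    .filter(n => leftTypes(n) != rightTypes(n))
    .map(n => s"column '$n': ${leftTypes(n).simpleString} vs ${rightTypes(n).simpleString}")

  missingInRight ++ missingInLeft ++ typeMismatches
}

// Made-up example schemas; in practice these would be df1.schema and df2.schema
val first  = StructType(Seq(StructField("id", IntegerType), StructField("name", StringType)))
val second = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))

val differences = schemaDiff(first, second)
val isSchemaMatching = differences.isEmpty
```

Returning the list of differences instead of a single Boolean makes it easier to log which column caused the mismatch. If nullability and column order also matter, comparing the two StructType values with == is a stricter alternative.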

