有很多种情况,spark sql语句中的某些字段是空的,想要知道sql中聚合函数对null的影响结果,下面做了几个测试用例。
image.png
测试用例
准备三个case class
case class ChildLog(id:String,name:String)case class WifeLog(id:String,name:String,child:ChildLog)case class UserLog(id:String,name:String,wife:WifeLog)
准备三组数据
val list1 = Array(UserLog("1", "lake", null), UserLog("2", "admin", null)) val list2 = Array(UserLog("1", "lake", WifeLog(null,null,null)), UserLog("2", "admin", WifeLog("3", "coco",null))) val list3 = Array(UserLog("1", "lake", WifeLog(null,null,ChildLog("hi","na"))), UserLog("2", "admin", WifeLog("3", "coco",null)))
初始化环境
val sparkConf = new SparkConf() .setMaster("local[1]") .setAppName("testNull") val spark = SparkSession .builder .config(sparkConf) .getOrCreate() val sc = spark.sparkContext
第一组
测试count对null值的结果
val list1 = Array(UserLog("1", "lake", null), UserLog("2", "admin", WifeLog("3", "coco",null))) val listRdd = sc.parallelize(list3) listRdd.toDS().createTempView("log") spark.sql("select count(wife) from log").show
结果(对空值会忽略)
+-----------+|count(wife)|+-----------+| 1|+-----------+
结论(如下两个语句相等)
spark.sql("select count(wife) from log").showspark.sql("select count(wife) from log where wife IS NOT NULL").show
如何统计null
spark.sql("select count(wife IS NULL) from log where").show
结果
+---------------------+|count((wife IS NULL))|+---------------------+| 1|+---------------------+
第二组
测试子类case
spark.sql("select count(wife.id) from log").show
结果
+----------------------+|count(wife.id AS `id`)|+----------------------+| 1|+----------------------+
结论(对子类case的空值也会忽略,对空的子类也会忽略)
第三组
测试多级子case的影响
spark.sql("select count(wife.child.id) from log").show
结果
+----------------------------+|count(wife.child.id AS `id`)|+----------------------------+| 1|+----------------------------+
结论(不论子级多深,只统计非空值的数据)
作者:dounine
链接:https://www.jianshu.com/p/38e85cbf0498
點(diǎn)擊查看更多內(nèi)容
為 TA 點(diǎn)贊
評(píng)論
評(píng)論
共同學(xué)習(xí),寫(xiě)下你的評(píng)論
評(píng)論加載中...
作者其他優(yōu)質(zhì)文章
正在加載中
感謝您的支持,我會(huì)繼續(xù)努力的~
掃碼打賞,你說(shuō)多少就多少
贊賞金額會(huì)直接到老師賬戶
支付方式
打開(kāi)微信掃一掃,即可進(jìn)行掃碼打賞哦