A friend gave me the following code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.{Vector, DenseVector}

object HRDT {
  def main(args : Array[String]) {
    case class HRMeta(features:Vector, label:String)
    val spark = SparkSession.builder()...
    import spark.implicits._
    val data = spark.sparkContext.textFile("...")
      .map(l=>{
        ...
        new HRMeta(new DenseVector(Array(1.0,2.0)), "0")
      }).toDF()
  }
}
Compiling it fails with this error:
[error] /data1/weibo_recmd/story/ctr_predict/code/HRDT.scala:18:12: value toDF is not a member of org.apache.spark.rdd.RDD[HRMeta]
[error] possible cause: maybe a semicolon is missing before `value toDF'?
[error]         }).toDF()
[error]            ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 12 s, completed Feb 9, 2018 1:50:13 PM
However, if the case class definition is moved outside the main method, the code compiles without any problem.
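My guess is that this is related to how Scala generics are used when DatasetHolder.toDF is invoked, but I have not investigated or verified it carefully. I am noting it here for now.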
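The workaround of moving the case class out of main can be sketched as follows. This is only a minimal sketch, not the original program: the app name, the local master setting, and the in-memory RDD standing in for the textFile input are assumptions made for illustration, since the original elides those parts. It also assumes Spark 2.x with spark-sql and spark-mllib on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.linalg.{Vector, DenseVector}

// Declared at the top level instead of inside main.
case class HRMeta(features: Vector, label: String)

object HRDT {
  def main(args: Array[String]): Unit = {
    // Assumption: app name and local master are placeholders for illustration.
    val spark = SparkSession.builder()
      .appName("HRDT")
      .master("local[*]")
      .getOrCreate()

    // Brings the implicit RDD-to-DatasetHolder conversion (and toDF) into scope.
    import spark.implicits._

    // Assumption: a tiny in-memory RDD stands in for the original textFile input.
    val data = spark.sparkContext
      .parallelize(Seq("line1", "line2"))
      .map(_ => HRMeta(new DenseVector(Array(1.0, 2.0)), "0"))
      .toDF()

    data.printSchema()
    spark.stop()
  }
}
```

One plausible explanation, consistent with the guess above but not verified here: toDF is provided by an implicit conversion in spark.implicits._ that requires an Encoder[T], and the encoder for case classes is derived through a TypeTag context bound. The Scala 2 compiler generally cannot produce a TypeTag for a class defined locally inside a method, so the conversion is not applicable and the compiler only reports that toDF is not a member of RDD[HRMeta].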