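MLlib is a machine learning library built on top of Spark. For example, after from pyspark.mllib.clustering import KMeans, you call KMeans.train(rdd, k), where rdd is the PySpark RDD of feature vectors you pass to MLlib and k is the number of clusters.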
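
A minimal sketch of that call, assuming a local Spark installation and the RDD-based pyspark.mllib API. The helper names parse_point, cluster and run_demo, the sample data, and the choice of k=2 are made up for illustration and do not come from the note above.

```python
def parse_point(line):
    """Turn a whitespace-separated line such as "1.0 2.5" into a list of floats."""
    return [float(x) for x in line.split()]


def cluster(sc, lines, k=2):
    """Build an RDD of feature vectors from `lines` and fit k-means on it."""
    # Imported lazily so parse_point stays usable without Spark installed.
    from pyspark.mllib.clustering import KMeans

    rdd = sc.parallelize(lines).map(parse_point)  # RDD of feature vectors
    return KMeans.train(rdd, k, maxIterations=10)


def run_demo():
    """Hypothetical end-to-end run on four made-up 2-D points."""
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "kmeans-sketch")
    try:
        sample = ["0.0 0.0", "0.1 0.1", "9.0 9.0", "9.1 9.1"]  # illustrative data
        model = cluster(sc, sample, k=2)
        print(model.clusterCenters)          # one center per cluster
        print(model.predict([0.05, 0.05]))   # index of the nearest center
    finally:
        sc.stop()
```

Calling run_demo() on a machine with Spark available should print two cluster centers and the cluster index assigned to the query point. The key point is that the model is fitted with KMeans.train on an RDD, not by constructing KMeans(rdd) directly.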