:: DeveloperApi :: Rating class for better code readability.
Reads an ML instance from the input path, a shortcut of read.load(path).
Implementing classes should override this to be Java-friendly.
Returns an MLReader instance for this class.
:: DeveloperApi :: Implementation of the ALS algorithm.
This implementation of the ALS factorization algorithm partitions the two sets of factors among
Spark workers so as to reduce network communication by only sending one copy of each factor
vector to each Spark worker on each iteration, and only if needed. This is achieved by
precomputing some information about the ratings matrix to determine which users require which
item factors and vice versa. See the Scaladoc for InBlock for a detailed explanation of how
the precomputation is done.
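
The idea of working out which factor vectors each worker needs can be illustrated with a minimal sketch in plain Scala, without Spark. All names here (OutBlockSketch, Rating, blockOf, destinations) are hypothetical and chosen for illustration; they are not the library's internal data structures, and the block assignment assumes a simple non-negative modulus on the item ID.

```scala
object OutBlockSketch {
  // Hypothetical rating record for this sketch only.
  final case class Rating(user: Int, item: Int, rating: Float)

  // Assumed block assignment: non-negative modulus of the ID.
  def blockOf(id: Int, numBlocks: Int): Int =
    ((id % numBlocks) + numBlocks) % numBlocks

  // For each user, the set of item blocks holding at least one of that
  // user's ratings. The user's factor vector only has to be sent to
  // those blocks, at most once per block.
  def destinations(ratings: Seq[Rating], numItemBlocks: Int): Map[Int, Set[Int]] =
    ratings.groupBy(_.user).map { case (user, rs) =>
      user -> rs.map(r => blockOf(r.item, numItemBlocks)).toSet
    }
}
```

In this sketch, a user whose rated items all fall in item block 0 has a destination set containing only block 0, so that user's factor vector would never be sent to any other block.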
In addition, since each iteration of calculating the factor matrices depends on the known
ratings, which are spread across Spark partitions, a naive implementation would incur
significant network communication overhead between Spark workers, as the ratings RDD would be
repeatedly shuffled during each iteration. This implementation reduces that overhead by
performing the shuffling operation up front, precomputing each partition's ratings dependencies
and duplicating those values to the appropriate workers before starting iterations to solve for
the factor matrices. See the Scaladoc for OutBlock for a detailed explanation of how the
precomputation is done.
Note that the term "rating block" is a bit of a misnomer, as the ratings are not partitioned by contiguous blocks from the ratings matrix but by a hash function on the rating's location in the matrix. When visualizing the partitions, it is easier to think of a "block" as a subset of an RDD containing the ratings rather than as a contiguous submatrix of the ratings matrix.
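
A small plain-Scala sketch can make the point about non-contiguous blocks concrete. The names below (RatingBlocks, Rating, blockOf, partition) are hypothetical, and the hash is assumed to be a non-negative modulus of the integer ID; the actual partitioner may differ.

```scala
object RatingBlocks {
  // Hypothetical rating record for this sketch only.
  final case class Rating(user: Int, item: Int, rating: Float)

  // Assumed hash: non-negative modulus, so negative IDs still map
  // into the range [0, numBlocks).
  def blockOf(id: Int, numBlocks: Int): Int = {
    val m = id % numBlocks
    if (m < 0) m + numBlocks else m
  }

  // Group ratings by their (user block, item block) pair.
  def partition(
      ratings: Seq[Rating],
      numUserBlocks: Int,
      numItemBlocks: Int): Map[(Int, Int), Seq[Rating]] =
    ratings.groupBy { r =>
      (blockOf(r.user, numUserBlocks), blockOf(r.item, numItemBlocks))
    }
}
```

With two user blocks, users 0 and 2 land in the same block while user 1 lands in the other, so a block is a hashed subset of the ratings rather than a run of adjacent rows.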
:: DeveloperApi :: An implementation of ALS that supports generic ID types, specialized for Int and Long. This is exposed as a developer API for users who need other ID types, but it is not recommended because it increases the shuffle size and memory requirement during training. For simplicity, users and items must have the same ID type. The number of distinct users or items should be smaller than 2 billion.