pyspark.sql.DataFrame.hint#

DataFrame.hint(name, *parameters)[source]#

Specifies some hint on the current DataFrame.

New in version 2.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
namestr

A name of the hint.

parametersstr, list, float or int

Optional parameters.

Returns
DataFrame

Hinted DataFrame

Examples

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.join(df2, "name").explain()  
== Physical Plan ==
...
... +- SortMergeJoin ...
...

Explicitly trigger the broadcast hashjoin by providing the hint in df2.

>>> df.join(df2.hint("broadcast"), "name").explain()
== Physical Plan ==
...
... +- BroadcastHashJoin ...
...