pyspark.sql.functions.regexp_extract_all

pyspark.sql.functions.regexp_extract_all(str, regexp, idx=None)

Extracts every substring of str that matches the Java regex regexp and, for each match, returns the text captured by the regex group at index idx.

New in version 3.5.0.

Parameters

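str (Column or str): target column to work on.
regexp (Column or str): regex pattern to apply.
idx (int, optional): index of the capture group to return for each match. When omitted, group 1 is used, as the first example shows.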

Returns

Column: an array holding, for each match of the Java regex in str, the string captured by the group at index idx.

Examples

>>> from pyspark.sql.functions import regexp_extract_all, lit, col
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]
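
To make the per-match group selection easier to see without a Spark session, here is a rough pure-Python sketch of the same idea. It is only an approximation: it uses Python's re module rather than the Java regex engine that Spark uses, so pattern syntax can differ in places, and extract_all_groups is a hypothetical helper name, not part of PySpark. The idx=0 call relies on the Spark SQL convention that group 0 refers to the whole match.

```python
import re


def extract_all_groups(text, pattern, idx=1):
    """Hypothetical helper (not a PySpark API): collect capture group
    `idx` from every non-overlapping match of `pattern` in `text`."""
    return [m.group(idx) for m in re.finditer(pattern, text)]


text = "100-200, 300-400"
pattern = r"(\d+)-(\d+)"

first = extract_all_groups(text, pattern)      # group 1 of each match
second = extract_all_groups(text, pattern, 2)  # group 2 of each match
whole = extract_all_groups(text, pattern, 0)   # group 0: the entire match

print(first)   # ['100', '300']
print(second)  # ['200', '400']
print(whole)   # ['100-200', '300-400']
```

The first two results line up with the idx=1 and idx=2 examples above; the third shows what selecting the whole match looks like.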