pyspark.sql.functions.regexp_extract_all

pyspark.sql.functions.regexp_extract_all(str, regexp, idx=None)

Extracts every substring of str that matches the Java regex regexp, returning for each match the text captured by the regex group at index idx.

New in version 3.5.0.

Parameters
str : Column or str

target column to work on.

regexp : Column or str

regex pattern to apply.

idx : int, optional

index of the regex group to extract. When omitted, group 1 (the first capture group) is used.

Returns
Column

an array containing, for each match of the Java regex in str, the text captured by the specified regex group.

Examples

>>> from pyspark.sql.functions import regexp_extract_all, lit, col
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]