pyspark.sql.functions.regexp_extract_all
- pyspark.sql.functions.regexp_extract_all(str, regexp, idx=None)
Extract all substrings of str that match the Java regex regexp, returning for each match the capture group selected by the group index idx.
New in version 3.5.0.
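- Parameters
str : Column or str
target column to work on.
regexp : Column or str
Java regex pattern to apply.
idx : int, optional
index of the regex group to extract from each match. In the examples below, omitting idx gives the same result as idx=1.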
- Returns
Column
an array of all substrings of str that match the Java regex, one per match, each taken from the capture group given by the group index.
Examples
>>> from pyspark.sql.functions import col, lit, regexp_extract_all
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)')).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 1).alias('d')).collect()
[Row(d=['100', '300'])]
>>> df.select(regexp_extract_all('str', lit(r'(\d+)-(\d+)'), 2).alias('d')).collect()
[Row(d=['200', '400'])]
>>> df.select(regexp_extract_all('str', col("regexp")).alias('d')).collect()
[Row(d=['100', '300'])]
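
To make the per-match group selection easier to picture, here is a rough pure-Python analogue of what the examples above compute for a single row. It is only a sketch: extract_all is a hypothetical helper, not part of PySpark, and it uses Python's re module, which matches Java regex behavior for this simple pattern but not in general. The default idx=1 is an assumption that mirrors the examples, where omitting idx gives the same result as passing 1.

```python
import re


def extract_all(text, pattern, idx=1):
    """Hypothetical helper (not a PySpark API): return capture group `idx`
    from every non-overlapping match of `pattern` in `text`."""
    # re.finditer yields one match object per non-overlapping match, left to right.
    return [m.group(idx) for m in re.finditer(pattern, text)]


text = "100-200, 300-400"
pattern = r"(\d+)-(\d+)"

# Group 1 of each match (assumed default, mirroring the examples above).
print(extract_all(text, pattern))      # ['100', '300']
# Group 2 of each match.
print(extract_all(text, pattern, 2))   # ['200', '400']
```

Each match of the pattern contributes exactly one element to the result, so the length of the returned list equals the number of matches, and a string with no matches yields an empty list.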