piper.verbs.str_split

piper.verbs.str_split(df: pandas.core.frame.DataFrame, column: Optional[str] = None, columns: Optional[List] = None, pat: str = ',', n: int = - 1, expand: bool = True, loc: str = 'after', drop: bool = False)pandas.core.frame.DataFrame[source]

split column

Function accepts a column to split, a pattern/delimitter value and optional list of column names to store the result. By default the result is placed just after the specified column.

Note

If one of the target split columns contains the same name as the column to be split and drop=False, the function will append an underscore ‘_’ to the end of the corresponding new split column name.

If drop=True, the the new split column name will NOT be renamed and just replace the original column name. See examples for details.

Parameters
  • df – a pandas dataframe

  • column – column to be split

  • columns – list-like of column names to store the results of the split column values

  • pat

    regular expression pattern. Default is ‘,’

    Note

    For space(s), safer to use r’s’ rather than ‘ ‘.

  • n – default -1. Number of splits to capture. -1 means capture ALL splits

  • loc

    location to place the split column output within the dataframe Default is ‘after’ meaning place the output after the column.

    Valid locations are: ‘before’ or ‘after’ the column. You can also specify ‘first’ or ‘last’ corresponding to the first and last columns of the dataframe.

    For more information about relocating columns - see the relocate() function.

  • drop – drop original column to be split

Returns

Return type

A copy of the pandas dataframe

Examples

An example where the one of the new split column names is the same as the split column.

%%piper

sample_sales()
>> select(['-target_profit', '-actual_sales'])
>> str_split(column='month',
            pat='-',
            columns=['day', 'month', 'year'],
            drop=False)
>> head(tablefmt='plain')

     location product    month        day month_ year target_sales actual_profit
  4  London   Beachwear  2021-01-01  2021     01   01        31749          1753
125  London   Beachwear  2021-01-01  2021     01   01        37833          5448
 21  London   Jeans      2021-01-01  2021     01   01        29485          4417
148  London   Jeans      2021-01-01  2021     01   01        37524          4090

The same example, this time with the drop=True parameter specified.

sample_sales()
>> select(['-target_profit', '-actual_sales'])
>> str_split(column='month',
            pat='-',
            columns=['day', 'month', 'year'],
            drop=True)
>> head(tablefmt='plain')

     location    product      day    month    year    target_sales    actual_profit
  4  London      Beachwear   2021       01      01           31749             1753
125  London      Beachwear   2021       01      01           37833             5448
 21  London      Jeans       2021       01      01           29485             4417
148  London      Jeans       2021       01      01           37524             4090