piper.verbs.str_split¶
-
piper.verbs.str_split(df: pandas.core.frame.DataFrame, column: Optional[str] = None, columns: Optional[List] = None, pat: str = ',', n: int = - 1, expand: bool = True, loc: str = 'after', drop: bool = False) → pandas.core.frame.DataFrame[source]¶ split column
Function accepts a column to split, a pattern/delimitter value and optional list of column names to store the result. By default the result is placed just after the specified column.
Note
If one of the target split columns contains the same name as the column to be split and drop=False, the function will append an underscore ‘_’ to the end of the corresponding new split column name.
If drop=True, the the new split column name will NOT be renamed and just replace the original column name. See examples for details.
- Parameters
df – a pandas dataframe
column – column to be split
columns – list-like of column names to store the results of the split column values
pat –
regular expression pattern. Default is ‘,’
Note
For space(s), safer to use r’s’ rather than ‘ ‘.
n – default -1. Number of splits to capture. -1 means capture ALL splits
loc –
location to place the split column output within the dataframe Default is ‘after’ meaning place the output after the column.
Valid locations are: ‘before’ or ‘after’ the column. You can also specify ‘first’ or ‘last’ corresponding to the first and last columns of the dataframe.
For more information about relocating columns - see the relocate() function.
drop – drop original column to be split
- Returns
- Return type
A copy of the pandas dataframe
Examples
An example where the one of the new split column names is the same as the split column.
%%piper sample_sales() >> select(['-target_profit', '-actual_sales']) >> str_split(column='month', pat='-', columns=['day', 'month', 'year'], drop=False) >> head(tablefmt='plain') location product month day month_ year target_sales actual_profit 4 London Beachwear 2021-01-01 2021 01 01 31749 1753 125 London Beachwear 2021-01-01 2021 01 01 37833 5448 21 London Jeans 2021-01-01 2021 01 01 29485 4417 148 London Jeans 2021-01-01 2021 01 01 37524 4090
The same example, this time with the drop=True parameter specified.
sample_sales() >> select(['-target_profit', '-actual_sales']) >> str_split(column='month', pat='-', columns=['day', 'month', 'year'], drop=True) >> head(tablefmt='plain') location product day month year target_sales actual_profit 4 London Beachwear 2021 01 01 31749 1753 125 London Beachwear 2021 01 01 37833 5448 21 London Jeans 2021 01 01 29485 4417 148 London Jeans 2021 01 01 37524 4090