piper.verbs.across

piper.verbs.across(df: pandas.core.frame.DataFrame, columns: Optional[Union[str, Tuple[str], List[str]]] = None, function: Optional[Callable] = None, series_obj: bool = False, *args, **kwargs) → pandas.core.frame.DataFrame

Apply function across multiple columns

across applies a function to several columns in one statement. The function can be applied to each value in a Series (via apply()) or to the pd.Series object itself, which gives access to Series methods.

In pandas, to apply the same function to the values of several columns you would normally write something like this:

df['column1'].apply(function)
df['column2'].apply(function)
df['column3'].apply(function)

The piper equivalent would be:

across(df, ['column1', 'column2', 'column3'], function)
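
As a rough illustration of the value-level behaviour described above, the sketch below uses plain pandas only and does not call piper. The helper name apply_across_values is hypothetical, and copying the dataframe before modifying it is an assumption, not something the piper documentation states.

```python
import pandas as pd


def apply_across_values(df, columns, function, **kwargs):
    """Hypothetical stand-in for a value-level across() call (not piper's code)."""
    result = df.copy()  # assumption: leave the caller's dataframe untouched
    for column in columns:
        # Series.apply calls `function` once for every value in the column
        result[column] = result[column].apply(function, **kwargs)
    return result


df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]})
doubled = apply_across_values(df, ["column1", "column2"], lambda value: value * 2)
shifted = apply_across_values(df, ["column3"], lambda value, offset: value + offset, offset=10)
```

Only the listed columns change; column3 in doubled keeps its original values.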

You can also call Series object methods by passing the keyword series_obj=True. In pandas, to change the dtype of several columns you would use something like:

df['col'] = df['col'].astype(float)
df['col2'] = df['col2'].astype(float)
df['col3'] = df['col3'].astype(float)

The equivalent with across would be:

df = across(df, ['col', 'col2', 'col3'], function=lambda x: x.astype(float), series_obj=True)
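
To see why the series_obj distinction matters, the following sketch contrasts the two modes using plain pandas only (it does not call piper). The function name to_float and the sample data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col": ["1", "2"], "col2": ["3.5", "4.5"]})


def to_float(series):
    # Receives a whole pd.Series, so Series methods like astype() are available.
    return series.astype(float)


# Series-level application: the function is handed each column as a Series.
series_level = df.copy()
for column in ["col", "col2"]:
    series_level[column] = to_float(series_level[column])

# Value-level application hands the function one cell at a time. A plain str
# has no astype() method, so the same function raises AttributeError.
try:
    df["col"].apply(to_float)
    value_level_error = None
except AttributeError as error:
    value_level_error = error
```

A function written against Series methods therefore needs the Series-level mode, while a function written for single values needs the value-level mode.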
Parameters
  • df – pandas dataframe

  • columns

    column(s) to apply the function to.
    • If a string is provided, only that column is affected by the function.

    • If a list is provided, only the columns listed are affected by the function.

    • If a tuple is supplied, its first and second values give the start and end of the column range the function is applied to.

  • function – function to be called.

  • series_obj – Default is False. If True, the function is applied at the Series (or DataFrame) object level. If False, the function is applied to each value in the Series.
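
The three forms of the columns parameter can be pictured with the sketch below, written in plain pandas. The helper resolve_columns is hypothetical and is not part of piper; in particular, treating the tuple range as inclusive of both end columns is an assumption borrowed from pandas label slicing.

```python
import pandas as pd


def resolve_columns(df, columns):
    """Hypothetical helper: turn a str, list or (from, to) tuple into column names."""
    if isinstance(columns, str):
        return [columns]
    if isinstance(columns, tuple):
        start, stop = columns
        # Assumption: label-based slicing, which in pandas includes both end points.
        return list(df.loc[:, start:stop].columns)
    return list(columns)


df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
single = resolve_columns(df, "a")           # one column name
listed = resolve_columns(df, ["a", "c"])    # only the listed columns
ranged = resolve_columns(df, ("a", "c"))    # every column from 'a' through 'c'
```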

Returns

A pandas dataframe

Return type

pandas.core.frame.DataFrame

Examples

The example below applies a function to each value in the listed columns.

%%piper

sample_data()
>> across(['dates', 'order_dates'], to_julian)
# Alternative syntax, passing a lambda...
>> across(['order_dates', 'dates'], function=lambda x: to_julian(x), series_obj=False)
>> head(tablefmt='plain')

      dates    order_dates  countries    regions    ids      values_1    values_2
 0   120001         120007  Italy        East       A             311          26
 1   120002         120008  Portugal     South      D             150         375
 2   120003         120009  Spain        East       A             396          88
 3   120004         120010  Italy        East       B             319         233

Extra keyword arguments (here, year_only=True) can be supplied after the function:

%%piper
sample_data()
>> across(['dates', 'order_dates'], fiscal_year, year_only=True)
>> head(tablefmt='plain')

    dates     order_dates    countries    regions    ids      values_1    values_2
 0  FY 19/20  FY 19/20       Italy        East       A             311          26
 1  FY 19/20  FY 19/20       Portugal     South      D             150         375
 2  FY 19/20  FY 19/20       Spain        East       A             396          88
 3  FY 19/20  FY 19/20       Italy        East       B             319         233

Accessing Series object methods: passing series_obj=True lets the function work on the Series object itself, including vectorized string methods (e.g. pd.Series.str.replace()).

%%piper
sample_data()
>> select(['-ids', '-regions'])
# columns given as a single column name...
>> across(columns='values_1', function=lambda x: x.astype(int), series_obj=True)
# ...or as a list of column names
>> across(columns=['values_1'], function=lambda x: x.astype(int), series_obj=True)
>> head(tablefmt='plain')

    dates                order_dates          countries      values_1    values_2
 0  2020-01-01 00:00:00  2020-01-07 00:00:00  Italy               311          26
 1  2020-01-02 00:00:00  2020-01-08 00:00:00  Portugal            150         375
 2  2020-01-03 00:00:00  2020-01-09 00:00:00  Spain               396          88
 3  2020-01-04 00:00:00  2020-01-10 00:00:00  Italy               319         233
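
The vectorized string case mentioned above (pd.Series.str.replace()) can be sketched in plain pandas as follows. This does not call piper; the function name capitalise_a and the sample data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"countries": ["Italy", "Spain"], "regions": ["East", "South"]})


def capitalise_a(series):
    # .str exposes vectorized string methods that act on the whole column at once.
    return series.str.replace("a", "A", regex=False)


# Series-level application: each column is passed to the function as a Series.
for column in ["countries", "regions"]:
    df[column] = capitalise_a(df[column])
```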