piper.verbs.across
piper.verbs.across(df: pandas.core.frame.DataFrame, columns: Optional[Union[str, Tuple[str], List[str]]] = None, function: Optional[Callable] = None, series_obj: bool = False, *args, **kwargs) → pandas.core.frame.DataFrame
Apply function across multiple columns
across lets you apply a function to a number of columns in one statement. The function can be applied to each value of a Series (via apply()) or to the pd.Series object itself, which gives access to Series methods.
In pandas, to apply the same function to the values of several columns, you would normally do something like this:
df['column1'].apply(function)
df['column2'].apply(function)
df['column3'].apply(function)
The piper equivalent would be:
across(df, ['column1', 'column2', 'column3'], function)
You can also work with Series object functions by passing the keyword series_obj=True. In pandas, if you wanted to change the dtype of several columns you would use something like:
df['col'] = df['col'].astype(float)
df['col2'] = df['col2'].astype(float)
df['col3'] = df['col3'].astype(float)
The equivalent with across would be:
df = across(df, ['col', 'col2', 'col3'], function=lambda x: x.astype(float), series_obj=True)
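To make the difference between the two modes concrete, here is a minimal pure-pandas sketch. The function across_sketch is a hypothetical stand-in written for illustration only; it is not piper's implementation and may differ from it in details such as copying and argument handling.

```python
import pandas as pd


def across_sketch(df, columns, function, series_obj=False, **kwargs):
    """Hypothetical stand-in for across(); not piper's actual code."""
    if isinstance(columns, str):
        columns = [columns]
    result = df.copy()  # assumption: leave the caller's dataframe untouched
    for col in columns:
        if series_obj:
            # The function receives the whole Series object.
            result[col] = function(result[col], **kwargs)
        else:
            # The function receives one value at a time via Series.apply().
            result[col] = result[col].apply(function, **kwargs)
    return result


df = pd.DataFrame({"col": ["1", "2"], "col2": ["3.5", "4"], "other": ["x", "y"]})

# Value-level: float() is called on each string.
by_value = across_sketch(df, ["col", "col2"], float)

# Series-level: astype() is a Series method, so series_obj=True is needed.
by_series = across_sketch(
    df, ["col", "col2"], lambda s: s.astype(float), series_obj=True
)
```

Both calls should produce the same float columns here. The Series-level form is the one to reach for when the function relies on Series methods such as astype().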
- Parameters
df – pandas dataframe
columns – column(s) to apply the function to.
If a list is provided, only the columns listed are affected by the function.
If a tuple is supplied, its first and second values give the from and to columns of the range the function is applied to.
function – function to be called.
series_obj – Default is False. If True, the function is applied at the Series (or DataFrame) ‘object’ level. If False, the function is applied to each value in the Series.
- Returns
A pandas dataframe
- Return type
pandas.core.frame.DataFrame
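The three accepted forms of the columns argument can be pictured with a small pure-pandas sketch. The helper resolve_columns is hypothetical and exists only for illustration. Treating the tuple as a range that includes both endpoints is an assumption based on how pandas label slicing behaves, not something the description above states.

```python
import pandas as pd


def resolve_columns(df, columns):
    """Hypothetical helper: expand a str, list or (from, to) tuple to names."""
    if isinstance(columns, str):
        return [columns]
    if isinstance(columns, tuple):
        start, end = columns
        # Assumption: pandas label slicing, which includes both endpoints.
        return df.loc[:, start:end].columns.tolist()
    return list(columns)


df = pd.DataFrame(
    [["2020-01-01", "2020-01-07", "Italy", "East", "A", 311, 26]],
    columns=["dates", "order_dates", "countries", "regions", "ids",
             "values_1", "values_2"],
)

single = resolve_columns(df, "ids")
listed = resolve_columns(df, ["dates", "ids"])
ranged = resolve_columns(df, ("dates", "countries"))
```

Under these assumptions, the tuple ('dates', 'countries') expands to the three columns from dates through countries in dataframe order.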
Examples
The example below applies a function to each row value of the listed columns.
%%piper
sample_data()
>> across(['dates', 'order_dates'], to_julian)
# Alternative syntax, passing a lambda...
>> across(['order_dates', 'dates'], function=lambda x: to_julian(x), series_obj=False)
>> head(tablefmt='plain')

    dates    order_dates    countries    regions    ids    values_1    values_2
 0  120001   120007         Italy        East       A           311          26
 1  120002   120008         Portugal     South      D           150         375
 2  120003   120009         Spain        East       A           396          88
 3  120004   120010         Italy        East       B           319         233
%%piper
sample_data()
>> across(['dates', 'order_dates'], fiscal_year, year_only=True)
>> head(tablefmt='plain')

    dates      order_dates    countries    regions    ids    values_1    values_2
 0  FY 19/20   FY 19/20       Italy        East       A           311          26
 1  FY 19/20   FY 19/20       Portugal     South      D           150         375
 2  FY 19/20   FY 19/20       Spain        East       A           396          88
 3  FY 19/20   FY 19/20       Italy        East       B           319         233
Accessing Series object methods - by passing series_obj=True you can also use Series object methods and vectorized string functions (e.g. pd.Series.str.replace()).
%%piper
sample_data()
>> select(['-ids', '-regions'])
>> across(columns='values_1', function=lambda x: x.astype(int), series_obj=True)
>> across(columns=['values_1'], function=lambda x: x.astype(int), series_obj=True)
>> head(tablefmt='plain')

    dates                 order_dates           countries    values_1    values_2
 0  2020-01-01 00:00:00   2020-01-07 00:00:00   Italy             311          26
 1  2020-01-02 00:00:00   2020-01-08 00:00:00   Portugal          150         375
 2  2020-01-03 00:00:00   2020-01-09 00:00:00   Spain             396          88
 3  2020-01-04 00:00:00   2020-01-10 00:00:00   Italy             319         233
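As an illustration of the vectorized string functions mentioned above, the following plain-pandas sketch shows the kind of Series-level function you might pass with series_obj=True. It does not call piper. The dataframe and the replace_o function are made up for the example, and the loop only mimics applying the function to each listed column.

```python
import pandas as pd

df = pd.DataFrame({
    "countries": ["Italy", "Portugal"],
    "regions": ["East", "South"],
})


def replace_o(s):
    # Needs the whole Series: the .str accessor does not exist on a plain str.
    return s.str.replace("o", "0", regex=False)


# Mimic a Series-level application over each listed column.
for col in ["countries", "regions"]:
    df[col] = replace_o(df[col])
```

A function like this would fail at the value level, because each individual value is a plain Python string with no .str accessor.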