piper.verbs.assign¶
-
piper.verbs.assign(df: pandas.core.frame.DataFrame, *args, **kwargs) → pandas.core.frame.DataFrame[source]¶ Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
- Parameters
df – pandas dataframe
args – arbitrary arguments (for future use)
**kwargs –
dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
Note
If you wish to apply a function to a columns set of values: pass a tuple with column name and function to call. For example:
assign(reversed=('regions', lambda x: x[::-1]))
is converted to:
assign(reversed=lambda x: x['regions'].apply(lambda x: x[::-1]))
- Returns
A new DataFrame with the new columns in addition to all the existing columns.
- Return type
A pandas DataFrame
Notes
Assigning multiple columns within the same
assignis possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.Examples
You can create a dictionary of column names to corresponding functions, then pass the dictionary to the assign function as shown below:
%%piper --dot sample_data() .. assign(**{'reversed': ('regions', lambda x: x[::-1]), 'v1_x_10': lambda x: x.values_1 * 10, 'v2_div_4': lambda x: x.values_2 / 4, 'dow': lambda x: x.dates.dt.day_name(), 'ref': lambda x: x.v2_div_4 * 5, 'ids': lambda x: x.ids.astype('category')}) .. across(['values_1', 'values_2'], lambda x: x.astype(float), series_obj=True) .. relocate('dow', 'after', 'dates') .. select(['-dates', '-order_dates']) .. head(tablefmt='plain') dow countries regions ids values_1 values_2 reversed v1_x_10 v2_div_4 ref 0 Wednesday Italy East A 311 26 tsaE 3110 6 32 1 Thursday Portugal South D 150 375 htuoS 1500 94 469 2 Friday Spain East A 396 88 tsaE 3960 22 110 3 Saturday Italy East B 319 233 tsaE 3190 58 291
%%piper sample_sales() >> select() >> assign(month_plus_one = lambda x: x.month + pd.Timedelta(1, 'D'), alt_formula = lambda x: x.actual_sales * .2) >> select(['-target_profit']) >> assign(profit_sales = lambda x: x.actual_profit - x.actual_sales, month_value = lambda x: x.month.dt.month, product_added = lambda x: x['product'] + 'TEST', salesthing = lambda x: x.target_sales.sum()) >> select(['-month', '-actual_sales', '-actual_profit', '-month_value', '-location']) >> assign(profit_sales = lambda x: x.profit_sales.astype(int)) >> reset_index(drop=True) >> head(tablefmt='plain') product target_sales month_plus_one alt_formula profit_sales product_added salesthing 0 Beachwear 31749 2021-01-02 5842 -27456 BeachwearTEST 5423715 1 Beachwear 37833 2021-01-02 6810 -28601 BeachwearTEST 5423715 2 Jeans 29485 2021-01-02 6310 -27132 JeansTEST 5423715 3 Jeans 37524 2021-01-02 8180 -36811 JeansTEST 5423715