piper.verbs.assign

piper.verbs.assign(df: pandas.core.frame.DataFrame, *args, **kwargs)pandas.core.frame.DataFrame[source]

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters
  • df – pandas dataframe

  • args – arbitrary arguments (for future use)

  • **kwargs

    dict of {str: callable or Series}

    The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

    Note

    If you wish to apply a function to a columns set of values: pass a tuple with column name and function to call. For example:

    assign(reversed=('regions', lambda x: x[::-1]))
    

    is converted to:

    assign(reversed=lambda x: x['regions'].apply(lambda x: x[::-1]))
    

Returns

A new DataFrame with the new columns in addition to all the existing columns.

Return type

A pandas DataFrame

Notes

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

Examples

You can create a dictionary of column names to corresponding functions, then pass the dictionary to the assign function as shown below:

%%piper --dot

sample_data()
.. assign(**{'reversed': ('regions', lambda x: x[::-1]),
             'v1_x_10': lambda x: x.values_1 * 10,
             'v2_div_4': lambda x: x.values_2 / 4,
             'dow': lambda x: x.dates.dt.day_name(),
             'ref': lambda x: x.v2_div_4 * 5,
             'ids': lambda x: x.ids.astype('category')})
.. across(['values_1', 'values_2'], lambda x: x.astype(float),
          series_obj=True)
.. relocate('dow', 'after', 'dates')
.. select(['-dates', '-order_dates'])
.. head(tablefmt='plain')

    dow       countries regions  ids values_1 values_2 reversed v1_x_10 v2_div_4  ref
 0  Wednesday Italy     East     A        311       26 tsaE        3110        6   32
 1  Thursday  Portugal  South    D        150      375 htuoS       1500       94  469
 2  Friday    Spain     East     A        396       88 tsaE        3960       22  110
 3  Saturday  Italy     East     B        319      233 tsaE        3190       58  291
%%piper

sample_sales()
>> select()
>> assign(month_plus_one = lambda x: x.month + pd.Timedelta(1, 'D'),
          alt_formula = lambda x: x.actual_sales * .2)
>> select(['-target_profit'])
>> assign(profit_sales = lambda x: x.actual_profit - x.actual_sales,
          month_value = lambda x: x.month.dt.month,
          product_added = lambda x: x['product'] + 'TEST',
          salesthing = lambda x: x.target_sales.sum())
>> select(['-month', '-actual_sales', '-actual_profit', '-month_value', '-location'])
>> assign(profit_sales = lambda x: x.profit_sales.astype(int))
>> reset_index(drop=True)
>> head(tablefmt='plain')

    product   target_sales month_plus_one alt_formula profit_sales product_added salesthing
 0  Beachwear        31749     2021-01-02        5842       -27456 BeachwearTEST    5423715
 1  Beachwear        37833     2021-01-02        6810       -28601 BeachwearTEST    5423715
 2  Jeans            29485     2021-01-02        6310       -27132 JeansTEST        5423715
 3  Jeans            37524     2021-01-02        8180       -36811 JeansTEST        5423715