piper.verbs.select

piper.verbs.select(df: pandas.core.frame.DataFrame, *args, regex: Optional[str] = None, like: Optional[str] = None, include: Optional[str] = None, exclude: Optional[str] = None)pandas.core.frame.DataFrame[source]

select dataframe columns

Inspired by the select() function from R tidyverse.

Select column names from a dataframe

Examples

df = sample_sales()

select(df) # select ALL columns

# select column column listed
select(df, 'location')

# select columns listed
select(df, ['product', 'target_sales'])

# select ALL columns EXCEPT the column listed (identified by - minus sign prefix)
select(df, '-target_profit')

# select ALL columns EXCEPT the column specified with a minus sign within the list
select(df, ['-target_sales', '-target_profit'])

# select column range using a tuple from column up to and including the 'to' column.
select(df, ('month', 'actual_profit'))

# select all number data types
select(df, '-month', include='number')

# exclude all float data types including month column
select(df, '-month', exclude='float')

# select using a regex string
select(df, regex='sales') -> select fields containing 'sales'
select(df, regex='^sales') -> select fields starting with 'sales'
select(df, regex='sales$') -> select fields ending with 'sales'

Note

See the numpy dtype hierarchy. To select:
  • strings use the ‘object’ dtype, but note that this will return all object dtype columns

  • all numeric types, use np.number or ‘number’

  • datetimes, use np.datetime64, ‘datetime’ or ‘datetime64’

  • timedeltas, use np.timedelta64, ‘timedelta’ or ‘timedelta64’

  • Pandas categorical dtypes, use ‘category’

  • Pandas datetimetz dtypes, use ‘datetimetz’ (new in 0.20.0) or ‘datetime64[ns, tz]’

Parameters
  • df – dataframe

  • args

    if not specified, then ALL columns are selected
    • str : single column (in quotes)

    • list : list of column names (in quotes)

    • tuple : specify a range of column names or column positions (1-based)

    Note

    Prefixing column name with a minus sign filters out the column from the returned list of columns e.g. ‘-sales’, ‘-month’

  • regex – Default None. Wrapper for regex keyword in pd.DataFrame.filter() Keep labels from axis for which re.search(regex, label) == True.

  • like – Default None. Wrapper for like keyword in pd.DataFrame.filter() Keep labels from axis for which like in label == True.

  • include – Default None. Wrapper for include keyword in pd.DataFrame.select_dtypes()

  • exclude – Default None. Wrapper for exclude keyword in pd.DataFrame.select_dtypes()

Returns

Return type

pandas DataFrame object