piper.verbs.select¶
-
piper.verbs.select(df: pandas.core.frame.DataFrame, *args, regex: Optional[str] = None, like: Optional[str] = None, include: Optional[str] = None, exclude: Optional[str] = None) → pandas.core.frame.DataFrame[source]¶ select dataframe columns
Inspired by the select() function from R tidyverse.
Select column names from a dataframe
Examples
df = sample_sales() select(df) # select ALL columns # select column column listed select(df, 'location') # select columns listed select(df, ['product', 'target_sales']) # select ALL columns EXCEPT the column listed (identified by - minus sign prefix) select(df, '-target_profit') # select ALL columns EXCEPT the column specified with a minus sign within the list select(df, ['-target_sales', '-target_profit']) # select column range using a tuple from column up to and including the 'to' column. select(df, ('month', 'actual_profit')) # select all number data types select(df, '-month', include='number') # exclude all float data types including month column select(df, '-month', exclude='float') # select using a regex string select(df, regex='sales') -> select fields containing 'sales' select(df, regex='^sales') -> select fields starting with 'sales' select(df, regex='sales$') -> select fields ending with 'sales'
Note
- See the numpy dtype hierarchy. To select:
strings use the ‘object’ dtype, but note that this will return all object dtype columns
all numeric types, use np.number or ‘number’
datetimes, use np.datetime64, ‘datetime’ or ‘datetime64’
timedeltas, use np.timedelta64, ‘timedelta’ or ‘timedelta64’
Pandas categorical dtypes, use ‘category’
Pandas datetimetz dtypes, use ‘datetimetz’ (new in 0.20.0) or ‘datetime64[ns, tz]’
- Parameters
df – dataframe
args –
- if not specified, then ALL columns are selected
str : single column (in quotes)
list : list of column names (in quotes)
tuple : specify a range of column names or column positions (1-based)
Note
Prefixing column name with a minus sign filters out the column from the returned list of columns e.g. ‘-sales’, ‘-month’
regex – Default None. Wrapper for regex keyword in pd.DataFrame.filter() Keep labels from axis for which re.search(regex, label) == True.
like – Default None. Wrapper for like keyword in pd.DataFrame.filter() Keep labels from axis for which like in label == True.
include – Default None. Wrapper for include keyword in pd.DataFrame.select_dtypes()
exclude – Default None. Wrapper for exclude keyword in pd.DataFrame.select_dtypes()
- Returns
- Return type
pandas DataFrame object