2 回答

TA貢獻(xiàn)1921條經(jīng)驗(yàn) 獲得超9個(gè)贊
當(dāng)然,您可以使用任何sklearn操作并將其應(yīng)用于groupby對(duì)象。
首先,一個(gè)方便的包裝器:
import typing
import pandas as pd
class SklearnWrapper:
def __init__(self, transform: typing.Callable):
self.transform = transform
def __call__(self, df):
transformed = self.transform.fit_transform(df.values)
return pd.DataFrame(transformed, columns=df.columns, index=df.index)
這將應(yīng)用sklearn您傳遞給它的任何變換到一個(gè)組。
最后簡(jiǎn)單的用法:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
data = load_iris()
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["class"] = data["target"]
df_rescaled = (
df.groupby("class")
.apply(SklearnWrapper(StandardScaler()))
.drop("class", axis="columns")
)
編輯:你幾乎可以用SklearnWrapper. 這是為每個(gè)組轉(zhuǎn)換和反轉(zhuǎn)此操作的示例(例如,不要覆蓋轉(zhuǎn)換對(duì)象) - 每次看到新組時(shí)只需重新擬合對(duì)象(并將其添加到list)。
sklearn's為了更容易使用,我復(fù)制了一些功能(您可以通過將適當(dāng)?shù)膫鬟fstring給_call_with_function內(nèi)部方法來使用您想要的任何功能擴(kuò)展它):
class SklearnWrapper:
def __init__(self, transformation: typing.Callable):
self.transformation = transformation
self._group_transforms = []
# Start with -1 and for each group up the pointer by one
self._pointer = -1
def _call_with_function(self, df: pd.DataFrame, function: str):
# If pointer >= len we are making a new apply, reset _pointer
if self._pointer >= len(self._group_transforms):
self._pointer = -1
self._pointer += 1
return pd.DataFrame(
getattr(self._group_transforms[self._pointer], function)(df.values),
columns=df.columns,
index=df.index,
)
def fit(self, df):
self._group_transforms.append(self.transformation.fit(df.values))
return self
def transform(self, df):
return self._call_with_function(df, "transform")
def fit_transform(self, df):
self.fit(df)
return self.transform(df)
def inverse_transform(self, df):
return self._call_with_function(df, "inverse_transform")
用法(組變換,逆運(yùn)算并再次應(yīng)用):
data = load_iris()
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["class"] = data["target"]
# Create scaler outside the class
scaler = SklearnWrapper(StandardScaler())
# Fit and transform data (holding state)
df_rescaled = df.groupby("class").apply(scaler.fit_transform)
# Inverse the operation
df_inverted = df_rescaled.groupby("class").apply(scaler.inverse_transform)
# Apply transformation once again
df_transformed = (
df_inverted.groupby("class")
.apply(scaler.transform)
.drop("class", axis="columns")
)

TA貢獻(xiàn)1856條經(jīng)驗(yàn) 獲得超5個(gè)贊
我更新了@Szymon Maszke 代碼:
class SklearnWrapper:
def __init__(self, transformation: typing.Callable):
self.transformation = transformation
self._group_transforms = []
# Start with -1 and for each group up the pointer by one
self._pointer = -1
def _call_with_function(self, df: pd.DataFrame, function: str):
# If pointer >= len we are making a new apply, reset _pointer
if self._pointer == len(self._group_transforms)-1 and function=="inverse_transform":
self._pointer = -1
self._pointer += 1
print(self._pointer)
return pd.DataFrame(
getattr(self._group_transforms[self._pointer], function)(df.values),
columns=df.columns,
index=df.index,
)
def fit(self, df):
scaler = copy(self.transformation)
self._group_transforms.append(scaler.fit(df.values))
return self
def transform(self, df):
return self._call_with_function(df, "transform")
def fit_transform(self, df):
self.fit(df)
return self.transform(df)
def inverse_transform(self, df):
return self._call_with_function(df, "inverse_transform")
StandardScaler()中沒有正確存儲(chǔ)_group_transforms,所以我創(chuàng)建了一個(gè)副本(使用副本庫(kù))并存儲(chǔ)它(也許使用 OOP 有更好的方法來做到這一點(diǎn))。
添加回答
舉報(bào)