2 回答

TA貢獻2021條經(jīng)驗 獲得超8個贊
在這個問題中,我們有一個順序模式,我們可以將“start_no”和“end_no”列轉(zhuǎn)換為所需數(shù)據(jù)幀的列。當(dāng)我們采用類似的值時(start_no0, end_no0, start_no1, end_no1, ...),我們實際上得到了“start_no”和“end_no”所需列的最大部分。通過簡單的修復(fù),我們可以獲得完全相同的列。相同的邏輯可以應(yīng)用于 start_date 和 end_date,因為它們代表相同的事物。
由于您有不同的工作站和機器值,我們可以通過使用 Stat.、Mac.、start_date、end_date 索引來將問題分組。在代碼中,我試圖通過忽略原始數(shù)據(jù)集中的時間字段來獲取當(dāng)天的所有值?;旧衔抑皇菍?shù)據(jù)進行分組并迭代每個組以創(chuàng)建一個包含您想要的信息的新數(shù)據(jù)框。
對于您共享的案例,代碼如下所示:
import numpy as np
import pandas as pd
data = pd.read_excel("sample_2.xlsx")
# transform (start|end)_date as only date without time
data["_sDate"] = data.start_date.apply(lambda x: x.strftime("%Y-%m-%d"))
data["_eDate"] = data.end_date.apply(lambda x: x.strftime("%Y-%m-%d"))
# group the data by following columns
grouped = data.groupby(["Station","Machine","_sDate","_eDate"])
# container for storing result of each group
container = []
# iterate the groups
for name, group in grouped:
# sort them by start_number
group = group.sort_values("start_number")
# get (start|end)_numbers into a flatten array
nums = group[["start_number", "end_number"]].values.flatten()
# get (start|end)_date into a flatten array
dates = group[["start_date", "end_date"]].values.flatten()
## insert required values to nums and dates
# we add the first pause time at index 1 to show first working interval
dates = np.insert(dates, 1 , dates[0] + nums[0]*10**9)
# we add 0 in the beginning of the array to show first working interval
nums = np.insert(nums, 0, 0)
# create df
nrow = nums.size-1 # decrement, because we add one additional element
newdf = pd.DataFrame({
"Station": np.tile(("A"),nrow),
"Machine": np.tile(("B"),nrow),
"start_date": dates[:-1],
"end_date": dates[1:],
"start_no": nums[:-1],
"end_no": nums[1:],
"status": np.tile(["working", "pause"], nrow//2)
})
container.append(newdf)
df_final = pd.concat(container)
df_final.index = range(0,df_final.shape[0])

TA貢獻1891條經(jīng)驗 獲得超3個贊
一種快速但緩慢的方法可能是遍歷所有行并檢查當(dāng)前 + 下一行。您只有 1000 行,所以現(xiàn)在就可以了。這看起來像這樣:
import pandas as pd
df = pd.read_excel("sample_2.xlsx")
df['status'] = 'pause'
df = df.sort_values(['Workcenter','Machine','Error_Reason','Class','start_date','start_time', 'end_date','end_time']).reset_index()
new_df = df.copy()
number_rows = len(df)-1
for i in range(number_rows):
row = df.loc[i]
next_row = df.loc[i+1]
new_row = row
new_row['status'] = 'working'
new_row['start_date'] = row['end_date']
new_row['end_date'] = next_row['start_date']
new_row['start_number'] = row['end_number']
new_row['end_number'] = next_row['start_number']
new_df = new_df.append(new_row)
- 2 回答
- 0 關(guān)注
- 193 瀏覽
添加回答
舉報