Here is some starting sample data that matches your expected output:
df1
   coil_id  sample_factor  SEQ
0  E101634        10.4066    1
1  E101634        20.8132    2
2  E101634        31.2198    3
3  E101634        41.6264    4
4  E101634        52.0330    5
5  E101634        62.4396    6
6  E101634      5220.0330  449
df2
   coil_id  SAMPLE   GAUGE
0  E101634      10  0.0550
1  E101634      20  0.0568
2  E101634      30  0.0543
3  E101634      40  0.0531
4  E101634      50  0.0529
5  E101634      60  0.0519
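If you want to run the code below yourself, the two frames can be rebuilt roughly like this. The values are copied from the printouts above; the dtypes (integer SAMPLE and SEQ, float sample_factor and GAUGE) are my assumption about what your real data looks like.

```python
import pandas as pd

# Values copied from the df1 printout; dtypes are an assumption.
df1 = pd.DataFrame({
    "coil_id": ["E101634"] * 7,
    "sample_factor": [10.4066, 20.8132, 31.2198, 41.6264,
                      52.0330, 62.4396, 5220.0330],
    "SEQ": [1, 2, 3, 4, 5, 6, 449],
})

# Values copied from the df2 printout; SAMPLE is integer here,
# which is why it gets cast to float before the merge.
df2 = pd.DataFrame({
    "coil_id": ["E101634"] * 6,
    "SAMPLE": [10, 20, 30, 40, 50, 60],
    "GAUGE": [0.0550, 0.0568, 0.0543, 0.0531, 0.0529, 0.0519],
})

print(df1)
print(df2)
```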
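The first step is a merge_asof to bring each sample factor onto the closest sample. With direction='forward', every SAMPLE is matched to the first sample_factor at or above it within the same coil_id. Then calculate the new_gauge column for every row. However, a value is only actually assigned when the sample_factor lies between the SAMPLE of its current row and that of the next row, and the coil ID is the same on the current row and the next one.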
import pandas as pd
merged = pd.merge_asof(df2.assign(SAMPLE=df2.SAMPLE.astype('float')).sort_values('SAMPLE'),
                       df1.sort_values('sample_factor'),
                       by='coil_id',
                       left_on='SAMPLE',
                       right_on='sample_factor',
                       direction='forward')
print(merged)
#    coil_id  SAMPLE   GAUGE  sample_factor  SEQ
# 0  E101634    10.0  0.0550        10.4066    1
# 1  E101634    20.0  0.0568        20.8132    2
# 2  E101634    30.0  0.0543        31.2198    3
# 3  E101634    40.0  0.0531        41.6264    4
# 4  E101634    50.0  0.0529        52.0330    5
# 5  E101634    60.0  0.0519        62.4396    6
# Now perform your calculation:
new_gauge = (merged.GAUGE.shift(1)
             + ((merged.GAUGE - merged.GAUGE.shift(1)) / 10
                * (merged.sample_factor - merged.SAMPLE.shift(1))))
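# Assign it only where it makes sense.
# Assumes the rows of merged are ordered by ['coil_id', 'SAMPLE'], so that
# shift(1) / shift(-1) look at the neighbouring sample of the same coil
# (true here, since the sample data contains a single coil).
mask = (merged.sample_factor.between(merged.SAMPLE, merged.SAMPLE.shift(-1))
        & (merged.coil_id == merged.coil_id.shift(-1)))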
merged.loc[mask, 'new_gauge'] = new_gauge[mask]
Output: merged
   coil_id  SAMPLE   GAUGE  sample_factor  SEQ  new_gauge
0  E101634    10.0  0.0550        10.4066    1        NaN
1  E101634    20.0  0.0568        20.8132    2   0.056946
2  E101634    30.0  0.0543        31.2198    3   0.053995
3  E101634    40.0  0.0531        41.6264    4   0.052905
4  E101634    50.0  0.0529        52.0330    5   0.052859
5  E101634    60.0  0.0519        62.4396    6        NaN
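In this case the last row is not assigned, because the subset you provided contains no sample > 60. The first row stays NaN as well: it has no previous row, so shift(1) gives NaN for it.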
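If your real data has more than one coil, the plain shift() calls above can pick up a neighbouring row from a different coil. Below is a minimal sketch of one way around that, using per-coil shifts via groupby. This is a variation on the code above, not the same code. The function name add_new_gauge and the step parameter are made up for illustration, and step=10 assumes the samples in df2 are spaced 10 apart, as in your example.

```python
import pandas as pd

def add_new_gauge(df1, df2, step=10):
    """Hypothetical helper: merge_asof, then interpolate within each coil."""
    left = df2.assign(SAMPLE=df2["SAMPLE"].astype("float")).sort_values("SAMPLE")
    right = df1.sort_values("sample_factor")
    merged = pd.merge_asof(left, right, by="coil_id",
                           left_on="SAMPLE", right_on="sample_factor",
                           direction="forward")

    # Order rows per coil so neighbours are consecutive samples of one coil.
    merged = merged.sort_values(["coil_id", "SAMPLE"]).reset_index(drop=True)

    # Per-coil shifts: the first/last row of each coil gets NaN instead of
    # a value borrowed from another coil.
    by_coil = merged.groupby("coil_id")
    prev_gauge = by_coil["GAUGE"].shift(1)
    prev_sample = by_coil["SAMPLE"].shift(1)
    next_sample = by_coil["SAMPLE"].shift(-1)

    # Same formula as in the answer, with the hard-coded 10 as `step`.
    new_gauge = prev_gauge + ((merged["GAUGE"] - prev_gauge) / step
                              * (merged["sample_factor"] - prev_sample))

    # Keep a value only where sample_factor lies between this row's SAMPLE
    # and the next SAMPLE of the same coil (NaN bounds compare as False).
    mask = merged["sample_factor"].between(merged["SAMPLE"], next_sample)
    merged["new_gauge"] = new_gauge.where(mask)
    return merged

# Sample data copied from the question's printouts.
df1 = pd.DataFrame({
    "coil_id": ["E101634"] * 7,
    "sample_factor": [10.4066, 20.8132, 31.2198, 41.6264,
                      52.0330, 62.4396, 5220.0330],
    "SEQ": [1, 2, 3, 4, 5, 6, 449],
})
df2 = pd.DataFrame({
    "coil_id": ["E101634"] * 6,
    "SAMPLE": [10, 20, 30, 40, 50, 60],
    "GAUGE": [0.0550, 0.0568, 0.0543, 0.0531, 0.0529, 0.0519],
})

result = add_new_gauge(df1, df2)
print(result)
```

On the single-coil sample data this should give the same new_gauge column as the output above. I have not tried it on multi-coil data, so treat it as a starting point.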