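If I understand correctly, your problem is a variant of the gaps-and-islands problem. Each monotonic (increasing or decreasing) subsequence whose steps stay within the acceptable gap forms an island. For example, given a series s: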
s   island
--  ------
0   1
0   1
1   1
3   2       # gap = 2 (> 1), form new island
4   2
2   3       # gap = -2 (< -1), stops increasing, form new island
1   3
0   3
To summarize: a new island starts whenever the difference between the current row and the previous row falls outside the range [-1, 1].
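A minimal pandas sketch of this rule, applied to the example series s (assumed here to be an ordinary integer Series). It marks every row that starts a new island and takes the cumulative sum of those marks to number the islands:

```python
import pandas as pd

# The example series s from the table above.
s = pd.Series([0, 0, 1, 3, 4, 2, 1, 0])

# diff() is NaN on the first row and between() returns False for NaN,
# so the first row is always marked as the start of island 1.
starts_new_island = ~s.diff().between(-1, 1)

# Each True adds 1, so every row gets the number of the island it belongs to.
island = starts_new_island.cumsum()

print(island.tolist())  # [1, 1, 1, 2, 2, 3, 3, 3]
```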
Applying this gaps-and-islands algorithm to both Query Segment Id and Reference Segment Id, each (Q Island, R Island) pair is one Q-R intersection:
Query Segment Id  Q Island  Reference Segment Id  R Island  Q-R Intersection
----------------  --------  --------------------  --------  ----------------
1                 1         1                     1         (1, 1)
2                 1         2                     1         (1, 1)
3                 1         3                     1         (1, 1)
0                 2         4                     1         (2, 1)
1                 2         5                     1         (2, 1)
2                 2         6                     1         (2, 1)
3                 2         7                     1         (2, 1)
4                 2         8                     1         (2, 1)
0                 3         9                     1         (3, 1)
The q and r ranges you are looking for are now the first and last Query Segment Id and Reference Segment Id within each Q-R Intersection. One last caveat: ignore intersections of length 1 (such as the last one).
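For readers who want to see the logic without pandas, here is a plain-Python sketch of the same steps. It assumes the two columns are lists of integers; `island_ids` and `intersection_ranges` are hypothetical helper names, and `itertools.groupby` (which only groups consecutive items) stands in for `DataFrame.groupby`. That substitution is safe here because both island ids never decrease, so each Q-R intersection occupies a single contiguous block of rows.

```python
from itertools import groupby


def island_ids(values, max_step=1):
    """Number the islands: a new island starts whenever the step from the
    previous value falls outside [-max_step, max_step]."""
    ids, current, prev = [], 0, None
    for v in values:
        if prev is None or abs(v - prev) > max_step:
            current += 1
        ids.append(current)
        prev = v
    return ids


def intersection_ranges(q, r, max_step=1):
    """Return ((q_start, q_end), (r_start, r_end)) for every Q-R intersection
    that contains more than one row."""
    keys = zip(island_ids(q, max_step), island_ids(r, max_step))
    rows = zip(keys, q, r)  # each row: ((q_island, r_island), q_value, r_value)
    ranges = []
    # Consecutive rows with the same (Q island, R island) key form one intersection.
    for _, block in groupby(rows, key=lambda row: row[0]):
        block = list(block)
        if len(block) > 1:  # ignore intersections of length 1
            first, last = block[0], block[-1]
            ranges.append(((first[1], last[1]), (first[2], last[2])))
    return ranges


# Query / Reference Segment Ids from the table above.
q = [1, 2, 3, 0, 1, 2, 3, 4, 0]
r = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(intersection_ranges(q, r))  # [((1, 3), (1, 3)), ((0, 4), (4, 8))]
```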
Code:
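import pandas as pd

columns = ['Query Segment Id', 'Reference Segment Id']
# data_with_multiple_contiguous_sequences: the (query, reference) rows from the question
df = pd.DataFrame(data_with_multiple_contiguous_sequences, columns=columns)

def get_island(col):
    # Start a new island whenever the step from the previous row is outside [-1, 1].
    # diff() is NaN on the first row and between() treats NaN as False, so row 0 opens island 1.
    return (~col.diff().between(-1, 1)).cumsum()

# .to_numpy() assigns by position instead of aligning on the source column names
df[['Q Island', 'R Island']] = df[columns].apply(get_island).to_numpy()

result = (
    df.groupby(['Q Island', 'R Island'])
      .agg(**{
          'Q Start': ('Query Segment Id', 'first'),
          'Q End': ('Query Segment Id', 'last'),
          'R Start': ('Reference Segment Id', 'first'),
          'R End': ('Reference Segment Id', 'last'),
          'Count': ('Query Segment Id', 'count'),
      })
)
# Ignore intersections of length 1; a boolean filter keeps Count as an integer column
result = result[result['Count'] > 1].copy()

result['Q'] = result[['Q Start', 'Q End']].apply(tuple, axis=1)
result['R'] = result[['R Start', 'R End']].apply(tuple, axis=1)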
Result:
                   Q Start  Q End  R Start  R End  Count       Q       R
Q Island R Island
1        1               1      3        1      3      3  (1, 3)  (1, 3)
2        1               0      4        4      8      5  (0, 4)  (4, 8)
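Because data_with_multiple_contiguous_sequences comes from the question, the code above can't run on its own. The sketch below reproduces the result using the nine rows from the Q/R island table as hypothetical sample data (the name `sample` is illustrative only). It computes the two island columns one at a time and drops length-1 intersections with a boolean mask.

```python
import pandas as pd

# Hypothetical sample data: the nine (query, reference) rows from the Q/R island table.
sample = list(zip([1, 2, 3, 0, 1, 2, 3, 4, 0], range(1, 10)))
df = pd.DataFrame(sample, columns=['Query Segment Id', 'Reference Segment Id'])


def get_island(col):
    # New island whenever the step from the previous row is outside [-1, 1].
    return (~col.diff().between(-1, 1)).cumsum()


df['Q Island'] = get_island(df['Query Segment Id'])
df['R Island'] = get_island(df['Reference Segment Id'])

result = df.groupby(['Q Island', 'R Island']).agg(**{
    'Q Start': ('Query Segment Id', 'first'),
    'Q End': ('Query Segment Id', 'last'),
    'R Start': ('Reference Segment Id', 'first'),
    'R End': ('Reference Segment Id', 'last'),
    'Count': ('Query Segment Id', 'count'),
})
result = result[result['Count'] > 1]  # ignore intersections of length 1

# One ((q_start, q_end), (r_start, r_end)) pair per remaining intersection.
bounds = result[['Q Start', 'Q End', 'R Start', 'R End']].itertuples(index=False)
ranges = [((int(q_s), int(q_e)), (int(r_s), int(r_e))) for q_s, q_e, r_s, r_e in bounds]
print(ranges)  # [((1, 3), (1, 3)), ((0, 4), (4, 8))]
```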