1 Answer

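Your problem is that you are looping too many times. At a minimum, you should compute a distance matrix once and count how many entries in each row fall within the radius. The fastest solution, though, is to use numpy's vectorized functions, which are highly optimized C code.
As with most learning experiences, it is best to start with a small problem: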
>>> import numpy as np
>>> import pandas as pd
>>> from scipy.spatial import distance_matrix
# Create a dataframe with two columns, MID_X and MID_Y, assigned at random
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.uniform(1, 10, size=(5, 2)), columns=['MID_X', 'MID_Y'])
>>> df.index.name = 'PointID'
>>> df
            MID_X     MID_Y
PointID
0        4.370861  9.556429
1        7.587945  6.387926
2        2.404168  2.403951
3        1.522753  8.795585
4        6.410035  7.372653
# Calculate the distance matrix
>>> cols = ['MID_X', 'MID_Y']
>>> d = distance_matrix(df[cols].values, df[cols].values)
>>> d
array([[0.        , 4.51542241, 7.41793942, 2.94798323, 2.98782637],
       [4.51542241, 0.        , 6.53786001, 6.52559479, 1.53530446],
       [7.41793942, 6.53786001, 0.        , 6.4521226 , 6.38239593],
       [2.94798323, 6.52559479, 6.4521226 , 0.        , 5.09021286],
       [2.98782637, 1.53530446, 6.38239593, 5.09021286, 0.        ]])
# The radii you want to measure. Add two trailing axes, giving shape
# (3, 1, 1), so they broadcast against the 2-D distance matrix later
>>> radii = np.array([3,6,9])[:, None, None]
>>> radii
array([[[3]],

       [[6]],

       [[9]]])
# Count how many points fall within a certain radius from another
# point using numpy's array broadcasting. `d < radii` will return
# an array of `True/False` and we can count the number of `True`
# by `sum` over the last axis.
#
# The distance from a point to itself is 0, so every point counts
# itself as a neighbor. Subtract 1 to exclude it.
>>> count = (d < radii).sum(axis=-1) - 1
>>> count
array([[2, 1, 0, 1, 2],
       [3, 2, 0, 2, 3],
       [4, 4, 4, 4, 4]])
# Putting everything together for export
>>> result = pd.DataFrame(count, index=radii.flatten()).stack().to_frame('Count')
>>> result.index.names = ['Radius', 'PointID']
>>> result
                Count
Radius PointID
3      0            2
       1            1
       2            0
       3            1
       4            2
6      0            3
       1            2
       2            0
       3            2
       4            3
9      0            4
       1            4
       2            4
       3            4
       4            4
The final result means that within radius 3, point #0 has 2 neighbors, point #1 has 1 neighbor, point #2 has 0 neighbors, and so on. Reshape and format the frame to your liking.
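As one possible reshaping, here is a minimal sketch that pivots the long table into one row per point and one column per radius. The `count` array is copied from the output above so the snippet runs on its own; the name `wide` is only illustrative.

```python
import numpy as np
import pandas as pd

# Neighbor counts from the five-point example: one row per radius (3, 6, 9),
# one column per PointID (0..4).
count = np.array([[2, 1, 0, 1, 2],
                  [3, 2, 0, 2, 3],
                  [4, 4, 4, 4, 4]])

# Long format, as built in the answer.
result = pd.DataFrame(count, index=[3, 6, 9]).stack().to_frame('Count')
result.index.names = ['Radius', 'PointID']

# Wide format: move the Radius index level into the columns.
wide = result['Count'].unstack('Radius')
print(wide)
```

In the wide frame, `wide.loc[0, 3]` is the number of neighbors point #0 has within radius 3.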
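Scaling this up to a few thousand points should be no problem. Keep in mind that the distance matrix has N x N entries, and the boolean comparison adds one N x N layer per radius.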
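If memory becomes a concern at larger sizes, one option is to process the points in row blocks so that only part of the distance matrix exists at any time. The sketch below is an assumption-laden variant, not the method from the answer: it uses plain numpy instead of `distance_matrix`, and the function name `count_neighbors` and the `chunk_size` parameter are hypothetical.

```python
import numpy as np


def count_neighbors(points, radii, chunk_size=1024):
    """For each radius, count how many other points lie strictly within it.

    points: (n, 2) array of coordinates.
    radii:  1-D sequence of positive radii.
    Returns an integer array of shape (len(radii), n).
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)[:, None, None]  # shape (r, 1, 1)
    counts = np.empty((radii.shape[0], len(points)), dtype=np.int64)

    for start in range(0, len(points), chunk_size):
        block = points[start:start + chunk_size]        # (m, 2)
        diff = block[:, None, :] - points[None, :, :]   # (m, n, 2)
        d = np.sqrt((diff ** 2).sum(axis=-1))           # (m, n) distances
        # Broadcasting (r, 1, 1) against (m, n) gives (r, m, n); summing the
        # last axis counts points inside each radius. Subtract 1 because
        # every point is at distance 0 from itself.
        counts[:, start:start + chunk_size] = (d < radii).sum(axis=-1) - 1
    return counts


# Same five random points as in the example above.
np.random.seed(42)
pts = np.random.uniform(1, 10, size=(5, 2))
counts = count_neighbors(pts, [3, 6, 9], chunk_size=2)
print(counts)
```

With the seed used above, this gives the same counts as the `count` array in the answer. The block size only trades memory for the number of loop iterations; it does not change the result.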