2 Answers

The data is fetched from:
POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx
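The content is encoded in a custom format before being consumed by the OpenLayers library. All the decoding logic is in this JS file. If you beautify it, you can find the WayTo.Wtb.Format.WTB decoding OpenLayers.Class. The binary is decoded byte by byte, as in the following JS: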
switch(elementType){
    case 1:
        var lineColor = new WayTo.Wtb.Element.LineColor();
        byteOffset = lineColor.parse(dataReader, byteOffset);
        outputElement = lineColor;
        break;
    case 2:
        var lineStyle = new WayTo.Wtb.Element.LineStyle();
        byteOffset = lineStyle.parse(dataReader, byteOffset);
        outputElement = lineStyle;
        break;
    case 3:
        var ellipse = new WayTo.Wtb.Element.Ellipse();
        byteOffset = ellipse.parse(dataReader, byteOffset);
        outputElement = ellipse;
        break;
    // ........
}
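The switch maps each element type to its own parser. Since the goal here is only to step over elements correctly, a Python counterpart can be a lookup table of payload sizes. This is a sketch: the sizes are copied from the Python script in this answer (types 1 to 3 are the LineColor, LineStyle and Ellipse cases above), and `FIXED_PAYLOAD_SIZES` and `skip_fixed_element` are hypothetical names.

```python
# Fixed payload sizes in bytes, copied from the decoding script in this
# answer. Variable-length types (6, 7, 8, 27, 28, 109, 256) are not listed
# because their size depends on the element header or on the payload itself.
FIXED_PAYLOAD_SIZES = {
    1: 4,     # LineColor
    2: 2,     # LineStyle
    3: 20,    # Ellipse
    4: 28,
    5: 12,
    13: 4,
    14: 2,
    15: 2,
    100: 0,
    101: 20,
    102: 2,
    103: 0,
    104: 6,   # StartNumericCell
    105: 0,   # end cell
    111: 40,
    112: 52,  # CoordinatePlane
    113: 24,
    257: 0,
}


def skip_fixed_element(elem_type: int, offset: int) -> int:
    """Return the offset just past the payload of a fixed-size element.

    Raises KeyError for types missing from the table, so variable-length
    elements are never skipped by accident.
    """
    return offset + FIXED_PAYLOAD_SIZES[elem_type]


print(skip_fixed_element(3, 10))  # → 30
```

Variable-length types still need the explicit branches shown in the script.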
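We have to reproduce this decoding algorithm to get the raw data. We don't need to decode every object; we only need the offsets to be right so that the strings are extracted correctly. Here is a Python script that decodes the data from a file (the curl output saved to disk):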
with open("wtb.bin", mode='rb') as file:
    encodedData = file.read()
offset = 0
objects = []
while offset < len(encodedData):
    elementSize = encodedData[offset]
    offset+=1
    elementType = encodedData[offset]
    offset+=1
    if elementType == 0:
        break
    curElemSize = elementSize
    curElemType = elementType
    if elementType== 114:
        largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
        offset+=4
        largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
        offset+=2
        curElemSize = largeElementSize
        curElemType = largeElementType
    print(f"type {curElemType} | size {curElemSize}")
    offsetInit = offset
    if curElemType == 1:
        offset+=4
    elif curElemType == 2:
        offset+=2
    elif curElemType == 3:
        offset+=20
    elif curElemType == 4:
        offset+=28
    elif curElemType == 5:
        offset+=12
    elif curElemType == 6:
        textLength = curElemSize - 3
        objects.append({
            "type": "Text",
            "x_position": int.from_bytes(encodedData[offset:offset+2], "little"),
            "y_position": int.from_bytes(encodedData[offset+2:offset+4], "little"),
            "rotation": int.from_bytes(encodedData[offset+4:offset+6], "little"),
            "text": encodedData[offset+6:offset+6+(textLength*2)].decode("utf-8").replace('\x00','')
        })
        offset+=6+(textLength*2)
    elif curElemType == 7:
        numPoint = int(curElemSize / 2)
        offset+=4*numPoint
    elif curElemType == 27:
        numPoint = int(curElemSize / 4)
        offset+=8*numPoint
    elif curElemType == 8:
        numPoint = int(curElemSize / 2)
        offset+=4*numPoint
    elif curElemType == 28:
        numPoint = int(curElemSize / 4)
        offset+=8*numPoint
    elif curElemType == 13:
        offset+=4
    elif curElemType == 14:
        offset+=2
    elif curElemType == 15:
        offset+=2
    elif curElemType == 100:
        pass
    elif curElemType == 101:
        offset+=20
    elif curElemType == 102:
        offset+=2
    elif curElemType == 103:
        pass
    elif curElemType == 104:
        highShort = int.from_bytes(encodedData[offset+2:offset+4], "little")
        lowShort = int.from_bytes(encodedData[offset+4:offset+6], "little")
        objects.append({
            "type": "StartNumericCell",
            "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
            "occurrence": (highShort << 16) + lowShort
        })
        offset+=6
    elif curElemType == 105:
        #end cell
        pass
    elif curElemType == 109:
        textLength = curElemSize - 1
        objects.append({
            "type": "StartAlphanumericCell",
            "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
            "occurrence":encodedData[offset+2:offset+2+(textLength*2)].decode("utf-8").replace('\x00','')
        })
        offset+=2+(textLength*2)
    elif curElemType == 111:
        offset+=40
    elif curElemType == 112:
        objects.append({
            "type": "CoordinatePlane",
            "projection_code": encodedData[offset+48:offset+52].decode("utf-8").replace('\x00','')
        })
        offset+=52
    elif curElemType == 113:
        offset+=24
    elif curElemType == 256:
        nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
        objects.append({
            "type": "LargePolygon",
            "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
            "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little")
        })
        if nameLength > 0:
            offset+= 16 + nameLength
            if encodedData[offset] == 0:
                offset+=1
        else:
            offset+= 16
        numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
        offset+=2
        offset+=numberOfPoints*8
    elif curElemType == 257:
        pass
    else:
        offset+= curElemSize*2
    print(f"offset diff {offset-offsetInit}")
    print("--------------------------------")
print(objects)
print(len(encodedData))
print(offset)
(Side note: the element size is big-endian; all other values are little-endian.)
Run this repl.it to see how the script decodes the file.
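The byte-order side note matters mostly for the escape type 114. As a minimal sketch under the layout the script assumes (a 1-byte size and a 1-byte type, and for type 114 a 4-byte big-endian size followed by a 2-byte little-endian type), one element header can be read on its own. `read_element_header` is a hypothetical helper, the sample bytes are made up, and the end-of-data check (type 0) is left to the caller, as in the script's loop.

```python
def read_element_header(data: bytes, offset: int):
    """Read one element header and return (elem_type, elem_size, new_offset).

    Mirrors the script: 1-byte size, 1-byte type; when the type byte is 114
    (large element), a 4-byte big-endian size and a 2-byte little-endian type
    follow and replace the short values.
    """
    elem_size = data[offset]
    elem_type = data[offset + 1]
    offset += 2
    if elem_type == 114:
        elem_size = int.from_bytes(data[offset:offset + 4], "big")
        elem_type = int.from_bytes(data[offset + 4:offset + 6], "little")
        offset += 6
    return elem_type, elem_size, offset


# Made-up large element: escape type 114, size 0x00000010 stored big-endian,
# type bytes 00 01 read little-endian as 0x0100 = 256 (LargePolygon).
sample = bytes([0x00, 114, 0x00, 0x00, 0x00, 0x10, 0x00, 0x01])
print(read_element_header(sample, 0))  # → (256, 16, 8)
```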
From there, we can build the scraping steps. For clarity, I'll describe all of them (even the ones you have already figured out):
Login
Log in to the website with:
GET https://alta.registries.gov.ab.ca/spinii/logon.aspx
Scrape the input names/values, add uctrlLogon:cmdLogonGuest.x and uctrlLogon:cmdLogonGuest.y, and then call
POST https://alta.registries.gov.ab.ca/spinii/logon.aspx
Legal notice
The legal notice call isn't needed to get the map values, but it is needed to get the item info (the last step in this post).
GET https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx
Scrape the input tag names/values, set cmdYES.x and cmdYES.y, and then call
POST https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx
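Both the login and the legal-notice steps follow the same pattern: resend every named input on the page, then add the `.x`/`.y` pair that a browser submits when an image button (`<input type="image">`) is clicked. Here is a dependency-free sketch of that pattern, using the standard-library `html.parser` in place of BeautifulSoup; `build_form_payload` and the sample page (including its `__VIEWSTATE` field) are made up for illustration.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            # A missing (or bare) value attribute is sent as an empty string.
            self.fields[name] = attrs.get("value") or ""


def build_form_payload(html: str, image_button: str, x: int, y: int) -> dict:
    """Return the page's input fields plus '<button>.x' and '<button>.y'."""
    parser = InputCollector()
    parser.feed(html)
    payload = dict(parser.fields)
    payload[f"{image_button}.x"] = x
    payload[f"{image_button}.y"] = y
    return payload


page = '<form><input name="__VIEWSTATE" value="abc"><input type="image" name="cmdYES"></form>'
print(build_form_payload(page, "cmdYES", 82, 3))
```

Like the answer's code, this keeps every named input, including the image button itself; the same helper would be called with `"uctrlLogon:cmdLogonGuest"`, 76 and 25 for the login page.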
Map data
Call the server map API:
POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx
with the following data:
{
    "mt": "titleresults",
    "qt": "lincNo",
    "LINCNumber": lincNumber,
    "rights": "B",  # not required
    "cx": 1920,  # screen resolution
    "cy": 1080,
}
cx/cy are the canvas dimensions.
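These fields are sent as an ordinary `application/x-www-form-urlencoded` body, which is what `requests` produces when `data=` is a dict. A small sketch of that encoding with `urllib.parse.urlencode`; `map_query` is a hypothetical helper, and it leaves out `rights` since the answer marks it as not required.

```python
from urllib.parse import urlencode


def map_query(linc_number: str, width: int = 1920, height: int = 1080) -> dict:
    """Form fields for the mapserver.aspx request; cx/cy are the canvas size."""
    return {
        "mt": "titleresults",
        "qt": "lincNo",
        "LINCNumber": linc_number,
        "cx": width,
        "cy": height,
    }


# Dict order is preserved, so the encoded body lists the fields in this order.
print(urlencode(map_query("0030278592")))
# → mt=titleresults&qt=lincNo&LINCNumber=0030278592&cx=1920&cy=1080
```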
Decode the response with the decoding script described earlier. You get:
[{'type': 'LargePolygon', 'name': '0010495134 8722524;1;162', 'entity': 23, 'occurence': 628079167, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170859 8022146;8;99', 'entity': 23, 'occurence': 628048595, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010691822 8722524;1;163', 'entity': 23, 'occurence': 628222354, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169736 8022146;8;89', 'entity': 23, 'occurence': 628021327, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694454 8722524;1;179', 'entity': 23, 'occurence': 628191678, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694362 8722524;1;178', 'entity': 23, 'occurence': 628307403, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010433381 8722524;1;177', 'entity': 23, 'occurence': 628209696, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169710 8022146;8;88A', 'entity': 23, 'occurence': 628021328, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694355 8722524;1;176', 
'entity': 23, 'occurence': 628315826, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170866 8022146;8;100', 'entity': 23, 'occurence': 628163431, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694347 8722524;1;175', 'entity': 23, 'occurence': 628132810, 'line_color_green': 0, 'line_color_red': 129,
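Each entry above comes from the fixed 16-byte header of a type 256 (LargePolygon) element. Here is a sketch of reading that header with `struct.unpack_from`, using the byte offsets from `decodeWtb` in the full code further down (entity at 0, occurrence at 2, name length at 14, colour bytes at 7 to 13). The colour positions are taken as-is from the answer rather than independently verified, `parse_large_polygon_header` is a hypothetical name, and the sample bytes are synthesized from the first entry of the output above.

```python
import struct


def parse_large_polygon_header(data: bytes, offset: int = 0) -> dict:
    """Read a LargePolygon header and its name, using the answer's offsets."""
    # "<HI": little-endian, no padding: 2-byte entity then 4-byte occurrence.
    entity, occurrence = struct.unpack_from("<HI", data, offset)
    (name_length,) = struct.unpack_from("<H", data, offset + 14)
    name = data[offset + 16:offset + 16 + name_length]
    return {
        "entity": entity,
        "occurrence": occurrence,
        # (red, green, blue), reordered from the byte positions decodeWtb uses
        "line_color": (data[offset + 7], data[offset + 8], data[offset + 9]),
        "fill_color": (data[offset + 11], data[offset + 10], data[offset + 13]),
        "name": name.decode("utf-8").replace("\x00", ""),
    }


# Synthesized sample matching the first decoded entry shown above.
sample_name = b"0010495134 8722524;1;162"
header = bytearray(16)
struct.pack_into("<HI", header, 0, 23, 628079167)
header[7], header[8], header[9] = 129, 0, 129     # line red, green, blue
header[10], header[11], header[13] = 255, 255, 180  # fill green, red, blue
struct.pack_into("<H", header, 14, len(sample_name))
sample = bytes(header) + sample_name
print(parse_large_polygon_header(sample))
```

It reads only the header and the name; the padding byte and the point list that follow still have to be skipped as in the full code.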
Extract the info
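If you want to target the specific lincNumber, you need to look at the polygon's style, because for "multiple" values (for example, ones covering several items) the response doesn't mention the lincNumber id, only a link reference. The following gets the selected item: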
selectedZone = [
    t
    for t in objects
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)
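`[...][0]` raises `IndexError` when no polygon carries the highlight style. A slightly safer sketch of the same heuristic uses `next()` with a default; `find_selected_zone` is a hypothetical name and the two objects are synthetic.

```python
def find_selected_zone(objects):
    """Return the highlighted polygon, or None if nothing matches.

    Same style heuristic as above: fill green below 255 and line red of 255.
    """
    return next(
        (
            t
            for t in objects
            if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
        ),
        None,
    )


objects = [
    {"type": "LargePolygon", "occurence": 1, "line_color_red": 129, "fill_color_green": 255},
    {"type": "LargePolygon", "occurence": 2, "line_color_red": 255, "fill_color_green": 0},
]
print(find_selected_zone(objects)["occurence"])  # → 2
```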
Call the URL you mentioned in your post to get the data, then extract the table:
GET https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}
Full code:
import io

import pandas as pd
import requests
from bs4 import BeautifulSoup

lincNumber = "0030278592"
#lincNumber = "0010661156"

s = requests.Session()

# 1) login as guest: resend the form inputs plus the image-button click coordinates
r = s.get("https://alta.registries.gov.ab.ca/spinii/logon.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = {t["name"]: t.get("value", "") for t in soup.find_all("input") if t.get("name")}
payload["uctrlLogon:cmdLogonGuest.x"] = 76
payload["uctrlLogon:cmdLogonGuest.y"] = 25
s.post("https://alta.registries.gov.ab.ca/spinii/logon.aspx", data=payload)

# 2) legal notice: required for step 6, not for the map data
r = s.get("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = {t["name"]: t.get("value", "") for t in soup.find_all("input") if t.get("name")}
payload["cmdYES.x"] = 82
payload["cmdYES.y"] = 3
s.post("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx", data=payload)

# 3) map data: the response body is in the binary WTB format
r = s.post("http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx",
    data={
        "mt": "titleresults",
        "qt": "lincNo",
        "LINCNumber": lincNumber,
        "rights": "B",  # not required
        "cx": 1920,  # canvas width
        "cy": 1080,  # canvas height
    })


def decodeWtb(encodedData):
    """Walk the WTB elements and return the LargePolygon (type 256) objects."""
    offset = 0
    objects = []
    while offset < len(encodedData):
        # element header: 1-byte size followed by 1-byte type
        elementSize = encodedData[offset]
        offset += 1
        elementType = encodedData[offset]
        offset += 1
        if elementType == 0:  # end of data
            break
        curElemSize = elementSize
        curElemType = elementType
        if elementType == 114:
            # large element: 4-byte big-endian size, then 2-byte little-endian type
            curElemSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset += 4
            curElemType = int.from_bytes(encodedData[offset:offset + 2], "little")
            offset += 2
        # skip every element except LargePolygon, using its payload size
        if curElemType == 1:
            offset += 4
        elif curElemType == 2:
            offset += 2
        elif curElemType == 3:
            offset += 20
        elif curElemType == 4:
            offset += 28
        elif curElemType == 5:
            offset += 12
        elif curElemType == 6:
            textLength = curElemSize - 3
            offset += 6 + (textLength * 2)
        elif curElemType == 7:
            numPoint = curElemSize // 2
            offset += 4 * numPoint
        elif curElemType == 27:
            numPoint = curElemSize // 4
            offset += 8 * numPoint
        elif curElemType == 8:
            numPoint = curElemSize // 2
            offset += 4 * numPoint
        elif curElemType == 28:
            numPoint = curElemSize // 4
            offset += 8 * numPoint
        elif curElemType == 13:
            offset += 4
        elif curElemType == 14:
            offset += 2
        elif curElemType == 15:
            offset += 2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset += 20
        elif curElemType == 102:
            offset += 2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            offset += 6
        elif curElemType == 105:
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            offset += 2 + (textLength * 2)
        elif curElemType == 111:
            offset += 40
        elif curElemType == 112:
            offset += 52
        elif curElemType == 113:
            offset += 24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset + 14:offset + 16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset + 16:offset + 16 + nameLength].decode("utf-8").replace('\x00', ''),
                "entity": int.from_bytes(encodedData[offset:offset + 2], "little"),
                "occurence": int.from_bytes(encodedData[offset + 2:offset + 6], "little"),
                "line_color_green": encodedData[offset + 8],
                "line_color_red": encodedData[offset + 7],
                "line_color_blue": encodedData[offset + 9],
                "fill_color_green": encodedData[offset + 10],
                "fill_color_red": encodedData[offset + 11],
                "fill_color_blue": encodedData[offset + 13]
            })
            if nameLength > 0:
                offset += 16 + nameLength
                if encodedData[offset] == 0:  # padding byte after the name
                    offset += 1
            else:
                offset += 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset + 2], "little")
            offset += 2
            offset += numberOfPoints * 8  # skip the point list
        elif curElemType == 257:
            pass
        else:
            # any other type: skip curElemSize * 2 bytes
            offset += curElemSize * 2
    return objects


# 4) decode custom format
objects = decodeWtb(r.content)

# 5) get the selected area: it is drawn with a different style from the others
selectedZone = [
    t
    for t in objects
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)

# 6) get the info about the item
r = s.get(f'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}')
# wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
df = pd.read_html(io.StringIO(r.text), attrs={'class': 'bodyText'}, header=0)[0]
del df['Add to Cart']
del df['View']
print(df[:-1])  # all rows except the last one
Output
   Title Number           Type  LINC Number  Short Legal   Rights  Registration Date  Change/Cancel Date
0     052400228  Current Title   0030278592   0420091;16  Surface         19/09/2005          13/11/2019
1     072294084  Current Title   0030278551   0420091;12  Surface         22/05/2007          21/08/2007
2     072400529  Current Title   0030278469    0420091;3  Surface         05/07/2007          28/08/2007
3     072498228  Current Title   0030278501    0420091;7  Surface         18/08/2007          08/02/2008
4     072508699  Current Title   0030278535   0420091;10  Surface         23/08/2007          13/12/2007
5     072559500  Current Title   0030278477    0420091;4  Surface         17/09/2007          19/11/2007
6     072559508  Current Title   0030278576   0420091;14  Surface         17/09/2007          09/01/2009
7     072559521  Current Title   0030278519    0420091;8  Surface         17/09/2007          07/11/2007
8     072559530  Current Title   0030278493    0420091;6  Surface         17/09/2007          25/08/2008
9     072559605  Current Title   0030278485    0420091;5  Surface         17/09/2007          23/12/2008
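Look at the objects field if you want more entries. If you want more information about the items, such as their coordinates, you can extend the decoder...
You can also match the other lincNumbers around the target through the name field, which contains the lincNumber, except where the name is a "multiple" one.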
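A sketch of that matching, assuming every name has the form `<LINC number> <short legal>` as in the sample output (for example `0010495134 8722524;1;162`) and that grouped entries contain the word "multiple". `lincs_from_names` is hypothetical and the `"Multiple"` sample name is invented.

```python
def lincs_from_names(objects):
    """Map LINC number -> short legal, parsed from LargePolygon names."""
    result = {}
    for obj in objects:
        name = obj.get("name", "")
        # Skip non-polygons and grouped ("multiple") entries.
        if obj.get("type") != "LargePolygon" or "multiple" in name.lower():
            continue
        linc, _, short_legal = name.partition(" ")
        if linc.isdigit():
            result[linc] = short_legal
    return result


sample = [
    {"type": "LargePolygon", "name": "0010495134 8722524;1;162"},
    {"type": "LargePolygon", "name": "0012170859 8022146;8;99"},
    {"type": "LargePolygon", "name": "Multiple"},
]
print(lincs_from_names(sample))
# → {'0010495134': '8722524;1;162', '0012170859': '8022146;8;99'}
```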

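There are two options for getting the information you're looking for; one of them is Selenium, which you probably already know about.
Open the browser's Network tab and watch whether it sends requests to the server when you hover over the map. With requests and BS4, your best bet is to check whether the data is already loaded with the page; if it is, the following might work (r being the requests response for the page):
import re
print(re.findall(r'628086906', r.text))
If it prints the number, the data is available as JSON loaded with the page, so you can load the JSON or extract it with a regex. Otherwise your only option is Selenium.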
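A slightly stricter version of that check, as a sketch: `re.escape` keeps the id literal, and the digit lookarounds stop it from matching inside a longer number. `contains_id` and both sample strings are made up.

```python
import re


def contains_id(page_text: str, item_id: str) -> bool:
    """True if item_id appears in the page as a standalone number."""
    # (?<!\d) and (?!\d) reject matches that sit inside a longer digit run.
    pattern = rf"(?<!\d){re.escape(item_id)}(?!\d)"
    return re.search(pattern, page_text) is not None


html = '<script>var data = {"occurrence": 628086906};</script>'
print(contains_id(html, "628086906"))              # → True
print(contains_id("id=16280869061", "628086906"))  # → False
```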