2 Answers
Contributor: 1900 experience points, 5+ upvotes
Your regular expression is almost correct, but you have to account for the fact that the dot and the digits after it may be absent. That can be done like this:
\s+(\d+(?:\.\d+)?)\s+
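The difference is that \.\d+ is wrapped in a non-capturing group, (?:xxxx), which is made optional by a question mark placed after the group: (?:xxxx)?. The fractional part may therefore be present or absent.
Contributor: 1827 experience points, 9+ upvotes
I suggest using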
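As a minimal sketch of how this pattern behaves, the snippet below applies it with re.split to a few of the sample lines that appear in this thread. The helper name split_on_number is hypothetical, and maxsplit=1 is an assumption made so that only the first standalone number acts as the separator.

```python
import re

# The optional non-capturing group (?:\.\d+)? lets one pattern match
# both integers such as "500" and decimals such as "1.38".
NUMBER_BETWEEN_SPACES = re.compile(r'\s+(\d+(?:\.\d+)?)\s+')

def split_on_number(line):
    """Split a line into [name, number, rest] at the first standalone number.

    Hypothetical helper for illustration; a line with no such number
    comes back as a one-element list.
    """
    return NUMBER_BETWEEN_SPACES.split(line, maxsplit=1)

print(split_on_number('Naproxen 500 Active ingredient Ph Eur'))
print(split_on_number('Magnesium stearate 1.38 Lubricant Ph Eur'))
print(split_on_number('Water, purifieda'))
```

Because the number sits in a capturing group, re.split keeps it in the result instead of discarding it along with the surrounding whitespace.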
res = re.match(r'^(?:(?!.*\d\.\d)(.*?)\s*\b(\d+(?:\s*mg)?)\b\s*(.*)|((?:(?!\d+\.\d).)*?)\s*\b(\d+\.\d+(?:\s*mg)?)\b\s*(.*))$', i)
if res:
    all_extract.append(list(filter(None, res.groups())))
See the regex demo.
Full Python demo, with the commented-out code removed:
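The one-line pattern above can be hard to read, so here is a sketch of the same expression written out with re.VERBOSE and annotated branch by branch. The comments are an interpretation of the pattern, not the answerer's own notes, and the names PATTERN and extract are hypothetical.

```python
import re

PATTERN = re.compile(r'''
    ^(?:
        (?!.*\d\.\d)                  # branch 1: only if the line has no digit.digit anywhere
        (.*?)                         # name, as short as possible
        \s*\b(\d+(?:\s*mg)?)\b        # integer amount, optionally followed by mg
        \s*(.*)                       # rest of the line
      |
        ((?:(?!\d+\.\d).)*?)          # branch 2: name, never running into a decimal number
        \s*\b(\d+\.\d+(?:\s*mg)?)\b   # decimal amount, optionally followed by mg
        \s*(.*)                       # rest of the line
    )$
''', re.VERBOSE)

def extract(line):
    """Return the non-empty captured groups, or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Only one branch participates in a match; drop the groups of the other one.
    return [g for g in m.groups() if g]

print(extract('Povidone K90 11.0 Binder 56 Ph Eur'))
print(extract('Croscarmellose sodium 22.0 mg Disintegrant Ph Eur'))
print(extract('Water, purifieda'))
```

The first branch is guarded by a negative lookahead so that a stray integer such as the 90 in K90 is never picked when the line also contains a decimal amount.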
import re

def show():
    newresult = [
        'Naproxen 500 Active ingredient Ph Eur',
        'Croscarmellose sodium 22.0 mg Disintegrant Ph Eur',
        'Povidone K90 11.0 Binder 56 Ph Eur',
        'Water, purifieda',
        'Silica, colloidal anhydrous 2.62 Glidant Ph Eur',
        'Water purified 49 Solvent Ph Eur',
        'Magnesium stearate 1.38 Lubricant Ph Eur',
    ]
    all_extract = []
    for i in newresult:
        # Branch 1 handles lines with an integer amount only; branch 2 handles decimals.
        res = re.match(r'^(?:(?!.*\d\.\d)(.*?)\s*\b(\d+(?:\s*mg)?)\b\s*(.*)|((?:(?!\d+\.\d).)*?)\s*\b(\d+\.\d+(?:\s*mg)?)\b\s*(.*))$', i)
        if res:
            # Drop the groups that belong to the branch that did not match.
            all_extract.append(list(filter(None, res.groups())))
        else:
            # Fallback for lines the main pattern does not match.
            print("ONLY INTEGER")
            regex_integer_part = re.split(r'\s+(\d+(?:\.\d+)?)\s+', i, 1)
            all_extract.append(regex_integer_part)
    return all_extract

print(show())
Output (preceded by one ONLY INTEGER line, printed for 'Water, purifieda'):
[['Naproxen', '500', 'Active ingredient Ph Eur'], ['Croscarmellose sodium', '22.0 mg', 'Disintegrant Ph Eur'], ['Povidone K90', '11.0', 'Binder 56 Ph Eur'], ['Water, purifieda'], ['Silica, colloidal anhydrous', '2.62', 'Glidant Ph Eur'], ['Water purified', '49', 'Solvent Ph Eur'], ['Magnesium stearate', '1.38', 'Lubricant Ph Eur']]
