
Filter a text file based on the timestamp at the start of each line

Python

千巷貓影 2023-06-13 10:44:17

I have a huge text file, and I want to extract the lines whose timestamp falls on the top of a minute, together with their associated data. Below are a few lines from that file; it is a snippet of more than 36 hours of data. By "associated" I mean the 8 data points that follow the timestamp.

2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:14,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978
...

I can't find a way to make Python look at the seconds part of the timestamp and then create a new list containing only the data associated with ":00" seconds.

for line in fh:
    line = line.rstrip("\n")
    line = line.split(",")
    masterlist.extend(line)  # this is putting the information into one list
    altmasterlist.append(line)  # this is putting the lines of information into a list
for line in altmasterlist:
    if ":00" in line:
        finalmasterlist.extend(line)  # Nothing is entering this if statement
print(finalmasterlist)

Am I even in the right area with these two for loops?


3 Answers

MMTTMM

Contributed 1869 experience points, received 4+ likes

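Using pandas
- The main difference is that pandas converts all the data to the correct dtype (e.g. datetime, int, and float), and the code is more concise.
- In addition, the data is now in a useful format for time series analysis and plotting, though I recommend adding column names.
- df.columns = ['datetime', ..., 'price']
- The filtering can be done with a 1-line vectorized operation.
- As the timeit test shows, for 1M rows of data, using pandas (774 ms) is about 16% slower than reading the file with `with open` and a str check for :00 (668 ms).
Read the file with pandas.read_csv and parse the dates in column 0.
- Use header=None, because the test data has no header row.
Use Boolean indexing to select the dates where the seconds are 0.
- Use the .dt accessor to get .second.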

import pandas as pd

# read the file, which apparently has no header, and parse the date column
df = pd.read_csv('test.csv', header=None, parse_dates=[0])

# use Boolean indexing to select the rows where seconds == 0
top_of_the_minute = df[df[0].dt.second == 0]

# save the data
top_of_the_minute.to_csv('clean.csv', header=False, index=False)

# display(top_of_the_minute)
                    0  1  2     3     4   5      6      7        8
5 2020-08-03 22:17:00  0  0  4803  4800  91  28.05  24.05  58.8917
6 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  24.05  58.8925
7 2020-08-03 22:17:00  0  0  4805  4800  91  28.05  24.05  58.9341
8 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  24.05  58.9683
9 2020-08-03 22:17:00  0  0  4802  4800  91  28.05  23.05  58.9780

# example: rename columns
top_of_the_minute.columns = ['datetime', 'v1', 'v2', 'v3', 'v4', 'v5', 'p1', 'p2', 'p3']

# example: plot the data
p = top_of_the_minute.plot('datetime', 'p3')
p.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
p.set_xlim('2020-08', '2020-09')
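As a variation on the rename step above, the column names can also be supplied when the file is read, so the filter refers to a named column instead of column 0. This is only a minimal sketch: the names in COLUMNS are the same illustrative placeholders used in the rename example, and the in-memory SAMPLE string stands in for test.csv.

```python
import io

import pandas as pd

# Illustrative column names (assumption): the source data has no header row.
COLUMNS = ['datetime', 'v1', 'v2', 'v3', 'v4', 'v5', 'p1', 'p2', 'p3']

# Small in-memory sample standing in for test.csv.
SAMPLE = """\
2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:18:00,0,0,4805,4800,91,28.05,24.05,58.9341
"""

# names= assigns the column labels; parse_dates converts the named column to datetime64.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS,
                 parse_dates=['datetime'])

# Boolean indexing on the named column keeps only rows whose seconds are 0.
top_of_the_minute = df[df['datetime'].dt.second == 0]
print(top_of_the_minute[['datetime', 'p3']])
```

Naming the columns up front also means the saved CSV can carry a header if that is wanted later.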

test.csv

2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:14,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978
2020-08-03 22:17:00,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:00,0,0,4802,4800,91,28.05,23.05,58.978

%%timeit test

Create the test data

# read test.csv
df = pd.read_csv('test.csv', header=None, parse_dates=[0])

# create a dataframe with 1M rows (10 rows x 100,000 copies)
df = pd.concat([df] * 100000)

# save the new test data
df.to_csv('test.csv', index=False, header=False)

test_sk

def test_sk(path: str):
    zero_entries = []
    with open(path, "r") as file:
        for line in file:
            semi_index = line.index(',')
            if line[:semi_index].endswith(':00'):
                zero_entries.append(line)
    return zero_entries

%%timeit
result_sk = test_sk('test.csv')

[out]:
668 ms ± 5.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

test_tm

def test_tm(path: str):
    df = pd.read_csv(path, header=None, parse_dates=[0])
    return df[df[0].dt.second == 0]

%%timeit
result_tm = test_tm('test.csv')

[out]:
774 ms ± 7.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
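%%timeit is an IPython cell magic, so the timings above only run inside a notebook or IPython session. Outside of one, a similar measurement can be approximated with the standard-library timeit module. The sketch below is an assumption-laden stand-in, not the benchmark used above: filter_zero_seconds is a hypothetical name, the data is a small throwaway file, and it reports the best of 5 runs rather than mean ± std. dev.

```python
import os
import tempfile
import timeit


def filter_zero_seconds(path):
    """Return the lines whose leading timestamp ends in ':00'."""
    zero_entries = []
    with open(path, "r") as file:
        for line in file:
            # Everything before the first comma is the timestamp field.
            timestamp, _, _ = line.partition(',')
            if timestamp.endswith(':00'):
                zero_entries.append(line)
    return zero_entries


# Build a throwaway file: 10,000 copies of two sample rows (one matches, one does not).
rows = [
    "2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917\n",
    "2020-08-03 22:17:00,0,0,4802,4800,91,28.05,24.05,58.8925\n",
]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.writelines(rows * 10_000)
    sample_path = tmp.name

try:
    # Five independent runs, one call per run; each entry is the elapsed seconds of one run.
    timings = timeit.repeat(lambda: filter_zero_seconds(sample_path),
                            number=1, repeat=5)
    matches = filter_zero_seconds(sample_path)
    print(f"best of 5: {min(timings) * 1000:.2f} ms, {len(matches)} matching lines")
finally:
    os.remove(sample_path)
```

The absolute numbers depend on the machine and the file size, so they are only comparable between functions timed in the same session.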

Answered 2023-06-13

慕桂英4014372

Contributed 1871 experience points, received 13+ likes

Try this:

finalmasterlist2 = []
for row in altmasterlist:
    # row[0] is the timestamp string; endswith avoids also matching ":00" in the minutes
    if row[0].endswith(":00"):
        finalmasterlist2.extend(row)
print("finalmasterlist2")
print(finalmasterlist2)

Input:

2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917
2020-08-03 22:17:13,0,0,4802,4800,91,28.05,24.05,58.8925
2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341
2020-08-03 22:17:15,0,0,4802,4800,91,28.05,24.05,58.9683
2020-08-03 22:17:18,0,0,4802,4800,91,28.05,23.05,58.978

Output:

['2020-08-03 22:17:00', '0', '0', '4805', '4800', '91', '28.05', '24.05', '58.9341']
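For context on why the loop in the question never entered its if statement: after split(","), each line is a list, and `in` on a list checks whether some element equals ":00" exactly, which is never the case here. Testing the timestamp string itself works. The short sketch below illustrates the difference; the variable names are only for illustration, and it also shows that a plain substring test would match a timestamp such as 22:00:15, which is why endswith is the stricter check.

```python
row = "2020-08-03 22:17:00,0,0,4805,4800,91,28.05,24.05,58.9341".split(",")

# On a list, `in` asks whether any element is exactly equal to ":00".
in_list = ":00" in row
print(in_list)  # False

# On a string, `in` is a substring test, so the timestamp matches.
in_timestamp = ":00" in row[0]
print(in_timestamp)  # True

# A substring test also matches zero minutes, not just zero seconds.
other = "2020-08-03 22:00:15"
substring_hit = ":00" in other
print(substring_hit)  # True

# endswith only matches when the seconds field is ":00".
strict_other = other.endswith(":00")
strict_row = row[0].endswith(":00")
print(strict_other, strict_row)  # False True
```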

Answered 2023-06-13

長風(fēng)秋雁

Contributed 1757 experience points, received 7+ likes

You said your file is huge? Then it is probably best to filter the data while reading it.

You can do this without any library:

zero_entries = []
# path_to_file is the path to your data file
with open(path_to_file, "r") as file:
    # iterate over every line
    for line in file:
        # find the end of the first cell
        timestamp_end = line.index(',')
        # check whether the timestamp ends on zero seconds and, if so, add the line to the list
        if line[:timestamp_end].endswith(':00'):
            zero_entries.append(line)
print(zero_entries)

I assume your timestamp will always be the first element of the line.

Depending on your file size, this can be much faster than Trenton's pandas solution (I tested it with ~58k lines):

import time
import pandas as pd

path = r"txt.csv"

# time the built-in, line-by-line approach
start = time.time()
zero_entries = []
with open(path, "r") as file:
    for line in file:
        semi_index = line.index(',')
        if line[:semi_index].endswith(':00'):
            zero_entries.append(line)
end = time.time()
print(end-start)

# time the pandas approach
start = time.time()
df = pd.read_csv(path, header=None, parse_dates=[0])
# using Boolean indexing to select data when seconds = 00
top_of_the_minute = df[df[0].dt.second == 0]
end = time.time()
print(end-start)

0.04886937141418457 # built-in
0.27971720695495605 # pandas
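Since the file is described as huge, one possible refinement of the line-by-line filter is to write matching lines straight to an output file instead of collecting them in a list, and to skip blank or malformed lines rather than fail on them (line.index(',') raises ValueError when a line has no comma). This is a sketch under those assumptions; copy_top_of_minute and the file names are hypothetical, and the demo data is written to a temporary directory.

```python
import os
import tempfile


def copy_top_of_minute(src_path, dst_path):
    """Copy lines whose first field ends in ':00' from src to dst; return how many were kept."""
    kept = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            # partition() returns an empty separator when the line has no comma,
            # so blank or malformed lines are skipped instead of raising ValueError.
            timestamp, sep, _ = line.partition(',')
            if sep and timestamp.endswith(':00'):
                dst.write(line)
                kept += 1
    return kept


# Demo on a small throwaway file containing one blank line.
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "data.csv")
    dst_path = os.path.join(tmp_dir, "clean.csv")
    with open(src_path, "w") as f:
        f.write("2020-08-03 22:17:12,0,0,4803,4800,91,28.05,24.05,58.8917\n"
                "\n"
                "2020-08-03 22:18:00,0,0,4805,4800,91,28.05,24.05,58.9341\n")
    kept = copy_top_of_minute(src_path, dst_path)
    with open(dst_path) as f:
        cleaned = f.read()

print(kept, "line(s) kept")
```

Only one line is held in memory at a time, so memory use stays flat regardless of how many hours of data the file covers.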

Answered 2023-06-13
