Filtered Twitter API stream sample

Python

RISEBY 2023-09-26 14:58:53

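I need to get a filtered sample of the Twitter stream, and I am using tweepy. I looked at the Stream class's functions for getting the sample stream and for filtering, but I don't understand how I should set up the class. Should it be stream.filter(track=['']).sample(), or stream.sample().filter(track=['']), or each call on its own line, or something else? If you have another idea for getting a sample stream based on a keyword filter, please help. Thanks in advance.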

2 Answers

侃侃無極

Contributed 2051 experience points, received over 10 likes

The Twitter v2 API includes one endpoint for random sampling and another for filtering tweets.

import json
import os

import pandas as pd
import requests

# To set your environment variable, run the following line in your terminal:
# export 'BEARER_TOKEN'='<your_bearer_token>'

data = []      # tweets buffered since the last file was written
counter = 0    # index of the next output file

def create_headers(bearer_token):
    # Build the bearer-token Authorization header sent with every request
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def get_rules(headers, bearer_token):
    # Fetch the filter rules currently attached to the stream
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream/rules", headers=headers
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot get rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))
    return response.json()

def delete_all_rules(headers, bearer_token, rules):
    # Nothing to delete when no rules are registered
    if rules is None or "data" not in rules:
        return None
    ids = list(map(lambda rule: rule["id"], rules["data"]))
    payload = {"delete": {"ids": ids}}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot delete rules (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    print(json.dumps(response.json()))

def set_rules(headers, delete, bearer_token):
    # You can adjust the rules if needed
    sample_rules = [
        {"value": "dog has:images", "tag": "dog pictures"},
        {"value": "cat has:images -grumpy", "tag": "cat pictures"},
    ]
    payload = {"add": sample_rules}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload,
    )
    # The rules endpoint answers 201 Created when rules are added
    if response.status_code != 201:
        raise Exception(
            "Cannot add rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))

def get_stream(headers, rule_set, bearer_token):
    global data, counter
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream", headers=headers, stream=True,
    )
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Cannot get stream (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    for response_line in response.iter_lines():
        # Empty lines are keep-alive signals; skip them
        if response_line:
            json_response = json.loads(response_line)
            print(json.dumps(json_response, indent=4, sort_keys=True))
            if "data" not in json_response:
                continue  # e.g. an error payload that carries no tweet
            data.append(json_response["data"])
            # Write one file per 100 tweets, then start a new batch
            if len(data) % 100 == 0:
                print('storing data')
                pd.DataFrame(data).to_json(f'tw_example_{counter}.json', orient='records')
                data = []
                counter += 1

def main():
    bearer_token = os.environ.get("BEARER_TOKEN")
    if not bearer_token:
        raise SystemExit("BEARER_TOKEN environment variable is not set")
    headers = create_headers(bearer_token)
    rules = get_rules(headers, bearer_token)
    # Clear any existing rules, then install the new ones before connecting
    delete = delete_all_rules(headers, bearer_token, rules)
    rule_set = set_rules(headers, delete, bearer_token)
    get_stream(headers, rule_set, bearer_token)

if __name__ == "__main__":
    main()

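Then load the data into a pandas DataFrame, one file at a time, with df = pd.read_json('tw_example_0.json', orient='records'). The script writes a new file (tw_example_0.json, tw_example_1.json, and so on) for every 100 tweets.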
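Because the script writes one file per 100 tweets, reading a single file recovers only part of the collected data. The following is a minimal sketch for gathering every chunk, assuming the files follow the tw_example_<n>.json naming used in the script and each holds a JSON array of records. `load_chunks` is a hypothetical helper name, not part of the original answer.

```python
import glob
import json
import re


def load_chunks(pattern="tw_example_*.json"):
    """Read every chunk file matching `pattern` and return one list of tweet dicts."""
    def chunk_index(path):
        # Sort numerically so that tw_example_10.json comes after tw_example_9.json.
        match = re.search(r"(\d+)\.json$", path)
        return int(match.group(1)) if match else -1

    records = []
    for path in sorted(glob.glob(pattern), key=chunk_index):
        with open(path, encoding="utf-8") as handle:
            records.extend(json.load(handle))  # each file holds a JSON array
    return records
```

The combined list can then be passed to pandas, for example df = pd.DataFrame(load_chunks()).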

Answered 2023-09-26

叮當貓咪

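Contributed 1776 experience points, received over 12 likes

I suggest reading tweepy's API documentation.

From reading other code snippets, I believe it should be done like this: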

stream.filter(track=['Keyword'])
print(stream.sample())

Answered 2023-09-26

千萬里不及你

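Contributed 1784 experience points, received over 9 likes

As far as I understand, tweepy uses the Twitter v1.1 API, which has separate endpoints for real-time sampling and for real-time filtering of tweets.

Twitter API reference: v1 real-time sampling, v1 real-time filtering.

Method 1: Use stream.filter(track=['Keyword1', 'keyword2']) to collect the filtered stream data, then sample records from the collected data.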

import tweepy

# tweepy.StreamListener exists in Tweepy 3.x (it was merged into tweepy.Stream in 4.0)
class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # do data processing and storing here
        pass
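The sampling step of Method 1 is not spelled out in the answer. One possible sketch uses reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. This technique is an assumption chosen for illustration, and `reservoir_sample` is a hypothetical helper name.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # Keep the new item with probability k / (index + 1),
            # overwriting a randomly chosen existing entry.
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

The same function works on a list of stored tweets or on a generator that yields tweets as they arrive, so the full filtered stream never has to be held in memory.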

Method 2: Write a program that starts and stops streaming at random times (for example, sampling a random 3 minutes out of every 15-minute interval).
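The scheduling part of Method 2 can be sketched independently of the streaming client. The example below only computes when to stream, assuming one randomly placed 3-minute window inside each 15-minute interval; `random_windows` and its parameters are hypothetical names, and actually connecting and disconnecting the stream is left out because it depends on the tweepy version in use.

```python
import random


def random_windows(n_intervals, interval_s=15 * 60, window_s=3 * 60, rng=None):
    """Pick one randomly placed window per interval; return (start, end) offsets in seconds."""
    if window_s > interval_s:
        raise ValueError("window_s must not exceed interval_s")
    rng = rng or random.Random()
    windows = []
    for i in range(n_intervals):
        base = i * interval_s
        # Random start such that the whole window fits inside the interval.
        offset = rng.uniform(0, interval_s - window_s)
        windows.append((base + offset, base + offset + window_s))
    return windows
```

A driver loop would sleep until each start offset, open the filtered stream, and close it again at the matching end offset.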

Method 3: Use the sampling API to collect data, then apply a keyword filter and store only the relevant data.
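The client-side filtering in Method 3 could look like the sketch below. It assumes each collected tweet is a dict with a "text" field and uses a plain case-insensitive substring match, which is simpler than the matching Twitter applies to the track parameter. `matches_keywords` and `filter_tweets` are hypothetical helper names.

```python
def matches_keywords(text, keywords):
    """Return True if any keyword occurs in text, ignoring case (plain substring match)."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


def filter_tweets(tweets, keywords):
    """Keep only tweet dicts whose "text" field matches at least one keyword."""
    return [t for t in tweets if matches_keywords(t.get("text", ""), keywords)]
```

Tweets without a "text" field are dropped, and an empty keyword list matches nothing.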

Answered 2023-09-26
