I am running a basic query with PyAthena:

    from pyathena import connect as pyathena_connect  # to distinguish from other connect methods
    import pandas as pd

    class AthenaDataConnection():
        def __init__(self, S3_STAGING_DIR, SEP=';', REGION='us-east-1', ACCESS_KEY=None, S_KEY=None):
            self.S3_STAGING_DIR = S3_STAGING_DIR
            self.REGION = REGION
            self.SEP = SEP
            # Use explicit credentials when both are given; otherwise fall back to the default AWS credential chain.
            if ACCESS_KEY and S_KEY:
                self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR,
                                                    region_name=self.REGION,
                                                    aws_access_key_id=ACCESS_KEY,
                                                    aws_secret_access_key=S_KEY)
            else:
                self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR,
                                                    region_name=self.REGION)

        def get_athena_data(self, sql_dict):
            print("Athena connection established; starting to query data using pd-sql integration")
            sql_results = {}
            for filename, sql in sql_dict.items():
                try:
                    load_data = pd.read_sql(sql, self.athena_conn)
                    print(f"{filename} data fetched from Athena but not saved (returned in dict only).")
                    sql_results[filename] = load_data
                except Exception as exc:
                    # Report the reason instead of silently swallowing every error.
                    print(f"Reading {filename} failed: {exc}")
            return sql_results

    athena = AthenaDataConnection('s3://athena-staging/', ACCESS_KEY=ACCESS_KEY, S_KEY=S_KEY)
    sql_dict = {'foobar': "select * from foo.bar where foo='bar'"}
    df_dict = athena.get_athena_data(sql_dict)
    df = df_dict.get('foobar')
    # assume this is the end of the script; i.e., I did NOT save the query results myself

So, when the query executes, a file appears in the staging folder, for example:

    s3://athena-staging/abc123_45678_91011.csv

I would like my code to capture that file name and keep it for other purposes. But how? I could not find anything about this in the PyAthena documentation.

Update: I have just learned that the file name is the query ID + .csv! So I am now looking for a way to get the Athena query ID.
1 Answer

千萬里不及你
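OK, once I learned that the file name is not random but is the Athena query ID, I was able to search more effectively and found a solution. Using the object I already created above: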
    cursor = athena.athena_conn.cursor()
    cursor.execute(sql)
    cursor.query_id
This returns the query_id, which is the file name without the trailing .csv. Now I can fetch the file whenever I need it.
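To make the last step concrete, here is a minimal sketch of turning the query ID into the full result-file path. The helper names `result_file_uri` and `run_and_locate` are hypothetical (not part of PyAthena), and the sketch assumes the results land directly in the staging directory passed to `connect`, as in the question; a workgroup that overrides the output location would break that assumption.

```python
def result_file_uri(staging_dir, query_id):
    """Build the S3 URI of the CSV that Athena writes for a query.

    Hypothetical helper: assumes the file is named '<query_id>.csv'
    and sits directly under the staging directory.
    """
    return f"{staging_dir.rstrip('/')}/{query_id}.csv"


def run_and_locate(conn, sql, staging_dir):
    """Run a query through a cursor and return rows, column names, and file URI.

    `conn` is expected to be a PyAthena connection (or anything with a
    DB-API style cursor that also exposes `query_id` after execute()).
    """
    cursor = conn.cursor()
    cursor.execute(sql)
    rows = cursor.fetchall()
    # DB-API: the first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return rows, columns, result_file_uri(staging_dir, cursor.query_id)
```

With the object from the question, this could be called as `run_and_locate(athena.athena_conn, sql, athena.S3_STAGING_DIR)`; the rows and column names can then be passed to `pd.DataFrame(rows, columns=columns)` if a DataFrame is still wanted.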