
Strange results from the Unity ML-Agents Python API

森欄 2023-10-05 16:25:25
I am using the 3DBall example environment, but I am getting some very strange results that I don't understand. So far my code is just a for-range loop that looks at the rewards and fills the required inputs with random values. However, when I do this, a negative reward is never shown, and at random there is no decision step, which makes sense, but shouldn't it keep simulating until there is a decision step? Any help would be appreciated, because apart from the documentation there are almost no resources.

import random
import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs
for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for i in behavior_names:
        print(i)
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    # for some reason it returns action mask as False when
    # DecisionSteps[0].reward is empty and is None when not
    print(DecisionSteps[0].action_mask)
    for i in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10, 10))
    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()

1 Answer

白板的微信

I think your problem is that when the simulation terminates and needs to be reset, the agent does not return a decision_step but a terminal_step instead. This is because the agent has dropped the ball, and the reward returned in the terminal_step will be -1.0. I have taken your code and made some changes, and now it runs fine (except that you will probably want to change it so that you don't reset every time one of the agents drops its ball).


import numpy as np
from mlagents_envs.environment import UnityEnvironment

# -----------------
# This code is used to close an env that might not have been closed before
try:
    unity_env.close()
except:
    pass
# -----------------

env = UnityEnvironment(file_name=None)
env.reset()

for i in range(1000):
    behavior_names = env.behavior_specs

    # Go through all existing behaviors
    for behavior_name in behavior_names:
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        for agent_id_terminated in terminal_steps:
            print("Agent " + behavior_name + " has terminated, resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()

        actions = []
        for agent_id_decisions in decision_steps:
            actions.append(np.random.uniform(-1, 1, 2))

        # print(decision_steps[0].reward)
        # print(decision_steps[0].action_mask)

        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))

    try:
        env.step()
    except:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break

env.close()
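One side note on the actions themselves: the question samples them with random.uniform(-10, 10), while the answer samples in [-1, 1], which is the range 3DBall's two continuous actions are designed around. The array passed to env.set_actions must have one row per agent that currently has a decision step. A minimal NumPy-only sketch of building that array (no Unity needed; the helper name and the agent/action counts are just assumptions for illustration):

```python
import numpy as np

def random_actions(n_agents, action_size=2, low=-1.0, high=1.0):
    """Build the (n_agents, action_size) array expected by env.set_actions:
    one row of continuous actions per agent awaiting a decision."""
    return np.random.uniform(low, high, size=(n_agents, action_size))

# Example: 12 agents in the 3DBall scene, 2 continuous actions each
actions = random_actions(12)
print(actions.shape)  # (12, 2)
```

Note that in more recent mlagents_envs releases, env.set_actions expects the array wrapped in an ActionTuple rather than a bare NumPy array, so check which version you have installed.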


Replied 2023-10-05