

How to implement EM-GMM in Python?

森欄 2023-05-23 14:53:00
As shown below:

import numpy as np

eps = 1e-8  # small constant for numerical stability (assumed defined somewhere in the original)

def PDF(data, means, variances):
    return 1/(np.sqrt(2 * np.pi * variances) + eps) * np.exp(-1/2 * (np.square(data - means) / (variances + eps)))

def EM_GMM(data, k, iterations):
    weights = np.ones((k, 1)) / k  # shape=(k, 1)
    means = np.random.choice(data, k)[:, np.newaxis]  # shape=(k, 1)
    variances = np.random.random_sample(size=k)[:, np.newaxis]  # shape=(k, 1)

    data = np.repeat(data[np.newaxis, :], k, 0)  # shape=(k, n)

    for step in range(iterations):
        # Expectation step
        likelihood = PDF(data, means, np.sqrt(variances))  # shape=(k, n)

        # Maximization step
        b = likelihood * weights  # shape=(k, n)
        b /= np.sum(b, axis=1)[:, np.newaxis] + eps

        # update means, variances, and weights
        means = np.sum(b * data, axis=1)[:, np.newaxis] / (np.sum(b, axis=1)[:, np.newaxis] + eps)
        variances = np.sum(b * np.square(data - means), axis=1)[:, np.newaxis] / (np.sum(b, axis=1)[:, np.newaxis] + eps)
        weights = np.mean(b, axis=1)[:, np.newaxis]

    return means, variances

I think this is wrong, since the output is two vectors, one representing the means and the other the variances. The vague point that makes me doubt the implementation is that it returns 0.00000000e+000 for most of the output, and these outputs don't really need visualization to see the problem. By the way, the input is time-series data. I have checked everything and traced through it multiple times, but no errors show up.
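One way to sanity-check results like these is to fit the same series with scikit-learn's reference implementation and compare the recovered parameters. A minimal sketch, assuming the data is a 1-D NumPy array (the variable name series and its values here are illustrative stand-ins):

import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative stand-in for the time-series values in the question
series = np.array([25.31, 24.31, 24.12, 43.46, 41.49, 37.54, 44.81, 39.71])

gm = GaussianMixture(n_components=3).fit(series[:, None])
print(gm.means_.ravel())        # reference means to compare against EM_GMM
print(gm.covariances_.ravel())  # reference variances (1-D case)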

2 Answers

心有法竹

Contributed 1866 experience · earned 5+ likes

The key point I see is the initialization of means. Following the default behavior of sklearn's GaussianMixture, I switched to a KMeans initialization instead of a random one.
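For reference, this mirrors scikit-learn's default, which also seeds the mixture from a k-means style initialization (a quick check, assuming a recent scikit-learn version):

from sklearn.mixture import GaussianMixture

# The default init_params is 'kmeans', i.e. responsibilities are
# initialized from a k-means clustering rather than at random.
print(GaussianMixture().get_params()['init_params'])  # 'kmeans'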

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('seaborn')

eps = 1e-8

def PDF(data, means, variances):
    return 1/(np.sqrt(2 * np.pi * variances) + eps) * np.exp(-1/2 * (np.square(data - means) / (variances + eps)))

def EM_GMM(data, k=3, iterations=100, init_strategy='kmeans'):
    weights = np.ones((k, 1)) / k  # shape=(k, 1)

    if init_strategy == 'kmeans':
        from sklearn.cluster import KMeans
        km = KMeans(k).fit(data[:, None])
        means = km.cluster_centers_  # shape=(k, 1)
    else:  # init_strategy == 'random'
        means = np.random.choice(data, k)[:, np.newaxis]  # shape=(k, 1)

    variances = np.random.random_sample(size=k)[:, np.newaxis]  # shape=(k, 1)

    data = np.repeat(data[np.newaxis, :], k, 0)  # shape=(k, n)

    for step in range(iterations):
        # Expectation step
        likelihood = PDF(data, means, np.sqrt(variances))  # shape=(k, n)

        # Maximization step
        b = likelihood * weights  # shape=(k, n)
        b /= np.sum(b, axis=1)[:, np.newaxis] + eps

        # update means, variances, and weights
        means = np.sum(b * data, axis=1)[:, np.newaxis] / (np.sum(b, axis=1)[:, np.newaxis] + eps)
        variances = np.sum(b * np.square(data - means), axis=1)[:, np.newaxis] / (np.sum(b, axis=1)[:, np.newaxis] + eps)
        weights = np.mean(b, axis=1)[:, np.newaxis]

    return means, variances

This seems to produce the desired output much more consistently:


s = np.array([25.31, 24.31, 24.12, 43.46, 41.48666667,
              41.48666667, 37.54, 41.175, 44.81, 44.44571429,
              44.44571429, 44.44571429, 44.44571429, 44.44571429, 44.44571429,
              44.44571429, 44.44571429, 44.44571429, 44.44571429, 44.44571429,
              44.44571429, 44.44571429, 39.71, 26.69, 34.15,
              24.94, 24.75, 24.56, 24.38, 35.25,
              44.62, 44.94, 44.815, 44.69, 42.31,
              40.81, 44.38, 44.56, 44.44, 44.25,
              43.66666667, 43.66666667, 43.66666667, 43.66666667, 43.66666667,
              40.75, 32.31, 36.08, 30.135, 24.19])

k = 3
n_iter = 100

means, variances = EM_GMM(s, k, n_iter)
print(means, variances)

[[44.42596231]
 [24.509301  ]
 [35.4137508 ]]
[[0.07568723]
 [0.10583743]
 [0.52125856]]


# Plotting the results (keyword arguments used for compatibility with newer seaborn)
colors = ['green', 'red', 'blue', 'yellow']
bins = np.linspace(np.min(s) - 2, np.max(s) + 2, 100)

plt.figure(figsize=(10, 7))
plt.xlabel('$x$')
plt.ylabel('pdf')

sns.scatterplot(x=s, y=[0.05] * len(s), color='navy', s=40, marker=2, label='Series data')

for i, (m, v) in enumerate(zip(means, variances)):
    sns.lineplot(x=bins, y=PDF(bins, m, v), color=colors[i], label=f'Cluster {i+1}')

plt.legend()
plt.plot()

http://img1.sycdn.imooc.com/646c63170001c9ba06070419.jpg

Finally, we can see that purely random initialization produces different results; let's look at the resulting means:


for _ in range(5):
    print(EM_GMM(s, k, n_iter, init_strategy='random')[0], '\n')

[[44.42596231]
 [44.42596231]
 [44.42596231]]

[[44.42596231]
 [24.509301  ]
 [30.1349997 ]]

[[44.42596231]
 [35.4137508 ]
 [44.42596231]]

[[44.42596231]
 [30.1349997 ]
 [44.42596231]]

[[44.42596231]
 [44.42596231]
 [44.42596231]]

You can see how different these results are: in some cases the resulting means are constant, meaning the initialization picked 3 similar values that did not change much while iterating. Adding some print statements inside EM_GMM will clarify this.
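As a minimal sketch of such instrumentation (inserted at the end of the loop body in EM_GMM, using the names already defined there):

# Inside the for-loop of EM_GMM, after the weights update:
if step % 10 == 0:
    print(f'step {step:3d}: means={means.ravel()}')

With a degenerate random initialization, the printed means stay nearly identical from the first step onward.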


一只萌萌小番薯

Contributed 1795 experience · earned 7+ likes

# Expectation step
likelihood = PDF(data, means, np.sqrt(variances))

Why do we pass the sqrt of variances here? The PDF function takes variances, so this call should be PDF(data, means, variances).

Another problem:


# Maximization step
b = likelihood * weights  # shape=(k, n)
b /= np.sum(b, axis=1)[:, np.newaxis] + eps

The second line above should normalize over the components, i.e. b /= np.sum(b, axis=0)[np.newaxis, :] + eps.
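To see why the axis matters: the responsibilities live in a (k, n) array, and for each data point (column) they should sum to 1 across the k components. A small illustration:

import numpy as np

b = np.array([[0.2, 0.5, 0.1],
              [0.6, 0.5, 0.3]])            # shape (k=2, n=3), unnormalized
b = b / (np.sum(b, axis=0)[np.newaxis, :] + 1e-8)
print(b.sum(axis=0))                       # ~[1. 1. 1.]: one unit of weight per data point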

Also, in the initialization of variances:

variances = np.random.random_sample(size=k)[:, np.newaxis]  # shape=(k, 1)

Why initialize the variances randomly? We already have data and means, so why not compute the current variance estimate directly, as in vars = np.expand_dims(np.mean(np.square(data - means), axis=1), -1)?
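As a quick shape check of that initialization (assuming data has already been tiled to (k, n) as in the implementation below; the values here are illustrative):

import numpy as np

data = np.repeat(np.linspace(0., 9., 10)[np.newaxis, :], 3, 0)  # (k=3, n=10)
means = np.array([[2.0], [5.0], [8.0]])                         # (k=3, 1)
vars = np.expand_dims(np.mean(np.square(data - means), axis=1), -1)
print(vars.shape)  # (3, 1): one data-driven variance per component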

With these changes, here is my implementation:


import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('seaborn')

eps = 1e-8


def pdf(data, means, vars):
    denom = np.sqrt(2 * np.pi * vars) + eps
    numer = np.exp(-0.5 * np.square(data - means) / (vars + eps))
    return numer / denom


def em_gmm(data, k, n_iter, init_strategy='k_means'):
    weights = np.ones((k, 1), dtype=np.float32) / k
    if init_strategy == 'k_means':
        from sklearn.cluster import KMeans
        km = KMeans(k).fit(data[:, None])
        means = km.cluster_centers_
    else:
        means = np.random.choice(data, k)[:, np.newaxis]
    data = np.repeat(data[np.newaxis, :], k, 0)
    # initialize variances from the data instead of at random
    vars = np.expand_dims(np.mean(np.square(data - means), axis=1), -1)
    for step in range(n_iter):
        # Expectation step: responsibilities of each component for each point
        p = pdf(data, means, vars)
        b = p * weights
        denom = np.expand_dims(np.sum(b, axis=0), 0) + eps  # normalize over components (axis=0)
        b = b / denom
        # Maximization step: re-estimate means, variances, and weights
        means_n = np.sum(b * data, axis=1)
        means_d = np.sum(b, axis=1) + eps
        means = np.expand_dims(means_n / means_d, -1)
        vars = np.sum(b * np.square(data - means), axis=1) / means_d
        vars = np.expand_dims(vars, -1)
        weights = np.expand_dims(np.mean(b, axis=1), -1)
    return means, vars


def main():
    s = np.array([25.31, 24.31, 24.12, 43.46, 41.48666667,
                  41.48666667, 37.54, 41.175, 44.81, 44.44571429,
                  44.44571429, 44.44571429, 44.44571429, 44.44571429, 44.44571429,
                  44.44571429, 44.44571429, 44.44571429, 44.44571429, 44.44571429,
                  44.44571429, 44.44571429, 39.71, 26.69, 34.15,
                  24.94, 24.75, 24.56, 24.38, 35.25,
                  44.62, 44.94, 44.815, 44.69, 42.31,
                  40.81, 44.38, 44.56, 44.44, 44.25,
                  43.66666667, 43.66666667, 43.66666667, 43.66666667, 43.66666667,
                  40.75, 32.31, 36.08, 30.135, 24.19])
    k = 3
    n_iter = 100

    means, vars = em_gmm(s, k, n_iter)
    colors = ['green', 'red', 'blue', 'yellow']
    bins = np.linspace(np.min(s) - 2, np.max(s) + 2, 100)
    plt.figure(figsize=(10, 7))
    plt.xlabel('$x$')
    plt.ylabel('pdf')
    sns.scatterplot(x=s, y=[0.0] * len(s), color='navy', s=40, marker=2, label='Series data')
    for i, (m, v) in enumerate(zip(means, vars)):
        sns.lineplot(x=bins, y=pdf(bins, m, v), color=colors[i], label=f'Cluster {i + 1}')
    plt.legend()
    plt.show()


if __name__ == '__main__':
    main()

Here is my result.

http://img1.sycdn.imooc.com//646c633100012b5106580449.jpg
