
Create a counter within consecutive runs of a particular value

R language

慕斯王 2019-08-13 15:15:37

I have hourly values. I want to count how many consecutive hours the value has been zero since the last time it was non-zero. This is an easy job for a spreadsheet or a for loop, but I am hoping for a fast vectorized one-liner to do the task.

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)
df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
for(i in 2:nrow(df))
  df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)

Desired output:

R> df
   x zcount
1  1      0
2  0      1
3  1      0
4  0      1
5  0      2
6  0      3
7  1      0
8  1      0
9  0      1
10 0      2


3 answers

森欄

Contributed 1810 experience points · received more than 5 likes

Here is one way, building on Joshua's rle approach (edited to use seq_len and lapply, as Marek suggested):

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))

[1] 0 1 0 1 2 3 0 0 1 2
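To make the one-liner easier to follow, here is a minimal step-by-step sketch on the vector from the question. The intermediate names `runs`, `pos_in_run` and `zcount` are illustrative only and do not appear in the original answer.

```r
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Step 1: run-length encode x; each run is a stretch of identical values.
runs <- rle(x)
runs$lengths                  # 1 1 1 3 2 2

# Step 2: number the elements inside every run, starting at 1.
pos_in_run <- unlist(lapply(runs$lengths, seq_len))
pos_in_run                    # 1 1 1 1 2 3 1 2 1 2

# Step 3: !x is TRUE where x is zero, so the product keeps the
# position only inside zero runs and yields 0 everywhere else.
zcount <- (!x) * pos_in_run
zcount                        # 0 1 0 1 2 3 0 0 1 2
```

Base R's `sequence(runs$lengths)` should give the same numbering as step 2, if a shorter spelling is preferred.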

Update. Just for kicks, here is another approach that is about 5 times faster:

cumul_zeros <- function(x) {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))

[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
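The cumsum trick inside cumul_zeros can be hard to see from the function alone. The following sketch replays the same steps on a short vector and prints the intermediates. The names `nz`, `ends`, `drops` and `result` are chosen here for illustration and are not from the original answer.

```r
x  <- c(1, 1, 0, 0, 0, 1, 1, 0)
nz <- !x                              # TRUE where x is zero

rl   <- rle(nz)
len  <- rl$lengths                    # 2 3 2 1
ends <- cumsum(len)                   # 2 5 7 8 (last index of each run)

# TRUE for every FALSE-run that directly follows a TRUE-run
drops <- c(0, diff(rl$values)) < 0    # FALSE FALSE TRUE FALSE

z <- as.integer(nz)                   # 0 0 1 1 1 0 0 1
# At the last element of such a FALSE-run, subtract the length of the
# preceding TRUE-run so the running sum falls back to zero.
z[ends[drops]] <- -len[c(drops[-1], FALSE)]
z                                     # 0 0 1 1 1 0 -3 1

cumsum(z)                             # 0 0 1 2 3 3 0 1
result <- nz * cumsum(z)              # mask out the non-zero positions
result                                # 0 0 1 2 3 0 0 1
```

The running sum is only meaningful inside the zero runs; the final multiplication by `nz` discards the leftover values in between.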

Now compare the timings on a vector of length one million:

> x <- sample(0:1, 1000000, TRUE)

> system.time( z <- cumul_zeros(x))

user system elapsed

0.15 0.00 0.14

> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))

user system elapsed

0.75 0.00 0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

2019-08-13

千萬(wàn)里不及你

Contributed 1784 experience points · received more than 9 likes

William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is

f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

and in the current situation you would call f7(!x). In terms of performance:

> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
   user  system elapsed 
  0.076   0.000   0.077 
> system.time(res0 <- cumul_zeros(x))
   user  system elapsed 
  0.345   0.003   0.349 
> identical(res7, res0)
[1] TRUE
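As a rough illustration of why f7 works, the sketch below splits it into its three vectorized steps on the vector from the question. The names `is_zero`, `total`, `at_reset` and `baseline` are made up for this walkthrough; only f7 itself comes from the answer.

```r
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
is_zero <- !x                       # this is the argument passed to f7

# Running total of zeros seen so far.
total <- cumsum(is_zero)            # 0 1 1 2 3 4 4 4 5 6

# The same total, but only at non-zero positions (0 elsewhere).
at_reset <- (!is_zero) * total      # 0 0 1 0 0 0 4 4 0 0

# Carry the total from the most recent non-zero position forward.
baseline <- cummax(at_reset)        # 0 0 1 1 1 1 4 4 4 4

# Zeros seen since the last non-zero value.
total - baseline                    # 0 1 0 1 2 3 0 0 1 2
f7(!x)                              # 0 1 0 1 2 3 0 0 1 2
```

cummax works here because the running total never decreases, so its largest value seen at a non-zero position is always the most recent one.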

2019-08-13