Long format data - fill episode based on conditional previous episode
Problem
The data are organised in long format. 4 individuals are observed during 4 or 5 days (BCSID is the name of the unique key). Basically, the data describe the activities performed during these 4-5 days: START describes the start time of each activity and MAINACT the activity itself.
The data:
```
data = structure(list(BCSID = c("B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10001N", "B10001N", "B10001N",
"B10001N", "B10001N", "B10001N", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R", "B10004R", "B10004R", "B10004R", "B10004R",
"B10004R", "B10004R",
```
Solution
I would first write (or find) a function for shifting a vector `x` by a given number of observations `k`. The stats package has a `lag` function, but it is meant for time series and does not shift a plain vector the way we need here. Here is such a function that will work both ways, with positive or negative `k`:
```
LAG <- function(x, k) {
  if (k == 0) {
    x
  } else if (k > 0) {
    # shift forward: element i receives the value from position i - k
    c(rep(NA, k), head(x, -k))
  } else {
    # shift backward: element i receives the value from position i + |k|
    c(tail(x, k), rep(NA, -k))
  }
}
```
Then, you can create a boolean vector telling if each row meets all the conditions or not:
```
need_replace <- with(data, eorder2 == 1 &
                           DAY != 1 &
                           MAINACT == '-11' &
                           LAG(MAINACT, +1) %in% c('1301', '1302') &
                           LAG(MAINACT, -1) == '1302')
```
And finally, do the substitution:
```
data$MAINACT[need_replace] <- '1606'
```
A few more comments:
- I created a vector of TRUE/FALSE rather than a vector of indices like you did with `which`. Both work, but it is less typing without `which`.
- Note that I used `with(data, ...)` so I did not have to type `data$` over and over. This also makes your code shorter and easier to read.
- I used `%in%` instead of two `==` statements separated by `|`. That's another good function to know (imagine having many more than two allowed values...).
- Be careful: `&` has higher priority than `|`, so what you had written was equivalent to `statement1 | (statement2 & statement3)`, which is not the same as what I think you had in mind: `(statement1 | statement2) & statement3`. Priority rules are documented under `?Syntax`.
- As it stands, none of the rows in your example data match all the conditions you have specified, so please let me know if I misunderstood something; I am sure it will be a simple fix.
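Since none of the rows in the example data match, here is a small sketch on made-up data to check that the approach behaves as intended. The `toy` data frame and its values are purely hypothetical (only the column names come from the question), and the `wrong` condition is just an illustration of the precedence pitfall, not necessarily what the original code looked like.

```r
# Same shifting helper as in the answer.
LAG <- function(x, k) {
  if (k == 0) {
    x
  } else if (k > 0) {
    c(rep(NA, k), head(x, -k))
  } else {
    c(tail(x, k), rep(NA, -k))
  }
}

# Hypothetical toy data: six consecutive episodes of one individual.
toy <- data.frame(
  BCSID   = rep("X1", 6),
  DAY     = c(1, 2, 2, 2, 3, 3),
  eorder2 = c(3, 1, 2, 3, 1, 2),
  MAINACT = c("1301", "-11", "1302", "1301", "-11", "505"),
  stringsAsFactors = FALSE
)

prev_act <- LAG(toy$MAINACT, +1)  # activity of the previous row
next_act <- LAG(toy$MAINACT, -1)  # activity of the next row

# Precedence pitfall: this parses as A | (B & C), so a previous '1301'
# is enough on its own, whatever the next activity is.
wrong <- prev_act == "1301" | prev_act == "1302" & next_act == "1302"

# Intended meaning, (A | B) & C, written with %in%.
right <- prev_act %in% c("1301", "1302") & next_act == "1302"

# Full condition from the answer, applied to the toy data.
need_replace <- with(toy, eorder2 == 1 &
                          DAY != 1 &
                          MAINACT == "-11" &
                          LAG(MAINACT, +1) %in% c("1301", "1302") &
                          LAG(MAINACT, -1) == "1302")

toy$MAINACT[need_replace] <- "1606"

print(which(wrong))  # rows flagged by the unparenthesised condition
print(which(right))  # rows flagged by the intended condition
print(toy)
```

In this toy data only the second row satisfies every condition, so it is the only one recoded to '1606'. The fifth row has '1301' before it but '505' after it, which is exactly the case where the unparenthesised condition and the intended one disagree.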
Code Snippets
```
LAG <- function(x, k) {
  if (k == 0) {
    x
  } else if (k > 0) {
    c(rep(NA, k), head(x, -k))
  } else {
    c(tail(x, k), rep(NA, -k))
  }
}
```
```
need_replace <- with(data, eorder2 == 1 &
                           DAY != 1 &
                           MAINACT == '-11' &
                           LAG(MAINACT, +1) %in% c('1301', '1302') &
                           LAG(MAINACT, -1) == '1302')
```
```
data$MAINACT[need_replace] <- '1606'
```
Context
StackExchange Code Review Q#95594, answer score: 3