R - Create multiple new columns using conditional statements
R - Create multiple new columns using conditional statements
R - Create multiple new columns using conditional statements
I am wondering if there is a way to create multiple columns based on a condition(s).
For example below, I have a dataframe with data and I want to create two columns based on the based on ccy. One column is a gbp conversion rate of the ccy and the other is a cad conversion.
If I pipe the mutates I can get it to work but there is repetition (and in my real problem, I have a complex list of ifelse so repeating the code for every column would create a lot of repetition).
df <- structure(list(product = c('option', 'forward', 'forward', 'option'),
ccy = c('usd', 'usd', 'eur', 'usd'),
amount = c(1000, 2000, 1000, 5000)),
.Names = c('product', 'ccy', 'amount'),
row.names = c(NA, 4L),
class = "data.frame")
df
product ccy amount
1 option usd 1000
2 forward usd 2000
3 forward eur 1000
4 option usd 5000
df %>% mutate(gbp_amount =
ifelse(ccy == 'usd', round(amount / 1.8, 2),
ifelse(ccy == 'eur', round(amount / 1.3, 2),
'not_converted'))) %>%
mutate(cad_amount =
ifelse(ccy == 'usd', round(amount / 0.85, 2),
ifelse(ccy == 'eur', round(amount / .7, 2),
'not_converted')))
product ccy amount gbp_amount cad_amount
1 option usd 1000 555.56 1176.47
2 forward usd 2000 1111.11 2352.94
3 forward eur 1000 769.23 1428.57
4 option usd 5000 2777.78 5882.35
Is there a way to create multiple columns based on a single if condition?
E.g., something like this pseudo code...
df %>% ifelse(df$ccy == 'usd',
(mutate(gbp_amount = round(amount / 1.8, 2)),
mutate(cad_amount = round(amount / 0.85, 2))),
ifelse(df$ccy == 'eur',
(mutate(gbp_amount = round(amount / 1.3, 2)),
mutate(cad_amount = round(amount / 0.7, 2))),
'not_converted'))
3 Answers
3
You can use SQL-alike joins if you have many "equals" conditions.
I am using the data.table
syntax but you could do this dplyr
too:
data.table
dplyr
library(data.table)
setDT(df)
# add a row which cannot be found ("joined") to demonstrate missing rates
df <- rbind(df, data.table(product = "option", ccy = "aud", amount = 3000))
df
lookup <- data.table(ccy = c("usd", "eur"),
gbp_rate = c( 1.8, 1.3),
cad_rate = c( 0.85, 0.7))
lookup
# ccy gbp_rate cad_rate
# 1: usd 1.8 0.85
# 2: eur 1.3 0.70
df[lookup, `:=`(gbp_amount = round(amount / gbp_rate, 2),
cad_amount = round(amount / cad_rate, 2)),
on = "ccy"]
df
# product ccy amount gbp_amount cad_amount
# 1: option usd 1000 555.56 1176.47
# 2: forward usd 2000 1111.11 2352.94
# 3: forward eur 1000 769.23 1428.57
# 4: option usd 5000 2777.78 5882.35
# 5: option aud 3000 NA NA
You have to sort the result as you like and mark the lookup errors (missing conversion rates) with another value than NA
if you want (but not with the string "not_converted"
like in your question since this would mix up the data type of the column - double vs character).
NA
"not_converted"
Consider building a rates data set and merge with your original, avoiding nested ifelse
:
ifelse
rates_df <- data.frame(ccy = c('usd', 'eur'),
type = c('gbp', 'gbp', 'cad', 'cad'),
rate = c(1.8, 1.3, 0.85, 0.7),
stringsAsFactors = FALSE)
rates_df
df %>%
inner_join(rates_df, by="ccy") %>%
mutate(gbp_amount = ifelse(type=="gbp", round(amount / rate, 2), 0),
cad_amount = ifelse(type=="cad", round(amount / rate, 2), 0)) %>%
select(product, ccy, matches("amount")) %>%
group_by(product, ccy, amount) %>%
summarise_all(sum)
# # A tibble: 4 x 5
# # Groups: product, ccy [?]
# product ccy amount gbp_amount cad_amount
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 forward eur 1000 769.23 1428.57
# 2 forward usd 2000 1111.11 2352.94
# 3 option usd 1000 555.56 1176.47
# 4 option usd 5000 2777.78 5882.35
You'll have to use a for-loop
if you want to do multiple actions. @R Yoda's solution is probably better though. Like he said, I'd use NA rather than a character string so you aren't mixing data types in a vector, otherwise it'll default to character.
for-loop
for (i in 1:nrow(df)) {
if(df$ccy[i] == "usd") {
df$gbp_amount[i] <- round(df$amount[i] / 1.8, 2);
df$cad_amount[i] <- round(df$amount[i] / 0.85, 2);
} else {
NA
}
if(df$ccy[i] == "eur") {
df$gbp_amount[i] <- round(df$amount[i] / 1.3, 2);
df$cad_amount[i] <- round(df$amount[i] / 0.7, 2);
} else {
NA
}
}
Heresy, I know! User asked for a single
if
statement though, careful what you wish for.– Anonymous coward
Jun 29 at 21:51
if
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
You just converted a vectorized process to rowwise iteration!
– Parfait
Jun 29 at 21:27