0%

R - Replace NA with Zero and Empty String in Multiple Columns

This casual note is to record how to use R to replace the NA with 0 or any string. Generally, NA can be generated for different reasons, like unclean data, data transformation, or missing values. Otherwise, we have to convert NA to zero or other stings in order to present them in tables or listings.

In R the most simple function to replace NA is replace() or is.na() functions.

library(tidyverse)

data <- tibble(
  a = c(1, 2, NA, 3, 4),
  b = c(5, NA, 6, 7, 8),
  c = c(9, 10, 11, NA, 12)
)

For instance, if we want to replace NAs in all columns, the simple functions can be used like:

data[is.na(data)] <- 0
replace(data, is.na(data), 0)

In the more factual scenario, we will have both numeric and character columns at the same time, not only the numeric in the above example. It seems the prior method is not convenient as we must select the numeric or character columns first and then replace NA with any appropriate value. Through searching on Google, I suppose the more simple way is to use dplyr::mutate_if() to check and select the specific type of columns, and replace_na() to replace the NAs.

data <- tibble(
  num1 = c(NA, 1, NA),
  num2 = c(2, NA, 3),
  chr1 = c("a", NA, "b"),
  chr2 = c("c", "d", NA)
)

data %>%
  mutate_if(is.numeric, ~replace_na(., 0)) %>%
  mutate_if(is.character, ~replace_na(., "xx"))

To be honest, I prefer the combo functions as I got used to applying the pipe %>% code in R, so the relevant functions like mutate_if(), mutate_all(), mutate_at() functions in tidyverse R package are very convenient for me.

For instance, if you want to replace NAs with 0 on selected column names or indexes, as shown below.

data %>%
  mutate_at(c(1,2), ~replace_na(., 0))

Besides the dplyr::coalesce() function can also be used to replace the NAs in a very tricky way, although it’s used to find the first non-missing element in common.

data %>%
  mutate(num1 = coalesce(num1, 0))

Reference

R – Replace NA with Empty String in a DataFrame
R – Replace NA with 0 in Multiple Columns