
Four ways to split the column in mutate

This is a quick note on how to split a column inside the mutate function from the dplyr package in R.

All of the approaches come from a discussion on Stack Overflow. I am recording them here so they are easy to find next time.

First, here is a wrong approach that I have used before. Suppose you have the dummy data below and want to split each string on the _ delimiter and keep the first part.

library(tidyverse)
data <- tibble(
  label = c("a_1", "b_2", "c_3", "d_4", "e_5")
)

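Out of habit, I used to split the label column with str_split(label, "_")[[1]][1]. That does not give the correct output: every value comes back as “a”. The reason is that str_split() returns a list with one element per row, so [[1]] selects only the pieces of the first row, [1] takes its first piece, and mutate recycles that single value down the whole column. You can see the result below or try it yourself.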

data %>% 
  mutate(sublabel = str_split(label, "_")[[1]][1])

# A tibble: 5 × 2
  label sublabel
  <chr> <chr>   
1 a_1   a       
2 b_2   a       
3 c_3   a       
4 d_4   a       
5 e_5   a  
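
To see where the repeated “a” comes from, it can help to look at what str_split() returns outside of mutate. The snippet below is only a small sketch using the same labels as the dummy data; the variable name pieces is introduced here for illustration.

```r
library(stringr)

label <- c("a_1", "b_2", "c_3", "d_4", "e_5")

# str_split() returns a list with one character vector per input string
pieces <- str_split(label, "_")

length(pieces)   # 5, one list element per row
pieces[[1]]      # "a" "1", the pieces of the first row only
pieces[[1]][1]   # "a", a single value that mutate() recycles to every row
```

So the indexing never looks at rows two to five, which is why the whole column ends up as “a”.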

That is clearly wrong. The correct approaches are listed below, summarized from that Stack Overflow discussion.

  • Add the simplify = TRUE argument, which makes str_split() return a character matrix (one row per string, one column per piece) instead of a list, so that [, 1] extracts the first part of every row.

      data %>% mutate(sublabel = str_split(label, "_", simplify = TRUE)[, 1])
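  • Use the separate() function from tidyr instead of str_split(). It splits one column into several new columns in a single step, which sidesteps the indexing problem entirely. By default it splits on any run of non-alphanumeric characters, so _ works without setting sep, and the original label column is dropped.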

      data %>% separate(label, c("sublabel1", "sublabel2"))
  • Similar to the first one, but more direct and explicit: keep the list returned by str_split() and extract the first part of each element with the map_chr() function from purrr, which applies a function to each element of a list and returns a character vector. Passing a position instead of a function, as in map_chr(., 1), picks the first item of every element.

      data %>% mutate(sublabel = str_split(label, "_") %>% map_chr(., 1)) 
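
As a quick sanity check, the three approaches above can be run side by side on the dummy data to confirm they agree. This is only a sketch: the names m, via_matrix, via_separate and via_map are made up for this example and are not part of the original discussion.

```r
library(tidyverse)

data <- tibble(
  label = c("a_1", "b_2", "c_3", "d_4", "e_5")
)

# simplify = TRUE gives a character matrix: one row per string, one column per piece
m <- str_split(data$label, "_", simplify = TRUE)
dim(m)   # 5 rows, 2 columns

# Approach 1: first column of the matrix
via_matrix <- data %>%
  mutate(sublabel = str_split(label, "_", simplify = TRUE)[, 1]) %>%
  pull(sublabel)

# Approach 2: separate() creates both new columns at once
via_separate <- data %>%
  separate(label, c("sublabel1", "sublabel2")) %>%
  pull(sublabel1)

# Approach 3: first item of every list element
via_map <- data %>%
  mutate(sublabel = str_split(label, "_") %>% map_chr(., 1)) %>%
  pull(sublabel)

identical(via_matrix, via_separate)   # TRUE
identical(via_matrix, via_map)        # TRUE
```

All three give a, b, c, d, e. The practical difference is the shape of the result: the first and third add a sublabel column next to label, while separate() replaces label with the two new columns.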

This is a brief post, and I hope it serves as a reminder the next time I forget how to do this.