0%

COVID-19可视化-Plotly

最近工作上接触到一些Covid-19新冠病毒一些信息,平时也常常看到国内外各个地区确诊患者不断的增加,但是还有一些我想了解的数据并没有在一些主流媒体上呈现出来;因此想通过自己的方式来可视化这些数据,依次解决以下问题:

  • 下载整理每天确诊/死亡/治愈的患者数据
  • 通过地图可视化方式来呈现各国各地区的最新患者数目
  • 使用常规图标来呈现各国患者的变化趋势 #### Covid-19数据下载

如果对于自己爬虫比较熟练的人,可以考虑爬取国内腾讯或者丁香园上的数据(我未尝试。。。);还可以用github上整理并每日更新的Covid-19数据,网址:https://github.com/CSSEGISandData/COVID-19,在此Github上提供了一个Covid-19数据可视化网站https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6,值得学习下~

我选择了直接读取github上的数据(省事并且效率),其大致分了三大类数据:

  • 确诊数据:time_series_covid19_confirmed_US.csvtime_series_covid19_confirmed_global.csv,两者区别在于前者是详细的美国各个州区的数据,后者的全球的数据(也包括美国的总数据),以下类似
  • 死亡数据:time_series_covid19_deaths_US.csvtime_series_covid19_deaths_global.csv
  • 治愈数据:time_series_covid19_recovered_global.csv

接着通过data.table::fread直接读取github上的数据文件,由于考虑到美国确诊数据地区过于详细,需要先对于处理再和global数据合并,我的方式如下(将每个Provinc_State下的所有数据进行加和):

library(dplyr)
library(plotly)
library(forcats)
library(tidyr)

#-------------------------------------------------------------------
# Covid19 data url
url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/"
url_comfirmed_global <- paste0(url, "time_series_covid19_confirmed_global.csv")
url_comfirmed_US <- paste0(url, "time_series_covid19_confirmed_US.csv")
url_death_global <- paste0(url, "time_series_covid19_deaths_global.csv")
url_recovered_global <- paste0(url, "time_series_covid19_recovered_global.csv")

#-------------------------------------------------------------------
# Input data
data_comfirmed_global <- data.table::fread(url_comfirmed_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`)

data_comfirmed_US_all <- data.table::fread(url_comfirmed_US) %>%
  rename(Long = Long_) %>%
  select(one_of(names(data_comfirmed_global)))
data_comfirmed_US_part1 <- data_comfirmed_US_all %>%
  group_by(Country_Region, Province_State) %>%
  summarise_at((5 - 2):(ncol(.) - 2), sum)
data_comfirmed_US_part2 <- data_comfirmed_US_all %>%
  group_by(Country_Region, Province_State) %>%
  summarise_at(1:2, median)
data_comfirmed_US <- left_join(data_comfirmed_US_part2, data_comfirmed_US_part1)

data_comfirmed <- bind_rows(filter(data_comfirmed_global, Country_Region != "US"), data_comfirmed_US) %>%
  mutate(Total = .[[ncol(.)]])

data_death_global <- data.table::fread(url_death_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`)

data_recovered_global <- data.table::fread(url_recovered_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`)

以下示例以确诊数据为例

Plotly map visualization

Map可视化在R中有不少的包可以实现,简单罗列下有如下几个:

  • plotly
  • leaflet
  • mapdeck
  • mapview
  • mapdeck
  • ggmap
  • tmap

基于我需要有一定的交互并且能与shiny结合的功能,我选择了plotly,其关于map的示例可看:https://plotly.com/r/maps/

以scattermapbox为例,此可视化方式需要Mapbox的账户,并且加载其token,参考:https://plotly.com/r/scattermapbox/

#-------------------------------------------------------------------
# Mapbox Setup
Sys.setenv("MAPBOX_TOKEN" = "your token")

#-------------------------------------------------------------------
data_comfirmed$Size <- ifelse(data_comfirmed$Total > 200000, 90,
  ifelse(data_comfirmed$Total > 100000, 60,
    ifelse(data_comfirmed$Total > 50000, 45,
      ifelse(data_comfirmed$Total > 15000, 30,
        ifelse(data_comfirmed$Total > 8000, 24,
          ifelse(data_comfirmed$Total > 4000, 16,
            ifelse(data_comfirmed$Total > 2000, 8,
              ifelse(data_comfirmed$Total > 1000, 4,
                ifelse(data_comfirmed$Total > 500, 2, 1)
              )
            )
          )
        )
      )
    )
  )
)


data_comfirmed$Size_title <- ifelse(data_comfirmed$Size == 90, "> 200000",
  ifelse(data_comfirmed$Size == 60, "> 100000-200000",
    ifelse(data_comfirmed$Size == 45, "> 50000-100000",
      ifelse(data_comfirmed$Size == 30, "> 15000-50000",
        ifelse(data_comfirmed$Size == 24, "> 8000-15000",
          ifelse(data_comfirmed$Size == 16, "> 4000-8000",
            ifelse(data_comfirmed$Size == 8, "> 2000-4000",
              ifelse(data_comfirmed$Size == 4, "> 1000-2000",
                ifelse(data_comfirmed$Size == 2, "> 500-1000", "0-500")
              )
            )
          )
        )
      )
    )
  )
)

data_comfirmed$Size_title <- fct_relevel(data_comfirmed$Size_title, c("0-500", "> 500-1000", "> 1000-2000", "> 2000-4000", "> 4000-8000", "> 8000-15000", "> 15000-50000", "> 50000-100000", "> 100000-200000", "> 200000"))

fig <- data_comfirmed %>%
  plot_mapbox(
    lat = ~Lat, lon = ~Long,
    split = ~Size_title,
    color = I("red"),
    size = ~Size,
    text = ~ paste(Total, Country_Region, Province_State, sep = ", "),
    mode = "scattermapbox", hoverinfo = "text"
  ) %>%
  layout(
    title = list(text = "COVID19 by Country", x = 0),
    font = list(color = "white"),
    plot_bgcolor = "#191A1A", paper_bgcolor = "#191A1A",
    mapbox = list(style = "dark"), # carto-positron, dark
    # showlegend = F,
    legend = list(
      orientation = "h",
      font = list(size = 8)
    ),
    margin = list(
      l = 25, r = 25,
      b = 25, t = 25,
      pad = 2
    )
  ) %>%
  config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))
fig

show


通过Mapbox Choropleth Maps可以展示每个国家的新增确诊人数(类似于heatmap),参考:https://plotly.com/r/mapbox-county-choropleth/;另还需要对确诊数据做些转化,然后增加国家code以及新增人数变量;增加国家code可以用到countrycode

data_comfirmed_newcase <- data.table::fread(url_comfirmed_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`) %>%
  group_by(Country_Region) %>%
  summarise_at((5 - 1):(ncol(.) - 1), sum) %>%
  mutate(
    Code = countrycode(Country_Region, origin = "country.name", destination = "iso3c"),
    Total = .[[ncol(.)]],
    Newcase = .[[ncol(.)]] - .[[ncol(.) - 1]]
  )

对于Mapbox Choropleth Maps而言,我们还需要一个全球的geojson文件,对于此文件的获取,我直接从kaggle一个比赛的数据中下载并调用了,下载地址: https://github.com/datasets/geo-countries

#----------------------------------------------------------------
cty <- rjson::fromJSON(file = "../Desktop/world-countries/world-countries.json")
fig <- plot_ly() %>%
  add_trace(
    type = "choroplethmapbox",
    geojson = cty,
    locations = data_comfirmed_newcase$Code,
    z = data_comfirmed_newcase$Newcase,
    colorscale = "Portland",
    zmin = 0,
    zmax = 10000,
    marker = list(
      line = list(
        width = 0
      ),
      opacity = 0.5
    )
  ) %>%
  layout(
    title = list(text = "COVID19 by Country", x = 0),
    font = list(color = "white"),
    plot_bgcolor = "#191A1A", paper_bgcolor = "#191A1A",
    mapbox = list(
      style = "dark",
      zoom = 1,
      center = list(lon = -30, lat = 30)
    ),
    legend = list(
      x = 0.3, y = -2,
      orientation = "h",
      font = list(size = 8)
    ),
    margin = list(
      l = 25, r = 25,
      b = 25, t = 25,
      pad = 2
    )
  ) %>%
  config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))

fig

show


如果不用mapbox的话,也可以展示一个比较简洁的map图(Choropleth maps),参考:https://plotly.com/r/choropleth-maps/,此时不一定需要geojson文件,只需要经纬度坐标和国家code即可

fig <- plot_geo(data_comfirmed_newcase) %>%
  add_trace(
    z = ~Newcase, color = ~Newcase, colors = "Blues",
    text = ~Country_Region, locations = ~Code, marker = list(line = list(color = toRGB("grey"), width = 0.5))
  ) %>%
  layout(
    title = 'COVID19 by Country',
    geo = list(
      showframe = FALSE,
      showcoastlines = FALSE,
      projection = list(type = "Mercator")
    )
  )

fig

show


如果想看下城市之间的确诊人数密度分布,可以参考:https://plotly.com/r/mapbox-density-heatmaps/,主要用到城市对应的经纬度参数以及城市名称

fig <- data_comfirmed %>%
  plot_ly(
    type = 'densitymapbox',
    lat = ~Lat,
    lon = ~Long,
    coloraxis = 'coloraxis',
    radius = 12
  ) %>%
  layout(
    mapbox = list(
      style="stamen-terrain",
      center= list(lon=180)), coloraxis = list(colorscale = "Viridis")
  )

fig

show

Data visualization-animation

比如我想看下随着时间的变化,各国确诊人数以及新增确诊人数的变化趋势,以气泡图展示;对于此问题,可以通过动态气泡图来实现,刚好plotly也可以实现

首先需要数据进行转化:

  • 每日确诊总人数转化为每日新增人数,用diff()函数即可
  • 挑选出确诊人数Top10的国家
  • 将Year变量从行变成列
  • 合并确诊人数和新增确诊人数数据

数据转化如下:

#---------------------------------------------------------------------
data_allcountry_tf <- data.table::fread(url_comfirmed_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`) %>%
  group_by(Country_Region) %>%
  summarise_at((5 - 1):(ncol(.) - 1), sum)

top_country <- data_allcountry_tf %>%
  mutate(Total = .[[ncol(.)]]) %>%
  arrange(desc(Total)) %>%
  head(10) %>% .$Country_Region

data_allcountry_comfirmed <- data_allcountry_tf %>% gather("year", "comfirm", colnames(.)[2:ncol(.)])

data_allcountry_newcase <- plyr::ddply(data_allcountry_tf, "Country_Region", function(x) {
  diff(c(0, as.numeric(x[2:ncol(x)])))
})
names(data_allcountry_newcase)[2:ncol(data_allcountry_newcase)] <- colnames(data_allcountry_tf)[2:ncol(data_allcountry_tf)]
data_allcountry_newcase <- data_allcountry_newcase %>% gather("year", "newcase", colnames(.)[2:ncol(.)])

data_allcountry <- left_join(data_allcountry_comfirmed, data_allcountry_newcase) %>%
  filter(Country_Region %in% top_country)
data_allcountry$year <- factor(data_allcountry$year, unique(data_allcountry$year))

data_allcountry$Size <- ifelse(data_allcountry$comfirm > 200000, 180,
  ifelse(data_allcountry$comfirm > 100000, 120,
    ifelse(data_allcountry$comfirm > 50000, 90,
      ifelse(data_allcountry$comfirm > 15000, 60,
        ifelse(data_allcountry$comfirm > 8000, 45,
          ifelse(data_allcountry$comfirm > 4000, 35,
            ifelse(data_allcountry$comfirm > 2000, 16,
              ifelse(data_allcountry$comfirm > 1000, 8,
                ifelse(data_allcountry$comfirm > 500, 4, 2)
              )
            )
          )
        )
      )
    )
  )
)

然后通过plotlyanimation_opts()animation_button()以及animation_slider()函数绘制动画气泡图

pty <- data_allcountry %>%
  plot_ly(x = ~comfirm, y = ~newcase) %>%
  layout(showlegend = F) %>%
  add_text(
    x = ~comfirm,
    y = ~newcase,
    text = ~Country_Region,
    frame = ~year,
    size = I(18),
    textposition = "top right"
  ) %>%
  add_markers(size = ~ I(comfirm / 1000), sizes = c(10000, 1000), color = ~Country_Region, frame = ~year, ids = ~Country_Region) %>%
  animation_opts(400, easing = "linear", redraw = FALSE) %>%
  animation_button(
    x = 1, xanchor = "right", y = 0, yanchor = "bottom"
  ) %>%
  animation_slider(
    currentvalue = list(prefix = "YEAR", font = list(color = "red"))
  )
pty

show

看起来确实略微有点简陋,但是Plotly的R库中还有其他可视化方式,比如我想用饼图来展示各个大洲下各国的死亡人数,如下:

library(plotly)
library(dplyr)
library(countrycode)

# Country to continent code
cty2ctt <- data.table::fread("https://pkgstore.datahub.io/JohnSnowLabs/country-and-continent-codes-list/country-and-continent-codes-list-csv_csv/data/b7876b7f496677669644f3d1069d3121/country-and-continent-codes-list-csv_csv.csv")

# Input world death data
death_df <- data.table::fread(url_death_global) %>%
  rename(Province_State = `Province/State`, Country_Region = `Country/Region`) %>%
  group_by(Country_Region) %>%
  summarise_at((5 - 1):(ncol(.) - 1), sum) %>%
  mutate(Total = .[[ncol(.)]],
         Code = countrycode(Country_Region, origin = "country.name", destination = "iso2c")) %>%
  left_join(cty2ctt, by = c("Code" = "Two_Letter_Country_Code")) %>%
  filter(!is.na(Continent_Name)) %>%
  dplyr::distinct(Country_Region, .keep_all = TRUE) %>%
  select(one_of(c("Country_Region", "Continent_Name", "Total")))

death_top <- arrange(death_df, desc(Total)) %>% head(40)

# Combined data for pie graph visual
temp <- dplyr::group_by(death_top, Continent_Name) %>% 
  dplyr::summarise(Total = sum(as.numeric(Total)))
temp <- data.frame(labels = temp$Continent_Name, parents = "World", values = temp$Total, stringsAsFactors = F)
world <- data.frame(labels = "World", parents = "", values = sum(temp$values), stringsAsFactors = F)
names(death_top) <- c("labels", "parents", "values")

death_res <- rbind(world, temp, death_top)
death_res <- as.data.frame(sapply(death_res, function(x){as.factor(x)} ))

fig <- plot_ly(
  labels = death_res$labels,
  parents = death_res$parents,
  values = death_res$values,
  type = 'sunburst',
  branchvalues = 'total'
)
fig

show

其他还有很多可视化的方法,比如展现各国确诊人数的时序变化曲线图,各国峰值变化图等等;以及不同的数据来源,如死亡数据/治愈数据等等,均可参考Plotly的R库:https://plotly.com/r/

本文出自于http://www.bioinfo-scrounger.com转载请注明出处