Introduction

We will use rvest and magrittr to scrape elemental properties from periodictable.com, and then plot the data in the form of the periodic table using ggplot2.

It is very common to visualise periodic trends in the elemental properties by using the well-established IUPAC periodic table as a canvas. In this post we will demonstrate how common R tools can make the job quickly and near effortless, once the data is at hand.

library(ggplot2)
library(dplyr)
library(grid)
library(XML)
library(httr)
library(rvest)
library(magrittr)
library(stringr)
library(knitr)

Collecting elemental data

The website periodictable.com lists a large number of properties for each element, and the data is displayed as not overly complicated HTML tables. The website states that it was created with Mathematica (by Wolfram Research), but even so, the quality of the data on the website is not too good. It appears to have suffered from whatever conversion was applied from the original Mathematica format.

In any case, in this post the quality of the data is secondary to the goal of achieving a working proof of concept. Plus, our own scraping will inevatibly degrade data quality even more.

# html' is deprecated. Use 'read_html' instead.
#element_data <- html("http://periodictable.com/Elements/001/data.html")
element_data <- read_html("http://periodictable.com/Elements/001/data.html")

Starting from this URL, we crawl the page and collect the URLs to all property pages, along with the name of each property. We put them together in a dataframe, so it is clear which URL belongs to which elemental property.

# extract raw links and place it in a dataframe together with the link text
elemental_properties <-
   data.frame(property = element_data %>%
                 html_nodes(xpath = "//a") %>%
                 html_text())
elemental_properties$links_raw <-
   element_data %>%
   html_nodes(xpath = "//a") %>%
   html_attr("href")
# drop any rows that do not begin "../../"
elemental_properties <-
   elemental_properties[
      grep(pattern = "^\\.\\./\\.\\./",
           x = elemental_properties$links_raw), ]
# remove the leading ../.. (relative link signifier)
elemental_properties$links_trail <-
   gsub(pattern = "^\\.\\./\\.\\./",
        replacement = "",
        x = elemental_properties$links_raw)
# keep only the rows that begin with "Properties"
elemental_properties <-
   elemental_properties[
      grep(pattern = "^Properties",
           x = elemental_properties$links_trail), ]
# make full urls
elemental_properties$url <-
   paste0("http://periodictable.com/",
          elemental_properties$links_trail)
# modify the URL so it sorts the table by atomic number
# (this is a feature of periodictable.com, they offer lists sortered in different ways
# for each property)
elemental_properties$url <-
   sub(pattern = "\\.html$",
       replacement = ".an.html",
       x = elemental_properties$url)
# we are done, go ahead and drop the no longer required columns
# links_raw, links_trail
elemental_properties <-
   elemental_properties[, -c(2,3)]
# oh, and add a sanitised version of the property names
elemental_properties$sanitized <-
   # replace spaces with underscores
   gsub("\\s+", "_",
        # replace "%" with "percent"
        gsub("%", "Percent",
             # remove parentheses and dashes
             gsub("[()-]", "",
                  # remove apostrophes
                  gsub("'", "", elemental_properties$property))))
# NOTE: Melting point and Boiling point are duplicated because they are displayed twice
#       on the data page for Hydrogen, both under "Overview" and "Thermal properties".
#       So we should deduplicate the dataframe.
elemental_properties <- unique(elemental_properties)
# reset the row numbering
row.names(elemental_properties) <- seq(1, dim(elemental_properties)[1])

All elemental properties

The list of elemental properties available from periodictable.com. As collected (may contain some duplicates, these will be removed later).

print(elemental_properties$property)
##  [1] "Name"                           "Symbol"                        
##  [3] "Atomic Number"                  "Atomic Weight"                 
##  [5] "Density"                        "Melting Point"                 
##  [7] "Boiling Point"                  "Phase"                         
##  [9] "Absolute Melting Point"         "Absolute Boiling Point"        
## [11] "Critical Pressure"              "Critical Temperature"          
## [13] "Heat of Fusion"                 "Heat of Vaporization"          
## [15] "Heat of Combustion"             "Specific Heat"                 
## [17] "Adiabatic Index"                "Neel Point"                    
## [19] "Thermal Conductivity"           "Thermal Expansion"             
## [21] "Density (Liquid)"               "Molar Volume"                  
## [23] "Brinell Hardness"               "Mohs Hardness"                 
## [25] "Vickers Hardness"               "Bulk Modulus"                  
## [27] "Shear Modulus"                  "Young Modulus"                 
## [29] "Poisson Ratio"                  "Refractive Index"              
## [31] "Speed of Sound"                 "Valence"                       
## [33] "Electronegativity"              "ElectronAffinity"              
## [35] "Ionization Energies"            "Autoignition Point"            
## [37] "Flashpoint"                     "DOT Hazard Class"              
## [39] "DOT Numbers"                    "EU Number"                     
## [41] "NFPA Fire Rating"               "NFPA Hazards"                  
## [43] "NFPA Health Rating"             "NFPA Reactivity Rating"        
## [45] "RTECS Number"                   "NFPA Label"                    
## [47] "Alternate Names"                "Names of Allotropes"           
## [49] "Block"                          "Group"                         
## [51] "Period"                         "Electron Configuration"        
## [53] "Color"                          "Discovery"                     
## [55] "Gas phase"                      "CAS Number"                    
## [57] "CID Number"                     "Gmelin Number"                 
## [59] "NSC Number"                     "Electrical Type"               
## [61] "Electrical Conductivity"        "Resistivity"                   
## [63] "Superconducting Point"          "Magnetic Type"                 
## [65] "Curie Point"                    "Mass Magnetic Susceptibility"  
## [67] "Molar Magnetic Susceptibility"  "Volume Magnetic Susceptibility"
## [69] "% in Universe"                  "% in Sun"                      
## [71] "% in Meteorites"                "% in Earth's Crust"            
## [73] "% in Oceans"                    "% in Humans"                   
## [75] "Atomic Radius"                  "Covalent Radius"               
## [77] "Van der Waals Radius"           "Crystal Structure"             
## [79] "Lattice Angles"                 "Lattice Constants"             
## [81] "Space Group Name"               "Space Group Number"            
## [83] "Half-Life"                      "Lifetime"                      
## [85] "Decay Mode"                     "Quantum Numbers"               
## [87] "Neutron Cross Section"          "Neutron Mass Absorption"       
## [89] "Known Isotopes"                 "Stable Isotopes"               
## [91] "Isotopic Abundances"

Collect the values for each property

Each property, e.g., density, will have a value for each element of the periodic table. This value is just a string, and depending on the type of the property, it may be just a number, or a quantity with a unit, or some text with various attributes.

At this stage, we don’t mind the internal structure of the value, we just want to collect it. We will collect all property values into one dataframe.

Looking slightly ahead, you will realise that the only way to allow different types (character, numeric, etc.) in different columns is if each property is mapped onto one column. Thus, we will build the dataframe with properties as columns, and elements as rows.

# get the name of all the elements (two-step process)
element_names <-
   read_html(elemental_properties$url[1]) %>%
   html_nodes("table") %>%
   extract2(1) %>%
   html_nodes("table") %>%
   extract2(8) %>%
   html_nodes("td") %>%
   html_text()
element_names <-
   element_names[
      read_html(elemental_properties$url[1]) %>%
         html_nodes("table") %>%
         extract2(1) %>%
         html_nodes("table") %>%
         extract2(8) %>%
         html_nodes("td") %>%
         html_attr("align") == "left"
      ]
# create a properly oriented elements dataframe
elements_raw <-
   data.frame(matrix(data = "",
                     ncol = dim(elemental_properties)[1],
                     nrow = length(element_names),
                     byrow = TRUE))
names(elements_raw) <- elemental_properties$sanitized
elements_raw$Name <- element_names

Some property pages (listed below) are difficult to parse (due to the way the data is presented on the webpage). For now, we just skip those pages (no big loss).

skip_properties <-
   c("Ionization_Energies",
     "NFPA_Hazards",
     "NFPA_Label",
     "Names_of_Allotropes",
     "Discovery",
     "Crystal_Structure",
     "Lattice_Angles",
     "Lattice_Constants",
     "Known_Isotopes",
     "Stable_Isotopes",
     "Isotopic_Abundances")
# drop those rows from elemental_properties
elemental_properties <-
   elemental_properties[-which(elemental_properties$sanitized %in% skip_properties), ]
# drop those columns from elements_raw
elements_raw <-
   elements_raw[, -which(names(elements_raw) %in% skip_properties)]

Ok, now we have the skeleton for a dataframe. Let’s populate it.

# re-read only if rda file does not exist (saves time when compiling)
if (!file.exists("element-data-raw.rda")) {
   for (k in 2:length(elemental_properties$property)) {
      message(paste0("Reading property page (", k-1, " of ", length(elemental_properties$property)-1, "): ", elemental_properties$property[k]))
      property <-
         read_html(elemental_properties$url[k]) %>%
         html_nodes("table") %>%
         extract2(1) %>%
         html_nodes("table") %>%
         extract2(8) %>%
         html_nodes("td") %>%
         html_text()
      property <-
         property[
            read_html(elemental_properties$url[k]) %>%
               html_nodes("table") %>%
               extract2(1) %>%
               html_nodes("table") %>%
               extract2(8) %>%
               html_nodes("td") %>%
               html_attr("align") == "left"
            ]
      # for debugging purposes
      # print(paste(elemental_properties$property[k], ":", length(property)))
      # assign to elements_raw
      elements_raw[, k] <- property
   }
   # save this dataframe to file
   # (it is not large, but re-scraping the contents takes time)
   save(elements_raw, file = "element-data-raw.rda")
} else {
   load(file = "element-data-raw.rda")
}

Elemental property data collected! Next, we need to sanitize the data. As we mentioned earlier, the values may be of several different types. The biggest job is to deal with quantities and units.

There might well be an easier way to do this, but to make the coding easy, we will create two empty dataframes (based on the existing elements_raw, with identical dimensions and column names), and place quantities (and unit-less values) into one, and units into the other.

# creating new dataframes, templated on elements_raw
# notes is not used
values <- units <- notes <-
   data.frame(matrix(nrow = dim(elements_raw)[1],
                     ncol = dim(elements_raw)[2],
                     dimnames = list(seq(1, dim(elements_raw)[1]),
                                     names(elements_raw)),
                     byrow = TRUE))
# create a work-copy of elements_raw
elements_tmp <- elements_raw

The tricky part will be to correctly determine what part of the values string is a quantity and what part is a unit. We will have to resort to regular expressions. Because of that, the following code is probably the most likely to break if anything should change at the data source.

Clean up all the value strings

We will use regular expressions to identify the different parts of each value string. See comments in the code below. Unfortunately, we found no way to clean up the value strings of each type of column (non-numeric, numbers-only, and quantity+unit) without using explicit, static assignments, as seen in the following three chunks. It is ugly, but it works, as long as no properties (i.e., columns) are added, removed, or renamed at the source.

# non-numeric columns to copy requiring only minor cleanup
# col numbers: 1  2  8 17 39 43 44 45 48 49 50 51 52 53 54 55 59 73 77 78
# 73 (Space_Group_Name, may require some fixing of notation)
cols <-
   c("Name",
     "Symbol",
     "Phase",
     "Adiabatic_Index",
     "EU_Number",
     "RTECS_Number",
     "Alternate_Names",
     "Block",
     "Group",
     "Period",
     "Electron_Configuration",
     "Color",
     "Gas_phase",
     "CAS_Number",
     "CID_Number",
     "Gmelin_Number",
     "NSC_Number",
     "Electrical_Type",
     "Magnetic_Type",
     "Space_Group_Name",
     "Decay_Mode",
     "Quantum_Numbers")
for (k in 1:length(cols)) {
   values[, which(names(values) == cols[k])] <-
      gsub("^None$", "",
           gsub("^N/A$", "",
                gsub("\\[note\\]",
                     "",
                     elements_tmp[, which(names(elements_tmp) == cols[k])])
                )
           )
   # set empty strings to NA (proper NA, not the string "NA")
   values[which(values[, which(names(values) == cols[k])] == ""),
          which(names(values) == cols[k])] <- NA
}
# numbers only cols
cols <-
   c("Atomic_Number",
     "Atomic_Weight",
     "Molar_Volume",
     "Poisson_Ratio",
     "Refractive_Index",
     "Valence",
     "Electronegativity",
     "DOT_Hazard_Class",
     "DOT_Numbers",
     "NFPA_Fire_Rating",
     "NFPA_Health_Rating",
     "NFPA_Reactivity_Rating",
     "Superconducting_Point",
     "Mass_Magnetic_Susceptibility",
     "Molar_Magnetic_Susceptibility",
     "Volume_Magnetic_Susceptibility",
     "Space_Group_Number",
     "Neutron_Cross_Section",
     "Neutron_Mass_Absorption")
for (k in 1:length(cols)) {
   values[, which(names(values) == cols[k])] <-
      as.numeric(gsub("×10", "E",
                      gsub("\\[note\\]",
                           "",
                           elements_tmp[, which(names(elements_tmp) == cols[k])])
                      )
                 )
}
# numbers and units cols
cols <-
   c("Density",
     "Melting_Point",
     "Boiling_Point",
     "Absolute_Melting_Point",
     "Absolute_Boiling_Point",
     "Critical_Pressure", # contains converted values in parentheses
     "Critical_Temperature",
     "Heat_of_Fusion",
     "Heat_of_Vaporization",
     "Heat_of_Combustion",
     "Specific_Heat", # complex unit J/(Kg K)
     "Neel_Point",
     "Thermal_Conductivity", # complex unit W/(m K)
     "Thermal_Expansion",
     "Density_Liquid",
     "Brinell_Hardness",
     "Mohs_Hardness",
     "Vickers_Hardness",
     "Bulk_Modulus",
     "Shear_Modulus",
     "Young_Modulus",
     "Speed_of_Sound",
     "ElectronAffinity",
     "Autoignition_Point",
     "Flashpoint",
     "Electrical_Conductivity",
     "Resistivity",
     "Curie_Point",
     "Percent_in_Universe",
     "Percent_in_Sun", # makes use of "None" in lieu of zero
     "Percent_in_Meteorites", # makes use of "None" in lieu of zero
     "Percent_in_Earths_Crust", # makes use of "None" in lieu of zero
     "Percent_in_Oceans", # makes use of "None" in lieu of zero
     "Percent_in_Humans", # makes use of "None" in lieu of zero
     "Atomic_Radius",
     "Covalent_Radius",
     "Van_der_Waals_Radius",
     "HalfLife", # mix between "Stable" and num + unit
     "Lifetime") # mix between "Stable" and num + unit
for (k in 1:length(cols)) {
   for (i in 1:length(elements_tmp[, which(names(elements_tmp) %in% cols[k])])) {
      # replace the string "None" with actual zero
      elements_tmp[, which(names(elements_tmp) %in% cols)][i, k] <-
         ifelse(elements_tmp[, which(names(elements_tmp) %in% cols)][i, k] == "None",
                "0",
                elements_tmp[, which(names(elements_tmp) %in% cols)][i, k])
      # replace the string "Stable" with "Inf"
      elements_tmp[, which(names(elements_tmp) %in% cols)][i, k] <-
         ifelse(elements_tmp[, which(names(elements_tmp) %in% cols)][i, k] == "Stable",
                "Inf",
                elements_tmp[, which(names(elements_tmp) %in% cols)][i, k])
      # look for numbers (also for "Inf")
      mtch <-
         regexpr(pattern = "Inf|[-×\\.0-9]+",
                 text = elements_tmp[, which(names(elements_tmp) %in% cols)][i, k])
      # assign numeric value to "values"
      values[i, which(names(values) == cols[k])] <-
         as.numeric(
            sub(
               pattern = "×10",
               replacement = "E",
               x = substr(x = elements_tmp[, which(names(elements_tmp) %in% cols)][i, k],
                          start = 1,
                          stop = attr(mtch, "match.length"))))
      # assign unit part to "units" by eliminating the numeric part
      units[i, which(names(values) == cols[k])] <-
         # remove leading or trailing spaces
         sub("^\\s+", "", sub("\\s+$", "",
                 # remove the numeric part
                 sub("Inf|[-×\\.0-9]+", "",
                     # remove any numbers+units within parentheses
                     sub("\\([.0-9]+\\s[A-Za-z]+\\)", "",
                         # remove notes
                         sub("\\[note\\]", "",
                             # remove "N/A" strings
                             sub("N/A",
                                 "",
                                 elements_tmp[, which(
                                    names(elements_tmp) %in% cols)][i, k]
                                 )
                             )
                         )
                     )
                 )
             )
   }
   # set empty strings to NA (proper NA, not the string "NA")
   values[which(values[, which(names(values) == cols[k])] == ""),
          which(names(values) == cols[k])] <- NA
}

Cleaning up the units

Now that we have separated the units and the quantities into separate strings (dataframes, actually), let’s have a look at the units, to see if there’s any fixing needed.

# Units by property (looking at the first 10)
tail(sapply(units, unique), 10)
## $Covalent_Radius
## [1] "pm" ""  
## 
## $Van_der_Waals_Radius
## [1] "pm" ""  
## 
## $Space_Group_Name
## [1] NA
## 
## $Space_Group_Number
## [1] NA
## 
## $HalfLife
## [1] ""   "y"  "h"  "d"  "m"  "ms"
## 
## $Lifetime
## [1] ""   "y"  "h"  "d"  "m"  "ms"
## 
## $Decay_Mode
## [1] NA
## 
## $Quantum_Numbers
## [1] NA
## 
## $Neutron_Cross_Section
## [1] NA
## 
## $Neutron_Mass_Absorption
## [1] NA

As you can see, some properties (for example HalfLife) use more than one unit. This is problematic, since we will only be plotting against one y-axis, not several. So we will have to convert all such occurrences to their standard units, which means we have to take the numerical conversion of the quantity into consideration as well. Let’s do it.

First, let’s see all units in the set.

# All units in the dataset
cat(paste(sort(unique(unlist(sapply(units, unique)))), collapse = "\n"))
## 
## %
## °C
## d
## g/cc
## g/cm3
## g/l
## GPa
## h
## J/(Kg K)
## K
## K-1
## kJ/mol
## KJ/mol
## m
## m Ω
## MPa
## ms
## m/s
## pm
## S/m
## W/(m K)
## y

Some are equivalent to each other, and others can be reduced to base SI units, according to the following list, which we put together manually after inspecting the output above.

## %
## GPa         => Pa
## J/(Kg K)
## K
## K-1
## KJ/mol      => kJ/mol (wrong case)
## MPa         => Pa
## S/m
## W/(m K)
## d           => s
## g/cc        => kg/m^3
## g/cm3       => kg/m^3
## g/l         => kg/m^3
## h           => s
## kJ/mol
## m
## m Ω
## m/s
## ms          => s
## pm          => m
## y           => s
## °C          => K

Someone should probably tell periodictable.com that they have a typo in some of their values with units of KJ/mol instead of kJ/mol.

Building up a match-and-replace dataframe to convert non-standard or simplifiable units to standard units. This will help simplify any visualisations as well as possible comparisons between properties.

# Build up a match-and-convert dataframe based on the list above
pcf <-
   # find any values with these units...
   data.frame(pattern = c("GPa",
                          "KJ/mol",
                          "MPa",
                          "d",
                          "g/cc",
                          "g/cm3",
                          "g/l",
                          "h",
                          "ms",
                          "pm",
                          "y",
                          "°C"),
              # ... and convert them into these units
              convert = c("Pa",
                          "kJ/mol",
                          "Pa",
                          "s",
                          "kg/m^3",
                          "kg/m^3",
                          "kg/m^3",
                          "s",
                          "s",
                          "m",
                          "s",
                          "K"),
              # using these conversion factors
              factor = c("1E9",
                         "1",
                         "1E6",
                         "86400",
                         "1E3",
                         "1E3",
                         "1",
                         "3600",
                         "1E-3",
                         "1E-12",
                         "3.154E7",
                         "+273.15"))
# Here we replace the values and the units according to pcf
for (k in 1:dim(units)[2]) {
   # if the entire column is NA, move to the next one
   if (all(is.na(units[, k]))) {next}
   for (i in 1:dim(units)[1]) {
      # jump to the next cell immediately if unit is empty
      if (units[i, k] == "") {next}
      # for each cell, compare the unit to pcf$pattern,
      # and if they match, replace it with pcf$convert
      # and apply the conversion factor on the value
      # match() returns the position of the match
      # in pcf$pattern, or NA if no match was found
      hit <- match(units[i, k], pcf$pattern)
      if (!is.na(hit)) {
         # go ahead and replace unit and convert value
         units[i, k] <- pcf$convert[hit]
         # had to find a way to handle the odd addition operation,
         # opted to do it with a string operation and this if-else
         if (substr(pcf$factor[hit], 1, 1) %in% c("+", "-")) {
            if (substr(pcf$factor[hit], 1, 1) == "-") {
               values[i, k] <-
                  values[i, k] + as.numeric(pcf$factor[hit])
            } else {
               values[i, k] <-
                  values[i, k] + as.numeric(sub("^\\+", "", pcf$factor[hit]))
            }
         } else {
            # not addition/subtraction operation
            values[i, k] <-
               values[i, k] * as.numeric(pcf$factor[hit])
         }
      }
   }
}

Visualising elemental data

First, an abridged look at the data itself.

# show the first 10 elemental property quantities/values
str(values, list.len = 10)
## 'data.frame':    118 obs. of  80 variables:
##  $ Name                          : chr  "Hydrogen" "Niobium" "Thallium" "Helium" ...
##  $ Symbol                        : chr  "H" "Nb" "Tl" "He" ...
##  $ Atomic_Number                 : num  1 41 81 2 42 82 3 43 83 4 ...
##  $ Atomic_Weight                 : num  1.01 92.91 204.38 4 95.94 ...
##  $ Density                       : num  8.99e-02 8.57e+03 1.18e+04 1.78e-01 1.03e+04 ...
##  $ Melting_Point                 : num  14 2750 577 NA 2896 ...
##  $ Boiling_Point                 : num  20.28 5017.15 1746.15 4.22 4912.15 ...
##  $ Phase                         : chr  "Gas" "Solid" "Solid" "Gas" ...
##  $ Absolute_Melting_Point        : num  14 2750 577 NA 2896 ...
##  $ Absolute_Boiling_Point        : num  20.28 5017 1746 4.22 4912 ...
##   [list output truncated]
# show the first 10 elemental units
str(units, list.len = 10)
## 'data.frame':    118 obs. of  80 variables:
##  $ Name                          : logi  NA NA NA NA NA NA ...
##  $ Symbol                        : logi  NA NA NA NA NA NA ...
##  $ Atomic_Number                 : logi  NA NA NA NA NA NA ...
##  $ Atomic_Weight                 : logi  NA NA NA NA NA NA ...
##  $ Density                       : chr  "kg/m^3" "kg/m^3" "kg/m^3" "kg/m^3" ...
##  $ Melting_Point                 : chr  "K" "K" "K" "" ...
##  $ Boiling_Point                 : chr  "K" "K" "K" "K" ...
##  $ Phase                         : logi  NA NA NA NA NA NA ...
##  $ Absolute_Melting_Point        : chr  "K" "K" "K" "" ...
##  $ Absolute_Boiling_Point        : chr  "K" "K" "K" "K" ...
##   [list output truncated]

Next, let’s try a typical plot of a property against atomic number.

Atomic weight vs Z.

Another plot of a property (density, this time) against atomic number.

Density vs Z.

Ok, it looks as one would expect. Not very exciting, though.

We are now ready to attempt to overlay elemental properties on top of the periodic table. I think this is a worthwhile enterprise, because I have so far yet to see a way to programmatically create a periodic table and overlay data visually on it.

As a long-time LaTeX user myself, the drawing of the periodic table using TikZ by Ivan Griffin was at the same time brilliant and frustrating.

Brilliant, because for the first time it gave writers (well, at least writers familiar with LaTeX and TikZ) the ability to easily create our own periodic tables, as well as to customise them. Frustrating, because there was no easy way to tie the generation of the TikZ-based periodic table into available periodic trend data. Of course, this was an inherent shortcoming of TikZ/LaTeX more than anything else.

In this work, I have used R with the ggplot2 plotting system to achieve this the visualisation of arbitrary elemental data over the canvas of the periodic table.

Final preparations before plotting

The periodic table places the elements next to each other, organised in rows (periods) and columns (groups) on a two-dimensional plot. Obviously, each element’s position on this plot is completely specified by its group and period.

We will therefore introduce new group and new period variables, specifically for the purpose of plotting. We will not use the original group and period data for two reasons: some elements were not assigned a group and period in the original dataset, and we want some flexibility to adjust the coordinates to control the aesthetics of the plot.

# rearrange values dataframe by atomic number
# simply to make assignments based on atomic number (below) possible
values <- arrange(values, Atomic_Number)
values$Graph.Period <- values$Period
units$Graph.Period <- "" # to maintain same dims as values
values$Graph.Group <- values$Group
units$Graph.Group <- "" # to maintain same dims as values
# lanthanoids 57-71: Period = 8, Group = seq(3, 17)
values$Graph.Period[seq(57,71)] <- 8.5
values$Graph.Group[seq(57,71)] <- seq(3, 17)
# actinoids 89-103: Period = 9, Group = seq(3, 17)
# increase period to increase the gap up to the transition block
values$Graph.Period[seq(89,103)] <- 9.5
values$Graph.Group[seq(89,103)] <- seq(3, 17)
# make graphical Group and Period numeric
values$Graph.Period <- as.numeric(values$Graph.Period)
values$Graph.Group <- as.numeric(values$Graph.Group)

With just a few lines, we can make a very quick-and-dirty periodic table (sans data) using ggplot2:

ggplot() +
   scale_y_reverse() +
   geom_point(data = values,
              size = 14,
              shape = 0,
              aes(y = Graph.Period,
                  x = Graph.Group)) +
   geom_text(data = values,
             aes(label = Symbol,
                 y = Graph.Period,
                 x = Graph.Group))

plot of chunk periodic-ggplot2-prototype

Plotting continuous elemental properties

Let’s use the periodic table we created with ggplot2 to plot some of the continuous variables.

ggplot() +
   geom_point(data = values,
              # size 14 fits well with fig.width = 9, fig.height = 5.25
              size = 14,
              # shape #22 allows both fill and colour to be
              # shape #15 only registers colour (is always filled)
              # shape #0 only registers colour (is always empty)
              shape = 15,
              aes(y = Graph.Period, x = Graph.Group, colour = Density)) +
   geom_text(data = values,
             colour = "white",
             size = 3,
             fontface = "bold",
             aes(label = Symbol, y = Graph.Period, x = Graph.Group)) +
   scale_x_continuous(breaks = seq(min(values$Graph.Group),
                                   max(values$Graph.Group)),
                      limits = c(min(values$Graph.Group) - 1,
                                 max(values$Graph.Group) + 1),
                      expand = c(0,0)) +
   scale_y_continuous(trans = "reverse",
                      breaks = seq(min(values$Graph.Period),
                                   max(values$Graph.Period)),
                      limits = c(max(values$Graph.Period) + 1,
                                 min(values$Graph.Period) - 1.5),
                      expand = c(0,0)) +
   # set breaks and labels in the colourbar legend
   scale_colour_continuous(breaks = c(5E3, 10E3, 15E3, 20E3),
                           labels = c("5", "10", "15", "20"),
                           # range of colour
                           low = "#56B1F7", high = "#132B43",
                           # colour if value is NA
                           na.value = "grey70") +
   # plot title (usually property and unit)
   annotate("text", x = 8, y = 0.6,
            vjust = 0,
            label = paste("Density/", "10^3*~kg~m", "^-3", sep=""),
            # parse required if using superscripts or subscripts
            parse = TRUE) +
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         plot.margin = unit(c(0, 0, -0.85, -0.85), "line"),
         axis.title = element_blank(),
         axis.text = element_blank(),
         axis.ticks = element_blank(),
         # center (roughly) over transition metal block
         legend.position = c(0.42, 0.91),
         legend.justification = c(0.5, 1),
         legend.direction = "horizontal",
         # make the legend colourbar a little longer
         legend.key.width = unit(2.5, "line"),
         legend.title = element_blank(),
         legend.background = element_rect(fill = "transparent"))

plot of chunk periodic-ggplot2-continuous

A short explanation is in order. The boxes are inherently square, but we adjusted the plot dimensions (fig.width and fig.height in knitr parlance) to make sure the final appearance of the boxes is indeed square. Colour is mapped to the numeric values of the plotted property, using the built-in ggplot2::scale_colour_continuous() function. And we positioned the required legend over the transition metal block, to conform with most other periodic tables out there. Grayed-out elements lack data for the shown property.

Here are some more periodic tables overlaid with other continuous elemental data.

plot of chunk periodic-ggplot2-melting

plot of chunk periodic-ggplot2-boiling

plot of chunk periodic-ggplot2-thermalconductivity

plot of chunk periodic-ggplot2-electricalconductivity

plot of chunk periodic-ggplot2-electronegativity

We could easily change the print size of these plots, as well as export them to most common image formats. We could also easily switch from the current knitr and markdown document system to knitr and LaTeX to take advantage of the excellent math and symbol support of LaTeX.

Plotting discrete elemental properties

ggplot() +
   geom_point(data = values,
              # size 14 fits well with fig.width = 9, fig.height = 5.25
              size = 14,
              # shape #22 allows both fill and colour to be
              # shape #15 only registers colour (is always filled)
              # shape #0 only registers colour (is always empty)
              shape = 15,
              aes(y = Graph.Period, x = Graph.Group, colour = Magnetic_Type)) +
   geom_text(data = values,
             colour = "white",
             size = 3,
             fontface = "bold",
             aes(label = Symbol, y = Graph.Period, x = Graph.Group)) +
   scale_x_continuous(breaks = seq(min(values$Graph.Group),
                                   max(values$Graph.Group)),
                      limits = c(min(values$Graph.Group) - 1,
                                 max(values$Graph.Group) + 1),
                      expand = c(0,0)) +
   scale_y_continuous(trans = "reverse",
                      breaks = seq(min(values$Graph.Period),
                                   max(values$Graph.Period)),
                      limits = c(max(values$Graph.Period) + 1,
                                 min(values$Graph.Period) - 1.5),
                      expand = c(0,0)) +
   scale_colour_discrete(na.value = "grey70") +
   # set the size of the legend boxes independent of geom_point's aes
   guides(colour = guide_legend(override.aes = list(size = 5),
                                title = "Magnetic type",
                                nrow = 2,
                                title.hjust = 0.5)) +
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         plot.margin = unit(c(0, 0, -0.85, -0.85), "line"),
         axis.ticks = element_blank(),
         axis.title = element_blank(),
         axis.text = element_blank(),
         # center (roughly) over transition metal block
         legend.position = c(0.42, 0.93),
         legend.justification = c(0.5, 1),
         legend.direction = "vertical",
         legend.key.size = unit(0.8, "line"),
         legend.title = element_text(size = 15,
                                     face = "plain"),
         legend.key = element_rect(fill = "transparent",
                                   colour = "transparent"),
         legend.background = element_rect(fill = "transparent"))

plot of chunk periodic-ggplot2-discrete

Some more with discrete variables:

plot of chunk periodic-ggplot2-phase

plot of chunk periodic-ggplot2-electricaltype

plot of chunk periodic-ggplot2-decaymode

Conclusion

We have successfully demonstrated how ggplot2 may be used to programmatically generate plots of elemental properties in the form of the typical periodic table.

To do this, we also had to scrape and collect a database of elemental property data from public webpages.

Our hope is that this will make it easier for chemists and others to generate periodic tables of whatever trend they wish to visualise.

To make it easier to repeat this code, we have included an MWE below.

Minimal working example

library(ggplot2)
library(dplyr)
library(grid)
library(XML)
library(httr)
library(rvest)
library(magrittr)
library(stringr)
library(knitr)
load(url("http://files.figshare.com/1873544/element_data.rda"))

Use this ggplot2 code to generate a periodic table visualising a continuous variable.

ggplot() +
   geom_point(data = values,
              # size 14 fits well with fig.width = 9, fig.height = 5.25
              size = 14,
              # shape #22 allows both fill and colour to be
              # shape #15 only registers colour (is always filled)
              # shape #0 only registers colour (is always empty)
              shape = 15,
              aes(y = Graph.Period, x = Graph.Group, colour = Density)) +
   geom_text(data = values,
             colour = "white",
             size = 3,
             fontface = "bold",
             aes(label = Symbol, y = Graph.Period, x = Graph.Group)) +
   scale_x_continuous(breaks = seq(min(values$Graph.Group),
                                   max(values$Graph.Group)),
                      limits = c(min(values$Graph.Group) - 1,
                                 max(values$Graph.Group) + 1),
                      expand = c(0,0)) +
   scale_y_continuous(trans = "reverse",
                      breaks = seq(min(values$Graph.Period),
                                   max(values$Graph.Period)),
                      limits = c(max(values$Graph.Period) + 1,
                                 min(values$Graph.Period) - 1.5),
                      expand = c(0,0)) +
   # set breaks and labels in the colourbar legend
   scale_colour_continuous(breaks = c(5E3, 10E3, 15E3, 20E3),
                           labels = c("5", "10", "15", "20"),
                           # range of colour
                           low = "#56B1F7", high = "#132B43",
                           # colour if value is NA
                           na.value = "grey70") +
   # plot title (usually property and unit)
   annotate("text", x = 8, y = 0.6,
            vjust = 0,
            label = paste("Density/", "10^3*~kg~m", "^-3", sep=""),
            # parse required if using superscripts or subscripts
            parse = TRUE) +
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         plot.margin = unit(c(0, 0, -0.85, -0.85), "line"),
         axis.title = element_blank(),
         axis.text = element_blank(),
         axis.ticks = element_blank(),
         # center (roughly) over transition metal block
         legend.position = c(0.42, 0.91),
         legend.justification = c(0.5, 1),
         legend.direction = "horizontal",
         # make the legend colourbar a little longer
         legend.key.width = unit(2.5, "line"),
         legend.title = element_blank(),
         legend.background = element_rect(fill = "transparent"))

Or this code to visualise a discrete variable.

ggplot() +
   geom_point(data = values,
              # size 14 fits well with fig.width = 9, fig.height = 5.25
              size = 14,
              # shape #22 allows both fill and colour to be
              # shape #15 only registers colour (is always filled)
              # shape #0 only registers colour (is always empty)
              shape = 15,
              aes(y = Graph.Period, x = Graph.Group, colour = Magnetic_Type)) +
   geom_text(data = values,
             colour = "white",
             size = 3,
             fontface = "bold",
             aes(label = Symbol, y = Graph.Period, x = Graph.Group)) +
   scale_x_continuous(breaks = seq(min(values$Graph.Group),
                                   max(values$Graph.Group)),
                      limits = c(min(values$Graph.Group) - 1,
                                 max(values$Graph.Group) + 1),
                      expand = c(0,0)) +
   scale_y_continuous(trans = "reverse",
                      breaks = seq(min(values$Graph.Period),
                                   max(values$Graph.Period)),
                      limits = c(max(values$Graph.Period) + 1,
                                 min(values$Graph.Period) - 1.5),
                      expand = c(0,0)) +
   scale_colour_discrete(na.value = "grey70") +
   # set the size of the legend boxes independent of geom_point's aes
   guides(colour = guide_legend(override.aes = list(size = 5),
                                title = "Magnetic type",
                                nrow = 2,
                                title.hjust = 0.5)) +
   theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         plot.margin = unit(c(0, 0, -0.85, -0.85), "line"),
         axis.ticks = element_blank(),
         axis.title = element_blank(),
         axis.text = element_blank(),
         # center (roughly) over transition metal block
         legend.position = c(0.42, 0.93),
         legend.justification = c(0.5, 1),
         legend.direction = "vertical",
         legend.key.size = unit(0.8, "line"),
         legend.title = element_text(size = 15,
                                     face = "plain"),
         legend.key = element_rect(fill = "transparent",
                                   colour = "transparent"),
         legend.background = element_rect(fill = "transparent"))

Appendix

Data sources

  1. Ahmed, T. (2014, November). Properties of the elements. https://doi.org/10.6084/m9.figshare.1295585

sessionInfo()

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=sv_SE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=sv_SE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=sv_SE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   grid      stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] stringr_1.1.0 magrittr_1.5  rvest_0.3.2   xml2_1.0.0    httr_1.2.1   
## [6] XML_3.98-1.5  dplyr_0.5.0   ggplot2_2.2.0 knitr_1.15.1 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8      munsell_0.4.3    colorspace_1.3-2 R6_2.2.0        
##  [5] highr_0.6        plyr_1.8.4       tools_3.3.2      gtable_0.2.0    
##  [9] DBI_0.5-1        selectr_0.3-1    lazyeval_0.2.0   assertthat_0.1  
## [13] digest_0.6.10    tibble_1.2       curl_2.3         evaluate_0.10   
## [17] labeling_0.3     stringi_1.1.2    scales_0.4.1

Todo

  • Record notes and save them along with values and units.