The Shape of Innovation: Mathematical Properties of the New as Drawn from the Adjacent Possible

Recently, I came across an intriguing working paper. The goal of the paper is to offer a mathematical formula that describes the emergence of empirical observations of innovation in the world—i.e., how likely innovations are to be observed and how innovations make the observation of further innovations possible. As the authors write, they model how an innovation changes the shape of the “adjacent possible,” faciliting the emergence/discovery/observation of new innovations.

Mathematically we consider an ordered sequence S, constructed by picking elements (or balls) from a reservoir (or urn), U, initially containing \(N_0\) distinct elements. Both the reservoir and the sequence increase their size according to the following procedure. At each time step: - (i) an element is randomly extracted from U with uniform probability and added to S; - (ii) the extracted element is put back into U together with \(rho\) copies of it; - (iii) if the extracted element has never been used before in S (it is a new element in this respect), then ν + 1 different brand new distinct elements are added to U.

It turns out that the above process can accurately describe innovations such as “the edit events of Wikipedia pages, the emergence of tags in social annotation systems, the sequence of words in texts, and listening to new songs in on-line music catalogs.”1

I thought it would be fun to visualize this process.


Code

MAKE_NEW_FIGURE <- TRUE

if (MAKE_NEW_FIGURE) {
  library(ggplot2)
  library(magrittr)
  library(dplyr)
  
  set.seed(10204)
  # make a df representing the urn, U
  df <- data.frame(Item = 1,
                   Count = 1)
  # make a df representing the ordered sequence, S, of elements drawn from the urn, U
  df.selection <- data.frame(Time = 0,
                             Selection = NA)
  
  # if element, k, drawn, the increase in n_k for drawing same ball again
  rho <- 4
  # if element, k, drawn is an innovation, increase in "new balls (innovations) - 1"
  v <- 3
  # starting number of innovations
  innovations <- 0
  
  # help make an animated .gif
  jpeg("/tmp/foo%02d.jpg")
  # mark the progress
  p <- progress_estimated(n = 10000)
  # count the time elapsed
  start_time <- Sys.time()
  
  # start the for loop
  for(t in 1:230) {
    # first time period, you must select the only element in the urn, U
    if (t == 1) {
    selection <- 1
    } else {
      # subsequent time periods, you select an element weighted by count of elements in the urn, U
      selection <- sample(x = df$Item, size = 1, prob = df$Count)
    }
    # if this is an innovation
    if (!selection %in% df.selection$Selection) {
      # count the innovations so we can plot and display the percent of draws that are innovations
      innovations <- innovations + 1
      # add new elements to the urn, U---i.e., expand the "adjacent possible"
      df <- rbind(df,
                  data.frame(Item = seq(max(df$Item) + 1, 
                                        max(df$Item) + v + 1, 
                                        1),
                             Count = 1))
    }
    # replace element, k, into the urn and add rho more copies of element, k
    df$Count[df$Item == selection] <- df$Count[df$Item == selection] + rho
    
    # start the ordered sequence, S, at t = 1
    if (t == 1) {
      df.selection <- data.frame(Time = max(df.selection$Time) + 1,
                                     Selection = selection)
    } else {
      # append element, k, to the ordered sequence, S
      df.selection <- rbind(df.selection,
                          data.frame(Time = max(df.selection$Time) + 1,
                                     Selection = selection))
    }
    
    # what percent of all elements drawn have been previously unobserved elements
    unique.selections.percent.point <- round(length(unique(df.selection$Selection)) / nrow(df.selection), 3) * 100
    # minimum and maximum values of selected elements help to control master plot (according to width)
    min.selection <- min(df.selection$Selection)
    max.selection <- max(df.selection$Selection)
    # most commonly observed draw helps to control master plot (according to height)
    max.common.selection <- max(df$Count)
    
    # show plots frequently for low numbers but less frequently an t increases
    # if (t %% 10^(floor(log10(t)) + 1 - 1) == 0) {
    if (t %%1 == 0) {
      gg <- df.selection %>%
        ggplot() +
        geom_dotplot(aes(x = Selection, 
                         fill = factor(Selection)),
                     binwidth = 1,
                     # adjust dotsize by tallest column
                     dotsize = (25 / max.common.selection) * 
                       # adjust dotsize by difference between leftmost and rightmost columns
                       (max.selection / 8)) +
        geom_text(aes(x = 1, 
                      y = 1), 
                  label = paste0("Innovations: ", 
                                 format(unique.selections.percent.point, nsmall = 1), 
                                 "% of total"), 
                  size = 5, 
                  hjust = -0.05) +
        geom_text(aes(x = max.selection, 
                      y = 1), 
                  label = paste0("N: ", 
                                 t), 
                  size = 5, 
                  hjust = 2) +
        # make bars to represent proportion of innovations to all drawn elements
        geom_segment(aes(x = min.selection, 
                         xend = min.selection + 
                           (unique.selections.percent.point / 100) * 
                           (max.selection - min.selection), 
                         y = 0.9, 
                         yend = 0.9), 
                     size = 5, 
                     color = "#4caf50") +
        geom_segment(aes(x =  min.selection + 
                           (unique.selections.percent.point / 100) * 
                           (max.selection - min.selection), 
                         xend = max.selection, 
                         y = 0.9, 
                         yend = 0.9), 
                     size = 5, 
                     color = "#2b94c3") +
        theme_classic() + 
        theme(axis.text = element_blank()) +
        guides(fill = FALSE)
    
      print(gg)
    }
    p$pause(0.01)$tick()$print()
  }
  end_time <- Sys.time()
  end_time - start_time
  dev.off()
  
  system("ffmpeg -y -r 8 -i /tmp/foo%02d.jpg -qscale 2 featured.gif")
  system("rm /tmp/foo*.jpg")
}

Edit this page

Avatar
Joseph de la Torre Dwyer
Researcher

My research interests include distributive justice; the principles of responsibility, desert, and control; and reproducible research with R.

Related