
Biomedical and Electrical Engineer with interests in information theory, evolution, genetics, abstract mathematics, microbiology, big history, Indieweb, and the entertainment industry including: finance, distribution, representation







@drmichaellevin Important to your definition collection are not only the people defining (and their fields), but the year as well. Cybernetics, to a great extent, also suffered from what Claude Shannon wrote about information theory in "The Bandwagon".


Important words, particularly when they come from a professor of cognitive science and linguistics:


@khurtwilliams The important question for my morning tests is: did you get one or two webmentions to that post from me this morning? Somehow I'm not seeing either display. If you only got one, which was it?


Replied to a post on :

I think the biggest hurdle to wider adoption is simply the fact that there are so many individual plugins and this takes up far more mental space for the user than it should.

So, another option which I'd like to suggest and advocate for is to **bundle all the plugins into one big single plugin** instead of sub-plugins. You could almost sell it as "the part of WordPress core you always wished you had" and now you can with two clicks: download and activate. (That's got to sound good, even to your mom who's still figuring out how to upload her profile picture.)

From the user's standpoint, this wouldn't require much more than some slightly better UI/descriptions. (And I'm more than happy to write them.) This could consist of a single main settings page with on/off toggles for Post Kinds, Syndication Links, Webactions(?), Micropub, Hum, and IndieAuth. A tabbed interface on this same page with tabs primarily for settings/set up and usage description for all of these (except for maybe Webactions?) would complete the cycle.

Most of the sub plugins don't have many (if any) actual settings other than installation/activation right now, which is creating the biggest part of the (mostly mental) hurdle for everyday users. I think the average WordPress user probably wouldn't know that they had Webmentions, Semantic Linkbacks, or Webmentions for Threaded Comments installed because they "just work" and require no configuration, yet are far prettier than any of their predecessors. Why make them carry the mental overhead of what they are and what they do aside from a few subtle lines that they exist? In fact, treating them as if they should have been in WordPress core all along may actually make it more likely to happen.

Additionally things like Micropub which would only have an on/off toggle wouldn't be noticed or used by many unless they had interest in alternate posting interfaces. (And based on the popularity and growth in Twitter interfaces/apps a few years ago, I'm surprised WordPress didn't do this, though perhaps it's part of the reason they're adding a more robust API over the past few years?)

It also means having slightly better or more intuitive explanations of what the individual pieces are (mostly Syndication Links and Post Kinds) near their on/off toggles to better explain what is being activated. Much of this can be taken from the current interface or from the WordPress wiki pages, or added on the individual tabs for the settings for these portions.

I would suggest that doing this would not only make it easier on end users who then wouldn't have to spend the mental space and capacity to keep track of what 10 individual plugins are doing (in addition to the space these take up on the plugin admin page and the fact that, once activated, they disappear from the IndieWeb plugin's list of plugins), but that it would actually dramatically increase the uptake of the single big plugin and its functionality and simultaneously the use of all the sub plugins individually.

I'd argue that bigger plugins like Yoast SEO or something like PressForward have huge numbers of options and settings and could have been done as separate sub-plugins (the way IndieWeb Plugin is now), but that their value proposition is such that it's well worth spending the handful of minutes reading through the interface to know what the options are, what they mean, and using them to their fullest advantage. I think that Indieweb (and the suite of tools offered on WordPress) is at this tipping point in terms of offering must-have functionality for the future web and that having a simpler integrated set up would help to push it over the edge to broader adoption. (Certainly simpler than the old WP-Social, which users have indicated that they thought was far simpler than Indieweb plugin, though Social actually required more set up.) Additionally all of the seemingly dense text in the "getting started" page could be moved into smaller bite-sized chunks relating to individual portions on a tabbed interface, for example.

I come to this in part after having spent part of the weekend revamping a bit of the documentation on getting started with WordPress and setting up Bridgy with WordPress. A lot of the description is "get this plugin, install, and activate" which takes up a big piece of mental space for the user as well--particularly for the Gen2, 3, 4 users who want a plug and play experience. Far better would be to install one plugin and then modify these handful of settings.

If this is done, then the only remaining (small) hurdle is making sure that the underpinning rel-me data input required of the user is done in a more explicit manner, because this seems to be the linchpin holding a lot of it together and making it work. As a result, I'd recommend unbundling the reliance on the User Profile page and putting all the rel-me URL fields on their own page in the settings interface for such a single plugin (with all important links just underneath them to encourage users to visit, for example, Twitter's edit profile page to include their website URL in either the website field or in the bio field to enable the bi-directional rel-me).
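To make the bi-directional rel-me idea a bit more concrete, here is a minimal Python sketch of the check: each page needs a `rel="me"` link pointing at the other. Everything in it is an assumption for illustration only: the class and function names are hypothetical, the URLs are placeholders, and the HTML is inlined rather than fetched. It is not how Bridgy or any plugin actually implements the check, and real consumers also deal with fetching, redirects, and URL normalization.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collect href values from <a> and <link> tags whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, e.g. "me nofollow".
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rels and href:
            # Crude normalization (an assumption): ignore a trailing slash.
            self.links.add(href.rstrip("/"))


def rel_me_links(html):
    """Return the set of rel="me" targets found in an HTML string."""
    parser = RelMeParser()
    parser.feed(html)
    return parser.links


def is_bidirectional(site_url, site_html, profile_url, profile_html):
    """True when each page has a rel="me" link pointing at the other."""
    site_url = site_url.rstrip("/")
    profile_url = profile_url.rstrip("/")
    return (profile_url in rel_me_links(site_html)
            and site_url in rel_me_links(profile_html))


# Hypothetical pages, inlined so the sketch runs without network access.
SITE = '<a rel="me" href="https://twitter.com/example">Twitter</a>'
PROFILE = '<a rel="me nofollow" href="https://example.com/">example.com</a>'

print(is_bidirectional("https://example.com", SITE,
                       "https://twitter.com/example", PROFILE))
```

A settings page with dedicated rel-me fields could run a check along these lines and show the user which of their profiles still lack the link back.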

Finally a "Tools" tab in the settings page could provide pointer links to additional things like the H-Card Widget or the IndieWeb-PressThis bookmarklets.

When all of this is done, it could also be a simple matter of adding another settings tab to the interface to set up Bridgy with one-button links from the plugin to the set up pages for each of the main backfeed services there. Bridgy then automatically checks for the webmention endpoint and checks for rel-me to do its work, so that part is already automated and relatively user friendly too.

The one caveat I can imagine is that making it all into one big plugin potentially means some small added overhead in development with maintaining some of them as stand alone pieces. I'd recommend keeping them as standalone objects as I honestly believe that pieces like webmentions and micropub are so fundamental to the web, that they should be part of WordPress core and maintaining them separately could help speed this along.


Malcolm, thanks for the reminder of those social RSS feeds. I'd seen them a few months back and meant to set up a reader specifically for consuming content from Twitter, Facebook, et al. I'll have to delve back into it in the new year as having a better reader is becoming more important/interesting to me.


@Pocket #FeatureRequest: Ability to immediately mark as read with bookmarklet? #UX #UI

In some odd sense, I often use Pocket as a personal linkblog of sorts, so nearly everything I'd either like to read later or have read online ends up there. My Pocket bookmarklet, by far, has the senior-most pride of place in my browser toolbar.

While the vast majority of things I bookmark are done so to read later, every now and then, something is so compelling that I read it immediately. But I still want to save it to Pocket (almost as a bookmark functionality, particularly as it was so compelling). After using the bookmarklet to add tags and save it to my account, I'd love it if the following popup banner had a "Mark article as read" button in addition to the "View List". 

If you want to go a step further, you could add the ability to favorite that particular link as well. This would allow people to immediately highlight important things, and would give extra functionality to those using services like IFTTT or Zapier to pipe Pocket content via the API to other platforms based on tags, favorites, or marked-as-read items.

Internally, with this type of functionality enabled, Pocket could also use the incoming data as an additional indicator that some articles are even more compelling than others, instead of simply relying on bulk numbers of articles bookmarked to tag them on the system as "Best of" or similar trending types.


Replied to a post on :

@snarfed, no worries; I know you're certainly doing better/more important work in other quarters than this. Besides, I don't just hold my breath waiting for things like this, sometimes I turn blue and fall off my chair... :) maybe I'll finish dusting off some of my rusty coding skills and get around to this myself after a few other itches.

Some of these types of niceties are going to be more "necessary" as IndieWeb Generation 2 starts onboarding heavily...


17w5131: Statistical & Computational Challenges in Large Scale Molecular Biology Workshop @BIRS_Math 3/2017 #ITBio

Arriving in Banff, Alberta Sunday, March 26 and departing Friday March 31, 2017


  • Barbara Engelhardt (Princeton University)
  • Anna Goldenberg (University of Toronto)
  • Manolis Kellis (Massachusetts Institute of Technology)
  • Jacob Laurent (Centre national de la recherche scientifique)
  • Jeff Leek (Johns Hopkins University)
  • Stephen Montgomery (Stanford University)


Over the past few years, an increasing number of large scale data sets have been made available in molecular biology. GTEx, for example, produced more than 18,000 RNA-Seq assays for multiple tissues in 900 individuals, Mindact generated gene expression data from about 7000 breast tumors in a single study, and 23andMe claims to have sequenced about 900,000 genomes. This growth in the available genomic data is expected to increase our capacity to identify cancer subtypes, regulatory genes, SNPs associated with phenotypes of interest, and biomarkers for many human traits. It also suggests exploring more complex feature representations when analyzing these datasets.

However, increasing the number of samples and features leads to a set of **interrelated statistical and computational problems**. Accordingly, the objectives of our workshop will be to:

  • Systematically identify the statistical and computational problems arising during the analysis of large scale data in molecular biology;
  • Bring together experts in computational biology, molecular biology, computer science, and statistics to propose innovative solutions to these problems, by leveraging recent advances in each of these fields.

Relevance, importance and timeliness

A number of studies generating high throughput molecular data for a large number of biological samples have been completed over the past five years. **Our workshop is important because the availability of these datasets holds great promise in terms of health improvement and understanding of molecular biology**. First of all, if exploited correctly, larger sample sizes should improve our ability to predict phenotypes of interest from molecular data. This entails very important applications such as improving the survival of cancer patients by better predicting which treatment they should receive, or decreasing bacterial resistance by predicting which antibiotic is efficient against a new strain. Correctly exploiting large scale datasets should also allow us to **better identify genetic and epigenetic determinants of these phenotypes, yielding a better understanding of human diseases and potentially guiding the development of new treatments and prevention policies**. In particular, more samples should allow the detection of less frequent variants in the human genome, or more complex features involving several modalities (copy number, expression, methylation, etc.) associated with diseases. Finally, larger sample sizes should help with essential unsupervised tasks such as the **inference of regulation networks, or the identification of cancer subtypes**.

Our workshop is relevant because **all of these promises are conditioned on our solving of new statistical and computational challenges**. First (Challenge 1), we need to build new feature spaces and estimators whose complexity is adapted to these larger sample sizes, which involves designing novel, potentially more complex descriptors of the samples but still controlling the bias/variance trade-off. Second (Challenge 2), we need to build models which correctly integrate different modalities, such as copy number variation and gene expression. Third (Challenge 3), larger scale studies are more prone to unwanted variations, because they typically involve different labs and technical changes which can affect the measurements and become confounders in retrospective analyses. Similar or worse problems arise when trying to combine several existing datasets. We need methods which take this unwanted variation into account. Finally (Challenge 4), we need new algorithms that make existing statistical tools scalable to the new sample sizes, and make estimation over the larger and more complex features of Challenge 1 tractable.

We also believe our workshop is very timely because **some of these statistical and computational challenges are starting to be addressed in other application fields** of statistics. It is crucial to recognize that the orders of magnitude are still very different in molecular biology and other data science application fields because of the cost and complexity of the data generation process: current large scale high throughput sequencing data sets typically contain a few thousand samples but millions of features, while computer vision, web, or astronomy datasets can involve billions or trillions of samples and relatively fewer features. A first consequence is that not all recent developments in machine learning are immediately transferable to computational biology. For example, so-called deep learning methods have gained a lot of popularity and now represent the state of the art in computer vision but may not be the most appropriate tool for prediction of cancer outcome from molecular data. However, the fact that other fields already have much larger sample sizes also means that they had to develop efficient and scalable algorithms for basic tasks like feature selection, classification or clustering. **These recent developments are a great source of inspiration for computational biology, where large scale computation is still an emerging challenge**.

We believe **having a small scale workshop involving international experts in machine learning, statistics, computational biology and molecular biology is of utmost importance** for three main reasons. The first reason is that the technical advances we are referring to are very recent, often unknown to computational biologists and involve paradigms such as online optimization, accelerated gradient methods and network flow optimization, with which they are sometimes unfamiliar. The second reason is that it is not always obvious to non-statisticians which novel methods are appropriate given the current n/p regime. Conversely, the third reason is that statisticians do not know what the recent challenges are in molecular biology. Having them work on abstract versions of the problems is often not satisfactory as it is necessary to be aware of technical realities and of the underlying biology of the problem to come up with useful solutions.


17w5104: Mathematical Approaches to Evolutionary Trees and Networks Workshop @BIRS_math 2/12/17 #ITBio

Arriving in Banff, Alberta Sunday, February 12 and departing Friday February 17, 2017



  • Leonid Chindelevitch (Simon Fraser University)
  • Caroline Colijn (Imperial College London)
  • Amaury Lambert (University Pierre and Marie Curie, Paris)
  • Marta Luksza (Institute of Advanced Study, Princeton University)
  • Vincent Moulton (University of East Anglia)
  • Tandy Warnow (University of Illinois)


The objectives of the workshop are to bring mathematicians working in three key areas together to make progress in these problems. We will also invite several biologists who are keen to engage with mathematicians on the challenges posed by new data on evolutionary processes. Key challenges in the field at the moment are focused around the following emerging inter-related areas, each of which is raising mathematically interesting problems:

1. Inference with evolutionary trees and networks: Ultimately it is necessary not just to obtain evolutionary trees from data using standard methods, but to infer aspects of an underlying biological process. This requires understanding the likelihood of an evolutionary tree or network, or at least some of its informative features, using some stochastic process as the underlying ecological model. In principle, this approach allows simultaneous inference of both evolutionary trees and parameters of the ecological model. Coalescent theory has made considerable progress, for example, in obtaining tree likelihoods for sparsely sampled populations with geographical structure or with known past demographics (see, for just one example, [5]). In some simplified cases, epidemiological inference methods can estimate transmission trees [2], branching rates through time [5] and other aspects of epidemic spread [7]. However, none of these approaches is currently applicable if there is non-tree-like evolution, or where datasets are large. Furthermore, the range of models for which we can write down a tree likelihood is very limited. This raises nice new problems in probability, statistical inference and ecological modelling. Recently, more general processes (e.g. Lambda-coalescents, which allow multiple rather than strictly pairwise coalescent events) are beginning to be used to model populations with large offspring variance, or even to model selection in a non-parametric fashion [3]. This is potentially a powerful tool particularly for bacteria, which may acquire resistance to antibiotics and spread rapidly as a consequence, yielding both highly variable effective offspring numbers and a need to model selection carefully.

2. Understanding spaces of evolutionary trees: There are a large number of possible labelled, rooted binary trees for a given set of $n$ tips (i.e. for a given set of sequence data): $(2n-3)!! = (2n-3)(2n-5)\cdots(3)(1)$. This works out to about $10^{184}$ trees on 100 tips; in contrast, current datasets for evolving bacteria contain thousands of tips. Not even the tools of Bayesian inference, the natural approach in such situations, can systematically explore spaces this big. This motivates the development of mathematical approaches for the exploration of tree space. These include new approaches to continuous tree spaces, including those from tropical geometry [8], and the use of tree metrics [1]. These in turn can lead to tools for averaging trees, and for navigating tree space in efficient ways [6] -- with profound applications in statistical inference from sequence data. Generalizing metrics to the case of evolutionary networks (for example tree-based networks) is another natural and important question.

3. Summarising trees and networks using combinatorial tools: Uncovering shape features, spectral features and other ways to describe trees using quantities that are mathematically tractable will be of considerable interest [4]. As one example, where likelihoods are truly intractable, rapid tools for likelihood-free inference can be used to infer evolutionary processes from sequence data, but only where there are informative ways to summarize key features of the data. Trees are natural combinatorial structures with connections to data; for example, a binary tree is a sequence of partitions of the set of tips (sequences in a dataset), where each partition is one block smaller than the previous one, moving back through time from the partition with each tip on its own to the partition with all tips in one block as we move from the tips of the tree to the root. If the tree is not binary (ie it allows multifurcations), more than two blocks can combine at a branching event. Because of the natural link to partitions, the study of tree shapes links to the enumeration of partitions and to lattice path combinatorics. These in turn allow the characterization and enumeration of possible tree shapes. Meanwhile the study of motifs in other biological networks has been fruitful, and could be extended to tree and evolutionary network shapes. Trees and evolutionary networks are of course also graphs (with an added time dimension); the tools of algebraic graph theory are now finding application in this area of mathematical biology.
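Two of the points in the list above lend themselves to a quick numerical check: the $(2n-3)!!$ count of labelled, rooted binary trees from item 2, and the view from item 3 of a binary tree as a sequence of partitions that loses one block per branching event. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the tree is encoded as a hand-written list of merges (pairs of tips whose blocks join, ordered from the tips toward the root), which is a convenience chosen here rather than anything specified in the workshop description.

```python
def count_rooted_binary_trees(n):
    """Number of labelled, rooted binary trees on n tips: (2n-3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # odd factors 3, 5, ..., 2n-3
        count *= k
    return count


def partition_sequence(tips, merges):
    """Partitions of the tips, from 'each tip alone' back to 'one block'.

    merges lists, from the tips toward the root, one pair of tips per
    branching event; the blocks containing those two tips are joined.
    """
    blocks = [frozenset([t]) for t in tips]
    sequence = [set(blocks)]
    for a, b in merges:
        block_a = next(blk for blk in blocks if a in blk)
        block_b = next(blk for blk in blocks if b in blk)
        if block_a == block_b:
            raise ValueError("tips are already in the same block")
        blocks = [blk for blk in blocks if blk not in (block_a, block_b)]
        blocks.append(block_a | block_b)
        sequence.append(set(blocks))
    return sequence


# Small cases of the count: 3 tips give 3 trees, 4 tips give 15.
print(count_rooted_binary_trees(3), count_rooted_binary_trees(4))

# For 100 tips the count is far too long to read, so print its digit length.
print(len(str(count_rooted_binary_trees(100))))

# A four-tip tree ((a,b),(c,d)) written as merges going back in time.
for partition in partition_sequence("abcd", [("a", "b"), ("c", "d"), ("a", "c")]):
    print(len(partition), sorted("".join(sorted(blk)) for blk in partition))
```

Each partition printed in the last loop has exactly one block fewer than the one before it, which is the property that ties tree shapes to the enumeration of partitions. A tree with multifurcations would need a merge step that joins more than two blocks at once, which this sketch does not handle.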

The community's response to the idea for this workshop has been very positive. A * beside a participant's name indicates that they have expressed enthusiasm for the workshop, and plan to attend. 

References

[1] Louis J Billera, Susan P Holmes, and Karen Vogtmann. Geometry of the space of phylogenetic trees. Adv. Appl. Math., 27(4):733–767, November 2001.
[2] Xavier Didelot, Jennifer Gardy, and Caroline Colijn. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol. Biol. Evol., 31(7):1869–1879, July 2014.
[3] Alison M Etheridge, Robert C Griffiths, and Jesse E Taylor. A coalescent dual process in a Moran model with genic selection, and the lambda coalescent limit. Theor. Popul. Biol., 78(2):77–92, September 2010.
[4] Fanny Gascuel, Regis Ferriere, Robin Aguilee, and Amaury Lambert. How ecology and landscape dynamics shape phylogenetic trees. Syst. Biol., 64(4):590–607, July 2015.
[5] Amaury Lambert and Tanja Stadler. Birth–death models and coalescent point processes: The shape and probability of reconstructed phylogenies. Theor. Popul. Biol., 90(0):113–128, December 2013.
[6] Tom M W Nye. An algorithm for constructing principal geodesics in phylogenetic treespace. IEEE/ACM Trans. Comput. Biol. Bioinform., 11(2):304–315, March 2014.
[7] David A Rasmussen, Erik M Volz, and Katia Koelle. Phylodynamic inference for structured epidemiological models. PLoS Comput. Biol., 10(4):e1003570, April 2014.
[8] David Speyer and Bernd Sturmfels. The tropical Grassmannian. Adv. Geom., 4(3):389–411, 2004.


Kyle Mahan's post: I was not expecting to have a long conversation with my therapist today about why it's important to me to have my own website


[1406.1391] "A Vehicle of Symbols and Nothing More." George Romanes, Theory of Mind, Information, and Samuel Butler

Donald Forsdyke indicates [at] that "...Polymath Adami has "looked at so many fields of science" and has correctly indicated the underlying importance of information theory, to which he has made important contributions. However, perhaps because the interview was concerned with the origin of life and was edited and condensed, many readers may get the impression that IT is only a few decades old. However, information ideas in biology can be traced back to at least 19th century sources. In the 1870s Ewald Hering in Prague and Samuel Butler in London laid the foundations. Butler's work was later taken up by Richard Semon in Munich, whose writings inspired the young Erwin Schrodinger in the early decades of the 20th century. The emergence of his text – "What is Life" – from Dublin in the 1940s, inspired those who gave us DNA structure and the associated information concepts in "the classic period" of molecular biology. For more please see: Forsdyke, D. R. (2015) History of Psychiatry 26 (3), 270-287."

Let's look into this and see where, if at all, there may be a bridge over to either Claude Shannon or Boolean Algebra.


@fadesingh, While I do agree in part with your comment that the technical side of the math could have been explored a bit more, to me, part of what this story is highlighting (and to great benefit as well) is the "other" personal side of mathematics which is rarely seen by the broader public. Most of math and math history is full of seemingly brilliant solo (read: lone wolf) researchers developing mathematics de novo in dark, smoke and caffeine-filled rooms and emerging with iron clad proofs. This particular story shows the growing, more collaborative and friendly side of math in addition to the years of slow development of friendships and theory which have culminated in something potentially interesting. I would suggest that in this case you not take them too hard to task on the subject, particularly as the article was being written contemporaneously with the publication of the journal article itself. (The article here was published *just* in advance of the arXiv post, such that it didn't even include the link to the paper itself, though it was added the following day.)

I love that Quanta is continually exploring the areas of math and science at the depth and level to which they've become accustomed. They're filling a very important gap in science communication between technical journal articles and somewhat sophisticated outlets like National Geographic, Scientific American, and Wired (in which they are also distributed, yet they are still an editorial cut above by comparison). [Cross reference: Evolution of a Scientific Journal Article Title (from Nature to TMZ)] I'm sure that C.P. Snow himself would praise them for helping to close the gap between the "Two Cultures."


@timwindsor @marcoarment Sorry, mobile wasn't showing original tweet & so much of what @jeffjarvis says is important, one has to ask.