Heatmaps are expensive to generate, which makes them infeasible for anything approaching real-time use. Most of the cost is in the multiply operation (see the source). That operation is already written in C (it's part of PIL), so what else could we do to speed things up?
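The source isn't shown here, but the multiply step can be sketched with Pillow's `ImageChops.multiply`, which is the C-backed operation the paragraph refers to. This is a hedged illustration, not the actual implementation: the dot shape, sizes, and the `splat` helper are all made up for the example. The idea is that each data point is a grayscale "dot" multiplied onto a white canvas, so overlapping dots compound into darker regions:

```python
from PIL import Image, ImageChops, ImageDraw

def make_dot(size=32):
    """A grayscale dot: white at the edge, darkening toward the center."""
    dot = Image.new("L", (size, size), 255)
    draw = ImageDraw.Draw(dot)
    # Concentric ellipses approximate a radial falloff; later (smaller,
    # darker) ellipses overdraw the earlier ones.
    for i in range(size // 2):
        shade = int(255 * (1 - i / (size // 2)))
        draw.ellipse([i, i, size - 1 - i, size - 1 - i], fill=shade)
    return dot

def splat(canvas, dot, x, y):
    """Multiply the dot into the canvas at (x, y)."""
    w, h = dot.size
    region = canvas.crop((x, y, x + w, y + h))
    canvas.paste(ImageChops.multiply(region, dot), (x, y))

canvas = Image.new("L", (256, 256), 255)  # white = zero density
dot = make_dot()
for x, y in [(40, 40), (50, 48), (60, 60)]:  # overlapping points compound
    splat(canvas, dot, x, y)
```

Because every point triggers a crop, a multiply, and a paste, the cost scales with the number of dots times the dot area, which is why the paragraphs below look at parallelizing the work or reducing the number of dots.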
We came up with two ideas. First, we could parallelize it: split tiles up even further, generate mini-heatmaps, and then knit them back together. This is basically carrying Google's tiling strategy one step further. You'd have to distribute the work over a number of CPUs to make it worthwhile. And since points just off the edge of a tile still affect the tile itself, you end up with an overlap region where you're doing the same work twice. The smaller the tiles, the larger the overlap relative to the tile. At maximum parallelization you are doing fully twice the work.
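The overlap penalty is easy to quantify. Assuming each worker must render a halo of one dot-radius around its tile (so that points just outside still darken pixels inside), the redundant work grows quickly as tiles shrink. The numbers below are illustrative, not from the source:

```python
def overhead(tile, radius):
    """Fraction of redundant work when rendering a tile with a halo.

    A worker producing a tile**2 result must actually render a
    (tile + 2*radius)**2 area, since points up to `radius` px outside
    the tile still affect pixels inside it.
    """
    padded = (tile + 2 * radius) ** 2
    return padded / tile**2 - 1.0

# 256 px tiles with a 32 px dot radius: ~56% extra work.
print(overhead(256, 32))  # 0.5625
# Shrink tiles to 64 px and the redundant work is 3x the useful work.
print(overhead(64, 32))   # 3.0
```

This is the trade-off the paragraph describes: more parallelism means smaller tiles, and smaller tiles mean the halo dominates the tile itself.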
The other idea was to reduce the dot resolution. Perhaps multiplying 100 dots looks about the same as multiplying only 10 well-placed dots, and finding those 10 dots is probably cheaper than multiplying all 100. To make this work, you would need to calibrate your dot resolution to the zoom level. You would also need to normalize across all your data so the heatmap meant the same thing everywhere. Because of that normalization, a local change could ripple out to affect the entire dataset, which would hurt caching (our current optimization strategy). As always, optimizations are not without trade-offs.
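One cheap way to find those well-placed dots is grid binning: collapse nearby points into a single weighted representative per cell. This is our own sketch, not the source's method, and it assumes dots can carry a weight that scales their darkness; the cell size would be tuned per zoom level (coarser cells at lower zooms):

```python
from collections import defaultdict
import random

def thin_dots(points, cell):
    """Collapse nearby (x, y) points into one weighted point per grid cell.

    Returns (x, y, weight) triples, where weight is the number of
    original points the representative stands in for.
    """
    bins = defaultdict(list)
    for x, y in points:
        bins[(x // cell, y // cell)].append((x, y))
    out = []
    for members in bins.values():
        n = len(members)
        cx = sum(x for x, _ in members) / n  # centroid of the cell's points
        cy = sum(y for _, y in members) / n
        out.append((cx, cy, n))
    return out

# 100 points in a tight 50x50 cluster collapse to at most 4 weighted dots.
random.seed(1)
pts = [(random.uniform(0, 50), random.uniform(0, 50)) for _ in range(100)]
thinned = thin_dots(pts, cell=25)
print(len(thinned))  # at most 4 cells for a 50x50 extent
```

The binning pass is a single linear scan, so it is cheaper than multiplying every dot. The caching problem remains, though: if the weights are normalized globally, adding one point can change the scale everywhere.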
Wouldn't it be neat to have real-time heatmaps?