Palingenesis

Having been reminded of the birthday paradox recently, I realized that I had never seen it inverted in a cynical way: what are the odds that you were born on the same day that some other famous person died? As it turns out, the odds are pretty good!
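A quick back-of-the-envelope check of that claim. Assume k notable people died in the year you were born, and (a simplifying assumption) that their death dates are independent and spread uniformly over 365 days. The chance that at least one of them died on your birthday is then 1 - (364/365)^k. The k values below are illustrative and are not counts from the dataset.

```typescript
// Probability that at least one of `k` deaths lands on one specific day,
// assuming each death date is independent and uniform over `daysInYear` days.
function chanceSomeoneDiedOnYourBirthday(k: number, daysInYear = 365): number {
  if (!Number.isInteger(k) || k < 0) {
    throw new RangeError("k must be a non-negative integer");
  }
  // Complement rule: first compute the probability that none of the k deaths
  // fall on your birthday, then subtract it from 1.
  const noneOnThatDay = Math.pow((daysInYear - 1) / daysInYear, k);
  return 1 - noneOnThatDay;
}

// Illustrative values of k only (not taken from the Wikidata dump).
for (const k of [10, 100, 1000]) {
  console.log(k, chanceSomeoneDiedOnYourBirthday(k).toFixed(3));
}
```

The probability climbs quickly. With k = 1000 it is already about 94%, so you only need a modest number of notable deaths per year before a match becomes likely.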

I decided there ought to be a way to visualize this, so I made Palingenesis, named after the Greek term for rebirth.

The Data

Our target was to collect the name, description, and birth and death dates of ~every person.

I started with the Wikidata Query Service, but its query limits made collecting the entire dataset infeasible, even when paginating the results.
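A minimal sketch of the kind of paginated query described above. It assumes the public endpoint at https://query.wikidata.org/sparql and Wikidata's standard identifiers (P31 "instance of", Q5 "human", P569 "date of birth", P570 "date of death"); the function names and the User-Agent string are hypothetical. Each page still asks the service to match and sort the full result set before skipping OFFSET rows, so deep pages are generally no cheaper than the first one, which is why pagination alone doesn't get around the limits.

```typescript
// Hypothetical sketch: page through people with birth and death dates on the
// public Wikidata Query Service using LIMIT/OFFSET pagination.
const WDQS_ENDPOINT = "https://query.wikidata.org/sparql";

// Builds one page of a SPARQL query for humans (wd:Q5) that have a date of
// birth (P569) and a date of death (P570). The label service fills in the
// English label and description variables.
function buildPeopleQuery(limit: number, offset: number): string {
  return `
SELECT ?person ?personLabel ?personDescription ?birth ?death WHERE {
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?birth;
          wdt:P570 ?death.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?person
LIMIT ${limit}
OFFSET ${offset}`;
}

// Encodes the query as a GET URL requesting JSON results.
function buildPageUrl(limit: number, offset: number): string {
  const params = new URLSearchParams({
    query: buildPeopleQuery(limit, offset),
    format: "json",
  });
  return `${WDQS_ENDPOINT}?${params.toString()}`;
}

interface SparqlBinding {
  type: string;
  value: string;
}

interface SparqlResponse {
  results: { bindings: Record<string, SparqlBinding>[] };
}

// Fetches one page of results. WDQS asks clients to identify themselves
// with a descriptive User-Agent.
async function fetchPage(limit: number, offset: number): Promise<Record<string, SparqlBinding>[]> {
  const res = await fetch(buildPageUrl(limit, offset), {
    headers: {
      Accept: "application/sparql-results+json",
      "User-Agent": "palingenesis-sketch/0.1 (example)",
    },
  });
  if (!res.ok) throw new Error(`WDQS returned HTTP ${res.status}`);
  const body = (await res.json()) as SparqlResponse;
  return body.results.bindings;
}
```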

Next, I downloaded the entire ~100GB English Wikipedia dump, decompressing and parsing it as an in-memory stream to pull out the relevant data. Decompression alone took an hour or two on my new Apple M-series machine (bravo, Apple Silicon), and piping the stream into a parser slowed throughput even further.

I ended up returning to an old friend I had forgotten about: WDumper. Instead of stressing my local machine, I could write a specification and lean on the generosity of the Wikimedia Foundation’s Toolforge to run the dump filtering on their servers. A few hours of peaceful waiting later, I had a ~4GB compressed N-Triples (.nt) file containing only the relevant fields, which I could then download and parse locally.

Since the remaining processing was so thin, and I’d otherwise be working entirely in React and TypeScript, this was a rare chance to write a “data pipeline” in TypeScript too, rather than reaching for Python as I usually do. Plus, n3.js is a mature library that would make parsing the .nt file a breeze.
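A minimal sketch of the step that turns parsed triples into one record per person. It doesn't import n3.js; instead it declares structural types matching the .value field that n3's term objects expose, so quads from n3 could be passed straight in. The predicate IRIs are assumptions based on Wikidata's standard RDF vocabulary (rdfs:label, schema:description, and the direct wdt:P569 and wdt:P570 properties), and the sketch assumes the export was already limited to English; a real WDumper export may be shaped differently.

```typescript
// Structural types mirroring the `.value` fields on n3.js quads and terms.
interface Term {
  value: string;
}

interface Triple {
  subject: Term;
  predicate: Term;
  object: Term;
}

interface PersonRecord {
  id: string;
  name?: string;
  description?: string;
  birth?: string;
  death?: string;
}

type Field = "name" | "description" | "birth" | "death";

// Assumed mapping from Wikidata RDF predicates to record fields.
const FIELD_BY_PREDICATE = new Map<string, Field>([
  ["http://www.w3.org/2000/01/rdf-schema#label", "name"],
  ["http://schema.org/description", "description"],
  ["http://www.wikidata.org/prop/direct/P569", "birth"],
  ["http://www.wikidata.org/prop/direct/P570", "death"],
]);

// Groups triples by subject into one record per entity. Predicates outside
// the mapping are skipped; if a field appears twice, the last value wins.
function foldTriples(triples: Iterable<Triple>): Map<string, PersonRecord> {
  const people = new Map<string, PersonRecord>();
  for (const { subject, predicate, object } of triples) {
    const field = FIELD_BY_PREDICATE.get(predicate.value);
    if (field === undefined) continue;
    let person = people.get(subject.value);
    if (!person) {
      person = { id: subject.value };
      people.set(subject.value, person);
    }
    person[field] = object.value;
  }
  return people;
}
```

For a small file, n3's new Parser({ format: "N-Triples" }).parse(text) returns an array of quads synchronously. For a multi-gigabyte dump, piping a file stream into n3's StreamParser and applying the same per-triple logic to each emitted quad avoids loading the whole file at once.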

The Site

The site is a simple Next.js App Router project. I hand-rolled the core schema and data-fetching logic; the rest was written by a closely instructed Claude. I am still amazed by the speed of iteration with these modern tools, and by the quality of the output given competent instructions. Across different projects, I have found that AI is particularly good at hacking together “views” in software of all kinds, and this project was no exception.

After getting the site functional, I noticed a suspicious number of first-century figures being both born and dying on January 1st. Wikidata stores each date with a precision field, which I had ignored: a date known only to the year is exported as January 1st of that year. Handling this property was trivial, as the data was already included in my earlier dump. I just added a field to the schema and seeder, updated the matching algorithm to compare precision as well as the birth and death dates, and made the date formatting conditional on precision.
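A minimal sketch of precision-aware matching and formatting, assuming dates arrive as Wikidata RDF timestamps such as "1952-03-11T00:00:00Z" alongside Wikidata's precision codes (9 = year, 10 = month, 11 = day). The type and function names are hypothetical, and BCE dates are not given any special treatment here.

```typescript
// Wikidata time precision codes used below (subset).
const PRECISION_YEAR = 9;
const PRECISION_MONTH = 10;
const PRECISION_DAY = 11;

interface WikidataDate {
  value: string; // e.g. "1952-03-11T00:00:00Z"
  precision: number; // Wikidata precision code
}

const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Two dates match only if both the stored timestamp and its precision agree,
// so a year-precision "0050-01-01" no longer matches a day-precision January 1st.
function datesMatch(a: WikidataDate, b: WikidataDate): boolean {
  return a.value === b.value && a.precision === b.precision;
}

// Formats a date only as precisely as Wikidata claims to know it.
function formatDate({ value, precision }: WikidataDate): string {
  const m = /^([+-]?\d+)-(\d{2})-(\d{2})/.exec(value);
  if (!m) return value; // Unrecognized format: show it unchanged.
  const year = Number(m[1]); // Negative (BCE) years are printed as-is.
  const month = MONTHS[Number(m[2]) - 1];
  const day = Number(m[3]);
  if (precision >= PRECISION_DAY) return `${month} ${day}, ${year}`;
  if (precision === PRECISION_MONTH) return `${month} ${year}`;
  if (precision === PRECISION_YEAR) return `${year}`;
  // Decade, century, and coarser precisions: show an approximate year.
  return `c. ${year}`;
}
```

A stricter variant could require day precision on both sides before counting a match. Comparing precision directly, as described above, still lets two year-precision dates pair up, but formatDate then shows only the year instead of a misleading January 1st.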

Deployment

I chose Cloudflare Workers for its generous free tier, deploying the Next.js app via OpenNext. Seeding D1 was less smooth than I’d hoped: wrangler d1 execute is happiest with smaller SQL files, so I wrote a shell script that splits the dump into chunks and applies them one at a time.
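The original is a shell script. To keep every example here in one language, the following is a TypeScript sketch of the same chunking idea. It assumes, hypothetically, that the dump holds exactly one complete SQL statement per line, as a generated file of INSERT statements usually would. The file naming and chunk size are illustrative.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Splits a SQL dump into chunks of at most `maxStatements` statements.
// Assumes one complete statement per line; blank lines and "--" comment
// lines are dropped.
function chunkSqlDump(sql: string, maxStatements: number): string[] {
  if (!Number.isInteger(maxStatements) || maxStatements < 1) {
    throw new RangeError("maxStatements must be a positive integer");
  }
  const statements = sql
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("--"));
  const chunks: string[] = [];
  for (let i = 0; i < statements.length; i += maxStatements) {
    chunks.push(statements.slice(i, i + maxStatements).join("\n") + "\n");
  }
  return chunks;
}

// Writes chunks to seed-0000.sql, seed-0001.sql, ... inside `outDir`
// and returns the file paths in application order.
function writeChunks(chunks: string[], outDir: string): string[] {
  mkdirSync(outDir, { recursive: true });
  return chunks.map((chunk, i) => {
    const path = join(outDir, `seed-${String(i).padStart(4, "0")}.sql`);
    writeFileSync(path, chunk);
    return path;
  });
}
```

Each file can then be applied in order with a command along the lines of wrangler d1 execute <database> --remote --file=<path>. Stopping at the first failure makes a partial seed easy to resume from the failed chunk.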

With that, the site is live!