Carbon Brief: Markdown rabbit hole
I spent the week wrestling with markdown parsing, trying to keep my head above water in the world of Unified JS.
Unified JS is an impressive ecosystem of JS libraries and plugins which work together in a predictable way to provide a powerful array of tools for parsing and transforming Markdown and HTML. Unfortunately the off-the-shelf stuff wasn't quite what I needed (or rather I wasn't able to bend the off-the-shelf stuff to my needs), so I had to get my hands dirty and write code to do some transformations myself. Keeping it all in my head was a bit of a house of cards, esp. working from home with kids whose school was on strike: one interruption could erase half an hour or even an hour of mental card stacking. I got there in the end though (sometime around 7pm on Friday).
This is one of those cases where I'm sure I could have used ChatGPT, or Claude, or whatever to help out. The domain is well understood and documented, so there's a good chance an LLM would have had the answers I needed. But I feel that climbing the hill of understanding myself means I've actually got some new skills now, and have explored the design space more than I would have otherwise. I get that prompt writing and LLM wrangling is its own skill, but it's not one that I see the appeal of. Perhaps this is where I find myself slowly sidelined in the employment marketplace ¯\_(ツ)_/¯.
The reason for all these markdown shenanigans is that Google Docs has, for the last 6 months or so, provided markdown export (though unfortunately, as far as I can tell, there's no API for this yet). The writers I work with all use Google Docs to craft their articles, and for years the gaps between the word processor, the CMS, and the published web have been awkward stumbling points. We're looking to produce a series of glossaries, vaguely in the style of this Carbon Offsets one I made a couple of years back, and this type of structure-heavy article is exactly the kind of thing that can benefit from making that structure available to code. I hope this will allow things like inline definitions across the site at some point. So the plan is to make this a repeatable format, justifying the coding time spent upfront on markdown wrangling with quicker turnaround down the line.
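To make "structure available to code" a bit more concrete, here's a minimal sketch of the idea, not the actual code from this project. It assumes the exported markdown has already been parsed into an mdast-style tree (the shape remark produces), and it assumes a made-up convention where each level-2 heading is a term and the paragraphs under it are the definition. The sample tree, the function names and the heading convention are all hypothetical, and the tree walking is done by hand rather than with the ecosystem's own utilities so the example stands alone.

```javascript
// Hypothetical mdast-shaped tree with made-up sample content. In practice a
// Markdown parser such as remark would build this from the exported document.
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 2, children: [{ type: "text", value: "Additionality" }] },
    { type: "paragraph", children: [{ type: "text", value: "Whether a project would have happened anyway." }] },
    { type: "heading", depth: 2, children: [{ type: "text", value: "Offset" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "A credit for emissions " },
        { type: "emphasis", children: [{ type: "text", value: "avoided" }] },
        { type: "text", value: " elsewhere." },
      ],
    },
  ],
};

// Concatenate the text of a node and all of its descendants.
function textOf(node) {
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(textOf).join("");
}

// Assumption: a heading at `termDepth` starts a new term, and every paragraph
// up to the next such heading belongs to that term's definition.
function extractGlossary(root, termDepth = 2) {
  const entries = [];
  let current = null;
  for (const node of root.children) {
    if (node.type === "heading" && node.depth === termDepth) {
      current = { term: textOf(node), paragraphs: [] };
      entries.push(current);
    } else if (current && node.type === "paragraph") {
      current.paragraphs.push(textOf(node));
    }
  }
  return entries.map((entry) => ({
    term: entry.term,
    definition: entry.paragraphs.join("\n\n"),
  }));
}

console.log(extractGlossary(tree));
```

Once the terms live in a plain array like this, something like inline definitions elsewhere on the site becomes a lookup rather than a copy-and-paste job.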
The difficulty with any format like this is knowing whether you've captured all the requirements without adding too much flexibility (= abstraction & code = support cost). Is the design of the system now precluding certain important features in 6 months' time? Have I overcomplicated things in service of use-cases that will never exist (YAGNI)? My strategy here is to make the system allow for exceptions: essentially parallel paths of code that live near each other. If a new glossary needs a feature that isn't compatible with existing glossaries, we can add a new route, so instead of living at e.g. glossary/china it can live at glossary/enhanced/china. This does lead to a certain amount of code duplication but also, hopefully, makes refactoring the 'enhanced' code into the core format easier later on.
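As a rough sketch of what those parallel paths could look like (an illustration under assumptions, not how the site is actually wired up): the core route and any variant routes each get their own renderer, and a small resolver picks between them from the URL path. The renderer functions, the `variants` map and `resolveGlossaryRoute` are all hypothetical names.

```javascript
// Hypothetical renderers. The real ones would build a page; these just return
// a label so it's easy to see which code path was taken.
function renderCore(slug) {
  return `core glossary: ${slug}`;
}

function renderEnhanced(slug) {
  return `enhanced glossary: ${slug}`;
}

// Each variant lives beside the core path. Adding a new exception means adding
// one entry here, not another branch inside the core renderer.
const variants = new Map([["enhanced", renderEnhanced]]);

// Resolve "glossary/<slug>" or "glossary/<variant>/<slug>" to a renderer.
// Returns null for anything that isn't a known glossary route.
function resolveGlossaryRoute(path) {
  const parts = path.split("/").filter(Boolean);
  if (parts[0] !== "glossary") return null;
  if (parts.length === 2) {
    return { render: renderCore, slug: parts[1] };
  }
  if (parts.length === 3 && variants.has(parts[1])) {
    return { render: variants.get(parts[1]), slug: parts[2] };
  }
  return null;
}

for (const path of ["glossary/china", "glossary/enhanced/china"]) {
  const route = resolveGlossaryRoute(path);
  console.log(path, "->", route.render(route.slug));
}
```

The duplication is deliberate: `renderEnhanced` can start life as a copy of `renderCore` and diverge freely, and folding it back into the core later means deleting one map entry.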
The other thing: this week was the first week of the Data Viz Society's mentoring scheme. I'm being a mentor (at 47 I could hardly be a mentee, even though I feel I'm only just learning, and not at all a real grown-up with a 25+ year long career). More on this in the next few weeks.