What year is it? What happened? Hello?
What have I done in 2021? Here is a brief outline of the projects I've worked on, the things I've listened to, and some good old-fashioned stats about what I've done over the year.
Here is a non-exhaustive list of websites that I've worked on this year.
nlp-otherwise, my GitHub Pages site, which uses fastpages to build a static site from Jupyter notebooks. I mostly use this to look at COVID-19 data, using GitHub Actions to update the data every day by rerunning the notebooks on GitHub.
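For the curious, that kind of daily refresh can be done with a scheduled GitHub Actions workflow. Here's a sketch of what one can look like - the file paths, action versions and notebook directory are illustrative, not my exact setup:

```yaml
# .github/workflows/update-data.yml (illustrative)
name: Daily data refresh
on:
  schedule:
    - cron: "0 6 * * *"   # once a day at 06:00 UTC
  workflow_dispatch:       # allow manual runs too
jobs:
  rerun-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - run: pip install jupyter nbconvert
      # Re-execute each notebook in place so the site rebuild picks up fresh data
      - run: jupyter nbconvert --execute --to notebook --inplace _notebooks/*.ipynb
      - run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git commit -am "Daily data refresh" || echo "No changes"
          git push
```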
weatherotherwise.xyz is a Python website built with FastAPI, using Jinja templates for the frontend. It's basically a wrapper around a weather API, so you can check the current weather anywhere in the world.
dataotherwise.xyz This was my first website, written entirely in R using the Shiny framework. It's again mostly COVID-19 charts, but there are also some other posts from when I was learning how to combine R, HTML and LaTeX. I use a DigitalOcean server to host it, and it even runs its own instance of RStudio, which I occasionally use when I need a remote server to run R code (very rarely these days).
fastbooks.herokuapp.com This was a quick website I made to list some books that people had recommended to me. It was all done with FastAPI, again using only Jinja templates for the frontend.
tybed.com is a website where you paste in a web link; it visits that page, scrapes the content, runs it through a summarisation model and outputs the summary. This was my first serious attempt at making a full-stack website. I made a quick prototype with FastAPI serving both backend and frontend, then spent several weeks integrating a React frontend to make it look nicer. Sadly I lost steam and stopped short of successfully deploying the frontend and backend separately. My main problem was not finding good resources on how to deploy a monolithic app: a FastAPI backend accepting requests via a Redis queue, saving the result to a Postgres database, then returning the entry ID to a React app which queries the database, all running in Docker containers. After trying to untangle the monolith into separate repos/deployments, I eventually moved on to other projects. Now that I know much more about React and different deployment methods, I will definitely be returning to this in 2022!
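For anyone in a similar bind: one way to keep those pieces together locally while still leaving each one separately buildable and deployable is a docker-compose file along these lines. All service names, images and credentials below are made up for illustration, not my actual config:

```yaml
# docker-compose.yml - a sketch of the four-service layout
version: "3.8"
services:
  api:                # FastAPI backend, enqueues scrape/summarise jobs
    build: ./backend
    environment:
      DATABASE_URL: postgresql://app:secret@db:5432/summaries
      REDIS_URL: redis://queue:6379/0
    depends_on: [db, queue]
    ports: ["8000:8000"]
  worker:             # pulls jobs off the Redis queue, writes results to Postgres
    build: ./backend
    command: python worker.py
    depends_on: [queue, db]
  queue:
    image: redis:6
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: summaries
  frontend:           # React app, polls the API by entry ID
    build: ./frontend
    ports: ["3000:3000"]
```

Each service having its own build context is what makes it possible to later peel one off into its own repo and deployment without touching the others.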
FastMenu (no link, because it's not open to the public) - My co-working office has a chef who comes in twice a week to cook for us. Someone used to go around twice a week collecting orders for the next food day, so eventually I decided to help them out by making a website where people in the office can register their interest in the next meal. They can see the menu on the website, then place an order for as many people as they want (people sometimes bring clients along as guests). The backend is written with FastAPI and SQLModel, and the frontend is a Next.js app.
- TractorSheeps, a game for the iPad/iPhone written in Swift using SpriteKit, where you drive a tractor and need to collect sheep and cows. I played around with other frameworks for making the game, including Kivy (a Python framework for writing native apps for iOS and Android) and React Native. I was very pleased with how the animated cow sprites turned out. While I didn't quite manage to finish it enough to put it on the App Store, I reused the artwork in many of my other projects.
- DyfniauAngau is a dungeon crawler written in Rust, made while following along with the Hands-on Rust book. This was probably my favourite project of the year - spending a couple of hours every evening learning a new language was surprisingly fun! I reused some of the artwork from TractorSheeps, turning it into a classic 'drive a tractor around hell while being hunted by demonic cows and sheep' game. At some point I plan on pairing Rust with WebAssembly to release the game into the wild.
I used Gooey to give a GUI to a CLI application I made to generate invoices. The GUI lets me enter information about the company I'm invoicing and my rate, and choose the output format. If I choose `.tex`, for example, it uses a Jinja template to write the output to a `.tex` file, then runs pdflatex to compile it to a PDF. Alternatively, I can build the PDF directly with the FPDF library, but that doesn't allow for editing after the fact.
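The `.tex` route boils down to filling a template and handing the result to pdflatex. Here's a minimal sketch, using the stdlib's `string.Template` as a stand-in for Jinja - the field names (client, rate, hours) are made up for illustration:

```python
from string import Template

# A tiny LaTeX invoice template; $client, $rate etc. are the fill-in slots.
TEX_TEMPLATE = Template(r"""
\documentclass{article}
\begin{document}
Invoice for $client \\
$hours hours at \pounds$rate/hour: total \pounds$total
\end{document}
""")

def render_invoice(client: str, rate: float, hours: float) -> str:
    """Fill the template; the returned string would then be written
    to a .tex file and compiled with pdflatex."""
    return TEX_TEMPLATE.substitute(
        client=client,
        hours=hours,
        rate=f"{rate:.2f}",
        total=f"{rate * hours:.2f}",
    )

tex = render_invoice("Acme Ltd", rate=50.0, hours=8)
```

The advantage over building the PDF directly is exactly the one mentioned above: the intermediate `.tex` file can be tweaked by hand before compiling.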
I used spaCy and fast.ai to build a Welsh language model, then used that to train a sentiment analysis model for Welsh-language movie reviews. The language model part, trained on Welsh Wikipedia pages, worked quite well, but I struggled to find a sufficiently good dataset for the sentiment analysis part. I want to return to this project at some point, to either hand-craft a dataset or find a different task to fine-tune the language model on.
I made a simple data-labelling app which I use locally to label data. It does a similar job to Explosion.ai's Prodigy, but is much simpler (I've not been able to justify paying for Prodigy yet). It's not pretty, but it does the job when I want to label a few hundred examples of text.
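The core of a labelling tool like that is tiny. A minimal sketch, with a callable standing in for the UI (the function names, the JSONL output format, and the pos/neg labels are my own choices for illustration, not the app's actual design):

```python
import json

def label_examples(texts, get_label, out_path="labels.jsonl"):
    """Minimal labelling loop: show each text to `get_label` (which stands
    in for the UI - input() would do for a terminal version), then append
    one JSON record per example to a JSONL file."""
    labelled = []
    with open(out_path, "w") as f:
        for text in texts:
            record = {"text": text, "label": get_label(text)}
            f.write(json.dumps(record) + "\n")
            labelled.append(record)
    return labelled

# Example run with a trivial rule standing in for a human labeller
examples = label_examples(
    ["great film", "awful plot"],
    get_label=lambda t: "pos" if "great" in t else "neg",
)
```

JSONL output is handy here because most NLP tooling (spaCy, Prodigy itself) can consume it directly.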
Before 2021, I only coded in Python and R, but this year I've begun branching out into other languages (Rust, Swift and JavaScript among them).
I could probably list every episode from all of these podcasts because I enjoyed almost all of them, but I'll list some that stood out to me (in no particular order).
- Chris Padwick: An interview with the Director of Computer Vision Machine Learning at Blue River Technology, where they use computer vision to help farmers identify crops and weeds, so herbicide is sprayed only where it's needed. A really cool application of computer vision that can save farmers a lot of money - and presumably it's a pretty good thing to reduce the amount of herbicide we're dumping onto crops!
- Kathryn Hume: An interview with the Head of Borealis AI (Royal Bank of Canada's ML research institute). It was really cool to hear about all the different topics and challenges that come up in financial ML; what stood out to me was the text-to-SQL models they briefly discuss. Borealis made a pretty successful model called Turing, which until quite recently was state-of-the-art at converting text to SQL queries for large systems of databases (not just single tables).
- Dominik Moritz: An interview with one of the co-authors of Vega-Lite, a data visualisation library. This was my first introduction to using non-Python libraries to make interesting data visualisations - because of it, I ended up using Altair, a Python wrapper around Vega and Vega-Lite, on my GitHub Pages site.
- Chris Albon: An interview with the Director of ML at Wikimedia, in which they talk about all the different kinds of NLP models that Wikimedia develops to support Wikipedia, with a particular focus on the infrastructure built to serve all of those models. Really cool to hear about something that many people probably don't even realise is happening!
- Emily M. Bender: This was a great episode with Professor of Linguistics Emily M. Bender, and I've written about this episode before in this post. They talk about a wide range of topics in NLP, particularly the recent growth in the size of language models being produced. Probably my favourite episode of the year!
- Pete Warden: An interview with the Technical Lead of the TensorFlow Micro team, about the really interesting problems that arise when developing ML models for tiny devices - a Raspberry Pi, or even smaller chips inside phones. For example, if you tried to run a full-size speech recognition model (for something like Siri) on a normal chip, it would just drain your phone battery! But running a compressed, lightweight and efficient model on a low-power chip in a separate process saves a lot of power and lets speech recognition stay in 'always on' mode. I always try to opt for smaller models, using the optimised and quantized ONNX format when feasible. It can produce huge speedups and size reductions at the cost of a sometimes tiny accuracy loss, so it was nice to hear about the work being done in this area!
Chai Time Data Science
- Andrada Olteanu: A great interview with data scientist Andrada Olteanu, talking about her journey of learning data science via Kaggle, and how she learned how to use D3-js to make really interactive data visualisations. This episode was my main inspiration for finally learning how to use D3!
- Chris Olah: A cool interview about trying to understand what goes on inside neural network models. They talk about some of the techniques he uses to understand what each component of these models is learning. For example, in standard computer vision models it's known that the first few layers learn to detect basic structural things like edges, corners, shapes or simple patterns, but as you dig deeper into the layers, it's quite hard to know what is being learned. This can often lead to models incorrectly learning things like 'I detect a ball on the floor, so there must be a dog in the picture' if the model is trained on a dataset full of dogs with toys!
- Luisa Rodriguez: I've listened to a lot of podcasts and books on existential risk, and this was a fresh and interesting interview about what humanity could actually do if a global catastrophe (like a nuclear war) happened. A pretty optimistic view on a fairly depressing topic!
Machine Learning Street Talk
- Quantum Natural Language Processing: A super interesting (albeit long) interview with Professor Bob Coecke, a physicist who focuses on Structure, Logic and Category Theory. He is well known for his work in compositional distributional models of natural language meaning, applying the theory of systems in Quantum Mechanics to word meanings in natural language.
Python Bytes + Talk Python to Me
- Astronomy with Dr. Becky: A cool interview with an astrophysicist who uses Python to explore galaxies and black holes. Neat to hear about the use of Python in exploring space!
- Simon Willison: An interview with the creator of Datasette and co-creator of Django, about some really cool tools for creating personal analytics from SQLite files (such as your Twitter history). I've used Datasette a few times to explore datasets, and I always keep an eye on the ecosystem being built around it.
- Will McGugan: An interview with open-source developer Will McGugan, creator of Rich and Textual, two really cool Python libraries for making nice terminal-based UIs. Great to hear about projects like this, and the openness with which they are being developed is nice to see.
- Sebastián Ramírez: Not a single episode, but Sebastián's projects end up on Python Bytes and Talk Python to Me a lot. His main projects are FastAPI (which I use all the time) and SQLModel.
Some stats on the commands I used in the terminal this year!
```
> zsh_stats
 1  495  29.4643%   git
 2  294  17.5%      ls
 3  180  10.7143%   cd
 4  135  8.03571%   python
 5  114  6.78571%   cargo
 6   76  4.52381%   npm
 7   35  2.08333%   ruby
 8   28  1.66667%   rbenv
 9   21  1.25%      pwd
10   21  1.25%      npx
11   20  1.19048%   subl
12   19  1.13095%   rails
13   14  0.833333%  cat
14   11  0.654762%  code
15    9  0.535714%  vim
16    9  0.535714%  touch
17    8  0.47619%   source
18    8  0.47619%   rm
19    7  0.416667%  curl
20    7  0.416667%  chmod
```
Since I use `git` for all of my projects, that's no surprise. Just like everyone, I always forget what directory I'm in, so `ls` is also no surprise. The fact that `cargo` (part of the Rust toolchain) is catching up to `python` was a nice surprise - but then again, you need `cargo` to do anything with Rust, while I often run `python` only once per session and rely on hot-reloading to watch for changes - or, more commonly, just launch a Jupyter notebook session.
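Incidentally, what zsh_stats reports is just a frequency count over the first word of each history line. A rough stdlib sketch of the same idea, assuming a plain one-command-per-line history file (real zsh history in extended format would need its timestamp prefix stripped first):

```python
from collections import Counter

def command_stats(history_lines, top=20):
    """Count the leading command of each history line and return
    (rank, count, percentage, command) tuples, zsh_stats-style."""
    commands = Counter(
        line.split()[0] for line in history_lines if line.strip()
    )
    total = sum(commands.values())
    return [
        (rank, count, 100 * count / total, cmd)
        for rank, (cmd, count) in enumerate(commands.most_common(top), start=1)
    ]

# Tiny example history standing in for ~/.zsh_history
history = ["git status", "git push", "ls -la", "python app.py"]
stats = command_stats(history)
```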
According to GitHub, I made 650 commits this year - that's almost two per day!
Podcasts and Audiobooks
I use Podcast Addict to listen to most of my podcasts, and apparently I've listened to 511 episodes in 2021 (12 days and 23 hours!). I also use Spotify and YouTube for some podcasts, so that's a lot of time! For audiobooks I use Audible, which tells me I've listened to 22 days' worth of audiobooks since I signed up (around the end of 2019).
Since skiing is a big part of my life during the winter, I managed to get out onto the slopes 31 times in 2021. Not a bad effort!
I initially thought 2021 was a very slow year for work, but having written down some of the things I worked on, it wasn't so bad in the end. It's been a mentally draining couple of years for everyone, so I'm just happy to be healthy and busy working on things I enjoy. I have many projects on my wish list for 2022, so I hope it continues like that.
Happy new year! Blwyddyn newydd dda! Guten Rutsch und frohes neues Jahr!