The sunny side of (contributing to) open-source software

September 18, 2022

No worries, there is no dark side of open-source software that I am aware of. I just ate some sunny-side-up eggs recently. Oh, and I generated the above graph/logo with R code. Can you guess what music album it is based on?

Open-source is a wondrous place

Want to do an exotic statistical analysis or use a funky web framework? Guaranteed you will find it online, free to use and to modify. You really have just about anything at your fingertips on open-source code platforms such as GitHub or GitLab. Big companies, Microsoft, Meta, Google, you name it, have all open-sourced significant portions of how they do things. I find this amazing. The alternatives, software developed internally and hence duplicated many times over, or expensive so-so software from vendors that is inaccessible to a wider audience, are much worse. In a big joint effort, developers collaborate on software integrated in applications we use on a daily basis, continuously improving it issue by issue. We can all be a part of that.

From Google Summer of Code 2017 to other contributions

I had a light introduction to external software libraries on top of a base package (such as Python’s or R’s standard installation) when I used the pandas library to wrangle data for my Master’s thesis. I hardly knew what I was doing though. The first actual confrontation with open-source software was in the summer of 2017, during a Google Summer of Code (GSoC) project.

GSoC is a program by Google that hands out stipends to students willing to create or improve open-source software. Several programming language foundations open a call for projects. Each project has one or more mentors. I applied to create a new R package combining textual sentiment and time series analysis (the application is still online) with my to-be PhD advisors and received the stipend. Hence, I spent most of that summer coding in R (to the despair of my then girlfriend), and writing lots of documentation. By the start of autumn, I put the first version of the package online, downloadable by the entire world! I kept improving the package during several iterations, and the current version can be found in its full glory on GitHub here.

Another project I did was taking over the initial release of a predictive modeling Python library. The library is from my former employer, where I kickstarted my data science career. I was in charge of coordinating the development of a couple of new features, and am currently still maintaining the package. To be honest, both the R package and the Python library I am maintaining require very little effort. Most of the development work is done. Improvements can always be made, but if they are not essential, I prefer to concentrate on other things.

Whenever I have something data-oriented that I want to bring to life, I now head to the DataWanderers collective that I share with my academic research friends. It’s not very active yet, but it fits most of the (experimental) analyses we do. It might slowly grow into a place that hosts a myriad of cool stuff done by a bunch of data-savvy people. Others are also welcome. The code snippet to produce the logo on top of this post is on there by the way. 😊

A pledge to be more committed to open-source

Overall, I have been doing my fair share of open-source work, but the status of consistent contributor is still far away. To advance, my next move should be to fix open issues in popular repositories. This way, I will learn how to integrate into large structures and how to collaborate with top developers. Proposing and mentoring a brand-new GSoC project would also be a step forward. Let the existence of this marvelous open-source community encourage me to be more committed to real open-source software projects, even when it’s not easy to find the time.