Back when I was studying CS at university, we had assignments to code various algorithms. When the assignments were due, we were supposed to run the code on the laboratory systems and get the output checked by the professors. I liked coding and tinkering with the algorithms; some of my classmates didn’t. They used to copy the code from my system onto a USB stick, run it on their own system, and be done with it. I didn’t judge them, but I was tired of giving each one a copy and sometimes also explaining how to run that piece of code. For one such assignment, I uploaded the code to GitHub and shared the link with anybody who asked. Unknowingly, I had made my first open source contribution.
A lot of people have similar stories, and for most, GitHub is synonymous with open source. Another thing used interchangeably with GitHub is git. Even though these are three different things, a lot of developers, including me, have never contributed to open source or used git without GitHub. So I was surprised when I learned that famous open source projects like Linux are not developed on GitHub even though they use git. Also, git was around before GitHub, so there must be other, more native ways in git to get things done. I wondered how it worked but didn’t do anything about it until recently, when I decided to experience a git-only development workflow.
We use version control systems broadly for two use cases: contributing to a repository where we have write access, and contributing to a public repository where we don’t. To collaborate with developers who can be given access, GitHub or other platforms are not necessary. Anybody can run a Git server to host repositories. Developers can clone, push to, and pull from these remote repositories once access is granted to their SSH identity.
git clone git@<host>:<repo>.git
This is similar to how we upload our SSH keys to GitHub or other platforms so that we can clone repositories over SSH.
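To get a feel for this without setting up a real server, here is a minimal sketch in which a bare repository on the local filesystem stands in for the Git server. The directory names, the commit, and the identity are made up for illustration; over SSH only the clone URL would change.

```shell
# Sketch: a local bare repository plays the role of the Git server.
# Over SSH the clone URL would be git@<host>:<repo>.git instead of a path.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# "Server" side: a bare repository has no working tree, only git's data.
git init --quiet --bare server.git

# "Developer" side: clone the (still empty) repository, commit, and push.
git clone --quiet server.git dev 2>/dev/null
cd dev
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"   # placeholder identity
echo "hello" > README.txt
git add README.txt
git commit --quiet -m "Add README"
git push --quiet origin HEAD

# The commit is now stored in the shared repository.
pushed_subject=$(git --git-dir="$workdir/server.git" log -1 --all --format=%s)
echo "$pushed_subject"
```

Anyone else with access to server.git could now clone it and pull this commit.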
One thing missing from the method above is the ability to create pull requests and get them reviewed and merged into the code. This becomes a big issue when developers don’t have write access to the repository, as is the case in open source development. The usual open source flow is to fork a repository, push the changes to your fork, and create a pull request against the original repository, which the maintainers can review and merge. Pull requests are not a feature of git, but of platforms like GitHub. Collaborative development in plain git has the following steps:
- Clone the original repository
- Create a patch
- Send the patch to the maintainer
- The maintainer applies the patch and pushes the code
We introduced two new processes in this workflow – creating a patch and applying a patch. Let’s discuss them further.
A patch is a diff of code plus some metadata about it. A diff is the actual code change, which can be viewed with the git diff command. The concept comes from Unix and is much older than git: Unix systems ship a built-in tool, diff, which compares two files. Its output can be saved to a file and processed by another utility, patch, which updates one of the files to make its content identical to the other.
touch original.txt new.txt
diff -u original.txt new.txt > changes.patch
--- original.txt 2020-10-18 23:43:32.000000000 +0200
If we want to update the original file with the contents of the new one, we need to “patch” it.
patch original.txt changes.patch
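Here is a self-contained sketch of the same round trip that can be pasted into a terminal. The file contents are made up for illustration, and everything happens in a temporary directory.

```shell
# Sketch: create two files, record their difference, and patch the first one.
set -e
workdir=$(mktemp -d)
cd "$workdir"

printf 'apple\nbanana\n' > original.txt
printf 'apple\nbanana\ncherry\n' > new.txt

# diff exits with status 1 when the files differ, so guard it under `set -e`.
diff -u original.txt new.txt > changes.patch || true
cat changes.patch

# Apply the recorded changes to original.txt.
patch original.txt changes.patch

# cmp is silent and exits 0 when the two files are identical.
cmp original.txt new.txt && echo "files match"
```

The patch file only contains the added line plus a little context, which is what makes it cheap to send around compared to the whole file.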
NOTE: Use the git-before-github repository to practice this example and the ones that follow.
Now that we understand the concepts of diff and patch, we can proceed with using them in git. To generate a patch file in git, we use the git format-patch command.
git format-patch HEAD~1..HEAD
This will create a patch file for the latest commit. Tinker with the argument to produce patch files for other commits too. By default, each file represents a single commit and the file name is of the format 0001-<commit message>.patch. Interestingly, the file is in Unix mbox format, which means it has some email-like metadata (from, subject, etc.) followed by the patch data.
From 619814a3ac21012580e725136864e8397c14e20b Mon Sep 17 00:00:00 2001
This is the patch file I created to add my name to the contributors’ list of the git-before-github repository as an exercise for myself.
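If you don’t have a repository at hand, the following sketch builds a throwaway one and generates a patch from its latest commit. The repository, the names in the file, and the identity are all placeholders, not the actual git-before-github contents.

```shell
# Sketch: generate a patch file from the latest commit of a scratch repository.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet repo
cd repo
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"   # placeholder identity

echo "Alice" > CONTRIBUTORS.txt
git add CONTRIBUTORS.txt
git commit --quiet -m "Add contributors list"

echo "Bob" >> CONTRIBUTORS.txt
git commit --quiet -am "Add Bob to contributors"

# One patch file per commit in the range; git prints the generated file name.
patch_file=$(git format-patch HEAD~1..HEAD)
echo "$patch_file"

# The email-like headers at the top of the mbox-formatted file.
grep -E '^(From|Date|Subject):' "$patch_file"
```

The file name is derived from the commit subject, and the Subject header carries the same subject prefixed with [PATCH].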
We have a patch file which we can send to the maintainers so that they can apply the patch. We can either use traditional ways to send the patch or use git itself, but more on it later. In this section, we will explore the process to apply a patch.
git apply <patch file> applies the changes described in the patch file to the working tree but does not commit them. They show up as local modifications that we can stage and commit ourselves. This is not ideal: even though we get the changes, the commit metadata, such as the author name and the message, is lost. A better way to apply patches is git am (short for “apply mailbox”).
git am --signoff <patch file>
--signoff appends a “Signed-off-by” line to the commit message to indicate who applied the patch. Below is the signed-off and applied commit for the patch we generated as an exercise in the last section.
Author: Tarun Batra <email@example.com>
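The whole exchange can be tried locally. The sketch below assumes two throwaway repositories, one for a maintainer and one for a contributor, with placeholder identities; the outbox directory simply stands in for whatever channel carries the patch.

```shell
# Sketch: a contributor creates a patch and a maintainer applies it with git am.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Maintainer's repository with one initial commit.
git init --quiet upstream
cd upstream
git config user.name "Maintainer"                 # placeholder identity
git config user.email "maintainer@example.com"    # placeholder identity
echo "Alice" > CONTRIBUTORS.txt
git add CONTRIBUTORS.txt
git commit --quiet -m "Add contributors list"
cd ..

# Contributor clones, commits, and writes the patch into an outbox directory.
git clone --quiet upstream contributor
cd contributor
git config user.name "Contributor"                # placeholder identity
git config user.email "contributor@example.com"   # placeholder identity
echo "Bob" >> CONTRIBUTORS.txt
git commit --quiet -am "Add Bob to contributors"
git format-patch --quiet -o "$workdir/outbox" HEAD~1..HEAD
cd ..

# Maintainer checks that the patch applies cleanly, then applies it as a commit.
cd upstream
git apply --check "$workdir"/outbox/*.patch
git am --quiet --signoff "$workdir"/outbox/*.patch

# Author comes from the patch; committer and sign-off come from the maintainer.
applied_author=$(git log -1 --format=%an)
applied_committer=$(git log -1 --format=%cn)
git log -1 --format='author: %an, committer: %cn'
git log -1 --format=%B
```

Unlike git apply, the resulting commit keeps the contributor as author and the original commit message, with the maintainer’s Signed-off-by line appended.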
Now we can explore how to send a patch file to a maintainer in the “git” way. There are two ways that I know of –
git imap-send <patch file>
It will upload the email to an IMAP folder from where it can be sent. To use it, we need to add the IMAP server details to our ~/.gitconfig file. I use Gmail, so my IMAP details look like:
[imap]
folder = "[Gmail]/Drafts"
host = imaps://imap.gmail.com
user = firstname.lastname@example.org
pass = yourpassword
port = 993
Gmail allows its IMAP server to be used if the Less secure app access setting is turned on.
git send-email --to=<email> <patch file>
It will use an SMTP server to send email. The corresponding config to use this command is:
[sendemail]
smtpEncryption = tls
smtpServer = <host>
smtpUser = <user>
smtpServerPort = 587
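Instead of editing the config file by hand, the same keys can be set with git config. This is a sketch that writes them into a scratch repository’s local config; the host and user values are hypothetical placeholders, and adding --global would write to ~/.gitconfig instead.

```shell
# Sketch: set the send-email options with git config in a scratch repository.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet repo
cd repo

git config sendemail.smtpEncryption tls
git config sendemail.smtpServer smtp.example.com   # hypothetical host
git config sendemail.smtpUser dev@example.com      # hypothetical user
git config sendemail.smtpServerPort 587

# List what was stored under the sendemail section.
git config --get-regexp '^sendemail\.'
```

Keeping the settings in one repository’s local config can be handy when different projects expect patches to be sent from different addresses.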
Both of these methods send one email per patch file. By default, the subject of the email is of the format [PATCH] <commit message>.
NOTE: Sender email addresses can be spoofed quite easily and this raises the question of authentication and authorization when submitting patches. I tried creating patches of PGP signed git commits but the patch doesn’t retain the PGP signature. We could encrypt the email itself using PGP but that’s a lot of extra work for both the contributor and the maintainer.
So what can we do with this newly acquired knowledge?
- Git patches often float around in the Linux development mailing list and it gives me satisfaction that I understand how they work a little better now than before.
- When platforms like GitHub and GitLab have an outage, work in most development teams stalls. Now I know a way to get my code reviewed even in these situations. (Okay, that might be a stretch)
- EDIT: Around the time this article was published, youtube-dl was taken down by GitHub due to a DMCA request. The whole episode brings to light that GitHub, being a platform, is subject to censorship, unlike git, which is distributed.
You can practice for yourself by adding your name to the CONTRIBUTORS.txt file of the git-before-github repository, committing it, and then sending me the patch. I would readily apply it. 😀