{ speaking mind }
Author: Paramvir Singh Karwal
Published: 2021-08-03 04:58:15.0
Git is one of the most capable version control systems (VCS) available. It gives you a great deal of control over how files are stored and versioned, without your having to worry about their integrity. One of the capabilities that sets git apart from other VCSs is the ability to keep one module inside another. Let's go through the basics of git submodules and then see how we can leverage them.
A git submodule is simply a module (a repository) that is kept inside another module. But why would we do that? Suppose you are working on a project that rates the candidates of an online test based on the answers they submit to your questions. Let's say that generating the ratings is complex and requires specific algorithms, which are handled by a rating engine module that a third-party team is developing, whereas your module only has to record the responses and show the final results.
At this point it is clear that you will have to use the rating engine module. The two modules are maintained in different repositories. You could certainly copy the code of the rating engine repo into your module and start using it. But what if the third-party team pushes another enhancement to their repo? Either you keep using their old code, unaware of the change, or you have to copy the code again. This approach is inefficient and is not recommended.
The other option is to add a submodule within your module that points to the remote repo of the third-party module. Now, if the third-party team pushes a change, you can pull it with git just as you would with any other git repo. Git lets you keep a clone of this repo as a subdirectory. This makes the whole process painless and avoids these issues.
Let's try to understand it with two git repos, foo and bar. We will keep the bar repo as a submodule inside the foo module.
[/temp/git/demo]$ ls -lrt
total 0
drwxr-xr-x 1 PARAM 197121 0 Aug 21 02:15 foo
As of now, the foo module contains just one file, foo_readme.txt.
[/temp/git/demo/foo]$ ls -lrt
total 0
-rw-r--r-- 1 PARAM 197121 0 Aug 21 02:30 foo_readme.txt
Similarly, we have a bar_readme.txt file inside the bar module.
[/temp/git/demo/bar]$ ls -lrt
total 0
-rw-r--r-- 1 PARAM 197121 0 Aug 21 02:46 bar_readme.txt
We will use colours to indicate the current state of each module with respect to its remote repository. This will help to visualize the changes. The outline of a module is blue if the module is up to date with the remote repo, red if it is behind the remote repo, and green if the local repo is ahead of the remote repo. Initially we have two modules, foo and bar, which are both up to date with their remote repos.
Adding a submodule bar inside foo:
[/temp/git/demo/foo]$ git submodule add https://github.com/paramvirkarwal/bar.git
Cloning into 'C:/demo/foo/bar'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
warning: LF will be replaced by CRLF in .gitmodules.
The file will have its original line endings in your working directory.
Check status:
[/temp/git/demo/foo]$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: .gitmodules
new file: bar
Notice that a .gitmodules file is created if it does not already exist, and the details of the submodule are added to it. Let's see the contents of this file.
[submodule "bar"]
path = bar
url = https://github.com/paramvirkarwal/bar.git
This contains the information about the submodule: its local path and its remote URL. This file is version controlled just like any other file.
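Because .gitmodules uses git's own config file format, its entries can also be read with git config. The following is a minimal sketch, assuming a throwaway scratch directory (the `tmp` variable is hypothetical) in which the file shown above is recreated by hand; it does not touch a real repository.

```shell
set -e
# Hypothetical scratch directory, used only for this illustration.
tmp=$(mktemp -d)
cd "$tmp"

# Recreate the .gitmodules file from the walkthrough.
cat > .gitmodules <<'EOF'
[submodule "bar"]
	path = bar
	url = https://github.com/paramvirkarwal/bar.git
EOF

# "git config -f <file>" reads keys from any file in git's config format.
sub_path=$(git config -f .gitmodules --get submodule.bar.path)
sub_url=$(git config -f .gitmodules --get submodule.bar.url)

echo "path: $sub_path"
echo "url:  $sub_url"
```

Reading the values this way avoids parsing the file by hand in scripts.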
Let's do a git commit:
[/temp/git/demo/foo]$ git commit -m "adding module bar inside foo"
[master 72d0c90] adding module bar inside foo
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 bar
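The mode 160000 in the commit output marks bar as a special entry: foo stores a reference to one specific commit of bar rather than bar's files. The sketch below tries to make that visible with git ls-tree. It is only an illustration under assumptions: it uses local stand-in repos in a temporary directory instead of the GitHub remotes, a made-up demo identity for the commits, and `protocol.file.allow=always` because newer git versions may refuse local-path submodule clones by default.

```shell
set -e
tmp=$(mktemp -d)   # hypothetical scratch directory
# Made-up identity so that commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for the remote bar repo.
git init -q "$tmp/bar"
cd "$tmp/bar"
touch bar_readme.txt
git add bar_readme.txt
git commit -q -m "initial bar commit"
bar_head=$(git rev-parse HEAD)

# Parent repo foo.
git init -q "$tmp/foo"
cd "$tmp/foo"
touch foo_readme.txt
git add foo_readme.txt
git commit -q -m "initial foo commit"

# Add bar as a submodule and commit, as in the walkthrough.
git -c protocol.file.allow=always submodule --quiet add "$tmp/bar" bar
git commit -q -m "adding module bar inside foo"

# The tree entry for bar has the form: 160000 commit <sha> bar
entry=$(git ls-tree HEAD bar)
echo "$entry"
mode=$(echo "$entry" | awk '{print $1}')
objtype=$(echo "$entry" | awk '{print $2}')
pinned=$(echo "$entry" | awk '{print $3}')
```

The recorded hash is the commit that bar had checked out when the submodule was added, which is why later changes in the bar remote do not affect foo until that reference is updated and committed.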
See below the illustration of the current state of both modules:
As we have just committed changes in the foo module, its outline is green, which means the local repo is ahead of the remote repo: the changes to foo have not yet been pushed. Notice that the outline of the bar submodule is still blue, as no changes have been made to this module. Now let's push the foo changes to the remote repo.
[/temp/git/demo/foo]$ git push origin master
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 403 bytes | 403.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/paramvirkarwal/foo.git
25ef758..72d0c90 master -> master
[/temp/git/demo/foo]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Let's see the diagrammatic illustration again. At this point both modules are up to date with their remote repos, so their outlines are blue.
Now let's say that another developer or third party changes the bar repo by adding a new file, instructions.txt, in the remote repo. Try to imagine what the state of these modules will be with respect to their corresponding remote repos. Interestingly, git status on foo will show that the foo module is up to date with its remote repo, because nothing in the foo repo itself has changed.
[/temp/git/demo/foo]$ git remote update
Fetching origin
[/temp/git/demo/foo]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Try doing the same with the bar submodule inside the foo module.
[/temp/git/demo/foo/bar]$ git remote update
Fetching origin
remote: Counting objects: 2, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), done.
From https://github.com/paramvirkarwal/bar
95b2f8a..2be626a master -> origin/master
[/temp/git/demo/foo/bar]$ git status
On branch master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
(use "git pull" to update your local branch)
nothing to commit, working tree clean
It says that the local repo is behind the remote repo. In the corresponding diagram, the bar repo is shown with a red outline as it is behind the remote repo.
Now use git pull inside the bar submodule to get the latest code, and then run git status.
[/temp/git/demo/foo/bar]$ git pull
Updating 95b2f8a..2be626a
Fast-forward
instructions.txt | 0
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 instructions.txt
[/temp/git/demo/foo/bar]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Now it shows that the local bar repo is up to date with the remote bar repo. That's good! But wait, what happens to the foo module, which is the parent of the bar submodule? Keep in mind that we have modified a subdirectory inside foo. foo records the exact commit of bar that it uses, and bar now points to a newer commit, so foo has a local change that is not yet committed. Try doing a git status for the foo module.
[/temp/git/demo/foo]$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: bar (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
Notice that foo sees this as new commits in the bar repo, and at this point it treats bar as one entity. It does not show the individual changes in the bar repo. (You can of course see them using the git diff --submodule command, or git diff --cached --submodule once the change is staged.)
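As a hedged aside, the same update can usually be driven from the parent repo instead of running git pull inside the submodule: git submodule update --remote fetches bar and checks out the tip of its remote branch, and git diff --submodule then lists the bar commits that foo has not recorded yet. The sketch below is an illustration under assumptions, not the exact setup of this walkthrough: it uses local stand-in repos in a temporary directory, a made-up demo identity, and `protocol.file.allow=always` because newer git versions may refuse local-path submodule transfers by default.

```shell
set -e
tmp=$(mktemp -d)   # hypothetical scratch directory
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for the remote bar repo.
git init -q "$tmp/bar"
cd "$tmp/bar"
touch bar_readme.txt
git add bar_readme.txt
git commit -q -m "initial bar commit"

# Parent repo foo with bar added as a submodule.
git init -q "$tmp/foo"
cd "$tmp/foo"
touch foo_readme.txt
git add foo_readme.txt
git commit -q -m "initial foo commit"
git -c protocol.file.allow=always submodule --quiet add "$tmp/bar" bar
git commit -q -m "adding module bar inside foo"

# The "third party" adds a new file to bar.
cd "$tmp/bar"
touch instructions.txt
git add instructions.txt
git commit -q -m "add instructions.txt"

# From the parent: fetch bar and move it to the tip of its remote branch.
cd "$tmp/foo"
git -c protocol.file.allow=always submodule --quiet update --remote bar

# List the bar commits that foo has not recorded yet.
summary=$(git diff --submodule=log)
echo "$summary"
```

After this, foo is in the same state as after the manual git pull above: bar shows up as modified with new commits, and the change still has to be added and committed in foo.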
Let's commit the changes:
[/temp/git/demo/foo]$ git add bar
[/temp/git/demo/foo]$ git commit -m "updated submodule"
[master afdcd73] updated submodule
1 file changed, 1 insertion(+), 1 deletion(-)
[/temp/git/demo/foo]$ git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
The local foo module is now ahead of the remote repo. In the corresponding diagram, the foo module is shown in green as it is ahead of the remote repo, and bar is shown with a blue outline because we have already got the latest code for bar using git pull.
Now go ahead and push the change for foo to the remote repo as well.
[/temp/git/demo/foo]$ git push origin master
Counting objects: 2, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 301 bytes | 301.00 KiB/s, done.
Total 2 (delta 0), reused 0 (delta 0)
To https://github.com/paramvirkarwal/foo.git
72d0c90..afdcd73 master -> master
[/temp/git/demo/foo]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
Now the diagram should look like below. Both modules are now up to date with their remote repos, which is why they are shown with a blue outline.
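One way to check from the command line whether foo and its submodule agree is git submodule status, which prefixes the hash with + when the commit checked out in the submodule differs from the one the parent has recorded. The sketch below replays the pull-and-commit steps of this walkthrough and captures that prefix before and after the commit. It is an illustration under assumptions: local stand-in repos in a temporary directory, a made-up demo identity, and `protocol.file.allow=always` because newer git versions may refuse local-path submodule clones by default.

```shell
set -e
tmp=$(mktemp -d)   # hypothetical scratch directory
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for the remote bar repo.
git init -q "$tmp/bar"
cd "$tmp/bar"
touch bar_readme.txt
git add bar_readme.txt
git commit -q -m "initial bar commit"

# Parent repo foo with bar added as a submodule.
git init -q "$tmp/foo"
cd "$tmp/foo"
touch foo_readme.txt
git add foo_readme.txt
git commit -q -m "initial foo commit"
git -c protocol.file.allow=always submodule --quiet add "$tmp/bar" bar
git commit -q -m "adding module bar inside foo"

# The "third party" adds a new file to bar.
cd "$tmp/bar"
touch instructions.txt
git add instructions.txt
git commit -q -m "add instructions.txt"

# Pull inside the submodule, as in the walkthrough.
cd "$tmp/foo/bar"
git pull -q --ff-only

# First character of the status line, before and after committing in foo.
cd "$tmp/foo"
before=$(git submodule status bar | cut -c1)
git add bar
git commit -q -m "updated submodule"
after=$(git submodule status bar | cut -c1)
echo "before commit: [$before]  after commit: [$after]"
```

A + before the commit and a blank afterwards correspond to the step where foo records the new bar commit.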
There is a lot more that you can do with git submodules. You can refer to the git documentation for details. The usage summary printed by git itself:
$ git submodule help
usage: git submodule [--quiet] add [-b <branch>] [-f|--force] [--name <name>] [--reference <repository>] [--] <repository> [<path>]
or: git submodule [--quiet] status [--cached] [--recursive] [--] [<path>...]
or: git submodule [--quiet] init [--] [<path>...]
or: git submodule [--quiet] deinit [-f|--force] (--all| [--] <path>...)
or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
or: git submodule [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
or: git submodule [--quiet] foreach [--recursive] <command>
or: git submodule [--quiet] sync [--recursive] [--] [<path>...]
or: git submodule [--quiet] absorbgitdirs [--] [<path>...]
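One common use of the update subcommand listed above: a plain git clone of foo leaves the bar directory empty, and git submodule update --init fills it with the commit that foo has recorded. The following is a minimal sketch under assumptions: local stand-in repos in a temporary directory, a made-up demo identity, a hypothetical clone directory named foo_clone, and `protocol.file.allow=always` because newer git versions may refuse local-path submodule clones by default.

```shell
set -e
tmp=$(mktemp -d)   # hypothetical scratch directory
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for the remote bar repo.
git init -q "$tmp/bar"
cd "$tmp/bar"
touch bar_readme.txt
git add bar_readme.txt
git commit -q -m "initial bar commit"

# Parent repo foo with bar added as a submodule.
git init -q "$tmp/foo"
cd "$tmp/foo"
touch foo_readme.txt
git add foo_readme.txt
git commit -q -m "initial foo commit"
git -c protocol.file.allow=always submodule --quiet add "$tmp/bar" bar
git commit -q -m "adding module bar inside foo"

# A fresh clone of foo: the bar directory exists but has no files yet.
git clone -q "$tmp/foo" "$tmp/foo_clone"
cd "$tmp/foo_clone"
files_before=$(ls -A bar | wc -l)

# Register the submodule and check out the commit that foo records.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
ls bar
```

The --recursive flag only matters when bar itself contains submodules; it is included here because it is harmless otherwise.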
Please feel free to leave your comments.
18-Sep-2019 23:43
Very helpful, thanks for sharing!!
21-Sep-2019 22:12
Thanks Kamal, like always! :)