Ec workflow

From OLPC
Revision as of 12:11, 12 August 2011 by Rsmith (talk | contribs) (Blessed Repository branches)
Jump to: navigation, search

Recommended Reading

The following URL's are recommended reading to help with understanding this document. Reading the entire site is not required (but highly recommended) but at least go through the git basics, theory and terminology.

http://book.git-scm.com
http://progit.org/book

To understand the basic concepts and use of rebasing the following are recommended.

http://progit.org/book/ch3-6.html
http://book.git-scm.com/4_interactive_rebasing.html

Rational

During the 1.5 C4 bringup there were multiple paths of EC code development going on. I was working on a large rewrite of the mppt code, Bryan Ma was adding support for bit banged I2C to talk to the new ISL charger chip and Paul was integrating/refactoring Bryan's changes, assisting me with a few bits of MPPT infrastructure, and general EC maintenance. There was not any clear plans for management of merging all the resulting branches and some complex interdependencies arose from each of us merging bits an pieces of other branches.

At bringup when we need to pull all these branches into a single codebase to work from it was a painful merge. Some of the pain was self-inflicted due to my insistence that the patches get cleaned up into logical blocks and we try to keep some sort of sane history in the master branch. During this time I began to research various merge strategies to help prevent this from happening again. http://book.git-scm.com/4_interactive_rebasing.html

I eventually settled on is what is being commonly referred to as the Integration Manager (IM) system plus requiring a linear history in the main repository. In the integration manager system each developer has his own repository but there is a separate blessed repository. The blessed repo is where releases happen from. Someone is designated as the integration manager and that person is responsible for managing the blessed repo. When a developer has a series of patches ready for inclusion into the blessed repository they issue a pull request to the integration manager. The integration manager will then pull the patches from the developers repository, verify that they meet the commit requirements and if they pass then commit them to the blessed repository. This is the same system used by most projects on GitHub and more famously the Linux kernel.

The advantages of the IM strategy for EC code is less about a single person managing the blessed repository and more about each developer having their own space with which to work. Each developer has a repository that they are free to manage how they see fit. It allows for each developer to have a place to backup work in progress and also allows for other developers to be able to look at the ongoing work. There is only 1 branch in a developers repository that matters to anyone other than the developer. This is the branch that the IM pulls from when its time to merge in the patches into the blessed repository.

Not specifically related to the IM scheme is maintaining a linear history in the blessed repository. IM does not require this but the IM setup makes achieving this a bit easier than free for all merging. Linear history means that there are no conflicts that require resolution at the time of merge and each commit has only 1 parent commit. Linear history of the blessed repository is desirable for 2 main reasons.

  1. A git merge-with-conflict-resolution based workflow creates dependencies or choke points.
    Each time a merge happens that needs a conflict resolution it means that the patch as the developer created is not the patch that enters the repository. It is now that patch _plus_ additional changes added. Now the original patch cannot stand alone. This breaks a code bisection because you cannot separate the patch from the merge commit that resolves the conflict. Its also often means that during the conflict resolution the developer has to attempt to resolve conflicts in code that they did not create and thus may not fully understand.
  2. Linear history allows for logical progression of features/changes.
    Because a git merge has 2 parents the history will show patches in between patches that were previously committed. This makes it very hard to see a logical progression of features when looking at the patches one by one because all the patches that fully implement a feature may not be co-located. To see the dependency chain you have to take a lot of extra steps. Examples of linear history are how CVS and SVN repositories work. Their workflow forces linear history.

Workflow

As previously mentioned each developer creates their own copy of the blessed repository in question [1]. This developer repository is totally under the control of the developer. They are free to manage it however they see fit. They can add/delete branches or rebase things at will. They should never see any conflicts (unless they create them) because they are the only person who commits to the repository. Other EC developers can have visibility into the work in progress because the repositories are available on the main OLPC development servers. Test builds can be created, features reviewed, etc. Patches in these repositories are public (or semi-public via ssh) but not permanent. Although the code accessible, another developer should _never_ base a series of changes off of the code in a personal repository unless the developers had a specific agreement among themselves on managing those branches. The Golden Rule is that the blessed repository is the _only_ source to use for creating patches for submission.

When it comes time to submit patches to the blessed repository the developer will create a branch in their repository called 'pullme' and then send some sort of notice to the integration manager to pull the patches into the blessed repository. The IM will pull the patches examine them to make sure they meet the blessed repository guidelines and if they pass then commit them to the blessed repository. Then the developer can then go to work on the next fix or feature. If they do not pass then the developer has to rework them. Once the patches have been merged into the blessed repository the developer can delete his working copy and start over.

Prior to generating the pull request the developer needs to prepare the patches correctly. To maintain linear history it is necessary for the developer to rebase the patches on the code in the blessed repository. The IM will not integrate patches into the blessed repository that result in a merge conflict or a non-fast-forward merge. The developer must rebase to prevent this. When first exposed to rebasing it may seem complex but rebasing in this context is simple. Recommended git commands are listed in [2]. What rebasing does is take all the patches you have outstanding and replay them one by one on the tip specified repository and branch. If the patch conflicts with something already in the repository then rebasing stops and the developer is allowed to deal with the conflict. Once the conflict has been resolved the rebase continues and re-creates the patch using the newly conflict resolved code. Although it sounds complex its _exactly_ the same process that the developer would go through if a conflict occurred in merge. The difference is that the resolution of the conflict now becomes part of the patch and does not create a dependency with other patches. It allows for the developer to resolve the conflict in their own repository where they can easily re-test or fixup the results before sending a pull request to the IM.

At minimum the developer is required to rebase the patches to preserve linear history. However, rebasing allows for restructuring patches such that each patch corresponds to 1 specific fix or feature. When rebasing prior creating a pull requst its a good time to review all of the patches for cleanup work. Often, during the course of development there are many revisions to the code under work. If you commit often for backup or reference purposes then this is especially true. In these cases many of the changes are just change noise and don't really add value to the history. Git allows you to merge these interim commits into a single commit. Git speak calls this "squashing". Squashing is available from a mode of rebase called interacive rebase. The gory details of interactive rebase are beyond the scope of this document but a recommended procedure is in the commands included in [2]. There is also lots of information available via google or in the urls that were listed as recommended reading in the beginning of this document.

Although squashing commits is not strictly required the IM may request that a developer do this prior to including the patches into the blessed repository.

When reading about rebasing on the web or in various git help manuals there will mention that rebasing patches that have been placed into a public repository is very bad. All of that information is completely true _if_ the rebased patches have been used by other developers. This would seem to apply to this workflow. However, in this workflow rebasing public patches is ok because developers are forbidden from basing patches on code in another developers repository. This is the Golden Rule from above. If its broken then the pain of dealing with an upstream rebase is self-inflicted.

In addition to enforcing linear history the IM will enforce one more requirement on the submitted code. Coding style. The EC codebase has a coding style that was inherited from the original creators of the code. To enforce this style the repositories contain the 'astyle' code formatter with a config file that matches the coding style. A makefile rule called 'format' was created that will run astyle with this config file on all source files in the repository. Running this make rule prior to creating a pull request is required. Running it often is recommended. The IM will refuse to integrate patches that result in 'make format' reformatting the code. Its desirable that commits that result in formatting changes only should be squashed into changes that actually modify the code in question. The IM may request that the developer squash out format noise prior to applying the patches to the blessed repository.

The short had summary of the workflow is:

  1. update code in local repo to match the blessed repository.
  2. git checkout -b new_stuff
  3. hack, commit, push, hack, commit, push. (push goes to developers public repo)
  4. interactive rebase to squash out change noise.
  5. git rebase git://BlessedRepo
  6. git push -f (-f is required because you rebased)
  7. git branch pullme
  8. send IM pull request.
  9. if accepted delete new_stuff and pullme ; goto #1
  10. if not accepted goto #3

Blessed Repository branches

EC 1.0 (Excluded for now since Paul and I are the only people who care)

EC 1.5

EC 1.75 http://dev.laptop.org/git/users/rsmith/ec-1.75/

Examples

Example 1 Creating a new clean developer repositories from the blessed repository on dev.laptop.org

The version of git on dev.laptop.org predates the --bare option for creating a bare repository. The first 3 steps would not be necessary when on a server with a more recent git version.

  1. Make sure your name and email is correct
      $ git config --global user.name "Firstname Lastname"
      $ git config --global user.email "your_email@youremail.com"
    
  2. Create local bare repository(s)
      $ mkdir ec-1.75_ras   
      $ cd ec-1.75_ras
      $ git init --bare
      $ cd ..
  3. Copy that bare repository to dev.laptop.org
      $ scp -r ec-1.75_ras dev.laptop.org:./my_gits
  4. Delete it.
      $ rm -r ec-1.75_ras
  5. Clone the blessed repository (if necessary)
      $ git clone ssh://dev.laptop.org/git/users/rsmith/ec-1.75
      $ cd ec-1.75 
    
  6. Move to desired branch (normally master)
      $ git checkout master 
      $ git push ssh://dev.laptop.org/my_gits/ec-1.75_ras master:master
    
  7. Clone the upstream developer repository locally.
      $ cd ..
      $ git clone ssh://dev.laptop.org/my_gits/ec-1.75_ras
    

Example 2 Rebase prior to pull request

Repo prep

In the following example I refer to the blessed repository master branch. It is referenced as a remote called 'blessed' and the local branch is called 'mainline' Assuming the same 1.75 repository from example 1 the commands to add this remote to the local repository and update the refs are

 $ git remote add blessed ssh://dev.laptop.org/git/users/rsmith/ec-1.75 
 $ git fetch blessed

Then to create the mainline branch

 $ git checkout --no-track -b mainline blessed/master 

Its possible to do all of the commands below without creating a local branch of the blessed repository's code but I find that a local branch makes things easier. The --no-track option prevents accidental pushes to the upstream blessed repository if for some reason you commit things to the wrong branch. 'git push' with no arguments will push all tracked refs that it can match between the local and remote repository so if you have a tracked branch in your local repository that is named the same as the blessed repository then it will be pushed.

Here branch 'master' is the current development branch. 'master' could be replaced with any of the other blessed branches listed in [ Blessed repository branchs ]

Once you have the blessed remote added you can update your local copy by

 $ git fetch blessed
 $ git checkout mainline
 $ git pull blessed/master

You have to specify the remote/branch to git pull because you checked it out with --no-track.

It it possible to setup your remote such that 'git pull' works with no arguments but 'git push' will not try to push to the remote but this requires editing of the config values. See 'man git-config' for details.

Rebase onto blessed

  1. Branch the current work. This is not strictly necessary but I prefer to do it because even if after several rebases I decide to start over its easy to return to the starting point. Git's reflog could also be used to recover but that is a more advanced usage of git.
 $ git branch merge 
  1. Make sure you have the latest code from blessed.
 $ git checkout mainline
 $ git pull blessed/master
 $ git checkout merge
 $ git rebase mainline

If your patches do not conflict with modifications already in the repository then the rebase will exit cleanly. If there are conflicts then the standard conflict markers will be added to the corresponding areas and a note printed you are then left at a prompt. 'git status' will also show you what files were affected. Once you have resolved the conflicts mark the affected files resolved with 'git add'

 $ git add conflict1.c
 $ git add conflict2.c

In addition to the standard sort of conflict resolution git supports advanced conflict resolution via the mergtool setting see 'man git-mergetool' for details

Then continue the rebase

 $ git rebase --continue

If for some reason it all goes wrong or you want to stop for some reason you can cancel the rebase with

 $ git rebase --abort

When the rebase is complete you are left with patches in your repository that will apply cleanly to the tip of the branch you rebased against and will fast-forward so they maintain linear history.

  1. Create the 'pullme' branch and push to your public repo
 $ git checkout -b pullme 
 $ git push origin pullme 

If you already have a pullme branch in your public repository you may need to delete it or use a forced push. To delete

 $ git push origin :pullme

To force push

 $ git push -f origin pullme
  1. Notify the IM that you have new patches to pull.

Interactive rebase with squash

TODO: Fixme not finished yet.

  1. Branch work
 $ git checkout -b merge
  1. Examine the outstanding patches in your development branch that are not in the mainline branch. The not operator (^) is handy for this. This says show me all the commits in the merge branch that are not in the mainline branch
 $ git log merge ^mainline