Modifying other people's software

(natkr.com)

53 points | by todsacerdoti 4 days ago

27 comments

skydhash 6 hours ago
Maybe I can't understand what TFA is describing, but from what I know a patch is usually tied to a specific commit, so a very specific point in time in the upstream lifetime. It does not make sense to have it lingering longer than that. Even in the case when you want to maintain a set of patches (package building, ...) you usually revise it with every new version of the software. In this case, the intent is much more important than the how (which quickly becomes history).
- lmm 4 hours ago
  The point is to maintain your set (perhaps stack) of patches as a set of patches on top of upstream for the long term. Yes, you will probably have to revise them as upstream changes, but this will let you maintain their identity as you do so. Is that something you will find useful? Maybe, maybe not.
- random3 2 hours ago
  You’re thinking a patch is text, but you should think of it as a logical change. Unless the logic becomes part of upstream, the patch is not tied to a specific point in “time”. There’s a cost to it, as you have to constantly rebase. This is the case with any non-vanilla distribution (e.g. Linux), although it’s also at a package level, so you do this both for each package as well as across every package. For well-written code there’s reasonably low coupling, so it’s less work to maintain.
- doix 5 hours ago
  Yes, I don't quite get it. When I need to maintain a fork, I just add an extra remote to git. Then I fetch upstream (what I call my remote) and rebase my changes against whatever branch I'm following. At any point in time I can generate a patch file that works for whatever version I have rebased against.
  Seems easy enough, I read the article multiple times and I don't get why what they are describing is needed.
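  A minimal sketch of the fetch-and-rebase workflow described here, written as a Python helper that only builds the git command lines and does not run them. The remote name `upstream`, the branch `main`, the example URL, the `patches` output directory, and the function name `fork_sync_commands` are illustrative assumptions, not details from the comment.
```python
from typing import List


def fork_sync_commands(upstream_url: str, branch: str = "main",
                       remote: str = "upstream") -> List[List[str]]:
    """Return the git invocations for a fetch-and-rebase fork workflow.

    Hypothetical helper: it only builds argument lists, so they can be
    inspected or handed to subprocess.run() one at a time.
    """
    target = f"{remote}/{branch}"
    return [
        # One-time setup: register the original project as an extra remote.
        ["git", "remote", "add", remote, upstream_url],
        # Download the latest upstream commits without touching local work.
        ["git", "fetch", remote],
        # Replay the local commits on top of the freshly fetched branch.
        ["git", "rebase", target],
        # Export every commit that is not in upstream as a patch file.
        ["git", "format-patch", "-o", "patches", target],
    ]


if __name__ == "__main__":
    for argv in fork_sync_commands("https://example.com/project.git"):
        print(" ".join(argv))
```
  Note that the rebase step rewrites the local branch, which is the part the replies below take issue with.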
  - Nullabillity 5 hours ago
    (Author here.)
    The difference is that git rebasing is a destructive operation: you lose track of the old version when you do it. (Yes, there's technically the reflog... but it's much less friendly to browse, and there's no way to share it across a team.)
    Maybe that's an okay tradeoff for something you use by yourself, but it gets completely untenable when you're multiple people maintaining it together, because constantly rebasing branches completely breaks Git's collaboration model.
    - doix 4 hours ago
      I worked at a place that was allergic to contributing patches upstream. We maintained a lot of internal forks for things and had no problem collaborating.
      You don't need to push the rebased branch to the same branch on your remote, if that's an issue (although I don't see how it is).
      Maybe this is a case of "Dropbox is just rsync", but I feel like just learning git and using it is easier than learning a new tool.
      - nicoburns 2 hours ago
        We do this for some of the components that are shared between Servo and Firefox. Firefox is upstream, and on the Servo side we have automated and manual syncing. The automated syncing mirrors the upstream `main` branch to our `upstream` daily, without changes. The manual syncing rebases our changes on top of a new upstream version through a manual rebase process. This happens monthly, and each sync is pushed to a new branch to maintain history.
        Between monthly syncs we push our own changes to our latest monthly branch (which also get manually sent upstream when we get a chance).
    - cobbzilla 4 hours ago
      I see — you’re doing more than “here’s a few patches to keep working across revisions”, you’re doing separate-path feature work on a different, actively-developed project.
      To me that sounds like not a great idea, but if you must do it, I could see some usefulness to this.
      - Nullabillity 4 hours ago
        Yeah. For reference, this is a typical patchset for the project that motivated it.[0] Some of the patches are "routine" dependency upgrades, some of them are bugfix backports, some of them are original work that we were planning to upstream but hadn't got around to yet. Some are worth keeping when upgrading to a new upstream version, some aren't.
        I agree that it's not ideal, but... there are always tradeoffs to manage.
        [0]: https://github.com/stackabletech/docker-images/tree/e30798ac...
- cobbzilla 5 hours ago
  Agreed. If you want your change and don’t want to bother the maintainers with a patch they are unlikely to accept, or can’t because it’s proprietary: fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.
  The likelihood of conflicts is minimal, and often if you see conflicts it’s a good indication your issue may have been resolved. Or if not, you can see if it’s still needed, or how to adjust it.
  - Nullabillity 4 hours ago
    (Author here.)
    > fork the repo (at whatever tag makes sense), then periodically sync with the latest code for that version.
    Yeah, this is the workflow that Lappverk is trying to enable.
    The problem is that neither of Git's collaboration models works well for this problem. Rebasing breaks collaboration (and history for the patchset itself), and merging quickly loses track of individual patches. Lappverk is an attempt to provide a safer way to collaborate over the rebase workflow.
- what 5 hours ago
  A patch just encapsulates what was added and removed in a particular change; it doesn’t care about any commits.
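  To illustrate, here is a small sketch using Python's standard `difflib` to produce a unified diff from two versions of a file. The file name `config.ini`, its contents, and the helper name `make_patch` are made up for the example. The resulting text holds only file names, context, and added or removed lines, with no reference to any commit.
```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old = ["timeout = 5\n", "retries = 3\n"]
new = ["timeout = 0\n", "retries = 3\n"]


def make_patch(before, after, name="config.ini"):
    """Return a unified diff between two versions of a file as one string."""
    diff = difflib.unified_diff(
        before, after, fromfile=f"a/{name}", tofile=f"b/{name}"
    )
    return "".join(diff)


patch = make_patch(old, new)
print(patch)
```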
- shmerl 4 hours ago
  For example wine-staging (run by Wine developers themselves) hosts patches for the Wine project, and they revise / rebase them with each Wine version, which is often not a trivial task. I don't see how you can avoid that really. But Wine staging itself is a git repository that holds patches (and their history) if that helps, which indeed can stay there for years.
  Same happens with patches that Debian applies on top of fixed versions of packages. They are stored in Debian's Salsa git.
praptak 44 minutes ago
You may have a look at Quilt. It doesn't solve the problem the author described but may help you once you accept there is no easy solution in sight.
Quilt is automation for the "bag of patches" model. I used it once when I needed to upgrade the internal bag of patches at $big_corp so as to apply them to a newer version of $public_app. It was predictably complex but somehow still manageable.
If you squint a bit then the [bag of patches] + [automated application in order] is a poor man's Git. If you keep this in a git repo then you're basically versioning repos (poor man's ones) in a repo. It almost sounds like the solution to the author's problem :)
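A rough sketch of the "application in order" half of that model: reading an ordered list of patches from a series file. It assumes a quilt-style layout (one patch per line, applied top to bottom, `#` starting a comment, optional options such as `-p0` after the name). The patch names and the function name `patch_order` are hypothetical.
```python
from typing import List


def patch_order(series_text: str) -> List[str]:
    """Return patch file names in the order a series file lists them."""
    names = []
    for line in series_text.splitlines():
        # Drop anything after '#' and skip lines that end up empty.
        entry = line.split("#", 1)[0].strip()
        if entry:
            # Keep only the file name; ignore trailing options like -p0.
            names.append(entry.split()[0])
    return names


# Hypothetical series file for illustration.
example = """\
# local fixes, applied top to bottom
0001-bump-dependency.patch
0002-backport-bugfix.patch -p0
"""
print(patch_order(example))
```
Because the series file is plain text, committing it next to the patches gives the patch set its own history.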
userbinator 7 hours ago
Many times I've just patched the binary even if source is available, because trying to reproduce the binary you currently have, with only the changes you want and everything else the same, can be an even more difficult exercise than simply changing a string or constant.
- PhilipRoman 3 hours ago
  Lol I remember doing this when I was younger with the `man` command to remove a 5 second exit delay for the browser output.
```
    radare2 -qq -w -c "wx 01 @ 0xb407" /usr/bin/man
```
- taneq 4 hours ago
  Especially if you make a habit of patching the binary instead of rebuilding from source! ;)
anilakar 3 hours ago
I once wrote a small C++ wrapper for POSIX dlfcn.h. Someone sent a pull request that would have turned it into a Windows-only library.
- yjftsjthsd-h 3 hours ago
  Like... Intentionally, or because they unthinkingly did something non-portable?
attila-lendvai 2 hours ago
whenever i rebase longstanding commits in my fork, i keep the previous branch by appending the date to its name.
reading the readme didn't make it clear to me how this app would make my life any easier (also considering the added complexity of a new tool).
- attila-lendvai 2 hours ago
  don't get me wrong, it's a PITA... but how would it hurt less using this tool?
  i rarely, if ever, need to look at the history of this.
thwarted 7 hours ago
The process described reminded me of "pristine source" and RPM spec files that take the upstream pristine source and patch it during the build process. Maintaining that is always a little bit of a headache if you don't do it regularly, especially having to maintain (generate and apply) a separate set of patch files for the changes and express/apply the patches in the spec file. This looks to make light work of that.
vlovich123 4 hours ago
Honestly, I found it a better strategy to name branches after the fork point and the date you started the fork. So you’d have main-2025-03-07 for a fork of main started 03-07, and another, main-2025-05-08, for a rebase. The patch set above that is just what you carry. I’m not sure maintaining them as literal patches is that helpful vs just keeping it as explicit patches to apply in git. But maybe this is the right strategy once your fork gets complicated, though at that point you should be hard forking rather than soft forking IMO.
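A tiny sketch of that naming scheme, assuming ISO dates appended to the base branch name as in the examples above. The function name `fork_branch_name` is made up for illustration.
```python
from datetime import date


def fork_branch_name(base: str, started: date) -> str:
    """Name a fork branch after its base branch and the day it was started."""
    return f"{base}-{started.isoformat()}"


print(fork_branch_name("main", date(2025, 3, 7)))  # main-2025-03-07
print(fork_branch_name("main", date(2025, 5, 8)))  # main-2025-05-08
```
Since ISO dates sort lexicographically, listing the branches by name also lists the rebases in chronological order.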
datadrivenangel 7 hours ago
Modifying source code like this is one method. For web software, bookmarklets are another great way to do that.
- bartread 7 hours ago
  I’m a big fan of Greasemonkey scripts for this, although these days I prefer Violentmonkey because it has several capabilities that the OG doesn’t.
cyberax 7 hours ago
This is supercool. One constant problem I have with self-hosting is that I often need to modify just a couple of files here and there, but then I'm stuck with a forked repo or a dirty working copy.
I'm going to try to make a frontend UI for it.
- darkwater 31 minutes ago
  Are you talking about personal or professional self-hosting? Why are you constantly patching software you self-host? Not enough configurability? Using software not made for self-hosting? Holding it wrong? I ask because it seems... strange that you have these issues so often.