Gitpatch attempts to solve this. It supports versioned patches and patch stacks (aka stacked PRs), and it handles force-pushes in stacks correctly even without Change-IDs, using heuristics based on title, author date, etc. It should also be unusually fast. Disclosure: I'm the author.
I'm not convinced that review comments as commits make things easier, but I think storing them in git in some way is a good idea (e.g. git annotations, or in commit messages after merge, etc.)
Just taking a step back, it is SO COOL to me to be reading about stacked pull requests on HN.
When we started graphite.dev years ago, that was a workflow most developers had never heard of unless they had previously been at FB / Google.
Fun to see how fast code review can change over 3-4yrs :)
I'm a pre-Mercurial Arcanist refugee who tends to promote Graphite in teams that are still struggling with mega-PRs, merge commits, and other own-goal GitHub-isms. Big fan in general, even with the somewhat rocky scaling road we've been on :)
And I very much appreciate both the ambition and the results that come from making it interop with PRs. It's a nightmare problem, and it's pretty damned amazing it works at all, let alone most of the time.
I would strongly lobby for a prescriptive mode where Graphite initializes a repository with hardcore settings that would allow it to make more assumptions about the underlying repo (merge commits, you know the list better than I do).
I think that's what could let it be bulletproof.
We've talked about a "safe mode" where we initialize it similar to JJ - such that you can no longer directly run git commands without funneling them through Graphite, but which would make it bulletproof. Would that be interesting?
Dude, I love Graphite.
Best AI code review, hands down. (And I’ve tried a few.)
I miss the fig workflow :-(
Try `jj`, as others have mentioned. It's being built by the team that built/maintains fig, and they are porting all their learnings into it.
hell yeah
The biggest grip I have with Github is the app is painfully slow. And by slow, I mean browser tab might freeze level slow.
Shockingly, the best code review tool I've ever used was Azure DevOps.
> The biggest grip I have with Github is the app is painfully slow. And by slow, I mean browser tab might freeze level slow.
JavaScript at scale combined with teams that have to move fast and ship features is a recipe for this.
At least it's not Atlassian.
nit: gripe, not grip :-P
I find the idea of using git for code reviews directly quite compelling. Working with the change locally as if you were the one who made it is very convenient, considering the clunky read-only web UI.
I didn't get why you'd stick with the requirement that a review is a single commit. To keep the git-review implementation simple?
I wonder if an approach where every reviewer commits their comments/fixes directly to the PR branch would work as well as I think it would. One might not even need any additional tools to make it convenient to work with. This idea seems like a hybrid of the traditional GitHub flow and the way Linux development is organized via mailing lists and patches.
is github's PR considered read-only?
i've had team members edit a correction as a "suggestion" comment and i can approve it to be added as a commit on my branch.
> When I review code, I like to pull the source branch locally. Then I soft-reset the code to the merge base, so that the code looks as if it was written by me.
This is eerily similar to how I review large changes that do not have a clear set of commits. The real problem is working with people that don’t realize that if you don’t break work down into small self contained units, everybody else is going to have to do it individually. Nobody can honestly say they can review tons of diffs to a ton of files and truly understand what they’ve reviewed.
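For anyone who wants to try that quoted flow, here is a minimal sketch in plain git commands; the branch names (`feature-branch`, `origin/main`) are placeholders, not anything from the article:
```
# check out the branch under review (branch names are placeholders)
git fetch origin
git switch feature-branch

# soft-reset to the merge base with the target branch, so the whole
# change shows up as uncommitted work, as if you had written it yourself
git reset --soft "$(git merge-base origin/main HEAD)"

# when done reviewing, put the branch back at its real tip
git reset --hard origin/feature-branch
```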
The whole is more than just the sum of the parts.
For those that want an easy button. Here ya go.
```
review () {
    if [[ -n $(git status -s) ]]
    then
        echo 'must start with clean tree!'
        return 1
    fi
}
```
I use the GitHub Pull Request extension in VSCode to do the same thing (reviewing code locally in my editor). It works pretty well, and you can add/review comments directly in the editor.
It's better, but still quite deep vendor lock-in (in both GitHub and VSCode).
Well, my employer chooses to use GitHub, so I don't have a choice there. And it's vendor lock-in to VSCode, but that's already my primary editor, so there's no need to learn another tool just for code review.
GitHub may be dominant, but it's not like it doesn't have competitors nipping at its heels (GitLab, BitBucket come to mind).
VSCode is open source, and there are plenty of IDEs...
I guess I'm just focused on different lock-in concerns than you are.
JetBrains IDEs can do the same.
Unfortunately it’s not feature complete - you can’t paste images in review comments, for example. Still very useful for large PRs though.
I use this a lot too. Also, if you open a PR on the GitHub website and press the “.” key, it opens the review in VSCode, which I consider a much better web experience.
TIL thanks.
It's so cool that Git is considering first class change IDs!! That's huge! This sounds similar to what we had at Facebook to track revisions in Phabricator diffs. Curious if anyone knows the best place to read about this?
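For anyone unfamiliar, this is in the same spirit as the `Change-Id` trailer Gerrit puts in commit messages (and the `Differential Revision` link Phabricator adds): a stable identifier that survives rebases and amends so tooling can group revisions of the same change. A purely illustrative commit message with such a trailer:
```
Fix race in upload worker retry path

Short body explaining the change.

Change-Id: I3a5b8c0d2e4f6a7b9c1d3e5f7a9b0c2d4e6f8a1b
```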
Agree with your pain points. One thing I'd add is that GitHub makes you reapprove every PR after each push. As an OSS contributor it's exhausting to chase re-approvals for minor tweaks.
mmmm this is up to each repo/maintainer's settings.
To be fair, you don't know if a one-line change is going to absolutely compromise a flow. OSS needs to maintain a level of disconnect to be safe rather than fast.
Good to know! Never been a maintainer before so I thought that was required.
This is a security setting that the author has chosen to enable.
Hm that’s not the case for my repositories? Maybe you have a setting enabled for that?
Essentially, you are turning fork/branch-induced changes into a "precommit"-review-like workflow, which is great.
I was on the lookout for the best "precommit" review tool and zeroed in on Magit, gitui, and Sublime Merge.
I am not an Emacs user, so I'll have to learn this.
In theory this functionality would be best suited as a git subcommand.
I suggest `git-precom` for conciseness.
Git already has `git add -p`, but it demands a lot from the user.
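For what it's worth, git exposes any executable named `git-<name>` on your PATH as `git <name>`, so the suggested subcommand could be prototyped as a tiny script. A hypothetical sketch (the name and behavior come from the suggestion above, not from an existing tool):
```
#!/usr/bin/env bash
# Hypothetical git-precom: install on PATH as "git-precom" and
# git will expose it as "git precom".
# Sketch only: show what is currently staged before committing.
set -euo pipefail

git diff --staged --stat   # file-level summary of staged changes
git diff --staged          # full staged diff for a final read-through
```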
Git demands a lot from the user in general.
If you want to remain relevant in the AI-enabled software engineering future, you MUST get very good at reviewing code that you did not write.
AI can already write very good code. I have led teams of senior+ software engineers for many years. AI can write better code than most of them can at this point.
Educational establishments MUST prioritize teaching code review skills, and other high-level leadership skills.
> AI can already write very good code
Debatable. With the same experience, it depends on the language, existing patterns, the code base, the base prompts, and the complexity of the task.
How about: AI can write large amounts of code that might look good out of context.
Yeah, LLMs can do that very well, IMO. As an experienced reviewer, I know the "shape" of the code shouldn't inform correctness, but it can be easy to fall into this pattern when you review code. In my experience, LLMs tend to conflate shape and correctness.
> you MUST get very good at reviewing code that you did not write.
I find that interesting. That has always been the case at most places my friends and I have worked at that have proper software engineering practices, companies both very large and very small.
> AI can already write very good code. I have led teams of senior+ software engineers for many years. AI can write better code than most of them can at this point.
I echo @ZYbCRq22HbJ2y7's opinion. For well-defined refactoring and expanding on existing code in limited scope they do well, but I have not seen that for any substantial features, especially full-stack ones, which is what most senior engineers I know are finding.
If you are really seeing that, then I would worry about either the quality of those senior+ software engineers or the metrics you are using to assess the efficacy of AI vs. senior+ engineers. You don't even have to show us any code: just tell us how you objectively came to that conclusion and what framework you used to compare them.
> Educational establishments MUST prioritize teaching code review skills
Perhaps more is needed but I don't know about "prioritizing"? Code review isn't something you can teach as a self-contained skill.
> and other high-level leadership skills.
Not everyone needs to be a leader and not everyone wants to be a leader. What are leadership skills anyway? If you look around the world today, it looks like many people we call "leaders" are people accelerating us towards a dystopia.
There is no reason to think that code review will magically be spared by the AI onslaught while code writing falls, especially as devs themselves lean more on the AI and have less and less experience coding every day.
There just haven't been as many resources poured into improving AI code review yet as there have been for writing code.
And in the end the whole paradigm itself may change.
I’m considered one of the stronger code reviewers on the team, and what grinds my gears is seeing large, obviously AI-heavy PRs and finding a ton of dumb things wrong with them: things like totally different patterns and even bugs. I’ve lost trust that the person putting up the PR has even self-reviewed their own code and verified it does what they intend.
If you’re going to use AI you have to be even more diligent and self-review your code; otherwise you’re being a shitty teammate.
Same. I work at a place that has gone pretty hard into AI coding, including onboarding managers into using it to get them into the dev lifecycle, and it definitely puts an inordinate amount of pressure on senior engineers to scrutinize PRs much more closely. This includes much more thorough reviews of tests as well since AI writes both the implementation and tests.
It's also caused an uptick in inbound to dev tooling and CI teams, since AI can break things in strange ways because it lacks common sense.
AI can review code. No need for human involvement.
>remote-first web-interface
https://youtu.be/Qscq3l0g0B8
While I like the post and agree with everything the author talked about, I find that this is not my problem, despite having a similar workflow (classic vim user). The problem I have, and I think a lot of others have too, is that review just doesn't actually exist. LGTMs are not reviews, yet they're so common.
I'm not sure there's even a tech solution to this class of problems; it comes down to culture. LGTMs exist because they satisfy the "letter of the law" but not the spirit. It's a classic bureaucracy problem combined with classic engineer problems. It feels like there are simple solutions, but LGTMs are a hack: you try to solve this by requiring reviews, and LGTMs are just a hack around that requirement. Fundamentally you just can't measure the quality of a review[0].

We techie types and bureaucrats share a failure mode: we like measurements. But a measurement of any kind is meaningless without context. Part of the problem is that businesses treat reviewing as a second class citizen. It's not "actual work" so shouldn't be given preference, which excuses the LGTM style reviews. We engineers are used to looking at metrics without context and get lulled into a false sense of security, or convince ourselves that we can find a tech solution to this stuff.

I'm sure someone's going to propose an LLM reviewer, and hey, it might help, but it won't address the root problems. The only way to get good code reviews is for them to be done by someone capable of writing the code in the first place. Until LLMs can do all the coding, they won't make this problem go away, even if they can improve on the LGTM bar. But that's barely a bar; it's sitting on the floor.
The problem is cultural. Code reviews are just as essential to the process as writing the code itself; you'll notice that companies that do good code review already treat them that way. Then it's about making this easier to do! Reducing friction is something that should happen and that we should work on, but you could make it all trivial and it wouldn't make code reviews better if they aren't treated as first class citizens.
So while I like the post and think the tech here is cool, you can't engineer your way out of a social problem. I'm not saying "don't solve engineering problems that exist in the same space", but I'm making this comment because I think it is easy to ignore the social problem by focusing on the engineering problem(s). I mean, the engineering problems are orders of magnitude easier, lol. But let's be real: avoiding this and similar problems only adds debt. I don't know what the solution is[1], but I think we need to talk about it.
[0] Then there's the dual to LGTM! Code reviews that exist and are detailed, but petty and overly nitpicky. This is also hacky, but in a very different way: it's a misunderstanding of what review (or quality control) is. There's always room for criticism, since nothing you do, ever, will be perfect. But finding problems is the easy part. The hard part is figuring out which problems are important and how to properly triage them. It doesn't take a genius to complain, but it does take an expert to critique. That's why this dual can be even more harmful: it slows progress needlessly and encourages the classic nerdy petty bickering over inconsequential nuances or over unknowns (as opposed to important nuances and known unknowns). If QC sees their job as finding problems, and/or their bosses measure their performance by how many problems they find, then there's a steady-state solution where the devs write code with intentional errors that QC can pick up on, so QC fulfills their metric of finding issues and the issues can also be easily fixed. This also matches the letter but not the spirit. This is why AI won't be able to step in without having the capacity to write the code in the first place, which solves the entire problem by making it go away (even if agents are doing this process).
[1] Nothing said here actually presents a solution. Yes, I say "treat them as first class citizens", but that's not a solution. Anyone claiming that this, or anything similar, is a solution is refusing to look at all the complexities that exist. It's as obtuse as saying "creating a search engine is easy: all you need to do is index all (or most) of the sites across the web." There's so much more to the problem. It's easy to oversimplify these types of issues, which is a big part of why they still exist.
Part of the problem is that businesses treat reviewing as a second class citizen. It's not "actual work" so shouldn't be given preference, which excuses the LGTM style reviews.
I've been out of the industry for a while but I felt this way years ago. As long as everybody on the team has coding tasks, their review tasks will be deprioritized. I think the solution is to make Code Reviewer a job and hire and pay for it, and if it's that valuable the industry will catch on.
I would guess that testing/QA followed a similar trajectory where it had to be explicitly invested in and made into a job to compete for or it wouldn't happen.
I could be totally wrong, but I feel like that's a thing that sounds better on paper. I'm sure there are ways to do this correctly, but every instance I've seen has created division and paid testers/QC less, and I'd say the lower pay is a strong signal of it being considered second class. Has anyone seen this work successfully?
I also think there are benefits to review being done by devs. They're already deep in the code, and review has the side benefit of broadening that scope: helping people know what others are doing. It can even serve as a way to learn and improve your own development. I guess the question is how valuable these things are.
I've used Reviewboard and Phabricator and both seem "fine" to me. Superior to Github (at the time, anyway).
I was recently looking for something in Neovim that at least presents a nice diff resembling the one in a code review.
This is a pretty cool tool for it: https://github.com/sindrets/diffview.nvim
On the branch that you are reviewing, you can do something like this:
:DiffviewOpen origin/HEAD...HEAD
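For context, in plain git the three-dot range compares against the merge base, which is roughly:
```
# diff from the merge base of origin/HEAD and the current branch up to HEAD,
# i.e. only the changes introduced on the branch under review
git diff origin/HEAD...HEAD
```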
ersc.io
I got into using Jujutsu this year. I'm liking it so far. Is there a beta access in the works?
Say more.