> Oddly they don’t seem to have figured out the generation counting trick, which is something I did come up with over twenty years ago. Combining the two ideas is what allows for there to be no reference to commit ids in the history and have the entire algorithm be structural.
Can you say more about this? What exactly is this trick you’re talking about? What are the benefits?
I've read much of the HN discussion on the previous post, and skimmed the rest, but I didn't see a couple of things addressed:
First, how could you make this deal with copies and renames? It seems to me like the pure version of this would require a weave of your whole repository.
Second, how different is this from something like jujutsu? As in, of course it's different, your primary data structure is a weave. But jj keeps all of the old commits around for any change (and prevents git from garbage collecting them by maintaining refs from the op log). So in theory, you could replay the entire history of a file at a particular commit by tracing back through the evolog. That, plus the exact diff algorithm, seems like enough to recreate the weave at any point in time (well, at any commit), and so you could think of this whole thing as a caching layer on top of what jj already provides. I'm not saying you would want to implement it like that, but conceptually I don't see the difference and so there might be useful "in between" implementations to consider.
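To make the "replay the history and recreate the weave" idea concrete, here's a rough sketch. It assumes a strictly linear history of one file, uses Python's `difflib.SequenceMatcher` as a stand-in for "the exact diff algorithm", and the names `WeaveLine`, `build_weave`, and `checkout` are made up for illustration. It ignores merges and the generation counting trick entirely, so it is not Cohen's data structure, just the caching-layer intuition:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class WeaveLine:
    text: str
    born: int                   # index of the version that introduced the line
    died: Optional[int] = None  # index of the version that removed it, if any


def build_weave(versions: List[List[str]]) -> List[WeaveLine]:
    """Replay a linear list of file versions into one interleaved weave."""
    weave: List[WeaveLine] = []
    for rev, new_lines in enumerate(versions):
        # Weave positions of the lines that are still alive before this version.
        live = [i for i, w in enumerate(weave) if w.died is None]
        old_lines = [weave[i].text for i in live]
        ops = SequenceMatcher(None, old_lines, new_lines, autojunk=False).get_opcodes()
        # Walk the opcodes back to front so earlier weave positions stay valid
        # while new lines are spliced in.
        for tag, i1, i2, j1, j2 in reversed(ops):
            if tag == "equal":
                continue
            for k in range(i1, i2):
                weave[live[k]].died = rev
            if j2 > j1:
                pos = live[i1] if i1 < len(live) else len(weave)
                weave[pos:pos] = [WeaveLine(t, rev) for t in new_lines[j1:j2]]
    return weave


def checkout(weave: List[WeaveLine], rev: int) -> List[str]:
    """Recover the file as of version `rev` by filtering the weave."""
    return [w.text for w in weave
            if w.born <= rev and (w.died is None or w.died > rev)]


history = [["a", "b", "c"], ["a", "x", "c"], ["a", "x", "c", "d"]]
weave = build_weave(history)
```

Every version in `history` can be read back out with `checkout(weave, i)`. The weave you get depends on which diff you plug in, so anyone rebuilding it from the same history needs the same deterministic diff.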
In fact, you could even specify different diff algorithms for different commits if you really wanted to. Which would be a bit of a mess, because you'd have to store that and a weave would be a function of those diff algorithms and when they were used, but it would at least be possible. (Cohen's system could do this too, at the cost of tracking lots of stuff that currently it doesn't need or want to track.) I'm skeptical that this would be useful except in a very limited sense (e.g., you could switch diff algorithms and have all new commits use the new one, without needing to rebuild your entire repository). It breaks distributed scenarios -- everyone has to agree on which diff to use for each commit. It's just something that falls out of having the complete history available.
I'm cheating with jj a bit here, since normally you won't be pushing the evolog to remotes, so in practice you probably don't have the complete history. When pushing to a remote you might want to materialize a weave or a weave-like "compiled history" and push that too/instead, just like in Cohen's model, if you really wanted to do this. And that would come with limitations on the diff used for history-less usage, since the weave has to assume a specific deterministic diff.
Discussion on the previous post in this series: https://news.ycombinator.com/item?id=47478401
Someone make a TLA+ model for this bad boy