> Yet, this shift made me re-evaluate the open source code publishing. Prior to that, I have been positive about free and open software, and considered this to be the default mode for work such as kefir. I did not require any justifications from myself to publish something. Now, however, I feel more and more that the main beneficiaries of my unpaid work are companies scraping the internet to train large language models. Currently accepted status quo in this area goes against my own intentions in licensing this work under GNU GPLv3. Publication has ceased to be the "null hypothesis" for me, and requires explicit mental justification which I am not able to provide.
I feel this pain, one of my small donation driven sites has been destroyed by crawlers who just ignore robots.txt and burn the site into the ground.
Sort of jokingly I proposed an update to the "spam fax" law:
Seems to me LLMs have changed some things. I'm not sure how it's best put, but it used to be:
- Seeing code (or a blogpost or whatever) was a result from effort where thought had gone into it. The writer paid effort so the reader didn't have to.
- There'd be some level of attachment to what you've put effort into.
With LLMs, that's undermined: it's easy to produce thoughtless imitations. Code or comments where thought didn't go into it. So, seeing some result isn't an indication of skill, but also not even an indication thought went into it.
I guess there's still something lost if someone isn't going to share code they've put thought into. -- But on the other hand, if it's just for me & I don't have to share it with a wider audience, getting LLMs to write out code isn't so expensive.. so code itself isn't necessarily something to value so much.
> Yet, this shift made me re-evaluate the open source code publishing. Prior to that, I have been positive about free and open software, and considered this to be the default mode for work such as kefir. I did not require any justifications from myself to publish something. Now, however, I feel more and more that the main beneficiaries of my unpaid work are companies scraping the internet to train large language models. Currently accepted status quo in this area goes against my own intentions in licensing this work under GNU GPLv3. Publication has ceased to be the "null hypothesis" for me, and requires explicit mental justification which I am not able to provide.
I feel this pain, one of my small donation driven sites has been destroyed by crawlers who just ignore robots.txt and burn the site into the ground.
Sort of jokingly I proposed an update to the "spam fax" law:
https://www.karlbunch.com/random/website-protection-act/
> The sender pays, not the receiver.
You have a hole here. Your web server is sending the response and the bot is receiving.
Fix that and … profit? :-)
Seems to me LLMs have changed some things. I'm not sure how it's best put, but it used to be:
- Seeing code (or a blogpost or whatever) was a result from effort where thought had gone into it. The writer paid effort so the reader didn't have to.
- There'd be some level of attachment to what you've put effort into.
With LLMs, that's undermined: it's easy to produce thoughtless imitations. Code or comments where thought didn't go into it. So, seeing some result isn't an indication of skill, but also not even an indication thought went into it.
I guess there's still something lost if someone isn't going to share code they've put thought into. -- But on the other hand, if it's just for me & I don't have to share it with a wider audience, getting LLMs to write out code isn't so expensive.. so code itself isn't necessarily something to value so much.
I put my site behind a username/password wall, to block LLM bots.
same, not worth getting 100GB of content getting scrapped every other day.
It was nice hearing about it. If this is a healthy direction for the project, then so be it. At least source to previous versions is still available.