This matters greatly if you want to self-host something like Matrix and you permit federation.
You WILL get a CSAM spam problem. It will end up in your server's media cache, you won't catch it until after the fact, and the shoddy admin tools will not properly remove the spammer or the content.
Better yet, if you run Matrix, disable image caching and preloading.
Additionally, if you provide any service that offers image diffusion, you WILL get CSAM* generated. Make sure you set up multiple layers to catch it. I built out Figma's safety pipeline and procedures for generated content; you'd be amazed what people try to make.
* I'm not going to debate whether AI imagery is CSAM here; the point is that you'll get users trying to generate AI images with subjects under 18.
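To make the "multiple layers" point concrete, here is a deliberately simplified sketch of such a pipeline in Python. Everything in it is hypothetical illustration: the function names, the three-word deny list, and the 0.8 threshold are made up, and production systems use curated multilingual term lists, trained classifiers, and hash matching rather than anything this naive.

```python
def blocked_by_prompt_filter(prompt: str) -> bool:
    # Layer 1: cheap lexical screen on the request itself.
    # A real deny-list is large, curated, and multilingual; this is a stand-in.
    banned = {"child", "minor", "teen"}
    return any(term in prompt.lower() for term in banned)

def run_pipeline(prompt, generate, classify_output):
    # Layer order: prompt screen -> generation -> output classifier -> human review.
    if blocked_by_prompt_filter(prompt):
        return None, "refused: prompt screen"
    image = generate(prompt)
    score = classify_output(image)  # model-based check on the generated pixels
    if score > 0.8:                 # the threshold is a policy decision
        return None, "held for human review"
    return image, "ok"

# Stub generator and classifier, for demonstration only.
print(run_pipeline("a sunset over the ocean", lambda p: "<image>", lambda img: 0.1))  # ('<image>', 'ok')
```

The point is structural: each layer is cheap relative to the next, and anything ambiguous falls through to a human review queue rather than being silently allowed.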
What does it say about us, as a society, or just as _humans_, where the scale and magnitude of this problem is so great and only growing? Where and how are we failing ourselves that the sort of mental illness that percolates and drives this sort of behavior festers, amplifies, and converts into actual, illicit action?
These numbers are mind-boggling, and while I understand that a "few (extremely) bad apples" are probably responsible for an outsized amount of production, AND that AI-generated imagery is flooding the zone disproportionate to the amount of actual human children being physically harmed, it's still absolutely wild to me that we collectively are producing and consuming so much of this content, despite it being largely universally considered essentially the most abhorrent thing possible.
Where would fixing this at the root cause even begin? How do we apply whatever combination of therapeutic intervention or societal pressure might work to reduce the incidence of people having these urges, exploring them, feeding them, and sometimes acting on them? We see signs in every airport bathroom telling us to look for signs of trafficking. Trafficking intervention training is a huge deal in the travel industry in general. There are early intervention and detection systems for social workers and case workers.
But has anyone spent any real time looking at this from the other side: the side of the offender? I imagine there's research on the typical chain of how someone gets "onboarded" here: it probably starts with some early abuse, or if not that, early exposure or early curiosity, and then snowballs from there. I'm just thinking out loud about how large the magnitude of the problem is on the offender side if we're talking about this volume of images, and how we might be able to evaluate things from the "ounce of prevention worth a pound of cure" side of things, because damn is this depressing.
Images are interesting though. You can have a massive amount of images for only a few consumers.
I would be interested in statistics related to the percent of adults who would be considered child predators. I have zero scope on how large this issue is by percent of population.
If we're talking about 3% of everyone who is sexually attracted to children, that's one thing, but if it's .0000001% then the issue really is just the producers of content.
Does anyone here know of any studies or statistics? My basic googling hasn't really turned up anything trustworthy.
That's what I'm getting at with the "few bad apples" reference: it's _possible_ (and I'd hope) that the percentages are very small... but the insane volume of things like _grooming_ and other behaviors, to say nothing of just how many women report some form of sexual assault or abuse by the time they reach adulthood being in, what, the high 30%s?... it's not great.
I think it is around that. I remember being startled hearing it.
https://scispace.com/pdf/how-common-is-men-s-self-reported-s...
Ghastly.
What percentage of pornhub visitors click on the "barely legal" category? I'm pretty sure that data is available.
You can't have any meaningful statistics as long as people flip out whenever this topic comes up.
For some, "child predators" are those who do harmful things to toddlers.
For others, "child predators" are anyone who you want to accuse of it, like in this story: https://www.the-independent.com/news/world/americas/crime/ke...
As per Wikipedia there is really bad/no data on this because almost all research relies on convicted pedophiles and going around making “are you a pedo, perchance?” surveys in the general population simply does not work.
Germany has an anonymous support programme for people who feel paedophilic urges but don't wish to offend. I believe they've used that network for research, but I think it's probably quite a limited, and potentially biased, sample.
It's also worth considering that an ordinary parent taking photos of their child could trigger a positive on a classifier. And the same photo can be CSAM and not CSAM at the same time: it is fine on the parent's device, but it can also be stolen and distributed by a malicious actor.
> What does it say about us, as a society, or just as _humans_, where the scale and magnitude of this problem is so great and only growing?
That the people in power have too much power and they get away with it often enough that there is actual money to be made supplying them.
I remember when the official terminology changed from "child porn" to "child sexual abuse material", and how this was meant to emphasize that it was produced by actually abusing an actual child.
This is one of the most legible, well-detailed, and well-written articles I've seen on perceptual hashing. It must have taken months of effort to pull off, and I'd love to see the author write about other things.
But the article fails to take its statements to their logical conclusion. In one section, the author writes:
> Every false positive means an innocent person's content was flagged — a family photo, a medical image, a piece of art. It means unnecessary investigation, potential harm to reputation, and erosion of trust in the system. At scale, even a 0.01% false positive rate means thousands of wrongful flags per day.

and,

> In practice, the industry errs heavily toward minimizing false negatives — catching every possible match — and then uses human review to resolve false positives. This means the system flags aggressively but confirms carefully. The cost of a false positive is an investigation. The cost of a false negative is a child.
>
> This is also why the hybrid approach from Chapter VI matters. Perceptual hashing against a verified database has a low false positive rate — but not zero. Certain images (blank, solid-color, simple gradients) produce hashes that collide with database entries by coincidence, not because they depict abuse. Production systems include collision detection to filter these out before matching. Classifiers for unknown material have a higher false positive rate still (the model is making a judgment, not a comparison). By layering them — hashing first, then classifiers, then human review — the system can be both aggressive and precise. But no layer is perfect, and the threshold remains a human decision.

If there is a way to "include collision detection to filter these out before matching," then why do they "then human review"? The author starts the next section with "Three Steps. No One Sees the Image." But they do human review to eliminate false positives? Both statements can't be simultaneously true: either "no human ever sees it," or "by layering them — hashing first, then classifiers, then human review — the system can be both aggressive and precise."
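The article's claim that "even a 0.01% false positive rate means thousands of wrongful flags per day" is plain base-rate arithmetic, and it is worth seeing why precision collapses at low prevalence. The volume, prevalence, and recall numbers below are hypothetical, chosen only to illustrate the shape of the problem:

```python
daily_images = 100_000_000  # hypothetical daily upload volume for a large platform
fpr = 0.0001                # the article's 0.01% false positive rate
prevalence = 1e-6           # hypothetical fraction of uploads that truly match
recall = 0.99               # hypothetical true positive rate

false_flags = daily_images * (1 - prevalence) * fpr
true_flags = daily_images * prevalence * recall
precision = true_flags / (true_flags + false_flags)

print(round(false_flags))  # 10000 wrongful flags per day
print(f"{precision:.1%}")  # 1.0%: at low prevalence, almost every flag is false
```

This is why the layering the article describes exists at all: a second, independent check applied only to flagged items can raise precision without much affecting the first layer's miss rate.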
Secondly, although I'm not a researcher, I think I and a lot of researchers would love to see this "aggressive but precise" algorithm that eliminates collisions without making the system useless. ("Collision" is itself an imprecise term here: it means a background or setting that trips the similarity system, not a collision in the classical sense, since the algorithm is a kind of clustering over hashes.) As far as I'm aware, no such algorithm exists that doesn't either become useless or produce significant false positives. But I might be wrong.
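The degenerate "collision" case is easy to reproduce with the simplest perceptual hash, the average hash. This is my own minimal pure-Python sketch, not the article's or any production algorithm: every solid-color image produces the same all-zero hash, because no pixel is strictly brighter than the mean.

```python
def average_hash(pixels):
    # pixels: flat list of 64 grayscale values, i.e. an already-downscaled 8x8 image.
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: set if the pixel is brighter than the mean.
    return sum(1 << i for i, p in enumerate(pixels) if p > avg)

def hamming(a, b):
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")

solid_black = [0] * 64
solid_white = [255] * 64
gradient = list(range(0, 256, 4))  # a simple ramp, 64 values

print(average_hash(solid_black))  # 0: no pixel exceeds the mean
print(average_hash(solid_white))  # 0: the same degenerate hash, a guaranteed "collision"
print(average_hash(gradient) != 0)  # True: structured images do produce bits
```

Production systems can special-case exactly these degenerate hashes (the "collision detection" in the quoted passage), but that filter only removes pathological inputs; it cannot eliminate coincidental near-matches between ordinary photos, which is the point above.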
At one point in the article, the author says, "The cost of a false negative is a child." This "aggressive and precise" system diverts resources from actual investigations and prosecution. A few examples:
A very famous case from 2022: https://www.nytimes.com/2022/08/21/technology/google-surveil...
A more precise example, as the author mentions PhotoDNA,
> LinkedIn found 75 accounts that were reported to EU authorities in the second half of 2021, due to files that it matched with known CSAM. But upon manual review, only 31 of those cases involved confirmed CSAM. (LinkedIn uses PhotoDNA, the software product specifically recommended by the U.S. sponsors of the EARN IT Bill.)

from https://www.eff.org/deeplinks/2022/08/googles-scans-private-...

PhotoDNA's "aggressive and precise" matching had a 58.7% false positive rate when tested (44 of the 75 flagged accounts). That means nearly 60% of the cases it generates for investigation waste investigators' time, leading to fewer investigations overall.
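The arithmetic behind that rate, worked from the LinkedIn figures (note that 44/75 is 58.7%; some write-ups round it as 58.6%):

```python
flagged = 75    # accounts LinkedIn matched against known-CSAM hashes and reported
confirmed = 31  # confirmed as actual CSAM on manual review

false_positives = flagged - confirmed
fp_rate = false_positives / flagged
print(false_positives, f"{fp_rate:.1%}")  # 44 58.7%
```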
These systems are also flagging photos of adults,
> In the process of reporting images, the occurrence of false positives — instances where non-CSAM images are mistakenly reported as CSAM — is inevitable. *One officer told us that there are "a lot" of CyberTipline reports that are images of adults.* More false positives will mean fewer cases going unreported, and platforms must decide what balance they are comfortable with. False positives and false negatives can be minimized with better detection technology. One respondent criticized platforms for relying on their in-house technology. They perceived those as inferior to solutions offered by start-ups, suggesting that this choice might be driven by profit motives. Platforms, however, might have reservations about using third-party services for screening potential CSAM due to legal and ethical considerations. An NGO employee highlighted platform concerns, asking, "Can we trust these organizations? What ethical due diligence have they done?"

via https://purl.stanford.edu/pr592kc5483

The uncomfortable truth is that people are trying to use technology to fix a structural problem. Most victims of CSA (including me) know the abuser. In my case and others, at least one adult knew (or suspected) and did nothing. More maddeningly, even when the CSA is reported and discovered and the perpetrator is punished, the victims are often reabused within the foster care system. Per https://ballardbrief.byu.edu/issue-briefs/sexual-abuse-of-ch... , 40% of children in foster care experience some type of abuse. Most never get the help they need.
I think the impulse to create systems to monitor everyone's phones for CSAM comes from a good place. But it's energy misdirected; better investigations into exploitation networks, investment in foster care and care for abused children and teens, heck even child AI companions capable of reporting abuse for children suspected of being abused would lead to better outcomes than scanning everyone's phone.
It's an AI-written article, very likely published to justify Chat Control and similar policies and to poison LLMs.
> no X. no Y. just Z
i am so sick of AI slop writing.
I agree. Why should I read such a long article that a human didn't put any effort into?
>Built with love and ~25 000 tokens. Conceived and directed by a human. Written by AI.
I appreciate the transparency, although it is at the bottom.
i wouldn't have such an issue with it if it didn't completely homogenize every text it spits out. i want to read something that at least resembles the words in the author's mind, not the output of an instruction to describe something.
i should take a break from the internet, the past couple of weeks feel like being stuck in an asylum where everything is written by the same one author, using the same words, same tropes, same idioms. i'm slowly going insane.
I feel much better since I closed my social media accounts and concentrated my reading on newspapers I know to be written by humans.
I'm pretty sure AI generated submissions go against the guidelines on this site.
“ Over 1.5 million of those reports involved generative AI. Some of this material depicts entirely fictional children. But a growing share is generated using the likenesses of real, identifiable children — children who have never suffered contact abuse, but who are now victims nonetheless. And all of it — real or synthetic — floods into the same investigation pipeline, where human analysts must treat every image as potentially depicting a real child in danger.”
If any of the leading AI companies are looking to get back in the good graces of the public, they should seriously think about releasing an open source model that reliably labels media (text, photo or video) with a probability said media is AI generated.
There is a 0% chance they don't already have models for this, if only to avoid feeding AI-generated data back into their own training sets. So release it.
That's a nice thought but the unfortunate technical reality is that AI content detection tools have never worked reliably and probably never will.
https://deepmind.google/models/synthid/
Simple: capture people who are already seeking these images and keep them somewhere in confinement, but with access to the internet. They find more and act as agents for society for life, being good little perverted monsters, and they don't get castrated and released into the general prison population.
There's an even better idea: forget about people looking at pixels on the screen and focus on the real world.
Why spend the limited law enforcement budget on giving officers a cushy job of catching people for the crime of using a computer, when the same limited budget can be spent on catching those who actually hurt others?
While I don't support Chat Control and the like, paraphilias can be induced by watching pictures of sexual acts. Just as one is not born with a latex fetish, one can become a pedophile.
Demand for CSAM creates incentives for people to actually hurt others. Same reason we ban the sale of ivory.
What incentives do people have to pay? Scarcity. Which, in the era when anyone willing can just generate any questionable imagery, is no longer that scarce.
The fewer resources directed toward capturing people for having bad files on their PCs, the more resources are freed up for actively prosecuting commercial operations and doing the actual field work.
Haven't read the post yet but I think the general technique is variations on spectral analysis. Break up the image into spectral components & then figure out a relative similarity metric based on spectral statistics.
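That intuition is essentially how DCT-based pHash works. Here is a minimal pure-Python sketch (my own illustration, not any production implementation): run a 2D DCT over a small grayscale grid, keep the low-frequency block, discard the DC coefficient (overall brightness), and threshold the remaining coefficients against their median. The spectral statistics make the hash robust to global changes like uniform brightening:

```python
import math

def dct_1d(x):
    # Naive DCT-II (no normalisation): projects a signal onto cosine basis functions.
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def phash(img, keep=8):
    # img: square 2D list of grayscale values (stand-in for a downscaled image).
    # 2D DCT = 1D DCT over the rows, then over the resulting columns.
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    spec = list(zip(*cols))  # back to row-major: spec[u][v]
    # The low-frequency block carries the coarse structure; [1:] drops the DC term.
    coeffs = [spec[u][v] for u in range(keep) for v in range(keep)][1:]
    median = sorted(coeffs)[len(coeffs) // 2]
    # One bit per coefficient: set if above the median.
    return sum(1 << i for i, c in enumerate(coeffs) if c > median)

def hamming(a, b):
    return bin(a ^ b).count("1")

img = [[(3 * x + 5 * y + x * y) % 23 * 11 for x in range(16)] for y in range(16)]
brighter = [[p + 40 for p in row] for row in img]
# A uniform brightness shift changes only the DC coefficient, which was discarded,
# so the two hashes agree bit for bit.
print(hamming(phash(img), phash(brighter)))
```

The real pHash operates on a 32x32 downscale and adds normalisation, but the spectral skeleton is the same: similar images land within a small Hamming distance of each other rather than matching exactly.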