Astro - Hacker News

10 comments

ontouchstart 13 hours ago

How do you prevent AI agents from scraping your data the same way you scrape HF?
For example, I can “cache” your page as a shared link in this comment
https://www.openpaperdigest.com/paper/paperdebugger-a-plugin...
Or in a gist somewhere:
https://gist.github.com/ontouchstart/38d80cab66794014d17e193...
Then I can have a bot scrape these pages, with context, as training data.
This could get out of hand for you in inference costs. Then you'd need VC money to sustain your website. I wish you the best of luck getting there.
- ontouchstart 13 hours ago
  
  The reason I said that is that I already have a POC that uses an LLM to go to a gist and do something with the data in it.
  https://gist.github.com/ontouchstart/03f4c7ee853061772b479d9...
brihati 3 days ago

Really interesting project; it nudges you toward learning instead of mindless feeds.
One suggestion: could you add tags to the research papers so readers can more easily filter by their interests? For example, I’m looking to follow recent work from top venues like NeurIPS specifically on training code-oriented LLMs. Tagging would make it much easier to dive into topics like that.
Thanks for your effort.
- davailan 3 days ago
  
  Thanks, good suggestion!
4 days ago

[deleted]
pentaphobe 4 days ago

This is really neat!
If there were one of these for non-AI papers, I'd easily lose hours each day.
Totally off topic, but come to think of it, I'd love to see more feeds support anti-bubbling (show me _less_ of what I've frequently consumed).
- davailan 4 days ago
  
  Thanks!
  What topics are you interested in?
  Two different directions I'm thinking of for Open Paper Digest:
  - either some recommendation algorithm that figures out which topics you are interested in and serves you papers based on that. It would need a good way to get signals, though. That's why I'm now bootstrapping the process with Hugging Face Trending Papers, but that immediately constrains the topics.
  - or more search-driven, where you type "I'd like to read about X" and it starts your feed.
  With regards to anti-bubbling: interesting thought, a "reverse" recommendation algorithm...
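A minimal sketch of that "reverse" recommendation idea, assuming each candidate paper carries a single topic label and the reader's history is just a list of topics already read (both hypothetical data shapes, not how Open Paper Digest actually stores anything). Each topic is penalised by how often it has been consumed, so unfamiliar topics float to the top:

```python
from collections import Counter


def anti_bubble_rank(candidates, history, alpha=1.0):
    """Rank candidate papers so frequently consumed topics come last.

    candidates: list of (paper_id, topic) pairs (hypothetical shape)
    history: list of topics the reader has already read (hypothetical shape)
    alpha: hypothetical knob for how strongly familiar topics are pushed down
    """
    counts = Counter(history)  # topic -> how many times it was read

    def score(item):
        _, topic = item
        # Unseen topics score 1.0; heavily read topics approach 0.
        return 1.0 / (1 + counts[topic]) ** alpha

    # Highest score first; ties keep their incoming order because sorting is stable.
    return sorted(candidates, key=score, reverse=True)


history = ["llm", "llm", "llm", "vision"]
candidates = [("p1", "llm"), ("p2", "vision"), ("p3", "biology"), ("p4", "llm")]
print([pid for pid, _ in anti_bubble_rank(candidates, history)])
# ['p3', 'p2', 'p1', 'p4']
```

Ties keep their incoming order (e.g. HF trending order), since Python's sort stays stable even with `reverse=True`; `alpha` is a made-up parameter, not something from the project.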
  - stym06 3 days ago
    
    You could just rank the papers, and show trending ones as a separate tab.
    For filters, create a set of pre-defined tags and let the LLM choose one of your pre-defined tags from the paper's summary.
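A rough sketch of the "let the LLM pick from pre-defined tags" idea. The model call itself is left out; `ALLOWED_TAGS`, `normalize_tag` and `filter_by_tag` are hypothetical names, and the point is only that whatever text the model returns gets snapped onto the fixed tag set (or a fallback) before it is used for filtering:

```python
# Hypothetical pre-defined tag set; a real site would pick its own.
ALLOWED_TAGS = {"llm-training", "code-generation", "computer-vision",
                "reinforcement-learning", "agents"}


def normalize_tag(raw_llm_output, allowed=ALLOWED_TAGS, fallback="other"):
    """Map the LLM's free-text answer onto exactly one allowed tag.

    raw_llm_output is whatever the model returned when asked to choose
    one tag from `allowed` for a paper summary (the call is not shown).
    """
    # Trim whitespace, stray quotes/periods, then normalise case and spacing.
    candidate = raw_llm_output.strip().strip(".\"'").lower().replace(" ", "-")
    return candidate if candidate in allowed else fallback


def filter_by_tag(papers, tag):
    """papers: list of dicts with a 'tag' key (hypothetical shape)."""
    return [p for p in papers if p["tag"] == tag]


papers = [
    {"id": "a", "tag": normalize_tag(" Code Generation ")},
    {"id": "b", "tag": normalize_tag('"agents".')},
    {"id": "c", "tag": normalize_tag("quantum chemistry")},
]
print([p["tag"] for p in papers])  # ['code-generation', 'agents', 'other']
print(filter_by_tag(papers, "agents"))  # [{'id': 'b', 'tag': 'agents'}]
```

Validating against a closed set keeps the filter UI stable even when the model answers with odd casing, punctuation, or a tag that was never offered.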
  - pentaphobe 4 days ago
    
    > what topics are you interested in?
    That's just it - any list I give would probably miss the mark. I guess it all ties back to computational thinking in some way? (physics, neuroscience, rendering algorithms, medicine, linguistics, category theory)
    Perhaps if recommendation algorithms could be that generalised, it would scratch most of the itch for a good anti-bubble.
    But it still misses that special sauce of discovering papers/topics I didn't know I was interested in.
    Libraries and stumbling into random university lectures did this very well (or newsagents, video shops, etc.) -- broadening rather than narrowing.
    LLMs / vector space seem well placed to automate this kind of expansive/lateral matching -- but it does seem we (or marketers) tend to build recommenders around the assumption that individuals' interests are a singularity to zero in on... (and so likely train our models for the same)
    Anyway - end rant - thanks again, really cool project! Clearly got me inspired :)
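One way to read the "expansive/lateral matching" idea in vector terms, as a sketch under assumptions: represent the reader's history as a single embedding (`profile`, e.g. an average of embeddings of papers already read) and recommend papers whose cosine similarity to it falls in a middle band: related, but not near-duplicates. The band limits `low`/`high` and the toy 2-D vectors are made up for illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lateral_picks(profile, papers, low=0.3, high=0.8):
    """Return ids of papers that are related to the profile but not too close.

    profile: hypothetical embedding summarising what the reader has read
    papers: dict of paper id -> embedding vector (hypothetical shape)
    low/high: hypothetical similarity band; outside it is "unrelated" or "more of the same"
    """
    picks = []
    for pid, vec in papers.items():
        s = cosine(profile, vec)
        if low <= s <= high:
            picks.append((s, pid))
    # Least similar first, i.e. the most "lateral" suggestions lead the list.
    return [pid for _, pid in sorted(picks)]


profile = [1.0, 0.0]
papers = {"same": [1.0, 0.0], "near": [1.0, 1.0], "side": [1.0, 2.0], "orth": [0.0, 1.0]}
print(lateral_picks(profile, papers))  # ['side', 'near']
```

"same" (similarity 1.0) is dropped as more of the same and "orth" (0.0) as unrelated; a real system would tune the band and use learned embeddings rather than toy vectors.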
Micoloth 4 days ago

Suuper cool! I agree with the other commenter that having more scientific fields would be very interesting as well. Maybe you could filter them by topic.