Show HN: Rockets – Reddit and websockets (github.com/rtheunissen)
98 points by rtheunissen on Aug 1, 2015 | hide | past | favorite | 28 comments


The animated logo immediately caught my attention. Kudos to Ken, the illustrator [1]. Great portfolio.

[1] http://cargocollective.com/kensamonte/


Yeah, Ken is very talented. I'll pass your appreciation along.


Interesting. It relies on a central service that's polling Reddit, though. Why not a distributed, peer-to-peer system instead? By combining the power of hundreds of individual nodes polling the site, you could have fast visibility of new content without being too obvious to the site being monitored.


I made a proof of concept like this some years ago.

It only synchronised the scores, and was running as a Chrome extension.

The idea is that each user sees the scores updated in realtime while viewing a Reddit page, but also broadcasts the updated scores back to the server every time they load a page.

That way no polling is involved, and it doesn't generate any additional load on Reddit's servers, since the updated data comes from people who were loading those pages anyway. This means I'm not bound by any API restriction (it's basically distributed scraping).

https://github.com/MasterScrat/LiveReddit


That's a very cool idea. An unintended side-effect of rockets could be that it reduces the load on reddit's servers. My theory is that there are N bots that rely on monitoring new content, and they're all polling for the same data. If those bots just hook into something like this, that's N fewer requests per second sent to reddit.com.


There's of course the side effect that the data is untrusted. An individual couldn't game it if the system depended on multiple sources, but an organized group (like 4chan) could likely make the system show whatever they want.


How might they achieve that?


I was referring to MasterScrat's idea of a browser extension. In that case, it's only a matter of saturating the service with crafted data, something that 4chan excels at doing.


It's not against the rules of reddit, and the allowed request rate is enough to grab everything even during busy periods.

You're right that it relies on a central service that's polling reddit. I've done all I can to make it as stable and scalable as possible. I've been running, monitoring and testing it for a week and it's been keeping up just fine.


Given that many people have bots doing the exact same thing, I don't think you'd have any trouble. Hell, total load on reddit would decrease because people would use your service. I don't know why reddit didn't implement something like this themselves.


Given the small number of people who would use it (relative to the entire userbase), I don't think it would ever be on their roadmap. It would take many developer hours and constant maintenance, which wouldn't be worth it for them as a company. They have more important things to build and fix.


I'm not OP, but then you'd have to solve for consensus. Solvable, but annoying.


Achieving consensus using browsers as nodes is impractical for many reasons. Firstly, there are hard limits on localStorage. Then there's the issue that nodes go offline very often (whenever someone closes their browser/tab), so the system is highly volatile: you would have to replicate data across nodes like crazy to reduce the odds of data loss, and the bandwidth use would be massive with duplicate data all over the place. Plus, you still need a server to do the signaling that coordinates consensus across all the nodes. There are also security implications to storing data inside other people's browsers (though that isn't relevant in this specific case).

There has been some hype around using WebRTC to store data only on the client side using consensus algorithms like Raft, but given the constraints most browsers have and the massive amount of complexity required to get this working, it's really not worthwhile.

I don't think these browser constraints will be lifted anytime soon; there are some very good reasons for having them in the first place. For one, most people don't want their computer's CPU, memory and internet bandwidth used up to process other people's stuff.


That seems to work really well, thanks for your work. I might use it.

One thing I'm struggling with is the data model: what exactly is returned on an event. Am I right that rockets sends the data exactly as the API would have returned it, and that's why the models aren't defined in the code?

I'm new to Reddit's API and was wondering how to get a comment's URL. I've just realised I should echo the full object and find the keys that way. I'm leaving the comment anyway; maybe you want to add some documentation/links to the proper API documentation in that direction.


> maybe you want to add some documentation/links to the proper Api documentation

That's an excellent idea, added in http://git.io/vOOKi.

> how to get a comments url

A comment permalink is a strange combination of the comment itself and its parent post. You can build a comment permalink using the following template:

  reddit.com/r/{subreddit}/comments/{link_id[3:]}/_/{id}
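In Python, filling in that template might look like the sketch below (the field names are the standard ones on a reddit comment object; `link_id[3:]` strips the `t3_` fullname prefix that marks a link/post):

```python
def comment_permalink(comment):
    """Build a comment permalink from the fields on a comment model.

    Assumes the comment dict carries reddit's usual 'subreddit',
    'link_id' (a fullname like 't3_abc123') and 'id' fields.
    """
    return "reddit.com/r/{subreddit}/comments/{link}/_/{id}".format(
        subreddit=comment["subreddit"],
        link=comment["link_id"][3:],  # strip the 't3_' fullname prefix
        id=comment["id"],
    )

print(comment_permalink({
    "subreddit": "programming",
    "link_id": "t3_abc123",
    "id": "xyz789",
}))
# reddit.com/r/programming/comments/abc123/_/xyz789
```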


Thanks a lot! I discovered link_url and that would have worked for the moment, but the real comment permalink would have taken a lot of time to find out.

The link to the json-wiki page is perfect as well, that is exactly what I searched and did not find. Thanks again.


Brilliant! Do you poll Reddit for new messages?


Yup, every second. There are roughly 30 models created per second on average.


This is absolutely amazing. I'll definitely be using it extensively.

I'm also incredibly curious about how it's implemented though. As far as I know, reddit doesn't have an API endpoint for all new comments, do they?


Thanks! Let me know if you have any problems or questions. Re: the implementation, take a look at the task that fetches new comments: http://git.io/vOGT3.

reddit's primary 'new comments' endpoint lets you fetch at most the 100 newest comments. But you have to account for times when the previously newest model is more than 100 comments in the past, i.e. you have a gap because there were too many new comments or another request took longer than expected.

The endpoints themselves are simple, but the task to make sure you don't miss any or fall behind is tricky.
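The bookkeeping can be sketched roughly like this (a simplified model of the idea, not the linked task itself; `fetch_newest` is a stand-in for the call to reddit's 'new comments' endpoint, returning up to 100 comments, newest first):

```python
def poll_new_comments(fetch_newest, seen_ids, on_comment, on_gap):
    """One polling tick: fetch the newest batch and emit unseen comments.

    `seen_ids` is a set of comment ids seen in previous ticks. If nothing
    in the batch has been seen before (and this isn't the first tick),
    more than 100 comments arrived since the last fetch: a gap.
    """
    batch = fetch_newest()
    fresh = [c for c in batch if c["id"] not in seen_ids]
    if batch and len(fresh) == len(batch) and seen_ids:
        on_gap()  # the previously newest comment fell out of the window
    for comment in reversed(fresh):  # emit oldest first
        seen_ids.add(comment["id"])
        on_comment(comment)
```

A real implementation also has to bound the size of `seen_ids` and backfill the gap once detected, which is where the trickiness lives.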


I wanted this for a script I was thinking of writing, nice!

Unfortunately, I'm a total convenience victim these days, and any node-based service is very unlikely to enter my stack. Python is okay, Go is best. I just love only deploying a single binary. I think it speaks volumes that I like writing Python but like deploying Go.


You can still make full use of this. All you need is basic websocket support. The node client [1] is small, so it wouldn't take long to port to Python or Go.
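A Python port could be as small as the sketch below, using the third-party `websockets` package. The server URL and the assumption that each frame is one JSON model are placeholders here, not the documented protocol; check the node client for the real details.

```python
import asyncio
import json


def parse_model(raw):
    """Decode one websocket frame into a model dict (frames assumed JSON)."""
    return json.loads(raw)


async def listen(url):
    """Connect to the socket server and print each model as it arrives."""
    import websockets  # third-party: pip install websockets
    async with websockets.connect(url) as socket:
        async for raw in socket:
            print(parse_model(raw))

# e.g. asyncio.run(listen("ws://example.com:3210"))  # placeholder URL
```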

[1] http://git.io/vOOZo


This is why I like Docker. I don't need to think about installing the whole stack on the server.


Websockets? That means the Reddit bot will be listening from a webpage? Is that how Reddit bots operate? I'm confused, I thought Reddit bots would be a normal computer process, listening/polling for content and making HTTP calls to the API, nothing from a web browser. Someone please explain it to me!


Not necessarily from a webpage.

Web browsers support [1] what's called the WebSocket API [2], which allows browsers to create socket connections to a socket server.

But many programming languages also provide a library to create socket connections on the server. This is what most reddit bots would be doing, as demonstrated in the rockets-client [3], which uses a Node.js websocket library called 'ws'.

So it's incredibly flexible. You could wrap a UI around it in the browser, or use it server-side with any mainstream programming language that supports websockets.

[1] http://caniuse.com/#feat=websockets [2] http://www.websocket.org/aboutwebsocket.html [3] https://github.com/rtheunissen/rockets-client


Thank you very much.


So what's Reddit's take on something like this?


The company? I don't think they mind, as long as you're following their API rules [1], which I am.

The community? Surprisingly disinterested. Not sure if I did something wrong. I thought they'd love it, but only got 2 points in r/programming and 7 points in r/node. It's not about the points but it's still discouraging.

Maybe I timed it badly, maybe I didn't bait lazy readers well enough, maybe it's actually not something cool, maybe the ranking algorithm just wasn't in my favour.

¯\_(ツ)_/¯

[1] https://github.com/reddit/reddit/wiki/API



