
Scraping thoroughly is challenging, especially in an environment where you don’t have (or don’t want) a JavaScript runtime.

For content extraction, I found the approach the Postlight library takes quite neat. It scores individual HTML nodes based on some heuristics (text length, link density, CSS classes). It then selects the nodes with the highest score. [1] I ported it to Swift for a personal read-later app.

[1] https://github.com/postlight/parser
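To make the scoring idea concrete, here is a minimal sketch in Python using only the standard library. It is not Postlight's actual code: the class-name hints, the +/-25 weights, the list of block tags, and the names Candidate, ScoringParser and best_candidate are all made up for illustration. It only shows the general shape of "score by text length, penalize link density, nudge by CSS class, pick the highest".

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical hints and weights, chosen for illustration only.
POSITIVE_HINTS = frozenset({"article", "content", "post", "entry"})
NEGATIVE_HINTS = frozenset({"nav", "footer", "sidebar", "menu"})
BLOCK_TAGS = frozenset({"div", "article", "section", "main"})


@dataclass
class Candidate:
    tag: str
    classes: frozenset
    text_len: int = 0        # characters of text credited to this node
    link_text_len: int = 0   # the part of that text sitting inside <a>

    def score(self) -> float:
        if self.text_len == 0:
            return 0.0
        link_density = self.link_text_len / self.text_len
        score = self.text_len * (1.0 - link_density)
        if self.classes & POSITIVE_HINTS:
            score += 25
        if self.classes & NEGATIVE_HINTS:
            score -= 25
        return score


class ScoringParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open = []        # stack of currently open block candidates
        self.candidates = []  # every block candidate seen so far
        self.link_depth = 0   # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            classes = frozenset((dict(attrs).get("class") or "").split())
            node = Candidate(tag, classes)
            self.open.append(node)
            self.candidates.append(node)

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in BLOCK_TAGS and self.open:
            self.open.pop()

    def handle_data(self, data):
        n = len(data.strip())
        if n and self.open:
            # Simplification: credit text to the innermost open block only.
            node = self.open[-1]
            node.text_len += n
            if self.link_depth:
                node.link_text_len += n


def best_candidate(html: str):
    """Return the highest-scoring block, or None if there are no blocks."""
    parser = ScoringParser()
    parser.feed(html)
    parser.close()
    return max(parser.candidates, key=Candidate.score, default=None)
```

Crediting text only to the innermost block keeps the outermost wrapper from winning by default. A real implementation would more likely propagate a share of each paragraph's score up to its ancestors and handle malformed markup more carefully.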



For getting the HTML, you can use microlink, just passing the URL to https://html.microlink.io/{url}, like https://html.microlink.io/https://example.com
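A small Python sketch of that URL pattern, in case it helps. The function names are hypothetical, and the plain concatenation simply mirrors the example above; whether target URLs with query strings need percent-encoding is not stated here, so treat that part as an open assumption.

```python
from urllib.request import urlopen

MICROLINK_HTML_BASE = "https://html.microlink.io/"


def microlink_html_url(target_url: str) -> str:
    """Append the target URL to the base, as in the example above."""
    return MICROLINK_HTML_BASE + target_url


def fetch_html(target_url: str, timeout: float = 10.0) -> str:
    """Fetch the HTML for target_url through the endpoint (needs network)."""
    with urlopen(microlink_html_url(target_url), timeout=timeout) as response:
        # Fall back to UTF-8 if the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("https://example.com") would request https://html.microlink.io/https://example.com and return the body as a string.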


This is pretty cool. Care to share your Swift port?


Not planning to. It’s my first Swift/iOS project. I neither want to polish it nor maintain it publicly. Happy to share it privately, email is in the bio. I’m planning on a blog post describing the general approach though!


Care to share the Swift port?



