
The "inside-out" framing resonates with me. I have been building embeddable scripts that get dropped into third-party sites via a script tag, and the architectural decisions you are making here mirror a lot of the same trade-offs I have encountered.

The biggest challenge with any in-page tool is the tension between needing deep DOM access and maintaining isolation. For the agent UI itself, you almost certainly want iframe isolation -- CSS conflicts with the host page are a constant headache otherwise. But for the actual DOM interaction (reading page state, simulating events), you need to be in the host page context. This dual architecture (iframe for your UI, direct access for page interaction) adds complexity but is worth it for reliability across diverse sites.

One thing I would flag as a real production concern: Content Security Policy. A significant number of enterprise and SaaS sites set strict CSP headers that block inline scripts, eval, and sometimes even dynamically created script elements. If your target audience includes embedding this in production apps, you will hit CSP issues quickly. The bookmarklet approach cleverly sidesteps this for demos, but for a proper integration the host app needs to explicitly allow your script origin in its `script-src` directive.
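To make the CSP point concrete, here is a minimal sketch of a pre-flight check that an embed could run against a host's policy string. The names `parseCsp` and `isScriptOriginAllowed` are hypothetical, and the matching is deliberately simplified: it looks only at `script-src` with a fallback to `default-src`, and compares origins exactly. It ignores nonces, hashes, `'strict-dynamic'`, `script-src-elem`, host wildcards, and path matching, all of which a real browser takes into account.

```typescript
// Hypothetical helper: split a CSP policy string into directive -> sources.
function parseCsp(policy: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of policy.split(";")) {
    const tokens = part.trim().split(/\s+/).filter((t) => t.length > 0);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // When a directive is repeated, only its first occurrence is used.
    if (!directives.has(name)) directives.set(name, tokens.slice(1));
  }
  return directives;
}

// Hypothetical, simplified check: would this policy permit loading an
// external script from `scriptOrigin` (for example "https://cdn.example.com")?
function isScriptOriginAllowed(policy: string, scriptOrigin: string): boolean {
  const directives = parseCsp(policy);
  // script-src falls back to default-src when it is absent.
  const sources = directives.get("script-src") ?? directives.get("default-src");
  if (sources === undefined) return true; // nothing restricts script loading
  const origin = scriptOrigin.toLowerCase();
  return sources.some((src) => {
    const s = src.toLowerCase();
    // "*" is treated here as matching any http(s) origin; keywords such as
    // 'self' or 'none' never equal an external origin, so they fall through.
    return s === "*" || s === origin;
  });
}

const hostPolicy = "default-src 'self'; script-src 'self' https://cdn.example.com";
console.log(isScriptOriginAllowed(hostPolicy, "https://cdn.example.com"));
console.log(isScriptOriginAllowed(hostPolicy, "https://widgets.example.net"));
```

A check like this can only tell an integrator which origin to add to the policy. It cannot work around a policy that excludes the script.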

The HTML dehydration approach you described in the comments (parsing live HTML, stripping to semantic essentials, indexing interactive elements) is smart. In my experience, the fidelity of that serialization step is where most of the edge cases live. Shadow DOM, canvas elements, dynamically loaded content, iframes-within-iframes -- each one needs special handling and you end up building a progressively more complex serializer over time. Keeping that layer thin and well-tested is probably the highest-leverage investment for long-term maintainability.
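As an illustration of what a thin dehydration layer might look like, here is a minimal sketch over a plain tree type rather than the live DOM. `PageNode`, `dehydrate`, and the tag lists are assumptions made for the example, not the project's actual serializer. It drops non-semantic nodes, treats `canvas` and `iframe` as opaque, and gives each interactive element a numeric index that an agent could refer back to.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface PageNode {
  tag: string;
  text?: string;
  children?: PageNode[];
}

// Assumed tag lists for the sketch; a real serializer would need more cases.
const INTERACTIVE = new Set(["a", "button", "input", "select", "textarea"]);
const SKIPPED = new Set(["script", "style", "canvas", "iframe"]);

interface Dehydrated {
  lines: string[];      // compact, indented text representation
  indexed: PageNode[];  // indexed[i] is the element labelled [i]
}

function dehydrate(root: PageNode): Dehydrated {
  const lines: string[] = [];
  const indexed: PageNode[] = [];

  const visit = (node: PageNode, depth: number): void => {
    const tag = node.tag.toLowerCase();
    if (SKIPPED.has(tag)) return; // drop the node and its whole subtree

    const text = (node.text ?? "").trim();
    const interactive = INTERACTIVE.has(tag);
    let childDepth = depth;

    // Emit only nodes that carry text or can be interacted with;
    // pure wrapper elements are flattened away.
    if (interactive || text.length > 0) {
      const label = interactive ? `[${indexed.length}]` : "";
      if (interactive) indexed.push(node);
      lines.push(`${"  ".repeat(depth)}${label}<${tag}>${text}`);
      childDepth = depth + 1;
    }
    for (const child of node.children ?? []) visit(child, childDepth);
  };

  visit(root, 0);
  return { lines, indexed };
}

const page: PageNode = {
  tag: "div",
  children: [
    { tag: "script", text: "alert(1)" },
    { tag: "p", text: " Hello " },
    { tag: "div", children: [{ tag: "button", text: "Save" }, { tag: "a", text: "Docs" }] },
    { tag: "canvas" },
  ],
};
console.log(dehydrate(page).lines.join("\n"));
```

Keeping the core a pure function over a small node type is one way to make the layer easy to unit test. Site-specific handling (Shadow DOM, nested iframes) could then live in adapters that produce `PageNode` trees.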



Really appreciate the in-depth feedback.

Iframes and CSP are big problems. For the in-page version, I chose to leave out Shadow DOM, canvas, and iframes, although I know one developer forked a version to control same-origin iframes. I don't think it's practical to try to hack around browser security (and website security), which is why I built the browser extension. I'm hoping the bridge that lets a page call the extension can cover most use cases.

My original HTML dehydration script was ported from `browser-use`. You're absolutely right that it's getting heavier over time, and it's the key factor influencing the overall task success rate. I'm looking to refactor that part and add an extension system for developers to patch their own sites. Hope it turns out well.

Thank you for the feedback. I'll be extra cautious to keep the dehydration code maintainable.



