Show HN: MechanicalSoup, Python library for automating interaction with websites
180 points by mattme on July 9, 2014 | 27 comments


An alternative to this is RoboBrowser [1], which is also based on requests + BeautifulSoup4 and seems a lot more mature.

[1] http://robobrowser.readthedocs.org/en/latest/readme.html


Note: this fails when you try to install using pip 1.1, the version that comes with Debian wheezy. If you have this problem, it's easy to work around it for your specific virtualenv, without changing the rest of the system:

- mkvirtualenv whatever

- pip install pip --upgrade

- pip install robobrowser


Haha! If I'd known, I would have used that myself.


Still, it proves the idea is a good one!


Mechanize is a core part in quite a few of my projects lately and the fact that it hasn't been modified in over 2 years has been very worrisome.

There are lots of edge cases out there on websites. Mechanize has built up years of fixes and workarounds for these; I hope that MechanicalSoup can learn from them the easy way rather than making the same mistakes again.

I also hope that this repo grows into a bigger community of support, not just one person contributing (who could leave / get bored at any time). Looking forward to following this!


Er...that's just the Python Mechanize, right? Ruby's Mechanize has been regularly updated and patched, though I can't say I've used it to the extent that I've run into infuriating edge cases: https://github.com/sparklemotion/mechanize


Right, python's mechanize at https://github.com/jjlee/mechanize seems to be pretty stagnant.


I personally find Capybara[0] to be the happy medium for web scraping, if Python isn't a hard requirement. It has a simple API, like MechanicalSoup, but it can also easily be configured to use Selenium, node-webkit, or any other browser you want for full proper JS evaluation.

[0]http://github.com/jnicklas/capybara


Unfortunately there isn't a Capybara library for Python. There are comparable packages like Lettuce and now MechanicalSoup (to a certain extent).


This is fantastic. I've used python mechanize in some very large projects and it was very frustrating - their lack of documentation and, well, the fact that it's complete "abandonware".

I've had the mechanize repository cloned for a year now, planning to do something with it - never got around to it. Looks like MechanicalSoup just got themselves a new contributor!


Sell me MechanicalSoup over Selenium.


Selenium seems very heavy-weight to me (granted I have only used selenium server). If you don't need to interpret javascript after a page loads then you might be able to use mechanize. In my experience I've gotten better performance and a higher ease of development with mechanize over selenium or other "full" headless browsers. Different tools for different jobs I suppose, I just tend to go for the smallest tool first.

edit: replace mechanize with mechanicalsoup in the above paragraph, they are aiming to solve the same problems in the same way.


> Selenium seems very heavy-weight to me

Fair point. Perhaps try Sulfur?


I started googling, excited about discovering a new library, only to realize what you'd done there -_-


Try "Helium web automation" ;-)


Selenium and Sulfur both have a certain smell to their code, just rotten. I prefer Oxygen.


With Selenium 2, you aren't required to use the selenium-server anymore. You can drive browsers directly from the client lib.


Personally I really like selenium + PhantomJS (headless WebKit browser) since it allows you to do things like automate real user interactions in your tests, then run the test suite on your CI: http://www.realpython.com/blog/python/headless-selenium-test...

Also interesting if you're going down this road is CasperJS which is basically the same thing for JavaScript, and Velocity ( https://github.com/xolvio/velocity ) which is a test runner for Meteor that seems to run all your tests constantly and give you ~real-time feedback for TDD.

Lastly, there are many tools for tying Cucumber-style testing specs in with Capybara and these other tools, if you're into that.


they're for different use cases.

mechanicalsoup is more like zombie.js ( http://zombie.labnotes.org/ ) than selenium, chromedriver, or some other webdriver ( https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-... ) implementation in that it emulates browser functionality from within a runtime, making http requests and parsing the response directly from the python (or node or ruby or whatever) runtime rather than communicating with a browser that, in turn, makes requests to the website.
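
to make the "emulates browser functionality from within a runtime" point concrete, here's a rough stdlib-only sketch of what such a library has to do for a form: parse the html, collect the fields, and build the request it would send. this is not mechanicalsoup's actual code or api. `FormParser`, `build_submission`, the page and the example.com url are all made up for illustration, and nothing is sent over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collects the first form's action, method and named input values."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, overrides):
    """Return (method, url, encoded_body) for submitting the page's first form."""
    parser = FormParser()
    parser.feed(html)
    data = {**parser.fields, **overrides}  # caller-supplied values win
    return parser.method, urljoin(page_url, parser.action), urlencode(data)


# Hypothetical page; in practice the HTML would come from an HTTP response.
PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="user">
  <input type="password" name="password">
</form>
"""

method, url, body = build_submission(
    "http://example.com/start", PAGE, {"user": "alice", "password": "s3cret"}
)
print(method, url, body)
```

a real library would hand the method, url and body to an http client (Requests, in mechanicalsoup's case) and parse the next response the same way. no browser process is involved at any point.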

the advantages of these "emulated" browsers are that tests run faster and are easier to set up. the disadvantage is that they don't fully duplicate browser functionality, particularly for client-side javascript. i think zombie might be able to run some javascript since it's in node, but mechanical soup appears ( https://github.com/hickford/MechanicalSoup/blob/master/mecha... ) not to execute javascript at all.
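
a small sketch of what "doesn't execute javascript" means in practice, again stdlib-only and with a made-up page and a hypothetical `TextById` helper: a plain html parser sees the script as inert text, so anything the script would have written into the page is simply not there.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collects the text inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


# In a real browser the script below would fill the div with "hello".
PAGE = """
<div id="greeting"></div>
<script>
  document.getElementById("greeting").textContent = "hello";
</script>
"""

parser = TextById("greeting")
parser.feed(PAGE)
static_text = "".join(parser.text).strip()
print(repr(static_text))  # prints '' because the script never ran
```

if the content you need only exists after scripts run, that's the point where a real browser (selenium, phantomjs, etc.) becomes necessary.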

this is a nice little library that, as the README explains, fills a spot in the python ecosystem that had apparently become somewhat stagnant, but there's really not much to it other than combining Requests with beautifulsoup in order to provide a drop-in replacement for some existing api. i think this would mainly be useful for scraping rather than testing. the emulated browser and custom unittest module that ship with Django are probably better for the latter.


If you don't care what browser is making the requests, use MechanicalSoup. If you do care, use Selenium.


Does it have a javascript engine? Because we had to abandon BeautifulSoup/Mechanize over this a couple years ago and switch to HTMLUnit (Java).


There's not a lot to it so far (a single class, three tests). I wonder if the author has a road map for the project.


How does MechanicalSoup (or RoboBrowser, for that matter; this is the first I've heard of either) compare to Scrapy? http://scrapy.org/


Have any documentation on a roadmap for things as they go forward? Would love to send some PR's your way :)


This is exactly what I was looking for for my next project. I was trying to do this with Node.js to no avail. Thanks!


Pretty cool, thanks for sharing!


Would that be an equivalent to Casper.js?



