Show HN: MechanicalSoup, Python library for automating interaction with websites

flexd · on July 9, 2014

An alternative to this is RoboBrowser [1], which is also based on requests + BeautifulSoup4 and seems a lot more mature.

[1] http://robobrowser.readthedocs.org/en/latest/readme.html

rahimnathwani · on July 14, 2014

Note: this fails when you try to install using pip 1.1, the version that comes with Debian wheezy. If you have this problem, it's easy to work around it for your specific virtualenv, without changing the rest of the system:

- mkvirtualenv whatever

- pip install pip --upgrade

- pip install robobrowser

mattme · on July 10, 2014

Haha! If I'd known, I would have used that myself.

mattme · on July 12, 2014

Still, it proves the idea is a good one!

wc- · on July 9, 2014

Mechanize is a core part in quite a few of my projects lately and the fact that it hasn't been modified in over 2 years has been very worrisome.

There are lots of edge cases out there on websites. Mechanize has built up years of fixes and workarounds for these; I hope that MechanicalSoup can learn from them the easy way rather than making the same mistakes again.

I also hope that this repo grows into a bigger community of support, not just one person contributing (who could leave / get bored at any time). Looking forward to following this!

danso · on July 9, 2014

Er...that's just the Python Mechanize, right? Ruby's Mechanize has been regularly updated and patched, though I can't say I've used it to the extent that I've run into infuriating edge cases: https://github.com/sparklemotion/mechanize

wc- · on July 9, 2014

Right, python's mechanize at https://github.com/jjlee/mechanize seems to be pretty stagnant.

lazerwalker · on July 10, 2014

I personally find Capybara[0] to be the happy medium for web scraping, if Python isn't a hard requirement. It has a simple API, like MechanicalSoup, but it can also easily be configured to use Selenium, node-webkit, or any other browser you want for full proper JS evaluation.

[0]http://github.com/jnicklas/capybara

actionscripted · on July 10, 2014

Unfortunately there isn't a Capybara library for Python. There are comparable packages like Lettuce and now MechanicalSoup (to a certain extent).

Deusdies · on July 10, 2014

This is fantastic. I've used python mechanize in some very large projects and it was very frustrating - their lack of documentation and, well, the fact that it's complete "abandonware".

I've had the mechanize repository cloned for a year now, planning to do something with it, but never got around to it. Looks like MechanicalSoup just got themselves a new contributor!

diminoten · on July 9, 2014

Sell me MechanicalSoup over Selenium.

wc- · on July 9, 2014

Selenium seems very heavy-weight to me (granted I have only used selenium server). If you don't need to interpret javascript after a page loads then you might be able to use mechanize. In my experience I've gotten better performance and a higher ease of development with mechanize over selenium or other "full" headless browsers. Different tools for different jobs I suppose, I just tend to go for the smallest tool first.

edit: replace mechanize with mechanicalsoup in the above paragraph, they are aiming to solve the same problems in the same way.

dkhar · on July 10, 2014

> Selenium seems very heavy-weight to me

Fair point. Perhaps try Sulfur?

mhluongo · on July 10, 2014

I started googling, excited about discovering a new library, only to realize what you'd done there -_-

mherrmann · on July 10, 2014

Try "Helium web automation" ;-)

MrMeker · on July 10, 2014

Selenium and Sulfur both have a certain smell to their code, just rotten. I prefer Oxygen.

hugs · on July 10, 2014

With Selenium 2, you aren't required to use the selenium-server anymore. You can drive browsers directly from the client lib.

seanp2k2 · on July 10, 2014

Personally I really like selenium + PhantomJS (headless WebKit browser) since it allows you to do things like automate real user interactions in your tests, then run the test suite on your CI: http://www.realpython.com/blog/python/headless-selenium-test...

Also interesting if you're going down this road is CasperJS which is basically the same thing for JavaScript, and Velocity ( https://github.com/xolvio/velocity ) which is a test runner for Meteor that seems to run all your tests constantly and give you ~real-time feedback for TDD.

Lastly, there are many things to tie in Cucumber-style testing specs with Capybara and these other tools if you're into that.

auvrw · on July 10, 2014

they're for different use cases.

mechanicalsoup is more like zombie.js ( http://zombie.labnotes.org/ ) than selenium, chromedriver, or some other webdriver ( https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-... ) implementation in that it emulates browser functionality from within a runtime, making http requests and parsing the response directly from the python (or node or ruby or whatever) runtime rather than communicating with a browser that, in turn, makes requests to the website.
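To make the "emulated browser" idea concrete, here is a minimal sketch of the form-handling half of it: read a page's HTML, collect the form's fields, and build the request a browser would send. It is not MechanicalSoup's API. It uses only Python's standard library in place of Requests and BeautifulSoup so that it runs without dependencies, and the page, URL, field names, and the `FormCollector` / `build_submission` helpers are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormCollector(HTMLParser):
    """Collect the first form's action, method, and named input values."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep pre-filled values such as hidden CSRF tokens.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is considered


def build_submission(page_url, html, overrides):
    """Return (method, url, encoded_body) for the first form on the page."""
    parser = FormCollector()
    parser.feed(html)
    data = {**parser.fields, **overrides}
    return parser.method, urljoin(page_url, parser.action), urlencode(data)


# Hypothetical page and credentials, for illustration only.
PAGE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="user">
  <input type="password" name="password">
</form>
"""

method, url, body = build_submission(
    "http://example.com/start", PAGE, {"user": "alice", "password": "s3cret"}
)
print(method, url, body)
```

An HTTP client would then send `body` to `url` using `method`. No browser process is involved at any point, which is where the speed and the easy setup come from.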

the advantages of these "emulated" browsers are that tests run faster and are easier to set up. the disadvantage is that they don't fully duplicate browser functionality, particularly for client-side javascript. i think zombie might be able to run some javascript since it's in node, but mechanicalsoup appears ( https://github.com/hickford/MechanicalSoup/blob/master/mecha... ) not to execute javascript at all.
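The javascript limitation can be illustrated with a small sketch. Again this uses only the standard library's parser rather than MechanicalSoup itself, and the page and the `TextCollector` helper are hypothetical. A parser that never runs scripts sees only the markup the server sent, so content that a script would fill in later stays at its placeholder value.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping the source inside <script> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is skipped, not executed.
        if not self._in_script and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page whose price is filled in by client-side JavaScript.
PAGE = """
<p id="price">loading...</p>
<script>
  document.getElementById("price").textContent = "42 USD";
</script>
"""

parser = TextCollector()
parser.feed(PAGE)
visible_text = parser.chunks
print(visible_text)
```

Here the scraper only ever sees the "loading..." placeholder. Pages built this way are the case where a real browser driven through Selenium or similar is still needed.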

this is a nice little library that, as the README explains, fills a spot in the python ecosystem that had apparently become somewhat stagnant, but there's really not much to it other than combining Requests with beautifulsoup in order to provide a drop-in replacement for some existing api. i think this would mainly be useful for scraping rather than testing. the emulated browser and custom unittest module that ship with Django are probably better for the latter.

smellf · on July 10, 2014

If you don't care what browser is making the requests, use MechanicalSoup. If you do care, use Selenium.

goorpyguy · on July 9, 2014

Does it have a javascript engine? Because we had to abandon BeautifulSoup/Mechanize over this a couple years ago and switch to HTMLUnit (Java).

jdnier · on July 9, 2014

There's not a lot to it so far (a single class, three tests). I wonder if the author has a road map for the project.

webmaven · on July 11, 2014

How does MechanicalSoup (or RoboBrowser, for that matter; this is the first I've heard of either) compare to Scrapy? http://scrapy.org/

rhgraysonii · on July 9, 2014

Have any documentation on a roadmap for things as they go forward? Would love to send some PR's your way :)

supsep · on July 10, 2014

This is exactly what I was looking for for my next project. I was trying to do this with Node.js to no avail. Thanks!

jpd750 · on July 10, 2014

Pretty cool, thanks for sharing!

volent · on July 10, 2014

Would that be an equivalent to Casper.js?