
So... anyone wanna run a quick scrape to dump the full list? I'd be interested to see how obscure they get.


This is easy to do in Python if anybody wants to try:

  #!/usr/bin/python

  import requests
  from bs4 import BeautifulSoup

  BASE_URL = 'https://www.netflix.com/browse/genre/'
  # Paste the cookies from a logged-in browser session here.
  HEADERS = {'Cookie': '<nf-cookies-copied-from-browser>'}

  # Try every genre ID from 0 through 76000.
  for g_id in range(76001):
      response = requests.get(BASE_URL + str(g_id), headers=HEADERS)
      soup = BeautifulSoup(response.text, 'lxml')  # needs the lxml package installed
      genre_els = soup.find_all('span', {'class': 'genreTitle'})
      # Pages without a genre title element are skipped.
      if genre_els:
          print(str(g_id) + ',' + genre_els[0].text)


Seems pretty easy, but on my OS X machine I had to:

- install requests

- install beautifulsoup4

- still had problems with the XML parser in Python 2.7

- python3 wouldn't recognize the installed version of requests

- I gave up; looks like I need to take the time to study Python web scraping from a book.

- >>> quit("I'll have to try this later when I have more time")
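One way to narrow down this kind of setup trouble is to ask the interpreter itself what it can import. This is only a minimal sketch, assuming the three packages the scraper above relies on (requests, bs4, lxml); the helper names missing_packages and pick_parser are made up for illustration:

```python
import importlib.util
import sys


def missing_packages(names=("requests", "bs4", "lxml")):
    """Return the package names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pick_parser():
    """Prefer lxml when it is installed, else fall back to the stdlib parser."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


# Show which interpreter is running and what it is missing.
print("interpreter:", sys.executable)
print("missing:", missing_packages())
print("parser to pass to BeautifulSoup:", pick_parser())
```

Run it with the same command you use for the scraper (python or python3). Installing with `python3 -m pip install requests beautifulsoup4 lxml` ties the packages to that exact interpreter, which is the usual fix when one Python version can't see a package installed for another. Passing 'html.parser' to BeautifulSoup instead of 'lxml' avoids the extra parser dependency entirely.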


No regex? :)
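For the curious, a regex-only version of the title extraction might look something like this. It is a rough sketch that assumes the page marks up the title as a span with class genreTitle, as the scraper above implies; the function name genre_title is invented here, and a regex like this is more fragile than a real HTML parser:

```python
import html
import re

# Matches <span ... class="... genreTitle ..." ...>TITLE</span>
GENRE_TITLE_RE = re.compile(
    r'<span[^>]*\bclass="[^"]*\bgenreTitle\b[^"]*"[^>]*>(.*?)</span>',
    re.DOTALL,
)


def genre_title(page):
    """Return the genre title found in the page source, or None."""
    match = GENRE_TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and trim surrounding whitespace.
    return html.unescape(match.group(1)).strip()


# Made-up sample markup, not a real Netflix response.
sample = '<div><span class="genreTitle">Action &amp; Adventure</span></div>'
print(genre_title(sample))
```

In the loop, genre_title(response.text) would replace the BeautifulSoup lookup, at the cost of breaking as soon as the markup nests tags inside the span or uses single-quoted attributes.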

It would be interesting to compare the lists based on countries.
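If someone does dump the list from two countries, the comparison itself is just set arithmetic on the IDs. A small sketch, assuming each dump is the scraper's "id,title" output saved as text; parse_dump and compare are hypothetical names, and the sample rows below are invented purely to show the shape of the data:

```python
def parse_dump(text):
    """Parse 'id,title' lines into a dict mapping genre ID to title."""
    genres = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first comma only, since titles may contain commas.
        g_id, _, title = line.partition(',')
        genres[int(g_id)] = title.strip()
    return genres


def compare(a, b):
    """Return (genres only in a, genres only in b), keyed by genre ID."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    return only_a, only_b


# Invented sample dumps for two countries.
country_a = parse_dump("1,Example Genre A\n2,Example Genre B, Extended\n")
country_b = parse_dump("2,Example Genre B, Extended\n3,Example Genre C\n")
only_a, only_b = compare(country_a, country_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

This compares by ID; whether the same ID maps to the same genre in every country is an open question that the dumps themselves would answer.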



