Built this last week as part of the interview process for a job. I know it's fla...

Cyph0n · on Feb 19, 2017

Excellent idea. There are a lot of interesting ways to improve this, but you have an MVP running, which is a good start.

Regarding your codebase: clear and to-the-point code, well commented, and helpful commit messages. Including a `requirements.txt` is a plus.

Good job, keep it up!

sAbakumoff · on Feb 19, 2017

Well, I believe that any code should be readable enough that comments like the one below aren't required.

        # splits a Wikipedia section into sentences
        # and then chunks/tokenizes each sentence

If I had interviewed the author, I would have asked him what the purpose of commenting like that is.

aldarn · on Feb 19, 2017

I believe it's easier to read a native language than code. I would also counter that there is no harm in comments like this, so just because you don't find them useful doesn't mean someone else won't.

sAbakumoff · on Feb 19, 2017

Okay. My point is that in a real job you don't have time to write this type of comment. Instead you have your current task to work on, the issue that was re-opened and needs to be revisited, the bug to argue with QA about, the deadline to discuss with the PM, the code review to do ASAP. You simply don't have time to write perfect code that is full of comments in the "native language".

yosamino · on Feb 19, 2017

This rather sounds like you don't have the time to not do it.

Imagine taking the time that is spent on "re"-visiting "re"-opened bugs that are vague enough to be argued about, and spending it on writing code that doesn't need these "re"s in the first place.

I concede that this might be a difficult place to get to, especially because it's a team effort as well, but I feel it's more productive and less stressful to work like that.

sAbakumoff · on Feb 19, 2017

I have been in IT too long to imagine anything like that.

sethammons · on Feb 19, 2017

That particular comment seems OK. In general, one should make the code readable to the point of not needing a comment, since comments can rot over time in legacy systems. Some things really benefit from a comment. However, I feel the following snippet is a better example of a comment that should not exist: it does not clarify the code and is just a direct English translation of simple code:

        # Iterate through article's sections
        for section in self.page.sections:

sAbakumoff · on Feb 19, 2017

Yeah that's clearly redundant

cancancan · on Feb 19, 2017

One of the purposes is to write down what you need to do as comments, and then implement each part. Like pseudo code. Saying you don't have time for it is like saying you don't have time to think through what you're trying to do.

sAbakumoff · on Feb 19, 2017

It's entirely possible to think through what you're trying to do without writing code comments. I prefer good old-fashioned pen and paper, for example.

cancancan · on Feb 19, 2017

Of course, but you don't share those papers with us :) I like those kinds of comments a lot. In a good editor it's so easy to just scan a lot of code and understand what's what. You could look at something you or someone else wrote years ago, in a different style, different paradigm, different language, ... and still have a perfectly clear picture of what the code does in no time.

To each his own I guess.

tedmiston · on Feb 19, 2017

The project is cool and I know the code is not necessarily the point. That said, if I were being picky, I'd ask why the author chose not to use docstrings. The code itself is fine but not very Pythonic. There are small inconsistencies* that running pycodestyle [1] once would have caught and could be fixed quickly — I recommend OP consider that.

*Mostly related to: whitespace / spacing, indentation, mixing single quotes and double quotes, magic numbers, naming conventions

[1]: https://pypi.python.org/pypi/pycodestyle

blazespin · on Feb 19, 2017

Super super awesome, what a brilliant idea. You might want to do pattern matching such that the answer to the question doesn't match the text of the question. Your example image shows the immediate flaw there.

alex_g · on Feb 19, 2017

Thanks, and good point. It's a very simple approach, so there are numerous weaknesses that will be improved upon with a bit more knowledge of NLP.

jdormit · on Feb 19, 2017

Did you get the job?

alex_g · on Feb 19, 2017

As far as I know they're still reviewing it.

bradgessler · on Feb 19, 2017

Good luck! If it doesn't work out for any reason let me know if you're interested in Poll Everywhere.

echelon · on Feb 19, 2017

They should have given you an offer... you clearly delivered.

I bet you're going to get some offers from making this post on HN. Make sure you have your contact info in your profile. :)

jdormit · on Feb 19, 2017

Good luck!

ViktorasM · on Feb 19, 2017

Neat and simple implementation. Consider docstrings for describing methods; they tend to integrate with IDEs a lot better than comments.
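
For readers unfamiliar with the distinction, here is a minimal sketch of the same description written as a docstring. The function name `split_section` and its naive splitting logic are hypothetical and not taken from the project; the only point is that a docstring, unlike a `#` comment, is attached to the function object at runtime.

```python
def split_section(text):
    """Split a section of text into sentences.

    Hypothetical helper for illustration only; not the project's code.
    """
    # Naive split on ". " just to keep the sketch self-contained.
    return [s.strip() for s in text.split(". ") if s.strip()]


# The docstring is stored on the function, which is where help()
# and editor tooltips can read it from.
print(split_section.__doc__)
```

A `#` comment above the function would be discarded when the file is parsed, so tools that inspect the function have nothing to show.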

raverbashing · on Feb 19, 2017

It's a nice idea

One potential improvement is to remove the common parts of answer and question (as in your Triumph example)
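
One rough way to detect that kind of overlap, sketched under the assumption that a question is a sentence with the answer blanked out; the names `words` and `answer_leaks` and the example sentences are made up for illustration and are not from the project:

```python
import string


def words(text):
    """Lowercased words of `text` with surrounding punctuation stripped."""
    stripped = (w.strip(string.punctuation).lower() for w in text.split())
    return {w for w in stripped if w}


def answer_leaks(question, answer):
    """True if any word of the answer still appears in the question."""
    return bool(words(question) & words(answer))


# Illustrative inputs only.
print(answer_leaks("The ______ Tower is a tower in Paris.", "Eiffel Tower"))
print(answer_leaks("The Eiffel Tower is in ______.", "Paris"))
```

A question that leaks could then be dropped or have the shared words blanked as well. Comparing single words is crude (common words like "the" would also count), so a real version would probably need a stop-word list.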

lappet · on Feb 19, 2017

Very cool. Can you please add more info or talk about how the grammar/parsing is set up?

alex_g · on Feb 19, 2017

Sure! Take a look at: https://github.com/alexgreene/WikiQuiz/blob/master/python/Ar...

I used nltk (natural language toolkit), which takes care of most of the hard work. It tokenizes whatever text you pass it, and even assigns each word a part-of-speech (noun, adjective, etc).

The grammar is where I tinkered the most. You can see I have 3 grammar rules set up (NUMBER, LOCATION, PROPER). nltk will go through the tokenized words and see if any sequences of words match any of the rules. If it finds a match, it groups/chunks those words together into a phrase with the tag you've specified (e.g. LOCATION).

As for the rules themselves, they're very easy to write once you understand the syntax. For example, let's look at my PROPER rule, {<NNP|NNPS><NNP|NNPS>+}

Everything in the {} is the rule. The tags inside of the <> are the parts-of-speech assigned by nltk. Translating the rule literally would be: match any sequence that has: [an NNP or an NNPS] followed by one or more of [an NNP or an NNPS]. In other words, any sequence of two or more NNP or NNPS words.
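
To make that concrete, here is a plain-Python sketch of what the PROPER rule matches. It does not use nltk and is not how nltk implements chunking; it only mimics the rule's effect on a list of (word, tag) pairs. In the Penn Treebank tagset that nltk's default tagger uses, NNP is a singular proper noun and NNPS is a plural proper noun. The function name `proper_chunks` and the hand-tagged sentence are assumptions for illustration.

```python
from itertools import groupby

# Penn Treebank tags: NNP = proper noun (singular), NNPS = proper noun (plural).
PROPER_TAGS = {"NNP", "NNPS"}


def proper_chunks(tagged):
    """Return runs of two or more consecutive NNP/NNPS words.

    `tagged` is a list of (word, tag) pairs, the shape a POS tagger produces.
    """
    chunks = []
    # groupby splits the sentence into maximal runs of proper / non-proper tokens.
    for is_proper, run in groupby(tagged, key=lambda pair: pair[1] in PROPER_TAGS):
        run_words = [word for word, _ in run]
        if is_proper and len(run_words) >= 2:
            chunks.append(" ".join(run_words))
    return chunks


# Hand-tagged example (tags written by hand, not real tagger output).
sentence = [
    ("Ada", "NNP"), ("Lovelace", "NNP"), ("was", "VBD"),
    ("born", "VBN"), ("in", "IN"), ("London", "NNP"), (".", "."),
]
print(proper_chunks(sentence))
```

Here "Ada Lovelace" is chunked because it is two proper nouns in a row, while "London" on its own is not, since the rule requires at least two.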

lappet · on Feb 19, 2017

Thanks a lot for the explanation. NNP would be Noun-Noun-Phrase and NNPS would be Noun-Noun-Phrase-Sentence I believe? I will play around with the syntax more.

tradersam · on Feb 19, 2017

Good luck mate, this is a really cool little project.

krashidov · on Feb 19, 2017

Awesome work! How long did this take you?