Parsing PHP in Go (stephensearles.com)
143 points by codezero on July 27, 2014 | 38 comments


I worked on a PHP compiler: http://phpcompiler.org. Although it only supports PHP 5.2, it has a really nice parser, which had a lot of work put into it (a ton of edge cases in particular).

It's not in Go, but it does a lot of the interesting things you asked about: static analysis, dead code elimination, transpiling. It also compiles to (pretty poor) C code.

For static analysis, there's a lot to do. Here's my PhD on the topic: http://paulbiggar.com/research/#phd-dissertation

I worked on this for about 4 years, and if my experience is indicative of working on PHP compilers in general, you have a lot of fun, and a massive amount of frustration, in front of you.


Since scrutinizer-ci took their interesting "PHP-analyser" private I've been looking for a better static analysis tool for PHP that I can contribute to. HPHPc is alright in HHVM but learning OCaml is slowing me down, so I'm definitely going to take a look at your compiler! Well done :)


And here's a PHP parser written in PHP...

https://github.com/nikic/PHP-Parser


Unlike the OP, Nikita's PHP parser is actually used for a lot of things. He wrote a script to detect code broken by syntax changes in the next version of PHP, for example.

Anthony Ferrara used it to implement PHP in PHP: https://github.com/ircmaxell/PHPPHP


nikic and Anthony are some of my favourite PHP developers. The stuff that people have been doing in PHP, what with HHVM and Hack and the PSR standards and Composer/Packagist, etc etc. is just amazing!


^ Wow is all I have to say. Solving PHP's grammatical syntax issues alone is a big thing, I commend efforts like this, it seems to work pretty well too (based on playing around with it for a few minutes).


Now we just need a Go parser written in PHP, for obvious reasons.


What about a Go compiler written in Go? Is it bootstrapped already?


It's being worked on for Go 1.4 (scheduled to release in December). It's the primary feature they want to get done for that release. There was a talk about it back in May[0]. rsc is working on c2go to convert the existing compiler to Go[1][2].

[0] http://www.confreaks.com/videos/3432-gophercon2014-go-from-c...

[1] https://code.google.com/p/rsc/source/list

[2] https://code.google.com/p/rsc/source/browse/#hg%2Fc2go


Go isn't bootstrapped, but there are parsers for Go written in Go in the stdlib. http://golang.org/pkg/go/ast/


They're working on it


For those interested:

Ruby in Go: https://github.com/grubby/grubby

Javascript in Go: https://github.com/robertkrimen/otto


Woah, I'm pretty stoked to see someone link to my Ruby implementation (née Grubby).

It seems like the authors of Golang believe that a lot of problems with languages (refactoring, updating code to work with new libraries / versions, etc) can be solved as parsing problems. Hence, Golang has a lot of good tools for parsing text.

They even ported yacc to Go (via Plan9). http://golang.org/cmd/yacc/


Such as? Do you mean the Go standard libraries for Go code?

Yacc clones exist for almost every language out there, and there are better ways to do parsing than yacc with its stone age error reporting.


I'd be really delighted if you could show me a better tool than goyacc for writing a parser in Golang, given a grammar. You're absolutely right that the error reporting in yacc isn't that modern, but it's very functional, very powerful and (best of all) a lot of people have experience with it.

I certainly couldn't find any better tools in Golang when I started, but I wouldn't be surprised if someone had started one since.


Tools like ANTLR, for example.

Parser generators based on attribute grammars are another example.

The language is called Go.


How performant is this or other similar projects (pfff, PHP-Parser)? Are any of them a viable base for improved PHP support in text editors (say vim, st, atom)?


Parsing PHP in OCaml: https://github.com/facebook/pfff

try ./pfff -dump_php /path/to/file.php


Parsing, lexing and the general task of writing compilers is such a breeze in OCaml. I remember when I was in college, one of our year projects was to write a small compiler for a subset of PostScript using ocamllex and ocamlyacc. I couldn't believe how nice and natural it felt. What a great language.


Personally I would be a lot more interested in a good platform for writing static analysis tools. I believe the community in general would take a more immediate benefit (and what a benefit!...) from this than from a lone PHP to Go transpiler.


PHP has to be one of the most transpiled languages.

C++ (well Hack->C++), .NET, Java, Python and now Go (probably others).

All in various states of incompleteness, though HHVM isn't going away anytime soon.


Javascript has its fair share of transpiledness.


Usually JS is the compilation target, not the language being compiled, though.


Are there any numbers on performance vs php itself?


It's not an interpreter - it parses PHP code into an abstract-syntax-tree (list of entities like open-if, variable, assignment, etc).
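
To make "abstract syntax tree" concrete, here's a rough sketch in Go of a tree for the PHP snippet `if ($x) { $y = 1; }`. The node types (If, Assignment, Variable, IntLiteral) are invented for illustration and are not the linked project's actual types; the tree is built by hand, not by a parser.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical AST node interface, not the linked parser's API.
type Node interface {
	Dump(indent int) string
}

type Variable struct{ Name string }

type IntLiteral struct{ Value int }

type Assignment struct {
	Target Variable
	Value  Node
}

type If struct {
	Cond Node
	Body []Node
}

// pad returns two spaces per indentation level.
func pad(n int) string { return strings.Repeat("  ", n) }

func (v Variable) Dump(indent int) string {
	return pad(indent) + "Variable $" + v.Name + "\n"
}

func (l IntLiteral) Dump(indent int) string {
	return fmt.Sprintf("%sInt %d\n", pad(indent), l.Value)
}

func (a Assignment) Dump(indent int) string {
	return pad(indent) + "Assignment\n" + a.Target.Dump(indent+1) + a.Value.Dump(indent+1)
}

func (i If) Dump(indent int) string {
	out := pad(indent) + "If\n" + i.Cond.Dump(indent+1)
	for _, stmt := range i.Body {
		out += stmt.Dump(indent + 1)
	}
	return out
}

func main() {
	// Hand-built tree for the PHP snippet: if ($x) { $y = 1; }
	tree := If{
		Cond: Variable{Name: "x"},
		Body: []Node{
			Assignment{Target: Variable{Name: "y"}, Value: IntLiteral{Value: 1}},
		},
	}
	fmt.Print(tree.Dump(0))
}
```

A parser's whole job is producing a structure like this from text; running or translating the program is a separate step that consumes the tree.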


But with a transpiler, it could become an IL to PHP compilation via Go...?

Not that it would be faster or better than say, HHVM or any other of a number of compilers for PHP [1] but my knowledge of that space is quite limited.

[1] http://stackoverflow.com/questions/1408417/can-you-compile-p...


This could be a step in that process, but in the grand scheme of things required to compile one language to another, the mere front-end parser is not generally all that significant a portion of the effort. The vast majority of the effort would be the bug-for-bug compatible implementation of PHP semantics and base libraries and functionality.

("Bug-for-bug" here does not mean that PHP has a lot of bugs per se. What it is is the highest level of compatibility. An emulator of a game console strives to be "bug-for-bug" compatible, for instance. Programming and programming languages being what they are, anything less often turns out to be surprisingly non-linearly less useful, i.e., "80% compatible" isn't anywhere near "80% useful".)


Go is pretty nice for parser/compiler applications because it is fast and the runtime doesn't take too long to startup.


ML languages are way better, especially given sum types and pattern matching.


I agree for tree manipulation - I don't necessarily agree for writing recursive descent parsers.

But I admit, I read the OCaml and Haskell compilers' source code, and it was pretty nice.


Please don't make a PHP transpiler. :(


Why not?

This could be a crucial tool for companies backporting legacy PHP code into a new language.


I agree with you (that it could be an important tool).

However I will say that the VB6->VB.Net transpiler which Microsoft produced (and clearly spent significant amounts of effort on) was pretty terrible. And that is one of the most "complete" transpilers I know of...

The problem is that for a transpiler to produce "good" output code it needs to have a deep understanding of both context and intent. This is particularly important when converting from one language to another with slightly different underlying concepts (like VB5-6 vs. VB.Net). Without that understanding it just produces spaghetti code that will technically compile (*although often it didn't in the VB6->VB.Net example) but is unmaintainable.

I liken it to Microsoft Word's HTML engine. Word can produce websites, and those websites technically looked correct in most browsers, but they became an unmaintainable mess in the medium to long term. A lot of transpilers have the same issue.

The best thing I can say about transpilers is that they're very good for a starting point (assume 100% refactoring anyway) and converting simplistic data storage vehicles (e.g. classes with tons of constants).


There's also the problem of dealing with the standard library. Not everything has a directly equivalent function. So you'll make your app dependent on an obscure library based on another language's standard library.


More advanced transformers even handle direct transformation of library calls to "native" library calls in the target language. I think it's mostly things that try to take advantage of syntax that is already pretty similar - Processing.js transforming from Java to JavaScript, for example, that decide it would be easier to do a relatively simple syntax transformation and then implement some sort of wrapper for function calls (as you describe) than to do a potentially more in depth and complicated transformation.


The migration tool that came with Visual Studio 2002-2008 is a licensed product from a third party company (a "lite" version actually). The tool was demonstrated (and this fact mentioned) in some Channel9 videos (Microsoft website) about ten years ago.

Word and FrontPage used the same COM code based on Trident (IE). FrontPage is dead, the successor Expression Web used a new HTML engine and is dead as well. The second FrontPage successor that still used the Trident engine was SharePoint Designer 2007. Version 2010+ lacks most layout features as the old FrontPage based code generated ugly HTML4. Word 2010 (and probably also 2013) still generates (ugly) HTML4 with inlined VML (graphics, WordArt) based on older Trident.

And there is also InfoPath 2003-2013 (dead as of 2014) that is based on a modified FrontPage/Trident code. It uses CAB based archive format to store XML and XSD files that resemble the user defined form data. The InfoPath WebForms are generated server side based on the XML stylesheet and XML data. Microsoft is working on a successor to InfoPath merged with other Office products and mobile compatible.


Well, Facebook made one (HipHop) and ended up with 1GB executables.


And to do so, had to refactor all their code to avoid any dynamic code, which is a bit painful. Moving over to a JIT with HHVM was a much better idea.



