Kythe seems like an awesome project, and kudos to Google for releasing this in the open.
For those interested in code analysis and dev tools, another library you might want to check out is srclib (I'm one of the authors). srclib is an open-source polyglot code analysis library designed for editors and code explorers. Its mission, providing a common language-independent schema for building better language-aware tools, is closely aligned with Kythe's. There's documentation and a succinct description of the problem we're trying to solve at https://srclib.org.
srclib currently supports Go, Java, Python, JavaScript, Ruby, Haskell, and soon PHP. There's a simple command line API that editor plugins can call, and currently there are srclib plugins for Emacs, Sublime, and Atom. srclib also powers https://sourcegraph.com.
I'm looking forward to seeing where Kythe goes and hopefully integrating Kythe and srclib. I think this is a huge step forward toward better tools for programmers. Just ask anyone who works/used to work at Google about the quality of their internal dev tools vs. the outside world. Thanks to the Kythe team for sharing this with the world!
My use case is that I have C# and Java code that call each other, and I need some kind of checker that validates "yes, the C# method signature is still compatible with the Java method signature". It sounds like Kythe is the right tool for that. What is your take? Does srclib support something like this too?
It's not as easy as it sounds, because the C# or Java method signature may have attributes that alter the compatibility in custom ways, so I can't just check "yes, all 3 params are compatible and the name is correct". I'd have to plug in some custom logic that takes into account what each attribute does.
What do you think?
It's hard for me to say without more details about your specific problem. srclib does provide a Data field where information like method signatures is typically emitted, so you can compare the signature for the C# method with that of the Java one, but I'm not sure if that includes the other attributes you need to know for your problem.
If there's a lot of custom logic, then it might be better to write an ad hoc tool that checks the AST of the Java against the AST of the C#. srclib and, I think, Kythe are designed for building tools that want to be language-agnostic, rather than for digging into specific language behavior.
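A minimal sketch of what such an ad hoc check might look like, in Python. Everything here is hypothetical: the `Signature` record, the attribute name, and the rule table are illustrative assumptions, not srclib's or Kythe's actual schema, and a real checker would first have to extract the signatures from each language's AST.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


# Hypothetical language-neutral signature record (not a real srclib/Kythe type).
@dataclass(frozen=True)
class Signature:
    name: str
    params: Tuple[str, ...]            # neutral type names, e.g. "int32"
    attributes: Tuple[str, ...] = ()   # attributes/annotations on the method


# A rule rewrites a signature to reflect what an attribute does.
Rule = Callable[[Signature], Signature]


def rename_to(new_name: str) -> Rule:
    """Rule for an attribute that exports the method under another name."""
    return lambda sig: replace(sig, name=new_name)


def normalize(sig: Signature, rules: Dict[str, Rule]) -> Signature:
    """Apply the custom logic for every attribute that has a registered rule."""
    for attr in sig.attributes:
        rule = rules.get(attr)
        if rule is not None:
            sig = rule(sig)
    return sig


def mismatches(a: Signature, b: Signature, rules: Dict[str, Rule]) -> List[str]:
    """Return a list of incompatibilities; an empty list means compatible."""
    a, b = normalize(a, rules), normalize(b, rules)
    problems = []
    if a.name != b.name:
        problems.append(f"name: {a.name!r} vs {b.name!r}")
    if len(a.params) != len(b.params):
        problems.append(f"arity: {len(a.params)} vs {len(b.params)}")
    else:
        for i, (pa, pb) in enumerate(zip(a.params, b.params)):
            if pa != pb:
                problems.append(f"param {i}: {pa!r} vs {pb!r}")
    return problems


# Made-up example: a C# method whose attribute exports it as "getUser".
RULES = {"ExportedAsGetUser": rename_to("getUser")}
csharp = Signature("GetUser", ("int32", "string"), ("ExportedAsGetUser",))
java = Signature("getUser", ("int32", "string"))
print(mismatches(csharp, java, RULES))
```

The attribute-specific behavior lives entirely in the rule table, so the comparison itself stays generic; whether that split fits your attributes depends on how much they change beyond names and parameter types.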
I think an interesting possible application of this tool would be source-to-source compilation between languages. For example, once Objective-C support is added, could Kythe be the basis for something like j2objc?
What does this do? I've browsed through the site for a few minutes, and still have no idea what kind of tools you could build with this that you couldn't build before.
Is this for cross-language doc generation? Refactoring tools? Something else?
Are there any concrete examples of a tool built on top of this that would otherwise be impossible / very difficult?
You can build the same tools as before -- the purpose of Kythe isn't to fundamentally change the kinds of tools you can make, it's intended to make it easier to glue those tools together.
Google uses this approach internally to generate cross-references for a huge, heterogeneous multi-language codebase. Linking across generated code, connecting documentation to its references, and exposing all those features in editors, code browsers, code-review tools, and so forth, are all a lot easier when that information has a common representation.
And of course, those problems exist even in much smaller codebases. Kythe isn't really a "product", but rather an interlanguage for tools that manipulate source code.
> Kythe isn't really a "product", but rather an interlanguage for tools that manipulate source code.
That's a concise explanation. Thanks.
Of course, the bottleneck is always in achieving widespread integration with existing tooling. Your overview lists requirements for compiler and build system instrumentation alike, as well as tools that then consume and filter the graph data. It'll be interesting to see if Kythe gains the needed mindshare for this.
You're right that the work needed to connect (say) a compiler or static analyzer to (say) an editor is usually substantial.
Right now, projects typically duplicate this work over and over again, for each combination of language and editor. We've found that for a lot of common cases you can re-use the work you did to instrument a given compiler and/or editor for others to mix and match, if they can agree on a format for the data.
Obviously this doesn't work for every such problem, but in our experience it's surprisingly effective for most of the day-to-day tasks engineers need to solve, such as figuring out what will break if I commit this change to the repo.
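To make the "what will break" example concrete, here is a small Python sketch of a reverse-reference query over a shared edge list. The triple format, the `ref` edge kind, and the node names are all made up for illustration and are not Kythe's actual schema; the point is only that once indexers for different languages emit edges in one agreed format, a single query can span all of them.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical edge format: (source_node, edge_kind, target_node).
Edge = Tuple[str, str, str]


def build_reverse_index(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Map each node to the set of nodes that reference it directly."""
    rev: Dict[str, Set[str]] = defaultdict(set)
    for source, kind, target in edges:
        if kind == "ref":
            rev[target].add(source)
    return rev


def affected_by(changed: str, edges: Iterable[Edge]) -> List[str]:
    """Return every node that directly or transitively references `changed`."""
    rev = build_reverse_index(edges)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in rev.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(changed)  # drop the changed node itself if a cycle reached it
    return sorted(seen)


# Edges as they might be emitted by separate per-language indexers.
EDGES = [
    ("java:Client.fetch", "ref", "proto:User.id"),
    ("go:server.Handle", "ref", "proto:User.id"),
    ("py:report.main", "ref", "java:Client.fetch"),
    ("go:server/handler.go", "defines", "go:server.Handle"),
]

print(affected_by("proto:User.id", EDGES))
```

Neither function knows which language a node came from; that is the part a common data format buys you.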
A similar effort by Facebook, open-sourced 4 years ago (I'm one of the authors): http://github.com/facebook/pfff/wiki/Main
It has indexers for PHP, C, Java, and OCaml, and preliminary support for many other languages.
Seems like most code editors these days have reached that Microsoft Excel point where most of the requested features are already present, and what remains is a matter of usability and better ways to help users learn these inherently complicated tools. I'm constantly surprised at how many really bright people aren't using their debugging and profiling tools effectively.
The big features I'd like to see are more around collaboration and remote execution. The ability to easily share, search, and remotely debug a big stack would be great. GitHub has taken some big steps forward on that, but I'd love to wrap that up into the editor. Use cases include natively connecting to a coworker's editor to see what is failing or to review some code.
It's not very clear at all what the vision is and how this is supposed to be used. I can make guesses, but some clarity would be great if any of the Google people involved are in this thread.