Good article on the high-level concepts of a knowledge graph, but it has some concerning mischaracterizations of the core functions of ontologies supporting the class schema, and the continued disparaging of competing standards-based (RDF triple-store) solutions. That the author omits the updates for property annotations using RDF* is probably not an accident, and the article glosses over the issues with their proprietary, clunky query language.
While knowledge graphs are useful in many ways, personally I wouldn't use Neo4J to build a knowledge graph as it doesn't really play to any of their strengths.
Also, I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.
> While knowledge graphs are useful in many ways, personally I wouldn't use Neo4J to build a knowledge graph as it doesn't really play to any of their strengths.
I'd strongly disagree. The built-in Graph Data Science package has a lot of nice graph algos that are easy to reach for when you need things like community detection.
The ability to "land and expand" efficiently (my term for how I think about KGs in Neo4j) is quite nice with Cypher. Retrieval performance with "land and expand" is, however, highly dependent on your initial processing to build the graph and how well you've teased out the relationships in the dataset.
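As a sketch of what I mean by "land and expand" (the label, property, and relationship names here are made up for illustration): you land on a node via an index-backed lookup, then expand outward along relationships.

```cypher
// Land: index-backed lookup of a starting entity
// (assumes an index on :Entity(name) exists)
MATCH (e:Entity {name: 'Alan Turing'})
// Expand: walk 1..2 hops along a hypothetical RELATED_TO relationship
MATCH (e)-[:RELATED_TO*1..2]->(n)
RETURN DISTINCT n.name
```

The retrieval-performance caveat shows up in that second MATCH: if the relationships weren't teased out well at ingest time, the expansion either misses things or fans out over junk edges.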
> I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.
Cypher is closely related to the GQL standard, which grew out of Cypher itself via the openCypher working group: https://opencypher.org/
> That the author omits the updates for property annotations using RDF* is probably not an accident and glosses over the issues with their proprietary clunky query language.
Not just that: w.r.t. reification, they gloss over the fact that Neo4j has the opposite problem. Unlike RDF, it is unable to cleanly represent multiple values for the same property and requires reification or clunky lists as a workaround.
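A schematic example of what I mean (hypothetical data, prefixes omitted). In RDF the same subject/predicate pair can simply appear twice:

```turtle
# Two triples with the same predicate -- no special machinery needed
:alice foaf:mbox <mailto:alice@example.org> .
:alice foaf:mbox <mailto:a.smith@example.org> .
```

In Neo4j, a property holds one value, so you'd typically fold the values into a list property instead:

```cypher
// The multiple values get packed into a single homogeneous list property
CREATE (a:Person {name: 'Alice',
                  mbox: ['alice@example.org', 'a.smith@example.org']})
```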
Not sure what the problem is here. The nodes and relationships are represented as JSON so it's fairly easy to work with them. They also come with a pretty extensive set of list functions[0] and operators[1].
Neo4j's UNWIND makes it relatively straightforward to manipulate the lists as well[2].
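A sketch of the UNWIND pattern (the Person/mbox schema is made up): it turns a list property back into one row per element so you can filter or aggregate individual values.

```cypher
// Expand the list property into rows, filter per element, then re-collect
MATCH (p:Person {name: 'Alice'})
UNWIND p.mbox AS email
WITH p, email
WHERE email ENDS WITH 'example.org'
RETURN p.name, collect(email) AS matching_emails
```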
I'm not super familiar with RDF triplestores, but what's nice about Neo4j is that it's easy enough to use as a generalized database, so you can store your knowledge graph right alongside your entities and use it as the primary/only database.
It has been a while so maybe things have changed, but the main reasons I remember are:

1) Lists stored as a property must be homogeneous lists of simple built-in datatypes, so no mixing of types, custom types, or language tagging like RDF has as first-class concepts.

2) Indexes on lists are much more limited (exact match only, IIRC), so depending on the size of the data and the search parameters it could be a big performance issue.

3) Cypher gets cumbersome if you have many multi-valued properties, because every clause becomes any(elem IN node.foo WHERE <clause>). In SPARQL it's just ?node schema:foo <clause>.
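To make that last point concrete (hypothetical schema, prefixes omitted), here's the same "any value matches" filter side by side. In Cypher the list-valued property forces a predicate over list elements:

```cypher
MATCH (n:Person)
WHERE any(elem IN n.foo WHERE elem = 'bar')
RETURN n
```

In SPARQL, multi-valued properties are just additional triples, so the match is direct:

```sparql
SELECT ?node WHERE { ?node schema:foo "bar" . }
```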
I don't think everybody should run away from property graphs for RDF or anything, in terms of the whole package they are probably the right technical call ninety-something percent of the time. I just find Neo4J's fairly consistent mischaracterization annoying and I have a soft spot for how amazingly flexible RDF is, especially with RDF-star.
GraphDB is the one I usually use. It has a web interface that eases the first steps.
Virtuoso (especially Virtuoso 7, which is open source) is also an option [a bit more command-line based].
In case you want to have a look at the SPARQL client I maintain, Datao.net, you can go to the website and drop me a mail. [I really need to update the video there as the tool has evolved a lot since that time]
The new kid on the block is very much QLever. It's still lacking some features, especially w.r.t. real-time updates, which makes it unsuitable for replacing the Wikidata SPARQL endpoint altogether just yet, but it's clearly getting there.
If you just want to try some queries, there is a public Wikidata SPARQL endpoint at https://query.wikidata.org . If you press the file-folder icon there are example queries, which let you get a feel for the query language.
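For instance, the classic starter query from those examples, listing items that are instances of house cat (Q146):

```sparql
# All items whose "instance of" (P31) is house cat (Q146), with English labels
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
```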
While I'm all for standards-based options, I think the fetishization does a disservice to anyone dipping their toes into graph databases for the first time. For someone with no prior experience, Neo4J is everywhere, and its tooling implements a ton of common graph algorithms that are huge pain points to build yourself. AuraDB provides an enterprise-level, fully managed offering, which is table stakes for, say, relational databases. Obviously the author has a bias, but one of the overarching philosophical differences between Neo4J and a triple-store solution is that the former is more flexible; that plays out in their downplaying of ontologies (which are important for keeping data manageable but are also hard to decide on and iterate on).
I can attest to that, or at least to the inverse situation. We have a giant data pile that would fit well onto a knowledge graph, and we have a lot of potential use cases for graph queries. But whenever I try to get started, I end up with a bunch of different technologies that seem so foreign to everything else we’re using, it’s really tough to get into. I can’t seem to wrap my head around SPARQL, Gremlin/TinkerPop has lots of documentation that never quite answers my questions, and the whole Neo4J ecosystem seems mostly a sales funnel for their paid offerings.
I think neo4j is a perfectly good starting point. Yeah, I feel like they definitely push their enterprise offering pretty hard, but having a fully managed offering is totally worth it IMO.
I enjoy cypher, it's like you draw ASCII art to describe the path you want to match on and it gives you what you want. I was under the impression that with things like openCypher that cypher was becoming (if not was already) the main standard for interacting with a graph database (but I could be out of date). What are the better standards-based options you're referring to?
But then you need data validation everywhere; so for language-portable JSON-LD/RDF validation there are many implementations of JSON Schema (for fixed-shape JSON-LD messages), there's the W3C SHACL (Shapes Constraint Language), and json-ld-schema combines JSON Schema + SHACL.
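A minimal SHACL sketch (hypothetical shape, prefixes omitted) of what that validation looks like on the RDF side: a shape requiring every Person to carry exactly one string-typed name.

```turtle
ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .
```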