Good article on the high-level concepts of a knowledge graph, but it has some concerning mischaracterizations of the core functions of ontologies supporting the class schema, and the continued disparaging of competing standards-based (RDF triple-store) solutions. That the author omits the updates for property annotations using RDF* is probably not an accident, and the article glosses over the issues with their proprietary, clunky query language.
While knowledge graphs are useful in many ways, personally I wouldn't use Neo4J to build a knowledge graph as it doesn't really play to any of their strengths.
Also, I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.
> While knowledge graphs are useful in many ways, personally I wouldn't use Neo4J to build a knowledge graph as it doesn't really play to any of their strengths.
I'd strongly disagree. The built-in Graph Data Science package has a lot of nice graph algos that are easy to reach for when you need things like community detection.
The ability to "land and expand" efficiently (my term for how I think about KGs in Neo4j) is quite nice with Cypher. Retrieval performance with "land and expand" is, however, highly dependent on your initial processing to build the graph and how well you've teased out the relationships in the dataset.
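As a sketch of what I mean by "land and expand" (the label, property, and relationship names here are made up for illustration): you land on a node via an index-backed lookup, then expand outward along relationships.

```cypher
// Land: index-backed lookup of a starting entity
// (assumes an index on :Entity(name) exists)
MATCH (e:Entity {name: 'Alan Turing'})
// Expand: walk 1..2 hops along a hypothetical RELATED_TO relationship
MATCH (e)-[:RELATED_TO*1..2]->(n)
RETURN DISTINCT n.name
```

The retrieval-performance caveat shows up in that second MATCH: if the relationships weren't teased out well at ingest time, the expansion either misses things or fans out over junk edges.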
> I would rather stab myself with a fork than try to use Cypher to query a concept graph when better standards-based options are available.
Cypher is closely related to the GQL standard, which grew out of Cypher itself via the openCypher working group: https://opencypher.org/
> That the author omits the updates for property annotations using RDF* is probably not an accident and glosses over the issues with their proprietary clunky query language.
Not just that: w.r.t. reification, they gloss over the fact that Neo4j has the opposite problem. Unlike RDF, it is unable to cleanly represent multiple values for the same property and requires reification or clunky lists as a workaround.
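A schematic example of what I mean (hypothetical data, prefixes omitted). In RDF the same subject/predicate pair can simply appear twice:

```turtle
# Two triples with the same predicate -- no special machinery needed
:alice foaf:mbox <mailto:alice@example.org> .
:alice foaf:mbox <mailto:a.smith@example.org> .
```

In Neo4j, a property holds one value, so you'd typically fold the values into a list property instead:

```cypher
// The multiple values get packed into a single homogeneous list property
CREATE (a:Person {name: 'Alice',
                  mbox: ['alice@example.org', 'a.smith@example.org']})
```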
Not sure what the problem is here. The nodes and relationships are represented as JSON so it's fairly easy to work with them. They also come with a pretty extensive set of list functions[0] and operators[1].
Neo4j's UNWIND makes it relatively straightforward to manipulate the lists as well[2].
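A sketch of the UNWIND pattern (the Person/mbox schema is made up): it turns a list property back into one row per element so you can filter or aggregate individual values.

```cypher
// Expand the list property into rows, filter per element, then re-collect
MATCH (p:Person {name: 'Alice'})
UNWIND p.mbox AS email
WITH p, email
WHERE email ENDS WITH 'example.org'
RETURN p.name, collect(email) AS matching_emails
```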
I'm not super familiar with RDF triplestores, but what's nice about Neo4j is that it's easy enough to use as a generalized database, so you can store your knowledge graph right alongside your entities and use it as the primary/only database.
It has been a while so maybe things have changed, but the main reasons I remember are:

1) Lists stored as a property must be homogeneous lists of simple built-in datatypes, so no mixing of types, custom types, or language tagging like RDF has as first-class concepts.

2) Indexes on lists are much more limited (exact match only, IIRC), so depending on the size of the data and the search parameters it could be a big performance issue.

3) Cypher gets cumbersome if you have many multi-valued properties, because every clause becomes any(elem IN node.foo WHERE <clause>). In SPARQL it's just ?node schema:foo <clause>.
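To make that last point concrete (hypothetical schema, prefixes omitted), here's the same "any value matches" filter side by side. In Cypher the list-valued property forces a predicate over list elements:

```cypher
MATCH (n:Person)
WHERE any(elem IN n.foo WHERE elem = 'bar')
RETURN n
```

In SPARQL, multi-valued properties are just additional triples, so the match is direct:

```sparql
SELECT ?node WHERE { ?node schema:foo "bar" . }
```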
I don't think everybody should run away from property graphs for RDF or anything, in terms of the whole package they are probably the right technical call ninety-something percent of the time. I just find Neo4J's fairly consistent mischaracterization annoying and I have a soft spot for how amazingly flexible RDF is, especially with RDF-star.
GraphDB is the one I usually use. It has a web interface that eases the first steps.
Virtuoso (especially Virtuoso 7, which is open source) is also an option [a bit more command-line based].
In case you want to have a look at the SPARQL client I maintain, Datao.net, you can go to the website and drop me a mail. [I really need to update the video there as the tool has evolved a lot since that time]
The new kid on the block is very much QLever. It's still lacking some features, especially w.r.t. real-time updates, which makes it unsuitable for replacing the Wikidata SPARQL endpoint altogether just yet, but it's clearly getting there.
If you just want to try some queries, there is a public Wikidata SPARQL endpoint at https://query.wikidata.org . If you press the file-folder icon there are example queries, which let you get a feel for the query language.
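For instance, the classic starter query from those examples, listing items that are instances of house cat (Q146):

```sparql
# All items whose "instance of" (P31) is house cat (Q146), with English labels
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
```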
While I'm all for standards-based options, I think the fetishization does a disservice to anyone dipping their toes into graph databases for the first time. For someone with no prior experience, Neo4J is everywhere, and its tooling implements a ton of common graph algorithms that are huge pain points to build yourself. AuraDB provides an enterprise-level, fully managed offering, which is table stakes for, say, relational databases. Obviously the author has a bias, but one of the overarching philosophical differences between Neo4J and a triple-store solution is that the former is more flexible; that plays out in their downplaying of ontologies (which are important for keeping data manageable but are also hard to decide on and iterate on).
I can attest to that, or at least to the inverse situation. We have a giant data pile that would fit well onto a knowledge graph, and we have a lot of potential use cases for graph queries. But whenever I try to get started, I end up with a bunch of different technologies that seem so foreign to everything else we’re using, it’s really tough to get into. I can’t seem to wrap my head around SPARQL, Gremlin/TinkerPop has lots of documentation that never quite answers my questions, and the whole Neo4J ecosystem seems mostly a sales funnel for their paid offerings.
I think neo4j is a perfectly good starting point. Yeah, I feel like they definitely push their enterprise offering pretty hard, but having a fully managed offering is totally worth it IMO.
I enjoy cypher, it's like you draw ASCII art to describe the path you want to match on and it gives you what you want. I was under the impression that with things like openCypher that cypher was becoming (if not was already) the main standard for interacting with a graph database (but I could be out of date). What are the better standards-based options you're referring to?
But then you need data validation everywhere; so for language-portable JSON-LD/RDF validation there are many implementations of JSON Schema (for fixed-shape JSON-LD messages), there's the W3C SHACL (Shapes Constraint Language), and json-ld-schema combines JSON Schema + SHACL.
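A minimal SHACL sketch (hypothetical shape, prefixes omitted) of what that validation looks like on the RDF side: a shape requiring every Person to carry exactly one string-typed name.

```turtle
ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .
```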