Nice to see Datalog being validated by a big name, though I don't see what's modern about Logica in particular, or why one should use it over plain Datalog (as a syntactic subset of Prolog) when the available backends are limited to SQL rewriters or locked in to BigQuery. I'll have to look into its model for aggregate queries, which I guess is a selling point for Logica (as is modularization/composition with optimization), but also a weak point, since aggregation is typically neither part of Datalog (the decidable logic fragment) nor portable.
Edit: also, I find the title a bit grandiose, since this isn't about Logic Programming in general but only about database querying.
> ...or why one should use it over plain Datalog...
I have been looking for examples of how to use a practical implementation of Datalog for years, and the closest I've come is actually miniKanren. Could you point me to codebases that productively use Datalog internally?
I feel your parent post was probably looking for software that successfully uses Datalog to fetch/query its data, not software that provides data through Datalog. User, not provider.
I'm well aware of datomic family databases, but it's the part about solving interesting problems with them that interests me, not that someone has implemented another one ;)
What example with miniKanren have you found? This is an area I have a passing interest in, but I never find the time to delve deep enough to find anything shiny enough.
The most interesting open-source example I have found is a Hy (hy-lang) miniKanren that implements a tree rewriter, which lets you encode equivalent implementations of code snippets. It's a code linter that will automatically simplify code you write via higher-level rules written in mk.
I've heard informally there are many companies relying on clojure/core.logic to rewrite complicated business rules for the sake of filtering and constraint solving problems in applications, but I do not know of any open source examples to reference.
Edit: I accidentally linked to a dependency of the project I meant to link to. I originally linked to the mk implementation:
https://github.com/algernon/adderall
Yeah, but that reduces Logica outside Google to SQL rewriting, whereas the weak point of SQL isn't so much the syntax as the scalability and expressiveness limitations for big data and document data (fixation on strong ACID/CAP guarantees, schemas). SQL syntax has strong points, too: one is that it's not just a query language but also an update/batch language with ACID semantics; another is that it's standardized, with a range of mature options available.
Consider also the practical side: using Datalog as merely "prettier SQL" still doesn't allow you to dynamically define data properties or go schema-less as in RDF or other logic/deductive graph databases. Whenever you want a new column, you must execute DDLs (ALTER TABLE ADD COLUMN) also leading to forced commits, overly broad permissions, chaotic backup procedures and/or code artefacts containing the dreaded SELECT * syntax. Also, parsing Datalog queries, reformulating into SQL, then re-parsing SQL in the DB engine isn't the most efficient thing.
Basically, the workflows and use cases for SQL RDBMSs and Datalog/graph databases are not the same, and if you're using one on top of the other, you're getting the intersection of possibilities but the union of problems, as is well known from O/R mappers );
>> Whenever you want a new column, you must execute DDLs (ALTER TABLE ADD COLUMN) also leading to forced commits, overly broad permissions, chaotic backup procedures and/or code artefacts containing the dreaded SELECT * syntax.
I don't understand what you mean here. With datalog, if you have a predicate person(Name, Age, Height) and you want to add an argument (a "column") for income, you can simply create a new predicate person(Name, Age, Height, Income).
Or, if you want to avoid duplication, you can write a rule to combine the information in two (or more) predicates:
You don't need to remove the old predicate. That's actually one case where Datalog works better than SQL, which only allows "rows", i.e. "facts" (in Datalog parlance), but not "rules" that establish relations _between tables_.
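To make the "rule as a join" idea concrete, here is a minimal Python sketch, assuming facts are held as sets of tuples and that income lives in a separate, hypothetical income(Name, Income) predicate (the predicate name and the data are illustrative only). It evaluates person(Name, Age, Height, Income) :- person(Name, Age, Height), income(Name, Income) by joining on the shared variable Name, and leaves the original person/3 facts untouched.

```python
# Facts for person(Name, Age, Height) and a hypothetical income(Name, Income).
person3 = {("alice", 30, 170), ("bob", 45, 180)}
income = {("alice", 50000), ("bob", 62000)}


def derive_person4(person3, income):
    """Evaluate person(N, A, H, I) :- person(N, A, H), income(N, I).

    The variable N is shared by both body literals, so the rule body
    amounts to a natural join on that argument.
    """
    return {
        (name, age, height, inc)
        for (name, age, height) in person3
        for (name2, inc) in income
        if name == name2
    }


person4 = derive_person4(person3, income)
print(sorted(person4))  # [('alice', 30, 170, 50000), ('bob', 45, 180, 62000)]
# person3 is unchanged: the old "table" coexists with the derived one.
```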
That's true for Datalog, based on what I know about Prolog (not a Datalog expert!). I don't know how it works in Logica, but from reading the article above I think the semantics would be similar.
>> Basically, the workflows and use cases for SQL RDBMSs and Datalog/graph databases are not the same, and if you're using one on top of the other, you're getting the intersection of possibilities but the union of problems, as is well known from O/R mappers );
That's funny. But I don't think it applies here. SQL and Datalog are both relational. The difference is that Datalog lets you define relations over tables ("rules"), not just relations over data ("facts"/"rows"). Essentially, SQL is one half of Datalog's relational semantics: only information, without reasoning. Datalog adds reasoning on top, but the reasoning is still, well, relational (facts, rules and queries are all relations). There's no impedance mismatch here, as there is when trying to fit relational data into a non-relational program.
I'd still be very concerned about putting something as complicated as this on top of SQL, because despite what it says on the tin, SQL is not really a declarative language. Any serious database-using application will still be chugging through synonymous queries to find the one that actually performs, manually annotating the database with the indexes it needs, figuring out when to materialize tables or views, applying any number of other optimizations the engine can't or won't do on its own, and doing a lot of other operations that are hard enough when using SQL directly but will be made even harder by working through such an opinionated interface. SQL is already handicapped by trying to be declarative while in practice being a language where many nominally equivalent queries result in different execution plans under the hood.
I loooooooooove the ideas being expressed in the post. I am firmly in the camp that SQL needs a deep rethink because of its many and manifold software engineering flaws, and this is exactly the sort of thing I'm thinking of, not just a slight gloss on SQL, but a complete rethink. I'm just not sure this is going to be practical sitting on top of SQL. Make this a native query language for Postgres or something and we'd be talking. One step at a time, though. I'm very positive on this step being taken.
At this point, extracting the industry from its path-dependent history [1] with SQL is a Google-sized problem. The engineering itself isn't necessarily a Google-sized problem, but the rest of it is.
[1]: https://en.wikipedia.org/wiki/Path_dependence - that is, if databases were all separately evolving over the years and only this year were they all going to get together and produce a standard to unify themselves, it would not look like SQL. It would quite likely look a lot more like this. SQL has too many glaring flaws, not least of which is its total composability fail.
From what I understand Logica compiles to SQL so it can run on BigQuery. I don't
think that's putting it "on top of SQL".
I think it's a bit confusing that datalog is always discussed in the context of
databases and as a "query language" etc. In fact it's a subset of Prolog, so it
really belongs to the subject of logic programming. It doesn't help that Prolog
programs themselves are implemented as databases and that Prolog programming
uses terms such as "query" that muddy the waters about exactly what one is doing.
I confess I don't have a background in databases and so I only understand the
very basics about SQL's semantics, which is the Relational Calculus, but as far
as I understand it, RC is a subset of predicate logic (a.k.a. first-order
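logic). Prolog is itself a different subset of predicate logic, Horn clause
logic; and Datalog is a subset of Prolog that is comparable to SQL in expressive
power (non-recursive Datalog with negation matches relational algebra, while
recursive Datalog can also express things like transitive closure, which SQL
only gained with recursive queries in SQL:1999).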
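Recursion is where the two diverge in practice. A minimal sketch using Python's built-in sqlite3 module, with a hypothetical parent table: the recursive rule for ancestor/2 has to be written as a WITH RECURSIVE common table expression, since plain relational algebra cannot express transitive closure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("bob", "john"), ("john", "alex"), ("alex", "sam")],
)

# Datalog:  ancestor(X,Y) :- parent(X,Y).
#           ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
# The recursive clause becomes the recursive arm of the CTE; UNION (rather
# than UNION ALL) discards duplicates, so evaluation stops at a fixpoint.
rows = conn.execute(
    """
    WITH RECURSIVE ancestor(x, y) AS (
        SELECT parent, child FROM parent
        UNION
        SELECT p.parent, a.y
        FROM parent AS p JOIN ancestor AS a ON p.child = a.x
    )
    SELECT x, y FROM ancestor ORDER BY x, y
    """
).fetchall()
print(rows)
# [('alex', 'sam'), ('bob', 'alex'), ('bob', 'john'),
#  ('bob', 'sam'), ('john', 'alex'), ('john', 'sam')]
```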
Very briefly, every expression in Prolog is a Horn clause. A clause is a
disjunction of literals. A literal is an atom, or the negation of an atom. An
atom is an atomic formula, a predicate symbol followed by a number of terms in
parentheses where the number is the "arity" of the predicate. Terms are variables, functions or constants.
For example, father(john, bob) is an atom of the predicate father/2, where
"father" is the symbol and "2" is the arity.
An example of a clause is grandfather(x,y) ∨ ¬father(x,z) ∨ ¬parent(z,y). This
is a disjunction of one positive literal, grandfather(x,y) and two negative
literals, ¬father(x,z) and ¬parent(z,y). By the rules of logical connectives,
the same disjunction can be written as an implication: father(x,z) ∧ parent(z,y)
→ grandfather(x,y). By Prolog convention also observed in Datalog, implications
are written with the positive literal first: grandfather(x,y)← father(x,z),
parent(z,y). The left-facing implication arrow is rendered as ":-" in an
ASCII-friendly manner, conjunctions are represented by the comma, ",", and
variables are represented by names starting with an upper-case letter, yielding
the standard Prolog (and Datalog) notation:
grandfather(X,Y):- father(X,Z), parent(Z,Y).
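The step from the disjunction to the implication is just De Morgan's law followed by the equivalence A → B ≡ ¬A ∨ B, with the clause's variables implicitly universally quantified:

```latex
\begin{aligned}
\mathit{grandfather}(x,y) \lor \lnot\mathit{father}(x,z) \lor \lnot\mathit{parent}(z,y)
  &\equiv \mathit{grandfather}(x,y) \lor \lnot\big(\mathit{father}(x,z) \land \mathit{parent}(z,y)\big) \\
  &\equiv \big(\mathit{father}(x,z) \land \mathit{parent}(z,y)\big) \rightarrow \mathit{grandfather}(x,y)
\end{aligned}
```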
The above clause is a Horn clause. A clause is Horn when it has at most one
positive literal. A Horn clause is definite when it has exactly one positive
literal. Horn clauses with 0 positive literals are called "goals", Horn clauses
with exactly one positive and 0 negative literals are often called "unit
clauses" and Horn clauses with one positive and any number of negative literals
are usually called "definite clauses" (confusingly). A definite clause is datalog if
it has no functions of arity more than 0 (constants are functions with arity 0) as
arguments to a literal. For example, in the following, [1] is Datalog, [2] is not
(but is Prolog):
s(0). % [1]
s(N):- s(s(N)). % [2]
Here, the inner s(N) in [2] is a function term (which can be determined syntactically,
because it's an argument to a literal). In Prolog parlance, definite clauses are also called
"rules", unit clauses are also called "facts" and goal clauses are also called
"queries".
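Because the classification above is purely syntactic, it can be checked mechanically. A minimal Python sketch, assuming a hypothetical encoding that is not Prolog's own: atoms and compound terms are tuples (functor, arg1, ...), constants are plain strings or ints, and a clause is a (head, body) pair whose head is None for a goal.

```python
# Hypothetical clause encoding (an assumption for illustration only):
#   atom / compound term: tuple (functor, arg1, arg2, ...)
#   constant: str or int;  clause: (head, body), head is None for goals.


def is_compound(term):
    return isinstance(term, tuple)


def classify(clause):
    head, body = clause
    if head is None:
        return "goal"      # 0 positive literals
    if not body:
        return "unit"      # 1 positive, 0 negative literals: a "fact"
    return "definite"      # 1 positive, some negative literals: a "rule"


def is_datalog(clause):
    """True if no literal has a compound term (function of arity > 0) as an argument."""
    head, body = clause
    atoms = ([head] if head is not None else []) + list(body)
    return not any(is_compound(arg) for atom in atoms for arg in atom[1:])


c1 = (("s", 0), [])                      # s(0).
c2 = (("s", "N"), [("s", ("s", "N"))])   # s(N) :- s(s(N)).
q = (None, [("grandfather", "X", "Y")])  # ?- grandfather(X,Y).

print(classify(c1), is_datalog(c1))  # unit True
print(classify(c2), is_datalog(c2))  # definite False
print(classify(q), is_datalog(q))    # goal True
```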
Now, s(0) is a Prolog and Datalog fact, and it maps just as well onto a SQL table
called "s" with a single row holding one value, "0". Here's a fuller example:
father(bob, john).
father(john, alex).
grandfather(X,Y):- father(X,Z), father(Z,Y).
That's a Prolog and Datalog program with two "facts" and a "rule". The following
are two queries and their results:
?- father(X,Y).
X = bob, Y = john ;
X = john, Y = alex.
?- grandfather(X,Y).
X = bob, Y = alex ;
false.
Each query starts with "?-" at the command line and, like all Prolog clauses,
ends with a ".". Below the query are its results: the instantiations of the
variables X and Y in the query that make the query true. The ";" means there may
be further results. And "false" means there are no more results.
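What the top level is doing can be approximated in a few lines. This is a deliberately simplified Python sketch, not Prolog's actual resolution: it handles only ground facts (no rules, no full unification), tries goals left to right and facts in program order, and yields one set of bindings at a time, so asking the generator for its next value plays the role of typing ";". The second query solves the body father(X,Z), father(Z,Y) directly, i.e. a grandfather/2 rule defined over father/2 facts alone.

```python
# Facts of father/2, in the order they appear in the program.
FACTS = [("father", "bob", "john"), ("father", "john", "alex")]


def is_var(term):
    # Prolog convention: variables start with an upper-case letter.
    return term[:1].isupper()


def match(goal, fact, bindings):
    """Match one goal against one ground fact, returning extended bindings or None."""
    if goal[0] != fact[0] or len(goal) != len(fact):
        return None
    new = dict(bindings)
    for g, f in zip(goal[1:], fact[1:]):
        if is_var(g):
            g = new.get(g, g)   # substitute the value if already bound
        if is_var(g):
            new[g] = f          # still unbound: bind it to the fact's constant
        elif g != f:
            return None
    return new


def solve(goals, bindings=None):
    """Depth-first, left-to-right search over FACTS, yielding each solution."""
    bindings = bindings or {}
    if not goals:
        yield bindings
        return
    for fact in FACTS:
        new = match(goals[0], fact, bindings)
        if new is not None:
            yield from solve(goals[1:], new)


print(list(solve([("father", "X", "Y")])))
# [{'X': 'bob', 'Y': 'john'}, {'X': 'john', 'Y': 'alex'}]
body = [("father", "X", "Z"), ("father", "Z", "Y")]
print([{v: b[v] for v in ("X", "Y")} for b in solve(body)])
# [{'X': 'bob', 'Y': 'alex'}]
```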
Now, I leave it as an exercise to the reader (you) to figure out how the above
works out with SQL. Keep in mind that father/2 has a clean translation to a SQL
table named "father" with two columns, for example named "father" and "child".
The "rule" for grandfather/2 is probably best represented as a join.
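For anyone who wants to check their answer to the exercise, here is one possible translation, sketched with Python's built-in sqlite3 module. It assumes the grandfather rule is defined over father/2 alone, as the query results above imply; the table and column names follow the suggestion above and are otherwise arbitrary. The shared variable Z in the rule body becomes the join condition, and the head variables X and Y become the selected columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# father/2 as a table: one row per fact.
conn.execute("CREATE TABLE father (father TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO father VALUES (?, ?)",
    [("bob", "john"), ("john", "alex")],
)

# ?- father(X,Y).
fathers = conn.execute("SELECT father, child FROM father ORDER BY father").fetchall()

# grandfather(X,Y) :- father(X,Z), father(Z,Y).
# Z appears in both body literals, so it becomes the self-join condition.
grandfathers = conn.execute(
    """
    SELECT f1.father, f2.child
    FROM father AS f1
    JOIN father AS f2 ON f1.child = f2.father
    """
).fetchall()

print(fathers)       # [('bob', 'john'), ('john', 'alex')]
print(grandfathers)  # [('bob', 'alex')]
```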
In any case, as you can probably see, we have here a very different language
than SQL, but with semantics that can be seen as, in a sense, being equivalent
to the semantics of SQL. Except, where SQL makes a distinction between "data"
and "queries over data", Datalog only has facts, rules and queries, that are all
Horn clauses and that are all part of the "program database".
So it's not complicated machinery on top of SQL at all. The only thing I'm
concerned about is the naturalness of the SQL queries generated by the "compiler"
(some kind of transducer, probably). On the other hand, I reckon SQL is only
meant to work as a kind of "relational assembly" and will not have to be seen by
any human eyes except in rare cases. Or that's hopefully the plan.
Edit: note there are many, Many, MANY variants of Datalog with confusingly subtly different semantics. See the book I recommended in my comment to juki, below. Personally, I get lost in the variations pretty quickly...
"I think it's a bit confusing that datalog is always discussed in the context of databases and as a "query language" etc."
It is a reflection of the way that SQL is so ensconced in the developer's gestalt as "The Way To Query Data", such that Querying Data means SQL and SQL means Querying Data, that most people are not capable of coming at something like a logic-based database layer as a first-order element on its own, but can only conceive of it as an SQL layer.
Further observe how there's a category of databases called "NoSQL"... when you have a category of something defined by its not being in some other category, that really shows just how large that category looms in the developer's mindset. NoSQL is slowly cracking the SQL consensus, but it's a long and slow process. You'll know it has really made it when we give a positive name to that category, or perhaps more likely, 3 or 4 names to the several types of databases within. "Document store" is getting close to being the name of one of the styles.
My particular reason for speaking this way, though, is that this specific technology manifests that way. You'll note I called for this to be made a native query layer, because I'm personally pretty much over SQL and ready for the next thing to come out. I'm tired of it being the 1970s again every time I speak to a database. Unfortunately, it's such a task that it has killed everyone who has tried it so far. AIUI FoundationDB got the closest of anyone I've seen. I'm not sure how they're doing; a cursory web search suggests perhaps they aren't as dead as I thought.
>> My particular reason for speaking this way though is that this specific technology manifests that way. You'll note I called for this to be made a native query layer, because I'm personally pretty much over SQL and ready for the next thing to come out.
I agree with that, although since I don't write any SQL anymore, I don't really mind it, as such. But I think the reason Logica is compiled to SQL must be the pervasive association of "database" with "SQL" that you point out. Datalog in fact has its own execution model that doesn't really need SQL. Perhaps the people behind Logica felt that it would be easier for it to be adopted if it piggy-backed on SQL, the same way that so many languages target the JVM etc. I too think that's a little disappointing. But I'm heavily invested in logic programming so I'm glad to see _some kind_ of logic programming language at least created at Google (no idea how much it's used though).
> That's actually one case where Datalog works better than SQL, that only allows "rows" i.e. "facts" (in Datalog parlance) but not "rules" that establish relations _between tables_.
What is the difference between Datalog rules and SQL views?
It's been a while since I used SQL and I'm a bit rusty, but views would probably be the equivalent of Datalog rules, yes. The difference, as in my other comment to OP, is that Datalog rules are part of the Datalog program, which also stores the actual "tables", i.e. the facts. Whereas in SQL, views are only sort of ... virtual? Like I say, I'm a bit rusty, but from my understanding SQL views don't live in the same space as tables.
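To illustrate the comparison, a minimal sketch with Python's built-in sqlite3 module (the parent and male tables and the father view are hypothetical): the view behaves like the rule father(X,Y) :- parent(X,Y), male(X), in that its rows are derived on demand and pick up new facts without the view being redefined. One difference worth noting: in SQLite a view cannot refer to itself, so a recursive rule needs a WITH RECURSIVE query instead (some databases, such as PostgreSQL, also offer CREATE RECURSIVE VIEW).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent TEXT, child TEXT)")
conn.execute("CREATE TABLE male (name TEXT)")

# Roughly the rule  father(X,Y) :- parent(X,Y), male(X).
conn.execute(
    """
    CREATE VIEW father AS
    SELECT p.parent AS father, p.child AS child
    FROM parent AS p JOIN male AS m ON p.parent = m.name
    """
)

conn.executemany(
    "INSERT INTO parent VALUES (?, ?)", [("bob", "john"), ("mary", "john")]
)
conn.execute("INSERT INTO male VALUES ('bob')")
before = conn.execute("SELECT father, child FROM father").fetchall()

# A new fact is reflected immediately; the view itself is never redefined.
conn.execute("INSERT INTO parent VALUES ('bob', 'ann')")
after = conn.execute("SELECT father, child FROM father ORDER BY child").fetchall()

print(before)  # [('bob', 'john')]
print(after)   # [('bob', 'ann'), ('bob', 'john')]
```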
Funny thing. It used to be that my day-to-day work was 80% SQL. Nowadays it's 99% Prolog, maybe with a little bit of bash and PowerShell scripting (gotta automate those experiments!). I kiiind of miss SQL? But not quite. Personally, I don't 100% get the grumbling about SQL's syntax. It's unintuitive and it works very hard to hide the actual semantics behind it, but, eh, at least it has clean semantics.
I recently found this free book on databases that goes over both SQL and Datalog. It's a bit thick with obtuse terminology but it actually goes in depth over many useful topics:
I use Prolog for my research. I study Inductive Logic Programming (ILP) for my
PhD. ILP is a field in the intersection of machine learning and logic
programming, that studies approaches to learning logic programs from examples,
background knowledge and language bias (it helps to think of background
knowledge as a library of sub-routines from which a program is to be composed
and to think of language bias as constraints on the structure of learned
programs).
Obviously Prolog is well-suited for this task, but there's a reason why you
don't often hear of "Inductive Python Programming" or "Inductive Java
Programming", say. The reason is that imperative languages tend to have lots of
specialised syntax, for example for class declarations, loops, variable
assignment etc. Whereas Prolog syntax consists entirely of one kind of
expression, the Horn clause. So for instance, to learn a program with a "loop"
in Prolog you "only" need to add a recursive clause to the program, where a
recursive clause is simply an ordinary Horn clause with the same predicate
symbol in a head literal and one or more body literals. To learn a program with
a loop in Python you have to add the loop to the program as a specialised
structure with its own peculiar syntax.
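To see how a recursive clause stands in for a loop, here is a minimal Python sketch of bottom-up (Datalog-style) evaluation of a hypothetical ancestor/2 predicate. Prolog itself evaluates top-down, so this illustrates the meaning of the recursive clause rather than Prolog's execution: the clause is applied repeatedly until it derives nothing new, i.e. until a fixpoint is reached, and that repetition is the "loop".

```python
def ancestors(parent):
    """Naive bottom-up evaluation of
        ancestor(X,Y) :- parent(X,Y).
        ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
    """
    ancestor = set(parent)  # the non-recursive clause
    while True:
        # One application of the recursive clause: join parent with ancestor on Z.
        new = {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if new <= ancestor:  # nothing new derived: fixpoint reached
            return ancestor
        ancestor |= new


parent = {("bob", "john"), ("john", "alex"), ("alex", "sam")}
print(sorted(ancestors(parent)))
# [('alex', 'sam'), ('bob', 'alex'), ('bob', 'john'),
#  ('bob', 'sam'), ('john', 'alex'), ('john', 'sam')]
```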
Also, because in Prolog everything is a Horn clause, examples, background
knowledge and language bias can be (and often are) represented as Prolog
programs themselves, so it's possible to learn new background knowledge, new
language bias and even new examples. That'd be tricky to do in Python where
examples, say, would be not programs, but the inputs of and outputs to programs.
The sister field of ILP, Inductive Functional Programming, exploits the
homoiconicity of functional languages in similar ways.
Finally, Prolog is a language with a deductive inference algorithm as an
interpreter and it turns out deduction can be sort of inverted into induction.
Which is to say, we can go from reasoning to learning, with but a tiny little
hop. Well, ish.
If you're interested in more details about my work, there's links in my profile.
Although I'm not really familiar enough to comment on much of this, I will point out that ORMs are still very popular and valuable despite their problems. If this is anything like ORMs, I would expect it to be very useful to many people despite theoretical problems that tend to be fairly manageable in practice.
My hope would be mainly that this can get datalog into mainstream use, and soon get more (and more mature) libraries created by the community. That is in itself very exciting to me though.
Would be pretty awesome if we could have logica (or something similar) for dataframes (including pandas), and so could build pipelines of transformations-via-queries on those.
(If there is anything like this already implemented, I'm all ears!).