Rules engines come up frequently in my domain area. What are some good resources...

closeparen · on May 20, 2019

Be deliberate and skeptical about how and why your rule definition language is a better fit for the problem space than a general purpose programming language. When you find yourself recreating a general purpose programming language, stop. Just drop down to one. Or start with one. A very successful rules engine at my employer is Python minus features, as opposed to the typical "config plus features until it becomes a shitty Python."

Realize that what you are doing is a programming language, and create as much of the infrastructure for programming as you can for rule authors (version control, code review, unit testing, incremental deploy, etc).

zaphirplane · on May 20, 2019

What about security and safety. A user provided rule in a DSL has controller access to the rest of the environment. While a user provided script in a language will have a lot of security and safety issues, even if you trust the user there is security in depth and safety of limiting the accidental damage

gdy · on May 21, 2019

Not sure about Python, but it is very easy to embed Lua in an app in way that executed scripts have access only to what is deliberately exposed to them.

adrianN · on May 20, 2019

Sandboxing the code is a solved problem, is it not? There are a number of websites that run code for you somehow.

FabHK · on May 20, 2019

It's a surprisingly tricky problem, btw, at least for some languages. Here's a nice 2014 talk by Jessica McKellar: Building and breaking a Python sandbox that gives insight into some pitfalls. Might be "solved" by now though, don't know.

https://www.youtube.com/watch?v=sL_syMmRkoU

zaphirplane · on May 21, 2019

Running stand alone and throw away code in a container, is very different from running a user provided script within your long lived application securely. Think credentials, Db access, file system access, network

But you want to access the DB and write to files and the network just not anywhere, so you have different process and communicate via rpc

jlj · on May 20, 2019

I prefer to use SQL because it handles the set definition very cleanly. It is basically the "where" clause of a query. And it's a good middle ground where programmers and business analysts can speak the same language to describe a dataset.

Once the set is defined, the programming language of choice can be used for the action. It could also be SQL, or since we use Spark for a lot of our compute it could be Scala, Python, or Java.

I've been in recent discussions about building a new DSL for this, but I haven't been convinced why we need a new DSL when SQL is already a widely supported DSL.

hinkley · on May 20, 2019

Bertrand Meyer suggested you split the code for making a decision and the code for acting on it into separate methods. Happens to work pretty well for writing unit tests, too.

If I don’t edit your function, it’s harder for me to screw it up. You can segregate code without pulling it out into an interpreter.

chriswarbo · on May 20, 2019

That sounds very much like functional programming styles, where decisions are made by "pure functions" and actions are taken by "interpreters" or "effect handlers". The "ports and adapters" and "functional core, imperative shell" approaches are similar.

jlj · on May 20, 2019

Thanks for the lead, will read up on this.

In my SQL biased mind I think of the decision as a dataset defined by some filters and joins. When new data meets the criteria, it triggers an action for that set.

p2t2p · on May 20, 2019

Can I ask a different question?

Very often rule engines are introduced with assumption that "advanced user" with understanding of the business side of the issue would provide (create/modify/remove) set of rules for the system.

More often than not it happens so only devs are able to modify those rules.

So the question is: why is it easier for you to write business logic in some kind of DSL instead of in the actual programming language?

thrower123 · on May 20, 2019

I have never seen this work out as intended - at best it results in developers having to code up things in a limited way through a leaky abstraction. I've never seen users be actually able to use such systems.

The best that can be said is that it sometimes gives the ability to do some customization or extension without having to do a full deployment, but it also locks you into an ever-expanding surface area of code that needs to be supported and increases the chances of bugs from unforeseen interactions.

mdpopescu · on May 20, 2019

Constraints. I prefer not having the freedom to do anything I want in some cases. Pretty much the same rationale as the one behind removing the "goto" instruction.

samcodes · on May 20, 2019

One time we put these rules in a graph database. It worked better than hard-coding and a rules engine.

Jupe · on May 20, 2019

Ha, funny you should ask... I've started a project about a month ago that will apply various "rules" be applied to unique resource instances. (Sorry, I'm being explicitly vague as there are IP concerns here.)

Here are some approaches that I find myself gravitating towards with this project (but still in design/early dev, and much more to address before this part):

1. Tag-based attributes for resources (as in "meta-data"). Tag combinations can be leveraged to apply specific "rules". But the interpretation of those tags won't be a jump-table or vector-table. If I can't see the domain I'm debugging, neither will the devs who come after me!

2. Taxonomies suck for this kind of thing. Ontologies (what I see as Taxonomies with links) may be required in extreme cases, but just make sure that the full path is always "visible", not just a pointer to node a in the ontology graph. The project I did work on for some time was usurped by another that grabbed my attention (I was re-assigned). Too bad, I didn't get to follow this through. I don't think this is needed for my current project, but time will tell. (I hope not, as it's "elegant" but not at all "simple")

3. State tracking automata may be possible as well. Each request to execute rules stores the path through the rule set directly within the request. Easy debugging and visibility. I've not implemented this type of "rule" system before, but I'd like to give it a shot some day. Of course, writing rule paths is time consuming (even if in-memory) and may be overkill.

4. One (simple) example of a rules engine is a command-line tool's argument parser. Does anyone store this data in a jump-table? I don't think so, (maybe GIT or AWS-CLI, but I don't know as I've not looked yet). Most solutions have a parse pre-step and then a current-state result, usually followed by a "switchboard" type rule applier. But, that state is usually always available; I think that simplicity.

In short, for a given set of "attributes" which drive a "policy", I'd prefer to see the code rolled-out and explicit rather than deferring to some dance between a set of "rule tables" and a set of classes that get invoked in "special" ways.

Of course, I know nothing of your domain; so please take this all with a grain of salt. I've not worked with more than several dozens of "rule sets" before; you may be facing 1000's (and it sounds like speed is critical for you as well). Your rules may be transient, time sensitive and possibly even self-modifying (ugh!). If so, I'd make certain that the path through the rule-sets (if truly required) are well documented and verbose in their logging.