Law Expression Part 2: The Data Structure

My previous exploration into law expression left much to be desired. While I liked some of my ideas, I was blocked from producing anything fruitful by lack of a suitable data structure.

In my hunt for such a data structure, I stumbled upon JsonLogic, a moderately popular format for expressing rules that supports logical operations and variables. This seems to be pretty much exactly what I am looking for. It even has a couple of open source Blockly editors I might be able to draw on for inspiration if I wanted to build a Blawx-like UI.

Experiment

The first thing I tried was passing the following law into meta.ai (I fancied using an open source model for a change) and asking it to express it as JsonLogic:

https://www.nysenate.gov/legislation/laws/PEN/175.10

§ 175.10 Falsifying business records in the first degree.
A person is guilty of falsifying business records in the first degree
when he commits the crime of falsifying business records in the second
degree, and when his intent to defraud includes an intent to commit
another crime or to aid or conceal the commission thereof.
Falsifying business records in the first degree is a class E felony.

At first, meta.ai got a bit confused about the JsonLogic syntax, but when I pointed out the mistake, it fixed it and produced the following:

Logic:

{
  "and": [
    {
      "==": [{ "var": "charge" }, "falsifying_business_records_in_the_second_degree"]
    },
    {
      "in": [{ "var": "intent_to_defraud" }, ["commit_another_crime", "aid_or_conceal_commission"]]
    }
  ]
}

Data:

{
  "charge": "falsifying_business_records_in_the_second_degree",
  "intent_to_defraud": "aid_or_conceal_commission"
}

Thoughts

This seemed like a sufficiently fruitful experiment to continue. LLMs are reasonably capable of extracting pure logic from the law, and JsonLogic seems to be a good fit.

Time to prototype a little more seriously!

Prototyping

JsonLogic Spec

Without context on JsonLogic, the LLM made syntactical mistakes. To avoid this, I will want to pass in the JsonLogic syntax as part of the prompt.

To do this, I fed the raw text content from the main JsonLogic pages into a prompt requesting a DSL spec:

You are a principal software engineer and your job is to create a complete DSL specification for JsonLogic.
You are given the following documentation on JsonLogic and your task is to extract all supported operations and provide an example for each.
Ignore information that does not pertain to JsonLogic.
Do NOT document any information that is not outlined in the following documentation:
---
{jsonLogic}

---

And wired up a reviewer to double check the output for me:

You are a principal software engineer reviewing the work of another engineer. You will provide an honest review of their work, outlining mistakes they made, if any. If they made no mistakes, respond "LGTM".

Their job was to create a complete DSL specification for JsonLogic.
They were given the following documentation on JsonLogic and asked to extract all supported operations and provide an example for each.
They were also asked to ignore information that does not pertain to JsonLogic and NOT document any information that is not outlined in the following documentation:
Original Documentation:
---
{jsonLogic}

---

Their DSL specification
{jsonLogicSpec}

Here's a screenshot of the pipeline (built with LangFlow)

[Image: pipeline]
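For readers who prefer code to a screenshot, the same generate-then-review flow could be sketched in plain Python. This is not the LangFlow pipeline itself. `build_spec` and `fake_llm` are hypothetical names, the two prompt constants are abbreviated stand-ins for the full prompts above, and `fake_llm` is a stub so the sketch runs offline. A real model call would replace it.

```python
def build_spec(llm, docs, spec_prompt, review_prompt):
    """Generate a spec from the docs, then ask a second prompt to review it."""
    spec = llm(spec_prompt.format(jsonLogic=docs))
    review = llm(review_prompt.format(jsonLogic=docs, jsonLogicSpec=spec))
    # The reviewer prompt asks for exactly "LGTM" when there are no mistakes
    approved = review.strip() == "LGTM"
    return spec, approved, review


# Abbreviated stand-ins for the two prompts shown above (assumption)
SPEC_PROMPT = "Create a DSL specification from this documentation:\n---\n{jsonLogic}\n---"
REVIEW_PROMPT = (
    "Review this DSL specification against the documentation.\n"
    "---\n{jsonLogic}\n---\n"
    "Their DSL specification\n{jsonLogicSpec}"
)


def fake_llm(message):
    """Stub model so the sketch runs offline; swap in a real client here."""
    if message.startswith("Review"):
        return "LGTM"
    return 'var: reads a value from the data, e.g. {"var": "a"}'


spec, approved, review = build_spec(
    fake_llm, "(JsonLogic documentation text)", SPEC_PROMPT, REVIEW_PROMPT
)
print(approved)
```

Keeping the reviewer's full reply alongside the boolean means a failed review can be read, or fed back into another generation attempt.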

Having reviewed the output, I was happy with the result. Now to prototype out the law pipeline itself!

Law Pipeline

To let me test that the output is valid JsonLogic, I asked the LLM to provide me with not only the rule, but a few examples of data we could apply to the rule. That way, I can run the rule on the data provided, and if either was not valid JsonLogic, I would know. I also asked for the list of variables. It seems like a good idea to be able to visualise the key components the law depends on.

The prompt template I came up with looks like this:

I am looking to express laws as JsonLogic.

Here is some info on how JSON logic works:
---
{jsonLogic}
---

Please express the following law as JSON logic:
{law}

Provide your response as JSON in the following form:
// The pure JSON logic rule expressed as a JSON object
rule: object
// three examples of data that we could run the JsonLogic rule on
examples: object[]
// a list of variables referenced in the rule
variables: string[]

To test this, I plugged in this law:

§ 175.05 Falsifying business records in the second degree.
A person is guilty of falsifying business records in the second degree when, with intent to defraud, he:
Makes or causes a false entry in the business records of an enterprise; or
Alters, erases, obliterates, deletes, removes or destroys a true entry in the business records of an enterprise; or
Omits to make a true entry in the business records of an enterprise in violation of a duty to do so which he knows to be imposed upon him by law or by the nature of his position; or
Prevents the making of a true entry or causes the omission thereof in the business records of an enterprise.
Falsifying business records in the second degree is a class A misdemeanor.

And got this output:

{
  "rule": {
    "and": [
      { "==": [{ "var": "intent_to_defraud" }, true] },
      {
        "or": [
          { "==": [{ "var": "action" }, "make_false_entry"] },
          { "==": [{ "var": "action" }, "alter_true_entry"] },
          { "==": [{ "var": "action" }, "omit_true_entry"] },
          { "==": [{ "var": "action" }, "prevent_true_entry"] }
        ]
      }
    ]
  },
  "examples": [
    {
      "intent_to_defraud": true,
      "action": "make_false_entry"
    },
    {
      "intent_to_defraud": true,
      "action": "alter_true_entry"
    },
    {
      "intent_to_defraud": false,
      "action": "omit_true_entry"
    }
  ],
  "variables": ["intent_to_defraud", "action"]
}
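A reply in this shape can be sanity-checked mechanically before anyone reads it. The following is a minimal sketch under my own assumptions: `collect_vars` and `check_response` are hypothetical helpers, not part of any JsonLogic library. They confirm that the three requested fields have the expected types and that the declared `variables` match the `var` references found in the rule.

```python
import json


def collect_vars(node):
    """Return the set of names referenced by {"var": ...} nodes in a rule."""
    found = set()
    if isinstance(node, dict):
        for op, args in node.items():
            if op == "var":
                # "var" takes either a bare name or [name, default]
                name = args[0] if isinstance(args, list) and args else args
                found.add(str(name))
            else:
                found |= collect_vars(args)
    elif isinstance(node, list):
        for item in node:
            found |= collect_vars(item)
    return found


def check_response(raw):
    """Parse the model's reply; return a list of problems (empty if none)."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(reply, dict):
        return ["top level must be an object"]
    problems = []
    if not isinstance(reply.get("rule"), dict):
        problems.append("rule must be an object")
    examples = reply.get("examples")
    if not isinstance(examples, list) or not all(isinstance(e, dict) for e in examples):
        problems.append("examples must be a list of objects")
    variables = reply.get("variables")
    if not isinstance(variables, list) or not all(isinstance(v, str) for v in variables):
        problems.append("variables must be a list of strings")
    if not problems:
        used = collect_vars(reply["rule"])
        if used != set(variables):
            problems.append(
                f"declared variables {sorted(variables)} do not match rule {sorted(used)}"
            )
    return problems


reply_text = json.dumps({
    "rule": {"==": [{"var": "intent_to_defraud"}, True]},
    "examples": [{"intent_to_defraud": True}],
    "variables": ["intent_to_defraud"],
})
print(check_response(reply_text))
```

An empty list means the reply passed these checks. It says nothing about whether the rule is a faithful reading of the law.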

Checking the examples against the rule using JsonLogic's playground showed that these work!
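A check like this can also be scripted rather than run by hand in the playground. Below is a minimal sketch, not the JsonLogic library: `apply_rule` is a hypothetical helper that handles only the operators appearing in this post (`var`, `==`, `and`, `or`, `in`). As simplifying assumptions, it compares with Python's `==` instead of JavaScript's loose equality, returns plain booleans from `and`/`or`, and does not support dotted variable paths.

```python
def apply_rule(rule, data):
    """Evaluate a tiny subset of JsonLogic: var, ==, and, or, in."""
    if isinstance(rule, list):
        return [apply_rule(item, data) for item in rule]
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    if len(rule) != 1:
        raise ValueError("an operation must have exactly one key")
    op, args = next(iter(rule.items()))
    if not isinstance(args, list):
        args = [args]  # {"var": "x"} is shorthand for {"var": ["x"]}
    if op == "var":
        default = args[1] if len(args) > 1 else None
        return data.get(args[0], default)
    if op == "and":
        return all(apply_rule(arg, data) for arg in args)
    if op == "or":
        return any(apply_rule(arg, data) for arg in args)
    if op in ("==", "in"):
        left, right = (apply_rule(arg, data) for arg in args)
        return left == right if op == "==" else left in right
    raise ValueError(f"unsupported operator: {op}")


# The rule and examples returned for § 175.05 above
RULE = {
    "and": [
        {"==": [{"var": "intent_to_defraud"}, True]},
        {"or": [
            {"==": [{"var": "action"}, "make_false_entry"]},
            {"==": [{"var": "action"}, "alter_true_entry"]},
            {"==": [{"var": "action"}, "omit_true_entry"]},
            {"==": [{"var": "action"}, "prevent_true_entry"]},
        ]},
    ]
}
EXAMPLES = [
    {"intent_to_defraud": True, "action": "make_false_entry"},
    {"intent_to_defraud": True, "action": "alter_true_entry"},
    {"intent_to_defraud": False, "action": "omit_true_entry"},
]

verdicts = [apply_rule(RULE, example) for example in EXAMPLES]
print(verdicts)  # [True, True, False]
```

The third example fails only because `intent_to_defraud` is false, which matches the "with intent to defraud" condition in the statute.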

I also like the way this reads. In my opinion, the LLM (gpt-4o) did a good job of choosing variables and selecting a simple logical structure to represent them. That said, it could have been even simpler by using a single in operation for the "action" variable rather than an or of four == comparisons, as meta.ai did with § 175.10 above.

Still, I think this is fit for purpose, and if I want to get serious about simplicity I can try to bake it into the prompt or include a revision step later.
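One form such a revision step could take is a small tree rewrite that needs no LLM at all. The sketch below is an illustration under stated assumptions, not something the pipeline currently does: `or_to_in` is a hypothetical helper that collapses an `or` of `==` comparisons into a single `in`, but only when every branch compares the same variable against a string constant. The two forms should agree when the variable holds a string. With other data types, JsonLogic's loose `==` and its array `in` may behave differently.

```python
def _is_var_equals_string(node):
    """True for nodes shaped like {"==": [{"var": "<name>"}, "<string>"]}."""
    if not (isinstance(node, dict) and list(node) == ["=="]):
        return False
    args = node["=="]
    if not (isinstance(args, list) and len(args) == 2):
        return False
    left, right = args
    return (
        isinstance(left, dict)
        and list(left) == ["var"]
        and isinstance(left["var"], str)
        and isinstance(right, str)
    )


def or_to_in(rule):
    """Rewrite an "or" of "==" checks on one variable as a single "in"."""
    if isinstance(rule, list):
        return [or_to_in(item) for item in rule]
    if not isinstance(rule, dict):
        return rule
    # Simplify nested operations first, then look at this node
    rewritten = {op: or_to_in(args) for op, args in rule.items()}
    branches = rewritten.get("or")
    if (
        len(rewritten) == 1
        and isinstance(branches, list)
        and len(branches) > 1
        and all(_is_var_equals_string(branch) for branch in branches)
    ):
        names = {branch["=="][0]["var"] for branch in branches}
        if len(names) == 1:
            constants = [branch["=="][1] for branch in branches]
            return {"in": [{"var": names.pop()}, constants]}
    return rewritten


rule = {
    "and": [
        {"==": [{"var": "intent_to_defraud"}, True]},
        {"or": [
            {"==": [{"var": "action"}, "make_false_entry"]},
            {"==": [{"var": "action"}, "alter_true_entry"]},
            {"==": [{"var": "action"}, "omit_true_entry"]},
            {"==": [{"var": "action"}, "prevent_true_entry"]},
        ]},
    ]
}
simplified = or_to_in(rule)
print(simplified)
```

Because the rewrite is deterministic, it can run after every generation without the risk of the model changing the rule's meaning during a "simplify this" prompt.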

Here's what my prototype pipeline looks like:

[Image: Prototype pipeline]

Note: I attempted to use the open source deepseek-coder-v2-lite-instruct-mlx model locally using LM Studio, but was a lot less happy with the output. While it produced something resembling JsonLogic, it was a lot more complex and 1 of the 3 examples errored out in the playground. It probably isn't fair to compare a 2.46B-parameter model I can run on my MacBook with OpenAI's flagship model, which is estimated to have over 200B parameters.

Analysis

These experiments proved fruitful. LLMs are reasonably capable of extracting pure logic from the law, and JsonLogic seems to be a good fit.

Some things I will need to consider:

Syntax Errors

Without context on JsonLogic, the LLM made syntactical mistakes. Passing the JsonLogic syntax as part of the prompt helped significantly with this. For even more reliability, I might want to run tests on the outputs within the pipeline and if errors arise, pass them back into the LLM.
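The feedback loop described here might look something like the sketch below. Everything in it is an assumption for illustration: `find_errors` and `generate_with_retries` are hypothetical helpers, `KNOWN_OPS` is a partial list of operators from the JsonLogic documentation, the model is assumed to return a bare rule as JSON text, and `fake_llm` is a stub standing in for a real model call.

```python
import json

# Partial list of JsonLogic operators; extend it to match the spec in use
KNOWN_OPS = {
    "var", "missing", "missing_some", "if", "==", "===", "!=", "!==", "!", "!!",
    "and", "or", ">", ">=", "<", "<=", "max", "min", "+", "-", "*", "/", "%",
    "map", "filter", "reduce", "all", "none", "some", "merge", "in", "cat", "substr",
}


def find_errors(raw):
    """Return a list of syntax problems found in a rule given as JSON text."""
    try:
        rule = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    errors = []

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            if len(node) != 1:
                errors.append("each operation must have exactly one key")
            for op, args in node.items():
                if op not in KNOWN_OPS:
                    errors.append(f"unknown operator: {op}")
                walk(args)

    walk(rule)
    return errors


def generate_with_retries(ask_llm, prompt, max_attempts=3):
    """Ask for a rule, feeding any errors back until it validates."""
    message = prompt
    for _ in range(max_attempts):
        raw = ask_llm(message)
        errors = find_errors(raw)
        if not errors:
            return json.loads(raw)
        message = (
            f"{prompt}\n\nYour previous answer:\n{raw}\n\n"
            "It had these problems:\n" + "\n".join(errors) + "\nPlease fix them."
        )
    raise ValueError(f"no valid rule after {max_attempts} attempts")


# Stub model: the first reply uses a made-up operator, the second is valid
replies = iter(['{"equals": [{"var": "a"}, 1]}', '{"==": [{"var": "a"}, 1]}'])


def fake_llm(message):
    return next(replies)


rule = generate_with_retries(fake_llm, "Express the law as JsonLogic.")
print(rule)
```

Capping the attempts keeps a stubborn failure from looping forever. A structural check like this only catches syntax, so running the examples against the rule would still be a separate step.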

Complexity

These experiments were conducted on very small and simple laws that are also very conducive to translation into logic. Many laws simply make declarations, so logic isn't as applicable there.

I might run into problems if I don't:

Find ways to identify which laws are conducive to logical expression
Find ways to manage highly complex laws where the LLM may get distracted

Completeness

It's an important distinction that this exercise is not intended to extract 100% of the information from the law. If that were the case, these experiments would have failed completely.

For example, the value omit_true_entry of the action variable corresponds to:

Omits to make a true entry in the business records of an enterprise in violation of a duty to do so which he knows to be imposed upon him by law or by the nature of his position

A great deal of complexity is embedded in this single term, and this complexity is not represented by the output in any way. I think this is OK. Laws are always going to be open to interpretation; I don't think it's my job, or the LLM's, to interpret every concept in the law and pin it down in code. If anything, it's better if I can extract logic in a pure and uncontroversial way, whilst leaving the debatable parts encapsulated in variables.

The way this is shaping up, my goal is to extract as much of the logic from the law as possible. If, at the end of this, I can generate structures where only the variables need to be debated, and the logic can be agreed upon in advance, I'll consider this a success.

Part 3

To see how I built an app around this idea, check out part 3.