Bug #356: Missing leading "|" on "INDEX" row in downloaded rule from rulemaker v4 - Output 2: Rule Authoring Software - The Project Management Worksite of Xalgorithms Alliance

Bug #356

open

Missing leading "|" on "INDEX" row in downloaded rule from rulemaker v4

Added by Charles Langlois 5 months ago. Updated 5 months ago.

Status:

In Progress

Priority:

High

Assignee:

Charles Langlois

Start date:

02/08/2026

Due date:

% Done:

Estimated time:

Description

Given I create a sample rule in rulemaker v4
Given I download the rule from the workflow page ("Save Copy of Rule to Local Machine")
Then the output file is missing a leading | on the INDEX row

I think all rows are expected to have a leading pipe (|) character.
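
A minimal sketch of how this could be checked, assuming the downloaded file is plain text with one row per line and that every non-blank row is expected to start with `|`. The function name `rows_missing_leading_pipe` and the sample rows are hypothetical; they are not taken from the attached file or from the rulemaker code.

```python
def rows_missing_leading_pipe(text: str) -> list[int]:
    """Return 1-based line numbers of non-blank rows that do not start with '|'."""
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Blank lines are skipped; the rule "every row starts with '|'" is an assumption.
        if line.strip() and not line.startswith("|"):
            missing.append(number)
    return missing


# Hypothetical sample, not the content of the attached file.
sample = "| METADATA | a | b |\nINDEX | 1 | 2 |\n| ROW | x | y |\n"
print(rows_missing_leading_pipe(sample))
```

A check like this could run in rule taker or rule reserve before parsing, so that a malformed download is reported by line number instead of failing later.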

Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve.

See attached resulting file from download.

Files

rule_933e80c7-72d8-4990-8445-97ea6799322d.txt

10.6 KB

sample file from rulemaker v4

Charles Langlois, 02/08/2026 02:50 AM


Updated by Joseph Potvin 5 months ago

Status changed from New to In Progress
Assignee changed from Huda Hussain to Charles Langlois
Priority changed from Normal to High

Here is the issue I mentioned in our call, regarding one more missing pipe "|" character.


Updated by Joseph Potvin 5 months ago

RE: "Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve."

Here it is (which I intend to integrate into the specification / thesis as I work towards the book form):
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/16918d90c276397a23e9c83bf59e5bac043cd81b/current/Refinement%20of%20the%20DWD%20Data%20Structure.pdf

RE: "I think all rows are expected to have a leading pipe (|) character."
And a trailing one, with the result that, when a rule or lookup record is stored in RR on a single row, the double pipe || can signify "line break" (we used to use carriage return + line feed (CRLF), but that's not intuitive for non-techs). That suggests we should ensure that all unfilled fields contain "NULL" (borrowed from SQL and C and C++).
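
A rough sketch of the single-row storage idea described above, assuming rows are written without padding around the pipes and that no real field value is the literal text `NULL`. The names `encode_row`, `to_single_line`, and `from_single_line` are hypothetical and not part of any existing RR code.

```python
NULL_TOKEN = "NULL"  # placeholder for unfilled fields, as proposed above


def encode_row(fields: list[str]) -> str:
    """Render one row with a leading and a trailing pipe; unfilled fields become NULL."""
    return "|" + "|".join(f if f else NULL_TOKEN for f in fields) + "|"


def to_single_line(rows: list[list[str]]) -> str:
    """Concatenate rows so that every row boundary shows up as '||'."""
    return "".join(encode_row(row) for row in rows)


def from_single_line(record: str) -> list[list[str]]:
    """Split on '||' (row boundary), then on '|' (field boundary)."""
    body = record.removeprefix("|").removesuffix("|")
    return [
        ["" if f == NULL_TOKEN else f for f in row.split("|")]
        for row in body.split("||")
    ]


# Hypothetical rows for illustration only.
rows = [["INDEX", "", "2"], ["ROW", "x", ""]]
line = to_single_line(rows)
print(line)
print(from_single_line(line) == rows)
```

In this scheme the placeholder is what keeps `||` unambiguous: if unfilled fields were left empty, an empty field and a row boundary would look the same.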


Updated by Charles Langlois 5 months ago

Joseph Potvin wrote in #note-2:

RE: "Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve."

Here it is (which I intend to integrate into the specification / thesis as I work towards the book form):
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/16918d90c276397a23e9c83bf59e5bac043cd81b/current/Refinement%20of%20the%20DWD%20Data%20Structure.pdf

RE: "I think all rows are expected to have a leading pipe (|) character."
And a trailing one, with the result that, when a rule or lookup record is stored in RR on a single row, the double pipe || can signify "line break" (we used to use carriage return + line feed (CRLF), but that's not intuitive for non-techs). That suggests we should ensure that all unfilled fields contain "NULL" (borrowed from SQL and C and C++).

Alright for a trailing pipe as well, for consistency.

Not sure I understand what you mean, however.

What's non-intuitive about CRLF? Of course I wouldn't prescribe CRLF over LF for line endings (CRLF is a default kept for legacy backward compatibility, and is semantically equivalent to LF). But I see nothing more intuitive for separating records in a text format than a line break. An LF character is a single-byte character that text-supporting software universally renders as a visual line break. Any record separator that isn't displayed as a line break would make the data very hard for humans to read.

In contexts where the data is only consumed by machines, the representation of record separators is irrelevant or absent (e.g. in SQL table form or in an in-memory data structure, there is no concern about the text representation of the separators; they are abstracted away by the query interface).

I would personally favor keeping the usual convention of "||" being interpreted as an empty field (two separators with nothing in between), with line endings as record separators (visual, conventional, and a 1-byte character, so as efficient as you can hope for in a text format). This keeps things both simple and efficient: there is no need for a special token to represent nothing, and a sparse table in text format can save significant storage space and wire traffic when unfilled fields are left empty instead of being written out as a null token. And it should be conventional and intuitive enough for anyone used to reading CSV-style formats or text-based tables.

It may be that the need does arise to distinguish accidental omission from intentionally empty content. This leads to the question of default values for some fields. A "null" token is often used to represent a default of no relevant value, distinct from an actual empty (zero-length) string. If that becomes important, and we don't have quotation marks to explicitly mark the presence of text values, then we need to support empty fields as distinct from another value like null.

If that's assumed not to be a concern, I would argue for simplicity and minimalism: the fewer characters, the better.
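
A minimal sketch of the convention argued for above, assuming one record per line, a leading and trailing pipe on every row, and no padding around the pipes. The function names `parse_record` and `parse_table` and the optional `null_token` parameter are hypothetical; the parameter only illustrates how a null token could be told apart from an empty field if that distinction ever becomes necessary.

```python
from typing import Optional


def parse_record(line: str, null_token: Optional[str] = None) -> list[Optional[str]]:
    """Parse one '|a|b||d|' row.

    '||' yields an empty string. If null_token is given, a field equal to it
    yields None, so "no value" stays distinct from "empty value".
    """
    if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
        raise ValueError("row must start and end with '|'")
    fields = line[1:-1].split("|")
    return [None if null_token is not None and f == null_token else f for f in fields]


def parse_table(text: str, null_token: Optional[str] = None) -> list[list[Optional[str]]]:
    """Treat each non-empty line as one record; splitlines() accepts both LF and CRLF."""
    return [parse_record(line, null_token) for line in text.splitlines() if line]


# Hypothetical rows for illustration only.
sample = "|INDEX||2|\r\n|ROW|NULL|y|\n"
print(parse_table(sample))
print(parse_table(sample, null_token="NULL"))
```

Without a `null_token`, the text `NULL` is just an ordinary field value and empty fields cost no extra characters; enabling it is only needed if empty and null have to mean different things.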
