Bug #356
Missing leading "|" on "INDEX" row in downloaded rule from rulemaker v4
Description
Given I create a sample rule in rulemaker v4
When I download the rule from the workflow page ("Save Copy of Rule to Local Machine")
Then the output file is missing a leading "|" on the INDEX row
I think all rows are expected to have a leading pipe (|) character.
Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve.
See the attached file produced by the download.
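For anyone reproducing this, a quick check like the following could flag rows without the leading pipe. It is a minimal Python sketch, assuming every non-blank line of the downloaded file is a table row. The helper name `rows_missing_leading_pipe` and the sample rows are hypothetical, not taken from the attached file or from the rulemaker code.

```python
def rows_missing_leading_pipe(text: str) -> list[int]:
    """Return the 1-based line numbers of non-blank rows that do not start with '|'.

    Assumption: every non-blank line in the downloaded rule file is a table row.
    """
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not stripped.startswith("|"):
            missing.append(number)
    return missing


# Hypothetical excerpt mimicking the reported bug: the INDEX row lacks its leading pipe.
sample = (
    "| METADATA | Sample rule |\n"
    "INDEX | 1 | 2 |\n"
    "| INPUT | a | b |\n"
)
print(rows_missing_leading_pipe(sample))  # [2]
```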
Updated by Joseph Potvin 4 months ago
- Status changed from New to In Progress
- Assignee changed from Huda Hussain to Charles Langlois
- Priority changed from Normal to High
Here is the issue I mentioned in our call, regarding one more missing pipe ("|") character.
Updated by Joseph Potvin 4 months ago
RE: "Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve."
Here it is (which I intend to integrate into the specification / thesis as I work towards the book form):
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/16918d90c276397a23e9c83bf59e5bac043cd81b/current/Refinement%20of%20the%20DWD%20Data%20Structure.pdf
RE: "I think all rows are expected to have a leading pipe (|) character."
And a trailing one, with the result that, when a rule or lookup record is stored in RR on a single row, the double pipe "||" can signify a "line break" (we used to use carriage return + line feed (CRLF), but that's not intuitive for non-techs). That suggests we should ensure that all unfilled fields contain "NULL" (borrowed from SQL, C and C++).
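The single-row idea can be sketched as follows, assuming (as proposed in this note) that every row carries both a leading and a trailing pipe and that unfilled fields are written as "NULL". It further assumes that field values never contain a pipe and that every row has at least one field. Concatenating such rows puts "||" exactly at each row boundary, and the NULL placeholder is what keeps an empty field from also producing "||" inside a row. The function names are hypothetical; this is not the RR storage code.

```python
from typing import Optional

NULL = "NULL"  # placeholder for an unfilled field, as proposed in the note


def encode_row(fields: list[str]) -> str:
    """Render one row with leading and trailing pipes; empty fields become NULL."""
    return "|" + "|".join(f if f else NULL for f in fields) + "|"


def to_single_row(rows: list[list[str]]) -> str:
    """Join rows into one line; each row boundary then shows up as '||'."""
    return "".join(encode_row(row) for row in rows)


def from_single_row(record: str) -> list[list[Optional[str]]]:
    """Split a single-line record back into rows, mapping NULL to None."""
    if not record:
        return []
    inner = record[1:-1]  # drop the outermost leading and trailing pipes
    return [
        [None if field == NULL else field for field in row.split("|")]
        for row in inner.split("||")
    ]


record = to_single_row([["INDEX", "1", ""], ["INPUT", "a", "b"]])
print(record)                   # |INDEX|1|NULL||INPUT|a|b|
print(from_single_row(record))  # [['INDEX', '1', None], ['INPUT', 'a', 'b']]
```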
Updated by Charles Langlois 4 months ago
Joseph Potvin wrote in #note-2:
RE: "Of course a technical file format specification would be useful to coordinate rulemaker implementation and parser implementations in rule taker & rule reserve."
Here it is (which I intend to integrate into the specification / thesis as I work towards the book form):
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/16918d90c276397a23e9c83bf59e5bac043cd81b/current/Refinement%20of%20the%20DWD%20Data%20Structure.pdf
RE: "I think all rows are expected to have a leading pipe (|) character."
And a trailing one, with the result that, when a rule or lookup record is stored in RR on a single row, the double pipe "||" can signify a "line break" (we used to use carriage return + line feed (CRLF), but that's not intuitive for non-techs). That suggests we should ensure that all unfilled fields contain "NULL" (borrowed from SQL, C and C++).
All right, a trailing pipe as well, for consistency.
I'm not sure I understand what you mean, however.
What's non-intuitive about CRLF? Of course, I wouldn't prescribe CRLF over LF for line endings (CRLF is a default kept for legacy backward compatibility and is semantically equivalent to LF). But I see nothing more intuitive for separating records in a text format than a line break. An LF character is a single-byte character that text-supporting software universally renders as a visual line break. Any record separator that isn't displayed as a line break would make the data very hard for humans to read.
In contexts where the data is only consumed by machines, the text representation of record separators is irrelevant or absent (e.g., in an SQL table or an in-memory data structure, there is no concern about how separators are represented as text; they are abstracted away by the query interface).
I would personally favor keeping the usual conventions: "||" is interpreted as an empty field (two separators with nothing in between), and records are separated by line endings (visual, conventional, and a 1-byte character, so as efficient as you can hope for in a text format). This keeps things both simple and efficient: there is no need for a special token to represent nothing, and a sparse table in text format can save significant storage space and wire traffic compared with writing NULL in every unfilled field. It should also be conventional and intuitive enough for anyone used to reading CSV-style formats or text-based tables.
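Under the convention proposed here, a parser can treat each line as one record and read "||" as an empty field. The sketch below is a minimal Python 3.9+ version (it relies on `str.removeprefix` and `str.removesuffix`); `parse_table` is a hypothetical helper, and it assumes fields never contain a pipe. Because `str.splitlines()` accepts both LF and CRLF, either line ending yields the same rows. The last lines compare the size of a sparse row of five unfilled fields written both ways.

```python
def parse_table(text: str) -> list[list[str]]:
    """Parse pipe-delimited rows: one record per line, '||' marks an empty field."""
    rows = []
    for line in text.splitlines():  # handles LF and CRLF line endings alike
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Remove one leading and one trailing pipe, then split on the inner separators.
        rows.append(line.removeprefix("|").removesuffix("|").split("|"))
    return rows


lf = "|INDEX|1||\n|INPUT|a|b|\n"
crlf = "|INDEX|1||\r\n|INPUT|a|b|\r\n"
print(parse_table(lf))                       # [['INDEX', '1', ''], ['INPUT', 'a', 'b']]
print(parse_table(lf) == parse_table(crlf))  # True

# Size of a sparse row with five unfilled fields, in characters.
empty_style = "|" + "|".join([""] * 5) + "|"     # ||||||
null_style = "|" + "|".join(["NULL"] * 5) + "|"  # |NULL|NULL|NULL|NULL|NULL|
print(len(empty_style), len(null_style))         # 6 26
```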
It may be that the need does arise to distinguish accidental omission from intentional empty content. This leads to the question of default values for some fields. A "null" token is often used to represent a default of no relevant value, distinct from an actual empty (zero-length) string. If that becomes important, and we don't have quotation marks to explicitly mark the presence of text values, then we need to support empty fields as distinct from another value such as null.
If that's assumed not to be a concern, I would argue for simplicity and minimalism: the fewer characters, the better.
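If distinguishing an intentional empty value from "no relevant value" does become necessary, one option along the lines discussed here is to keep "||" for an empty string and reserve a token for null. A minimal Python 3.9+ sketch follows; the `NULL` token, `decode_field` and `apply_default` are hypothetical names, not part of the current format. It also illustrates the quoting trade-off: without quotation marks, a literal text value "NULL" can no longer be represented.

```python
from typing import Optional

NULL_TOKEN = "NULL"  # hypothetical token for "no relevant value"


def decode_field(raw: str) -> Optional[str]:
    """'NULL' becomes None; anything else, including '', is kept as text."""
    return None if raw == NULL_TOKEN else raw


def apply_default(value: Optional[str], default: str) -> str:
    """Only a null field falls back to the default; an empty string is kept."""
    return default if value is None else value


row = "|a||NULL|"
fields = [decode_field(f) for f in row.removeprefix("|").removesuffix("|").split("|")]
print(fields)                                         # ['a', '', None]
print([apply_default(f, "DEFAULT") for f in fields])  # ['a', '', 'DEFAULT']
```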