Support #342
open
Review and comment on DWD Array and DWD Coordinates data structuring
0%
Description
I'd like some feedback on the DWD Array and DWD Coordinates data structuring we recently implemented in RuleMaker. You can create some test rules here https://rulemaker3-dev.onrender.com -- I'll do a walkthrough online when you want.
Here is the (slightly updated) spec that Huda implemented recently:
PDF
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/master/Semi-Technical%20Note%20on%20DWDS%20Lookup%20Tables_2024-12-24_Edited2025-09-07PDF.pdf
ODT
https://gitlab.com/xalgorithms-alliance/data-with-direction-specification/dwds-documents/-/blob/master/Semi-Technical%20Note%20on%20DWDS%20Lookup%20Tables_2024-12-24_Edited2025-09-07ODT.odt
Once this receives more feedback I'll convert it into markdown and create a README on GitLab.
Updated by Joseph Potvin 9 months ago
A comment on the criteria for assessment of the DWD Coordinates structure...
For our lookup tables and logic gates, it is intended to be:
- the most compressed structure without losing human auditability;
- the easiest to audit for data integrity, including very large tables;
- the fastest to process, as the only non-data character is pipe "|", no other structuring characters to interpret
Updated by Andrew Feng 9 months ago
My understanding of the problem is as follows. We need a textual representation of a mapping from keys to values, where the keys live in the Cartesian product space K1 x K2 x K3, that is, each key has the form (k1, k2, k3) with k1 in K1, k2 in K2, and k3 in K3. The representation should also be compressed, easy to audit, and fast to process. I believe the DWD array achieves the last two, while the DWD coordinate list achieves all three with a slight compromise on the second.
Compression and processing speed are quite straightforward to assess. As for ease of auditing, I think it is satisfied for the following reasons.
- Given any condition on the keys, it is easy to find values whose associated key satisfies that condition. For example, it is easy to find all values whose key satisfies k1=a and k2=ii.
- Given any value, it is easy to find its associated key.
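The two audit properties above can be illustrated with a minimal Python sketch. The key sets, the values `v1`..`v8`, and the helper names `find_values` and `find_keys` are all hypothetical; this is not how RuleMaker stores or queries DWD data, only a model of the key-to-value mapping being discussed.

```python
from itertools import product

# Hypothetical key sets; the names K1, K2, K3 follow the comment above.
K1 = ["a", "b"]
K2 = ["i", "ii"]
K3 = ["x", "y"]

# Assign an arbitrary illustrative value to every key in K1 x K2 x K3.
table = {key: f"v{n}" for n, key in enumerate(product(K1, K2, K3), start=1)}

# Position of each key component inside the (k1, k2, k3) tuple.
POSITIONS = {"k1": 0, "k2": 1, "k3": 2}


def find_values(table, **conditions):
    """Return the values whose key matches every condition, e.g. k1="a"."""
    return [
        value
        for key, value in table.items()
        if all(key[POSITIONS[name]] == wanted for name, wanted in conditions.items())
    ]


def find_keys(table, value):
    """Return every key that maps to the given value."""
    return [key for key, stored in table.items() if stored == value]


print(find_values(table, k1="a", k2="ii"))
print(find_keys(table, "v3"))
```

Both lookups here are linear scans, which is enough to show the idea; whether the real formats make these lookups equally cheap is exactly the auditability question under review.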
Updated by Joseph Potvin 9 months ago
Thanks.
In the sample file for Section 5.2.2 of RFC9000, you'll see a genuine coordinate list. I think there's got to be a better way to handle the column indexing and the labelling of groups in the key hierarchy.
I refer to these rows:
INDEX|DATA|1|2|3|..|207|208|209|
|W1|COLUMNHEADER|1|2|3|..|207|208|209|
..
|W2|Function|1|2|3|..|207|208|209|
..
|W3|Expression|1|2|3|..|207|208|209|
The reason I left them all in is so that any data processing requirement can be performed directly on the data package. But it might be practical to represent series as done in the segment above with two dots.
Your thoughts?
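For discussion, here is a minimal Python sketch of how a reader of the data package might expand the two-dot shorthand back into the full series. It assumes that ".." between two integers stands for every integer strictly between them; that reading, and the helper names `split_row` and `expand_series`, are assumptions for illustration and are not taken from the DWDS specification.

```python
def split_row(line):
    """Split one pipe-delimited row into trimmed cells, ignoring outer pipes."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def expand_series(cells):
    """Expand '..' into the run of integers between its two neighbours."""
    columns = []
    for position, cell in enumerate(cells):
        if cell == "..":
            if not columns or position + 1 >= len(cells):
                raise ValueError("'..' needs a number on both sides")
            # Fill in the values strictly between the previous and next cell.
            columns.extend(range(columns[-1] + 1, int(cells[position + 1])))
        else:
            columns.append(int(cell))
    return columns


row = "|W1|COLUMNHEADER|1|2|3|..|207|208|209|"
label, name, *rest = split_row(row)
columns = expand_series(rest)
print(label, name, len(columns), columns[:5], columns[-3:])
```

The trade-off is that the package is no longer directly processable "as is": a consumer has to run an expansion step like this one before sifting.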
Updated by Andrew Feng 9 months ago
Can you elaborate on what you mean by column indexing and labelling of groups? I want to make sure I understand.
Updated by Joseph Potvin 9 months ago
I mean that I'd like to avoid repeating the entire column index each time a category needs to be referenced across all the columns. The repetition is there so that sifting can be done on the data package "as is". Perhaps we could just put the SQL asterisk between pipes, INDEX|DATA |*|, meaning (as in SELECT *) return every column. For reference, here is the current, fully expanded form:
INDEX|DATA |1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|
|W1|COLUMNHEADER |1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|
|W1.1|A|1|12|23|34|45|56|67|78|89|100|111|122|133|144|155|166|177|188|199|
|W1.2|B|2|13|24|35|46|57|68|79|90|101|112|123|134|145|156|167|178|189|200|
|W1.3|C|3|14|25|36|47|58|69|80|91|102|113|124|135|146|157|168|179|190|201|
|W1.4|D|4|15|26|37|48|59|70|81|92|103|114|125|136|147|158|169|180|191|202|
|W1.5|E|5|16|27|38|49|60|71|82|93|104|115|126|137|148|159|170|181|192|203|
|W1.6|F|6|17|28|39|50|61|72|83|94|105|116|127|138|149|160|171|182|193|204|
|W1.7|G|7|18|29|40|51|62|73|84|95|106|117|128|139|150|161|172|183|194|205|
|W1.8|H|8|19|30|41|52|63|74|85|96|107|118|129|140|151|162|173|184|195|206|
|W1.9|I|9|20|31|42|53|64|75|86|97|108|119|130|141|152|163|174|185|196|207|
|W1.10|J|10|21|32|43|54|65|76|87|98|109|120|131|142|153|164|175|186|197|208|
|W1.11|K|11|22|33|44|55|66|77|88|99|110|121|132|143|154|165|176|187|198|209|
|W2|Function |1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|
|W2.1|Input Condition|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|
|W2.2|Output Assertion|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|
|W3|Expression |1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162|163|164|165|166|167|168|169|170|171|172|173|174|175|176|177|178|179|180|181|182|183|184|185|186|187|188|189|190|191|192|193|194|195|196|197|198|199|200|201|202|203|204|205|206|207|208|209|
|W3.1|{"determiner":"The","past_participle_verb":"received","noun":"packet","predicate_verb":"is","attribute":"an Initial packet","description":"that fully conforms to the specification."}|1|2|3|4|5|6|7|8|9|10|11|
|W3.2|{"determiner":"The","past_participle_verb":"received","noun":"packet","predicate_verb":"is","attribute":"a Handshake packet","description":"that arrived before any server response."}|12|13|14|15|16|17|18|19|20|21|22|
|W3.3|{"determiner":"The","past_participle_verb":"received","noun":"packet","predicate_verb":"is","attribute":"type","description":"0-RTT (Zero Round-Trip Time)."}|23|24|25|26|27|28|29|30|31|32|33|
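To make the asterisk idea concrete, here is a minimal Python sketch of reading rows in this shape, resolving `*` to "every column", and sifting by intersecting rows. Several things are assumptions rather than spec: the sample is an abbreviated, hypothetical stand-in for the rows above (the Expression labels are shortened to `first` and `second`), the full column set is taken to be the union of all explicitly listed columns, and the function names are invented for illustration.

```python
# Abbreviated, hypothetical sample in the same row shape as the data above.
SAMPLE = """\
INDEX|DATA |*|
|W1|COLUMNHEADER |*|
|W1.1|A|1|12|23|34|
|W1.2|B|2|13|24|35|
|W3|Expression |*|
|W3.1|first|1|2|3|4|5|6|7|8|9|10|11|
|W3.2|second|12|13|14|15|16|17|18|19|20|21|22|
"""


def parse_rows(text):
    """Map each row label to (name, raw column cells)."""
    rows = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows[cells[0]] = (cells[1], cells[2:])
    return rows


def resolve(rows):
    """Replace '*' with every column that appears explicitly anywhere."""
    explicit = set()
    for _name, cols in rows.values():
        if cols != ["*"]:
            explicit.update(int(c) for c in cols)
    everything = sorted(explicit)
    return {
        label: (name, everything if cols == ["*"] else [int(c) for c in cols])
        for label, (name, cols) in rows.items()
    }


def sift(resolved, *labels):
    """Return the columns shared by all of the given row labels."""
    return sorted(set.intersection(*(set(resolved[l][1]) for l in labels)))


resolved = resolve(parse_rows(SAMPLE))
print(sift(resolved, "W1.1", "W3.2"))
```

Note that splitting on "|" only works while no label (including the JSON expressions) contains a pipe, and that deriving "every column" from the explicit rows is just one option; the column count could equally be declared once in the INDEX row.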
Updated by Andrew Feng 9 months ago
Okay, I see the redundant information in the column index, but I don't understand what you mean by sifting the data as-is. In other words, I don't see why the column index is there at all. Maybe I am just not familiar with the sifting process?
Updated by Joseph Potvin 9 months ago
[Sorry for the delayed response -- was meeting a deadline.]
Let's hop on a videocall to discuss the data structure and how it is used. Do you prefer morning or evening?