Project

General

Profile

Actions

Design #347

open

Comment on the DWD data structure for lookup tables

Added by Joseph Potvin 8 months ago. Updated 4 months ago.

Status:
New
Priority:
Normal
Category:
Research
Start date:
02/08/2026
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)

Description

Andrew and Charles, I'd appreciate comments and questions on the following. For reference, you could look at the attached CSV table in a spreadsheet. The purpose of the DWD data structure is to simultaneously optimize for simplicity (only one added character, which is pipe: '|'); compact storage (on a single row of a reference table); easy human auditability; fast processing; and a clean consistent structure that can be auto-adapted to any table.

The DWD Coordinate List looks like this:

INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1.1|Function|1|7|13|19|25|
|W1.2|Attribute|2|8|14|20|26|
|W1.3|1960s-1970s|3|9|15|21|27|
|W1.4|1980s-1990s|4|10|16|22|28|
|W1.5|2000s-2010s|5|11|17|23|29|
|W1.6|2020s-2030s|6|12|18|24|30|
|T_W1.1|Communication Devices|1|
|T_W1.2|Wired|2|
|T_W1.3|Landline Phones|3|
|T_W1.4|Fax Machines|4|
|T_W1.5|ISDN/DSL; Fibre-Optic|5|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|6|
|T_W1.1|Communication Devices|7|
|T_W1.2|Wireless|8|
|T_W1.3|Radio; Television|9|
|T_W1.4|Wireless Phones|10|
|T_W1.5|Smartphones|11|
|T_W1.6|IoT; 5G/6G; Satellite Internet|12|
|T_W1.1|General-Purpose Computers|13|
|T_W1.2|Platform|14|
|T_W1.3|Mainframe Computers|15|
|T_W1.4|Desktop Computers|16|
|T_W1.5|Laptops/Tablets; Smartphones|17|
|T_W1.6|IoT; mRNA|18|
|T_W1.1|General-Purpose Computers|19|
|T_W1.2|Mobility|20|
|T_W1.3|Stationary|21|
|T_W1.4|Movable|22|
|T_W1.5|Mobile|23|
|T_W1.6|Ubiquitous; Ambient|24|
|T_W1.1|General-Purpose Computers|25|
|T_W1.2|Storage|26|
|T_W1.3|Magnetic Tapes|27|
|T_W1.4|Floppy Disks; CDs|28|
|T_W1.5|Flash Drives; Central Cloud|29|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|30|

The expanded DWD Array for that data is like this (which is easier to 'see' when right-justified):

|INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W1.1|Function|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|
|W1.2|Attribute|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|
|W1.3|1960s-1970s|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|
|W1.4|1980s-1990s|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|
|W1.5|2000s-2010s|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|
|W1.6|2020s-2030s|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|
|T_W1.1|Communication Devices|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wired|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Landline Phones|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Fax Machines|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|ISDN/DSL; Fibre-Optic|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|Communication Devices|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wireless|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Radio; Television|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Wireless Phones|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Smartphones|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; 5G/6G; Satellite Internet|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Platform|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Mainframe Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Desktop Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Laptops/Tablets; Smartphones|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; mRNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Mobility|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Stationary|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|
|T_W1.4|Movable|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|
|T_W1.5|Mobile|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|
|T_W1.6|Ubiquitous; Ambient|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|
|T_W1.2|Storage|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|
|T_W1.3|Magnetic Tapes|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|
|T_W1.4|Floppy Disks; CDs|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|
|T_W1.5|Flash Drives; Central Cloud|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|

The CSV, shown below, would be difficult for a human to audit, especially if this table were much larger, because it's not obvious to distinguish secondary keys from values. Beyond a tiny table, it becomes just a data blob. By comparison, the DWD data structure is more intuitive and, with the API, would be faster for a computer to process.

Function,Attribute,1960s-1970s,1980s-1990s,2000s-2010s,2020s-2030s
Communication Devices,Wired,Landline Phones,Fax Machines,ISDN/DSL; Fibre-Optic,Multiplexed Fibre-Optic; Brain-Computer
Communication Devices,Wireless,Radio; Television,Wireless Phones,Smartphones,IoT; 5G/6G; Satellite Internet
General-Purpose Computers,Platform,Mainframe Computers,Desktop Computers,Laptops/Tablets; Smartphones,IoT; mRNA
General-Purpose Computers,Mobility,Stationary,Movable,Mobile,Ubiquitous; Ambient
General-Purpose Computers,Storage,Magnetic Tapes,Floppy Disks; CDs,Flash Drives; Central Cloud,ReRAM/3D-NAND Drives; IPFS Cloud; DNA

The SQL could look like this:

CREATE TABLE technology_categories (
    id INT PRIMARY KEY,
    category_name VARCHAR(50) NOT NULL
);

CREATE TABLE technology_attributes (
    id INT PRIMARY KEY,
    category_id INT,
    attribute_name VARCHAR(50) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES technology_categories(id)
);

CREATE TABLE technology_evolution (
    id INT PRIMARY KEY,
    attribute_id INT,
    era VARCHAR(20),
    technologies TEXT,
    FOREIGN KEY (attribute_id) REFERENCES technology_attributes(id)
);

With data:
sql

INSERT INTO technology_categories VALUES
(1, 'Communication Devices'),
(2, 'General-Purpose Computers');

INSERT INTO technology_attributes VALUES
(1, 1, 'Wired'),
(2, 1, 'Wireless'),
(3, 2, 'Platform'),
(4, 2, 'Mobility'),
(5, 2, 'Storage');

INSERT INTO technology_evolution VALUES
(1, 1, '1960s-1970s', 'Landline Phones'),
(2, 1, '1980s-1990s', 'Fax Machines'),
(3, 1, '2000s-2010s', 'ISDN/DSL; Fibre-Optic'),
(4, 1, '2020s-2030s', 'Multiplexed Fibre-Optic; Brain-Computer'),
(5, 2, '1960s-1970s', 'Radio; Television'),
(6, 2, '1980s-1990s', 'Wireless Phones'),
(7, 2, '2000s-2010s', 'Smartphones'),
(8, 2, '2020s-2030s', 'IoT; 5G/6G; Satellite Internet'),
(9, 3, '1960s-1970s', 'Mainframe Computers'),
(10, 3, '1980s-1990s', 'Desktop Computers'),
(11, 3, '2000s-2010s', 'Laptops/Tablets; Smartphones'),
(12, 3, '2020s-2030s', 'IoT; mRNA'),
(13, 4, '1960s-1970s', 'Stationary'),
(14, 4, '1980s-1990s', 'Movable'),
(15, 4, '2000s-2010s', 'Mobile'),
(16, 4, '2020s-2030s', 'Ubiquitous; Ambient'),
(17, 5, '1960s-1970s', 'Magnetic Tapes'),
(18, 5, '1980s-1990s', 'Floppy Disks; CDs'),
(19, 5, '2000s-2010s', 'Flash Drives; Central Cloud'),
(20, 5, '2020s-2030s', 'ReRAM/3D-NAND Drives; IPFS Cloud; DNA');

Files

computing-history-table.csv computing-history-table.csv 620 Bytes Joseph Potvin, 10/10/2025 03:38 PM
computing-history_dwd-array.png View computing-history_dwd-array.png 63.7 KB Joseph Potvin, 10/12/2025 09:03 AM

Subtasks 1 (1 open0 closed)

Output 2: Rule Authoring Software - Feature #359: DWD format design: uniform metadata fieldsNewJoseph Potvin02/08/2026

Actions
Actions #1

Updated by Joseph Potvin 8 months ago

  • Tracker changed from Bug to Design

Oops I made a mistake. I forgot to group the first two columns as keys! Well, lets make this an example of data quality auditing. Here is the corrected DWD Array (and the corrected DWD Coordinate List below):

|INDEX DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|
|W1|COLUMNHEADER|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W1.1 1960s-1970s|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|
|W1.2|1980s-1990s|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|
|W1.3|2000s-2010s|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|
|W1.4|2020s-2030s|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|0|0|0|1|
|W2|Function|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W2.1|Communication Devices|1|1|1|1|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|
|W2.2|General-Purpose Computers|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|1|1|
|W3|Attribute|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W3.1|Wired|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|W3.2|Wireless|0|0|0|0|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|
|W3.3|Platform|0|0|0|0|0|0|0|0|1|1|1|1|0|0|0|0|0|0|0|0|
|W3.4|Mobility|0|0|0|0|0|0|0|0|0|0|0|0|1|1|1|1|0|0|0|0|
|W3.5 Storage|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|1|1|
|T_W1.1_W2.1_W3.1|Landline Phones|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2_W2.1_W3.1|Fax Machines|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3_W2.1_W3.1|ISDN/DSL; Fibre-Optic|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4_W2.1_W3.1|Multiplexed Fibre-Optic; Brain-Computer|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1_W2.1_W3.2|Radio; Television|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2_W2.1_W3.2|Wireless Phones|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3_W2.1_W3.2|Smartphones|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4_W2.1_W3.2|IoT; 5G/6G; Satellite Internet|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1_W2.2_W3.3|Mainframe Computers|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2_W2.2_W3.3|Desktop Computers|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|
|T_W1.3_W2.2_W3.3|Laptops/Tablets; Smartphones|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|
|T_W1.4_W2.2_W3.3|IoT; mRNA|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|
|T_W1.1_W2.2_W3.4|Stationary|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|
|T_W1.2_W2.2_W3.4|Movable|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|
|T_W1.3_W2.2_W3.4|Mobile|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|
|T_W1.4_W2.2_W3.4|Ubiquitous; Ambient|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|
|T_W1.1_W2.2_W3.5|Magnetic Tapes|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|
|T_W1.2_W2.2_W3.5|Floppy Disks; CDs|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|
|T_W1.3_W2.2_W3.5|Flash Drives; Central Cloud|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|
|T_W1.4_W2.2_W3.5|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|

And here is the correct DWD Coordinate List:

|INDEX DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|
|W1|COLUMNHEADER|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|
|W1.1 1960s-1970s|1|5|9|13|17|
|W1.2 1980s-1990s|2|6|10|14|18|
|W1.3 2000s-2010s|3|7|11|15|19|
|W1.4|2020s-2030s|4|8|12|16|20|
|W2|Function|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|
|W2.1|Communication Devices|1|2|3|4|5|6|7|8|
|W2.2|General-Purpose Computers|9|10|11|12|13|14|15|16|17|18|19|20|
|W3|Attribute|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|
|W3.1 Wired|1|2|3|4|W3.2 Wireless|5|6|7|8|
|W3.3|Platform|9|10|11|12|
|W3.4|Mobility|13|14|15|16|
|W3.5 Storage|17|18|19|20|
|T_W1.1_W2.1_W3.1|Landline Phones|1|
|T_W1.2_W2.1_W3.1| Fax Machines|2|
|T_W1.3_W2.1_W3.1|ISDN/DSL; Fibre-Optic|3|
|T_W1.4_W2.1_W3.1|Multiplexed Fibre-Optic; Brain-Computer|4|
|T_W1.1_W2.1_W3.2|Radio; Television 5|
|T_W1.2_W2.1_W3.2|Wireless Phones|6|
|T_W1.3_W2.1_W3.2|Smartphones|7|
|T_W1.4_W2.1_W3.2|IoT; 5G/6G; Satellite Internet|8|
|T_W1.1_W2.2_W3.3|Mainframe Computers|9|
|T_W1.2_W2.2_W3.3 Desktop Computers|10|
|T_W1.3_W2.2_W3.3|Laptops/Tablets; Smartphones|11|
|T_W1.4_W2.2_W3.3|IoT; mRNA|12|
|T_W1.1_W2.2_W3.4|Stationary|13|
|T_W1.2_W2.2_W3.4|Movable|14|
|T_W1.3_W2.2_W3.4|Mobile|15|
|T_W1.4_W2.2_W3.4 Ubiquitous; Ambient|16|
|T_W1.1_W2.2_W3.5|Magnetic Tapes|17|
|T_W1.2_W2.2_W3.5|Floppy Disks; CDs|18|
|T_W1.3_W2.2_W3.5|Flash Drives; Central Cloud|19|
|T_W1.4_W2.2_W3.5|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|20|

Actions #2

Updated by Joseph Potvin 8 months ago

  • File computing-history_dwd-array.png added
Actions #3

Updated by Joseph Potvin 8 months ago

  • File deleted (computing-history_dwd-array.png)
Actions #4

Updated by Joseph Potvin 8 months ago

A right-justified DWD Array image is attached here.

Actions #5

Updated by Charles Langlois 7 months ago

Joseph Potvin wrote:

Andrew and Charles, I'd appreciate comments and questions on the following. For reference, you could look at the attached CSV table in a spreadsheet. The purpose of the DWD data structure is to simultaneously optimize for simplicity (only one added character, which is pipe: '|'); compact storage (on a single row of a reference table); easy human auditability; fast processing; and a clean consistent structure that can be auto-adapted to any table.

The DWD Coordinate List looks like this:

INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1.1|Function|1|7|13|19|25|
|W1.2|Attribute|2|8|14|20|26|
|W1.3|1960s-1970s|3|9|15|21|27|
|W1.4|1980s-1990s|4|10|16|22|28|
|W1.5|2000s-2010s|5|11|17|23|29|
|W1.6|2020s-2030s|6|12|18|24|30|
|T_W1.1|Communication Devices|1|
|T_W1.2|Wired|2|
|T_W1.3|Landline Phones|3|
|T_W1.4|Fax Machines|4|
|T_W1.5|ISDN/DSL; Fibre-Optic|5|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|6|
|T_W1.1|Communication Devices|7|
|T_W1.2|Wireless|8|
|T_W1.3|Radio; Television|9|
|T_W1.4|Wireless Phones|10|
|T_W1.5|Smartphones|11|
|T_W1.6|IoT; 5G/6G; Satellite Internet|12|
|T_W1.1|General-Purpose Computers|13|
|T_W1.2|Platform|14|
|T_W1.3|Mainframe Computers|15|
|T_W1.4|Desktop Computers|16|
|T_W1.5|Laptops/Tablets; Smartphones|17|
|T_W1.6|IoT; mRNA|18|
|T_W1.1|General-Purpose Computers|19|
|T_W1.2|Mobility|20|
|T_W1.3|Stationary|21|
|T_W1.4|Movable|22|
|T_W1.5|Mobile|23|
|T_W1.6|Ubiquitous; Ambient|24|
|T_W1.1|General-Purpose Computers|25|
|T_W1.2|Storage|26|
|T_W1.3|Magnetic Tapes|27|
|T_W1.4|Floppy Disks; CDs|28|
|T_W1.5|Flash Drives; Central Cloud|29|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|30|

The expanded DWD Array for that data is like this (which is easier to 'see' when right-justified):

|INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W1.1|Function|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|
|W1.2|Attribute|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|
|W1.3|1960s-1970s|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|
|W1.4|1980s-1990s|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|
|W1.5|2000s-2010s|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|
|W1.6|2020s-2030s|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|
|T_W1.1|Communication Devices|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wired|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Landline Phones|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Fax Machines|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|ISDN/DSL; Fibre-Optic|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|Communication Devices|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wireless|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Radio; Television|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Wireless Phones|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Smartphones|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; 5G/6G; Satellite Internet|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Platform|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Mainframe Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Desktop Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Laptops/Tablets; Smartphones|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; mRNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Mobility|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Stationary|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|
|T_W1.4|Movable|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|
|T_W1.5|Mobile|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|
|T_W1.6|Ubiquitous; Ambient|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|
|T_W1.2|Storage|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|
|T_W1.3|Magnetic Tapes|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|
|T_W1.4|Floppy Disks; CDs|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|
|T_W1.5|Flash Drives; Central Cloud|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|

The CSV, shown below, would be difficult for a human to audit, especially if this table were much larger, because it's not obvious to distinguish secondary keys from values. Beyond a tiny table, it becomes just a data blob. By comparison, the DWD data structure is more intuitive and, with the API, would be faster for a computer to process.

Function,Attribute,1960s-1970s,1980s-1990s,2000s-2010s,2020s-2030s
Communication Devices,Wired,Landline Phones,Fax Machines,ISDN/DSL; Fibre-Optic,Multiplexed Fibre-Optic; Brain-Computer
Communication Devices,Wireless,Radio; Television,Wireless Phones,Smartphones,IoT; 5G/6G; Satellite Internet
General-Purpose Computers,Platform,Mainframe Computers,Desktop Computers,Laptops/Tablets; Smartphones,IoT; mRNA
General-Purpose Computers,Mobility,Stationary,Movable,Mobile,Ubiquitous; Ambient
General-Purpose Computers,Storage,Magnetic Tapes,Floppy Disks; CDs,Flash Drives; Central Cloud,ReRAM/3D-NAND Drives; IPFS Cloud; DNA

The SQL could look like this:

CREATE TABLE technology_categories (
    id INT PRIMARY KEY,
    category_name VARCHAR(50) NOT NULL
);

CREATE TABLE technology_attributes (
    id INT PRIMARY KEY,
    category_id INT,
    attribute_name VARCHAR(50) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES technology_categories(id)
);

CREATE TABLE technology_evolution (
    id INT PRIMARY KEY,
    attribute_id INT,
    era VARCHAR(20),
    technologies TEXT,
    FOREIGN KEY (attribute_id) REFERENCES technology_attributes(id)
);

With data:
sql

INSERT INTO technology_categories VALUES
(1, 'Communication Devices'),
(2, 'General-Purpose Computers');

INSERT INTO technology_attributes VALUES
(1, 1, 'Wired'),
(2, 1, 'Wireless'),
(3, 2, 'Platform'),
(4, 2, 'Mobility'),
(5, 2, 'Storage');

INSERT INTO technology_evolution VALUES
(1, 1, '1960s-1970s', 'Landline Phones'),
(2, 1, '1980s-1990s', 'Fax Machines'),
(3, 1, '2000s-2010s', 'ISDN/DSL; Fibre-Optic'),
(4, 1, '2020s-2030s', 'Multiplexed Fibre-Optic; Brain-Computer'),
(5, 2, '1960s-1970s', 'Radio; Television'),
(6, 2, '1980s-1990s', 'Wireless Phones'),
(7, 2, '2000s-2010s', 'Smartphones'),
(8, 2, '2020s-2030s', 'IoT; 5G/6G; Satellite Internet'),
(9, 3, '1960s-1970s', 'Mainframe Computers'),
(10, 3, '1980s-1990s', 'Desktop Computers'),
(11, 3, '2000s-2010s', 'Laptops/Tablets; Smartphones'),
(12, 3, '2020s-2030s', 'IoT; mRNA'),
(13, 4, '1960s-1970s', 'Stationary'),
(14, 4, '1980s-1990s', 'Movable'),
(15, 4, '2000s-2010s', 'Mobile'),
(16, 4, '2020s-2030s', 'Ubiquitous; Ambient'),
(17, 5, '1960s-1970s', 'Magnetic Tapes'),
(18, 5, '1980s-1990s', 'Floppy Disks; CDs'),
(19, 5, '2000s-2010s', 'Flash Drives; Central Cloud'),
(20, 5, '2020s-2030s', 'ReRAM/3D-NAND Drives; IPFS Cloud; DNA');


Joseph Potvin wrote:

Andrew and Charles, I'd appreciate comments and questions on the following. For reference, you could look at the attached CSV table in a spreadsheet. The purpose of the DWD data structure is to simultaneously optimize for simplicity (only one added character, which is pipe: '|'); compact storage (on a single row of a reference table); easy human auditability; fast processing; and a clean consistent structure that can be auto-adapted to any table.

The DWD Coordinate List looks like this:

INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1.1|Function|1|7|13|19|25|
|W1.2|Attribute|2|8|14|20|26|
|W1.3|1960s-1970s|3|9|15|21|27|
|W1.4|1980s-1990s|4|10|16|22|28|
|W1.5|2000s-2010s|5|11|17|23|29|
|W1.6|2020s-2030s|6|12|18|24|30|
|T_W1.1|Communication Devices|1|
|T_W1.2|Wired|2|
|T_W1.3|Landline Phones|3|
|T_W1.4|Fax Machines|4|
|T_W1.5|ISDN/DSL; Fibre-Optic|5|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|6|
|T_W1.1|Communication Devices|7|
|T_W1.2|Wireless|8|
|T_W1.3|Radio; Television|9|
|T_W1.4|Wireless Phones|10|
|T_W1.5|Smartphones|11|
|T_W1.6|IoT; 5G/6G; Satellite Internet|12|
|T_W1.1|General-Purpose Computers|13|
|T_W1.2|Platform|14|
|T_W1.3|Mainframe Computers|15|
|T_W1.4|Desktop Computers|16|
|T_W1.5|Laptops/Tablets; Smartphones|17|
|T_W1.6|IoT; mRNA|18|
|T_W1.1|General-Purpose Computers|19|
|T_W1.2|Mobility|20|
|T_W1.3|Stationary|21|
|T_W1.4|Movable|22|
|T_W1.5|Mobile|23|
|T_W1.6|Ubiquitous; Ambient|24|
|T_W1.1|General-Purpose Computers|25|
|T_W1.2|Storage|26|
|T_W1.3|Magnetic Tapes|27|
|T_W1.4|Floppy Disks; CDs|28|
|T_W1.5|Flash Drives; Central Cloud|29|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|30|

The expanded DWD Array for that data is like this (which is easier to 'see' when right-justified):

|INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|
|W1.1|Function|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|
|W1.2|Attribute|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|
|W1.3|1960s-1970s|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|
|W1.4|1980s-1990s|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|
|W1.5|2000s-2010s|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|
|W1.6|2020s-2030s|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0|0|1|
|T_W1.1|Communication Devices|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wired|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Landline Phones|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Fax Machines|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|ISDN/DSL; Fibre-Optic|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|Multiplexed Fibre-Optic; Brain-Computer|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|Communication Devices|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Wireless|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Radio; Television|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Wireless Phones|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Smartphones|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; 5G/6G; Satellite Internet|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Platform|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Mainframe Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.4|Desktop Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.5|Laptops/Tablets; Smartphones|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.6|IoT; mRNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|
|T_W1.2|Mobility|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|
|T_W1.3|Stationary|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|
|T_W1.4|Movable|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|
|T_W1.5|Mobile|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|
|T_W1.6|Ubiquitous; Ambient|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|
|T_W1.1|General-Purpose Computers|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|
|T_W1.2|Storage|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|
|T_W1.3|Magnetic Tapes|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|
|T_W1.4|Floppy Disks; CDs|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|
|T_W1.5|Flash Drives; Central Cloud|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|
|T_W1.6|ReRAM/3D-NAND Drives; IPFS Cloud; DNA|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|

The CSV, shown below, would be difficult for a human to audit, especially if this table were much larger, because it's not obvious to distinguish secondary keys from values. Beyond a tiny table, it becomes just a data blob. By comparison, the DWD data structure is more intuitive and, with the API, would be faster for a computer to process.

Function,Attribute,1960s-1970s,1980s-1990s,2000s-2010s,2020s-2030s
Communication Devices,Wired,Landline Phones,Fax Machines,ISDN/DSL; Fibre-Optic,Multiplexed Fibre-Optic; Brain-Computer
Communication Devices,Wireless,Radio; Television,Wireless Phones,Smartphones,IoT; 5G/6G; Satellite Internet
General-Purpose Computers,Platform,Mainframe Computers,Desktop Computers,Laptops/Tablets; Smartphones,IoT; mRNA
General-Purpose Computers,Mobility,Stationary,Movable,Mobile,Ubiquitous; Ambient
General-Purpose Computers,Storage,Magnetic Tapes,Floppy Disks; CDs,Flash Drives; Central Cloud,ReRAM/3D-NAND Drives; IPFS Cloud; DNA

The SQL could look like this:

CREATE TABLE technology_categories (
    id INT PRIMARY KEY,
    category_name VARCHAR(50) NOT NULL
);

CREATE TABLE technology_attributes (
    id INT PRIMARY KEY,
    category_id INT,
    attribute_name VARCHAR(50) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES technology_categories(id)
);

CREATE TABLE technology_evolution (
    id INT PRIMARY KEY,
    attribute_id INT,
    era VARCHAR(20),
    technologies TEXT,
    FOREIGN KEY (attribute_id) REFERENCES technology_attributes(id)
);

With data:
sql

INSERT INTO technology_categories VALUES
(1, 'Communication Devices'),
(2, 'General-Purpose Computers');

INSERT INTO technology_attributes VALUES
(1, 1, 'Wired'),
(2, 1, 'Wireless'),
(3, 2, 'Platform'),
(4, 2, 'Mobility'),
(5, 2, 'Storage');

INSERT INTO technology_evolution VALUES
(1, 1, '1960s-1970s', 'Landline Phones'),
(2, 1, '1980s-1990s', 'Fax Machines'),
(3, 1, '2000s-2010s', 'ISDN/DSL; Fibre-Optic'),
(4, 1, '2020s-2030s', 'Multiplexed Fibre-Optic; Brain-Computer'),
(5, 2, '1960s-1970s', 'Radio; Television'),
(6, 2, '1980s-1990s', 'Wireless Phones'),
(7, 2, '2000s-2010s', 'Smartphones'),
(8, 2, '2020s-2030s', 'IoT; 5G/6G; Satellite Internet'),
(9, 3, '1960s-1970s', 'Mainframe Computers'),
(10, 3, '1980s-1990s', 'Desktop Computers'),
(11, 3, '2000s-2010s', 'Laptops/Tablets; Smartphones'),
(12, 3, '2020s-2030s', 'IoT; mRNA'),
(13, 4, '1960s-1970s', 'Stationary'),
(14, 4, '1980s-1990s', 'Movable'),
(15, 4, '2000s-2010s', 'Mobile'),
(16, 4, '2020s-2030s', 'Ubiquitous; Ambient'),
(17, 5, '1960s-1970s', 'Magnetic Tapes'),
(18, 5, '1980s-1990s', 'Floppy Disks; CDs'),
(19, 5, '2000s-2010s', 'Flash Drives; Central Cloud'),
(20, 5, '2020s-2030s', 'ReRAM/3D-NAND Drives; IPFS Cloud; DNA');

One small remark/suggestion: I would recommend a file format versioning specifier in the header section. This allows a file to be consumed in the presence of multiple possible formats; this helps keep the specification evolvable and implementations reliable and retrocompatible: if an implementation is able to easily detect the presence of an older format of the file, it may decide to parse it using the older mechanism, or simply inform that it is not supported; if an older implementation encounters a newer file format that it does not support, it can fail reliably/safely instead of trying to parse it and potentially produce silent or confusing errors.
e.g.

|VERSION|1.0|
|INDEX DATA|...|
...
Actions #6

Updated by Charles Langlois 7 months ago

Charles Langlois wrote in #note-5:

One small remark/suggestion: I would recommend a file format versioning specifier in the header section. This allows a file to be consumed in the presence of multiple possible formats; this helps keep the specification evolvable and implementations reliable and retrocompatible: if an implementation is able to easily detect the presence of an older format of the file, it may decide to parse it using the older mechanism, or simply inform that it is not supported; if an older implementation encounters a newer file format that it does not support, it can fail reliably/safely instead of trying to parse it and potentially produce silent or confusing errors.
e.g.

|VERSION|1.0|
|INDEX DATA|...|
...
Actions #7

Updated by Charles Langlois 7 months ago

Joseph Potvin wrote:

The DWD Coordinate List looks like this:

INDEX|DATA|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
|W1|COLUMNHEADER|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|
...

Questions:

  • why are we enumerating these indices? They will always be a enumeration of data row indices(1 to ), right? How is it used?
  • What is the purpose of the INDEX row and the "W1|COLUMNHEADER" row? is it only to specify the number of rows? In which case it could be more concisely expressed as a single integer, i.e. |INDEX DATA|30| or a pair of integers i.e. |INDEX DATA|1|30|
  • Does the COLUMNHEADER indices always match the INDEX DATA?
Actions #8

Updated by Charles Langlois 7 months ago

Finally understood the format design once I loaded it into a spreadsheet. Sorry that didn't click sooner, I am used to dealing with text formats very literally, and use spreadsheets only sparingly.

I think I understand now that this is trying to represent multi-dimensional tables, in a way that can be manipulated as two-dimensional tables in spreadsheet-like interfaces.

I understand where the design comes from, but my gut is feeling uncomfortable about the efficiency of the format from a machine processing and network transmission point of view.
Ultimately, I think there should be tools to go from human-oriented encodings, and ones that are more machine/network/simplicity optimized.

Maybe that would be satisfied simply if, like the transformation from "coordinates list" to the expanded "truth table", a set of transformations would result in variants of the format, optimized for network transmission and processing (like compressing/removing the 'INDEX|DATA' and 'W1|COLUMNHEADER' rows, which I think have no other uses than as visual indicators).

Actions #9

Updated by Joseph Potvin 7 months ago

RE: "a file format versioning specifier in the header section"

That's in the metadata, e.g.

|rule_id|8840e3db-57ec-4ccb-ac67-b51e6cc01790|
|rulereserve_nodes|*|
|version_standard_url|https://semver.org/|
|ruledata_version|0.0.0|

See pg 16 of my IRTF presentation for an example.
https://datatracker.ietf.org/meeting/124/materials/slides-124-rasprg-03-jpotvin-rasprg-00.pdf

RE: "Ultimately, I think there should be tools to go from human-oriented encodings, and ones that are more machine/network/simplicity optimized."

Since the reference case I work on is monetary/financial, security audits are optimized with WYSIWYR (What you see is what you run.)

RE: "What is the purpose of the INDEX row and the "W1|COLUMNHEADER" row? is it only to specify the number of rows? In which case it could be more concisely expressed as a single integer, i.e. |INDEX DATA|30| or a pair of integers i.e. |INDEX DATA|1|30|" and "a set of transformations would result in variants of the format, optimized for network transmission and processing (like compressing/removing the 'INDEX|DATA' and 'W1|COLUMNHEADER' rows, which I think have no other uses than as visual indicators)."

I wondered about that to:
https://xalgorithms.redminepro.net/issues/239#note-12

After some discussion with Don and Nhamo, we thought it best to leave these because our notional reference use case type is high-volume/high-frequency financial transactions, where every nano-second matters. At the upper edge of high-volume/high-frequency forex trading, for example, the most sophisticated trading systems can execute approximately 10,000 to 50,000 transactions per second per firm. And this is a highly regulated sector, so transactions need to be validated against many rules. DataKinetic in Kanata (whose founder is also co-founder of Xalgorithms) handles this in-memory on centralized mainframes: https://www.dkl.com/tablebase/ and https://dkl.com/wp-content/uploads/documents/whitepapers/DataKinetics-tableBASE-In-Memory-Whitepaper.pdf But DWDS is decentralized and distributed across heterogeneous platforms. We want to have RuleTaker (RT) be able to complete an RT [is.dwd] request, RR [ought1.dwd], RT [ought2.dwd], substrate app execution with the lowest feasible latency, and without loss of direct (WYSIWTR) log-inspection auditability. We use the column indices for data matching, and put a high priority on avoiding any transformation or routine that would slow down the process and increase the energy cost .

Some legacy published references on DataKinetics' methods:

Table Driven Design, DataKinetics (Wayne Cunneyworth ... now retired in B.C., but in hot pursuit of a geometrical method for prediction of prime numbers)
https://web.archive.org/web/20160610160908/http://www.dkl.com/wp-content/uploads/2016/05/DataKinetics-Table-Driven-Design.pdf

IBM Redbook: C/C++ Applications on z/OS and OS/390 UNIX
Chaper 8: "net.TABLES Application"
Describes the net.TABLES ported to OS/390 UNIX
https://www.redbooks.ibm.com/abstracts/sg245992.html

IBM Redbook: XPLink: OS/390 Extra Performance Linkage
Section 6.4.2: "net.TABLES from Data Kinetics Ltd."
Source: http://www.redbooks.ibm.com/redbooks/pdfs/sg245991.pdf

Actions #10

Updated by Charles Langlois 7 months ago

Joseph Potvin wrote in #note-9:

RE: "a file format versioning specifier in the header section"

That's in the metadata, e.g.

|rule_id|8840e3db-57ec-4ccb-ac67-b51e6cc01790|
|rulereserve_nodes|*|
|version_standard_url|https://semver.org/|
|ruledata_version|0.0.0|

Alright, make sense. I wasn't sure what this was refering to, and it's not in all examples. I assume the data format specification would require some metadata attributes, while others are optional.
The nomenclature for these metadata attributes interest me. Is there a specification of that part somewhere?
If it's still flexible, something like dwd_format_version would be more intuitive. I would expect to see dwd as a prefix in metadata and other nomenclatures more often, whereas "rule" is a generic/vague term for a concept that is used in this setting, but not unique to it. The use of consistent prefixes is something I think would generally help, especially when multiple parties may be involved in building up and providing the data and metadata.
Any metadata specified by the base format specification, could be prefixed consistently with dwd, while additional metadata introduced by third party providers/users of the system, would use their own prefixes to distinguish.

I'm not clear on whether that data format is already well specified and put in use, or a work in progress yet to be specified and implemented.
Is there a reference implementation that currently uses this format? I looked around the other gitlab projects but have only found code referring to other formats.

We use the column indices for data matching, and put a high priority on avoiding any transformation or routine that would slow down the process and increase the energy cost .

Okay, guess I'll have to see the code doing this, or a description of the algorithm, to understand.

Making choices for human auditability necessarily implies tradeoffs with machine performance and network transmission, so those choices are certainly key to the design.
As a programmer, my intuition is to design a format that is optimally simple and concise for machine processing and network utilisation, and then think about how to present it to humans in useful forms as necessary. If it's text and csv based, it's already a tradeoff towards human auditability(and interoperability) and away from raw performance.
Any sequence of characters in the file that presents information redundantly, such as being implied otherwise or repeated, is overhead that goes into performance cost, especially at high volume. Choosing what is valuable and necessary and what is not is the challenge.

Anyway that's an optimization suggestion, not a functional one.

Actions #11

Updated by Joseph Potvin 7 months ago

  • Assignee changed from Andrew Feng to Charles Langlois

Charles, thanks for these elements. Replies below...

RE: metadata... not in all examples...

Correct, because the logic gate / lookup table data structure is sometimes the sole focus. (I think going forward I will explicitly mention when the metadata has been snipped.)

RE: require some metadata attributes, while others are optional

Correct. For clarity we should append an asterisk to the mandatory fields within the DWD Rules-as-Data form. #350

RE: The nomenclature for these metadata attributes interest me. Is there a specification of that part somewhere?
If it's still flexible, something like dwd_format_version would be more intuitive. I would expect to see dwd as a prefix in metadata and other nomenclatures more often, whereas "rule" is a generic/vague term for a concept that is used in this setting, but not unique to it. The use of consistent prefixes is something I think would generally help, especially when multiple parties may be involved in building up and providing the data and metadata.

Agreed, and your assistance in tightening this would be appreciated. It seems time now to register the .dwd MIME type with IATA. I was not sure some years ago if .dwd would stick, but I think it will. #351

RE: Any metadata specified by the base format specification, could be prefixed consistently with dwd, while additional metadata introduced by third party providers/users of the system, would use their own prefixes to distinguish.

I would rather have the specification say that any additional metadata from 3rd-party MUST be prefixed with their preferred 3-letter code, but no prefix implies base DWD.

RE: I'm not clear on whether that data format is already well specified and put in use, or a work in progress yet to be specified and implemented.

The "specification" remains my doctoral dissertation (2023) -- now outdated in some details. Also with all the university academic stuff in there, it's too verbose. My intention has been to prepare it as a book with a bare bones "specification" section. Nevertheless I don't think the verbose explanations can be reduced very much, because there are many very specific but unusual design choices.

RE: Is there a reference implementation that currently uses this format? I looked around the other gitlab projects but have only found code referring to other formats.

Don and Huda have recently implemented .txt, probably because I've not yet registered .dwd. Seems I better get on it.

RE: "avoiding any transformation or routine that would slow down the process" ... Making choices for human auditability necessarily implies tradeoffs with machine performance and network transmission, so those choices are certainly key to the design.

That would usually be the case, however I think the only trade-off here is in some additional storage space and transmission volume. Various optimizations from the current data structure can be tested for both auditability and performance.

RE: As a programmer, my intuition is to design a format that is optimally simple and concise for machine processing and network utilisation, and then think about how to present it to humans in useful forms as necessary. If it's text and csv based, it's already a tradeoff towards human auditability (and interoperability) and away from raw performance.
Any sequence of characters in the file that presents information redundantly, such as being implied otherwise or repeated, is overhead that goes into performance cost, especially at high volume. Choosing what is valuable and necessary and what is not is the challenge.

Agreed. I'm semi-techical but my non-programmer / non-DBA inclination shows up. The cost of optimizing for the machine only is the tendency for eventual drift in two contexts: 1. the suits lose touch with what the system does; and 2. the geeks lose touch with the business use cases that the suits have in mind. The result eventually becomes optimal machine performance on sub-optimally understood business use cases. The DWD Rules-as-Data structure enables direct discussions btw suits & geeks.

Actions

Also available in: Atom PDF