Design #222

open

Add ability to draw upon words from standard schemas when authoring sentences (& possibly to toggle to hide/display semantic tags)

Added by Joseph Potvin 7 months ago. Updated 7 months ago.

Status: Feedback
Priority: Normal
Assignee:
Category: -
Start date: 07/27/2023
Due date:
% Done: 0%
Estimated time:

Description

This task follows from https://xalgorithms.redminepro.net/issues/104

Requirement

Standard semantic schemas should be easy to draw upon while using RuleMaker and RuleTaker.

Discussion

Most OASIS schemas now include a JSON representation. However, overlapping inconsistencies among both complementary and competing standard semantic schemas are common, and can be expected to persist in varying degrees. So the RuleData representation, as well as the RuleMaker and RuleTaker functions, needs a simple way for rule authors to employ terminology from any schema(s) they consider relevant.


Files

nibsunspsc-gsinunspsc.csv (1.47 MB) Joseph Potvin, 07/27/2023 12:36 PM
Screenshot from 2023-07-27 06-44-14.png (76.9 KB) Joseph Potvin, 07/27/2023 01:55 PM
Actions #1

Updated by Joseph Potvin 7 months ago

Following is an excerpt from the DWDS Specification:


The DWDS does, however, create a background incentive to use common schemas and lexicons, an approach which sidesteps the trend towards redundancy and inconsistency that has emerged among competing standard XML schemas (Sliwa & King, 2000). We have designed a practical incentive for semantic alignment to emerge through co-opetition (Brandenburger & Nalebuff, 1997), but that is left to emerge on its own, independently of the specification per se. The incentive is sufficient.

There is great value in the various domain-specific XML schemas that have been painstakingly structured and negotiated. But XML notation is optimized for the semantic Web where a browser has a small job to do in attaching semantics to displayed content. It is not optimal for high-volume, high-performance data processing. Even the 50-year-old NETL (NETwork Language) representation designed by Scott Fahlman to supply declarative real-world semantic knowledge in response to queries, would outperform XML by far in a distributed database (Fahlman, 1977) (Holland et al., 1986, p. 19). Fahlman’s original explanation is worth citing at length here, because the DWDS embodies a similar way of thinking:

“We forget about trying to avoid or minimize the deductive search, and simply do it, employing a rather extreme form of parallelism to get the job done quickly. By ‘quickly’ I mean that the search for most implicit properties and facts in this system will take only a few machine-cycles, and that the time required is essentially constant, regardless of how large the knowledge base might become. The representation of knowledge in this system is entirely declarative: the system's search procedures are very simple and they do not change as new knowledge is added. Of course, the knowledge base must contain descriptions of procedures for use by other parts of the system, including those parts that perform the more complex deductions, but this knowledge is not used by the knowledge base itself as it hunts for information and performs the simple deductions for which it is responsible. The parallelism is to be achieved by storing the knowledge in a semantic network built from very simple hardware devices: node units, representing the concepts and entities in the knowledge-base, and link units, representing statements of the relationships between various nodes. (Actually, the more complex statements are represented by structures built from several nodes and links, but that need not concern us here.) ... The controller is not only able to specify, at every step of the propagation, exactly which types of links are to pass which markers in which directions; it is also able to use the presence of one type of marker at a link to enable or inhibit the passage of other markers. It is the precision of such a system that gives it its power, but only if we can learn to use it properly.” (Fahlman, 1977, p. 11)

The declarative non-canonical approach employed in RuleData arises from the need to process large sets of unstructured user-generated data, and this is similar to the requirements of search engines (Dean & Ghemawat, 2008), and to the processing of natural language text (Plank, 2016). By constraining rule expression to a small set of metadata, and to a single syntactic structure for sentences that provide meaning to the logical relations within each rule, DWDS achieves operational simplicity.

RuleData is thus put forward as a generalized means of expressing each condition and assertion that occurs in legislation, policies, standards or agreements in a human-readable but also informatically-processable form. This can be embedded or automatically transcribed into any other programming language, making it platform-independent. Normative data expressed in RuleData must not replace or be inserted into legal documents; rather, it belongs in a ‘schedule’ or some other type of attachment to a legal text. This loses nothing operationally, yet it remains subordinate to the natural language text endorsed by legislators or parties to the agreements. This way, when there is a bug to fix, it is not necessary to go back to the legislature or to the parties for re-negotiation – the original natural language text remains the legal reference.

Brandenburger, A., & Nalebuff, B. (1997). Co-Opetition (1st ed.). Crown Business.

Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters (Originally published in 2004 as a technical white paper by Google Inc.). Communications of the ACM, 51(1), 107–113. https://doi.org/10.1145/1327452.1327492

Fahlman, S. E. (1977). A System for Representing and Using Real-World Knowledge. http://dspace.mit.edu/handle/1721.1/6888

Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1986). Induction: Processes of inference, learning, and discovery. MIT Press.

Plank, B. (2016). What to do about non-standard (or non-canonical) language in NLP. Unpublished research discussion paper available via Cornell University Library: arXiv digital archive. http://arxiv.org/abs/1608.07836

Sliwa, C., & King, J. (2000). B-to-B hard to spell with XML. Computerworld, 34(9), 1, 97.

Actions #2

Updated by Joseph Potvin 7 months ago

  • Status changed from New to Feedback

Following is another relevant excerpt from the DWDS Specification:


Within each language there is typically more than one syntactic structural variant available for the same basic sentence. Syntactic variability and semantic dissonance within and among languages complicate natural language processing. While of course some semantic meaning can be lost based on particular words[1] and due to different syntactic sequencing, the general incentive of a conventional user of the RuleMaker application is to authentically communicate the accurate meaning in each language. People can make mistakes, but the difficulty of semantic alignment is greatly reduced in the DWDS RuleFiniteStateGrammar by working with only one sentence template that always contains the same six syntactic elements to build all sentences in all logic tables. Semantic variability can be handled more simply as communities of rule-maker agents and rule-taker agents have a shared interest to develop and choose standard schemas. Any application relying upon this structure can provide a way for users to access synonyms and multiple languages through [lookup.dwd] reference tables. [Emphasis added.]


[1] Three of our informal translators sought contextual clarification whether the word ‘box’ was meant as a physical container, or as a data entry field in an administrative form. Also certain words in this sample sentence are out of context in some languages. In the Namuy Wam language of Cauca (southwestern Colombia), there is no term for "a box" (in Spanish “una caja”). So if the intent is to refer to a moderately large package for cargo, a functional synonym in Namuy Wam is "un costal" (a large heavy-duty sack).

Actions #3

Updated by Joseph Potvin 7 months ago

RE: "Any application relying upon this structure can provide a way for users to access synonyms and multiple languages through [lookup.dwd] reference tables."

The "Lookup Table" capability recently added to RuleMaker (see the dev instance here: https://rulemaker3-dev.onrender.com/ ) provides a way for the user to import n-tuples in .csv format, or optionally, to manually create a table of reference data within a standard DWDS RuleData package, and then to publish it to RuleReserve on the distributed IPFS Internet.

I suggest we use this 7711-row .csv reference data table as our test example: https://open.canada.ca/data/en/dataset/588eab5b-7b16-4a26-b996-23b955965ffa

The .csv is attached to this issue; it came from here: https://donnees-data.tpsgc-pwgsc.gc.ca/ba2/aev-bas/nibsunspsc-gsinunspsc.csv

This is a valuable test example because: (a) this table would actually be useful for DWDS use-cases in the domain of cross-border trade; (b) it is produced by a conventional institution for general use, without any tailoring particular to DWDS or Xalgorithms; and (c) at more than 7000 rows and 10 columns it provides quite a good sample for scalability in design and testing.

My initial import of the 'as is' .csv file worked well, other than that the column headers were offset by one column (a known issue that will be resolved once the first column header is accommodated above the 'outer row' labels).
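To make the header-offset point concrete, here is a minimal sketch of one way an importer could keep headers aligned: the first header cell is treated as the name of the 'outer row' label column rather than as the first data column. This is not RuleMaker's actual import code. The function name `load_lookup_table` and the three-column sample are hypothetical, and the sample headers are not the real headers of the attached file.

```python
import csv
import io


def load_lookup_table(csv_text):
    """Parse CSV text into (label_header, {row_label: {column_header: value}}).

    The first header cell names the row-label column, so the remaining
    headers stay aligned with their own data columns.
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    label_header, column_headers = headers[0], headers[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        table[label] = dict(zip(column_headers, values))
    return label_header, table


# Hypothetical sample data, for illustration only.
SAMPLE = "code,description_en,description_es\nA1,box,caja\nA2,sack,costal\n"

label_header, table = load_lookup_table(SAMPLE)
print(label_header)
print(table["A1"]["description_es"])
```

Keeping the label header separate from the column headers is the design choice that avoids the one-column shift described above.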

Refined Requirement

A rule author using RuleMaker should be able to (a) refer to and select from one of the schemas in any of the composed sentences; and (b) set up any such sentence to also draw upon synonyms from other semantic schemas.
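Part (b) of this requirement could be sketched roughly as follows, assuming a lookup table in which each row lists the equivalent term in each schema. The schema names, the terms, and the function `synonyms_for` are all hypothetical; this only illustrates the shape of the lookup, not a proposed design.

```python
# Hypothetical synonym table: one row per concept, one key per schema.
SYNONYMS = [
    {"schema-a": "box", "schema-b": "carton", "schema-c": "package"},
    {"schema-a": "sack", "schema-b": "bag"},
]


def synonyms_for(term, schema, table=SYNONYMS):
    """Return {other_schema: term} for the row where `schema` uses `term`.

    Returns an empty dict when the term is not found in that schema.
    """
    for row in table:
        if row.get(schema) == term:
            return {s: t for s, t in row.items() if s != schema}
    return {}


print(synonyms_for("box", "schema-a"))
```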

Actions #4

Updated by Joseph Potvin 7 months ago

Some issues arose while testing this sample .csv file. I've documented those in a separate issue: https://xalgorithms.redminepro.net/issues/224

Actions #5

Updated by Joseph Potvin 7 months ago

I think JSON-LD (https://json-ld.org/ and https://www.w3.org/TR/json-ld/) solves the problem of how we should refer to other rules and lookup tables; how DWDS can operate elegantly with terminology in DWDS rules from multiple competing semantic standards; and how to ensure we're NOT re-inventing the wheel.

For example, when a rule author using RuleMaker wants to use some particular semantic schema, our UI would let them drop an IPFS CID or a URL into a field in our metadata section, which would add the JSON-LD "@context": "http://name-of-semantic-schema.org/". I think ideally we'd want this to be the CID of a DWDS lookup table. Anyone can publish a Lookup Table as a JSON-LD @context reference, like this:
{
  "@context": "https://ipfs.io/ipfs/Vd6gGaM2466NFLRpzknJzYDxtwXGkHuqK4Gz83YRrzNyNPx2HMHmsZ8hEGq"
}
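A rough sketch of what the UI could do behind that field, assuming the author enters either a full URL or a bare CID. The helper names `context_iri` and `add_context` and the `title` metadata field are hypothetical, and treating any non-URL input as a CID is a simplifying assumption.

```python
import json

# Gateway prefix taken from the example URL above.
IPFS_GATEWAY = "https://ipfs.io/ipfs/"


def context_iri(ref):
    """Turn a URL or a bare IPFS CID into an IRI usable as "@context"."""
    if ref.startswith(("http://", "https://")):
        return ref
    return IPFS_GATEWAY + ref  # assumption: anything else is a bare CID


def add_context(metadata, ref):
    """Return a copy of the rule metadata with "@context" placed first."""
    return {"@context": context_iri(ref), **metadata}


rule_metadata = {"title": "Example rule"}  # hypothetical metadata field
doc = add_context(rule_metadata, "http://name-of-semantic-schema.org/")
print(json.dumps(doc, indent=2))
```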

When a RuleTaker client wants to restrict results to a particular semantic schema, it includes the "@context": "http://name-of-semantic-schema.org/" in its outgoing "is.dwd" message.
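On the receiving side, the restriction could be as simple as the filter below. This is a sketch under assumptions: the actual structure of an "is.dwd" message is not shown here, so plain dicts stand in for both the request and the candidate rules, and the rule names and the second schema URL are made up. It does allow for "@context" being either a single string or a list, both of which JSON-LD permits.

```python
def restrict_to_context(rules, wanted_context):
    """Keep only rules whose "@context" includes the requested schema."""

    def contexts(rule):
        ctx = rule.get("@context", [])
        return ctx if isinstance(ctx, list) else [ctx]

    return [rule for rule in rules if wanted_context in contexts(rule)]


# Stand-in for the relevant part of an "is.dwd" message (structure assumed).
request = {"@context": "http://name-of-semantic-schema.org/"}

# Hypothetical candidate rules.
rules = [
    {"@context": "http://name-of-semantic-schema.org/", "name": "rule-1"},
    {"@context": "http://another-schema.example/", "name": "rule-2"},
    {
        "@context": [
            "http://another-schema.example/",
            "http://name-of-semantic-schema.org/",
        ],
        "name": "rule-3",
    },
    {"name": "rule-4"},  # no "@context" at all
]

matches = restrict_to_context(rules, request["@context"])
print([rule["name"] for rule in matches])
```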

There are several other JSON-LD keywords that I reckon will be useful: @id, @type, @language, @value.
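As an illustration of how those could serve the synonyms and multiple-languages idea, here is a small sketch of a term that carries language-tagged values. The "@id" and "@type" URLs, the `label` property, and the helper `label_in` are hypothetical; only the @-prefixed keys come from JSON-LD.

```python
# Hypothetical term entry using JSON-LD value objects for each language.
term = {
    "@id": "http://name-of-semantic-schema.org/terms/box",
    "@type": "http://name-of-semantic-schema.org/Term",
    "label": [
        {"@value": "box", "@language": "en"},
        {"@value": "caja", "@language": "es"},
    ],
}


def label_in(term, language, default=None):
    """Return the label tagged with `language`, or `default` if none exists."""
    for entry in term.get("label", []):
        if entry.get("@language") == language:
            return entry["@value"]
    return default


print(label_in(term, "es"))
print(label_in(term, "fr", default=label_in(term, "en")))
```

Falling back to another language when no tagged value exists is one possible policy; an application could equally leave the gap visible to the rule author.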
