Feature #221


Add a Free/Libre LLM Service to RuleMaker for Users to Optionally Pre-Parse Convoluted Source Rule Texts

Added by Joseph Potvin 9 months ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Research
Start date:
07/26/2023
Due date:
% Done:

0%

Estimated time:

Description

Context:

A user of RuleMaker wants to structure into RuleMaker a rule or section of a rule.

Problem:

Often, the original text of a rule is so convoluted that a typical intelligent person would find it difficult and/or time-consuming to tease it apart into the discrete declarative Input Condition and Output Assertion statements that the RuleMaker user must begin with.

A Solution:

Add an auxiliary LLM text simplifier service to RuleMaker to enable the following sequence:

  1. A user copies the (extended Unicode) text of a rule (or section of a rule) from any source, and pastes it into a field in RuleMaker (perhaps a frame called "Simplify this!").

  2. RuleMaker first prefaces the pasted text with a default "response engineered" instruction that instructs an LLM to simplify the pasted text in a way that will then be much easier for the user to shoehorn into DWDS RuleFiniteStateGrammar (RFSG). RFSG is discussed on pgs 155-159 of the DWDS specification. The default instruction to the LLM can be something like: "Re-write the following quoted text using only discrete declarative sentences in a style that conforms with the essential practices of 'RuleSpeak'. Start a new line for each sentence. Retain all semantic meaning, operational steps, and/or external references."

  3. The user of RuleMaker must be able to edit this default instruction, and then to save their customized or alternative instruction into a locally-stored lookup table. A retrieval process will enable the user to browse that table and load any saved or default instruction.

  4. When the user clicks "Submit", RuleMaker runs the instruction and the pasted text through the LLM, and presents the result to the user in a refreshed frame (or equivalent). This result should be presented in an editable text field so that the user can directly modify that text.

  5. The "Result" frame should be draggable to any position on the screen, and should support conventional [-] and [+] minimize/maximize operations, so that the user can easily copy and paste segments from that result into a RuleMaker Logic Gate.
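Steps 2 to 4 above can be sketched in Python. This is a minimal illustration, not RuleMaker code: the class and function names, and the JSON file used for the locally-stored lookup table, are assumptions chosen for the sketch.

```python
# Sketch of steps 2-4: preface the pasted text with a default
# instruction, keep user-saved instructions in a local lookup table,
# and build the final prompt to send to the LLM.
# All names here are illustrative, not part of RuleMaker.
import json
from pathlib import Path

DEFAULT_INSTRUCTION = (
    "Re-write the following quoted text using only discrete declarative "
    "sentences in a style that conforms with the essential practices of "
    "'RuleSpeak'. Start a new line for each sentence. Retain all semantic "
    "meaning, operational steps, and/or external references."
)

class InstructionStore:
    """Locally-stored lookup table of instructions (step 3)."""

    def __init__(self, path="instructions.json"):
        self.path = Path(path)
        self.table = {"default": DEFAULT_INSTRUCTION}
        if self.path.exists():
            self.table.update(json.loads(self.path.read_text()))

    def save(self, name, instruction):
        # Persist a customized or alternative instruction locally.
        self.table[name] = instruction
        self.path.write_text(json.dumps(self.table, indent=2))

    def load(self, name):
        # Retrieval process: find and load a saved instruction by name.
        return self.table[name]

def build_prompt(pasted_text, instruction=DEFAULT_INSTRUCTION):
    """Step 2: preface the pasted rule text with the instruction."""
    return f'{instruction}\n\n"{pasted_text}"'
```

The user's edited instruction (step 3) would simply be passed to `build_prompt` in place of the default before submission (step 4).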

Method:

Initially I suggest using MPT-30B-chat for this purpose: https://github.com/mayooear/private-chatbot-mpt30b-langchain
(See the README file at https://github.com/mayooear/private-chatbot-mpt30b-langchain#readme )

You can comparatively test various LLMs here: https://chat.lmsys.org/?model In particular, click "Leaderboard" in the top navigation there to see that MPT-30B-chat ranks adjacent to the most capable LLMs. Also try some examples with the "Side-by-Side" option in that navigation.

Either we run it as part of a RuleMaker instance, or RuleMaker accesses, through a frame, someone else's hosted instance of this LLM, such as this one: https://huggingface.co/spaces/mosaicml/mpt-30b-chat
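Either way, the simplifier could sit behind one small service interface so the backend (local instance or hosted frame/API) can be swapped without touching the UI. A hedged sketch follows: the endpoint URL and the "prompt"/"text" payload keys are assumptions to be verified against whichever service is actually used, and the transport is injectable so the wrapper can be exercised without a network.

```python
# Sketch of a backend-agnostic simplifier wrapper. The endpoint and
# payload shape are illustrative assumptions, not a real API.
import json
import urllib.request

class SimplifierService:
    def __init__(self, endpoint, transport=None):
        self.endpoint = endpoint
        # transport is injectable so the wrapper can be tested offline
        # or pointed at a local MPT-30B-chat instance later.
        self.transport = transport or self._http_post

    def _http_post(self, payload):
        # Plain JSON-over-HTTP; a hosted service may require a
        # different protocol or an API token.
        req = urllib.request.Request(
            self.endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def simplify(self, prompt):
        # Keys "prompt" and "text" are placeholders for the sketch.
        result = self.transport({"prompt": prompt})
        return result["text"]
```

Swapping from the hosted service to a self-run instance would then only mean changing the endpoint (or transport), not the RuleMaker frame logic.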

If we run it as part of RuleMaker, it could be trained on a database of RuleSpeak examples. But initially I reckon it's simpler for us to rely on the external hosted service.

Once RuleReserve contains a significant number of RuleData records that real people have created with RuleMaker, our own MPT-30B-chat instances can be trained on these.

