I have attempted to boil down most of my thoughts on the design of AIR and the AIR reasoner into the following list. These thoughts range from simple musings on points which are subtle but worth thinking about, to strong recommendations for future work. I have tried to highlight a short summary of each item in bold.

1. As currently designed, the AIR parser and reasoner make extensive use of recursion. For example, when a particular AIR rule matches and asserts a statement, the process of adding that statement to the knowledge base involves adding it to the indices of that knowledge base. Such an addition invokes additional work in other Rete nodes and may effectively cause further assertions to happen. However, due to the structure of the reasoner’s calls, each of these is nested deeper in the call stack, so call stacks can become quite deep as a result of a single successful (goal) pattern match.

A better way to handle this would be to properly separate Retes so that the addition of a triple is treated as an independent operation, which might be done on a different thread (or at least in a separate calling context), so that call stacks will be less likely to overflow in complex rulesets.
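
One way to get the separate-calling-context behavior without threads is an explicit agenda (worklist). The following is a minimal sketch, with a toy KB class standing in for the real knowledge base and rules; it is not the current reasoner's API. New assertions are queued and drained iteratively, so stack depth stays constant no matter how long the chain of rule firings is:

    from collections import deque

    class KB:
        """Toy knowledge base: a set of triples plus naive rule matching."""
        def __init__(self, rules):
            self.triples = set()
            self.rules = rules  # each rule maps a triple to new triples

        def add_to_fixpoint(self, initial):
            agenda = deque(initial)
            while agenda:  # iterative drain: no recursion into Rete nodes
                triple = agenda.popleft()
                if triple in self.triples:
                    continue
                self.triples.add(triple)
                for rule in self.rules:
                    agenda.extend(rule(triple))

    # A chain of 1000 rule firings could overflow a recursive
    # implementation, but runs in constant stack depth here.
    def successor(triple):
        s, p, o = triple
        return [(s, p, o + 1)] if p == "lessThan" and o < 1000 else []

    kb = KB([successor])
    kb.add_to_fixpoint([("x", "lessThan", 0)])
    assert ("x", "lessThan", 1000) in kb.triples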

2. This points at a more fundamental problem, which is that Python is not well-suited to multi-threaded processing in the first place, due to the Global Interpreter Lock. This lock effectively means that two threads cannot execute Python code at the same time (Python threads are effectively useful only in situations where a thread blocks on I/O, such as when reading an N3 file from a remote server).

Python does allow for multi-processing, but this requires careful management of shared state using either shared Value and Array objects (requiring serialization and deserialization of the contents of shared alpha memories) or Manager objects (again requiring careful duplication of complex data structures beyond basic dicts, lists, and process-management structures like Locks and Semaphores). In short, it could be done with Python's multiprocessing module, but it would require rearchitecting any objects/classes we anticipate being shared across multiple simultaneously running processes (e.g. the shared data store). This is likely to be tricky and time-consuming given that we would effectively need to sanitize and audit most of cwm, as the data store itself is what is being shared.
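
For concreteness, a minimal sketch of what the Manager approach looks like. The triples here are plain tuples; in practice every cwm term object reachable from the store would need to be picklable for the proxying to work at all:

    from multiprocessing import Manager, Process, Lock

    def worker(shared_triples, lock, new_triples):
        # Every access goes through the Manager proxy (a separate server
        # process), which is exactly the serialization overhead noted above.
        with lock:
            for t in new_triples:
                shared_triples.append(t)

    if __name__ == "__main__":
        manager = Manager()
        shared_triples = manager.list()  # a proxy, not a plain list
        lock = Lock()
        workers = [Process(target=worker,
                           args=(shared_triples, lock, [("s", "p", i)]))
                   for i in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(len(shared_triples))  # 4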

3. Another option worth considering, which may be a good idea for a number of reasons beyond better threading capabilities, is to re-implement the AIR reasoner in a different runtime/programming language. In addition to making better or alternate parallel-processing paradigms available, implementing the reasoner in another language also gains access to alternative libraries for storing and representing RDF data, and may improve the readability and maintainability of the code by moving to a statically-typed language. (This is not to say that dynamically-typed languages are bad, so much as that the complexity of the current Python code has made it exceedingly difficult to determine where a function is defined, due in part to the difficulty of determining what type a given variable contains; a better architecture of the code could eliminate the current need for static typing.)

Although I will return to the need for other RDF libraries later, I will mention several options for languages here:

a) Java – Relatively commonplace among developers; would make it easy for new contributors to get up to speed (no struggling with unfamiliar syntax). This of course comes at a cost, as a greater number of coders being familiar with Java also makes it harder to determine which coders are actually proficient in it. (Sturgeon's Law holds regardless of how many people there are and, if anything, works against common languages, as the more competent programmers will be equally proficient in, or able to quickly pick up, another language anyway.)

b) Scala – Has the advantage of being a bit more flexible and functionally designed (i.e. like a functional programming language) than Java, and as such avoids some of the restrictions that Java forces. It has a number of linguistic features that may be more advantageous conceptually than Java, including support for lazy evaluation, pattern matching (in code, not as a Rete), anonymous functions, and, with appropriate libraries, concurrency implementations using the Actor model rather than Java's traditional concurrency model.

Scala also runs on top of the JVM, so interoperability with existing Java code (e.g. Jena) is possible. This of course comes at a cost, which is that limitations of the JVM (e.g. lack of tail-call optimization, the cost of Oracle as a provider) still apply. It also has the flip side of the problem with Java: fewer coders will know it, and as such it may be harder for newer students to pick up. (You could also argue that, as a newer, less-proven language, it might be a bit too “faddish” and therefore carries a higher risk of leaving us stuck on a legacy language as newer languages come along to supplant it. That said, Scala is fairly obviously “the other big JVM language”, so it will probably have better staying power than, say, Clojure, a Lisp designed to work with the JVM.)

c) Erlang – An interesting choice for a heavily concurrent (functional) implementation, Erlang also suffers from the flip side of the “common language” argument, especially as it uses its own runtime rather than the JVM. This also means that interoperability with existing libraries is correspondingly more difficult, and it is less likely that suitable RDF libraries will exist. Its syntax is also not inherited from Java or other such languages, and may be even more difficult for new students to get up to speed with. Erlang is well-suited if the focus is on concurrency and fault-tolerance, however, as it has been used in deployed telephony systems for years (as well as in one of the most popular Jabber server implementations, ejabberd).

d) C/C++ – An old stand-by. Would be useful for interacting with existing C APIs (Redland, etc.) but necessarily means that concurrency will need to be built on top of a traditional POSIX threading model (on the other hand, libdispatch can be used with Clang to support implicit thread pools). The modern C++11 standard and the Boost libraries make C++ a viable language for this kind of programming (e.g. smart pointers, anonymous functions, tuple types, regular expressions), but it is still C++, which means it likely carries the same efficiency and “common language” costs as Java.

All of the above languages also have the pro/con that they are compiled (well, Erlang can still be interpreted), which incurs its own costs in the development process, but this is probably not a big issue for the sake of choosing a language.

4. We should decide which reasoner model we wish to work with. At the moment, work has been done simultaneously to improve two different reasoners based on vastly different knowledge representations: the current AIR reasoner based on RDF (with N3 extensions) and the propositional reasoner. We should consider carefully which one we wish to move forward with, as they both have pros and cons. Notably:

a) The AIR reasoner has been well tested and its mechanics are relatively well understood. Although it is not especially optimized, certain implementation problems are understood well enough to justify a clean rewrite that will meet current requirements of rules.

b) On the other hand, the AIR reasoner has certain semantic restrictions inherent to RDF. It is restricted to discussing knowledge in the form of triples, making it difficult to assert related attributes jointly (each attribute is a separate statement) and complicating the distinction between an identifier (URI) and a resolvable location (URL). Furthermore, RDF forces the AIR reasoner to make an implicit assumption of negation-as-failure (i.e. a negative belief can only be gleaned from the lack of a statement, which is necessarily conflated with a lack of belief entirely), and there are no mechanisms to represent the duality between “knowing something” and “knowing not something” (acceptance and rejection in the propositional model). This adds complications insofar as it strongly encourages a model involving “closing the world” (which introduces complex ordering issues that may be non-intuitive to a rule-writer) as opposed to a cleaner constraint-based model.

c) In contrast, the propositional model clearly separates a belief that a proposition is true from a belief that the proposition is false (that is, [NOT x] is explicit and related to x; see the sketch after this list). Furthermore, propositions use a localized conflict-resolution mechanism, which removes the need for a complex “closing the world” scheduling algorithm. There has also been sufficient work done to demonstrate that rules may be expressed in a propositional framework.

d) But the propositional reasoner has not been thoroughly tested, nor has it gone through peer review as a concept. Flaws in the approach may not be obvious, and the relative novelty of the approach requires careful thought as to how it will (or will not) mesh with existing commitments we have made to the RDF data model, with which it is not entirely compatible. It is also problematic in that, unless the reasoner is built on top of the existing MIT/Scheme propagator network library, work will be needed to create and maintain a propagator network library in the targeted development language.
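
To make the acceptance/rejection duality in (c) concrete, a minimal Python sketch (this is not the MIT/Scheme propagator library; all names are invented for illustration). Each proposition tracks acceptance and rejection independently, so “known false” is distinct from “not known at all”, and a contradiction is detected locally, with no global world-closing pass:

    class Belief:
        """Tracks acceptance and rejection of a proposition independently."""
        def __init__(self):
            self.accepted = False
            self.rejected = False

        def accept(self):
            self.accepted = True
            self._check()

        def reject(self):
            self.rejected = True
            self._check()

        def _check(self):
            # Localized conflict detection, per proposition.
            if self.accepted and self.rejected:
                raise ValueError("contradiction: accepted and rejected")

    b = Belief()
    b.reject()             # we affirmatively know NOT x ...
    assert not b.accepted  # ... which differs from merely not knowing x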

5. There has been a not insignificant amount of work done with respect to built-in functions in the AIR language. This poses two issues:

a) Most work has been done with respect to logical built-ins (i.e. log:includes, log:semantics, air:justifies) rather than mathematical and other built-in operations. Work will need to be done to establish what other built-ins are (or remain) relevant for practical rules (e.g. sums, dates, SPARQL, etc.).

b) Any re-implementation of the AIR reasoner will need to take care to enumerate and implement built-in functions, as these are required for backwards compatibility. This will necessarily complicate the task of building a reasoner as it requires properly sorting built-ins based on variable dependencies and writing the code to actually evaluate the built-in function at an appropriate time. Some proposed functions may even need access to manipulate the justification tree directly (e.g. remote reasoning, which must be able to parse a justification tree generated by a remote reasoner, and then “append” that justification as support for the output of the remote reasoner in the current justification tree.)
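
As an illustration of the variable-dependency sorting in (b), a minimal greedy scheduler, assuming each built-in declares which variables it reads and which it binds (the representation is hypothetical; math:sum and string:concatenation are existing cwm built-in names, used here only as examples):

    def order_builtins(builtins, bound):
        # builtins: list of (inputs, outputs, name), where inputs/outputs
        # are sets of variable names; bound: variables already bound by
        # ordinary pattern matching.
        bound = set(bound)
        pending = list(builtins)
        ordered = []
        while pending:
            ready = [b for b in pending if b[0] <= bound]
            if not ready:
                raise ValueError("built-ins have unresolvable dependencies")
            for b in ready:
                ordered.append(b)
                bound |= b[1]
                pending.remove(b)
        return ordered

    # math:sum needs ?x and ?y bound and binds ?z; string:concatenation
    # then needs ?z. The scheduler orders them accordingly.
    plan = order_builtins(
        [({"z"}, {"label"}, "string:concatenation"),
         ({"x", "y"}, {"z"}, "math:sum")],
        bound={"x", "y"})
    assert [name for _, _, name in plan] == ["math:sum", "string:concatenation"]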

6. As mentioned above, thought should be given to the limitations of the AIR reasoner as implemented in Python. Along with this, thought should also be given to which “backend” library should be used for storing and representing RDF (if RDF is not discarded entirely). While cwm is acceptable for Python, it is not readily maintained, and it would be far better to depend on an RDF backend which need only be patched to support N3 formulae and syntax (by, for example, bolting formulae on as a kind of named graph). In particular, a careful and intelligent choice of backend may allow support for RDFS and OWL entailment semantics to be offloaded to the backend (e.g. as Jena can support). This would restore these important semantics without the inefficient overhead of implementing them through AIR's own reasoning mechanisms. In short, the performance of RDFS and OWL entailment would improve if it were delegated to mechanisms specialized for the purpose. (Why should we maintain such entailment ourselves instead of delegating to a library someone else has developed?)

Delegating most data to an external library will also make it much easier to clear state, as the external library need only be reinitialized. The complexity of cwm has made it extremely difficult to ensure that all state is cleared, which has required the use of unpleasant workarounds (e.g. forking new processes) to run multiple, distinct trials.
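
For the Python case, rdflib is one plausible candidate backend (an assumption on my part; its N3 formula support would still need the patching described above). Clearing state then reduces to dropping the Graph and creating a new one, rather than forking processes:

    import rdflib

    def run_trial(policy_n3, log_n3):
        # A fresh store per trial: state is "cleared" simply by letting
        # the previous Graph be garbage-collected.
        g = rdflib.Graph()
        g.parse(data=policy_n3, format="n3")
        g.parse(data=log_n3, format="n3")
        return list(g.triples((None, None, None)))

    triples = run_trial(
        "@prefix : <http://example.org/> . :a :b :c .",
        "@prefix : <http://example.org/> . :d :e :f .")
    assert len(triples) == 2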

7. AIR-based web services have thus far been derived from CGI-based scripts which construct (if need be) and pass rules and input log files to a call to policyrunner.py. This may not be the most efficient implementation, especially when deploying a practical solution, as much of the time spent in the reasoner is spent loading and parsing the policy into new data structures. Given that, in practice, log files change more than the rules (excepting those logs which are given as constant input, such as ontologies), it may be worth considering redesigning policyrunner from a “one process for each scenario trial” model to a model featuring persistent rules and (if need be) persistent partial logs (i.e. for ontologies). Naturally, how this would be done would differ if a propositional model were adopted.

Such a model would be beneficial, especially in practical deployments, insofar as ontologies and rules would only be parsed once. Thought would need to be given to ensuring that new processes or threads (one per trial, that is) inherit only the “initial” state of the rules and ontologies (e.g. by duplicating the rules and ontology knowledge base, by making the ontology knowledge base a separate, read-only knowledge base which may be “overlaid” on top of the existing store used to express new assertions, or possibly by making the Rete trees “reattachable” from one trial to another).
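
A minimal sketch of the persistent-rules model (all names hypothetical, not the current policyrunner API): policies are parsed once and cached read-only, and each trial works against its own overlay store:

    class PolicyService:
        """Sketch of the persistent-rules model: policies are parsed once
        and reused read-only; each trial gets only a fresh overlay store
        for its own assertions."""

        def __init__(self, parse):
            self._parse = parse   # e.g. an N3 parser function
            self._cache = {}      # policy URI -> parsed, read-only ruleset

        def get_rules(self, uri, source_text):
            if uri not in self._cache:
                self._cache[uri] = self._parse(source_text)
            return self._cache[uri]

        def run_trial(self, uri, source_text, log_triples):
            rules = self.get_rules(uri, source_text)  # shared across trials
            working = set(log_triples)                # trial-local overlay
            # ... reasoning over `rules` plus `working` would go here ...
            return working

    svc = PolicyService(parse=lambda text: tuple(text.split()))
    svc.run_trial("http://example.org/p1", ":a :b :c .", [("s", "p", "o")])
    svc.run_trial("http://example.org/p1", ":a :b :c .", [("x", "y", "z")])
    assert len(svc._cache) == 1  # the policy was parsed only once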

8. As has been discussed in the past, serious thought should be given to redesigning the AIR language. A number of features of the AIR language approximate logic and production rules well (e.g. the care taken to distinguish what air:else would mean for an @forAll variable binding), and there has yet to be a case where AIR cannot express a rule at all. That said, AIR is extremely verbose (air:assert [ air:statement {} ]) and obtuse (there is no air:or) in expressing some constructs, and is, generally speaking, not user-friendly in either its phrasing or its readability. New users are regularly fooled by deceptively easy (but incorrect) ways of phrasing rules in AIR. Thus, I propose a three-part process to redesign the AIR language.

1. Start a serious discussion about which AIR rule “design patterns” are the most confusing and unintuitive. I do not expect this conversation to happen overnight. It will likely take time to properly review what has gone before and identify the places where AIR's expressivity and syntax are lacking. However, some places are relatively obvious (such as the three items mentioned above).

2. Design a macro language for AIR. This language should ideally be compiled down to AIR, and a compiler will necessarily need to be written in the process of such design. If the language remains in the realm of N3, that is great, but we should not be afraid to break the N3 model if need be. The goal, at least in part, should be to make a usable language, and every effort should be made to ensure that that remains the number one priority.

3. Transition the AIR reasoner to use the macro language natively, if appropriate. It may be the case that terms or syntax of the macro language actually have a more efficient computational model than the AIR 2.0 constructs they compile to. If so, it may be worth considering using the macro language as the native language of the AIR reasoner to take advantage of such efficiency.

9. Previously, we have discussed the idea of exposing AIR reasoners as web services to provide for remote reasoning. We should study this closely and identify what is necessary in a remote reasoner with remote data. For example, will we need to (finally) introduce infrastructure to support zero-knowledge proofs? There may also be issues common to distributed systems that must be considered (e.g. reliability concerns, etc.)

10. On a broader level, and without quite having implications for the design of the reasoner, the word “log” is heavily overloaded, and work must be done to determine exactly what we want in terms of logging versus justification. At the moment, the word “log” is used for three different concepts:

a) The log as input. This is the static (non-policy) data passed as input to the reasoner. There is some good terminology hiding in policyrunner that I rather like and which might fix this, namely a “fact formula”. The reason this is called a “log”, however, is that it is usually a “log” of transactions; at the logical/reasoning level that fact is irrelevant and distracts from the other senses of the word.

b) The log as output. This is the justification data provided by the reasoner. I really don't see a reason we can't just call it the “generated justification/explanation” instead of calling it a log.

c) A log of what happened during reasoning/statistics gathered during reasoning. Indistinguishable from “instrumentation” of the code. I don't see any reason this couldn't keep being called a log, but “instrumentation” is pretty unambiguous.

Of course, the latter two are related in a sense: the path of reasoning (the “instrumentation log”) necessarily records the exploration of results that occurs until an appropriate justification has been built. We should think carefully about what exactly we want to log and what is important in the justification. I think we are already fairly decided on what these should be, but we should double-check. We could certainly do with a better logging/instrumentation architecture, though.
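
As a small illustration of such an architecture, separately named loggers (using Python's standard logging module; the “air.*” logger names are my own invention) would let instrumentation and the generated justification be routed, filtered, or silenced independently:

    import logging

    logging.basicConfig(level=logging.DEBUG)

    # Two named loggers keep instrumentation distinct from the generated
    # justification; each can be given its own handlers (e.g. instrumentation
    # to stderr, justification to a file) without touching the other.
    instrument = logging.getLogger("air.instrumentation")
    justify = logging.getLogger("air.justification")

    instrument.debug("rule %s matched in %d ms", "ex:Rule1", 12)
    justify.info("asserted %s because rule %s fired", "ex:compliant", "ex:Rule1")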

11. We should clearly establish which components of the reasoner are orthogonal to each other and improve the modularity of the code. For example, the functionality of the web service depends on the reasoner and the justification renderer, true, but they really are separable components: the reasoner can run without the web service, the justification renderer is/was part of Tabulator proper, and the web service really only needs to know the APIs to invoke the reasoner and renderer appropriately. Similarly, reasoning makes use of RDF stores, but again it only depends on an API for them. In short, we should better establish the modularity of the code and develop the components with as little cross-over between code bases as possible, to improve maintainability.
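
For the RDF store case, the reasoner's dependency could be pinned to an explicit interface, roughly like this sketch (hypothetical; cwm, rdflib, or a Jena bridge could each sit behind the same surface):

    from abc import ABC, abstractmethod

    class TripleStore(ABC):
        """The only surface the reasoner sees of an RDF backend."""

        @abstractmethod
        def add(self, subject, predicate, obj): ...

        @abstractmethod
        def match(self, subject=None, predicate=None, obj=None):
            """Yield triples matching the pattern (None acts as a wildcard)."""

    class InMemoryStore(TripleStore):
        def __init__(self):
            self._triples = set()

        def add(self, subject, predicate, obj):
            self._triples.add((subject, predicate, obj))

        def match(self, subject=None, predicate=None, obj=None):
            for t in self._triples:
                if all(q is None or q == v
                       for q, v in zip((subject, predicate, obj), t)):
                    yield t

    store = InMemoryStore()
    store.add("ex:a", "ex:b", "ex:c")
    assert list(store.match(predicate="ex:b")) == [("ex:a", "ex:b", "ex:c")]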

12. The PML-based justification language should be either actually used or dropped outright. At the moment, the PML language for expressing justifications is mere boilerplate added on top of the old-style justifications that Tabulator and the renderer depend on. It is unreasonable (to me) to have to maintain both given that only one of them is truly used (the old justification language). My gut feeling is that the readability of the PML language is a good thing, and we should really move to rework the justification renderer to parse the PML form rather than the old form.

13. How backwards-compatible must the AIR reasoner be? At the moment, the AIR reasoner supports a number of outdated constructs for the sake of backwards compatibility with older examples. This is untenable in the long term if we wish for the AIR reasoner to be usable in practice. In the process of reconstructing the reasoner, it may be worthwhile to focus on the modern usage and construction of rules. Naturally, this will also require careful management of the code to properly tag and branch the reasoner when changes are made.

At the moment, for example, we have two branches, master and refactor. The latter was an attempt to re-organize the directory structure, but it has not been merged away, in part due to backwards-compatibility constraints when trying to keep various repositories up to date (they depend on policyrunner and other files being in the root of the repository). Similarly, the rapid development of goal-direction in the reasoner resulted in a lack of tagged commits corresponding to the various versions of the reasoner used for testing. I would advise setting up a standard for committing code and version management in line with standard usage of Git (i.e. using named branches only for major maintenance versions, but otherwise tagging relevant commits so that they may be rolled back to for various demos). I would also advise merging refactor into master to eliminate the need to maintain those two branches separately, although care must be taken to make sure that older Git repositories depending on the current master branch never need to be brought up to date (because any such update will break the reasoner). This merge will likely be messy, but it might be easiest to:

1. Clone master.

2. Attempt to merge refactor into master. (This will fail)

3. Clone refactor.

4. Delete the contents of the master clone and replace with the contents of the refactor clone (NOTE: do not include the .git directory when deleting or replacing!)

5. Commit the “merge”, marking any remaining “conflicts” as resolved.

14. At the moment, we have a way to visualize the results of justifications (the legal renderer), but we do not currently have a way to visualize rules or logs. It is also possible that visualization of justifications may be improved. There is room for a project to investigate and develop effective visualizations of logs, rules, and justifications as a user experience research project.

15. AIR currently has few mechanisms, if any, to actually discover rules (for example, there are no inherent semantics which label rules as belonging to a particular statute). Such mechanisms would be incredibly useful for long-term maintenance and deployment of AIR reasoning in real-world scenarios. The recently added import mechanism will assist in discovering rules and importing them for the purposes of policy reasoning, but it is incredibly incomplete. While it is functional, the activation and importation of rules is not included in the justification, though it very probably should be. This would require modifying the RuleImport class to support some sort of “Importation” action in the TMS, justified by the rules that led to the importation. Likewise, any imported rules should probably be supported by that Importation action.
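
A purely hypothetical sketch of that shape (none of these class or attribute names exist in the current codebase; the real change would live in RuleImport and the TMS):

    class ImportationAction:
        """Hypothetical TMS node recording that a ruleset was imported,
        and which rules justified importing it."""
        def __init__(self, imported_uri, supporting_rules):
            self.imported_uri = imported_uri
            self.support = list(supporting_rules)

    class ImportedRule:
        def __init__(self, name):
            self.name = name
            self.support = []

    def import_rules(tms_nodes, uri, triggering_rules, parse):
        action = ImportationAction(uri, triggering_rules)
        tms_nodes.append(action)      # the importation itself is justified
        rules = parse(uri)
        for rule in rules:
            rule.support = [action]   # each imported rule is supported by it
        return rules

    tms = []
    rules = import_rules(tms, "http://example.org/statute-rules.n3",
                         ["ex:DiscoveryRule"],
                         parse=lambda uri: [ImportedRule("r1")])
    assert rules[0].support[0].imported_uri.endswith("statute-rules.n3")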

16. AIR needs unit tests. This is not just a matter of good coding practice to ensure that changes to the reasoner do not break existing policies; it has actually become necessary due to the inherent complexity of the AIR reasoner, which has already resulted in older policies breaking during the attempts to add goal-direction. If a unit-test mechanism were present and used, detection of broken functionality would be immediate (as long as the ethos of unit testing were embedded in the development process) rather than hidden from view, as happened multiple times during the development of goal-direction.
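
Even a thin regression harness would have caught those breakages early. A minimal sketch using Python's unittest, where run_policy is a stand-in for whatever the real entry point ends up being (the expected triple is invented for illustration):

    import unittest

    def run_policy(policy_file, log_file):
        # Stand-in for the real reasoner entry point; imagine this invoking
        # policyrunner and returning the set of asserted compliance triples.
        return {("ex:doc1", "air:compliant-with", "ex:Policy1")}

    class RegressionTests(unittest.TestCase):
        def test_known_policy_still_compliant(self):
            result = run_policy("policy.n3", "log.n3")
            self.assertIn(
                ("ex:doc1", "air:compliant-with", "ex:Policy1"), result)

    if __name__ == "__main__":
        unittest.main()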

17. As has been hinted at above, the current architecture of the AIR reasoner is very poor. Assuming that the reasoner is not rebuilt in another language or to take advantage of another paradigm, it must at least be redesigned for better modularity. At the moment, the AIR reasoner features excessive cross-dependencies between modules, many of which are poorly documented, so that it is difficult, if not impossible, to decipher the linkages between the modules. For example, the policyrunner module necessarily depends on the rete module for construction of the pattern-matching Rete trees. However, the rete module itself depends back not only on the policyrunner module, but also on the TMS structure, in order to assert built-ins and make sure that they are properly supported in any justifications! Although this has been somewhat ameliorated by simply letting the “Rule” object in policyrunner pass a callback to be used to assert a triple for a built-in function, I am not convinced this is a particularly good architecture.
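
The callback approach can, however, be taken further, to the point where the rete module never imports policyrunner or the TMS at all. A minimal sketch of that dependency inversion (names hypothetical):

    # The Rete network receives the assertion behavior at construction time;
    # only this one callable signature is shared between the modules, so the
    # circular import (rete -> policyrunner -> rete) disappears.

    class ReteNetwork:
        def __init__(self, assert_triple):
            self._assert = assert_triple  # injected; no import of the caller

        def fire_builtin(self, triple):
            # ... evaluate the built-in here, then report the result upward ...
            self._assert(triple)

    asserted = []
    net = ReteNetwork(assert_triple=asserted.append)
    net.fire_builtin(("ex:x", "ex:sumResult", "3"))
    assert asserted == [("ex:x", "ex:sumResult", "3")]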

18. One idea that has been discussed on several occasions is the concept of “finding what's missing for a proof.” I do not propose here a mechanism by which that would be possible, but merely suggest that it would be a powerful capability worth additional research, especially as an added feature of the AIR language or reasoner. It would likely require modifications to the AIR language, however, and may require the addition of statistical measurement of behaviors so that the “most probable” solutions are explored first. Statistical measurement and analysis of previous queries would also help in optimizing the scheduler and the exploration of known, well-tested rulesets (e.g. “9 times out of 10 we take rule branch x to reach a goal, so let's try branch x first”). Related research lies in the sphere of query optimization, so researchers with some experience with database systems may be ideally suited to it.
