Friday, November 11, 2011

MJSON 1.1 Released

NOTE: The library described here has grown and moved to Github with official website http://bolerio.github.io/mjson/. It still remains faithful to the original goal, it's still a single Java source file, but the API has been polished and support for JSON Schema validation was included. And documentation is much more extensive.

A few months ago, we made an official release of a compact, minimal JSON library that we called MJSON. See the JSON Library blog post for an introduction to this library. After some experience with it, and some bug reports on that blog post, we have released an improved, fully backward compatible version 1.1. The original download & documentation links now point to the new version.
Pointers

Documentation: http://www.sharegov.org/mjson/doc
Download: http://www.sharegov.org/mjson/mjson.jar
Source code: http://www.sharegov.org/mjson/Json.java
Maven
The latest 1.1 release is available on Maven central:
<dependency>
    <groupId>org.sharegov</groupId>
    <artifactId>mjson</artifactId>
    <version>1.1</version>
</dependency>
Also, both 1.0 and 1.1 versions are available from our Maven repository at http://repo.sharegov.org/mvn. This repository is deprecated and henceforth we'll be publishing only on Maven central. Include that repository in your POM or in a settings.xml profile like so:
<repositories>
  <repository>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <id>sharegov</id>
    <url>http://repo.sharegov.org/mvn</url>
  </repository>
</repositories>
Then include a dependency like so:
<dependency>
  <groupId>mjson</groupId>
  <artifactId>mjson</artifactId>
  <version>1.1</version>
  <scope>compile</scope>
</dependency>
List of Improvements

The following bugs were fixed in this new release:
  • The example from the Javadocs was missing a final 'up()' call.
  • The NumberJson implementation now correctly returns true from its isNumber method.
  • A parsing bug was fixed.
  • Some warnings were removed and explicitly disabled.
The following additional features were implemented:
  • Addition of top-level methods: is(String, Object) for objects and is(int, Object) for arrays. Those methods return true if the given named property (or indexed element) is equal to the Object passed in as the second parameter. They return false if an object doesn't have the specified property or an array index is out of bounds. For example, is(name, value) is equivalent to 'has(name) && at(name).equals(make(value))'.
  • Addition of a dup() method that will clone a given Json entity. This method will create a new object even for the immutable primitive Json types. Objects and arrays are cloned (i.e. duplicated) recursively. (A short example of is() and dup() follows this list.)
  • Addition of a Factory interface that allows plugging of your own implementation of Json entities (both primitives and aggregates) as well as customized mapping of Java objects to Json. More on this below.
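To illustrate, here's a quick sketch of the two new methods, following the semantics just described:

import mjson.Json;

Json x = Json.object().set("name", "mjson").set("version", "1.1");

x.is("name", "mjson");  // true - same as x.has("name") && x.at("name").equals(Json.make("mjson"))
x.is("name", "gson");   // false - the value doesn't match
x.is("missing", "?");   // false rather than an exception - no such property

Json copy = x.dup();    // deep clone: a new instance even for the immutable primitives
copy.equals(x);         // true - same content, but modifying copy leaves x untouched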
The Factory Interface - Customizing MJSON

The Factory interface, declared as an inner interface within the scope of the Json class, looks like this:
public static interface Factory
{
    Json nil();
    Json bool(boolean value);
    Json string(String value);
    Json number(Number value);
    Json object();
    Json array();
    Json make(Object anything);
}

You can implement this interface if you need to customize how the Json types are actually represented. For instance, objects are represented using a standard Java HashMap, but you may want a different representation, say a LinkedHashMap or a more efficient variant optimized for strings (say, a Trie-based map of some sort). Or you may want strings to be case-insensitive, in which case you'd have a Json-derived class representing strings, but whose equals method would actually do equalsIgnoreCase, etc.

The make method allows you to customize how arbitrary Java objects are translated into Json. It should be easy, for example, to implement a make method that handles Java beans with introspection, which is something we don't want by default as part of the API.
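To make this concrete, here is a minimal sketch of such a bean-aware make, written as an extension of the public default implementation mentioned below. The class name is made up, and it assumes the default make signals types it can't handle with an IllegalArgumentException; check the Javadocs before relying on either:

import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import mjson.Json;

// A sketch, not part of the 1.1 API: translate Java beans via introspection,
// delegating everything else to the default factory.
public class BeanAwareFactory extends Json.DefaultFactory
{
    public Json make(Object anything)
    {
        try
        {
            // Primitives, maps, collections, arrays etc. are handled as before.
            return super.make(anything);
        }
        catch (IllegalArgumentException defaultCannotHandle) // assumption: default rejects unknown types this way
        {
            try
            {
                // One Json property per readable bean property, recursively converted.
                Json result = object();
                BeanInfo info = Introspector.getBeanInfo(anything.getClass(), Object.class);
                for (PropertyDescriptor pd : info.getPropertyDescriptors())
                    if (pd.getReadMethod() != null)
                        result.set(pd.getName(), make(pd.getReadMethod().invoke(anything)));
                return result;
            }
            catch (Exception ex)
            {
                throw new IllegalArgumentException("Cannot convert bean to Json: " + anything, ex);
            }
        }
    }
}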

The methods in that interface are used internally every time a new Json instance has to be constructed, either from a known type, as a default empty structure or from an arbitrary Java type. Thus you can pretty much customize the representation any way you like while relying on the same simple API.

The default implementation of this factory is public: Json.DefaultFactory. Therefore you can extend that default implementation and only customize certain aspects of the Json representation. 

That's it for this release. Enjoy!

Cheers,
Boris

Tuesday, August 23, 2011

From Rules to Workflows

Some business object types have workflows associated with them. A business object type is identified with an OWL class.  A workflow defines the process through which a business object of that type goes from its creation up to its deletion (or archiving). This can be an enterprise-wide business process involving many other systems and human actors, or a simple one-time interaction with an end user.
In our framework, a business object is always modeled as an OWL individual and its state is always modeled as OWL properties. Furthermore, an object has a lifetime with a beginning and an end. The process governing that lifetime is specified through a set of SWRL rules. The SWRL rules are dynamically converted into a business process that gets executed on the business object. This blog entry outlines the algorithm that translates sets of rules into a process workflow. Details will be given in future blog entries.
Assumptions
Here are the assumptions made about the set of rules defining a business object workflow:
  1. The rules are defined in a single ontology with an IRI that follows the naming pattern http://www.miamidade.gov/swrl/<OWL Class> where <OWL Class> is the OWL classname of the business object.
  2. At least one rule must have a goal atom in its head. Goal atoms are specific to the business object type.
  3. Unless specifically stated, all variables are local to the rules in which they appear.
  4. A global variable named "bo" is reserved and refers to the business object on which the set of rules operates.  
Sketch
If no rule has a goal atom in its conclusion, then no workflow is created at all.
  1. First, rules are evaluated iteratively against the current state of the business object. When evaluating a rule, the atoms in its body (i.e. its premises) are evaluated and if they are all true, then the atoms in its head are asserted in the current BO ontology. If any of those newly asserted atoms is actually an end-goal, then no workflow is constructed because there's nothing to do. If at least one of the atoms in a rule's body is known to be false, then the rule is subsequently ignored. If there are unknown atoms in the rule's body (but no false ones), the whole rule is deemed "unknown" and will be included in the construction of the workflow. When there are no new atom assertions from any rule during the current iteration, the evaluation process is complete.
  2. At the end of the evaluation process, we are left with a set of "unresolved" rules, that is, rules with unknown atoms in their bodies. Each rule is annotated with extra information and wrapped in a class called AppliedRule that contains the values of instantiated variables and the dependencies between the atoms within the rule. Such dependencies are inferred through some heuristics and assumptions described below.
  3. At this stage, we have several unresolved rules remaining, some of which contain a goal atom in their head (if none of them does, then the workflow is empty). Starting from those rules containing goal atoms in their head (i.e. their conclusion) and going in a backward-chaining fashion, we enumerate all possible logical paths to satisfy the rules and reach their conclusion. Each unknown atom X of the currently examined rule body is added to the deduction path, and then, recursively, unknown atoms from rules where X appears in the conclusion. During the enumeration process each such atom is converted into a WorkflowPathElement instance, which is a helper class that holds an atom and how it depends on other atoms in the eventually constructed workflow. So a dependency graph between the unknown SWRL atoms is created, where the edges are SWRL variables that get propagated from atom to atom. A logical dependency between an atom in the rule's body and an atom in that rule's head is represented by a predefined SWRL variable named implies. This dependency graph is important in that it defines which task in the workflow must be evaluated before which other task.
  4. The next stage converts each logical path found in the previous stage into a sequence of workflow tasks to be executed to reach the goal. The sequence is constructed starting from WorkflowPathElements that don't have dependencies and then recursively adding the elements that depend on them as subsequent tasks to be executed. In addition to that, tasks to assert all possible conclusions are added as soon as possible. That is, every time a task is added to the current sequence, a search is made to find all rules whose premises would be satisfied were all tasks up to this point to succeed, and the conclusions of those rules are added as AssertAtomTasks. In this way, every logical path deducing an end goal is converted into a sequence of steps where no step depends on a step downstream of it (for the value of a variable or logically) and where all possible conclusions from unresolved rules are asserted as soon as the premises of those rules become true.
  5. At this point, we have a bunch of linear sequences of tasks. Each of those sequences can be executed step by step to reach an end goal for the business object, but each intermediary step has the potential of failing. When a step fails, we want to branch to another sequence that may succeed, but we don't want to repeat the execution of steps that have already succeeded and we don't want to branch to a sequence that we know will fail. The construction of the workflow from this set of sequences is based on them being ordered appropriately. Each task in a sequence is assigned a cost and independent tasks within a sequence are thus naturally ordered by their cost. Dependent tasks are assigned costs as the sum of their dependencies, so at the end the order of all tasks in a workflow path is determined entirely by this cost number. The task paths themselves are ordered by cost, where the cost of a task path is simply the sum of the costs of all its tasks. (A minimal sketch of this cost assignment follows the list.)
  6. Given this setup, the workflow is constructed as follows: start with the least costly sequence. Its first task is the starting point of the workflow. Each task, except atom assertion tasks, results in a true/false result. So add a boolean branching node after each task that branches to the next task in the current path on "true" and to the "first possible task" in subsequent paths on "false". This "first possible task" is obtained by examining each of the more costly paths in turn, and looking for a task not already on the current path and such that all of its preceding tasks are on the current path.
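To illustrate step 5, here is a minimal Java sketch of the cost assignment; the class and field names are hypothetical, only the costing rule itself comes from the algorithm above:

import java.util.ArrayList;
import java.util.List;

// Hypothetical names; an independent task carries its own cost, a dependent
// task sums the costs of its dependencies, and tasks and paths are then
// ordered by that number.
class Task
{
    final String name;
    final double baseCost;
    final List<Task> dependencies = new ArrayList<Task>();

    Task(String name, double baseCost) { this.name = name; this.baseCost = baseCost; }

    double cost()
    {
        double total = baseCost;
        for (Task dep : dependencies)
            total += dep.cost(); // a dependent task's cost includes its dependencies
        return total;
    }
}

class TaskPath
{
    final List<Task> tasks = new ArrayList<Task>();

    double cost()
    {
        double total = 0;
        for (Task t : tasks)
            total += t.cost(); // a path's cost is the sum of its tasks' costs
        return total;
    }
}

Ordering tasks within a path by this number guarantees that no task comes before one of its dependencies, and ordering paths by their total cost yields the least costly sequence that step 6 starts from.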
The workflow thus constructed looks like a decision tree where branching is done based on the truth of OWL axioms and whenever the truth of an OWL axiom is not known, there's a node to "find out" - prompt the user, call some software etc. In a subsequent blog I will give more details about how variable dependencies are managed, how costs are assigned to tasks so that goal paths can be ordered, and what assumptions and heuristics are being used.
This is work in progress. To handle more real-life workflow scenarios, our next step is to represent asynchronous processes and events. The simplest strategy would be to put the whole workflow in a state of limbo whenever there's unknown information and progress on the workflow cannot be made. When in such a state, the operations service accepts changes to the BO ontology and resumes the workflow with the newly found information. In general, it is possible to execute the workflow right from the beginning any time we want, because tasks are performed only for missing information and will be skipped on a second execution if the information is already there (e.g. OWL properties have been stated etc.). So, for example, when a business object is "edited" with some random properties changing, the workflow can be replayed from the beginning instead of trying to figure out what decision to backtrack and what to keep.
Boris

Wednesday, August 10, 2011

Externalizing Data Property Values

We will externalize data property values into their own table. This will improve performance in two ways. First, we will avoid duplicate literal values, reducing the number of records. Second, by having typed representations of the data, i.e., VALUE_AS_DATE, VALUE_AS_NUMBER, we can query by type without having to do an explicit cast. The initial table should look like so:

create table CIRM_OWL_DATA_VALUE
(
    ID number(19,0) not null,             -- the sequence; the VALUE column in CIRM_OWL_DATA_PROPERTY will now be a fk constraint on this column
    VALUE_HASH varchar2(28),              -- lookup column
    VALUE_AS_VARCHAR varchar2(4000 char), -- a string representation of the value
    VALUE_AS_CLOB clob,                   -- storage column for large string values
    VALUE_AS_DATE timestamp,              -- a typed representation of the value as date
    VALUE_AS_DOUBLE double precision,     -- a typed representation of the value as double
    VALUE_AS_INTEGER number(19,0),        -- a typed representation of the value as an integer
    primary key (ID)
)
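With the typed columns in place, a lookup can filter on them directly. A JDBC sketch, assuming the foreign key from CIRM_OWL_DATA_PROPERTY.VALUE mentioned in the comment above (the class, method and parameter names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Illustrative only: find subjects whose value for a given data property
// falls in a date range, using the typed VALUE_AS_DATE column, no cast needed.
class DataValueQueries
{
    static ResultSet subjectsWithDateBetween(Connection conn, long predicateId,
                                             Timestamp from, Timestamp to)
        throws SQLException
    {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT P.SUBJECT FROM CIRM_OWL_DATA_PROPERTY P " +
            "JOIN CIRM_OWL_DATA_VALUE V ON P.VALUE = V.ID " +
            "WHERE P.PREDICATE = ? AND V.VALUE_AS_DATE BETWEEN ? AND ?");
        ps.setLong(1, predicateId);
        ps.setTimestamp(2, from);
        ps.setTimestamp(3, to);
        return ps.executeQuery();
    }
}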


The RelationalStore class will need to be modified accordingly to reflect the changes in schema. We would need to check whether a given VALUE already exists, and this can be done by first checking the VALUE_HASH column. Use of the Java hashCode() is not recommended, as the table can get quite large and hash collisions are a concern. A better approach is to use a cryptographic hash function, e.g., MD5 or SHA-1, to hash the string value and store its Base64 encoding in the column. SHA-1 hashing should suffice. A Base64 encoding of a SHA-1 digest results in a 28-byte string, hence the 28-byte column length.
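A sketch of that hash computation (the class and method names are illustrative; any Base64 codec will do in place of the JDK's): a SHA-1 digest is 20 bytes, which Base64 encodes to exactly 28 characters.

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative sketch of the proposed VALUE_HASH computation.
class ValueHash
{
    static String valueHash(String literal)
        throws NoSuchAlgorithmException, UnsupportedEncodingException
    {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(literal.getBytes("UTF-8")); // 20 bytes
        return Base64.getEncoder().encodeToString(digest);      // 28-character string
    }
}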

Also, the VALUE_HASH column could be included in the DATA_PROPERTY table along with the ID for rapid equivalence queries (avoiding a join).

Tuesday, August 9, 2011

Querying the Operations Database

The triple store query translation results in a query that mirrors what is described in this document:

http://www.w3.org/2002/05/24-RDF-SQL/

The noMappingTranslate method works with equivalence-based queries. Here is an example of a query with a 'nested' type:

{
    "sortBy": "hasDateLastModified",
    "hasServiceRequestStatus": "*",
    "atAddress": {
        "Street_Name": "1ST",
        "type": "Street_Address"
    },
    "currentPage": 1,
    "sortDirection": "desc",
    "type": "Garbage_Missed_Complaint",
    "hasDateCreated": "06/14/2011",
    "itemsPerPage": 20
}

The query translator translates this to:

SELECT CIRM_CLASSIFICATION.SUBJECT
FROM CIRM_CLASSIFICATION
JOIN CIRM_OWL_OBJECT_PROPERTY ON CIRM_OWL_OBJECT_PROPERTY.SUBJECT = CIRM_CLASSIFICATION.SUBJECT
JOIN CIRM_OWL_DATA_PROPERTY ON CIRM_OWL_DATA_PROPERTY.SUBJECT = CIRM_CLASSIFICATION.SUBJECT
WHERE (CIRM_CLASSIFICATION.OWLCLASS = ?)
OR (CIRM_OWL_OBJECT_PROPERTY.PREDICATE = ?)
OR (CIRM_OWL_OBJECT_PROPERTY.PREDICATE = ? AND CIRM_OWL_OBJECT_PROPERTY.OBJECT IN
    (SELECT CIRM_CLASSIFICATION.SUBJECT
     FROM CIRM_CLASSIFICATION
     JOIN CIRM_OWL_DATA_PROPERTY ON CIRM_OWL_DATA_PROPERTY.SUBJECT = CIRM_CLASSIFICATION.SUBJECT
     WHERE (CIRM_CLASSIFICATION.OWLCLASS = ?)
     OR (CIRM_OWL_DATA_PROPERTY.PREDICATE = ? AND TO_CHAR(CIRM_OWL_DATA_PROPERTY.VALUE) = '1ST')))
OR (CIRM_OWL_DATA_PROPERTY.PREDICATE = ? AND TO_CHAR(CIRM_OWL_DATA_PROPERTY.VALUE) = '06/14/2011')

Parameters:
1 = 181()
2 = 191()
3 = 194()
4 = 182()
5 = 184()
6 = 185()

Here is a list of functions and operators that the query translator supports:

Function/Operator | Translation | Sample JSON property expression
greaterThan | SQL GREATER THAN | "hasDateCreated":"greaterThan(\"2011-06-15T19:18:06.552Z\")"
lessThan | SQL LESS THAN | "hasDateCreated":"lessThan(\"2011-06-15T19:18:06.552Z\")"
like | SQL LIKE with '%' appended | "hasName": "like(\"Zues\")"
between | SQL BETWEEN | "hasDateCreated":"between(\"2011-06-15T19:18:06.552Z\",\"2011-06-15T19:18:06.552Z\")"
*contains | |
*in | |
*startsWith | |
*notLike | |
= | SQL EQUALS | "hasName": "= \"Zues\""
>= | SQL GREATER THAN OR EQUAL | "hasCount": ">= 1"
<= | SQL LESS THAN OR EQUAL | "hasCount": "<= 1"
> | SQL GREATER THAN | "hasCount": "> 1"
< | SQL LESS THAN | "hasCount": "< 1"

Note: literal values are translated to an equals operation on the SQL side.

* Incomplete. Expression parsing is there but translation is incomplete.

Sunday, June 5, 2011

JSON Library

NOTE: The Library described here has grown and moved to Github with official website http://bolerio.github.io/mjson/. It still remains faithful to the original goal, it's still a single Java source file, but the API has been polished and support for JSON Schema validation was included. And documentation is much more extensive.

JSON (JavaScript Object Notation) is a lightweight data-interchange format. You knew that already. If not, continue reading on http://www.json.org.
It's supposed to be about simplicity and clarity. Something minimal, intuitive, direct. Yet, I couldn't find a Java library to work with it in this way. The GSON project is pretty solid and comprehensive, but while working with REST services and coding some JavaScript with JSON in between, I got frustrated at having to be so verbose on the server-side while on the client-side manipulating those JSON structures is so easy. Yes, JSON is naturally embedded in JavaScript, so syntactically it could never be as easy in a Java context, but all that strong typing of every JSON element just didn't make sense when the structures are dynamic and untyped to begin with. It seemed like suffering the verbosity of strong typing without getting any of the benefits. Especially since we don't map JSON to Java or anything of the sort. Our use of JSON is pure and simple: structured data that both client and server can work with.
After a lot of hesitation and looking over all the Java/JSON libraries I could find (well, mostly I examined the libraries listed on json.org), I wrote yet another Java JSON library. Because it's rather independent from the rest of the project, I separated it. And because it has a chance of meeting other programmers' tastes, I decided to publish it. First, here are the links:
Documentation: http://www.sharegov.org/mjson/doc
Download: http://www.sharegov.org/mjson/mjson.jar
Source code: http://www.sharegov.org/mjson/Json.java
The library is called mjson for "minimal JSON". The source code is a single Java file (also included in the jar). Some of it was ripped off from other projects and credit and licensing notices are included in the appropriate places. The license is Apache 2.0.
The goal of this library is to offer a simple API to work with JSON structures, directly, minimizing the burdens of Java's static typing and minimizing the programmer's typing (pun intended).
To do that, we emulate dynamic typing by unifying all the different JSON entities into a single type called Json. Different kinds of Json entities (primitives, arrays, objects, null) are implemented as sub-classes (privately nested) of Json, but they all share the exact same set of declared operations and to the outside world, there's only one type. Most mutating operations return this, which allows for method chaining. Constructing the correct concrete entities is done by factory methods, one of them called make, which is a "do it all" constructor that takes any Java object and converts it into a Json. Warning: only primitives, arrays, collections and maps are supported here. As I said, we are dealing with pure JSON; we are not handling Java bean mappings and the like. Such functionality could be added, of course, but... given enough demand.
As a result of this strategy, coding involves no type casts, much fewer intermediary variables, much simpler navigation through a JSON structure, no new operator every time you want to add an element to a structure, no dealing with a multitude of concrete types. Overall, it makes life easier in the current era of JSON-based REST services, when implemented in Java that is.
In a sense, we are flipping the argument from the blog Dynamic Languages Are Static Languages and making use of the universal type idea in a static language. Java already has a universal type called Object, but it doesn't have many useful operations. Because the number of possible JSON concrete types is small and well-defined, taking the union of all their interfaces works well here. Whenever an operation doesn't make sense, it will throw an UnsupportedOperationException. But this is fine. We are dynamic, we can guarantee we are calling the right operation for the right concrete type. Otherwise, the tests would fail!
Here's a quick example:
import mjson.Json;

Json x = Json.object().set("name", "mjson")
                      .set("version", "1.0")
                      .set("cost", 0.0)
                      .set("alias", Json.array("json", "minimal json"));
x.at("name").asString(); // return mjson as a Java String
x.at("alias").at(1); // returns "minimal json" as a Json instance
x.at("alias").up().at("cost").asDouble(); // returns 0.0

String s = x.toString(); // get string representation

x.equals(Json.read(s)); // parse back and compare => true
For more, read the documentation at the link above. No point in repeating it here.
This is version 1.0 and suggestions for further enhancements are welcome. Besides some simple nice-to-haves, such as pretty printing or the ability to stream to an OutputStream, Java bean mappings might turn out to be a necessity for some use cases. Also, jQuery style selectors and a richer set of manipulation operations. Closures in JDK 7 would certainly open interesting API possibilities. For now, we are keeping it simple. The main use case is if you don't have a Java object model for the structured data you want to work with, you don't want such a model, or you don't want it to be mapped exactly and faithfully as a JSON structure.
Cheers,
Boris

Tuesday, March 15, 2011

Templating

User interfaces are made out of static content, interactive components and dynamic data displayed as HTML. All those elements are tied together in an HTML document (or a fragment). And we use templates to define those HTML fragments. Templates are stored in the ontology and associated with ontology individuals or classes. In addition, template associations are contextualized depending on the kind of display needed (e.g. standalone, as part of a list, full/short etc.). Thus when performing a lookup for a template to display a particular object, additional contextual parameters determine the actual template to be used. For instance, a template displaying an object as standalone, using the whole space available on a page, will likely display all of its properties, while a display as an element of a list will use only the most relevant/distinguishing properties.

We are using an officially supported jQuery plugin:

http://api.jquery.com/jquery.tmpl/

It has some limitations. Like nearly all JavaScript templating engines, it starts with some sort of philosophy of how things should be done. But it has enough flexibility by allowing arbitrary function calls and logic inside templates, while dealing with simple cases the same way others do.

Because often the data object needed for a template is an aggregate of several different pieces, sometimes metadata, sometimes operational data, often a combination of both, we needed an easy mechanism to assemble it and only then call the template. The main problem is that the template engine can't deal with data obtained asynchronously - it would basically need to delay evaluation, and the call to apply the template would need to be asynchronous itself. But there's no support in the jquery.tmpl API for that. So we implemented such a mechanism of delayed evaluation ourselves. One can construct a JavaScript object where each value is actually an instance of AsyncCall:

var obj = { x : new AsyncCall({url:"server:port/etc", async:true, etc...}, function(value) { alert('got x!'); return value; }) };

An AsyncCall instance encapsulates the AJAX parameters (mandatory) to use with $.ajax and a callback function (optional) invoked when that particular object property was received at the client. Note that the callback function actually receives the value returned by the server as a parameter and it is expected to return the final value assigned to 'x'. That is, the callback function can "massage" that returned value or return something else completely. 

Given a JavaScript object with AsyncCall values, one can synchronize on the event that they all have been obtained like this:

onObjectReady(obj, function (obj) { alert('all values of ' + obj + ' were received'); });

Below is a complete example of a preliminary version of a ServiceDirect-like re-implementation with this framework. The metaService.delay(...) is just a convenience method to construct an AsyncCall for a particular REST service.


load : function (app, parent) {
    this.parentElement = parent;
    var self = this;
    var components = {
        InquiryTypesComponent: undefined,
        InquiryTableComponent: undefined,
        InquiryTable: metaService.delay("/individuals/InquiryTable", function (obj) {
            this.InquiryTableComponent = uiEngine.getRenderer(obj.type)(obj);
            return "<div id='InquiryTableTopDiv'>";
        }),
        InquiryTypes: metaService.delay("/classes/sub?parentClass=Inquiry", function (obj) {
            self.inquiries = obj.classes;
            this.InquiryTypesComponent = self.makeInquiriesDropDown();
            return "<div id='InquiryTypesTopDiv'>";
        })
    };
    onObjectReady(components, function() {
        if (app.hasTemplate) {
            var div = document.createElement("div");
            $.tmpl(app.hasTemplate.hasContents, components).appendTo(div);
            self.setui(div);
            $("#InquiryTypesTopDiv").append(components.InquiryTypesComponent);
            $("#InquiryTableTopDiv").append(components.InquiryTableComponent);
        }
        else
            throw new Error('Missing application top level template');
    });
}

And here is a portion of the template that is being displayed:

<table width="100%" height="100%" border="0" cellspacing="0" cellpadding="0">
<tbody><tr height="100%">
<td class="contentGrey" dir="ltr">
<table>
<tbody><tr><td>
<table border="0"><tbody><tr><td><span style="width: 1000px">To request a service or report a problem start by selecting the desired service type from the pull-down list. </span></td></tr>
<tr><td><span style="width: 1000px">NOTE: If you do not see the service you would like to request included in the pull-down
list, please contact us by dialing 3-1-1. Call Specialists will be glad to assist you.
If you live outside of Miami-Dade County, please call 305-468-5900.</span></td></tr>
</tbody></table>
{{html InquiryTypes}}
</td></tr>
<tr><td>
</td></tr>
</tbody></table>
</td>
</tr>
</tbody></table>
{{html InquiryTable}}



Note that we use the {{html }} templating tag so the template engine doesn't escape HTML tags from the inserted text. Note also that the two object properties used in the template, InquiryTypes and InquiryTable, both have a callback associated to convert the data received from the server into HTML text. The callbacks work in tandem with the main (top-level) callback: they construct DOM trees for the UI to be inserted and just return 'div' placeholders where those DOM trees are inserted by the top-level callback. This avoids serializing a DOM tree into HTML text and then parsing it back. In the case of InquiryTypes, the data is a list of service/inquiry types and we dynamically create a dropdown box from it. In the case of InquiryTable, the data is an actual UI component described in the ontology and rendered by the client-side UIEngine.

Some more work is perhaps needed to abstract further and simplify this example. Clearly many templates won't need such acrobatics. For instance, templates that simply display data returned by a database query would be rendered in a much simpler way. But here we have a full dynamic interface with complete UI components streamed as metadata from the ontology. So some extra gluing work at the client-side is necessary.

Friday, March 11, 2011

Querying the Operations Database

Querying for lists of individuals matching a set of criteria is a fundamental functionality of the OperationService. Because of the arbitrariness of the kinds of entities and of their complexity, querying should be fairly generic, yet intuitive and simple enough to perform from client-side JavaScript.

The /op/list URL is going to be used to query for a list of entities. The criteria are specified as a query parameter named 'q'. The result is a JSON array of individuals. The query itself is also a JSON object, one that serves both to specify the selection criteria and the content of the result. The query object specifies a pattern to be matched and completed for each entity found at the back-end. It is inspired by the MQL query language developed by MetaWeb for the Freebase semantic database. For more on MQL itself, consult the MQL Manual.

In our implementation of those ideas, we won't follow MQL exactly because our meta model is OWL, which is different from their meta model. We will implement various features as the need arises.

This approach to querying is generally known as query-by-example, or pattern-matching. An alternative would be to specify a logical query expression, with logical operators (and, or, not). Constructing query expressions is usually harder than providing a pattern. However, even with the pattern-matching approach we may need some logical operators. When matching a list of values of the same property, for example, we may need to specify whether we want any (an OR operation) or all (an AND operation) of those values to be matched. For a functional property, the interpretation can only be any, but when a property can actually have multiple values, as is common in OWL, a choice has to be made. Perhaps the "ignored prefix" idea from MQL (see example below) can be adapted with meaningful property prefixes, e.g. { "either:prop" : value1, "either:prop" : value2 }.

More on this to come as we make progress....

Some Examples

Here is how to obtain all inquiries (service requests) submitted by a particular user that are still open:


query = { "type" : "Inquiry",
"submittedBy" : "user123",
"1:hasStatus" : "ServiceRequestInProgress", "2:hasStatus" : "ServiceRequestStarted",
"submittedOn" : null,
"lastUpdatedOn" : null,
"title": null
}

queryResult = opService.get("/list", {q : query});

$(queryResult).each(function (x) {
document.write(x.title + ", " + x.hasStatus + ", " + x.lastUpdatedOn);
});

The code above will obtain all inquiries (note that the type criterion also matches subclasses of Inquiry) submitted by user123 with one of the listed statuses, and will return a JavaScript array where each entity has the properties listed in the query. When a property is listed with a null value, that simply means we want that property returned as part of the query results. Note that the hasStatus property appears twice, with different prefixes 1: and 2:. Those prefixes are ignored; they are simply a device to list the same property name more than once.

Monday, February 7, 2011

Ontology Web Services

We added semantics to the ontology that describe SOAP-style web services.

OWL Classes consist of the following:

  1. http://www.miamidade.gov/ontology#WebService - Top level class which describes the service. A service has zero or more inputs and outputs.
  2. http://www.miamidade.gov/ontology#WebParameter - Web Service inputs and outputs must be of this type.
  3. http://www.miamidade.gov/ontology#WebArgumentMapping - A class that allows for the decoupling of WebService parameters and the arguments that are to be used as values for/of parameters.
  4. http://www.miamidade.gov/ontology#WebArgument - An argument to be used in a WebArgumentMapping.

Describing a WebService in the ontology:

Step 1: Create an individual of type http://www.miamidade.gov/ontology#WebService. Specify the following data properties: Wsdl (URL of the wsdl document), Endpoint (the URL at which to invoke the service), PortName (taken from the wsdl document), SOAPAction (taken from the wsdl document), Namespace (taken from the wsdl document), ServiceName (taken from the wsdl document).

Step 2: Create all input and output parameters as individuals. WebParameters have the following data properties: ParameterName (an arbitrary name for the parameter), XPathExpression (an XPath expression pointing to an XML node containing the data for this parameter; must be prefixed or use the default prefix of ':'), DataType (the data type for this parameter).

Step 3: Add all inputs and outputs to the WebService created in step 1. Inputs and Outputs are added as object properties hasInput(WebParameter) and hasOutput(WebParameter).

Step 4: Create all WebArgument(s) as individuals. There will more than likely be a 1-1 match of WebArgument(s) to WebParameter(s). A WebArgument has the following data properties: ArgumentType ("input" or "output"), ArgumentIndex (the index at which to expect the argument). A WebArgument also has the following object property: parameter (WebParameter), the web parameter that this argument references.

Step 5: Create a WebArgumentMapping individual. Add all WebArgument(s) created in Step 4 as object properties using the 'hasArgument' property. Add one final object property 'forWebService' and select the individual service created in Step 1.

Java Execution Class and usage

import org.sharegov.cirm.services.OWLWebServiceCall;

...

public void test()
{
    OWLIndividual svc = individual("GeoAddressToXYService");
    OWLWebServiceCall call = new OWLWebServiceCall(svc);
    OWLDataFactory df = MetaService.get().getDataFactory();
    // Build the literal argument with the data factory.
    OWLLiteral[] args = new OWLLiteral[] { df.getOWLLiteral("111 NW 1 ST") };
    System.out.println(call.execute(args));
}

...

Rest Execution Endpoint

http://localhost:8182/services/all - All Web services described in ontology

Output(application/json): ["http://www.miamidade.gov/ontology#GarbageTruckRouteCheckService","http://www.miamidade.gov/ontology#GeoAddressToXYService"]

http://localhost:8182/services/GeoAddressToXYService - Display service details.

Output(application/json): Lengthy...

http://localhost:8182/services/GeoAddressToXYService/description - Displays argument mappings.

Output(application/json): Lengthy...

http://localhost:8182/services/GeoAddressToXYService/call?argument=111%20NW%201%20ST - Executes a call to this web service.

Output(application/json):["111 NW 1 ST","920433.62502828","525114.81238802"]

 

As Pellet BuiltIn

While attempting to create a Pellet builtin to invoke web services from within rules as function calls, we faced the following issue: if a reasoner is used from within the custom function to reason over the ontology that contains the custom function call, a circular loop is created in which the reasoner continuously calls the custom function. Thus, the builtin is incomplete.

GeoAddressToXYService

This is a service provided by ETSD/GIS. It takes an address as an input and outputs X,Y coordinates. It was modeled as a proof of concept for integrating pre-existing web services.

Thursday, January 20, 2011

A no-risk proposition

Open source used to evoke the scrunching of executive eyebrows and a patronizing "it's still young and risky" comment. Things have surely changed: the open source "movement" has proven its maturity and is fulfilling its potential.

So why are gov't executives still rolling their eyes and tensing up at the mere mention of "open source"? Government is in so many cases the proverbial "late adopter". Well, better late than never. The government development and open source fit is a natural one. The transparency promise (a promise of virtually every government administration) presupposes that software developed in government IT shops is a community resource ... like any other government asset - it is public property.

This is the bottom line of what we are trying to achieve here, with our Sharegov initiative - to align the themes firmly embedded in the philosophy of public service (openness, low cost to the public, objectivity, public participation), with our software development practice. Much overdue. And substantially less risky than permanently committing to some expensive software product only because a charming sales engineer from a big company swept us off our feet during a powerpoint presentation.

We dare to suggest a little formula that will help government developers start sharing more and form a dynamic, energetic community: a little bit of extra, focused effort a couple of times a week! Just like exercise - start with regularity, and maintain discipline - working out will pay off in overall health, stamina, nimbleness... in our case, for our government organizations, for our e-government applications and projects. We realize no one will give us the extra time and money to dedicate to kicking off a gov't app development "co-op", so we will give our own time and effort to get things going. We encourage you to do so too - by commenting on our blogs, taking a look at the applications we're making available, by sharing your issues, projects and your own applications. Please don't let inertia and organizational fatigue take over. We're fighting it every day ourselves.

OWL to RDBMs

Started looking at the problem of storing OWL instances in an RDBMS:
  1. If an OWL instance is to be persisted in a relational database, everything about it should be persisted there. It should be completely recoverable from the RDBMs.
  2. It would be impossible to create a classic ER schema to manage OWL instances since a lot of properties can be very dynamic. So there should be a generic schema to store any OWL individual.
  3. On the other hand, we'd like to use some of the SQL capabilities for querying large data sets, data warehousing and business analytics, so an actual RDBMs schema would be useful in practice.
  4. Hence, an OWL individual is potentially stored in two portions: a generic "set of assertions" portion that is disjoint from a "fixed entity with attributes SQL table" portion.
Some resources I found that have to do with the problem of OWL <-> RDBMS:
Relational Database Schema to Ontology Mapping Approaches: http://www.unbsj.ca/sase/csas/data/awoss2-2010/AWOSS_Yassaman%20Zand-Moghaddam.pdf
Relational.OWL extracts OWL from DB schemas: http://sourceforge.net/projects/relational-owl/
Mapping between Relational Databases and OWL: http://www.lu.lv/materiali/apgads/raksti/756_pp_99-117.pdf
Mapping between Relational Database Schema and OWL Ontology for Deep Annotation: http://portal.acm.org/citation.cfm?id=1249215&CFID=4471127&CFTOKEN=65330774
Maintaining Mappings between Conceptual Models and Relational Schemas: http://www.ischool.drexel.edu/faculty/yan/publications/JDM-21(3)-36-68-2010.pdf
Resource space model, OWL and database: Mapping and integration: http://www.knowledgegrid.net/~H.zhuge/data/ACM-TOIT-Zhuge-final.pdf
There is also the converse problem of dealing with legacy database schemas and some of the resources above talk about that. There is OntoBase and some plugins for older Protege that would transform an RDBMs schema into an ontology. That will be useful if we need to write code within our semantic dev framework that talks to legacy databases (which is likely to be the case).
In meta terms, however, the relational (i.e. entity-relationship) meta model needs to be described in the ontology with concepts such as "Database_Schema", "Table", "Column", "Database_Type" etc. I didn't come across anything already done in that area; it seems trivial, so we may just do it from scratch.
In any event, we've stipulated in the initial architecture that we'd like the ability to keep older versions of everything. That doesn't necessarily mean that everything should be automatically versioned. The decision should be made depending on meta data about the particular individual.
So the simplest thing to do is to start with a generic RDBMs schema for storing arbitrary OWL individuals and think about optimizations from there. That schema should already support versioning. The next step would be mapping of (portion of the information about) OWL individuals to RDBMs entities (tables) and specifying the abstract rules that govern what goes where. Those rules should work for all 4 CRUD operations.
Business Object Ontologies
The schema for storing individuals doesn't need to also store the full ontology. It doesn't need class or property information, as those remain in the metadata database. The information about each individual is encapsulated in an ontology of its own, a mini-ontology so to speak, that contains only the relevant assertions pertaining to it. We establish a naming schema for those "mini ontologies" that has the following form:
{base-iri}/bo/{main-type-iri}/{object-id}
where
base-iri=http://www.miamidade.gov
'bo' is short for "business object"
main-type-iri=The IRI of the unique class this individual is explicitly declared to be an instance of.
object-id=The unique integer identifier of this business object within the scope of its main type.
For example, the ontology describing an individual of a service type Garbage_Complaint and that has an id 67 will be:
http://www.miamidade.gov/bo/Garbage_Complaint/67
The IRI of the individual itself within that ontology is formed by appending the string '#bo' to the ontology IRI. In other words, the individual has a local name 'bo' within its own ontology:
http://www.miamidade.gov/bo/Garbage_Complaint/67#bo
The RDBMs schema
  • Needs a dedicated table mapping IRI <-> numerical IDs (NUMBER(19)) - IRI_TABLE. This will make the other tables smaller and more compact, and since most of the IRIs deal with meta-data, caching could be used to avoid consulting IRI_TABLE. An important advantage of this table is renaming: the IRI of an object can be changed without having to modify all its references everywhere.
  • Seems better to separate the object properties and data properties into two different tables, since object properties have a fixed form/size: they are just triplets of integer identifiers, while data properties have to deal with data as big as blobs, or arbitrarily long strings.
  • The object property table, OWL_OBJECT_PROPERTIES, has the whole row (triplet) as a primary key. But that's unfeasible for the OWL_DATA_PROPERTY table because of potentially big data values, so each data property assertion will have its own unique ID.
  • Naturally, we can use various indexes to improve lookup when those tables get large.
Versioning
There are two strategies to maintain versions of data: real time and logical time. By logical time I mean incrementing version numbers for each piece of data. That logical clock can be global or it can be a separate sequence for each type of data. However, clearly, one would want a snapshot of several related objects at a particular point in time so the clock must be global. Using real time as the clock has the advantage of allowing queries by specific time ranges. On the other hand, real time can be problematic with unsynchronized computers.
One way to maintain versions is to augment each table with columns carrying version information:
[attr1, attr2, ..., version_number]
where version number could be a timestamp or some sort of a logical clock. To get the latest version, one would have to do a 'max' on the version column. To get a snapshot of a set of related objects "as of version N", one needs to get, for each object, the max version <= N. Again, a lot of aggregation, and the queries are not very simple. An alternative schema is to store pairs of timestamps:
[attr1, attr2, ..., from, to]
where the from and to indicate an interval of the time clock where that particular version is valid. A null value of the 'to' column (or the maximal possible number) indicates that this is the latest version. Thus the latest version is found more directly and it's also easier to retrieve object "as of" a particular time (time >= from and time < to).
The second schema takes up more space, but querying is faster, so we will use that.
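To make the trade-off concrete, here is a minimal JDBC sketch of the "as of" lookup under the second schema; the table and column names are hypothetical, only the from/to predicate comes from the discussion above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical table/column names; the predicate mirrors the rule above:
// a row is valid at time T when from <= T and (to is null or T < to).
class VersionedStore
{
    static ResultSet objectPropertiesAsOf(Connection conn, long subjectId, long asOf)
        throws SQLException
    {
        PreparedStatement stmt = conn.prepareStatement(
            "SELECT PREDICATE, OBJECT FROM OWL_OBJECT_PROPERTIES " +
            "WHERE SUBJECT = ? AND FROM_TIME <= ? " +
            "AND (TO_TIME IS NULL OR TO_TIME > ?)");
        stmt.setLong(1, subjectId);
        stmt.setLong(2, asOf);
        stmt.setLong(3, asOf);
        return stmt.executeQuery();
    }
}

The latest version is just the special case where 'to' is null, retrieved directly instead of through a max() aggregation.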