Kamil Choudhury

#define ZERO -1 // oh no it's technology all the way down

What's Cooking, OpenARC Edition

Dude, you're lazy

Cut me some slack: in the space of the last year, I had another kid, moved to Los Angeles, then Las Vegas, and then back to Doha. Now that the sleep deprivation is slowly wearing off, I have resumed work on OpenARC, and am happy to report that the project has gone from being a shitty ORM with lots of boilerplate overhead to a shitty ORM with slightly less boilerplate overhead.

Introducing Minimal Definition OAGs

With this commit, we can now define graph nodes (OAGs) like so:

class OAG_SubNode1(OAG_RootNode):
    ...

    @staticproperty
    def dbstreams(cls): return {
        'field_sn1'   : [ 'int', 0 ]
    }

class OAG_SubNode2(OAG_RootNode):
    ...

    @staticproperty
    def dbstreams(cls): return {
        'field_sn2'   : [ 'int', 0 ]
    }

class OAG_AutoNode(OAG_RootNode):
    ...

    @staticproperty
    def dbstreams(cls): return {
        'field2'   : [ 'int', 0 ],
        'field3'   : [ 'int', 0 ],
        'subnode1' : [ OAG_SubNode1 ],
        'subnode2' : [ OAG_SubNode2 ]
    }

    @oagprop
    def CalcDeriv(self):
        return self.subnode1.field_sn1 + self.subnode2.field_sn2

newOAG = OAG_AutoNode()

The instantiation of OAG_AutoNode on the last line results in the cascading creation of tables in the database for OAG_AutoNode, OAG_SubNode1 and OAG_SubNode2, along with corresponding basic indexes and foreign key constraints among all the newly created tables.

In-code definition of database constructs and their automated lazy creation in the database should drastically lower the amount of database administration that needs to be done when creating new OAG types. On an aesthetic level, I find the new definition style much nicer than the previous state of affairs.
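For a rough idea of what "cascading creation" involves, here is a minimal sketch that turns dbstreams-style dictionaries into CREATE TABLE statements, emitting referenced tables first so that foreign keys resolve. It is not OpenARC's actual code: the `_id` column, the lower-cased table names, the PostgreSQL-flavoured SQL and the use of plain strings in place of OAG classes are all assumptions made for illustration.

```python
# Hypothetical sketch, NOT OpenARC's implementation. OAG types are
# represented by their names, and the SQL dialect is assumed to be PostgreSQL.

def table_name(oag_name):
    return oag_name.lower()

def create_table_sql(oag_name, dbstreams):
    """Build a CREATE TABLE statement from a dbstreams-style dict.

    A spec is either [sql_type, default] for a plain column or
    [referenced_oag_name] for a subnode, which becomes a foreign key.
    """
    columns = ["_id serial PRIMARY KEY"]  # assumed surrogate key
    for stream, spec in dbstreams.items():
        if len(spec) == 1:
            # Subnode: store the referenced row's id and constrain it.
            columns.append("%s int REFERENCES %s(_id)"
                           % (stream, table_name(spec[0])))
        else:
            sql_type, default = spec
            columns.append("%s %s DEFAULT %r" % (stream, sql_type, default))
    return "CREATE TABLE IF NOT EXISTS %s (\n    %s\n);" % (
        table_name(oag_name), ",\n    ".join(columns))

def creation_order(root, definitions, seen=None):
    """Return OAG names with referenced tables first (post-order walk)."""
    seen = [] if seen is None else seen
    for spec in definitions[root].values():
        if len(spec) == 1 and spec[0] not in seen:
            creation_order(spec[0], definitions, seen)
    if root not in seen:
        seen.append(root)
    return seen

definitions = {
    "OAG_SubNode1": {"field_sn1": ["int", 0]},
    "OAG_SubNode2": {"field_sn2": ["int", 0]},
    "OAG_AutoNode": {
        "field2": ["int", 0],
        "field3": ["int", 0],
        "subnode1": ["OAG_SubNode1"],
        "subnode2": ["OAG_SubNode2"],
    },
}

order = creation_order("OAG_AutoNode", definitions)
statements = [create_table_sql(name, definitions[name]) for name in order]
print("\n\n".join(statements))
```

The post-order walk is what makes the creation "cascade": a table is only emitted once every table it references has been emitted.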

It is worth pointing out a few things about the new definition framework:

  • It supports the addition of new dbstreams: Missing dbstreams are added to existing tables as they are detected.
  • Deleting or changing dbstreams is NOT supported: If you make a mistake and need to delete or rename dbstreams, manual database surgery is still necessary. The library detects missing and/or removed dbstreams and throws OAGraphIntegrityError exceptions, so at least you will know when you are screwed.
  • There is no cycle detection: OpenARC object graphs are intended to be directed acyclic graphs, but the framework does not yet do any kind of cycle detection. To make sure your current graphs are compatible with upcoming functionality, do not introduce cycles in your object graphs.
  • Old-style OAG definitions are still supported: A project with only 36 commits should not have legacy baggage, but here we are: OpenARC now has two ways to define OAGs. For now, there are no plans to deprecate or eliminate the old definition style in favor of the new one.
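Until the framework detects cycles itself, a check along the following lines can be run over your own definitions. This is a sketch of a standard depth-first search, not something OpenARC provides: the `find_cycle` helper is hypothetical, OAG types are again represented by name, and every referenced type is assumed to appear in the `definitions` dictionary.

```python
# Hypothetical helper, NOT part of OpenARC. A one-element spec such as
# ["OAG_SubNode1"] is treated as an edge to that OAG type.

def find_cycle(definitions):
    """Return one cycle as a list of names, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = dict((name, WHITE) for name in definitions)
    path = []

    def visit(name):
        color[name] = GREY
        path.append(name)
        for spec in definitions[name].values():
            if len(spec) != 1:
                continue           # plain column, not an edge
            child = spec[0]
            if color[child] == GREY:
                # Child is already on the current path: that is a cycle.
                return path[path.index(child):] + [child]
            if color[child] == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in definitions:
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None

acyclic = {
    "OAG_SubNode1": {"field_sn1": ["int", 0]},
    "OAG_AutoNode": {"field2": ["int", 0], "subnode1": ["OAG_SubNode1"]},
}
cyclic = {
    "A": {"b": ["B"]},
    "B": {"a": ["A"]},
}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```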

The Way Forward

OpenARC bills itself as a functional reactive graph database, but has so far only partially delivered on its functional reactive promises. The aim over the next few months is to flesh out the project's functional reactive story, with effort concentrated on the following projects.

Project 1: Event Propagation Up The Graph

Before jumping into how to extend OpenARC's FR model, a quick review of what it currently looks like is in order:

sn1 = OAG_SubNode1().create({ 'field_sn1' : 1 })
sn2 = OAG_SubNode2().create({ 'field_sn2' : 2 })

newOAG =\
    OAG_AutoNode().create({
        'field2'   : 10,
        'field3'   : 11,
        'subnode1' : sn1,
        'subnode2' : sn2
    })

print newOAG.CalcDeriv
# output: 3

newOAG.subnode1.field_sn1 = 2
newOAG.subnode2.field_sn2 = 3
print newOAG.CalcDeriv
# output: 5

newOAG.clear()
print newOAG.CalcDeriv
# output: 3

Events on subnodes (in this case a change of value for several dbstreams) are "pulled" up the graph in order to calculate a desired derived value (the CalcDeriv oagprop on newOAG). While the clear() call gives us the ability to remove in-memory edits on the graph and carry out rapid "what if" experiments, applications looking for state changes are still limited to inefficiently polling calculated oagprops.
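The pull model and clear() can be imitated in a few lines of plain Python, which may help make the mechanics concrete. The `PullNode` class below is a hypothetical stand-in, not an OAG: it keeps "stored" values and in-memory edits in two dictionaries, recomputes the derived value on every access, and has no database behind it.

```python
# Hypothetical stand-in for an OAG, NOT OpenARC code: no database involved.

class PullNode(object):
    def __init__(self, **stored):
        # Bypass our own __setattr__ while setting up internal state.
        self.__dict__["_stored"] = dict(stored)
        self.__dict__["_edits"] = {}

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for stream names.
        if name in self.__dict__["_edits"]:
            return self.__dict__["_edits"][name]
        if name in self.__dict__["_stored"]:
            return self.__dict__["_stored"][name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Edits live in memory only; stored values are left untouched.
        self._edits[name] = value

    def clear(self):
        # Drop in-memory edits here and on every subnode.
        self._edits.clear()
        for value in self._stored.values():
            if isinstance(value, PullNode):
                value.clear()

class AutoNode(PullNode):
    @property
    def CalcDeriv(self):
        # Recomputed ("pulled") from the subnodes on every access.
        return self.subnode1.field_sn1 + self.subnode2.field_sn2

root = AutoNode(subnode1=PullNode(field_sn1=1),
                subnode2=PullNode(field_sn2=2))
before = root.CalcDeriv

root.subnode1.field_sn1 = 2
root.subnode2.field_sn2 = 3
edited = root.CalcDeriv

root.clear()
after = root.CalcDeriv
print(before)
print(edited)
print(after)
```

Note that nothing tells `root` that a subnode changed: a caller only finds out by reading CalcDeriv again, which is exactly the polling problem described above.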

Allowing events to be bubbled up the DAG as dbstreams are edited mitigates this limitation and ensures that work is done only when needed. A sample session would look something like this:

# Define event handlers
class OAG_AutoNode(OAG_RootNode):
    ...

    @staticproperty
    def dbstreams(cls): return {
        'field2'   : ['int',         0    , None                  ],
        'field3'   : ['int',         0    , None                  ],
        'subnode1' : [ OAG_SubNode1, None , subnode_update_handler],
        'subnode2' : [ OAG_SubNode2, None , subnode_update_handler]
    }

    def subnode_update_handler(self):
        print "A subnode has been updated. CalcDeriv is now [%s]" % self.CalcDeriv

    @oagprop
    def CalcDeriv(self):
        return self.subnode1.field_sn1 + self.subnode2.field_sn2

sn1 = OAG_SubNode1().create({ 'field_sn1' : 1 })
sn2 = OAG_SubNode2().create({ 'field_sn2' : 2 })

newOAG =\
    OAG_AutoNode().create({
        'field2'   : 10,
        'field3'   : 11,
        'subnode1' : sn1,
        'subnode2' : sn2
    })

print newOAG.CalcDeriv
# output: 3

newOAG.subnode2.field_sn2 = 3

# async output: A subnode has been updated. CalcDeriv is now [4]

On-graph event propagation will simplify any programming task that requires calculations to be triggered by state changes, all for the marginal cost of defining an extra event handler.
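The underlying mechanism is the familiar observer pattern. The sketch below shows one way change events could bubble from a subnode to its parents. It is an illustration only: `PushNode`, `attach`, `get` and `set` are hypothetical names rather than OpenARC APIs, and handlers run synchronously here, whereas the session above suggests asynchronous delivery.

```python
# Hypothetical observer-pattern sketch, NOT OpenARC's planned implementation.

class PushNode(object):
    def __init__(self, **streams):
        self._values = dict(streams)
        self._parents = []   # (parent, handler) pairs to notify on change

    def attach(self, name, child, handler):
        """Store child under name and subscribe handler to its changes."""
        self._values[name] = child
        child._parents.append((self, handler))

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        self._values[name] = value
        self._notify()

    def _notify(self):
        for parent, handler in self._parents:
            handler(parent)
            parent._notify()  # keep bubbling toward the root of the DAG

log = []

def subnode_update_handler(node):
    total = (node.get("subnode1").get("field_sn1")
             + node.get("subnode2").get("field_sn2"))
    log.append("A subnode has been updated. CalcDeriv is now [%s]" % total)

sn1 = PushNode(field_sn1=1)
sn2 = PushNode(field_sn2=2)
root = PushNode(field2=10, field3=11)
root.attach("subnode1", sn1, subnode_update_handler)
root.attach("subnode2", sn2, subnode_update_handler)

sn2.set("field_sn2", 3)   # triggers the handler on root
print("\n".join(log))
```

The derived value is computed once per edit, when the edit happens, instead of once per poll.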

Project 2: Remotely Accessible OAGs

Currently, an OAG is accessible only from the process that created it. After this change, OAGs will be globally addressable and accessible over the network.

A typical session would look something like this. On the first host, an OAG is instantiated and made remotely accessible:

exportOAG = OAG_SubNode1().create({'field_sn1' : 1024})

print exportOAG.oagurl
# Throws OAError("OAG is not accessible remotely")

print exportOAG.public
# output: False

exportOAG.public = True

print exportOAG.oagurl
# output: oag://ardentprayer.anserinae.net:7843/df0e896590484364ac1704893f5e4fb1

On the second host, exportOAG can be accessed and incorporated into local OAGs as if it were local, allowing for the same event propagation semantics as fully local graphs:

importOAG = OAG_SubNode1(oagurl="oag://ardentprayer.anserinae.net:7843/df0e896590484364ac1704893f5e4fb1")

print importOAG.field_sn1
# output: 1024

superOAG =\
    OAG_AutoNode().create({
        'field2'   : 10,
        'field3'   : 11,
        'subnode1' : importOAG,
        'subnode2' : sn2  # an OAG_SubNode2 created locally on this host
    })

The implications of this change are massive: changing dbstreams on one host can trigger events on another, making OpenARC an implicit RPC framework that supports distributed functional reactive programming.
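On the importing side, the first step is presumably to split an oagurl into the pieces needed to open a connection. The sketch below does that with the standard library. `parse_oagurl` is a hypothetical helper, and the reading of the URL as host, port and OAG identifier is inferred from the example above rather than from any spec. It uses Python 3's `urllib.parse` (the equivalent Python 2 module is `urlparse`).

```python
# Hypothetical helper, NOT an OpenARC API. The oag:// layout assumed here
# (host, port, then an opaque id as the path) is inferred from the example.
from urllib.parse import urlsplit

def parse_oagurl(oagurl):
    """Split an oag:// URL into (host, port, oag_id)."""
    parts = urlsplit(oagurl)
    if parts.scheme != "oag":
        raise ValueError("not an OAG URL: %r" % oagurl)
    if not parts.hostname or parts.port is None:
        raise ValueError("OAG URL needs a host and port: %r" % oagurl)
    oag_id = parts.path.lstrip("/")
    if not oag_id:
        raise ValueError("OAG URL has no id: %r" % oagurl)
    return parts.hostname, parts.port, oag_id

host, port, oag_id = parse_oagurl(
    "oag://ardentprayer.anserinae.net:7843/df0e896590484364ac1704893f5e4fb1")
print(host)
print(port)
print(oag_id)
```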

Project 3: Defining the OAG Protocol

Centralizing and codifying the protocol used by OAGs to talk to each other and the underlying storage layer will make it possible to implement graph nodes in any language and use any storage backend. Codification will involve not only a written spec, but also a test suite against which any other OpenARC protocol implementation can be tested.

The motivation for this project is simple: OpenARC is currently implemented in Python, whose reference interpreter (CPython) uses a global interpreter lock that keeps threads from running Python code in parallel, and we must plan for its eventual re-implementation in a language without that limitation (Rust perhaps?) if we hope to support truly high-performance applications. Clarifying inter-OAG interfaces and providing a clear implementation test suite will make this (and any other) re-implementation more comprehensive and less error-prone.

Project 4: Documentation

This is self-explanatory: high quality projects have high quality documentation.

The project aspires to documentation that is thorough and example-driven. OpenARC is already in use over at Levelcompute, and to the extent that it is possible I will be pulling actual production examples to show how the project can be used in production in a safe and robust manner.

Timelines

As you may have guessed, OpenARC is being actively developed as part of my work at Levelcompute. There is a pressing business need for its development, so you can expect a fairly rapid development cadence. We are hoping for feature-rich minor releases once a month, followed by a 1.0 release by mid-year 2018.

If you want to talk to me about the project, you can reach out to me on Twitter, GitHub or via e-mail: I'm always open to feedback!