Hi!

My name is Wes Garland. I am a senior software developer with Kings Distributed Systems, working on the Distributed Compute Protocol for Distributed Compute Labs. We are building, without exaggeration, a next-generation supercomputer. This is a very exciting time to be a software developer!

I bring to KDS more than two decades of experience developing software and leading development and operations teams. My experience building GPSEE (a precursor to NodeJS) and its complete software ecosystem, working on CommonJS/2.0, and building BravoJS has proven invaluable in my current position.

This is my personal page, where I will be posting things which interest me from time to time.

Comments? Questions? Find me on Twitter - @wesgarland4

Saturday, June 19, 2021

This morning I was designing a fast object cache for DCP's new proxy key authentication feature, which left me curious about the fastest way to detect the existence of an Object property in our environment. We are currently running Node.js version 12-LTS; my workstation is equipped with v12.21.0, which uses version 7.8.279.23 of Google's V8 JavaScript engine.

I sat down and thought about this problem for a few minutes, and could think of 5 decent ways to do this.

`if (testObject.hasOwnProperty(propName))`: this method checks if the property exists on the object I am testing, and does not test whether it exists on the prototype chain (i.e., whether it is inherited). This is my usual "go-to" for this type of test; besides the fact that I believe it to be quite efficient, it does not yield surprising results when third-party libraries modify Object.prototype. Frameworks such as MooTools and Prototype.JS do this, or at least did in the past, so I have developed a healthy wariness around arbitrary object property tests.
`if (testObject[propName])`: this method checks whether the property has a truthy value, either on the object I am testing or somewhere on its prototype chain. All of my objects for this cache began life as object literals, so this is an equivalent-enough test for the purposes of my cache. I know for a fact I will not have any cache keys which collide with default object property names (e.g. `constructor`, `hasOwnProperty`, `__proto__`, and so on). In theory, this test will need to coerce the property value to a boolean, but I expect the JIT will be able to optimize that away for the most part; my test is mostly for cache misses.
`if (propName in testObject)`: like `hasOwnProperty`, this lets the engine check for the existence of a property without coercing its value to a boolean, but it checks both the object itself and its prototype chain, since the property could be inherited. I was hoping that this purpose-made operator would perform well, since the V8 authors would have both opportunity and motivation to optimize it.
`if (Object.hasOwnProperty.call(testObject, propName))`: this is exactly the same as the first `hasOwnProperty` test, except that I have explicitly identified the `hasOwnProperty` method that I want to use, rather than letting the engine look up the `hasOwnProperty` property on the object, follow the prototype chain up to `Object.prototype`, find the inherited `hasOwnProperty`, and use that. My test does not mutate the object nor its prototype chain in the hot loop, so I am hoping that the JIT is smart enough that this test and the other `hasOwnProperty` test will perform exactly the same.
`if (testObject[propName] !== undefined)`: in principle, this is similar to the `in` test, except that it won't find properties whose value is `undefined`. I know I won't be caching `undefined` in my application, so that doesn't matter.
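The five checks above differ in meaning as well as in speed. The sketch below runs each one against a small object that has an own property, an inherited property, a falsy property, and a property holding `undefined`. The object and key names are made up for illustration, and `proto` stands in for an `Object.prototype` modified by a third-party library.

```javascript
// Compare the five property-existence checks on one small object.
// `proto` is a stand-in for a prototype that a library has added to;
// every name here is hypothetical and chosen only for illustration.
const proto = { inherited: 'from prototype' };
const testObject = Object.create(proto);
testObject.own = 'own value';
testObject.empty = 0;           // present, but falsy
testObject.missing = undefined; // present, but holds undefined

const checks = {
  directHasOwn:   (o, k) => o.hasOwnProperty(k),
  truthy:         (o, k) => Boolean(o[k]),
  inOperator:     (o, k) => k in o,
  indirectHasOwn: (o, k) => Object.hasOwnProperty.call(o, k),
  notUndefined:   (o, k) => o[k] !== undefined,
};

// Run every check against one key and collect the results by check name.
function compare(obj, key) {
  const row = {};
  for (const [name, check] of Object.entries(checks))
    row[name] = check(obj, key);
  return row;
}

for (const key of ['own', 'inherited', 'empty', 'missing', 'absent'])
  console.log(key, compare(testObject, key));
```

When cached values are always non-empty objects and keys never collide with prototype property names, as in the cache described above, all five checks agree; they only diverge on edge cases like these.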

The Code

// Run with: node --expose-gc (otherwise global.gc is undefined)

global.gc();

const performance = require('perf_hooks').performance;

var obj = {};
var tests = [];
var start, end;

// Returns a random base-36 string to use as a (probably absent) property name
function rp()
{
  return Math.random().toString(36).slice(2 + Math.random()*2);
}

for (let i=0; i < 1000; i++)
  obj[rp()] = { ran: Math.random(), now: Date.now() };

tests.push(function direct(iter, quiet) {
  start = performance.now();
  for (let i=0; i < iter; i++)
  {
    let test = rp();

    if (obj.hasOwnProperty(test))
      console.log('collision', i);
  }
  end = performance.now();
  if (!quiet)
    console.log('direct hasOwnProperty:\t', end-start);
});

tests.push(function indirect(iter, quiet) {
  start = performance.now();
  for (let i=0; i < iter; i++)
  {
    let test = rp();
    
    if (Object.hasOwnProperty.call(obj, test))
      console.log('collision', i);
  }
  end = performance.now();
  if (!quiet)
    console.log('indirect hasOwnProperty:', end-start);
});

tests.push(function falsey(iter, quiet) {
  start = performance.now();
  for (let i=0; i < iter; i++)
  {
    let test = rp();
    
    if (obj[test])
      console.log('collision', i);
  }
  end = performance.now();
  if (!quiet)
    console.log('falsey:\t\t\t', end-start);
});


tests.push(function undef(iter, quiet) {
  start = performance.now();
  for (let i=0; i < iter; i++)
  {
    let test = rp();
    
    if (obj[test] !== undefined)
      console.log('collision', i);
  }
  end = performance.now();
  if (!quiet)
    console.log('undef compare:\t\t', end-start);
});

tests.push(function has(iter, quiet) {
  start = performance.now();
  for (let i=0; i < iter; i++)
  {
    let test = rp();
    
    if (test in obj)
      console.log('collision', i);
  }
  end = performance.now();
  if (!quiet)
    console.log('has:\t\t\t', end-start);
});

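// Shuffle the tests so that no test always runs first
for (let test of tests)
  test.ran = Math.random();
tests.sort((a,b) => a.ran - b.ran);

// Warm up every test so the JIT has a chance to optimize them before timing
for (let warmup of tests)
  warmup(1000, true);
for (let test of tests)
{
  // Collect garbage and re-warm all tests before each timed run
  global.gc();
  for (let warmup of tests)
    warmup(1000, true);
  test(1000, true);
  test(1000000, false);
}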

Findings

tl;dr: no real surprises.

I was disappointed to discover that `Object.hasOwnProperty.call` performs measurably better than `testObject.hasOwnProperty`, even after a million runs. I was also disappointed that `in` didn't really perform any better than its more naive counterparts. The time disparity between the two `hasOwnProperty` tests can probably be chalked up to the cost of walking up the prototype chain to figure out which function should be called (why can't the JIT cache this?). The naive checks which involve potential coercion have the worst performance, probably because the coercion actually happens. The performance of the `in` test is a bit puzzling; we know that prototype chain traversal isn't too expensive, so perhaps we are looking at the cost of traversing the chain and performing a `hasOwnProperty`-style lookup on each node until we hit the end of the chain?
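Whatever the cause of the timing gap between the two `hasOwnProperty` variants, they call the very same function; only the route the engine takes to find it differs. This small sketch confirms that, and shows one non-performance argument for the explicit form: a key named `hasOwnProperty` (the cache contents here are hypothetical) shadows the inherited method.

```javascript
// Both forms used in the benchmark resolve to the same function object.
const fromInstance = ({}).hasOwnProperty;  // found via the object's prototype chain
const fromObject = Object.hasOwnProperty;  // found via Function.prototype, then Object.prototype
const canonical = Object.prototype.hasOwnProperty;

console.log(fromInstance === canonical, fromObject === canonical); // true true

// A key named "hasOwnProperty" shadows the inherited method: the direct
// form then throws, while the explicit .call() form still works.
const cache = { hasOwnProperty: 'a cached value' };
let directWorks;
try {
  cache.hasOwnProperty('hasOwnProperty');
  directWorks = true;
} catch (e) {
  directWorks = false; // TypeError: cache.hasOwnProperty is not a function
}
const indirectResult = canonical.call(cache, 'hasOwnProperty');
console.log(directWorks, indirectResult); // false true
```

Newer engines also offer `Object.hasOwn(obj, key)` (Node.js 16.9 and later), which is not available in the Node.js 12 environment used for this benchmark.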

Q: Which style of lookup should I use in day-to-day code?
A: Whichever makes your code easiest to read.

Monday, April 6, 2020

Apologies for the lack of updates; I assumed the CTO role at KDS last summer, and have been extremely busy since then!

2020 is shaping up to be a landmark year for Kings Distributed Systems. We are currently preparing our "First Dev" milestone for release this month. This milestone provides a stable API that outside developers can use on browser or NodeJS platforms, allowing early adopters to build software that is expected to continue working unchanged as we add features and march down our roadmap.

Last month, in concert with CENGN (the Centre of Excellence in Next Generation Networks), KDS successfully performed a 10,000-slice job modelling Tick-Borne Encephalitis across 504 cores. The at-scale testing was performed largely using workers from the official standalone-worker package for Debian Linux (see: https://docs.distributed.computer/worker/readme.html).

Tuesday, May 21, 2019

I have started working on BravoJS again, and have published it on the npmjs.org global NPM repository.

BravoJS is an implementation of CommonJS Modules/2.0. Upcoming versions will be streamlined a bit for the modern web environment (for example, recognizing the ubiquity of `console`), but most importantly, will include more functionality for NodeJS interoperability and a better "write once, run everywhere" development experience across the various platforms DCP developers find themselves using, i.e.:

NodeJS Applications
Browser Applications
Browser Applications' Worker Threads
DCP Worker Sandboxes

In support of these goals, we will soon add the following features:

NodeJS-style module.exports [done]
NodeJS-style module.paths
Native support for cross-site module group loading (no CORS issues with naive CDNs!)
Module group server which understands NPM packages and dependencies
Ability to transparently load modules from the DCP package repository from any of the four platforms listed above
Automatic bundling of non-DCP package repository modules that are used by work functions in the Compute API for execution in DCP Worker Sandboxes, regardless of origin

Additionally, we will add semantic versioning in module dependencies (same as npm/package.json) and develop a version snapshot, similar in principle to package-lock.json in npm, which is used to "freeze" module versions for a given job at job deployment time. This will allow DCP developers to know with confidence exactly what code their jobs are executing, even if they are not pinning dependencies to specific versions in their dependency declarations.
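The semantic versioning and snapshot ideas can be sketched in a few lines. This is an illustration under simplifying assumptions, not DCP's resolver: only plain `MAJOR.MINOR.PATCH` versions and caret (`^`) ranges with a major version of 1 or higher are handled, and the package names and published versions are invented. Each dependency resolves to the highest matching version, and the result is frozen so it could be recorded with a job at deployment time.

```javascript
// Hypothetical resolver: caret ranges only, no prereleases, no ^0.x rules.
function parse(v) {
  return v.split('.').map(Number);
}

function compareVersions(a, b) {
  const pa = parse(a), pb = parse(b);
  for (let i = 0; i < 3; i++)
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  return 0;
}

// "^1.2.0" accepts >=1.2.0 and <2.0.0 (valid only for major >= 1).
function satisfiesCaret(range, version) {
  const base = range.slice(1);
  return parse(version)[0] === parse(base)[0] && compareVersions(version, base) >= 0;
}

// deps: { name: "^x.y.z" }, published: { name: ["x.y.z", ...] }
function freezeSnapshot(deps, published) {
  const snapshot = {};
  for (const [name, range] of Object.entries(deps)) {
    const candidates = (published[name] || [])
      .filter((v) => satisfiesCaret(range, v))
      .sort(compareVersions);
    if (candidates.length === 0)
      throw new Error(`no version of ${name} satisfies ${range}`);
    snapshot[name] = candidates[candidates.length - 1]; // highest match
  }
  return Object.freeze(snapshot);
}

const published = {
  refiner: ['1.0.0', '1.4.2', '1.10.0', '2.0.0'],
  videoProcessor: ['0.9.0', '1.0.0', '1.0.3'],
};
const snapshot = freezeSnapshot({ refiner: '^1.2.0', videoProcessor: '^1.0.0' }, published);
console.log(snapshot); // { refiner: '1.10.0', videoProcessor: '1.0.3' }
```

Comparing versions numerically matters here: a plain string sort would rank `1.4.2` above `1.10.0`.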

Tuesday, April 30, 2019

Dan Desjardins and I will be giving a talk at St. Lawrence College for the Big Data & AI Conference on May 3, 2019.
Here is a copy of the slide deck, in case you missed it:

https://people.kingsds.network/wesgarland/DCL presentation - St. Lawrence

Friday, November 9, 2018

We've been very busy at KDS lately, designing and implementing the Compute API. This API will allow users to easily perform arbitrary parallel computations on the Distributed Computer. We can create client applications that run in NodeJS or in the browser which can call on DCP to perform computations that would be too time-consuming to run locally. This API builds on our earlier work by targeting the needs of the scientific computing community, and exposing the DCP functionality they need in a clear, terse API.

The needs of these users have been broken down, at their most basic level, into the following requirements:

Identify a set of input data
Apply the input data to a computation function
Collate the result of the computation
Identify / control the cost of compute
Have a system which behaves deterministically (within specified parameters)

This may seem like a gross oversimplification (and it kind of is), but we can bootstrap just about anything on top of these basic steps, given a correctly curated software ecosystem. In addition, a simplified view of DCP of this nature means that we can guide users toward creating compute functions which allow us to perform all kinds of interesting scheduler, performance, and bandwidth optimizations. Maybe I'll post about those in the near future.
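As a rough local analogue of those five requirements, here is a sequential sketch in plain JavaScript. It is not the DCP Compute API; `runLocally`, `costPerSlice`, and `budget` are hypothetical names, and cost is modelled as a flat price per slice purely for illustration.

```javascript
// Hypothetical local stand-in for the five requirements (not DCP code).
function runLocally(inputs, work, { costPerSlice, budget }) {
  // (4) Identify/control the cost of compute before doing any work.
  const estimatedCost = inputs.length * costPerSlice;
  if (estimatedCost > budget)
    throw new Error(`estimated cost ${estimatedCost} exceeds budget ${budget}`);

  // (2) Apply each input datum to the computation function.
  // (3) Collate results by input index, not by arrival order.
  const results = new Array(inputs.length);
  inputs.forEach((datum, index) => {
    results[index] = work(datum);
  });
  return { results, cost: estimatedCost };
}

// (1) Identify a set of input data.
const inputs = [3, 1, 4, 1, 5];
const { results, cost } = runLocally(inputs, (x) => x * x, { costPerSlice: 2, budget: 100 });
console.log(results, cost); // [ 9, 1, 16, 1, 25 ] 10
```

Keying results by input index rather than by arrival order is what would keep the collated output deterministic (requirement 5) once slices run in parallel and finish out of order.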

Here's a trivial NodeJS program which uses the soon-to-be-complete Compute API 1.0 to perform a very very simple task:

const fs = require('fs')
const protocol = require('dcp/protocol')
const compute = require('dcp/compute')

async function main() {
  // Unlock the account that will pay for the computation
  const paymentAccount = protocol.unlock(fs.readFileSync('myKey.keystore'))

  // The work function is executed inside DCP Worker Sandboxes
  let g = compute.for([{start: 1, end: 2000}, {start: 3, end: 5}], function(i,j) {
    let best=Infinity;
    for (let x=0; x < i; x++) {
       best = Math.min(best, require('./refiner').refine(x, j))
       progress(x/i)
    }
    return best
  })
  let results = await g.exec(compute.safeMarketValue, paymentAccount)
  console.log("The best result was: ", results.reduce((b,c)=>Math.min(b,c)))
}

main()
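The example passes two `{start, end}` objects to `compute.for` along with a work function taking `(i, j)`. Assuming (this is not stated above) that each object is an inclusive integer range and that multiple ranges combine as a cartesian product, the slices would enumerate as in this local sketch; `expandRanges` is a hypothetical helper, not DCP code.

```javascript
// Hypothetical: enumerate one argument tuple per slice, assuming inclusive
// ranges combined as a cartesian product.
function* expandRanges(...ranges) {
  if (ranges.length === 0) {
    yield [];
    return;
  }
  const [first, ...rest] = ranges;
  for (let value = first.start; value <= first.end; value++)
    for (const tail of expandRanges(...rest))
      yield [value, ...tail];
}

const slices = [...expandRanges({ start: 1, end: 2000 }, { start: 3, end: 5 })];
console.log(slices.length);        // 6000
console.log(slices[0], slices[1]); // [ 1, 3 ] [ 1, 4 ]
```

Under that assumption, the work function would be invoked as `work(1, 3)`, `work(1, 4)`, `work(1, 5)`, `work(2, 3)`, and so on.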

Keen-eyed readers will notice the presence of the `require` function: our Compute API supports a module format which is a superset of CommonJS Modules/1.1.1 and is fully dependency-managed. The same modules can be executed on NodeJS, in the web browser, or from tasks within the Distributed Computer.

The Compute API also allows users to deploy applications on the Distributed Computer for others to make use of: a sort of blend between the iOS App Store and a multiuser system (e.g. VM/ESA, Unix, Multics), with a modern twist. Someday, you will be able to write programs like this:

const fs = require('fs')
const protocol = require('dcp/protocol')
const compute = require('dcp/compute')

async function main() {
  const paymentAccount = protocol.unlock(fs.readFileSync('myKey.keystore'))

  let vp = new compute.Application("videoProcessor", "^1.0.0")
  let frames = require('videoProcessor-local').loadFrames('myMovie.mpg')
  let g = vp.for(frames, "vingette")
  let results = await g.exec(compute.marketPrice, paymentAccount)
  require('videoProcessor-local').saveFrames(results, 'myMovie-vingette.mpg')
}

main()

...and notice that `.loadFrames()` is not awaited. It appears to be a synchronous interface, but it could just as easily be an ES6 `function*` generator. We support that!
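To illustrate the generator point, here is a hypothetical `loadFrames` written as an ES6 `function*`. It yields frames lazily, yet the caller consumes it with an ordinary `for...of`, so no `await` appears at the call site. The frame objects and the `vignette` effect are stand-ins invented for this sketch.

```javascript
// Hypothetical generator: frames are produced one at a time, on demand.
function* loadFrames(frameCount) {
  for (let n = 0; n < frameCount; n++)
    yield { index: n, pixels: `frame-${n}` }; // stand-in for real frame data
}

// A consumer written against the iteration protocol accepts any iterable.
function mapFrames(frames, effect) {
  const output = [];
  for (const frame of frames)
    output.push(effect(frame));
  return output;
}

const vignette = (frame) => ({ ...frame, pixels: frame.pixels + '+vignette' });

const fromGenerator = mapFrames(loadFrames(3), vignette);
const fromArray = mapFrames([...loadFrames(3)], vignette);
console.log(fromGenerator[2].pixels); // frame-2+vignette
```

Because `mapFrames` only relies on iteration, an eager array and a lazy generator produce identical results.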

Wednesday, September 11, 2018

Kings Distributed Systems is having an open house at Kingston City Hall tomorrow.

Itinerary
0800-0900	Catered breakfast (City Hall)
0900-0930	Introductions and special announcements. (City Hall)
0930-1130	Series of mini presentations covering our tech, corporate structure, crypto-economic model, grow-the-network strategy, research and incubator model, business development, marketing campaign, and roadmap for the future. (City Hall)
1130-1300	Lunch at the Senior Officers' mess, Fort Frontenac
1300-1400	Visit KDS/DCL offices (303 Bagot Street, Suites 402-403)
1400-1800	Own time/visit Kingston
1800-...	Dinner (RSVP required) and Kingston nightlife

I will be giving two talks in the morning; here are my slides:
