g.raphaelli's weblog


Circumventing security policy with gearman

written by g, on Jan 12, 2009 8:42:00 PM.

Gearman can be handy for defeating network security policy in situations like this one:

Here, Host A can initiate connections to Host B, but Host B is blocked by a firewall, router ACL, or similar from initiating connections to Host A.

With gearman, you can call a service on Host A from Host B with a setup like this:

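  1. Run a gearmand job server on Host B (or on any host that both A and B can reach).
  2. Register a gearman worker running on Host A with that job server. The worker opens the connection outbound from A, which the firewall allows, and the job server hands it jobs over that same connection.
  3. Submit work from a client running on Host B to that job server.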
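
To see why the firewall never gets in the way, it helps to look at what actually travels over the wire. The sketch below assumes the Gearman binary protocol as described in its protocol documentation: a 12-byte header (4-byte magic, 4-byte big-endian type, 4-byte big-endian size) followed by NUL-separated arguments. The helper names `pack`/`unpack` and the job handle are illustrative, not part of any library, and the PRE_SLEEP/NOOP wake-up exchange real workers use is left out. The point it shows: every packet rides on a connection opened by Host A's worker or Host B's client toward the job server, never from B to A.

```python
import struct

# Packet type codes from the Gearman binary protocol (subset used here).
CAN_DO, SUBMIT_JOB, JOB_CREATED = 1, 7, 8
GRAB_JOB, JOB_ASSIGN, WORK_COMPLETE = 9, 11, 13

# Number of NUL-separated arguments each packet type carries.
ARG_COUNTS = {CAN_DO: 1, SUBMIT_JOB: 3, JOB_CREATED: 1,
              GRAB_JOB: 0, JOB_ASSIGN: 3, WORK_COMPLETE: 2}

REQ, RES = b"\0REQ", b"\0RES"  # request / response magic codes

def pack(magic, ptype, *args):
    """Build a packet: magic, big-endian type and size, NUL-joined args."""
    body = b"\0".join(args)
    return magic + struct.pack(">II", ptype, len(body)) + body

def unpack(packet):
    """Parse one packet into (magic, type, args)."""
    magic = packet[:4]
    ptype, size = struct.unpack(">II", packet[4:12])
    body = packet[12:12 + size]
    n = ARG_COUNTS[ptype]
    # The last argument (e.g. the workload) may itself contain NUL bytes.
    args = body.split(b"\0", n - 1) if n else []
    return magic, ptype, args

# 1. Worker on Host A opens a connection to the job server and says what it can do.
a_sends = [pack(REQ, CAN_DO, b"abc"), pack(REQ, GRAB_JOB)]

# 2. Client on Host B opens its own connection and submits work.
b_sends = [pack(REQ, SUBMIT_JOB, b"abc", b"", b"payload")]

# 3. The job server answers each side over the connection that side opened.
handle = b"H:jobserver:1"  # illustrative job handle
to_b = [pack(RES, JOB_CREATED, handle)]
to_a = [pack(RES, JOB_ASSIGN, handle, b"abc", b"payload")]

# 4. Host A returns the result on its connection; the server relays it to B.
a_sends.append(pack(REQ, WORK_COMPLETE, handle, b"result"))
to_b.append(pack(RES, WORK_COMPLETE, handle, b"result"))

for p in a_sends + b_sends + to_a + to_b:
    print(unpack(p))
```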

This idea can be taken a step further by chaining workers together: the worker that handles a call such as abc() on Host A can in turn submit a job to a job server that a worker on Host C connects to, so that Host C, which can neither reach nor be reached by Host B, ultimately executes it. Each extra hop makes it harder to track an individual task's real status, but the setup can be handy in a pinch.
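
The chaining idea can be modelled in-process. This is a sketch under stated assumptions, not real networking: `JobServer` is a hypothetical stand-in for a gearmand instance (a queue that workers pull from), and each thread plays a host that only talks to the servers it can reach. Host A's worker handles `abc` by acting as a client of a second server that only Host C's worker polls.

```python
import queue
import threading

class JobServer:
    """Hypothetical stand-in for gearmand: clients submit, workers pull."""
    def __init__(self):
        self.jobs = queue.Queue()

    def submit(self, func, arg):
        # Client side: enqueue the job, then block until a worker replies.
        reply = queue.Queue(maxsize=1)
        self.jobs.put((func, arg, reply))
        return reply.get(timeout=5)

    def serve_one(self, functions):
        # Worker side: pull one job and answer it on its reply channel.
        func, arg, reply = self.jobs.get()
        reply.put(functions[func](arg))

server_ab = JobServer()  # reachable by Hosts A and B
server_ac = JobServer()  # reachable by Hosts A and C only

def host_c_worker():
    # Host C only knows about server_ac.
    server_ac.serve_one({"abc": lambda arg: "C handled " + arg})

def host_a_worker():
    # Host A relays: its abc() is itself a client call to server_ac.
    server_ab.serve_one({"abc": lambda arg: server_ac.submit("abc", arg)})

for target in (host_c_worker, host_a_worker):
    threading.Thread(target=target, daemon=True).start()

# The client on Host B only ever talks to server_ab.
result = server_ab.submit("abc", "request from B")
print(result)  # C handled request from B
```

Each relay hop is one more queue where a job can stall, which is why an individual task's status gets harder to follow as the chain grows.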

Gearmand Rewrite Released

written by g, on Jan 9, 2009 5:43:00 PM.

There is big news for Gearman fans today: the first release of the C rewrite of the gearmand job server has been announced.

Gearman is an excellent framework for farming out tasks to pools of machines, parallelizing tasks, and making calls between programming languages that speak gearman. Like memcached, it's extremely easy to get started with. Once you start using it, you'll wonder how you ever lived without it.

The new release does not provide Python bindings, but the code in Six Apart's svn should be compatible with the new server*. The sample code provided with both the Perl and C versions is fairly trivial, since it simply echoes or reverses strings. To pass interesting data between client and worker you'll need to establish your own convention; YAML, or simply JSON, is very handy for this.

Let's take a contrived example. Error handling is omitted for clarity.

Here is a gearman worker that expects a JSON-encoded list of URLs and does something with each of them.

""" A Sample Gearman Worker """
import logging
from functools import wraps

import simplejson as json
from gearman import GearmanWorker

# Address(es) of your gearmand job server(s); adjust for your setup.
jobservers = ["127.0.0.1"]

def job_in(fn):
    """ Decorates worker functions by calling them with a job's arguments """
    @wraps(fn)
    def new(job):
        # unwrap the job object and pass on only its argument string
        return fn(job.arg)
    return new

def json_in(fn):
    """ Decorates a function that may be called with a
        JSON-formatted string but expects a python object """
    @wraps(fn)
    def new(arg):
        # convert the JSON-encoded argument to a python object
        arg = json.loads(arg)
        return fn(arg)
    return new

# job_in runs first (unwraps the job), then json_in decodes its argument
@job_in
@json_in
def fetch(urls):
    success = 0
    for url in urls:
        logging.debug("fetching %s", url)
        # do something with the url
        success += 1

    # reply using the same JSON convention
    return json.dumps({'fetched': success})

worker = GearmanWorker(jobservers)
worker.register_function("fetch", fetch)
worker.work()
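
Because the decorators only rely on the job object having an `arg` attribute, the worker side of the convention can be exercised without a job server. A sketch under that assumption: `FakeJob` is a hypothetical stand-in for the library's job object, the standard `json` module replaces `simplejson`, and `fetch` is stubbed to just count the URLs.

```python
import json
from functools import wraps

class FakeJob(object):
    """Hypothetical stand-in for a gearman job: only .arg is used."""
    def __init__(self, arg):
        self.arg = arg

def job_in(fn):
    @wraps(fn)
    def new(job):
        return fn(job.arg)  # pass on only the job's argument string
    return new

def json_in(fn):
    @wraps(fn)
    def new(arg):
        return fn(json.loads(arg))  # decode the JSON argument
    return new

@job_in
@json_in
def fetch(urls):
    # Stub: report every URL as handled, using the same reply convention.
    return json.dumps({'fetched': len(urls)})

# Simulate the string a client would send and decode what comes back.
request = json.dumps(['http://www.flickr.com', 'http://www.yahoo.com'])
response = fetch(FakeJob(request))
print(json.loads(response)['fetched'])  # 2
```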

A client for this worker might look like this:

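import simplejson as json
from gearman import GearmanClient, Task

# Address(es) of your gearmand job server(s); must match the worker's.
jobservers = ["127.0.0.1"]

urls = ['http://www.flickr.com', 'http://www.yahoo.com']

client = GearmanClient(jobservers)
# encode the argument as JSON and block until the worker's reply arrives
response = client.do_task(Task("fetch", arg=json.dumps(urls)))

print("%i urls fetched successfully" % json.loads(response)['fetched'])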

While this example is still quite simple, it illustrates the kind of convention needed to pass real information between clients and workers. The Python gearman libraries also support tasksets: sets of tasks submitted to a job server at once for parallel execution. This is a simple yet powerful way to speed up work and make the most of your hardware.
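
The taskset idea, many tasks in flight at once with the results gathered at the end, can be sketched without a job server. This is an analogy, not the gearman API: a `concurrent.futures` thread pool stands in for the pool of workers, each chunk of URLs becomes one JSON-encoded task using the same convention as above, and `fake_fetch_worker` and the example.com URLs are hypothetical placeholders.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def fake_fetch_worker(arg):
    """Hypothetical worker: decode a JSON list of URLs, report how many it handled."""
    urls = json.loads(arg)
    return json.dumps({'fetched': len(urls)})

def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

urls = ['http://example.com/%d' % n for n in range(10)]

# One task per chunk, all submitted at once, like a taskset.
tasks = [json.dumps(c) for c in chunk(urls, 3)]
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(fake_fetch_worker, tasks))

total = sum(json.loads(r)['fetched'] for r in responses)
print("%i urls fetched successfully" % total)  # 10 urls fetched successfully
```

Chunking trades per-task overhead against parallelism: bigger chunks mean fewer round trips to the job server, smaller ones spread the work more evenly across workers.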

I hope that this rewrite of gearmand renews interest in the Python community in enhancing the current bindings. I'd be very interested in a gearman protocol implementation for Twisted (and might just start working on one soon).

* On Mac OS X 10.4.11 I'm getting a bus error from gearmand after it successfully runs a job. I haven't tracked it down or filed a bug yet. Update: bug reported via IRC (yes, I'm gilad on freenode) and fixed with a two-line patch. A new release should be announced shortly.