Gearmand Rewrite Released
There is big news for Gearman fans today: the first release of the C rewrite of the gearmand job server has been announced.
Gearman is an excellent framework for farming out tasks to pools of machines, parallelizing tasks, and making calls between programming languages that speak gearman. Like memcached, it's extremely easy to get started using it. Once you do start using it, you'll never understand how you lived without it.
The new release does not provide Python bindings, but the code in sixapart's svn should be compatible with the new server*. The sample code provided with both the Perl and C versions is pretty trivial, as it simply echoes or reverses some strings. To pass interesting data between client and worker you'll need to create your own convention. YAML or JSON is very handy for this.
Let's take a contrived example. Of course, error handling will be omitted for clarity.
This is a gearman worker that expects a JSON-encoded list of URLs and does something to each of them.

    """ A Sample Gearman Worker """
    import logging
    from functools import wraps
    import os
    import simplejson as json
    import urllib

    from gearman import GearmanWorker

    # assumption: a gearmand instance running locally on its default port
    jobservers = ["127.0.0.1"]

    def job_in(fn):
        """ Decorates worker functions by calling them with a job's arguments """
        @wraps(fn)
        def new(job):
            # do something with the job object
            return fn(job.arg)
        return new

    def json_in(fn):
        """ Decorates a function that may be called with a JSON-formatted
        string but expects a python object """
        @wraps(fn)
        def new(arg):
            # convert the args in JSON to a python object
            arg = json.loads(arg)
            return fn(arg)
        return new

    @job_in
    @json_in
    def fetch(urls):
        success = 0
        for url in urls:
            logging.debug("fetching %s" % url)
            # do something with the url
            success += 1
        # the response goes back to the client as a JSON string
        return json.dumps({'fetched': success})

    worker = GearmanWorker(jobservers)
    worker.register_function("fetch", fetch)
    worker.work()
A client for this worker can look like:
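    import simplejson as json

    from gearman import GearmanClient, Task

    # assumption: the same local job server the worker connects to
    jobservers = ["127.0.0.1"]

    urls = ['http://www.flickr.com', 'http://www.yahoo.com']

    client = GearmanClient(jobservers)
    # encode the argument as JSON, matching what the worker expects
    response = client.do_task(Task("fetch", arg=json.dumps(urls)))
    print("%i urls fetched successfully" % json.loads(response)['fetched'])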
While this example is still quite simple, it does illustrate the kind of convention needed to pass real information between clients and workers: both sides agree that arguments and responses are JSON strings. The Python libraries for gearman also support tasksets, which are sets of tasks submitted at once to a job server for parallel execution. This is a simple yet powerful way to speed up work and make the most of hardware investments.
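The JSON convention can be tried out without a job server. The following is only a sketch under stated assumptions: FakeJob is a hypothetical stand-in for the job object a worker receives (it carries nothing but an arg attribute), json_job is a hypothetical decorator that handles both decoding the argument and encoding the response, and call is a hypothetical helper that plays the client's part. None of these names come from the gearman library, and the standard library's json module is used in place of simplejson.

```python
"""Exercise the JSON-in/JSON-out convention without a job server."""
import json
from functools import wraps


class FakeJob(object):
    """Hypothetical stand-in for the job object a worker receives."""
    def __init__(self, arg):
        self.arg = arg


def json_job(fn):
    """Decode job.arg from JSON, call fn, and encode its result as JSON."""
    @wraps(fn)
    def wrapper(job):
        return json.dumps(fn(json.loads(job.arg)))
    return wrapper


@json_job
def fetch(urls):
    # A real worker would fetch each URL; this sketch only counts them.
    return {'fetched': len(urls)}


def call(worker_fn, payload):
    """Play the client's part: encode the payload, decode the response."""
    return json.loads(worker_fn(FakeJob(json.dumps(payload))))


result = call(fetch, ['http://www.flickr.com', 'http://www.yahoo.com'])
print("%i urls fetched successfully" % result['fetched'])
```

Folding both directions into one decorator keeps the worker function itself free of serialization code, so it can take and return plain Python objects.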
I hope that this rewrite of gearmand renews interest in the Python community in enhancing the current bindings. I'd be very interested in a gearman protocol implementation for Twisted (and might just start working on it soon).
* On Mac OS X 10.4.11 I'm getting a bus error from gearmand after successfully running a job. I haven't tracked it down or submitted a bug yet. Update: bug reported via IRC (yes, I'm gilad on freenode) and fixed with a two-line patch. A new release should be announced shortly.
