g.raphaelli's weblog

Streaming Flickr Uploads in Python

written by g, on Oct 7, 2009 5:29:00 PM.

Flickr has supported uploads of up to 500 MB since April. Unfortunately, uploading even a 100 MB file (easily less than 30 seconds of high-definition video) with Flickr.API, possibly after reading this or this, uses about 100 MB of memory, because the entire file is loaded with read() before being sent off to Flickr. For a 94 MB video upload, top reports:
  PID COMMAND     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  627 python        1    20    133   98M   256K   101M   118M 
A quick search of PyPI turned up Poster, a library for streaming HTTP uploads and multipart/form-data encoding. After hooking this up, top reports the same 94 MB video upload as:
  PID COMMAND     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  631 python        1    20    132 4904K   256K  7524K    24M 
Notice that private memory usage (the RPRVT column) drops from 98 MB to under 5 MB. The code to get it done looks like this:
import poster, poster.encode, poster.streaminghttp
# Install Poster's streaming handlers as the default urllib2 opener.
poster.streaminghttp.register_openers()

def flop(fn):
    # Wrap fn so that its two-element result comes back in reverse order.
    def flopped(a, **kwargs):
        (c, d) = fn(a, **kwargs)
        return (d, c)
    return flopped
flop swaps the (body generator, headers) tuple returned by Poster's multipart_encode into the (headers, body) order that Flickr.API's execute_request expects, as seen in the next code block. multipart_encode can also take a custom boundary, but we're basically just going to ignore that here. Then, when executing the upload request, the new form encoding can be used like this:
photo = open(file, 'rb')
upload_request = Flickr.API.Request(url="http://api.flickr.com/services/upload",
    auth_token=token, photo=photo)
upload_response = api.execute_request(upload_request, sign=True,
    encode=flop(poster.encode.multipart_encode))
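The tuple swap that flop performs can be checked on its own, without Poster or a network connection. The sketch below repeats flop and wraps fake_multipart_encode, a hypothetical stand-in written for this example (it is not part of Poster) that only mimics the (body generator, headers) return order described above:

```python
def flop(fn):
    """Wrap fn so that its two-element result comes back in reverse order."""
    def flopped(a, **kwargs):
        (c, d) = fn(a, **kwargs)
        return (d, c)
    return flopped


def fake_multipart_encode(params, boundary=None):
    """Hypothetical stand-in: returns (body generator, headers) like the text describes."""
    body = (chunk for chunk in [b"part-1", b"part-2"])
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % (boundary or "x"),
    }
    return body, headers


# After wrapping, the headers come first and the body generator second.
headers, body = flop(fake_multipart_encode)({"photo": "ignored"}, boundary="demo")
joined = b"".join(body)
```

Keyword arguments such as boundary pass straight through the wrapper, so the wrapped encoder keeps whatever options the original accepts.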
Now Poster takes care of transferring the file in 8 KB chunks instead of loading the whole file into memory.
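To illustrate why chunked transfer keeps memory flat, here is a minimal sketch of a streaming multipart/form-data encoder using only the standard library. It is not Poster's implementation; stream_multipart, its parameters, and the field values are all made up for this example. The idea is just that the body is a generator that reads the file 8 KB at a time, while Content-Length is computed up front from the file size:

```python
import io
import uuid

CHUNK_SIZE = 8 * 1024  # 8 KB, the chunk size mentioned in the text


def stream_multipart(fields, file_field, filename, fileobj, file_size,
                     boundary=None, chunk_size=CHUNK_SIZE):
    """Return (body generator, headers) for a multipart/form-data upload.

    Only one chunk of the file is held in memory at a time.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            '--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n'
            % (boundary, name, value))
    parts.append(
        '--%s\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
        % (boundary, file_field, filename))
    head = "".join(parts).encode("utf-8")
    tail = ("\r\n--%s--\r\n" % boundary).encode("utf-8")

    def body():
        yield head
        while True:
            chunk = fileobj.read(chunk_size)  # never more than chunk_size bytes
            if not chunk:
                break
            yield chunk
        yield tail

    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        # Known in advance, so the body never has to be buffered to measure it.
        "Content-Length": str(len(head) + file_size + len(tail)),
    }
    return body(), headers


# Usage with an in-memory stand-in for a large video file.
payload = b"\x00" * (20 * 1024 + 5)
datagen, headers = stream_multipart(
    {"auth_token": "hypothetical-token"}, "photo", "clip.mov",
    io.BytesIO(payload), len(payload), boundary="demo-boundary")
chunks = list(datagen)
```

In real use the generator would be handed to an HTTP client that writes each chunk to the socket as it arrives, rather than being collected into a list as done here for inspection.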