<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">g.raphaelli's weblog</title>
  <id>http://g.raphaelli.com/tags/upload/feed.atom</id>
  <updated>2009-12-01T08:58:20Z</updated>
  <link href="http://g.raphaelli.com/" />
  <link href="http://g.raphaelli.com/tags/upload/feed.atom" rel="self" />
  <generator uri="http://zine.pocoo.org/" version="0.1.2">Zine</generator>
  <entry xml:base="http://g.raphaelli.com/tags/upload/feed.atom">
    <title type="text">Streaming Flickr Uploads in Python</title>
    <id>tag:g.raphaelli.com,2009-10-07:/entry;2009/10/7/streaming-flickr-uploads-in-python</id>
    <updated>2009-12-01T08:58:20Z</updated>
    <published>2009-10-07T07:29:00Z</published>
    <link href="http://g.raphaelli.com/2009/10/07/streaming-flickr-uploads-in-python" />
    <author>
      <name>g</name>
    </author>
    <content type="html">Flickr has supported uploads of up to 500 MB since &lt;a href="http://blog.flickr.net/en/2009/04/27/hd-video-files-are-big/"&gt;April&lt;/a&gt;.  Unfortunately, uploading even a 100 MB file (easily less than 30 seconds of high-definition video) with Flickr.API, possibly after reading &lt;a href="http://g.raphaelli.com/2009/08/06/flickr-api-helper-script"&gt;this&lt;/a&gt; or &lt;a href="http://g.raphaelli.com/2009/08/02/python-flickr-api-042"&gt;this&lt;/a&gt;, uses about 100 MB of memory, because the entire file is read() into memory before being sent off to Flickr.  top reports this for a 94 MB video upload:

&lt;div class="syntax"&gt;&lt;pre&gt;  PID COMMAND     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  627 python        1    20    133   98M   256K   101M   118M 
&lt;/pre&gt;&lt;/div&gt;


A quick search of pypi turned up &lt;a href="http://pypi.python.org/pypi/poster/"&gt;Poster&lt;/a&gt;, a library for &lt;q&gt;Streaming HTTP uploads and multipart/form-data encoding&lt;/q&gt;.  After hooking this up, the same 94 MB video upload is reported as:

&lt;div class="syntax"&gt;&lt;pre&gt;  PID COMMAND     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  631 python        1    20    132 4904K   256K  7524K    24M 
&lt;/pre&gt;&lt;/div&gt;


Notice the private memory usage drop from 98 MB to under 5 MB.  The code to get it done looks like this:

&lt;div class="syntax"&gt;&lt;pre&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;poster&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;poster.encode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;poster.streaminghttp&lt;/span&gt;
&lt;span class="n"&gt;poster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;streaminghttp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;register_openers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;flopped&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


flop swaps the (body generator, headers) tuple returned by Poster's multipart_encode into the (headers, body) order that Flickr.API's execute_request expects, as seen in the next code block.  multipart_encode also accepts a custom boundary, but we ignore that here.
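
As a rough, self-contained illustration of what flop does, the sketch below wraps a stand-in encoder.  fake_multipart_encode is a hypothetical name that only mimics the (body generator, headers) return order described above; it is not Poster's implementation and does no real multipart encoding.

```python
def flop(fn):
    # Wrap fn so its two-element return value comes back in reverse order.
    def flopped(a, **kwargs):
        (c, d) = fn(a, **kwargs)
        return (d, c)
    return flopped


def fake_multipart_encode(params, **kwargs):
    # Hypothetical stand-in: returns (body generator, headers) in the same
    # order as the encoder described above, without real multipart encoding.
    body = (str(value).encode("utf-8") for value in params.values())
    headers = {"Content-Type": "multipart/form-data"}
    return (body, headers)


# After wrapping, the headers come first and the body generator second.
headers, body = flop(fake_multipart_encode)({"title": "demo"})
body_bytes = b"".join(body)
```

The body stays a generator until something consumes it, which is what lets the upload be sent piece by piece.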

Then, when executing the upload request, the new encoder can be passed in like this:

&lt;div class="syntax"&gt;&lt;pre&gt;&lt;span class="n"&gt;photo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;upload_request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Flickr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://api.flickr.com/services/upload&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auth_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;upload_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upload_request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sign&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;flop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;poster&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;multipart_encode&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
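
The memory savings come from never holding more than one chunk of the file at a time.  Here is a minimal sketch of that idea, not Poster's actual code: read_in_chunks and CHUNK_SIZE are hypothetical names, and the 8 KB size is assumed for illustration.

```python
import io

CHUNK_SIZE = 8 * 1024  # assumed chunk size of 8 KB


def read_in_chunks(fileobj, chunk_size=CHUNK_SIZE):
    # Yield successive blocks until read() returns an empty bytes object,
    # so only one chunk is held in memory at a time.
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage with an in-memory file standing in for a large video.
fake_video = io.BytesIO(b"x" * (20 * 1024))
chunk_sizes = [len(chunk) for chunk in read_in_chunks(fake_video)]
```

A real upload would write each chunk to the HTTP connection as it is produced rather than collecting the sizes in a list.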


Now Poster takes care of transferring the file in 8 KB chunks, so the whole file is never loaded into memory at once.</content>
  </entry>
</feed>

