123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621 |
- <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
- <html> <head>
- <title>Speed Up that Library when You Can't C a Thing</title>
- <style>
- .slide {
- border: 2px solid #668888;
- background-color: #A4CAE1;
- padding: 2%;
- width: 94%;
- }
- pre {
- border: 1px solid #224444;
- background-color: #B4DAF1;
- padding: 2px;
- }
- </style>
- <script src="scripts/jquery-1.2.3.js" type="text/javascript" />
- <script src="scripts/slideshow.js" type="text/javascript" />
- </head>
- <body>
- <div class='slide'>
- <h1>Speed up that library when you can't C a thing</h1>
- <p>John Melesky
- (Open Source Bridge, June 2009)</p>
- </div>
- <div class='slide'>
- <h1>Caveats</h1>
- <ul>
- <li>We're talking about Unix and Linux<ul>
- <li>Most should be applicable on OSX once you factor in the dynlib stuff</li>
- <li>I have no idea on the Windows front; sorry</li>
- </ul>
- </li>
- </ul>
- </div>
- <div class='slide'>
- <h1>Caveats</h1>
- <ul>
- <li>We're talking about Unix and Linux<ul>
- <li>Most should be applicable on OSX once you factor in the dynlib stuff</li>
- <li>I have no idea on the Windows front; sorry</li>
- </ul>
- </li>
- <li>The example is in Ruby and Ocaml<ul>
- <li>The FFIs for Perl and Python are similar</li>
- <li>Other called languages are possible</li>
- <li>I'll try to be explicit what's general and what's example-specific</li>
- </ul>
- </li>
- </ul>
- </div>
- <div class='slide'>
- <h1>Ever in this situation?</h1>
- </div>
- <div class='slide'>
- <h1>The culprit</h1>
- <pre><code>def matrix_mul(a,b)
- rows = (0 .. a.size - 1).collect {
- |i|
- (0 .. b[0].size - 1).collect {
- |j|
- vij = 0
- 0.upto(a[i].size - 1) do
- |k|
- vij += a[i][k] * b[k][j]
- end
- vij
- }
- }
- return rows
- end
- </code></pre>
- <p>(adapted from matrix.rb)</p>
- </div>
- <div class='slide'>
- <h1>The solution</h1>
- <table>
- <tr>
- <td><b><font size="+2">That's what C is for, right?</b></td>
- <td><img src="img/dhh.jpg"></td>
- </tr>
- </table>
- </div>
- <div class='slide'>
- <h1>That's not what C is for!</h1>
- <p>C is a systems language, intended for use with 30-year-old kernels and drivers, on processors with fewer than 10k transistors (we're currently in the billion-transistor range, thankyouverymuch Moore's Law).</p>
- </div>
- <div class='slide'>
- <h1>That's not what C is for!</h1>
- <p>C is a systems language, intended for use with 30-year-old kernels and drivers, on processors with fewer than 10k transistors (we're currently in the billion-transistor range, thankyouverymuch Moore's Law).</p>
- <p>In other words, it's fast only as a side effect.</p>
- </div>
- <div class='slide'>
- <h1>Other languages are better</h1>
- <p>For raw numerics, Fortran still has C beat. APL is pretty dang fast, too.</p>
- <p>For complex datatypes, languages like Haskell or ML will all give you better type support, safer usage, and possibly faster code.</p>
- <p>Really, almost any language offers benefits over C, whether it's faster, safer, more concise, more advanced, or all of the above.</p>
- </div>
- <div class='slide'>
- <h1>What are the options?</h1>
- <p>The only constraint is that your target language needs an FFI (Foreign Function Interface) that allows C to call it. Most awesome languages allow this. I'm going to use Ocaml.</p>
- <p>The other material requirements are your language header files (<code>ruby.h</code>, <code>caml/callback.h</code>, etc.), and perserverence.</p>
- </div>
- <div class='slide'>
- <h1>What are the options?</h1>
- <p>The only constraint is that your target language needs an FFI (Foreign Function Interface) that allows C to call it. Most awesome languages allow this. I'm going to use Ocaml.</p>
- <p>The other material requirements are your language header files (<code>ruby.h</code>, <code>caml/callback.h</code>, etc.), and perserverence.</p>
- <p>(You may look warily upon the whole "allows C to call it" thing. This is understandable. Things will get worse before they get better.)</p>
- </div>
- <div class='slide'>
- <h1>What's Ocaml?</h1>
- <p>In brief, it's a statically- and strongly- typed, non-pure, strict, functional programming language. It's from the ML family (which includes SML, F#, and some others).</p>
- <p>Ocaml's pretty cool at the moment, because it's one of the fastest languages around, routinely rivalling C++ in the Shootout.</p>
- </div>
- <div class='slide'>
- <h1>Matrix Multiplication in Ocaml, round 1</h1>
- <pre><code>let matrix_mul a b =
- List.map
- (fun row ->
- mapn
- (fun column ->
- List.fold_left (+) 0
- (List.map2 ( * ) row column))
- b)
- a
- </code></pre>
- <p>(cribbed from RosettaCode)</p>
- </div>
- <div class='slide'>
- <h1>Benefits</h1>
- <pre><code>let matrix_mul a b =
- List.map
- (fun row ->
- mapn
- (fun column ->
- List.fold_left (+) 0
- (List.map2 ( * ) row column))
- b)
- a
- </code></pre>
- <p>We've acheived a 10x speedup, and our code is not actually much more complicated than the Ruby.</p>
- <p>This code uses Lists, which are pretty natural to use in functional languages. Arrays would probably be faster, though.</p>
- </div>
- <div class='slide'>
- <h1>Matrix Multiplication in Ocaml, round 2</h1>
- <pre><code>let matrix_mul x y =
- let x0 = Array.length x
- and y0 = Array.length y in
- let y1 = if y0 = 0 then 0 else Array.length y.(0) in
- let z = Array.make_matrix x0 y1 0 in
- for i = 0 to x0-1 do
- for j = 0 to y1-1 do
- for k = 0 to y0-1 do
- z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j)
- done
- done
- done;
- z
- </code></pre>
- <p>(also cribbed from RosettaCode)</p>
- </div>
- <div class='slide'>
- <h1>Benefits</h1>
- <pre><code>let matrix_mul x y =
- let x0 = Array.length x
- and y0 = Array.length y in
- let y1 = if y0 = 0 then 0 else Array.length y.(0) in
- let z = Array.make_matrix x0 y1 0 in
- for i = 0 to x0-1 do
- for j = 0 to y1-1 do
- for k = 0 to y0-1 do
- z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j)
- done
- done
- done;
- z
- </code></pre>
- <p>Ocaml Arrays are pretty darn fast, and have a great initialization function tailored to this sort of problem (make_matrix).</p>
- <p>We've now managed a speedup around 30-35x. And, again, the code is not that much more complex than the original Ruby.</p>
- </div>
- <div class='slide'>
- <h1>Gluing Pieces Together</h1>
- <p>Of course, the Ocaml code is operating on Ocaml types, and Ruby wants to pass it Ruby types. What to do, what to do?</p>
- </div>
- <div class='slide'>
- <h1>It's gonna get worse before it gets better...</h1>
- <p>Ruby can call C code...</p>
- </div>
- <div class='slide'>
- <h1>It's gonna get worse before it gets better...</h1>
- <p>Ruby can call C code...</p>
- <p>Ocaml can be called by C code...</p>
- </div>
- <div class='slide'>
- <h1>It's gonna get worse before it gets better...</h1>
- <p>Ruby can call C code...</p>
- <p>Ocaml can be called by C code...</p>
- <p>They both have facilities to translate their types into C types (and Ocaml vice-versa)...</p>
- </div>
- <div class='slide'>
- <h1>FFI Aside</h1>
- <p>Many languages provide C type definitions and functions for extensions. There are two problems with this.</p>
- <ol>
- <li>Many of these FFIs aren't merely type declarations and functions, but also macros. That means they only work in C.</li>
- <li>Even if they were macro-free, few languages have all-purpose transparent C access. Thus, wrapper code is always needed.</li>
- </ol>
- </div>
- <div class='slide'>
- <h1>Dancing the Type Two-Step</h1>
- <p>Let's work our way out, from Ocaml to Ruby, via C. First, we need to tell Ocaml to export a function to C.</p>
- <pre><code>let matrix_mul x y =
- let x0 = Array.length x
- and y0 = Array.length y in
- let y1 = if y0 = 0 then 0 else Array.length y.(0) in
- let z = Array.make_matrix x0 y1 0 in
- for i = 0 to x0-1 do
- for j = 0 to y1-1 do
- for k = 0 to y0-1 do
- z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j)
- done
- done
- done;
- z
- let _ = Callback.register "ocaml_matrix_mul" matrix_mul;;
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Dancing the Type Two-Step</h1>
- <p>Next, we need some C code to call the Ocaml.</p>
- <pre><code>#include "caml/callback.h"
- void initcaml(void) {
- char *dummy_argv[] = {0};
- caml_startup(dummy_argv);
- }
- </code></pre>
- <p>First we initialize the Ocaml runtime.</p>
- </div>
- <div class='slide'>
- <h1>Dancing the Type Two-Step</h1>
- <pre><code>int interm_matrix_mul(*a, *b) {
- static value *f = NULL;
- f = caml_named_value("ocaml_matrix_mul");
- if (f == NULL)
- return -1;
- value r_caml;
- r_caml = callback2(*f, a, b);
- </code></pre>
- <p>This is how we grab and execute the exported code. Note that we haven't figured the types for <code>a</code> or <code>b</code>, nor have we converted r_caml to something C-like. Who's up for C arrays of C arrays of C <code>int</code>s? (Or should they be <code>long</code>s?)</p>
- </div>
- <div class='slide'>
- <h1>Dancing the Type Two-Step</h1>
- <p>From the Ruby side, we'd do something like this.</p>
- <pre><code>#include "ruby.h"
- VALUE OcamlMatrix = Qnil; // set up the Ruby namespace
- // prototype our init and method
- void Init_ocamlmatrix();
- VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b);
- void Init_ocamlmatrix() {
- OcamlMatrix = rb_define_module("OcamlMatrix");
- rb_define_method(OcamlMatrix, "matrix_mul", method_matrix_mul, 0);
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Dancing the Type Two-Step</h1>
- <pre><code>VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b) {
- // here we would convert a and b to C arrays
- c = interm_matrix_mul(conv_a, conv_b);
- // and then convert c from C arrays to a Ruby VALUE
- return conv_c;
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>You probably saw this coming</h1>
- <p>You can convert directly from the Ruby types to Ocaml types without C intermediate types. This is nice, because it avoids any need for structs or pointer arithmetic.</p>
- </div>
- <div class='slide'>
- <h1>You probably saw this coming</h1>
- <p>You can convert directly from the Ruby types to Ocaml types without C intermediate types. This is nice, because it avoids any need for structs or pointer arithmetic.</p>
- <p>But... this only works if you can import both languages' headers without name conflicts (thankfully Ruby's VALUE and Ocaml's value are spelled differently).</p>
- </div>
- <div class='slide'>
- <h1>From Two-Step to Tango</h1>
- <pre><code>VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b) {
- CAMLparam0 ();
- CAMLlocal2 (caml_a, caml_b);
- CAMLlocal1 (caml_interm);
- CAMLlocal1 (caml_item);
- int x = RARRAY(a)->len;
- for (int i = 0; i<x; i++) {
- VALUE row = rb_ary_entry(a, i); // row = a[i]
- int y = RARRAY(row)->len;
- caml_interm = caml_alloc(y, 0);
- for (int j = 0; j<y; j++) {
- VALUE item = rb_ary_entry(row, j);
- long c_item = NUM2LONG(item);
- caml_item = Val_long(c_item);
- Store_field(caml_interm, j, caml_item);
- }
- }
- // do the same for b
- caml_c = interm_matrix_mul(caml_a, caml_b);
- // and then convert caml_c from value to a Ruby VALUE
- CAMLreturnT(VALUE, ruby_c);
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Did you spot the problems?</h1>
- </div>
- <div class='slide'>
- <h1>Hmmm...</h1>
- </div>
- <div class='slide'>
- <h1>Hmmm...</h1>
- <p>Complex data is the problem.</p>
- </div>
- <div class='slide'>
- <h1>Hmmm...</h1>
- <p>Complex data is the problem.</p>
- <p>What if the data was already in Ocaml types when matrix_mul is called?</p>
- </div>
- <div class='slide'>
- <h1>Don't wrap a function, wrap a class</h1>
- </div>
- <div class='slide'>
- <h1>Tango to Salsa</h1>
- <pre><code>let new_matrix width height =
- Array.make_matrix width height 0
- let matrix_assign m x y item =
- m.(x).(y) <- item
- let matrix_retrieve m x y =
- m.(x).(y)
- let _ = Callback.register "ocaml_new_matrix" new_matrix
- let _ = Callback.register "ocaml_matrix_assign" matrix_assign
- let _ = Callback.register "ocaml_matrix_retrieve" matrix_retrieve
- let _ = Callback.register "ocaml_matrix_mul" matrix_mul
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Tango to Salsa</h1>
- <pre><code>void Init_ocamlmatrix() {
- OcamlMatrix = rb_define_module("OcamlMatrix");
- rb_define_singleton_method(OcamlMatrix, "w_new_matrix", singleton_matrix_new, 2);
- rb_define_method(OcamlMatrix, "w_matrix_assign", method_assign, 3);
- rb_define_method(OcamlMatrix, "w_matrix_retrieve", method_retrieve, 2);
- rb_define_method(OcamlMatrix, "w_matrix_mul", method_matrix_mul, 1);
- initcaml();
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Tango to Salsa</h1>
- <pre><code>VALUE singleton_matrix_new(VALUE class, VALUE width, VALUE height) {
- CAMLparam0 ();
- CAMLlocal1 (ocaml_m);
- ocaml_m = ocaml_new_matrix(Val_long(NUM2LONG(width)),
- Val_long(NUM2LONG(height)));
- VALUE my_m = Data_Wrap_Struct(class, 0, 0, ocaml_m);
- rb_obj_call_init(my_m, 0, NULL);
- CAMLreturnT (VALUE, my_m);
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Tango to Salsa</h1>
- <pre><code>void method_assign(VALUE self, VALUE x, VALUE y, VALUE item) {
- CAMLparam0 ();
- CAMLlocal1 (ocaml_m);
- Data_Get_Struct(self, value, ocaml_m);
- ocaml_matrix_assign(ocaml_m, Val_long(NUM2LONG(x)),
- Val_long(NUM2LONG(y)),
- Val_long(NUM2LONG(item)));
- CAMLreturn0;
- }
- VALUE method_retrieve(VALUE self, VALUE x, VALUE y) {
- CAMLparam0 ();
- CAMLlocal1 (ocaml_m);
- Data_Get_Struct(self, value, ocaml_m);
- VALUE retval = LONG2NUM(Long_val(ocaml_matrix_retrieve(ocaml_m,
- Val_long(NUM2LONG(x)),
- Val_long(NUM2LONG(y)))
- ));
- CAMLreturnT (VALUE, retval);
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Tango to Salsa</h1>
- <pre><code>VALUE method_matrix_mul(VALUE self, VALUE b) {
- CAMLparam0 ();
- CAMLlocal3 (ocaml_m1, ocaml_m2, ocaml_m3);
- Data_Get_Struct(self, value, ocaml_m1);
- Data_Get_Struct(b, value, ocaml_m2);
- ocaml_m3 = ocaml_matrix_mul(ocaml_m1, ocaml_m2);
- VALUE my_m = Data_Wrap_Struct(class, 0, 0, ocaml_m);
- rb_obj_call_init(my_m, 0, NULL);
- CAMLreturnT (VALUE, my_m);
- }
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Add a Ruby slipper</h1>
- </div>
- <div class='slide'>
- <h1>No Silver Bullets</h1>
- <p>But we have a couple bronze ones.</p>
- </div>
- <div class='slide'>
- <h1>Perl</h1>
- <p>Inline:: libraries on CPAN.</p>
- </div>
- <div class='slide'>
- <h1>Perl</h1>
- <pre><code>use Inline "Befunge";
- hello();
- __END__
- __Befunge__
- ;:hello - print a msg;<q_,#! #:<"Hello world!"a
- ;
- :
- a q - ;tcartsbus:;
- d
- d
- ;
- + ;not a valid func def;
- q
- </code></pre>
- </div>
- <div class='slide'>
- <h1>Python</h1>
- <p>PyCaml, PyObjC.</p>
- </div>
- <div class='slide'>
- <h1>Python Calling Python</h1>
- <p>SciPy, NumPy.</p>
- <p>Cython.</p>
- </div>
- <div class='slide'>
- <h1>Ruby</h1>
- <p>Well after i decided on using Ruby and using Ocaml...</p>
- </div>
- <div class='slide'>
- <h1>Ruby</h1>
- <p>Well after i decided on using Ruby and using Ocaml...</p>
- <p>... i found <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/256569">Rocaml</a>.</p>
- </div>
- <div class='slide'>
- <h1>Ruby</h1>
- <p>Well after i decided on using Ruby and using Ocaml...</p>
- <p>... i found <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/256569">Rocaml</a>.</p>
- <p>Also available is the Ruby clone of Perl's Inline::, including InlineFortran.</p>
- </div>
- <div class='slide'>
- <h1>Phew</h1>
- </div>
- <div class='slide'>
- <h1>Phew</h1>
- <p>Any questions?</p>
- </div>
- <div class='slide'>
- <h1>Thanks</h1>
- <p>Thanks for coming. Thanks OSBridge for putting together an awesome conference.</p>
- </div>
- <div class='slide'>
- </div>
- </body></html>
|