# Speed up that library when you can't C a thing John Melesky (Open Source Bridge, June 2009) --- # Caveats - We're talking about Unix and Linux - Most should be applicable on OSX once you factor in the dynlib stuff - I have no idea on the Windows front; sorry --- # Caveats - We're talking about Unix and Linux - Most should be applicable on OSX once you factor in the dynlib stuff - I have no idea on the Windows front; sorry - The example is in Ruby and Ocaml - The FFIs for Perl and Python are similar - Other called languages are possible - I'll try to be explicit what's general and what's example-specific --- # Ever in this situation? --- # The culprit def matrix_mul(a,b) rows = (0 .. a.size - 1).collect { |i| (0 .. b[0].size - 1).collect { |j| vij = 0 0.upto(a[i].size - 1) do |k| vij += a[i][k] * b[k][j] end vij } } return rows end (adapted from matrix.rb) --- # The solution
That's what C is for, right?
--- # That's not what C is for! C is a systems language, intended for use with 30-year-old kernels and drivers, on processors with fewer than 10k transistors (we're currently in the billion-transistor range, thankyouverymuch Moore's Law). --- # That's not what C is for! C is a systems language, intended for use with 30-year-old kernels and drivers, on processors with fewer than 10k transistors (we're currently in the billion-transistor range, thankyouverymuch Moore's Law). In other words, it's fast only as a side effect. --- # Other languages are better For raw numerics, Fortran still has C beat. APL is pretty dang fast, too. For complex datatypes, languages like Haskell or ML will all give you better type support, safer usage, and possibly faster code. Really, almost any language offers benefits over C, whether it's faster, safer, more concise, more advanced, or all of the above. --- # What are the options? The only constraint is that your target language needs an FFI (Foreign Function Interface) that allows C to call it. Most awesome languages allow this. I'm going to use Ocaml. The other material requirements are your language header files (`ruby.h`, `caml/callback.h`, etc.), and perserverence. --- # What are the options? The only constraint is that your target language needs an FFI (Foreign Function Interface) that allows C to call it. Most awesome languages allow this. I'm going to use Ocaml. The other material requirements are your language header files (`ruby.h`, `caml/callback.h`, etc.), and perserverence. (You may look warily upon the whole "allows C to call it" thing. This is understandable. Things will get worse before they get better.) --- # What's Ocaml? In brief, it's a statically- and strongly- typed, non-pure, strict, functional programming language. It's from the ML family (which includes SML, F#, and some others). Ocaml's pretty cool at the moment, because it's one of the fastest languages around, routinely rivalling C++ in the Shootout. --- # Matrix Multiplication in Ocaml, round 1 let matrix_mul a b = List.map (fun row -> mapn (fun column -> List.fold_left (+) 0 (List.map2 ( * ) row column)) b) a (cribbed from RosettaCode) --- # Benefits let matrix_mul a b = List.map (fun row -> mapn (fun column -> List.fold_left (+) 0 (List.map2 ( * ) row column)) b) a We've acheived a 10x speedup, and our code is not actually much more complicated than the Ruby. This code uses Lists, which are pretty natural to use in functional languages. Arrays would probably be faster, though. --- # Matrix Multiplication in Ocaml, round 2 let matrix_mul x y = let x0 = Array.length x and y0 = Array.length y in let y1 = if y0 = 0 then 0 else Array.length y.(0) in let z = Array.make_matrix x0 y1 0 in for i = 0 to x0-1 do for j = 0 to y1-1 do for k = 0 to y0-1 do z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j) done done done; z (also cribbed from RosettaCode) --- # Benefits let matrix_mul x y = let x0 = Array.length x and y0 = Array.length y in let y1 = if y0 = 0 then 0 else Array.length y.(0) in let z = Array.make_matrix x0 y1 0 in for i = 0 to x0-1 do for j = 0 to y1-1 do for k = 0 to y0-1 do z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j) done done done; z Ocaml Arrays are pretty darn fast, and have a great initialization function tailored to this sort of problem (make_matrix). We've now managed a speedup around 30-35x. And, again, the code is not that much more complex than the original Ruby. --- # Gluing Pieces Together Of course, the Ocaml code is operating on Ocaml types, and Ruby wants to pass it Ruby types. What to do, what to do? --- # It's gonna get worse before it gets better... Ruby can call C code... --- # It's gonna get worse before it gets better... Ruby can call C code... Ocaml can be called by C code... --- # It's gonna get worse before it gets better... Ruby can call C code... Ocaml can be called by C code... They both have facilities to translate their types into C types (and Ocaml vice-versa)... --- # FFI Aside Many languages provide C type definitions and functions for extensions. There are two problems with this. 1. Many of these FFIs aren't merely type declarations and functions, but also macros. That means they only work in C. 2. Even if they were macro-free, few languages have all-purpose transparent C access. Thus, wrapper code is always needed. --- # Dancing the Type Two-Step Let's work our way out, from Ocaml to Ruby, via C. First, we need to tell Ocaml to export a function to C. let matrix_mul x y = let x0 = Array.length x and y0 = Array.length y in let y1 = if y0 = 0 then 0 else Array.length y.(0) in let z = Array.make_matrix x0 y1 0 in for i = 0 to x0-1 do for j = 0 to y1-1 do for k = 0 to y0-1 do z.(i).(j) <- z.(i).(j) + x.(i).(k) * y.(k).(j) done done done; z let _ = Callback.register "ocaml_matrix_mul" matrix_mul;; --- # Dancing the Type Two-Step Next, we need some C code to call the Ocaml. #include "caml/callback.h" void initcaml(void) { char *dummy_argv[] = {0}; caml_startup(dummy_argv); } First we initialize the Ocaml runtime. --- # Dancing the Type Two-Step int interm_matrix_mul(*a, *b) { static value *f = NULL; f = caml_named_value("ocaml_matrix_mul"); if (f == NULL) return -1; value r_caml; r_caml = callback2(*f, a, b); This is how we grab and execute the exported code. Note that we haven't figured the types for `a` or `b`, nor have we converted r_caml to something C-like. Who's up for C arrays of C arrays of C `int`s? (Or should they be `long`s?) --- # Dancing the Type Two-Step From the Ruby side, we'd do something like this. #include "ruby.h" VALUE OcamlMatrix = Qnil; // set up the Ruby namespace // prototype our init and method void Init_ocamlmatrix(); VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b); void Init_ocamlmatrix() { OcamlMatrix = rb_define_module("OcamlMatrix"); rb_define_method(OcamlMatrix, "matrix_mul", method_matrix_mul, 0); } --- # Dancing the Type Two-Step VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b) { // here we would convert a and b to C arrays c = interm_matrix_mul(conv_a, conv_b); // and then convert c from C arrays to a Ruby VALUE return conv_c; } --- # You probably saw this coming You can convert directly from the Ruby types to Ocaml types without C intermediate types. This is nice, because it avoids any need for structs or pointer arithmetic. --- # You probably saw this coming You can convert directly from the Ruby types to Ocaml types without C intermediate types. This is nice, because it avoids any need for structs or pointer arithmetic. But... this only works if you can import both languages' headers without name conflicts (thankfully Ruby's VALUE and Ocaml's value are spelled differently). --- # From Two-Step to Tango VALUE method_matrix_mul(VALUE self, VALUE a, VALUE b) { CAMLparam0 (); CAMLlocal2 (caml_a, caml_b); CAMLlocal1 (caml_interm); CAMLlocal1 (caml_item); int x = RARRAY(a)->len; for (int i = 0; ilen; caml_interm = caml_alloc(y, 0); for (int j = 0; j