← raphael mukondiwa
contributing to pjrmi — d. e. shaw's python-java rpc bridge
diff 2 files · +89 / −4 · 1 test added
java · python · open source · systems

// why i looked at it

PJRmi caught my eye because it sits at an intersection I find interesting: systems programming and quantitative finance. D. E. Shaw built it internally to let researchers call legacy Java pricing and risk libraries directly from Python; it has been running in production at one of the biggest quant firms in the world.

// what pjrmi actually does

The idea is simple. You have Java code you trust: years of battle-tested pricing logic, a risk engine, whatever. You also have Python researchers who need to interact with it. PJRmi lets them call Java methods directly from Python as if the code were local.

Under the hood it spawns a Java child process, connects via Unix pipes, and uses reflection to read the entire class hierarchy at runtime. So every time you call class_for_name('java.util.ArrayList') from Python, PJRmi sends a request to the Java process, gets back full metadata about the class, then dynamically constructs a Python class from scratch. Methods, fields, constructors, tab completion, docstrings, operator overloading: all of it is built at runtime from reflection data. The Java object lives in the JVM. The Python object is a shim that forwards every call across the process boundary.

It supports multiple transport modes: Unix FIFO pipes for local child processes, TCP sockets for remote Java servers, and an in-process JVM via a C++ extension for maximum performance. The protocol is the same regardless of transport; you just change which connect_to_* method you call.

// tracing the code

Before touching anything, I traced a single request end to end. The 7,000-line __init__.py is intimidating to read top to bottom, but tracing one specific path through it is far less of a slog. I followed connect_to_child_jvm all the way through to a usable Java object on the Python side:

connect_to_child_jvm()
  -> UnixFifoTransport    spawns Java subprocess, opens pipes
  -> PJRmi(transport)     wraps it in the high-level API
  -> c.connect()          handshakes with Java
  -> class_for_name()     requests class metadata from Java
  -> _request_class()     serializes the request, sends over the pipe
  -> _read_result()       blocks until Java responds
  -> _create_class()      builds the Python shim from metadata

That trace took a few hours. But it made everything else readable. Once you understand the request/response lifecycle, the rest of the codebase is variations on the same pattern.
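The request/response pattern at the bottom of that trace can be sketched with a socket pair standing in for the pipes. Everything here is an assumption for illustration: the length-prefixed JSON framing, the message fields, and the fake_java_side helper are made up, and PJRmi's actual wire format is not shown here and will differ. What the sketch keeps is the shape: serialize a request, send it, block until the other side answers.

```python
import json
import socket
import struct
import threading


def send_message(sock, payload):
    # Frame: 4-byte big-endian length, then a UTF-8 JSON body (assumed format).
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_exact(sock, n):
    # recv may return fewer bytes than asked for, so loop until we have n.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    # Blocks until one complete framed message has arrived.
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def fake_java_side(sock):
    # Hypothetical stand-in for the Java process: answer one class request.
    with sock:
        request = recv_message(sock)
        send_message(sock, {"name": request["class_name"],
                            "methods": ["add", "size"]})


client, server = socket.socketpair()
worker = threading.Thread(target=fake_java_side, args=(server,))
worker.start()

send_message(client, {"type": "request_class",
                      "class_name": "java.util.ArrayList"})
metadata = recv_message(client)  # blocks until the fake Java side responds

worker.join()
client.close()
```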

The _create_class function is the most interesting part. It takes a dictionary of Java class metadata and dynamically constructs a Python class using Python's metaclass system, the equivalent of calling type(name, bases, dict) at runtime. Every Java method becomes a real Python function. Every Java exception becomes a Python exception you can catch. The constructor is overridden to send a Java instantiation request rather than allocating Python memory. The illusion is complete enough that most of the time you forget there is a JVM on the other side.
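A minimal sketch of that technique, building a class from a metadata dictionary with type(name, bases, dict). The metadata layout, the create_class and fake_invoke names, and the "<init>" convention are all assumptions for this example and are not taken from PJRmi's source.

```python
def create_class(metadata, invoke):
    """Build a Python class from a (hypothetical) class-metadata dict."""

    def make_method(method_name):
        # A factory function so each method closes over its own name.
        def method(self, *args):
            return invoke(self._handle, method_name, args)

        method.__name__ = method_name
        method.__doc__ = f"Forwards to {metadata['name']}.{method_name}()."
        return method

    def init(self, *args):
        # Ask the other side to instantiate; keep only the returned handle.
        self._handle = invoke(None, "<init>", args)

    namespace = {"__init__": init, "__doc__": f"Shim for {metadata['name']}."}
    for method_name in metadata["methods"]:
        namespace[method_name] = make_method(method_name)

    simple_name = metadata["name"].rsplit(".", 1)[-1]
    return type(simple_name, (object,), namespace)


# Fake backend standing in for the JVM: handles map to plain Python lists.
objects = {}


def fake_invoke(handle, method_name, args):
    if method_name == "<init>":
        new_handle = len(objects)
        objects[new_handle] = []
        return new_handle
    if method_name == "add":
        objects[handle].append(args[0])
        return True
    if method_name == "size":
        return len(objects[handle])
    raise AttributeError(method_name)


ArrayList = create_class(
    {"name": "java.util.ArrayList", "methods": ["add", "size"]},
    fake_invoke,
)
items = ArrayList()
items.add("x")
```

Because the methods are ordinary functions placed in the class namespace, they show up in dir(), carry docstrings, and work with tab completion, which is the property the paragraph above describes.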

// what i found

While reading through the array handling layer, I noticed that slice indexing on Java array wrappers was unimplemented. If you tried to do array[1:4] from Python, it would crash the entire Java connection with an IllegalArgumentException. Not a clean error; the whole connection would die.

The interesting part was that the bug was not in the wire protocol. PythonSlice objects, the Java-side representation of Python's slice(start, stop, step), were being serialized correctly on the Python side and deserialized correctly on the Java side. The data crossed the wire intact. It just had nowhere to go once it arrived.

The gap was in WrappedArrayLike, the class PJRmi uses to expose Java arrays to Python. Its __getitem__ and __setitem__ handlers only had a branch for integer keys. Anything else fell through to an else clause that threw and terminated the connection.

I also found a pre-existing bug in the __setitem__ traversal loop. When walking a multi-dimensional array to reach the target dimension, the code was doing this:

javaarray = Array.get(value, index);  // wrong: value is what you are writing

When it should have been:

javaarray = Array.get(array, index);  // right: array is what you are traversing

In the common single-dimension case it does not matter. In a multi-dimensional case it reads from the wrong object entirely. Silent and harmless most of the time; semantically wrong always.
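Why the single-dimension case hides the bug is easier to see in a simplified model. The sketch below uses nested Python lists in place of Java arrays and assumes the traversal loop walks every key except the last; both function names are hypothetical and the real loop in PJRmi may be shaped differently.

```python
def set_item_buggy(array, keys, value):
    target = array
    for index in keys[:-1]:
        # Bug: indexes into the value being written, not the array.
        target = value[index]
    target[keys[-1]] = value


def set_item_fixed(array, keys, value):
    target = array
    for index in keys[:-1]:
        # Walks down the array that is actually being traversed.
        target = target[index]
    target[keys[-1]] = value


# One key: keys[:-1] is empty, the loop body never runs, the bug is invisible.
flat = [0, 0, 0]
set_item_buggy(flat, [1], 7)

# Two keys: only the fixed version reaches the inner row.
grid = [[0, 0], [0, 0]]
set_item_fixed(grid, [1, 0], 9)
```

In this model the buggy version fails outright on a two-key write of a scalar, because an integer cannot be indexed. The point carries over either way: the loop never touches the wrong object when there is only one key.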

// the fix

Hypercube.java, PJRmi's NumPy-equivalent ndarray system, already handled PythonSlice correctly. I used it as the reference implementation rather than inventing new semantics.

For __getitem__: when the key is a PythonSlice, resolve start and stop from the slice fields. Null means unbounded, matching Python's None semantics. Normalize negative indices the Python way: index < 0 becomes index + length. Clamp both ends to array bounds. Allocate a new array of the right component type via reflection. Copy elements. Reject non-unit steps with UnsupportedOperationException, the same behavior as Hypercube.
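The bounds logic described above can be sketched in Python, with a list standing in for the Java array. This is a model of the semantics, not the Java code from the PR: resolve_bounds and get_slice are hypothetical names, and NotImplementedError stands in for Java's UnsupportedOperationException.

```python
def resolve_bounds(start, stop, length):
    # None means unbounded, mirroring a null field on the Java side.
    if start is None:
        start = 0
    if stop is None:
        stop = length
    # Normalize negative indices the Python way.
    if start < 0:
        start += length
    if stop < 0:
        stop += length
    # Clamp both ends into [0, length].
    start = min(max(start, 0), length)
    stop = min(max(stop, 0), length)
    return start, stop


def get_slice(array, key):
    # Only unit steps are supported in this sketch.
    if key.step not in (None, 1):
        raise NotImplementedError("only unit steps are supported")
    start, stop = resolve_bounds(key.start, key.stop, len(array))
    count = max(stop - start, 0)  # an inverted range yields an empty result
    result = [None] * count       # stands in for allocating a new Java array
    for offset in range(count):
        result[offset] = array[start + offset]
    return result


data = [10, 20, 30, 40, 50]
middle = get_slice(data, slice(1, 4))
tail = get_slice(data, slice(-2, None))
```

For unit steps this resolution agrees with Python's own list slicing, which makes the built-in behavior a convenient oracle when testing.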

For __setitem__: same bounds resolution, but instead of returning a new array, write element-by-element into the target using Array.set. The slice branch lives in the final key block, after the traversal loop walks down to the target dimension. I also fixed the Array.get(value, ...) bug in the traversal loop while I was in there.
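A rough Python model of the write path, again with a list in place of the Java array. It leans on Python's built-in slice.indices, which performs the same None, negative-index, and clamping resolution. The set_slice name is hypothetical, and the length check is an assumption of this sketch: the post does not say how a size mismatch is handled.

```python
def set_slice(array, key, values):
    # slice.indices resolves None, negative indices, and clamping for us.
    start, stop, step = key.indices(len(array))
    if step != 1:
        raise NotImplementedError("only unit steps are supported")
    count = max(stop - start, 0)
    # Assumption: a fixed-size array cannot grow or shrink, so sizes must match.
    if len(values) != count:
        raise ValueError("value length does not match slice length")
    # Write in place, one element at a time (Array.set in the Java version).
    for offset, item in enumerate(values):
        array[start + offset] = item


data = [0, 1, 2, 3, 4]
set_slice(data, slice(1, 3), [10, 20])
set_slice(data, slice(-2, None), [7, 8])
```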

Rather than modifying the existing array test, I added a dedicated test_arraylike_slicing method covering basic slices, unbounded start and stop, negative indices, setitem, and step rejection. All 28 tests pass.

// what i took from it

Reading the call stack before writing anything was the most useful thing I did. The gap became obvious once I understood the lifecycle. If I had skipped the trace and gone straight to the code, I would have fixed the symptom without understanding why the fix was right.

Matching existing conventions is what makes a contribution mergeable. I did not invent new slice semantics; I found the pattern the codebase already used in Hypercube and extended it. Maintainers do not want new patterns. They want the existing ones applied correctly.

The hardest part of contributing to an unfamiliar codebase is not the fix itself. It is understanding the conventions well enough to know what a correct fix looks like. The PythonSlice implementation took about two hours once I had traced the codebase. Understanding the codebase well enough to trust the implementation took the rest of the time.