Can CppRef<T> be ergonomic?

Adrian Taylor
Dec 22, 2023

In a previous post, I said that we simply can’t use Rust references to point to C++ types. This might work at small scale, but for any sizable C++ project, humans can’t promise that there are no other C++ references to the same data — so you run into aliasing violations, unexpected mutations, and the dreaded Undefined Behavior.

So, instead of using &T we’ll create a CppRef<T> to represent a C++ reference (or pointer). There’s some early work here: see unsafe_references_wrapped and the linked type, plus this example. This ideally relies on a Rust feature called “arbitrary self types”, which I’m working on here in an RFC along with some very fine other people (thanks!).

So far so good. But, one of the open questions has been — can CppRef<T> be ergonomic? And one specific question has come up during the RFC — is it desirable to support generic receivers? For example, is this monstrosity a good or bad idea?

impl SomeType {
    fn some_method(self: impl SomeTrait<Target = Self>) { ... }
}

It turns out that these questions are related.

First, let’s talk about CppPin<T>. If you have some data which may have C++ pointers or references to it, it’s simply not OK to have a Rust reference.

There are some circumstances where the object will be stored over in C++ and all you would ever have in Rust is CppRefs to it:

  • It’s stored in something like a cxx::UniquePtr
  • A C++ method has returned you a reference to something stored over in C++ land entirely.

But, you might sometimes want to own objects in Rust and yet make them available to C++. These might be C++ types or they might be Rust types. In such a case, you need a way to ensure there are only C++ references but no Rust references. That’s what CppPin<T> is for.

CppPin::new(something) consumes the something, thus proving there are no existing Rust references. It can create new C++ references via CppPin::as_cpp_ref() -> CppRef<T>, but there’s no way to get a &T or &mut T. You can safely do weird things to this type in C++, including storing references or pointers to it which you later manipulate, and there are guaranteed to be no Rust references for you to discombobulate.

CppPin might seem a strange name in that it’s not exactly about preventing things moving — but it shares lots of the same properties as the regular Pin including an inability to vend references, complexity about “pin projections”, and a general level of annoyingness. CppJail or CppBubble might be better names — opinions welcomed.

Overall, though, I think CppPin<T> is necessary and fairly straightforward.
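
To make the shape of this concrete, here is a minimal sketch in stable Rust. It is not the real autocxx implementation: the field layout, the as_ptr accessor and the pin_demo helper are all assumptions made up for illustration. The point is only that new() takes the value by move and that nothing in the API ever yields a &T or &mut T.

```rust
use std::cell::UnsafeCell;

/// Hypothetical stand-in for a C++ reference: an opaque, copyable pointer.
/// It deliberately has no Deref impl, so it never yields a `&T`.
pub struct CppRef<T> {
    ptr: *const T,
}

impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

impl<T> CppRef<T> {
    /// Assumed accessor: the raw pointer is all you can extract,
    /// and dereferencing it requires `unsafe`.
    pub fn as_ptr(&self) -> *const T {
        self.ptr
    }
}

/// Sketch of CppPin: owns a T on the heap but never hands out `&T` or `&mut T`.
pub struct CppPin<T> {
    // UnsafeCell signals that the contents may change behind Rust's back.
    inner: Box<UnsafeCell<T>>,
}

impl<T> CppPin<T> {
    /// Taking `value` by move proves that no Rust references to it exist.
    pub fn new(value: T) -> Self {
        CppPin { inner: Box::new(UnsafeCell::new(value)) }
    }

    /// The only way to reach the contents: a C++-style reference.
    pub fn as_cpp_ref(&self) -> CppRef<T> {
        CppRef { ptr: self.inner.get() as *const T }
    }
}

/// Hypothetical usage: two C++ references to the same pinned value coexist.
pub fn pin_demo() -> bool {
    let pinned = CppPin::new(42u32);
    let a = pinned.as_cpp_ref();
    let b = pinned.as_cpp_ref();
    !a.as_ptr().is_null() && a.as_ptr() == b.as_ptr()
}
```

Note that this sketch gives CppRef no lifetime, so nothing stops it outliving the CppPin, much as a C++ reference can dangle. A real design would have to decide how much of that to police.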

What about field access? We can’t have &T so we can’t have some_reference.some_field . So, all field access needs to be either via function calls over into C++, or via macros based around addr_of and read (which would be in a function call itself).

So,

// entirely auto-generated code from bindings generator
struct SomeCppType {
    // my_field: usize, // not actually represented
}

impl SomeCppType {
    fn get_my_field(self: CppRef<Self>) -> usize { ... }
    fn set_my_field(self: CppRef<Self>, val: usize) { ... }
    fn get_my_field_ref(self: CppRef<Self>) -> CppRef<usize> { ... }
}

I wanted to find out if this could be made slightly more ergonomic using a macro like field!(some_value, field_name). This may depend upon stabilization of concat_idents!. I couldn’t get it to work, but ultimately I don’t think it’s a huge deal to need to call methods to get and set field values.
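
For the addr_of-and-read flavour of those accessors, a rough sketch might look like the following. Assumptions: the field is declared in a #[repr(C)] struct purely so the example has something to read (the generated bindings above don't represent it), the accessors are free functions over raw pointers rather than CppRef<Self> methods, and field_demo is a made-up helper.

```rust
use std::ptr::{addr_of, addr_of_mut};

/// Illustrative layout only; real bindings would match the C++ definition.
#[repr(C)]
pub struct SomeCppType {
    pub my_field: usize,
}

/// Reads `my_field` without ever forming a `&SomeCppType`.
///
/// # Safety
/// `this` must be non-null, aligned, and point to a live SomeCppType.
pub unsafe fn get_my_field(this: *const SomeCppType) -> usize {
    // addr_of! computes the field's address without creating a reference.
    unsafe { addr_of!((*this).my_field).read() }
}

/// Writes `my_field` without ever forming a `&mut SomeCppType`.
///
/// # Safety
/// Same requirements as `get_my_field`.
pub unsafe fn set_my_field(this: *mut SomeCppType, val: usize) {
    unsafe { addr_of_mut!((*this).my_field).write(val) }
}

/// Hypothetical usage: bump the field through a raw pointer.
pub fn field_demo() -> usize {
    let mut value = SomeCppType { my_field: 7 };
    let ptr: *mut SomeCppType = &mut value;
    // SAFETY: `ptr` points at `value`, which is alive for this whole block.
    unsafe {
        set_my_field(ptr, get_my_field(ptr) + 1);
        get_my_field(ptr)
    }
}
```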

What about method calls?

The awesome thing about CppRef<T> is that it’s pretty much an opaque token. You’ll most commonly get a CppRef<T> from C++, and pass it back to C++, without any need to manipulate or touch the CppRef<T> at all. Most commonly, you’ll pass it back to C++ using it as the this pointer in a method call:

fn main() {
    let vulture: CppRef<Vulture> = get_cpp_reference_to_vulture_from_cpp();
    vulture.squawk(); // autogenerated method
}

This is what the “arbitrary self types” feature allows.

However, it would also be nice to call squawk() on a CppPin<Vulture>:

fn main() {
    let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
    vulture.squawk(); // autogenerated method
}

This is where the question of generic self types first comes in.

Which of these is better for our (auto-generated) squawk method signature?

impl Vulture {
    // This code would be auto generated; only one of these two would exist
    fn squawk(self: CppRef<Self>) {} // 1
    fn squawk(self: impl AsCppRef<Target = Self>) {} // 2
}

The second option seems appealing because we could implement AsCppRef even on CppPin. This works, but it turns out not to be especially ergonomic, because it consumes the CppPin each time. That is, you couldn’t do:

fn main() {
    let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
    vulture.squawk(); // moves vulture into squawk
    vulture.squawk(); // error: use of moved value
}

You would instead have to do:

fn main() {
    let vulture: CppPin<Vulture> = obtain_vulture_by_value_from_cpp();
    vulture.as_cpp_ref().squawk(); // passes a CppRef, so vulture stays usable
    vulture.squawk(); // the final call may consume vulture
}

which is very similar to the annoying Pin::as_mut method.

Overall, it seems better to pick option 1, and force people to call as_cpp_ref() each time they want to call a method on the contents of the CppPin. This doesn’t yet seem like a sufficiently good motivation for generic self types.
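
The move problem can be reproduced on stable Rust if we let a free function stand in for the generic receiver. Everything here is a sketch under assumptions: the AsCppRef trait shape, the volume field and the squawk_twice helper are invented for illustration, and squawk reads through the pointer on the assumption that the referent is still alive.

```rust
use std::cell::UnsafeCell;
use std::ptr::addr_of;

pub struct Vulture {
    pub volume: u32,
}

/// Hypothetical C++-style reference (opaque, copyable pointer).
pub struct CppRef<T> {
    ptr: *const T,
}
impl<T> Clone for CppRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for CppRef<T> {}

/// Hypothetical CppPin, simplified.
pub struct CppPin<T> {
    inner: Box<UnsafeCell<T>>,
}
impl<T> CppPin<T> {
    pub fn new(value: T) -> Self {
        CppPin { inner: Box::new(UnsafeCell::new(value)) }
    }
}

/// Assumed trait shape: anything that can produce a CppRef.
pub trait AsCppRef {
    type Target;
    fn as_cpp_ref(&self) -> CppRef<Self::Target>;
}
impl<T> AsCppRef for CppRef<T> {
    type Target = T;
    fn as_cpp_ref(&self) -> CppRef<T> {
        *self
    }
}
impl<T> AsCppRef for CppPin<T> {
    type Target = T;
    fn as_cpp_ref(&self) -> CppRef<T> {
        CppRef { ptr: self.inner.get() as *const T }
    }
}

/// Free-function stand-in for option 2: `this` is taken by value,
/// so passing a CppPin moves (and then drops) it.
pub fn squawk(this: impl AsCppRef<Target = Vulture>) -> u32 {
    let r = this.as_cpp_ref();
    // Sketch only: assumes the referent is still alive, as C++ would.
    unsafe { addr_of!((*r.ptr).volume).read() }
}

pub fn squawk_twice() -> u32 {
    let vulture = CppPin::new(Vulture { volume: 3 });
    // squawk(vulture); squawk(vulture); // second call: use of moved value
    squawk(vulture.as_cpp_ref()) + squawk(vulture.as_cpp_ref())
}
```

Since the caller ends up writing as_cpp_ref() anyway, the generic signature buys nothing over plain CppRef<Self> in this case.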

Finally, what about code that wants to be generic over the type of reference? That is, code which can handle a &Vulture or a CppRef<Vulture>? Is that even achievable?

Yes!

impl Vulture {
    /// This method can accept either &Self or CppRef<Self>
    /// because both of them impl a Ref trait
    fn squawk(self: impl Ref<Target = Self>) -> u32 {
        // What to do here?
    }

    fn squawk_only_in_rust(&self) {}
    fn squawk_only_in_cpp(self: CppRef<Self>) {}
}

One oddity is that the autoref step of Rust method calls doesn’t apply to this signature, so if we want to call this method with a &Vulture we need to say (&my_vulture_by_value).squawk(). Here’s how we’d call this:

let rust_accessible_vulture = Vulture(1);
let cpp_accessible_vulture = CppPin::new(Vulture(2));
(&rust_accessible_vulture).squawk();
cpp_accessible_vulture.as_cpp_ref().squawk();

But more importantly, what could squawk actually do here? The impl Ref is pretty useless — it’s no longer even a useful opaque token to pass back into C++. However, since both references can emit raw pointers, we can do field access. Even though CppRef<T> promises nothing about aliasing or mutability, it can still uphold C++ reference-like promises around alignment and not being null. So the squawk function here could, with suitable use of macros and autogenerated code, access fields within the Vulture and do useful work.

This is a good use for generic receivers.
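
Here is roughly what that could look like, again on stable Rust with a free function standing in for the generic receiver. The Ref trait shape, its as_raw_ptr method and the generic_demo helper are assumptions for illustration, and a Box turned into a raw pointer stands in for an object living in C++.

```rust
use std::ptr::addr_of;

pub struct Vulture(pub u32);

/// Hypothetical C++-style reference: promises alignment and non-nullness,
/// but nothing about aliasing or mutability.
pub struct CppRef<T> {
    ptr: *const T,
}

/// Assumed trait shape: any reference-like type that can emit a raw pointer.
pub trait Ref {
    type Target;
    fn as_raw_ptr(&self) -> *const Self::Target;
}

impl<'a, T> Ref for &'a T {
    type Target = T;
    fn as_raw_ptr(&self) -> *const T {
        *self as *const T
    }
}

impl<T> Ref for CppRef<T> {
    type Target = T;
    fn as_raw_ptr(&self) -> *const T {
        self.ptr
    }
}

/// Works for both `&Vulture` and `CppRef<Vulture>` by going through the
/// raw pointer and reading the field without forming a Rust reference.
pub fn squawk(this: impl Ref<Target = Vulture>) -> u32 {
    let ptr = this.as_raw_ptr();
    // Sketch only: assumes the referent is alive, aligned and non-null.
    unsafe { addr_of!((*ptr).0).read() }
}

pub fn generic_demo() -> (u32, u32) {
    let rust_accessible_vulture = Vulture(1);
    // Stand-in for an object owned on the C++ side.
    let raw = Box::into_raw(Box::new(Vulture(2)));
    let cpp_ref = CppRef { ptr: raw as *const Vulture };
    let result = (squawk(&rust_accessible_vulture), squawk(cpp_ref));
    // SAFETY: `raw` came from Box::into_raw above and is freed exactly once.
    unsafe { drop(Box::from_raw(raw)) };
    result
}
```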

So. Conclusions are:

  • CppPin<T> is necessary and I don’t think it sucks, though it is annoyingly like Pin in some ways.
  • Field access through CppRef<T> and/or impl Ref<Target = T> is ugly and will probably need method calls or macros, but this is OK since usually a CppRef<T> is just an opaque token which will be passed back to C++, and field access will be rare.
  • We probably do want to support generic self types, since sometimes people will want to write code that’s generic over &T or CppRef<T>.


Ade works on Chrome at Google, and likes mountain biking, climbing, snowboarding, and usually his kids. All opinions are my own.