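[C++] std::optional and RVO: Exploring the Most Efficient Way to Use std::optional

Posted by Maidzuki on 2023-09-04 21:51 in #Backend Development

Return Value Optimization (RVO)

cppreference describes the named variant of RVO as follows: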

In a return statement, when the operand is the name of a non-volatile object with automatic storage duration, which isn't a function parameter or a catch clause parameter, and which is of the same class type (ignoring cv-qualification) as the function return type. This variant of copy elision is known as NRVO, "named return value optimization."

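In other words, when a function returns a local variable (not a function parameter and not a catch-clause parameter) whose type is the same as the function's return type, the compiler is allowed to construct that object directly in the storage of the return value, even if the copy/move constructor has side effects.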
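To make the rule concrete, here is a minimal sketch that counts constructor calls at run time. The Tracker type, the global counters and the helper moves_caused_by are hypothetical names introduced only for this illustration; they are not part of the experiment in this article.

```cpp
#include <utility>

// Hypothetical counters and type, introduced only for this illustration.
int copies = 0;
int moves = 0;

struct Tracker {
    int value;
    explicit Tracker(int v) noexcept : value(v) {}
    Tracker(const Tracker& other) noexcept : value(other.value) { ++copies; }
    Tracker(Tracker&& other) noexcept : value(other.value) { ++moves; }
};

// The operand is the name of a local object of the return type,
// so NRVO is permitted (but not required).
Tracker named_return(int v) {
    Tracker t(v);
    return t;
}

// The operand is a call expression, not a name, so NRVO is not permitted
// and the return value has to be move-constructed.
Tracker moved_return(int v) {
    Tracker t(v);
    return std::move(t);
}

// Runs one factory and reports how many move constructions it caused.
template <class F>
int moves_caused_by(F make) {
    moves = 0;
    Tracker result = make(1);
    (void)result;
    return moves;
}
```

Calling moves_caused_by(moved_return) should report one move under C++17. For moves_caused_by(named_return) the count is zero whenever the compiler applies NRVO, and at most one otherwise, because the local variable is then moved implicitly rather than copied.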

std::optional

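std::optional was introduced in C++17. It is commonly used as the return type of functions whose result may fail to be constructed.

The std::optional example on cppreference is as follows: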

#include <iostream>
#include <optional>
#include <string>
 
// optional can be used as the return type of a factory that may fail
std::optional<std::string> create(bool b)
{
    if (b)
        return "Godzilla";
    return {};
}
 
// std::nullopt can be used to create any (empty) std::optional
auto create2(bool b)
{
    return b ? std::optional<std::string>{"Godzilla"} : std::nullopt;
}
 
int main()
{
    std::cout << "create(false) returned "
              << create(false).value_or("empty") << '\n';
 
    // optional-returning factory functions are usable as conditions of while and if
    if (auto str = create2(true))
        std::cout << "create2(true) returned " << *str << '\n';
}

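Awkwardly, this example does not cover the case of constructing a named local variable (an lvalue) inside the function and then returning it, so many different ways of writing "return optional" have appeared online. This article looks at which of them is actually the most efficient.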

Experiment

Setup

Compiler: x86-64 gcc 13.2

Compiler flags: -O1 -std=c++17

Run on Compiler Explorer

Preparation

Suppose our original function has the following form:

A always_success_0(int n) {
    A temp(someFn(n));
    return temp;
}

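If we only want a thin wrapper for a function that may fail, a natural idea is to change just the return type to std::optional and leave the function body unchanged: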

optional<A> introduce_option_0(int n) {
    A temp(someFn(n));
    return temp;
}

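This clearly breaks the conditions for NRVO, because temp is an A while the return type is optional<A>. But how much does it actually cost, and can the loss be recovered?

I collected the forms commonly seen online, which gives us the following variants: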
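Before looking at assembly, the cost of the type mismatch can also be checked at run time. The following is a minimal sketch under C++17; Tracker, Counts, counts and measure are hypothetical names used only for this illustration and are not the struct A defined later in the article.

```cpp
#include <optional>

// Hypothetical counters and type, introduced only for this illustration.
struct Counts {
    int copies = 0;
    int moves = 0;
};
Counts counts;

struct Tracker {
    int value;
    explicit Tracker(int v) noexcept : value(v) {}
    Tracker(const Tracker& other) noexcept : value(other.value) { ++counts.copies; }
    Tracker(Tracker&& other) noexcept : value(other.value) { ++counts.moves; }
};

// The local is a Tracker, the return type is optional<Tracker>:
// NRVO cannot apply, but the local is implicitly treated as an rvalue.
std::optional<Tracker> plain_return(int v) {
    Tracker t(v);
    return t;
}

// Inside the braces t is an ordinary lvalue, so it is copied.
std::optional<Tracker> braced_return(int v) {
    Tracker t(v);
    return {t};
}

// Resets the counters, runs one factory, and returns what it cost.
template <class F>
Counts measure(F make) {
    counts = Counts{};
    auto result = make(1);
    (void)result;
    return counts;
}
```

With this sketch, measure(plain_return) should report one move and no copy, while measure(braced_return) should report one copy and no move. In both cases the Tracker has to be transferred into the optional, which is exactly the work NRVO would have avoided.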

optional<A> introduce_option_0(int n) {
    A temp(someFn(n));
    return temp;
}

optional<A> introduce_option_1(int n) {
    A temp(someFn(n));
    return std::move(temp);
}

optional<A> introduce_option_2(int n) {
    A temp(someFn(n));
    return {temp};
}

optional<A> introduce_option_3(int n) {
    A temp(someFn(n));
    return {std::move(temp)};
}

To explore the conditions for NRVO and how much it saves, we also apply the same four variants to the original function:

A always_success_0(int n) {
    A temp(someFn(n));
    return temp;
}

A always_success_1(int n) {
    A temp(someFn(n));
    return std::move(temp);
}

A always_success_2(int n) {
    A temp(someFn(n));
    return {temp};
}

A always_success_3(int n) {
    A temp(someFn(n));
    return {std::move(temp)};
}

We also define struct A:

struct A{
    int ctx;
    A(int x) noexcept {
        ctx=x+1;
        printf("default construct");
    }
    A(const A&) noexcept {
        printf("copy construct");
    }
    A(A&& ano) noexcept {
        printf("move construct");
    }
    ~A() noexcept {
        printf("destruct");
    }
};

Tips:

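Marking these member functions noexcept allows the compiler to optimize further; without it, the generated assembly gains an extra block of exception-handling code, as shown in the figure below.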

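Also, to make the calls easy to locate in the assembly and to keep the compiler from optimizing them away, we write someFn as a function with side effects: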

int someFn(int n) {
    int x;
    scanf("%d",&x);
    return x+n;
}
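Regarding the noexcept tip above: whether the guarantee actually reaches std::optional can be checked at compile time with type traits. This is a minimal sketch assuming C++17; NothrowMove, MayThrowMove and optional_preserves_noexcept are hypothetical names for this illustration.

```cpp
#include <optional>
#include <type_traits>

// Hypothetical types, introduced only for this illustration.
struct NothrowMove {
    NothrowMove() = default;
    NothrowMove(NothrowMove&&) noexcept {}
};

struct MayThrowMove {
    MayThrowMove() = default;
    MayThrowMove(MayThrowMove&&) {}  // not marked noexcept
};

// The move constructor of std::optional<T> is noexcept exactly when
// T's move constructor is, so the property carries over to the wrapper.
static_assert(std::is_nothrow_move_constructible_v<NothrowMove>);
static_assert(std::is_nothrow_move_constructible_v<std::optional<NothrowMove>>);
static_assert(!std::is_nothrow_move_constructible_v<MayThrowMove>);
static_assert(!std::is_nothrow_move_constructible_v<std::optional<MayThrowMove>>);

// The same facts as a value, for use in ordinary code.
constexpr bool optional_preserves_noexcept() {
    return std::is_nothrow_move_constructible_v<std::optional<NothrowMove>> &&
           !std::is_nothrow_move_constructible_v<std::optional<MayThrowMove>>;
}
```

If the file compiles, the static_assert lines have already confirmed the claim. This only checks the exception specification; how much exception-handling code the compiler emits still has to be read off the assembly.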

We now have all the code needed for compilation:

#include <cstdio>
#include <optional>
#include <utility>  // std::move
using std::optional;

int someFn(int n) {
    int x;
    scanf("%d",&x);
    return x+n;
}

struct A{
    int ctx;
    A(int x) noexcept {
        ctx=x+1;
        printf("default construct");
    }
    A(const A&) noexcept {
        printf("copy construct");
    }
    A(A&& ano) noexcept {
        printf("move construct");
    }
    ~A() noexcept {
        printf("destruct");
    }
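    A& operator=(const A&) {
        printf("copy op");
        return *this;  // a non-void function must return a value
    }
    A& operator=(A&&) {
        printf("move op");
        return *this;  // a non-void function must return a value
    }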
};

A always_success_0(int n) {
    A temp(someFn(n));
    return temp;
}

A always_success_1(int n) {
    A temp(someFn(n));
    return std::move(temp);
}

A always_success_2(int n) {
    A temp(someFn(n));
    return {temp};
}

A always_success_3(int n) {
    A temp(someFn(n));
    return {std::move(temp)};
}

optional<A> introduce_option_0(int n) {
    A temp(someFn(n));
    return temp;
}

optional<A> introduce_option_1(int n) {
    A temp(someFn(n));
    return std::move(temp);
}

optional<A> introduce_option_2(int n) {
    A temp(someFn(n));
    return {temp};
}

optional<A> introduce_option_3(int n) {
    A temp(someFn(n));
    return {std::move(temp)};
}

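Compilation

We can see that always_success_0 undergoes RVO (here, NRVO) and calls only one constructor. always_success_1 does not: it additionally calls the move constructor and the destructor, which is one consequence of overusing std::move.

Now look at introduce_option_0. Compared with the assembly of the always_success variants that perform a move, it has only one extra instruction, which sets the boolean flag std::optional::_Has_value.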

Function             A(int) ctor   Copy ctor   Move ctor   Destructor   Set bool
always_success_0     1             -           -           -            -
always_success_1     1             -           1           1            -
always_success_2     1             1           -           1            -
always_success_3     1             -           1           1            -
introduce_option_0   1             -           1           1            1
introduce_option_1   1             -           1           1            1
introduce_option_2   1             1           -           1            1
introduce_option_3   1             -           1           1            1
*modify_reference    *2            -           *1          1            -

* modify_reference stands for functions of the following form, as found in the UE library:
bool modify_reference(int n, A& out) {
    out = someFn(n);
    return true;
}
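* The count of 2 includes the construction of the receiving object before the function is called.

* Inside the function, move assignment (operator=) is called rather than the move constructor.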
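The counts in the modify_reference row can be reproduced with a small run-time check. This is a minimal sketch; Tracker, Counts, fill_out_param and measure_out_param are hypothetical names for this illustration, and the constructor is deliberately non-explicit so that the assignment from an int compiles, as it does for struct A.

```cpp
// Hypothetical counters and type, introduced only for this illustration.
struct Counts {
    int value_ctors = 0;   // constructions from int
    int move_assigns = 0;  // calls to the move assignment operator
};
Counts counts;

struct Tracker {
    int value;
    Tracker(int v) noexcept : value(v) { ++counts.value_ctors; }
    Tracker& operator=(Tracker&& other) noexcept {
        value = other.value;
        ++counts.move_assigns;
        return *this;
    }
};

// Out-parameter style: the caller supplies an already constructed object.
bool fill_out_param(int v, Tracker& out) {
    out = v;  // builds a temporary Tracker from v, then move-assigns it
    return true;
}

// Counts the work done by one call, including the caller-side receiver.
Counts measure_out_param() {
    counts = Counts{};
    Tracker receiver(0);  // must exist before the call
    bool ok = fill_out_param(41, receiver);
    (void)ok;
    return counts;
}
```

measure_out_param() should report two constructions from int (the receiver and the temporary) and one move assignment, which matches the *2 and *1 entries in the table.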

Best result

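As we can observe, the assembly is much leaner when RVO is triggered, so we should do what we can to trigger it. Both of the following improved versions trigger RVO: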

A not_always_success_best(int n, bool &b) {
    A temp(someFn(n));
    b = true;
    return temp;
}

optional<A> optional_best(int n) {
    optional<A> temp(someFn(n));
    return temp;
}

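The assembly of the two function bodies is identical; the only difference lies in the stack operations used for parameter passing.

Summary

The most efficient way to write a function that returns std::optional is the one that triggers RVO, namely: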

optional<A> optional_best(int n) {
    optional<A> temp(someFn(n));
    return temp;
}
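The function above always succeeds. The same idea can be extended to a function that may really fail by declaring the optional first and filling it only on success. This is a minimal sketch assuming C++17; Tracker, Counts, try_make, measure and the "negative input fails" rule are hypothetical and used only for illustration.

```cpp
#include <optional>

// Hypothetical counters and type, introduced only for this illustration.
struct Counts {
    int copies = 0;
    int moves = 0;
};
Counts counts;

struct Tracker {
    int value;
    explicit Tracker(int v) noexcept : value(v) {}
    Tracker(const Tracker& other) noexcept : value(other.value) { ++counts.copies; }
    Tracker(Tracker&& other) noexcept : value(other.value) { ++counts.moves; }
};

// Hypothetical factory that fails for negative input.
std::optional<Tracker> try_make(int v) {
    std::optional<Tracker> result;  // empty, and of the return type, so NRVO may apply
    if (v >= 0)
        result.emplace(v);          // constructs the Tracker in place inside the optional
    return result;
}

// Resets the counters, runs the factory once, and returns what it cost.
Counts measure(int v) {
    counts = Counts{};
    auto result = try_make(v);
    (void)result;
    return counts;
}
```

Because the returned variable is already an optional<Tracker>, the compiler is allowed to build it directly in the caller's storage. When it does, measure(3) reports no copies and no moves; since NRVO is permitted rather than required, the portable guarantee is only "no copy and at most one move".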