cudaMemcpy copy direction and UVA

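cudaMemcpy takes four arguments: the destination, the source, the byte count, and the copy direction. There are four copy directions, one for each combination of Host and Device as destination and source (cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, cudaMemcpyDeviceToDevice)... plus one more, called cudaMemcpyDefault.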

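cudaMemcpyDefault figures out whether the source and destination pointers you pass are Host or Device pointers and does the right thing for you. There is a catch, though: the fine print says that cudaMemcpyDefault requires "support for Unified Virtual Addressing".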

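UVA (Unified Virtual Addressing) is a slightly mysterious feature that places CPU and GPU memory in a single virtual address space. Apparently, unless it is enabled, there is no way to tell whether a given pointer is a Host pointer or a Device pointer.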

So, here is some code that checks whether UVA is enabled:

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#include <iostream>
using namespace std;

int main() {

  {
    int device;
    cudaGetDevice(&device);
    cudaDeviceProp property;
    // Get the properties of the current device.
    cudaGetDeviceProperties(&property, device);
    // UVA is enabled if unifiedAddressing != 0.
    if ( property.unifiedAddressing ) {
      cout << "UVA enabled, cudaMemcpyDefault can be used." << endl;
    } else {
      cout << "sorry, no-UVA" << endl;
    }
  }

  // Try out cudaMemcpyDefault.
  int* host_ptr = new int[1];
  int* device_ptr;
  cudaMalloc(&device_ptr, sizeof(int));

  // host->device
  *host_ptr = 12345;
  cudaMemcpy(device_ptr, host_ptr, sizeof(int), cudaMemcpyDefault);
  // device->host
  *host_ptr = 0;
  cudaMemcpy(host_ptr, device_ptr, sizeof(int), cudaMemcpyDefault);

  if ( *host_ptr == 12345 ) {
    cout << "ok." << endl;
  } else {
    cout << "oops!." << endl;
  }

  delete[] host_ptr;
  cudaFree(device_ptr);
  cudaDeviceReset();
}

Most recent GPUs support UVA, but it does not work in 32-bit builds. A 32-bit address space just isn't big enough to fit both the GPU and CPU memory spaces into one.