科研实习Week 1 | 星渊小站

以下是科研实习第一周的进度

整体思路

识别系统信息

不同的系统给我们提供了一系列API供我们调用来检测系统，在CPP中有以下宏：

_WIN32 //判断是否是windows
__linux__ //判断是否是linux
__APPLE__ //判断是否是MacOS
__unix__ //判断是否是类unix系统，linux和MacOS的此值亦为1

因此我们可以使用下述方式来判断系统：

  #ifdef __linux__

  #elif _WIN32

  #elif __APPLE__

  #elif __unix__

  #endif

检测系统硬件

不同的系统对于系统硬件的检测方式各不相同，下面仅描述Linux和Windows的检测

Linux

Linux 系统自带一系列工具，我们针对的硬件主要为Nvidia GPU,AMD GPU 和 CPU，因此我们主要要成功检测是否存在GPU设备，如果有我们就将运算进行在GPU上，反之运行在CPU上。

C/C++/Fortran并没有提供一套函数来进行设备检测，因此这一部分需要我们自己来写。

由于在Linux上，安装了Nvidia驱动/AMD驱动都会提供一个CLI来提供显卡信息，我们可以通过检测是否存在驱动来判断是什么设备。

我们可以使用以下函数来检测一个CLI是否存在

bool commandExists(const std::string& command) {
    std::string cmd = "which " + command + " > /dev/null 2>&1";
    return system(cmd.c_str()) == 0;
}

因此我们可以使用：

commandExists("nvidia-smi")
commandExists("rocm-smi")

从而识别电脑的硬件。

Windows

Windows上就比较复杂，我们需要使用DXGI来识别显卡

  IDXGIFactory* pFactory = nullptr;
  IDXGIAdapter* pAdapter = nullptr;
  std::string result = "cpu";

  HRESULT hr = CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&pFactory);
  if (FAILED(hr))
    return result;
  for (UINT i = 0; pFactory->EnumAdapters(i, &pAdapter) != DXGI_ERROR_NOT_FOUND; ++i){
    DXGI_ADAPTER_DESC desc;
    pAdapter->GetDesc(&desc);

    if (desc.VendorId == 0x1002) {
      result = "amd";
      break;
    }
    else if (desc.VendorId == 0x10DE){
      result = "nvidia";
      break;
    }

    pAdapter->Release();
  }

  if (pAdapter) pAdapter->Release();
  if (pFactory) pFactory->Release();
  return result;

我们把他们放在同一个文件里，有：

#include <iostream>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#include <dxgi.h>
#pragma comment(lib, "dxgi.lib")
#endif

bool commandExists(const std::string& command) {
    std::string cmd = "which " + command + " > /dev/null 2>&1";
    return system(cmd.c_str()) == 0;
}


std::string judgeDevice(){
  #ifdef __linux__
  if(commandExists("nvidia-smi")){
    return "nvidia";
  }else if(commandExists("rocm-smi")){
    return "amd";
  }else {
    return "cpu";
  }
  #elif _WIN32
  IDXGIFactory* pFactory = nullptr;
  IDXGIAdapter* pAdapter = nullptr;
  std::string result = "cpu";

  HRESULT hr = CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&pFactory);
  if (FAILED(hr))
    return result;
  for (UINT i = 0; pFactory->EnumAdapters(i, &pAdapter) != DXGI_ERROR_NOT_FOUND; ++i){
    DXGI_ADAPTER_DESC desc;
    pAdapter->GetDesc(&desc);
    if (desc.VendorId == 0x1002) {
      result = "amd";
      break;
    }
    else if (desc.VendorId == 0x10DE){
      result = "nvidia";
      break;
    }
    pAdapter->Release();
  }
  if (pAdapter) pAdapter->Release();
  if (pFactory) pFactory->Release();
  return result;
  #elif __APPLE__

  #elif __unix__

  #endif
}

识别最优硬件

一般来说我们认为GPU性能优于CPU，因此我们可以认为如果识别到GPU就使用GPU，否则使用CPU运行

但是我们需要注意一点，对于多GPU来说，如果检测到核显，我们需要屏蔽掉来使用独显运算。

一般来说，存在独显的电脑，核显的显存分配不会超过1GB，因此我们可以使用：

if (desc.DedicatedVideoMemory > 512 * 1024 * 1024)  // 大于512MB显存
{
    if (desc.VendorId == 0x1002)  // AMD
    {
        result = "amd";
        break;
    }
    else if (desc.VendorId == 0x10DE)  // NVIDIA
    {
        result = "nvidia";
        break;
    }
}

而在Linux上，ROCm会自动屏蔽核显，因此无须改动

将代码运行在合适的GPU上

不同的设备有不同的加速方案，我们需要做的就是根据不同的设备来运行不同的代码

Nvidia平台可以使用Cuda、OpenACC、DirectX来加速运算

AMD平台以及国产加速硬件可以使用ROCm、OpenACC、DirectX来加速运算

CPU上可以使用MPI、OpenMP来加速运算

目前平台正在搭建中