How do I get the GPU memory size available to OpenCL?

How can I find out how much GPU memory is available to programs that use OpenCL for their computations, such as darktable?

I know that lspci provides some general information, but it is not the information I am looking for.

$ sudo lspci -v -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X] (prog-if 00 [VGA controller])
    Subsystem: Gigabyte Technology Co., Ltd Device 227d
    Flags: bus master, fast devsel, latency 0, IRQ 49
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at fe780000 (64-bit, non-prefetchable) [size=256K]
    I/O ports at c000 [size=256]
    Expansion ROM at fe7c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [270] #19
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] #13
    Capabilities: [2d0] #1b
    Kernel driver in use: fglrx_pci

It shows 256MB. That is unrealistic and far too little, because darktable runs with OpenCL here and it requires at least 768MB (the GPU has 4GB of memory in total).

Then there is clinfo (from the clinfo package), which prints the following:

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 AMD-APP (1411.4)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Board name:                    AMD Radeon R9 200 Series
  Device Topology:               PCI[ B#1, D#0, F#0 ]
  Max compute units:                 20
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1100Mhz
  Address bits:                  32
  Max memory allocation:             1073741824
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Pitcairn
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir 


  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Board name:                    
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               2664Mhz
  Address bits:                  64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                6258630656
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz
  Vendor:                    GenuineIntel
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (sse2)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm 

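Several values have "memory" in their name, but which of them is the total amount of available memory, and in which unit? If the values are in bits, Global memory size would be something like 512MB and Max memory allocation 256MB. Local memory size would probably be 4GB (in MB). clinfo has no man page and no built-in help via -h.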

How do I interpret these values correctly to get the amount of available GPU memory? Are there other programs I could use?

Also: why is there no tag for OpenCL?

Best answer

You may have found the answer by now, but the clinfo output is in bytes, not bits. So the Global memory size is about 3GB, not 512MB.
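To make the byte interpretation concrete, here is a minimal Python sketch that converts the GPU device's memory values from the clinfo output in the question into binary units. The dictionary and the `human_readable` helper are illustrative names, not part of clinfo or OpenCL, and the numbers are simply copied from the output above.

```python
# Values copied from the GPU device section of the clinfo output, in bytes.
CLINFO_VALUES = {
    "Max memory allocation": 1073741824,
    "Global memory size": 3221225472,
    "Local memory size": 32768,
}


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    value = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == "GiB":
            return f"{value:g} {unit}"
        value /= 1024


for name, num_bytes in CLINFO_VALUES.items():
    print(f"{name}: {num_bytes} bytes = {human_readable(num_bytes)}")
# Max memory allocation: 1073741824 bytes = 1 GiB
# Global memory size: 3221225472 bytes = 3 GiB
# Local memory size: 32768 bytes = 32 KiB
```

Read this way, the GPU reports 3 GiB of global memory, a single allocation can be at most 1 GiB, and the local memory is 32 KiB.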
