- __m256 _mm256_cvtph_ps (__m128i a)
- __m128i _mm256_cvtps_ph (__m256 a, int rounding)
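For reference, a minimal sketch (not from the original post) of how this pair is typically used: eight floats are packed to half precision with _mm256_cvtps_ph and widened back with _mm256_cvtph_ps. The sample values are arbitrary, and the compile flag is an assumption (e.g. -mf16c on gcc/clang).

/* Round-trip eight floats through FP16 using the F16C intrinsics. */
#include <stdio.h>
#include <immintrin.h>

int main(void) {
    float in[8]  = {0.5f, 1.0f, 1.5f, 2.0f, -3.25f, 65504.0f, 1e-4f, 0.0f};
    float out[8];

    __m256  ps = _mm256_loadu_ps(in);
    /* FP32 -> FP16, round to nearest even, suppress exceptions */
    __m128i ph = _mm256_cvtps_ph(ps, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    /* FP16 -> FP32 */
    __m256  back = _mm256_cvtph_ps(ph);
    _mm256_storeu_ps(out, back);

    for (int i = 0; i < 8; ++i)
        printf("%g -> %g\n", in[i], out[i]);  /* 65504.0f is the FP16 maximum */
    return 0;
}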
Related
Using CUDA on Windows
>word2vec_cbow.exe -train text8 -output vectors.bin -cbow 1 -size 200 -window 7 -negative 1 -hs 1 -sample 1e-3 -threads 1 -binary 1 -save-vocab voc
Starting training using file text8
Vocab size: 71290
Words in train file: 16718843
vocab size = 71290
Alpha: 0.000009 Progress: 99.99% Words/thread/sec: 1528.12k
UPDATES SINCE RELEASE CANDIDATE
- The API of cudnnBatchNormalizationBackward has been changed to include an additional set of scaling parameters (alphaParamsDiff and betaParamsDiff) applied to the dBnScaleResult and dBnBiasResult outputs of the function.
- The previous batch size limit of 512 in all Batch Normalization routines has been removed.
- The numerical stability and performance of cudnnBatchNormalizationBackward have been improved in some cases.
- Performance of cudnnConvolutionBackwardFilter when using Algo 1 has been improved for some cases. This code path now also requires a workspace.
NEW FEATURES
- Batch Normalization routines have been added.
- Convolution forward and backward now support the NHWC tensor format.
- An FFT tiling algorithm has been added for the cudnnConvolutionForward and cudnnConvolutionBackwardData routines.
- cudnnConvolutionForward now supports computation in FP16 when run on a GPU with compute capability >= 5.3 (see the sketch after this list).
- cudnnConvolutionForward has been optimized for batch size = 1.
- Pooling and activation routines have a descriptor option to propagate NaN values.
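Below is a minimal, hedged sketch (not from the release notes themselves) of what the FP16 path looks like from the caller's side: the input tensor descriptor is created with CUDNN_DATA_HALF instead of CUDNN_DATA_FLOAT before being handed to cudnnConvolutionForward. The shape values are illustrative and error handling is reduced to a single check.

/* Sketch: describe an FP16 (half precision) input tensor for cuDNN. */
#include <stdio.h>
#include <cudnn.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc;

    if (cudnnCreate(&handle) != CUDNN_STATUS_SUCCESS) {
        fprintf(stderr, "cudnnCreate failed\n");
        return 1;
    }
    cudnnCreateTensorDescriptor(&xDesc);
    /* CUDNN_DATA_HALF selects FP16 storage; per the notes above, the FP16
       convolution path needs a GPU with compute capability >= 5.3. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                               1 /*N*/, 3 /*C*/, 224 /*H*/, 224 /*W*/);
    printf("FP16 tensor descriptor created\n");

    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}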