Yes, the world-space coordinates (in mm) are in MNI space, but the voxel indices are not. This is a general neuroimaging thing that most people find confusing in the beginning (and some never bother to learn). The image itself is always just a matrix of numbers, and the affine matrix in the header defines where each pixel/voxel ends up in world space. As a simple example, you could build the same MNI space with small or large bricks. In the large-brick version (say with a voxel resolution of 3x3x3 mm), voxel number 10 could end up at a coordinate of 30 mm, whereas in the 2x2x2 mm version it would end up at 20 mm. The header matrix makes sure that the voxels of both versions end up at the correct position in the same space.
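To make the brick example concrete, here is a minimal sketch in plain Python. The helper names `scaling_affine` and `voxel_to_world` are made up for illustration, and the affines assume a zero origin and no rotation, which real MNI headers generally do not have.

```python
def scaling_affine(voxel_size_mm, origin_mm=(0.0, 0.0, 0.0)):
    """Build a 4x4 affine for an axis-aligned grid (hypothetical helper)."""
    s = voxel_size_mm
    ox, oy, oz = origin_mm
    return [
        [s, 0.0, 0.0, ox],
        [0.0, s, 0.0, oy],
        [0.0, 0.0, s, oz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def voxel_to_world(affine, ijk):
    """Map a voxel index (i, j, k) to world coordinates in mm."""
    vec = (ijk[0], ijk[1], ijk[2], 1.0)
    # Only the first three rows of the affine produce x, y, z.
    return tuple(sum(row[c] * vec[c] for c in range(4)) for row in affine[:3])


coarse = scaling_affine(3.0)  # 3x3x3 mm bricks
fine = scaling_affine(2.0)    # 2x2x2 mm bricks

print(voxel_to_world(coarse, (10, 0, 0)))  # (30.0, 0.0, 0.0)
print(voxel_to_world(fine, (10, 0, 0)))    # (20.0, 0.0, 0.0)
print(voxel_to_world(fine, (15, 0, 0)))    # (30.0, 0.0, 0.0)
```

So voxel 10 in the coarse grid and voxel 15 in the fine grid are the same place in mm, which is the only thing that matters for an overlay.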
To get a very high resolution at a small file size, we export the VTAs on a very fine voxel grid but only define the data in a tight box right around the VTA itself.
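A rough sketch of why this keeps files small. All numbers here are illustrative assumptions, not the actual export settings: a 0.25 mm voxel size, a 20 mm box around the VTA, and a whole-head field of view of 182 x 218 x 182 mm.

```python
import math


def grid_shape(extent_mm, voxel_size_mm):
    """Number of voxels needed per axis to cover an extent in mm."""
    return tuple(math.ceil(e / voxel_size_mm) for e in extent_mm)


def n_voxels(shape):
    """Total voxel count of a 3D grid."""
    return shape[0] * shape[1] * shape[2]


voxel_size = 0.25  # mm, assumed for illustration

whole_head = grid_shape((182.0, 218.0, 182.0), voxel_size)
vta_box = grid_shape((20.0, 20.0, 20.0), voxel_size)

print(whole_head, n_voxels(whole_head))
print(vta_box, n_voxels(vta_box))
print(n_voxels(whole_head) / n_voxels(vta_box))  # how many times larger
```

The small box still sits at the right place in MNI space because the translation part of its header affine stores where its corner voxel lies in mm.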
So yes, you're correct: it's in MNI, but the data has a completely different resolution. Good neuroimaging viewers don't care and just overlay them. AFAIK FSL forces users to have everything in the exact same resolution (at least fslview does).
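Roughly what a viewer has to do for the overlay: take one point in mm and look up the matching voxel index in each image through that image's own affine. The sketch below assumes axis-aligned affines (no rotation or flips), and the voxel sizes and origins are hypothetical values, not taken from any real template or VTA file.

```python
def world_to_voxel(voxel_size_mm, origin_mm, xyz_mm):
    """Invert an axis-aligned affine: index = (world - origin) / voxel size."""
    return tuple((w - o) / voxel_size_mm for w, o in zip(xyz_mm, origin_mm))


# Hypothetical grids: a coarse template and a fine, cropped VTA image.
template = {"voxel_size": 2.0, "origin": (-90.0, -126.0, -72.0)}
vta = {"voxel_size": 0.25, "origin": (5.0, -20.0, -12.0)}

point_mm = (12.0, -14.0, -6.0)  # one location in world space

idx_template = world_to_voxel(template["voxel_size"], template["origin"], point_mm)
idx_vta = world_to_voxel(vta["voxel_size"], vta["origin"], point_mm)

print(idx_template)  # (51.0, 56.0, 33.0)
print(idx_vta)       # (28.0, 24.0, 24.0)
```

Same point in mm, completely different voxel indices, and neither image needs to be resampled onto the other's grid just to be displayed together.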