Neither. NIMH ML has the waveform ready for immediate play before the scene starts. However, there is still an internal delay in the Windows audio architecture that cannot be avoided. My paper about this is in review. The actual delay depends on many factors, but, if you are using XAudio2, it is usually about 37 ms.
https://monkeylogic.nimh.nih.gov/docs_AudioEngine.html

AudioSound ends the scene after the entire waveform is played, so the duration of the scene becomes longer than 100 ms, as explained in your previous question. Assuming that the scene ended one frame later and the frame length is about 16 ms (one frame at a 60 Hz refresh rate), that explains why your scene duration was about 150 ms (≈ 37 + 100 + 16).
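As a rough check, the numbers add up like this (a minimal sketch; the 37 ms delay and the 60 Hz refresh rate are assumptions that depend on your hardware):

    audio_delay  = 37;       % approximate XAudio2 internal delay (ms)
    sound_length = 100;      % duration of your waveform (ms)
    frame_length = 1000/60;  % one frame at a 60 Hz refresh rate (~16.7 ms)
    expected_scene_duration = audio_delay + sound_length + frame_length
    % ~153 ms, which is roughly the 150 ms you measured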
https://monkeylogic.nimh.nih.gov/board/read.php?3,1294,1294#msg-1294

You can shorten the internal delay to about 1 ms by using the WASAPI exclusive mode. That does not make the scene duration close to 100 ms though, for a different reason. If your concern is the length of the stimulus, you should record the sound output, not the scene duration. The sound duration is exactly 100 ms, regardless of the scene duration.
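For example, if you record the speaker output and save it as a WAV file, you can measure the stimulus duration from the recording. This is just a sketch in MATLAB; the file name and the onset threshold are placeholders, not something NIMH ML produces for you:

    [y, fs] = audioread('recorded_output.wav');  % hypothetical recording of the sound output
    env = abs(y(:,1));                           % envelope of the first channel
    idx = find(env > 0.05*max(env));             % samples above an arbitrary onset/offset threshold
    duration_ms = (idx(end) - idx(1) + 1) / fs * 1000
    % should come out very close to 100 ms, even though the scene lasts longer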