It wouldn’t cost any CPU with custom software, which Google can afford to write. The video is streamed by delivering blocks of data from drives where the data isn’t contiguous. It’s split across multiple drives on multiple servers. Video files are made of key frames, with P and B frames in between the key frames. Splicing at key frames needs no processing. When sending the next block, the video server only needs a change so that it sends blocks aligned to key frames. It can then inject ads without any CPU overhead.
You’re forgetting the part where the video is coming from a cache server that isn’t designed to do this
Wouldn’t it still need overhead to choose those blocks and send them instead of the video? Especially if they’re also trying to do it in a way that prevents the user from just hitting the “skip 10 seconds” button, like they might if it were served as part of the regular video.
It has to know which blocks to choose to get the next part of the file anyway. Except the next part of the file is an ad. So yes, there is overhead, but not for the video stream server. It doesn’t need to re-encode the video. It’s not any more taxing than adding the non-skippable ads at the beginning, which they already do.
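Here is a rough sketch of the idea in Python, not how any real video server is implemented. It assumes every block already starts at a key frame and that the ad is encoded with compatible settings, so blocks can be swapped in without decoding anything. The names `Segment` and `splice` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Segment:
    """One already-encoded block that begins at a key frame (assumption)."""
    source: str     # "video" or "ad"
    index: int      # position within its own stream
    payload: bytes  # encoded bytes, never touched by the splicer


def splice(video: List[Segment], ad: List[Segment], after: int) -> Iterator[Segment]:
    """Yield video blocks, inserting the ad blocks after `after` video blocks.

    Only the choice of the next block changes; no payload is decoded or
    re-encoded, so the work is bookkeeping rather than video processing.
    """
    yield from video[:after]   # content up to the splice point
    yield from ad              # ad blocks, passed through as-is
    yield from video[after:]   # rest of the content
```

For example, `list(splice(video, ad, after=2))` returns the first two video blocks, then the ad blocks, then the remaining video blocks, all with their original bytes.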