But how much data does it take to send terrain information? Why not just send the picture of the terrain every moment (stream it) rather than whatever they’re doing?
Because it requires computing power from the GPU to translate the terrain into an image of the terrain. They’re using your local GPU for that since GPUs are expensive, and also it minimizes latency between control input and view update. If you turn the camera you want that new view immediately, not 200ms later.
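To put a rough number on that latency point, here is a quick back-of-envelope sketch. The 200 ms delay is the figure from the comment above; the 60 fps frame rate is purely an assumption for illustration, not something measured from the game, and the function names are made up for this example.

```python
# Back-of-envelope: how many locally rendered frames fit inside a streaming delay?
# The 200 ms delay comes from the comment above; 60 fps is an ASSUMED frame rate
# chosen for illustration only.

def frame_time_ms(fps: float) -> float:
    """Time budget for a single frame, in milliseconds."""
    return 1000.0 / fps


def frames_behind(delay_ms: float, fps: float) -> float:
    """Number of frames a local renderer would draw during the given delay."""
    return delay_ms / frame_time_ms(fps)


ASSUMED_FPS = 60.0        # assumption, not a measured value
STREAM_DELAY_MS = 200.0   # the delay mentioned in the comment above

print(f"frame time at {ASSUMED_FPS:.0f} fps: {frame_time_ms(ASSUMED_FPS):.1f} ms")
print(f"frames elapsed during {STREAM_DELAY_MS:.0f} ms: "
      f"{frames_behind(STREAM_DELAY_MS, ASSUMED_FPS):.0f}")
```

Under those assumptions a streamed view would trail your input by roughly a dozen frames, while a local render can respond on the next frame.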
Data vs compute
It’s easy to send all the data in an x-mile radius of the player’s position. Or you could identify the player’s position, speed, camera angle, etc., render it all, compress it, and then send the computed, rendered video feed.
That would require Microsoft to run a 1:1 render, on their own servers, of everything each player is doing in their sim, for everyone playing the game, at all times. And then they’d have to stream that video feed to the player and somehow make sure the elsewhere-rendered terrain is synced up perfectly with the player’s local game. Doesn’t really seem reasonable.