If your processor doesn't have hardware support for a parameter/local stack but the compiler tries to implement a run-time parameter stack anyway, and if your code doesn't need to be re-entrant, you may be able to save code space by statically allocating auto variables. In some cases, this must be done manually; in other cases, compiler directives can do it. Efficient manual allocation requires sharing storage between routines, since otherwise every routine's locals occupy RAM permanently. Such sharing must be done carefully, to ensure that no routine uses a variable while another routine still considers it to be "in scope", but in some cases the code-size benefits may be significant.
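A minimal sketch of manual static allocation with shared storage might look like the following. The names `scratch`, `checksum8`, and `average8` are hypothetical, and the overlay is only safe under the stated assumption that neither routine calls the other and neither runs from an interrupt handler.

```c
#include <stdint.h>

/* Hypothetical scratch area: one block of RAM overlaid by two routines.
   Assumption: the routines never call each other and are never invoked
   from an interrupt, so their "locals" are never live at the same time. */
static union {
    struct { uint8_t sum; uint8_t i; } checksum;
    struct { uint16_t total; uint8_t i; } average;
} scratch;

/* 8-bit additive checksum; working variables live at fixed addresses. */
uint8_t checksum8(const uint8_t *buf, uint8_t len)
{
    scratch.checksum.sum = 0;
    for (scratch.checksum.i = 0; scratch.checksum.i < len; scratch.checksum.i++) {
        scratch.checksum.sum =
            (uint8_t)(scratch.checksum.sum + buf[scratch.checksum.i]);
    }
    return scratch.checksum.sum;
}

/* Integer mean of a byte buffer; reuses the same RAM as checksum8. */
uint8_t average8(const uint8_t *buf, uint8_t len)
{
    if (len == 0) {
        return 0;
    }
    scratch.average.total = 0;
    for (scratch.average.i = 0; scratch.average.i < len; scratch.average.i++) {
        scratch.average.total =
            (uint16_t)(scratch.average.total + buf[scratch.average.i]);
    }
    return (uint8_t)(scratch.average.total / len);
}
```

Whether this actually shrinks the code depends on the compiler and target, so it is worth comparing the generated output before and after.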
Some processors have calling conventions that make some parameter-passing styles more efficient than others. For example, on the PIC18 controllers, if a routine takes a single one-byte parameter, it may be passed in a register; if it takes more than that, all parameters must be passed in RAM. If a routine would otherwise take two one-byte parameters, it may be most efficient to "pass" one in a global variable and pass the other as a normal parameter. With widely used routines, the savings can add up. They can be especially significant if the parameter passed via the global is a single-bit flag, or if it will usually have a value of 0 or 255 (since special instructions exist to store a 0 or 255 into RAM).
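As a rough illustration of the "one parameter in a global" idea, the sketch below uses hypothetical names (`fill_value`, `fill_buffer`, `buffer`). It is plain C that runs anywhere; whether it saves space on a given PIC18 compiler is an assumption to verify against the generated code.

```c
#include <stdint.h>

/* Hypothetical global standing in for a second parameter.
   The caller sets it before calling fill_buffer(). */
uint8_t fill_value;

uint8_t buffer[16];

/* Takes only one one-byte parameter, so a compiler that passes a single
   byte in a register can do so here; the fill byte comes from the global. */
void fill_buffer(uint8_t count)
{
    uint8_t i;

    if (count > sizeof buffer) {
        count = sizeof buffer;
    }
    for (i = 0; i < count; i++) {
        buffer[i] = fill_value;
    }
}
```

A call site then becomes `fill_value = 0; fill_buffer(8);`. When the global is usually 0 or 255, the store before the call can be a single instruction on targets that have one for those values.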
On the ARM, putting global variables which are frequently used together into a structure may significantly reduce code size and improve performance. If A, B, C, D, and E are separate global variables, then code which uses all of them must load the address of each into a register; if there aren't enough registers, it may be necessary to reload those addresses multiple times. By contrast, if they are part of the same global structure MyStuff, then code which uses MyStuff.A, MyStuff.B, etc. can load the address of MyStuff once and reach each member at a fixed offset from it, which can be a big win.
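The grouping could be sketched as follows. The struct tag `my_stuff`, the member types, and the function `sum_my_stuff` are assumptions made for illustration; only the names MyStuff and A through E come from the text above.

```c
#include <stdint.h>

/* Globals that are used together, gathered into one object so that a
   single base address can serve all of them. */
struct my_stuff {
    int32_t A;
    int32_t B;
    int32_t C;
    int32_t D;
    int32_t E;
};

struct my_stuff MyStuff;

/* Every access below is "base address of MyStuff plus a small constant
   offset", so the compiler needs only one address in a register. */
int32_t sum_my_stuff(void)
{
    return MyStuff.A + MyStuff.B + MyStuff.C + MyStuff.D + MyStuff.E;
}
```

Some toolchains may already place separate globals so that they can share a base address, so it is worth checking the generated assembly to see how much this buys on a particular build.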