The conceptually simplest way to produce a W state is somewhat analogous to classical reservoir sampling, in that it involves a series of local operations that ultimately create a uniform effect.
Basically, you look at each qubit in turn and consider "how much amplitude do I have left in the all-0s state, and how much do I want to transfer into the just-this-qubit-is-ON state?" It turns out that the family of rotations you need is what I'll call the "odds gates", which have the following matrix:
$$M(p:q) = \frac{1}{\sqrt{p+q}} \begin{bmatrix} \sqrt{p} & -\sqrt{q} \\ \sqrt{q} & \sqrt{p} \end{bmatrix}$$
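As a quick sanity check of that matrix, here is a minimal NumPy sketch. The function name `odds_gate` is just an illustrative choice, not something from a library. It builds $M(p:q)$, confirms it is unitary, and shows that applied to $|0\rangle$ it splits the probability in the ratio $p:q$.

```python
import numpy as np

def odds_gate(p, q):
    """M(p:q): sends |0> to (sqrt(p)|0> + sqrt(q)|1>) / sqrt(p+q)."""
    return np.array([[np.sqrt(p), -np.sqrt(q)],
                     [np.sqrt(q),  np.sqrt(p)]]) / np.sqrt(p + q)

M = odds_gate(3, 1)

# The matrix is real, so unitarity reduces to M @ M.T == identity.
assert np.allclose(M @ M.T, np.eye(2))

# Acting on |0>, the outcome probabilities are in the ratio p:q = 3:1.
probs = (M @ np.array([1.0, 0.0])) ** 2
print(probs)
```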
Using these gates, you can get a W state with a sequence of increasingly-controlled operations:
This circuit is somewhat inefficient. It has cost $O(N^2 + N \lg(1/\epsilon))$ where $N$ is the number of qubits and $\epsilon$ is the desired absolute precision (since, in an error corrected context, the odds gates are not native and must be approximated).
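The "transfer out of what was left behind" construction can be checked with a small state-vector simulation. This is only a sketch under my own assumptions: the helper names (`odds_gate`, `apply_controlled`, `w_state_multi_controlled`) are made up for illustration, qubit $k$ is stored as bit $k$ of the basis index, and step $k$ uses $M(N-k-1 : 1)$ controlled on all earlier qubits being 0, so that a fraction $1/(N-k)$ of the remaining all-0s probability moves onto qubit $k$.

```python
import numpy as np

def odds_gate(p, q):
    """M(p:q) as defined above."""
    return np.array([[np.sqrt(p), -np.sqrt(q)],
                     [np.sqrt(q),  np.sqrt(p)]]) / np.sqrt(p + q)

def apply_controlled(state, gate, target, controls, n):
    """Apply a 2x2 gate to `target` where every (qubit, value) in `controls` matches."""
    new = state.copy()
    for i in range(2 ** n):
        if (i >> target) & 1:
            continue  # visit each amplitude pair once, via its target=0 member
        if all(((i >> c) & 1) == v for c, v in controls):
            j = i | (1 << target)
            new[i] = gate[0, 0] * state[i] + gate[0, 1] * state[j]
            new[j] = gate[1, 0] * state[i] + gate[1, 1] * state[j]
    return new

def w_state_multi_controlled(n):
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in the all-0s state
    for k in range(n):
        # Controlled on qubits 0..k-1 all being 0 (k controls, growing with k).
        controls = [(c, 0) for c in range(k)]
        state = apply_controlled(state, odds_gate(n - k - 1, 1), k, controls, n)
    return state

w5 = w_state_multi_controlled(5)
# Every one-hot basis state should carry amplitude 1/sqrt(5).
print(w5[[1, 2, 4, 8, 16]])
```

The growing control list is where the quadratic term in the cost comes from: step $k$ needs $k$ controls.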
We can improve the efficiency by switching from a "transfer out of what was left behind" strategy to a "transfer out of what is traveling along" strategy. This adds a fixup sweep at the end, but only requires single controls on each operation. This reduces the cost to $O(N \lg(1/\epsilon))$:
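Here is one way the "traveling along" strategy could be arranged, simulated with NumPy. The exact gate layout is my assumption, not necessarily the one in the figure: start with qubit 0 ON, let each qubit $k$ singly control $M(1 : N-k-1)$ on qubit $k+1$ (so the ON-ness keeps traveling with probability $(N-k-1)/(N-k)$), then run a sweep of CNOTs that erases the trail of 1s left behind. All helper names are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])

def odds_gate(p, q):
    """M(p:q) as defined above."""
    return np.array([[np.sqrt(p), -np.sqrt(q)],
                     [np.sqrt(q),  np.sqrt(p)]]) / np.sqrt(p + q)

def apply_controlled(state, gate, target, controls, n):
    """Apply a 2x2 gate to `target` where every (qubit, value) in `controls` matches."""
    new = state.copy()
    for i in range(2 ** n):
        if (i >> target) & 1:
            continue  # visit each amplitude pair once, via its target=0 member
        if all(((i >> c) & 1) == v for c, v in controls):
            j = i | (1 << target)
            new[i] = gate[0, 0] * state[i] + gate[0, 1] * state[j]
            new[j] = gate[1, 0] * state[i] + gate[1, 1] * state[j]
    return new

def w_state_single_controlled(n):
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = apply_controlled(state, X, 0, [], n)  # turn qubit 0 ON
    # Forward pass: qubit k being ON decides whether the ON-ness moves to k+1.
    # This leaves a superposition of "first j qubits ON" states, each with probability 1/n.
    for k in range(n - 1):
        state = apply_controlled(state, odds_gate(1, n - k - 1), k + 1, [(k, 1)], n)
    # Fixup sweep: clear qubit k whenever qubit k+1 is ON, leaving only the last 1.
    for k in range(n - 1):
        state = apply_controlled(state, X, k, [(k + 1, 1)], n)
    return state

w5 = w_state_single_controlled(5)
print(w5[[1, 2, 4, 8, 16]])
```

Every operation has at most one control, so the count of non-native gates is linear in $N$, matching the cost claimed above.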
It is still possible to do better, but it starts to get complicated. Basically, you can use a single partial Grover step to get $N$ amplitudes equal to $1/\sqrt{N}$, but they will be encoded into a binary register (we want a one-hot register with a single bit set). Fixing this requires a binary-to-unary conversion circuit. The tools needed to do this are covered in "Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity". Here are the relevant figures.
The partial Grover step:
How to perform an indexed operation (well... sort of. The closest figure had an accumulator, which is not quite right for this case):
Using this more complicated approach reduces the cost from $O(N \lg(1/\epsilon))$ to $O(N + \lg(1/\epsilon))$.
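To see why a single partial Grover step is enough to get $N$ equal amplitudes in a binary register, here is a numerical illustration. It is not the circuit from the paper, just a sketch under my own assumptions: I pad the uniform superposition with one ancilla rotated so that the "index $< N$ and ancilla ON" states have total probability exactly $1/4$, at which point one round of amplitude amplification lands entirely on them. The variable names and the basis ordering are arbitrary choices.

```python
import numpy as np

N = 5                              # number of equal amplitudes wanted
k = int(np.ceil(np.log2(N)))       # size of the binary index register
a = N / 2 ** k                     # fraction of indices that are < N
r = 1 / (4 * a)                    # ancilla ON-probability making the good fraction 1/4

index = np.full(2 ** k, 1 / np.sqrt(2 ** k))       # Hadamards on the index register
ancilla = np.array([np.sqrt(1 - r), np.sqrt(r)])   # an odds-style rotation on the ancilla
s = np.kron(index, ancilla)        # basis ordering: position = 2 * index + ancilla

# "Good" states: index < N with the ancilla ON.
good = np.array([(pos >> 1) < N and (pos & 1) == 1 for pos in range(2 ** (k + 1))])

psi = np.where(good, -s, s)            # phase-flip the good states
psi = 2 * np.dot(s, psi) * s - psi     # reflect about the initial state

binary_amps = psi[1::2][:N]            # amplitudes of index 0..N-1 with ancilla ON
print(binary_amps)
```

The only angle that depends on $N$ is the ancilla rotation (plus the matching reflection), which is why the $\lg(1/\epsilon)$ term ends up additive rather than multiplied by $N$.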