====== basics ======

  * loading / storing data to vectors:
    * from a constant array: [[https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics/Constructing-a-vector-from-a-literal-bit-pattern|Constructing a vector from a literal bit pattern]]
    * load all bytes from the pointed-to memory: ''vld1_''
    * load from interleaved memory, de-interleaving up to every 4th element (useful for RGBA / CMYK): ''vld2_'', ''vld3_'', ''vld4_''
    * load from interleaved/random memory: a single vector element (lane) can be loaded from a given pointer with ''vld1_lane_''
    * random element store: ''ret[0]=vgetq_lane_s32(o,0)'', ''ret[1]=vgetq_lane_s32(o,1)'' etc.
    * load using lookup tables (permutation): an 8-byte vector can be loaded using a second 8-byte vector holding the position indexes
    * combine 8 bit values
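
The load/store variants listed above can be tried out with a small sketch like the one below. It is only an illustration, not a tuned implementation: the helper names ''split_rgb'', ''permute8'' and ''gather4'' are made up for this page, and the scalar ''#else'' branches are an assumption added so the file also builds on machines without NEON. The intrinsics used (''vld3_u8'', ''vst1_u8'', ''vld1_u8'', ''vtbl1_u8'', ''vdupq_n_s32'', ''vld1q_lane_s32'', ''vgetq_lane_s32'') come from ''arm_neon.h''. Note that ''vtbl1_u8'' picks bytes out of a table vector held in a register, and returns 0 for any index of 8 or more.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0 /* assumption: scalar fallback so the sketch builds anywhere */
#endif

/* Split packed RGB (R0 G0 B0 R1 G1 B1 ...) into three planes.
 * Hypothetical helper; n_pixels must be a multiple of 8. */
static void split_rgb(const uint8_t *rgb, uint8_t *r, uint8_t *g, uint8_t *b,
                      int n_pixels)
{
    assert(n_pixels % 8 == 0);
#if HAVE_NEON
    for (int i = 0; i < n_pixels; i += 8) {
        /* vld3_u8 reads 24 bytes and de-interleaves every 3rd element */
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);
        vst1_u8(r + i, px.val[0]);
        vst1_u8(g + i, px.val[1]);
        vst1_u8(b + i, px.val[2]);
    }
#else
    for (int i = 0; i < n_pixels; i++) {
        r[i] = rgb[3 * i];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
#endif
}

/* Permutation through a lookup table: out[i] = table[idx[i]],
 * or 0 when idx[i] >= 8 (this matches vtbl1_u8 behaviour). */
static void permute8(const uint8_t table[8], const uint8_t idx[8],
                     uint8_t out[8])
{
#if HAVE_NEON
    uint8x8_t t  = vld1_u8(table); /* plain contiguous load of 8 bytes */
    uint8x8_t ix = vld1_u8(idx);
    vst1_u8(out, vtbl1_u8(t, ix));
#else
    for (int i = 0; i < 8; i++)
        out[i] = (idx[i] < 8) ? table[idx[i]] : 0;
#endif
}

/* Gather four int32 values from unrelated pointers, one lane at a time,
 * then write them out lane by lane as in the ret[0]=vgetq_lane_s32(o,0) note. */
static void gather4(const int32_t *p0, const int32_t *p1,
                    const int32_t *p2, const int32_t *p3, int32_t ret[4])
{
#if HAVE_NEON
    int32x4_t o = vdupq_n_s32(0);
    o = vld1q_lane_s32(p0, o, 0); /* lane index must be a compile-time constant */
    o = vld1q_lane_s32(p1, o, 1);
    o = vld1q_lane_s32(p2, o, 2);
    o = vld1q_lane_s32(p3, o, 3);
    ret[0] = vgetq_lane_s32(o, 0);
    ret[1] = vgetq_lane_s32(o, 1);
    ret[2] = vgetq_lane_s32(o, 2);
    ret[3] = vgetq_lane_s32(o, 3);
#else
    ret[0] = *p0;
    ret[1] = *p1;
    ret[2] = *p2;
    ret[3] = *p3;
#endif
}
```

For example, calling ''split_rgb'' on 8 packed pixels fills three 8-byte planes, and ''permute8'' with the indexes ''7,6,5,4,3,2,1,0'' reverses the table. When all four lanes sit next to each other in memory, a single ''vst1q_s32'' store is usually a better fit than four ''vgetq_lane_s32'' calls.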