Data is organised in 5-byte chunks, each storing 4 RAW10 pixels:
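A minimal scalar sketch of unpacking one such chunk may make the layout concrete. It assumes the MIPI CSI-2 style packing, where bytes 0..3 carry the upper 8 bits of pixels 0..3 and byte 4 carries the four 2-bit remainders (pixel 0 in the lowest two bits). If the sensor or driver packs the bits differently, the shifts need adjusting. The function name unpack_raw10 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack n_groups 5-byte chunks into 4 * n_groups 16-bit pixels.
 * ASSUMED layout (MIPI CSI-2 style):
 *   in[0..3] = bits [9:2] of pixels 0..3
 *   in[4]    = bits [1:0] of pixels 0..3, pixel 0 in bits [1:0] of the byte
 */
void unpack_raw10(const uint8_t *src, uint16_t *dst, size_t n_groups)
{
    for (size_t g = 0; g < n_groups; ++g) {
        const uint8_t *in = src + 5 * g;   /* start of this 5-byte chunk */
        uint16_t *out = dst + 4 * g;       /* four output pixels per chunk */
        uint8_t lsbs = in[4];              /* shared low-bits byte */

        for (int p = 0; p < 4; ++p) {
            /* upper 8 bits shifted into place, OR-ed with the 2 low bits */
            out[p] = (uint16_t)((in[p] << 2) | ((lsbs >> (2 * p)) & 0x3));
        }
    }
}
```

A scalar version like this is also handy as a reference to check a vectorized implementation against.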
General idea:
Useful info on ARM:
Useful info:
From https://stackoverflow.com/questions/71554911/how-to-vectorize-2d-array-using-neon-intrinsics:
Consider using OpenMP: add #pragma omp parallel for before the for loop and pass -fopenmp on the compiler command line
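As a rough illustration of that suggestion, the sketch below parallelises over image rows. The function raw10_to_8bit and its parameters are made up for this example. It assumes the pixels have already been unpacked to 16-bit values and simply drops the two low bits. Rows are independent, so no synchronisation is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce 10-bit pixels (stored in uint16_t) to 8 bits.
 * Each row is an independent unit of work, so the outer loop can be
 * split across threads. Without -fopenmp the pragma is ignored and
 * the loop runs serially with the same result. */
void raw10_to_8bit(const uint16_t *src, uint8_t *dst, int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const uint16_t *in = src + (size_t)y * width;
        uint8_t *out = dst + (size_t)y * width;

        for (int x = 0; x < width; ++x) {
            out[x] = (uint8_t)(in[x] >> 2);  /* keep bits [9:2] */
        }
    }
}
```

With GCC this would be built with something like gcc -O2 -fopenmp file.c. Whether threading pays off depends on image size and core count, so it is worth measuring.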
Auto vectorization in GCC:
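A small sketch of a loop shape that GCC can usually vectorize on its own, assuming a simple per-pixel operation. The function add_offset_u16 is hypothetical. The restrict qualifiers tell the compiler the buffers do not overlap, which removes one common reason for it to give up or to emit a runtime overlap check.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a constant offset to every pixel (wraps modulo 2^16).
 * Simple counted loop, unit stride, no aliasing: a good candidate
 * for the auto-vectorizer. */
void add_offset_u16(uint16_t *restrict dst, const uint16_t *restrict src,
                    uint16_t offset, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (uint16_t)(src[i] + offset);
    }
}
```

Compiling with gcc -O3 -fopt-info-vec should print a note for each loop that was vectorized, and -fopt-info-vec-missed lists the ones that were not, which helps when a loop needs restructuring.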
Instead of using ARM NEON intrinsics directly, use the OpenCV wrapper, which provides portability across platforms: