Neon

Bayer RAW10 packed to Y16

Data is organised in 5-byte chunks, each storing four RAW10 pixels:
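A minimal scalar sketch of unpacking one such chunk, useful as a reference before attempting a Neon version. The bit layout is an assumption (the MIPI CSI-2 style packing, where bytes 0..3 carry the upper 8 bits of pixels 0..3 and byte 4 carries the four 2-bit remainders, pixel 0 in the lowest bits); check it against the sensor or driver documentation. The function names are hypothetical, and the 10-bit values are assumed to be stored right-aligned in each 16-bit output word.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one 5-byte group into four 16-bit pixels.
 * ASSUMED layout: src[0..3] = bits 9:2 of pixels 0..3,
 * src[4] = bits 1:0 of pixel 0 in its bits 1:0, pixel 1 in bits 3:2, etc. */
static void raw10_unpack_group(const uint8_t src[5], uint16_t dst[4])
{
    for (int i = 0; i < 4; ++i) {
        uint16_t high = (uint16_t)(src[i] << 2);               /* upper 8 bits */
        uint16_t low  = (uint16_t)((src[4] >> (2 * i)) & 0x3); /* lower 2 bits */
        dst[i] = (uint16_t)(high | low);
    }
}

/* Convert n_pixels packed RAW10 pixels to 16-bit values.
 * Only complete groups of four pixels are converted. */
void raw10_to_y16(const uint8_t *src, uint16_t *dst, size_t n_pixels)
{
    for (size_t g = 0; g < n_pixels / 4; ++g)
        raw10_unpack_group(src + 5 * g, dst + 4 * g);
}
```

If the full 16-bit range is wanted instead, each result can be shifted left by 6 after unpacking.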

General idea:

Useful info on ARM:

Useful info:

From https://stackoverflow.com/questions/71554911/how-to-vectorize-2d-array-using-neon-intrinsics:

Consider using OpenMP: add #pragma omp parallel for before the for loop and pass -fopenmp on the compiler command line.
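A small sketch of what that looks like on a row-by-row image loop. The function name and the left-shift operation are illustrative assumptions, not taken from the linked answer. Rows are independent, so the outer loop is the one that gets the pragma.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift every pixel of a width x height 16-bit image left by `shift` bits.
 * Hypothetical helper: each row is independent, so rows can be split
 * across threads. */
void shift_rows(uint16_t *img, int width, int height, unsigned shift)
{
    /* Takes effect only when compiled with -fopenmp; otherwise the pragma
     * is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        uint16_t *row = img + (size_t)y * (size_t)width;
        for (int x = 0; x < width; ++x)
            row[x] = (uint16_t)(row[x] << shift);
    }
}
```

Build with, for example, gcc -O2 -fopenmp file.c. Threading pays off only when the image is large enough to amortise the cost of starting the threads.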

Auto-vectorization in GCC:
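A sketch of a loop written so that GCC's vectorizer has a good chance of handling it without hand-written intrinsics. The function name is hypothetical. The restrict qualifiers tell the compiler the buffers do not overlap, which removes a common obstacle to vectorization. Whether the loop is actually vectorized depends on the GCC version and target, so check the compiler's report rather than assuming it.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen 8-bit samples to 16 bits and scale by 4 (shift left by 2).
 * A simple counted loop, no aliasing, no early exits. */
void widen_u8_to_u16(const uint8_t *restrict src,
                     uint16_t *restrict dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (uint16_t)(src[i] << 2);
}
```

Compile with, for example, gcc -O3 -fopt-info-vec -c file.c. The -fopt-info-vec option makes GCC print which loops it vectorized, and -ftree-vectorize turns the vectorizer on explicitly at lower optimisation levels.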

Instead of using ARM Neon directly, use the OpenCV wrapper, which provides portability across platforms: