Neon

Bayer RAW10 packed to Y16

Data is organised in 5-byte chunks, each storing 4 RAW10 pixels:

  • 4 bytes, each holding the 8 MSBs of one 10-bit pixel
  • 1 byte holding the 2 LSBs of each of the 4 pixels
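
As a reference for the layout above, here is a minimal scalar sketch in C. The function name `raw10_unpack_scalar` is made up for illustration. The bit order inside the fifth byte is an assumption (pixel 0 in bits 1:0, pixel 3 in bits 7:6, as commonly used for MIPI-style RAW10), and so is placing the 10-bit value in the low bits of the 16-bit output; check both against the actual sensor format.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack n_groups 5-byte RAW10 groups into 4 * n_groups 16-bit pixels.
 * Assumption: pixel i of a group takes its 2 LSBs from bits [2i+1:2i]
 * of the fifth byte, and the 10-bit value sits in the low bits of the
 * 16-bit output. */
static void raw10_unpack_scalar(const uint8_t *src, uint16_t *dst, size_t n_groups)
{
    for (size_t g = 0; g < n_groups; ++g) {
        const uint8_t *p = src + 5 * g;   /* start of this 5-byte group */
        uint8_t lsb = p[4];               /* packed 4 x 2-bit LSBs */
        for (int i = 0; i < 4; ++i) {
            dst[4 * g + i] = (uint16_t)((p[i] << 2) | ((lsb >> (2 * i)) & 0x3));
        }
    }
}
```

A scalar version like this is mainly useful for checking a Neon implementation against known-good output.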

General idea:

  • Load data
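    • Neon can perform interleaved loads by 2, 3 or 4 (vld2/vld3/vld4), but not by 5
    • Neon has table-lookup instructions (vtbl/vtbx), so we can specify which source byte is loaded into which lane
  • Extend 8-bit vectors to 16-bit vectors
    • Maximum register size is 128 bits, so 16 x 8-bit or 8 x 16-bit lanes. Either load only 8 x 8-bit and widen to 8 x 16-bit, or load 16 x 8-bit and split with vget_low_u8() / vget_high_u8() before widening
    • The shift left by 2 can be done while widening (vshll_n_u8)
  • How to insert the 2 LSB bits?
    • Create 3 more copies of the byte holding the LSB bits
    • In each copy, shift the bits into the correct position
    • OR the bits into the 16-bit value, or use the shift-left-and-insert instruction (VSLI, e.g. vsliq_n_u16), which shifts and inserts in one step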
  • Store data:
    • Interleaved store by 4 (e.g. vst4q_u16), since each 5-byte group yields 4 pixels
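
The steps above can be sketched in C as follows. This is only one possible arrangement, not a tuned implementation: it takes the table-lookup route, so the pixels come out already in order and a plain store is enough instead of an interleaved one. The function names, the LSB bit order (pixel 0 in bits 1:0 of the fifth byte) and the choice of 2 groups per iteration are assumptions. When Neon is not available at compile time, the code falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define RAW10_HAVE_NEON 1
#else
#define RAW10_HAVE_NEON 0
#endif

/* Scalar path: also handles the tail the vector loop cannot cover. */
static void raw10_groups_scalar(const uint8_t *src, uint16_t *dst, size_t n_groups)
{
    for (size_t g = 0; g < n_groups; ++g) {
        const uint8_t *p = src + 5 * g;
        for (int i = 0; i < 4; ++i)
            dst[4 * g + i] = (uint16_t)((p[i] << 2) | ((p[4] >> (2 * i)) & 0x3));
    }
}

/* Unpack n_groups 5-byte groups into 4 * n_groups 16-bit pixels. */
void raw10_to_y16(const uint8_t *src, uint16_t *dst, size_t n_groups)
{
    size_t g = 0;
#if RAW10_HAVE_NEON
    /* Lookup indices into a 16-byte window holding two 5-byte groups. */
    static const uint8_t msb_idx[8] = {0, 1, 2, 3, 5, 6, 7, 8};
    static const uint8_t lsb_idx[8] = {4, 4, 4, 4, 9, 9, 9, 9};
    /* Per-lane shift counts; negative values shift right. */
    static const int8_t lsb_shr[8] = {0, -2, -4, -6, 0, -2, -4, -6};
    const uint8x8_t v_msb_idx = vld1_u8(msb_idx);
    const uint8x8_t v_lsb_idx = vld1_u8(lsb_idx);
    const int8x8_t v_lsb_shr = vld1_s8(lsb_shr);

    /* Each iteration consumes 10 bytes but loads 16, so only run
     * while at least 16 input bytes remain. */
    while (5 * (n_groups - g) >= 16) {
        uint8x16_t in = vld1q_u8(src + 5 * g);
        uint8x8x2_t tbl;
        tbl.val[0] = vget_low_u8(in);
        tbl.val[1] = vget_high_u8(in);

        uint8x8_t msb = vtbl2_u8(tbl, v_msb_idx);  /* 8 MSB bytes */
        uint8x8_t lsb = vtbl2_u8(tbl, v_lsb_idx);  /* 4 copies of each LSB byte */
        lsb = vshl_u8(lsb, v_lsb_shr);             /* wanted 2 bits now in bits 1:0 */

        /* VSLI: (msb << 2) inserted above the low 2 bits of lsb. */
        uint16x8_t px = vsliq_n_u16(vmovl_u8(lsb), vmovl_u8(msb), 2);
        vst1q_u16(dst + 4 * g, px);
        g += 2;
    }
#endif
    raw10_groups_scalar(src + 5 * g, dst + 4 * g, n_groups - g);
}
```

Because VSLI keeps only the low 2 bits of its destination operand, no separate mask of the shifted LSB copies is needed. Processing more groups per iteration (for example with the 128-bit vqtbl1q_u8 on AArch64) would likely be faster, but that is untested here.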

Useful info on ARM:

Useful info:

From https://stackoverflow.com/questions/71554911/how-to-vectorize-2d-array-using-neon-intrinsics:

Consider using OpenMP: add #pragma omp parallel for before the for loop and pass -fopenmp on the compiler command line.
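
Applied to the RAW10 case, the OpenMP suggestion could look like the sketch below, with image rows split across threads. The function names, the stride parameters and the assumption that the width is a multiple of 4 are all hypothetical and not taken from the linked answer; the per-row unpack is plain scalar C using the same assumed LSB bit order (pixel 0 in bits 1:0 of the fifth byte).

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack one row of `width` pixels (width assumed to be a multiple of 4). */
static void raw10_unpack_row(const uint8_t *src, uint16_t *dst, int width)
{
    for (int x = 0; x < width; x += 4) {
        const uint8_t *p = src + (x / 4) * 5;   /* 5 bytes per 4 pixels */
        for (int i = 0; i < 4; ++i)
            dst[x + i] = (uint16_t)((p[i] << 2) | ((p[4] >> (2 * i)) & 0x3));
    }
}

/* Rows are independent, so they can be distributed across threads. */
void raw10_unpack_image(const uint8_t *src, size_t src_stride_bytes,
                        uint16_t *dst, size_t dst_stride_px,
                        int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; ++y)
        raw10_unpack_row(src + (size_t)y * src_stride_bytes,
                         dst + (size_t)y * dst_stride_px, width);
}
```

Build with, for example, gcc -O2 -fopenmp. Without -fopenmp the pragma is ignored and the loop simply runs serially.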

Auto vectorization in GCC:

Instead of using ARM Neon directly, use the OpenCV wrapper, which provides portability across platforms: