From: Denis Vlasenko

Looks like open-coded be_to_cpu.  GCC produces rather poor code for this.
be_to_cpu produces asm()s which are ~4 times shorter.

Compile-tested only.

I am not sure whether input can be 32bit-unaligned.  If it indeed can be,
replace:

	((u32*)(input))[I]  ->  get_unaligned( ((u32*)(input))+I )

Signed-off-by: Andrew Morton
---

 25-akpm/crypto/sha256.c |   10 +---------
 1 files changed, 1 insertion(+), 9 deletions(-)

diff -puN crypto/sha256.c~small-sha256-cleanup crypto/sha256.c
--- 25/crypto/sha256.c~small-sha256-cleanup	2004-10-01 21:20:39.113354352 -0700
+++ 25-akpm/crypto/sha256.c	2004-10-01 21:20:39.117353744 -0700
@@ -63,15 +63,7 @@ static inline u32 RORu32(u32 x, u32 y)
 
 static inline void LOAD_OP(int I, u32 *W, const u8 *input)
 {
-	u32 t1 = input[(4 * I)] & 0xff;
-
-	t1 <<= 8;
-	t1 |= input[(4 * I) + 1] & 0xff;
-	t1 <<= 8;
-	t1 |= input[(4 * I) + 2] & 0xff;
-	t1 <<= 8;
-	t1 |= input[(4 * I) + 3] & 0xff;
-	W[I] = t1;
+	W[I] = __be32_to_cpu( ((u32*)(input))[I] );
 }
 
 static inline void BLEND_OP(int I, u32 *W)
_
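
Note on the alignment question: the following is a minimal userspace
sketch, not kernel code and not part of the patch. It assumes POSIX
ntohl() as a stand-in for __be32_to_cpu() and memcpy() as a stand-in for
get_unaligned(); the function names load_be32_bytes, load_be32_word and
check_loads_agree are hypothetical. It illustrates that the removed
byte-wise load and a word load followed by a big-endian-to-host
conversion give the same value, including at an odd address.

```c
#include <arpa/inet.h>	/* ntohl(): big-endian to host order (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise big-endian load, equivalent to the removed LOAD_OP body. */
static uint32_t load_be32_bytes(const uint8_t *input, int i)
{
	return ((uint32_t)input[4 * i] << 24) |
	       ((uint32_t)input[4 * i + 1] << 16) |
	       ((uint32_t)input[4 * i + 2] << 8) |
	       (uint32_t)input[4 * i + 3];
}

/*
 * Word load plus byte-order conversion. memcpy() is used instead of a
 * pointer cast so the read is valid at any alignment (assumption: this
 * mirrors what get_unaligned() would provide in the kernel).
 */
static uint32_t load_be32_word(const uint8_t *input, int i)
{
	uint32_t raw;

	memcpy(&raw, input + 4 * i, sizeof(raw));
	return ntohl(raw);
}

/* Hypothetical self-check: both loads agree at a deliberately odd offset. */
static void check_loads_agree(void)
{
	uint8_t buf[9] = { 0x00, 0x01, 0x02, 0x03, 0x04,
			   0x05, 0x06, 0x07, 0x08 };

	assert(load_be32_bytes(buf + 1, 0) == 0x01020304u);
	assert(load_be32_word(buf + 1, 0) == 0x01020304u);
	assert(load_be32_word(buf + 1, 1) == load_be32_bytes(buf + 1, 1));
}
```

Calling check_loads_agree() from a test program exercises both paths.
Whether the plain cast in the patch is safe depends on how callers align
input, which the patch description leaves open.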