MD5/Implementation

From Rosetta Code
< MD5
Revision as of 05:37, 20 May 2011 by rosettacode>CRGreathouse (strengthen warning -- MD5 is totally broken and should not be used in any application requiring security)
Task
MD5/Implementation
You are encouraged to solve this task according to the task description, using any language you may know.

The purpose of this task to code and validate an implementation of the MD5 Message Digest Algorithm by coding the algorithm directly (not using a call to a built-in or external hashing library). For details of the algorithm refer to MD5 on Wikipedia or the MD5 definition in IETF RFC (1321).

  • The implementation needs to implement the key functionality namely producing a correct message digest for an input string. It is not necessary to mimic all of the calling modes such as adding to a digest one block at a time over subsequent calls.
  • In addition to coding and verifying your implementation, note any challenges your language presented implementing the solution, implementation choices made, or limitations of your solution.
  • Solutions on this page should implement MD5 directly and NOT use built in (MD5) functions, call outs to operating system calls or library routines written in other languages as is common in the original MD5 task.
  • The following are acceptable:
    • An original implementation from the specification, reference implementation, or pseudo-code
    • A translation of a correct implementation from another language
    • A library routine in the same language; however, the source must be included here.

The solutions shown here will provide practical illustrations of bit manipulation, unsigned integers, working with little-endian data. Additionally, the task requires an attention to details such as boundary conditions since being out by even 1 bit will produce dramatically different results. Subtle implementation bugs can result in some hashes being correct while others are wrong. Not only is it critical to get the individual sub functions working correctly, even small errors in padding, endianness, or data layout will result in failure.

The following verification strings and hashes come from RFC 1321:

                            hash code <== string 
   0xd41d8cd98f00b204e9800998ecf8427e <== ""  
   0x0cc175b9c0f1b6a831c399e269772661 <== "a"
   0x900150983cd24fb0d6963f7d28e17f72 <== "abc"
   0xf96b697d7cb7938d525a2f31aaf161d0 <== "message digest"
   0xc3fcd3d76192e4007dfb496cca67e13b <== "abcdefghijklmnopqrstuvwxyz"
   0xd174ab98d277d9f5a5611c2c9f419d9f <== "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
   0x57edf4a22be3c955ac49da2e2107b67a <== "12345678901234567890123456789012345678901234567890123456789012345678901234567890"

In addition, intermediate outputs to aid in developing an implementation can be found here.

The MD5 Message-Digest Algorithm was developed by RSA Data Security, Inc. in 1991.

   regsub -all -- {\[ *F +(\$.) +(\$.) +(\$.) *\]} $md5body {((\1 \& \2) | ((~\1) \& \3))} md5body
   #proc md5pure::G {x y z} {expr {(($x & $z) | ($y & (~$z)))}}
   regsub -all -- {\[ *G +(\$.) +(\$.) +(\$.) *\]} $md5body {((\1 \& \3) | (\2 \& (~\3)))} md5body
   #proc md5pure::H {x y z} {expr {$x ^ $y ^ $z}}
   regsub -all -- {\[ *H +(\$.) +(\$.) +(\$.) *\]} $md5body {(\1 ^ \2 ^ \3)} md5body
   #proc md5pure::I {x y z} {expr {$y ^ ($x | (~$z))}}
   regsub -all -- {\[ *I +(\$.) +(\$.) +(\$.) *\]} $md5body {(\2 ^ (\1 | (~\3)))} md5body
   # inline <<< (bitwise left-rotate)
   regsub -all -- {\[ *<<< +\[ *expr +({[^\}]*})\] +([0-9]+) *\]} $md5body {(([set x [expr \1]] << \2) |  (($x >> R\2) \& S\2))} md5body
   # now replace the R and S
   variable map {}
   variable i
   foreach i {

7 12 17 22 5 9 14 20 4 11 16 23 6 10 15 21

   } {

lappend map R$i [expr {32 - $i}] S$i [expr {0x7fffffff >> (31-$i)}]

   }
   # inline the values of T
   variable tName
   variable tVal
   foreach tName {

T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14 T15 T16 T17 T18 T19 T20 T21 T22 T23 T24 T25 T26 T27 T28 T29 T30 T31 T32 T33 T34 T35 T36 T37 T38 T39 T40 T41 T42 T43 T44 T45 T46 T47 T48 T49 T50 T51 T52 T53 T54 T55 T56 T57 T58 T59 T60 T61 T62 T63 T64

   } tVal {

0xd76aa478 0xe8c7b756 0x242070db 0xc1bdceee 0xf57c0faf 0x4787c62a 0xa8304613 0xfd469501 0x698098d8 0x8b44f7af 0xffff5bb1 0x895cd7be 0x6b901122 0xfd987193 0xa679438e 0x49b40821

0xf61e2562 0xc040b340 0x265e5a51 0xe9b6c7aa 0xd62f105d 0x2441453 0xd8a1e681 0xe7d3fbc8 0x21e1cde6 0xc33707d6 0xf4d50d87 0x455a14ed 0xa9e3e905 0xfcefa3f8 0x676f02d9 0x8d2a4c8a

0xfffa3942 0x8771f681 0x6d9d6122 0xfde5380c 0xa4beea44 0x4bdecfa9 0xf6bb4b60 0xbebfbc70 0x289b7ec6 0xeaa127fa 0xd4ef3085 0x4881d05 0xd9d4d039 0xe6db99e5 0x1fa27cf8 0xc4ac5665

0xf4292244 0x432aff97 0xab9423a7 0xfc93a039 0x655b59c3 0x8f0ccc92 0xffeff47d 0x85845dd1 0x6fa87e4f 0xfe2ce6e0 0xa3014314 0x4e0811a1 0xf7537e82 0xbd3af235 0x2ad7d2bb 0xeb86d391

   } {

lappend map \$$tName $tVal

   }
   set md5body [string map $map $md5body]
   # Finally, define the proc
   proc md5 {msg} $md5body
   # unset auxiliary variables
   unset md5body tName tVal map
   proc byte0 {i} {expr {0xff & $i}}
   proc byte1 {i} {expr {(0xff00 & $i) >> 8}}
   proc byte2 {i} {expr {(0xff0000 & $i) >> 16}}
   proc byte3 {i} {expr {((0xff000000 & $i) >> 24) & 0xff}}
   proc bytes {i} {
       format %0.2x%0.2x%0.2x%0.2x [byte0 $i] [byte1 $i] [byte2 $i] [byte3 $i]
   }

}</lang> Demonstration code: <lang tcl>foreach {hash <- string} {

  0xd41d8cd98f00b204e9800998ecf8427e ==> ""  
  0x0cc175b9c0f1b6a831c399e269772661 ==> "a"
  0x900150983cd24fb0d6963f7d28e17f72 ==> "abc"
  0xf96b697d7cb7938d525a2f31aaf161d0 ==> "message digest"
  0xc3fcd3d76192e4007dfb496cca67e13b ==> "abcdefghijklmnopqrstuvwxyz"
  0xd174ab98d277d9f5a5611c2c9f419d9f ==> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  0x57edf4a22be3c955ac49da2e2107b67a ==> "12345678901234567890123456789012345678901234567890123456789012345678901234567890"

} {

   puts "“$string” -> [md5::md5 $string] (officially: $hash)"

}</lang>