This post describes an implementation of the Blake 512 hashing algorithm in JavaScript. This algorithm is one of the finalists in the SHA-3 competition, which is due to be completed in the next month or so. The source code for the implementation is available here: blake512.js
You can try it yourself by putting in a string to hash below:
The Algorithm
The specification of the Blake 512 algorithm is available online here: blake.pdf. This document also includes a reference C implementation, together with some test values against which to verify an implementation's correctness. If you're interested in what the algorithm actually does and how it works, that's the place to start reading! The only valid test case from the spec you can use with this implementation is the sample "2-block message", which encodes a sequence of 1152 zero bits, using the standard salt (a sequence of 256 zero bits). To run this test in your browser, open the developer console and enter something like:
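    // 1152 zero bits is 72 sixteen-bit characters, so the message is 72
    // '\u0000' characters; leaving the salt undefined selects the default
    // (256 zero bits), and true requests hexadecimal output
    var message = new Array(73).join('\u0000'); // 72 zero characters
    console.log(blake512(message, undefined, true));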
Hopefully, the output should be:
313717D608E9CF75 8DCB1EB0F0C3CF9F C150B2D500FB33F5 1C52AFC99D358A2F 1374B8A38BBA7974 E7F6EF79CAB16F22 CE1E649D6E01AD95 89C213045D545DDE.
It must be said that JavaScript is certainly not the best language in which to implement this kind of hashing algorithm, and this implementation is significantly slower than the reference C implementation. However, it should be quick enough for general usage: it takes less than 1ms to encode a 2-block message in a modern browser.
Handling the Input
Since there is no native binary data type in JavaScript, this implementation uses JavaScript strings to encode the message to hash. It follows the lead of Chris Drost's blake 32 (256) implementation, which treats the given message as a UCS-2 encoded binary sequence (i.e. JavaScript's native string encoding); e.g. the string '\uB210\u0061' ('눐a') encodes the hex sequence B210 0061. The upside of encoding like this is that it's (hopefully) natural for the user and pretty easy to work with. Unfortunately, however, it means that you can only encode messages whose lengths are multiples of 16 bits. Also, since the maximum possible string length is 2^32 characters (i.e. 2^36 bits, or 8GB), this limits the length of messages you can encode (the maximum message length the algorithm permits is actually 2^128 bits). This could probably be overcome by using arrays of strings instead; as I can't be bothered, I'll leave that as a task for the interested reader!
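You can see the encoding at work by inspecting the 16-bit code units directly in the console:

    var s = '\uB210\u0061'; // '눐a'
    for (var i = 0; i < s.length; i++) {
      // charCodeAt returns the 16-bit code unit; toString(16) prints it
      // in hex (without leading zeros, so 0x0061 appears as "61")
      console.log(s.charCodeAt(i).toString(16)); // logs "b210", then "61"
    }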
JavaScript and Numbers
The Blake algorithm requires a lot of bit-level manipulation, so whilst it's appropriate to initially encode the given message as a JavaScript string, when it comes to actually manipulating it, it's far easier to convert this string into an array of numbers. The algorithm has an initial padding phase in which the given message is broken into a sequence of 1024 bit "blocks". Each block is comprised of a sequence of 16 64-bit "words", and the bit-wise operations are then done on the individual words. Therefore, in a natural encoding, a "word" is an unsigned 64-bit integer, a block is an array of 16 words, and the message is just an array of blocks.
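In terms of the UCS-2 encoded input described above, the sizes work out like this:

    // one word is 64 bits, i.e. 4 sixteen-bit characters; one block is
    // 16 words, i.e. 1024 bits, or 64 characters of the input string
    var CHARS_PER_WORD = 64 / 16;              // 4
    var CHARS_PER_BLOCK = 16 * CHARS_PER_WORD; // 64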
Unfortunately, there is no native 64-bit integer type in JavaScript. There is only one number type, which stores values using a 64 bit floating point representation. This representation can only safely store integer values up to 2^53; see the 64 bit binary type on the wikipedia article for a reference on this (assuming no-one has maliciously altered it!).
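You can see this limit for yourself in the console:

    // above 2^53, adjacent integers share the same floating point
    // representation and become indistinguishable:
    console.log(Math.pow(2, 53) === Math.pow(2, 53) + 1); // true
    console.log(Math.pow(2, 52) === Math.pow(2, 52) + 1); // false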
Crucially, the binary operators required by the algorithm, such as and (&), exclusive-or (^) and the shifts (<<, >>>), are supported by JavaScript. It performs these operations by converting the given numbers to 32 bit signed two's complement big-endian integer representations, performing the operation on these integers, then converting them back. An excellent reference on these operators is available from Mozilla. A side effect of this conversion in and out of 32 bit integers is that the operation will only work if the number can be represented in this way. Another gotcha is that the bit-shift operators only use the 5 least significant bits of the shift amount, so shifting by 32 bits (i.e. binary 100000) gives you the same result as shifting by 0 bits (i.e. a no-op). This can lead to some confusing code; for example, in Chris Drost's 32 bit implementation, there is a line of code to get the number 2^32, which is written something like this:
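    // building 2^32 without a 32-bit shift (which would be a no-op);
    // approximately what his code does - see his source for the exact line:
    var two_32 = (1 << 30) * 4; // = 4294967296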
Although this looks bizarre at first, it's actually quite sensible: you can't do 1 << 32, since you just get 1 back, nor can you do (1 << 16) << 16, as you get 0.
So, without 64 bit integers, we're forced to hack them in some way. The standard way of doing this is to use a 2-element array of 32 bit integers to represent a single 64 bit integer, then implement all the required operations yourself. E.g. you could represent the number 2^64-1 (a sequence of 64 ones) as:
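    [0xFFFFFFFF, 0xFFFFFFFF] // [high 32 bits, low 32 bits]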
There are several implementations that you can use to do this, the most notable being the Google Closure Library's Long class. However, since that code is part of the Closure Library, which I don't want a dependency on, I wrote my own 64 bit number class called Word (i.e. as it represents a single 64 bit "word" in the algorithm). This version also supports the required "bit rotation" operation and is more light-weight, in that all operations alter the state of the Word on which they are called, rather than returning a new instance each time.
Although this Word class is private within the algorithm's code, it might be useful for other applications. The code for it is roughly as follows (a sketch; see blake512.js for the definitive version):
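    // a 64 bit word held as two unsigned 32 bit halves
    function Word(high, low) {
      this.high = high >>> 0; // most significant 32 bits
      this.low = low >>> 0;   // least significant 32 bits
    }

    // in-place addition modulo 2^64
    Word.prototype.add = function (other) {
      var lo = this.low + other.low;       // exact, as the sum is < 2^33
      this.low = lo >>> 0;                 // keep the bottom 32 bits
      var carry = lo > 0xFFFFFFFF ? 1 : 0; // carry into the high half
      this.high = (this.high + other.high + carry) >>> 0;
      return this;
    };

    // in-place exclusive-or
    Word.prototype.xor = function (other) {
      this.high = (this.high ^ other.high) >>> 0;
      this.low = (this.low ^ other.low) >>> 0;
      return this;
    };

    // in-place rotation right by n bits (0 <= n < 64)
    Word.prototype.rotateRight = function (n) {
      var h = this.high, l = this.low;
      if (n >= 32) { // rotating by 32 or more swaps the halves over
        h = this.low; l = this.high; n -= 32;
      }
      if (n > 0) { // n = 0 must be special-cased, as x << 32 is a no-op
        this.high = ((h >>> n) | (l << (32 - n))) >>> 0;
        this.low = ((l >>> n) | (h << (32 - n))) >>> 0;
      } else {
        this.high = h; this.low = l;
      }
      return this;
    };

    // e.g. new Word(0, 1).add(new Word(0, 0xFFFFFFFF)) carries into the
    // high half, giving the word 0x00000001 00000000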
The rest of the code is pretty much a straight implementation of the specification and is available for free use; if there's something you think is odd about it, please let me know!
Using the Algorithm
The code exposes only a single global variable, the function blake512, which expects up to 3 parameters and has, in outline, the following definition:
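    // parameter names as described below; the body is elided here
    function blake512(string, saltStr, outputHex) { /* ... */ }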
The parameters are expected to be as follows:
string (string, required): the message to hash, encoded as previously described.

saltStr (string, optional): the salt to use when hashing. If this is defined, it must be a string of length 16, which is treated as a 256 bit binary sequence, encoded in the same manner as the message. If this is not defined, the default salt, consisting of a sequence of 256 zero bits, is used.

outputHex (boolean, optional): whether the output should be a string encoded in the same manner as the message (the default behavior) or a hexadecimal sequence.
In the case that outputHex is true, code along the following lines (probably non-optimal!) is used to convert the encoded string to a hex sequence:
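    // each 16-bit character becomes four hex digits, left-padded with
    // zeros ("toHex" is an illustrative name, not the one in the source):
    function toHex(s) {
      var hex = '';
      for (var i = 0; i < s.length; i++) {
        var h = s.charCodeAt(i).toString(16);
        while (h.length < 4) { h = '0' + h; }
        hex += h;
      }
      return hex;
    }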
I don't think there'll be any cross-browser issues or other problems, but if you spot something, please let me know and I'll try to fix it (time permitting)!