Go kata - generating random strings

I like doing katas and exercise to practice the skill – and I find it especially helpful when mastering a new language, as I’m doing now with Go. So today, when my better half asked me to help her with the launch of her project, I decided to reject the first instinct to do it in Ruby, and practice some Go instead. It turned out pretty well.

The task was simple: generate 500,000 random promotional codes, to be uploaded into the database later. Codes should be 8 symbols long, and contain digits and uppercase letters. Similarly looking symbols should be removed – e.g. 0 and O, I and 1 are out.

Simple enough. It would be easy enough to slap together some hacky script to do it and call it a day (and the version was actually quite like that), but let’s approach it with the good design in mind, and make it a worthwhile exercise.

The task is clearly could be split in two parts, with independent responsibilities: the logic of generating codes, and input/output interface. Let’s start with the generator part.

NB: full final code for this exercise is available on GitHub

Generator

I start with the basic structures describing the task:

package generator

// Task setup
type Task struct {
	Chars  string
	Length int64
}

// Code represents generated codes
type Code struct {
	Index int
	Value string
}

And let’s add a constructor function, that returns a task structure with some sensible defaults. Start with a test here:

package generator_test

import (
	"morhekil/training/randstr/generator"
	"testing"
)

func TestNew(t *testing.T) {
	task := generator.New()
	if task.Length == 0 {
		t.Errorf("length does not have default value")
	}
	if task.Chars == "" {
		t.Errorf("chars does not have default value")
	}
}

The implementation is, of course, straightforward enough. I’m importing uniuri already here to use its values as the defaults:

package generator

// ... skipped earlier code ...

// New task structure
func New() Task {
	return Task{
		Chars:  string(uniuri.StdChars),
		Length: uniuri.StdLen
	}
}

Ok, now that we’re done with the boilerplate, let’s get to the fun part. One here comes the first decision to make – how to actually pass the values back to the caller? I don’t want to implement any kind of output in this package, as it isn’t its concern, and I don’t want to build out a full array of codes and pass it back – it will waste some memory that I don’t want to be wasted.

What else? In Ruby I would pass a lambda to the generator method, and let that lambda deal with the processing of generated codes. It way my first answer here, too, but then I remembered about channels. Channel is an effective to exactly this kind of problem, and it will let me to parallelise code generation and output.

Ok, with this decision in mind – let’s write a test for the generation part.

package generator_test

import (
	"morhekil/training/randstr/generator"
	"regexp"
	"testing"
)

// ... skipped earlier code ...

func TestGenerate(t *testing.T) {
	codes := make(map[string]struct{})
	var n int

	task := generator.New()
	task.Chars = "0123456789"
	task.Length = 5

	for code := range task.Generate(100) {
		// Index received in order
		if n != code.Index {
			t.Fatalf("index %d received, %d expected", code.Index, n)
		}
		n++
		// Code value is unique
		if _, dup := codes[code.Value]; dup {
			t.Fatalf("duplicate code %s received", code.Value)
		}
		// Code value has the right length
		if len(code.Value) != int(task.Length) {
			t.Fatalf("code %s has wrong length", code.Value)
		}
		// Code contains allowed characters only
		if match, _ := regexp.Match("^["+task.Chars+"]+$$", []byte(code.Value)); !match {
			t.Fatalf("code %s contains invalid characters", code.Value)
		}

		codes[code.Value] = struct{}{}
	}
	// Expected number of codes has been received
	if n != 100 {
		t.Errorf("expected 100 codes to be generated, got %d", n)
	}
}

The interesting thing to mention here is the duplication check. Often it is approached by adding all generated codes to an array, and testing every new code against that array to see if we have received it earlier. The problem with this approach is that arrays are really slow for a sequential search, so this algorithm brings a serious performance hit with it. The solution? Use map! Maps in Go are implemented as multilevel hash tables, so they can keep growing in memory as they need to, and stay effective for quick membership checks.

The second trick, that complements the map lookup, is the use of empty struct for map values. Empty struct is an interesting data structure in Go, and one where Go’s low-level optimisations really shine through again. It consumes no data storage, but it can be constructed, can be used to construct other types, and even can be used as a method receiver. Check out The Empty Struct article by Dave Cheney to learn more, here I’ll just point that a syntax to instantiate the empty struct looks like struct{}{}.

Armed with this knowledge, let’s start with the simple Generate method, that is the public API for this operation:

// Generate the given number of generated codes using the existing task setup.
// Codes are printed to stdout, and progress count is printed to stderr
func (task *Task) Generate(count int) (codes chan Code) {
	codes = make(chan Code)
	go task.generate(count, codes)
	return
}

It just creates a channel, fires off the generator to push values into the channel, and returns the channel back to the caller.

And the rest of the private methods to generate the codes:

// generate the given number of codes, and write them to the channel
func (task *Task) generate(count int, codes chan Code) {
	defer close(codes)

	for i := 0; i < int(count); i++ {
		codes <- Code{Index: i, Value: task.next()}
	}
}

// next unique code according to the setup rules
func (task *Task) next() (code string) {
	for dup := true; dup; {
		code = uniuri.NewLenChars(int(task.Length), []byte(task.Chars))
		_, dup = task.codes[code]
	}
	task.codes[code] = struct{}{}
	return
}

Again, the only thing worth pointing our here is that loop construct – as Go does not have explicit while or until loops, but only a generalised for loop, I’m using the fact that map lookup returns the second boolean value to indicate an existence of the requested key. So feeding that back ntoto the for loop, and I’ve got the logic I need.

You can find the final version of the generator code on GitHub

CLI interface

Now that I’m done with the logic that generates the codes, let’s mode on to wrapping it into a nice command line interface.

I’d like the tool to take three arguments – number of codes to generate, the length of the codes, and characters allowed in the codes. All three are mandatory.

The tool will print generated codes into stdout, and will also display a progress count in stderr. It can be launched as ./randstr > res.csv, and give a feedback about its progress as it generates the codes.

Main function is quite simple:

// main function parses command line arguments to get the number of codes
// to generate, and invokes the generator
func main() {
	var count int
	var err error
	task := generator.New()

	if count, err = setup(&task); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	for code := range task.Generate(count) {
		fmt.Fprintf(os.Stderr, "%d\r", code.Index)
		fmt.Fprintf(os.Stdout, "%s\n", code.Value)
	}
}

The channel interface that I built into the generator allows for a nice and clean separation of concerns here – my main function just receives codes as they are generated, and prints them out in any format it likes.

setup function parses the command line arguments, and configures the task accordingly:

// New parses command line arguments and return task configuration
func setup(task *generator.Task) (int, error) {
	// Make sure we've got two CLI arguments, if we don't - print short help
	if len(os.Args) < 4 {
		return 0, errors.New("Usage: ./randstr <number_of_codes_to_generate> <length> <chars>")
	}

	// First argument should be an integer, goes for the number of codes
	count, err := strconv.ParseInt(os.Args[1], 10, 0)
	if err != nil {
		return 0, errors.New("number of codes should be an integer")
	}

	// First argument should be an integer, goes for the number of codes
	task.Length, err = strconv.ParseInt(os.Args[2], 10, 0)
	if err != nil {
		return 0, errors.New("code length should be an integer")
	}

	// Chars argument can be anything
	task.Chars = os.Args[3]

	return int(count), nil
}

Conclusion

And that is all, folks! I’m quite happy with the final version of the code, as I managed to do a clean separation of concerns between generator and interface parts of the code, and I learned a trick with empty struct along the way.

Is it overengineered? Well, for the simple task that I had at hand – yes, maybe, but the point of the exercise was to practice good code structure and design, so I’m happy with the final result being somewhat verbose.

Here’s the tool in action:

$ ./randstr
Usage: ./randstr <number_of_codes_to_generate> <length> <chars>
$ ./randstr 5 9 0123456789ABCDEF
16D4A7117
08AC29F77
0417FBD92
2B8F60C9F
0B414D0BB

Full final code is available on GitHub

Do you think I’ve done something wrong? Or is there anything that isn’t idiomatic Go in this code? I’d love to hear a critique and feedback over this exercise in the comments.

Work with me

Or maybe you would like me to write code like this for you? I’m starting to look around for an interesting project to join, so you have one and want me to help you make it awesome – now is the perfect time to get in touch!

SpeakMy.Name

Go kata - generating random strings

Generator

CLI interface

Conclusion

Work with me

Work with me!

Comments

My Stuff Online

Blog's extra