Using Ristretto to cache objects in Go

Introduction

In my previous article on Greycell, I wrote about caching objects using tags and how it can make cache invalidation easier. I used Redis as a temporary store for our objects that needed to be cached. Although Redis is really good for this, I wanted an embedded key-value store in order to avoid additional dependency of maintaining a Redis server. When I explored further, I came across many embedded key-value store like groupcache, fastcache, burrow cache and Dgraph’s Ristretto. Groupcache is an excellent distributed embedded caching library meant to replace a pool of independent Redis or Memcache nodes. Mailgun’s enhancement to groupcache adds explicit key removal, key expiry and many more helpful features. I’d like to write an article about groupcache in the future, but for now I’ll settle for Ristretto.

Ristretto is a high performance, concurrent, memory-bound Go cache. It is contention-proof, scales well and provides consistently high hit-ratios. It is really easy to use with very good documentation and a detailed blog post about it’s architecture and benchmarks.

This article provides a brief implementation of caching objects by tags using Ristretto. Let me first explain the strategy that I’ll use to keep track of tags and the cache keys that are tagged with a specific tag name.

Implementation details

Let’s say we want to tag a cache key with tag name tag1. What we want to do first is to create a set called tag1. We will then add the cache key to be tagged with tag1 to this set. This way, when we want to invalidate a cache key by it’s tag name tag1, we will delete all cache keys that are members of the set tag1.

But unlike Redis, Ristretto is a very simple key-value store. There is no set data structure that the library itself provides. But it should be trivial to create a set data structure in Go using a map. Note that we only want to store strings in the set since cache keys are almost always going to be strings. So let’s create a simple set data structure that accepts strings as it’s members.

// file: ./ds/set.go

package ds

import "sync"

// New StringSet.
func New() *StringSet {
    s := StringSet{
        d: make(map[string]struct{}),
    }
    return &s
}

// We will set the value of the key in the map to this.
var special = struct{}{}

// StringSet struct.
type StringSet struct {
    l sync.RWMutex
    d map[string]struct{}
}

// Add method to add a member to the StringSet.
func (s *StringSet) Add(member string) {
    s.l.Lock()
    defer s.l.Unlock()

    s.d[member] = special
}

// Remove method to remove a member from the StringSet.
func (s *StringSet) Remove(member string) {
    s.l.Lock()
    defer s.l.Unlock()

    delete(s.d, member)
}

// IsMember method to check if a member is present in the StringSet.
func (s *StringSet) IsMember(member string) bool {
    s.l.RLock()
    defer s.l.RUnlock()

    _, found := s.d[member]
    return found
}

// Members method to retrieve all members of the StringSet.
func (s *StringSet) Members() []string {
    s.l.RLock()
    defer s.l.RUnlock()

    keys := make([]string, 0)
    for k := range s.d {
        keys = append(keys, k)
    }
    return keys
}

// Size method to get the cardinality of the StringSet.
func (s *StringSet) Size() int {
    s.l.RLock()
    defer s.l.RUnlock()

    return len(s.d)
}

// Clear method to remove all members from the StringSet.
func (s *StringSet) Clear() {
    s.l.Lock()
    defer s.l.Unlock()

    s.d = make(map[string]struct{})
}

We can use this set data structure like this:

s := ds.New()
s.Add("hello")
s.Add("world")
isMember := s.IsMember("hello") // returns true
members := s.Members() // returns all members of the set.

We can add more methods to this set implementation, like union, intersection etc. But this should be sufficient for our use case here.

Implement a cache store

Now that we have our set data structure, let’s see how we can use this in our cache library. So, let’s initialise our Ristretto cache store.

// file: ./cache/cache.go

package cache

import (
    "log"
    "sync"
    "time"

    "github.com/abvarun226/ristretto-cache/ds"
    "github.com/dgraph-io/ristretto"
)

// New initialises a new cache store.
func New() *Store {
    c, errNew := ristretto.NewCache(&ristretto.Config{
        MaxCost:     1 << 30, // 1GB
        NumCounters: 1e7, // 10M
        BufferItems: 64,
    })
    if errNew != nil {
        log.Fatalf("failed to initialise cache: %v", errNew)
    }
    return &Store{store: c}
}

// Store is our struct for cache store.
type Store struct {
    l     sync.RWMutex
    store *ristretto.Cache
}

The above code snippet is part of our cache library and it initialises the Ristretto cache and returns our local cache store. Based on the settings used, it can store upto 1GB of data in the cache. We’ll now add methods that will help us interact with this local cache store.

// file: ./cache/cache.go

// SetByTags method to set cache by given tags.
func (s *Store) SetByTags(key, value string, expiry time.Duration, tags []string) {
    s.l.Lock()
    defer s.l.Unlock()

    for _, tag := range tags {
        set := ds.New()
        if v, found := s.store.Get(tag); found {
            set = v.(*ds.StringSet)
        }
        set.Add(key)
        s.store.Set(tag, set, 1)
    }

    s.store.SetWithTTL(key, value, 1, expiry)
}

SetByTags method will cache the object and also tag it with the tag names provided as a list of strings. You can use any of these strings to invalidate the cache object. We’ll use the locks to ensure that two or more requests don’t update the same set at the same time.

Now let’s look at the code to invalidate cache using the tags.

// file: ./cache/cache.go

// Invalidate method to invalidate cache with given tags.
func (s *Store) Invalidate(tags []string) {
    s.l.Lock()
    defer s.l.Unlock()

    keys := make([]string, 0)
    for _, tag := range tags {
        set := ds.New()
        if v, found := s.store.Get(tag); found {
            set = v.(*ds.StringSet)
        }
        keys = append(keys, set.Members()...)
        keys = append(keys, tag)
    }

    for _, k := range keys {
        s.store.Del(k)
    }
}

Again, here we use locks to synchronise any changes to the set with the given tag name. This method will get all the cache keys associated with the tag name and delete the keys from the cache store, along with the set.

As for other methods to interact with the cache store, it is pretty straight forward.

// file: ./cache/cache.go

// Get method to retrieve the value of a key. If not present, returns false.
func (s *Store) Get(key string) (string, bool) {
    var value string
    val, found := s.store.Get(key)
    if found {
        value = val.(string)
    }
    return value, found
}

// Delete method to delete a key from cache.
func (s *Store) Delete(key string) {
    s.store.Del(key)
}

// Close method clear and then close the cache store.
func (s *Store) Close() {
    s.store.Clear()
    s.store.Close()
}

Here is how you can use the cache store.

// file: ./example.go

// create a new cache.
c := cache.New()

key1 := "key1"
val1 := "value1"
tags := []string{"tag1", "tag2"}

// cache value and set the tags along with expiry.
c.SetByTags(key1, val1, expiry, tags)

// Do some work here.
time.Sleep(5 * time.Millisecond)

// check if key exists in cache.
if res, found := c.Get(key1); found {
    fmt.Printf("found the key `%s`: `%s`", key1, res)
} else {
    fmt.Printf("key `%s` not found", key1)
}

// Invalidate cache using a tag.
c.Invalidate(tags[0])

// now, check if key exists in cache.
if res, found := c.Get(key1); found {
    fmt.Printf("found the key `%s`: `%s`", key1, res)
} else {
    fmt.Printf("key `%s` not found", key1)
}

The output of the above snippet should look like this:

found the key `key1`: `value1`
key `key1` not found

Final Thoughts

Ristretto is quite performant, probably faster than using Redis, since it avoids a network call to a standalone Redis server. It is pretty easy to implement a caching strategy using Ristretto, but if you want the same cache objects to be available across different processes or instances, then you’d need to use a Redis server or probably a distributed caching library like Groupcache. I’ll write an article about how we can use Groupcache in a future post.

You can access the code here: https://github.com/abvarun226/ristretto-cache

I hope this was helpful and if you find any errors or issues in this post, please do let me know in the comments section below.

In case you would like to get notified about more articles like this, please subscribe to my substack.



Comments

comments powered by Disqus