Knowledge: Tech Blog

Where we share our thoughts on technology and design.

Data Engineering

Oliver Kenyon

Using Kafka and Grafana to monitor meteorological conditions

Apache Kafka provides a distributed log store that is used by an increasing number of companies and often forms the heart of systems processing huge amounts of data. This post shows how to use it to store meteorological data and display it in a graphical dashboard with Graphite and Grafana.

Zinat Wali

Cassandra - Achieving high availability while maintaining consistency

A discussion about Cassandra consistency levels and replication factor, which are frequently misunderstood. This post explains the Cassandra infrastructure and how its configuration can be tuned.

Bartosz Jedrzejewski

Chess data mining with Apache Spark and Lichess

Lichess makes over 100GB of chess games from 2017 available on their website. This post shows how this data can be transformed with Apache Spark and analysed. Something for Data Engineers and Chess Enthusiasts alike!

Andrew Carr

The Big Data technologies that saved BP $7bn

Yesterday the Financial Times boldly declared that BP saved $7bn since 2014 by investing in Big Data technologies. I spent a couple of hours researching Big Data technologies associated with BP members of staff to try and build up a picture of exactly which technologies they are using.

Bartosz Jedrzejewski

Successful microservices architecture with the Twelve-Factor App

Using microservices in your architecture is a very popular choice. Unfortunately, it is also challenging to get right. With the help of the Twelve-Factor methodology, I will tell you how to set yourself up for success rather than disappointment.

Zinat Wali

Load testing Alteryx API with Gatling

A successful attempt at load testing the Alteryx API with Gatling, and a not-so-successful attempt with Apache JMeter.

William Ferguson

Creating a Parallel Particle Simulation in Go

Following on from my previous blog post where I created a simple particle simulation using Go, I looked at adding some complexity.

William Ferguson

Creating Simple Web Services in Go

Following on from making a simple particle simulation in Go, I decided to try and implement a simple set of web services too.

James White

Cassandra vs. MariaDB, Scaling

In this post we compare how Cassandra and MariaDB can be configured to operate in clusters and how this affects query response times. We found Cassandra to scale well and to be highly configurable. MariaDB can be clustered with Galera Cluster, but this does not provide horizontal scaling. NDB can also be used to scale MySQL, but it was not as configurable as Cassandra.

Ross Hendry

Keeping Secrets in Docker

Docker 1.13 introduces a simple way of providing secrets to containers.

Dave Ogle

Cassandra vs. MariaDB

We've been comparing Cassandra and MariaDB in single-node setups, looking at performance and ease of use from a development perspective. In this article we explore the issues at play in such a setup, such as the differences in queries, speed of response and the features that separate these two technologies.

Dominic Ketley

StreamSets with Docker - an example HDFS integration

StreamSets Data Collector (SDC) is an open source tool for stream-based extracting, transforming and loading large quantities of data. It provides an easy to use UI on top of the underlying processing power of YARN and Spark Streaming with a large number of installable integrations with source and destination systems....

Daniel Cook

The Rise of Big Data Streaming

With the advent of the Internet of Things, the world of Big Data couldn't be more relevant. This post gives an overview of technologies that achieve processing at scale and in real time.

Darren Smith

Using Rally to benchmark Elasticsearch queries

In this post I describe how to use Elastic's Rally to generate benchmarks for your private Elasticsearch queries and clusters. I'll be creating a benchmark which allows comparison of an unscored query with one where scoring is enabled.

Bartosz Jedrzejewski

Spring Boot and MongoDB - a perfect match!

The popularity of Spring Boot in the Java world is undeniable. In this post I will show you yet another reason for this. Using Spring Boot makes working with MongoDB an absolute pleasure.

William Ferguson

Creating a Simple Particle Simulation with Go

In this post, I take a look at writing a simple particle simulation as a way of learning some of the basics of the Go language.

Chris Smith

Docker 1.12 swarm mode - round robin inside and out

This post demonstrates how Docker 1.12 swarm mode round robins the containers in a service both for incoming connections (ingress) and DNS within the swarm.

Chris Smith

Declarative CI / CD with Concourse

This post describes the Concourse build system and explains why declarative CI / CD is so compelling. No more pet build servers!

David Wybourn

Service discovery with Docker Swarm

For the last few months we’ve been working on a very DevOps-focused project. As such, we’ve used AWS, infrastructure as code, Docker and microservices. The different microservices were initially all running on one box, each on a different port. This solution wasn’t scalable or very practical. We couldn’t have...

James Hill

Bitcoin payments and the Lightning Network

This is the second blog post orientated around Bitcoin and its inner workings. The first post took the blockchain and broke down the algorithms which create the fundamental structure of any cryptocurrency. The post was separated into two sections; the first focusing on the block header and the second focusing...

Bartosz Jedrzejewski

Code reuse in microservices architecture - with Spring Boot

In most microservice architectures, there are many opportunities and temptations for sharing code. In this post I will give advice based on my experience on when it should be avoided and when code reuse is acceptable. The points will be illustrated with the help of an example Spring Boot project.

Ross Hendry

Writing a Docker Volume Plugin for S3

An experiment in writing a volume plugin for Docker.

Chris Smith

Playing with Docker Compose and Erlang

This post uses Docker Compose to spin up a three-container HTTP server. One container services the HTTP requests and delegates work to two other containers in a load-balanced way. Erlang is used for development to add a bit of extra challenge!

Andrew Carr

Why Apache Spark is getting so much momentum behind it

Apache Spark has quickly become the largest open source project in big data, but why has it suddenly got so much momentum behind it?

David Wybourn

Introduction to Hadoop and MapReduce

What is ‘Big’ Data? Big data is one of those buzz phrases that gets thrown around a lot. Companies love saying they work with ‘Big’ data, but what is ‘Big’ data? When does data get so big that it can be called Big data? One Gigabyte? How about a Terabyte,...

Bartosz Jedrzejewski

Java Microservices - How to get started in minutes using Dropwizard

This blog shows how to get started with microservices using Dropwizard. It guides the reader through building a simple task-list service.

Thomas Kelly

Generalizing OData

OData Controllers offer an easy interface between data and your application, but require one controller per model type. These controllers often have a large amount of almost identical code. In this blog post, we look at using C# Generics to remove this duplication.

Ian Sullivan

Creating a High Performance Stock Ticker Using Haskell

This post demonstrates how to create an efficient stock ticker app using HTML5 WebSockets and a Haskell server.