What's the point of using Docker containers?


Originally, operating systems were designed to run a large number of independent processes. In practice, however, dependencies on specific versions of libraries and specific resource requirements for each application process led to using one operating system – and hence one server – per application. For instance, a database server typically only runs a database, while an application server is hosted on another machine.

Compute virtualization solves this problem, but at a price: each application needs a full operating system, leading to high license and systems management costs. And because even the smallest application needs a full operating system, much memory and many CPU cycles are wasted just to isolate applications from each other. Container technology is a way to solve this issue.

[Figure: Container isolation versus overhead]

The figure above shows the trade-off between the isolation of applications and the overhead of running them. Running each application on a dedicated physical machine provides the highest isolation, but the overhead is very high. Running many applications on one operating system, on the other hand, provides much less isolation, but at a very low overhead per application.

Container technology, also known as operating-system-level virtualization, is a server virtualization method in which the kernel of an operating system provides multiple isolated user-space instances instead of just one. These containers look and feel like a real server from the point of view of their owners and users, but they all share the same operating system kernel. This enables the operating system to run multiple isolated groups of processes that share nothing but the kernel.


Containers are not new: the first UNIX-based containers, introduced in 1979, provided isolation of the root file system via the chroot operation. Solaris subsequently pioneered and explored many enhancements, and Linux control groups (cgroups) adopted many of these ideas.

Containers have been part of the Linux kernel since 2008. What is new is the use of containers to encapsulate all application components, such as dependencies and services. And when all dependencies are encapsulated, applications become portable.

Using containers has a number of benefits:

  • Isolation – applications or application components can be encapsulated in containers, each operating independently and isolated from each other.
  • Portability – since containers typically contain all components the embedded application or application component needs to function, including libraries and patches, they can run on any infrastructure that is capable of running containers with a compatible kernel.
  • Easy deployment – containers allow developers to quickly deploy new software versions, as the containers they define can be moved to production unaltered.

Container technology

Containers are based on three technologies that are all part of the Linux kernel:

  • Chroot (also known as a jail) - changes the apparent root directory for the currently running process and its children and ensures that these processes cannot access files outside the designated directory tree. Chroot was available in Unix as early as 1979.
  • Cgroups - limit and isolate the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. Cgroups have been part of the Linux kernel since 2008.
  • Namespaces - allow complete isolation of an application's view of the operating environment, including process trees, networking, user IDs and mounted file systems. Namespaces have been part of the Linux kernel since 2002.
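To make the chroot idea concrete, here is a minimal Python sketch of the path confinement it provides. It is a toy model done with string manipulation, not the real mechanism: the kernel enforces chroot in the chroot(2) system call, and this sketch ignores symlinks, mounts and already-open file descriptors. The jail directory /srv/jail and the function name resolve_in_jail are hypothetical.

```python
import posixpath


def resolve_in_jail(jail_root: str, path: str) -> str:
    """Map a path as seen inside a chroot jail to the real path on the host.

    Toy model only: symlinks, mounts and all kernel checks are ignored.
    Relative paths are treated as relative to the jail root, i.e. the
    working directory is assumed to be "/".
    """
    # normpath collapses ".." components; ".." at "/" stays at "/",
    # so no number of ".." can climb above the jail root.
    inside = posixpath.normpath(posixpath.join("/", path))
    # Graft the jail-relative path onto the real jail directory on the host.
    return posixpath.normpath(posixpath.join(jail_root, inside.lstrip("/")))


print(resolve_in_jail("/srv/jail", "../../etc/passwd"))
```

A jailed process asking for ../../etc/passwd still ends up inside /srv/jail, which is exactly the property the jail is meant to guarantee.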

Linux Containers (LXC), introduced in 2008, is a combination of chroot, cgroups, and namespaces, providing isolated environments, called containers.

Early Docker versions used LXC as their execution driver; since 2014 Docker defaults to its own libcontainer library instead. Docker adds a Union File System (UFS) to the containers: a way of combining multiple directories into one that appears to contain their combined contents, allowing multiple layers of software to be "stacked". Docker also automates the deployment of applications inside containers.
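The layering idea behind a union file system can be modelled in a few lines. The sketch below is a toy, assuming each layer is a Python dict from path to file content and using a hypothetical WHITEOUT marker for deleted files; real union file systems (and Docker's storage drivers) implement this in the kernel with real files and special whiteout entries. Lookups search from the top layer down, and the first layer that mentions a path wins.

```python
from typing import Dict, List, Optional

# Marker an upper layer uses to say "this file was deleted here".
WHITEOUT = object()

Layer = Dict[str, object]  # path -> file content (or WHITEOUT)


def lookup(layers: List[Layer], path: str) -> Optional[object]:
    """Return the visible content of `path`, or None if it does not exist.

    `layers` is ordered top (writable container layer) to bottom (base image).
    The first layer that mentions the path wins; a whiteout hides lower copies.
    """
    for layer in layers:
        if path in layer:
            content = layer[path]
            return None if content is WHITEOUT else content
    return None


def merged_view(layers: List[Layer]) -> Dict[str, object]:
    """Combine all layers into the single directory tree a container sees."""
    all_paths = set().union(*layers)  # every path mentioned in any layer
    view = {}
    for path in sorted(all_paths):
        content = lookup(layers, path)
        if content is not None:
            view[path] = content
    return view


base = {"/bin/sh": "busybox", "/etc/motd": "welcome"}   # base image layer
app = {"/app/main.py": "v1", "/etc/motd": WHITEOUT}     # application layer
container = {"/app/main.py": "v2"}                      # writable top layer
print(merged_view([container, app, base]))
```

Because lower layers are never modified, many containers can share the same base and application layers while each gets its own writable top layer.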

Containers and security

While containers provide some isolation, they still share the same underlying kernel. Isolation between containers on the same machine is therefore much lower than virtual machine isolation. Virtual machines get their isolation from the hardware, using specialized CPU instructions; containers don't have this level of isolation. However, some operating systems, like Joyent's SmartOS, run on bare metal and provide containers with hardware-based isolation using the same specialized CPU instructions.

Since developers define the contents of containers, security officers lose control over them, which could lead to unnoticed vulnerabilities, such as multiple versions of the same tools, or unpatched, outdated, or unlicensed software. To solve this issue, a repository with predefined and approved container components and a container hierarchy can be implemented.
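A minimal sketch of how such a repository could act as a gate: before a container is deployed, its components are compared with the approved list. The component names, versions and the unapproved_components function are hypothetical; in practice the component list would come from an image scanner or a build manifest.

```python
from typing import Dict, List, Set

# Hypothetical approved repository: component name -> approved versions.
APPROVED: Dict[str, Set[str]] = {
    "openjdk": {"8u92"},
    "nginx": {"1.10.1"},
    "python": {"3.5.2"},
}


def unapproved_components(image: Dict[str, str]) -> List[str]:
    """List the components of a container image that are not approved.

    `image` maps component name -> version. Unknown components and
    unapproved versions of known components are both reported.
    """
    findings = []
    for name, version in sorted(image.items()):
        if version not in APPROVED.get(name, set()):
            findings.append(f"{name}:{version}")
    return findings


print(unapproved_components({"nginx": "1.9.0", "python": "3.5.2", "redis": "3.2"}))
```

An empty result means the image only uses approved components; anything else can be blocked or sent to the security officer for review.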

Container orchestration

Where an operating system abstracts resources such as CPU, RAM, and network connectivity and provides services to applications, container orchestration (also known as a datacenter operating system) abstracts the resources of a cluster of machines and provides services to containers. A container orchestrator allows containers to run anywhere on the cluster: it schedules each container on any machine that has resources available. It acts like a kernel for the combined resources of an entire datacenter instead of the resources of a single computer.
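The scheduling idea can be sketched with a simple first-fit policy: walk the machines in the cluster and place the container on the first one with enough free CPU and memory. This policy is an illustrative assumption, not how any particular orchestrator works; real orchestrators use more elaborate placement strategies (spreading, affinity, priorities). The machine names and resource figures are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Machine:
    name: str
    free_cpu: float      # unallocated CPU cores
    free_mem_gb: float   # unallocated memory in GB
    containers: List[str] = field(default_factory=list)


def schedule(container: str, cpu: float, mem_gb: float,
             cluster: List[Machine]) -> Optional[str]:
    """First-fit: place the container on the first machine with enough room."""
    for machine in cluster:
        if machine.free_cpu >= cpu and machine.free_mem_gb >= mem_gb:
            # Reserve the resources so later placements see less free capacity.
            machine.free_cpu -= cpu
            machine.free_mem_gb -= mem_gb
            machine.containers.append(container)
            return machine.name
    return None  # no machine has room; a real orchestrator would queue or reject it


cluster = [Machine("node1", free_cpu=2, free_mem_gb=4),
           Machine("node2", free_cpu=8, free_mem_gb=16)]
for name, cpu, mem in [("web", 1, 2), ("db", 4, 8), ("cache", 2, 2)]:
    print(name, "->", schedule(name, cpu, mem, cluster))
```

The caller only asks for resources; which machine actually runs the container is the orchestrator's decision, just as a kernel decides which CPU runs a process.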


There are many frameworks for managing container images and orchestrating the container lifecycle. Some examples are:

  • Docker Swarm
  • Apache Mesos
  • Google's Kubernetes
  • Rancher
  • Pivotal Cloud Foundry
  • Mesosphere DC/OS

This entry was posted on Wednesday 22 June 2016

Identity and Access Management

Identity and Access Management (IAM) is the process of managing the identities of people or systems and their permissions on systems.

IAM is a three-step process. In an IAM solution, users or systems first announce who they are (identification: they provide their name). Then their claimed identity is checked (authentication: they provide, for instance, a password, which is checked). Finally, the account is granted the permissions related to its identity and the groups it belongs to (authorization: they are allowed into the system).
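The three steps can be sketched in Python as below. The user store, group table and names are hypothetical; the password check uses the standard library's hashlib.pbkdf2_hmac and hmac.compare_digest, so only a salted hash of the password is stored. The same error message is returned whether the name or the password is wrong, so a caller cannot tell which step failed.

```python
import hashlib
import hmac
import os
from typing import Dict, Set, Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    # Store only a slow, salted hash, never the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user administration: name -> (salt, password hash, groups).
_salt = os.urandom(16)
USERS: Dict[str, Tuple[bytes, bytes, Set[str]]] = {
    "alice": (_salt, hash_password("s3cret", _salt), {"finance"}),
}
# Hypothetical permission table: group -> permissions.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {"finance": {"read_ledger", "write_ledger"}}


def login(name: str, password: str) -> Set[str]:
    # 1. Identification: the user announces who they are.
    record = USERS.get(name)
    if record is None:
        raise PermissionError("login failed")
    salt, stored_hash, groups = record
    # 2. Authentication: the claimed identity is checked (here: a password).
    if not hmac.compare_digest(hash_password(password, salt), stored_hash):
        raise PermissionError("login failed")
    # 3. Authorization: grant the permissions of the user's groups.
    permissions: Set[str] = set()
    for group in groups:
        permissions |= GROUP_PERMISSIONS.get(group, set())
    return permissions


print(sorted(login("alice", "s3cret")))
```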

Most systems have a way to connect identities and their permissions. For instance, the kernel of an operating system maintains a user administration and a list of access rights that describes which identities are allowed to read, write, modify, or delete files.

IAM is not only used at the operating system level, but also in applications, databases, and other systems. Often these systems have their own stand-alone IAM system, which forces users to log in separately to each and every system they use. With single sign-on (SSO), a user logs in once and is passed seamlessly, without an authentication prompt, to all applications configured for it. SSO improves user friendliness, but does not necessarily enhance security: when the main login credentials are compromised, an attacker gains access to all systems. SSO is typically implemented using LDAP, Kerberos, or Microsoft Active Directory.

Federated identity management extends SSO beyond the enterprise, creating a trusted authority for digital identities across multiple organizations. In a federated system, participating organizations share identity attributes based on agreed-upon standards, enabling authentication by other members of the federation and granting appropriate access to systems.

Users can be authenticated in one of three ways:

  • Something you know, like a password or PIN
  • Something you have, like a bank card, a token or a smartphone
  • Something you are, like a fingerprint or an iris scan

Many systems only use a username/password combination (something you know), but more and more systems use multi-factor authentication, where at least two different types of authentication are required. An example is an ATM, where both a bank card (something you have) and a PIN (something you know) are needed.
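The rule that the factors must be of different types can be captured in a small check. The credential names and their mapping to factor types below are hypothetical; the point is that a password plus a PIN is still a single factor (two things you know), while the ATM combination of bank card and PIN counts as two.

```python
from enum import Enum
from typing import Iterable


class Factor(Enum):
    KNOWLEDGE = "something you know"
    POSSESSION = "something you have"
    INHERENCE = "something you are"


# Hypothetical credential names mapped to the factor type they prove.
CREDENTIAL_FACTOR = {
    "password": Factor.KNOWLEDGE,
    "pin": Factor.KNOWLEDGE,
    "bank_card": Factor.POSSESSION,
    "token": Factor.POSSESSION,
    "smartphone": Factor.POSSESSION,
    "fingerprint": Factor.INHERENCE,
    "iris_scan": Factor.INHERENCE,
}


def is_multi_factor(verified_credentials: Iterable[str]) -> bool:
    """True if the successfully verified credentials cover at least two factor types."""
    factor_types = {CREDENTIAL_FACTOR[c] for c in verified_credentials}
    return len(factor_types) >= 2


print(is_multi_factor(["bank_card", "pin"]))   # the ATM case
print(is_multi_factor(["password", "pin"]))    # two things you know
```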

Typically, users are members of one or more groups (usually named after their roles in the organization), and instead of granting permissions to individual users, permissions are granted to these groups. Since groups can be nested (a group can be a member of another group), this so-called Role-Based Access Control (RBAC) is very powerful.
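Resolving nested groups is a graph walk: start from the user's direct groups, follow each group's own memberships, and collect the permissions along the way. The sketch below assumes a hypothetical in-memory directory; a real system would query a directory service such as LDAP or Active Directory. The visited set keeps the walk finite even if groups are accidentally nested in a cycle.

```python
from typing import Dict, Set

# Hypothetical directory: which groups each user or group is a direct member of.
MEMBER_OF: Dict[str, Set[str]] = {
    "alice": {"accountants"},
    "accountants": {"finance"},   # nested: accountants are part of finance
    "finance": {"employees"},
}
# Permissions are granted to groups only, never to individual users.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "accountants": {"approve_invoice"},
    "finance": {"read_ledger"},
    "employees": {"read_intranet"},
}


def effective_permissions(principal: str) -> Set[str]:
    """Union of the permissions of every group the principal is in, directly or via nesting."""
    permissions: Set[str] = set()
    visited: Set[str] = set()
    to_visit = list(MEMBER_OF.get(principal, set()))
    while to_visit:
        group = to_visit.pop()
        if group in visited:  # guards against cycles in the nesting
            continue
        visited.add(group)
        permissions |= GROUP_PERMISSIONS.get(group, set())
        to_visit.extend(MEMBER_OF.get(group, set()))
    return permissions


print(sorted(effective_permissions("alice")))
```

Alice receives the permissions of accountants, finance and employees without a single permission being assigned to her personally.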

This entry was posted on Friday 1 April 2016

Using user profiles to determine infrastructure load

To predict the load a new software system will put on the infrastructure, and to create representative test scripts before the software is built, user profiling can be used.

In order to predict the load on the infrastructure, it is important to have a good indication of the future usage of the system. This can be done by defining a number of typical user groups of the new system (also known as personas) and by creating a list of the tasks they will perform on the new system.

First, a list of personas must be defined, preferably fewer than ten. Representatives of these persona groups must be interviewed to understand how they will use the new system. A list can then be compiled of the main tasks (like logging in, starting the application, opening a document, creating a report, etc.).

For each of these tasks, an estimate can be made of how, and how often, users will use the system's functionality to perform the task. Based on these estimates and the number of users each persona represents, it can be calculated how often each system task is used in a given time frame, and how these system tasks translate into infrastructure tasks. A very simplified example is given below:

Persona | Number of users | System task | Infrastructure task | Frequency (per user)
--- | --- | --- | --- | ---
Data entry officer | 100 | Start application | Read 100 MB data from SAN | Once a day
Data entry officer | 100 | Start application | Transport 100 MB data to workstation | Once a day
Data entry officer | 100 | Enter new data | Transport 50 KB data from workstation to server | 40 per hour
Data entry officer | 100 | Enter new data | Store 50 KB data to SAN | 40 per hour
Data entry officer | 100 | Change existing data | Read 50 KB data from SAN | 10 per hour
Data entry officer | 100 | Change existing data | Transport 50 KB data from server to workstation | 10 per hour
Data entry officer | 100 | Change existing data | Transport 50 KB data from workstation to server | 10 per hour
Data entry officer | 100 | Change existing data | Store 50 KB data to SAN | 10 per hour
Data entry officer | 100 | Close application | Transport 500 KB configuration data from workstation to server | Once a day
Data entry officer | 100 | Close application | Store 500 KB data to SAN | Once a day

Assuming an 8-hour working day (and counting 1 MB as 1,000 KB), this leads to the following profile for this persona group:

Infrastructure task | Per day (KB) | Per second (KB/s)
--- | --- | ---
Data transport from server to workstation | 10,400,000 | 361.1
Data transport from workstation to server | 2,050,000 | 71.2
Data read from SAN | 10,400,000 | 361.1
Data written to SAN | 2,050,000 | 71.2
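The profile above can be reproduced with a short calculation. The sketch assumes an 8-hour working day (the per-second figures equal the per-day totals divided by 28,800 seconds) and 1 MB = 1,000 KB; both assumptions are inferred from the figures in the tables. Each row of the first table becomes a tuple, and the totals are grouped by infrastructure task.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

USERS = 100                    # users represented by the "Data entry officer" persona
WORK_HOURS = 8                 # assumption: an 8-hour working day
WORKDAY_SECONDS = WORK_HOURS * 3600

# (infrastructure task, KB per occurrence, occurrences per user per day),
# taken from the task table with 1 MB counted as 1,000 KB.
TASKS: List[Tuple[str, int, int]] = [
    ("SAN read", 100_000, 1),                          # start application
    ("server to workstation", 100_000, 1),
    ("workstation to server", 50, 40 * WORK_HOURS),    # enter new data
    ("SAN write", 50, 40 * WORK_HOURS),
    ("SAN read", 50, 10 * WORK_HOURS),                 # change existing data
    ("server to workstation", 50, 10 * WORK_HOURS),
    ("workstation to server", 50, 10 * WORK_HOURS),
    ("SAN write", 50, 10 * WORK_HOURS),
    ("workstation to server", 500, 1),                 # close application
    ("SAN write", 500, 1),
]


def profile(tasks: List[Tuple[str, int, int]], users: int) -> Dict[str, Tuple[int, float]]:
    """Return KB per day and average KB per second for each infrastructure task."""
    per_day: Dict[str, int] = defaultdict(int)
    for task, kb, times_per_day in tasks:
        per_day[task] += kb * times_per_day * users
    return {task: (kb, kb / WORKDAY_SECONDS) for task, kb in per_day.items()}


for task, (day, sec) in profile(TASKS, USERS).items():
    print(f"{task:22} {day:>12,} KB/day {sec:7.1f} KB/s")
```

Note that the per-second figures are averages over the working day; hotspots such as the morning start-up will be much higher.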

Of course, in practice this exercise is much more complicated. There might be many personas and complex tasks; tasks are spread over time or show hotspots (like starting the application or logging in, which typically happens at the start of the day); the system can have background processes running; and the load on the system for a specific task can be very hard to predict.

But as this very simplified example shows, user profiles can help determine the load on various parts of the infrastructure, even before the application software is written.

This entry was posted on Sunday 21 February 2016

Public wireless networks

In recent years, wireless networks have become more popular than wired networks for end-user devices. Apart from WLANs based on Wi-Fi, public wireless networks based on GPRS, EDGE, UMTS, and HSDPA are used more every day. The reason is obvious: public wireless networks give mobile users the freedom to move around and provide connectivity in places where wired connections are impossible (like on the road).

Public wireless networks are much less reliable than private networks. Users moving around will often temporarily lose connectivity, and bad signals lead to frequent retransmission of network packets. The bandwidth is also much lower than in private networks: noise and other signal interference, the sharing of the available bandwidth with (many) other users, and retransmissions lead to a low effective bandwidth per end point.

Global System for Mobile Communications (GSM) is the world's most popular standard for mobile telephone systems in which both signaling and speech channels are digital. This technology is also called 2G: the second generation of mobile technology (the first generation, 1G, used analog transmission).

General Packet Radio Service (GPRS) is a packet-oriented mobile data service based on GSM technology, providing data rates of 56 to 114 kbit/s. This technology is also called 2.5G.

Enhanced Data rates for GSM Evolution (EDGE), also known as Enhanced GPRS or 2.75G, allows improved data transmission rates as a backward-compatible extension of GSM. EDGE delivers data rates of up to 384 kbit/s.

Universal Mobile Telecommunications System (UMTS) is an umbrella term for the third-generation (3G) mobile telecommunications transmission standard. UMTS is also known as FOMA or W-CDMA. Compared to GSM, UMTS requires new base stations and new frequency allocations, but it uses a core network derived from GSM, ensuring backward compatibility. UMTS was designed to provide maximum data transfer rates of 45 Mbit/s.

High Speed Downlink Packet Access (HSDPA) is part of the UMTS standard, providing a maximum speed of 7.2 Mbit/s. HSPA+, also known as HSPA Evolution or Evolved HSPA, is an upgrade to HSDPA networks, providing 42 Mbit/s download and 11.5 Mbit/s upload speeds.

LTE (4G)
LTE (Long Term Evolution) is a 4G network technology, designed from the start to transport data (IP packets) rather than voice. LTE is a set of enhancements to UMTS. To use LTE, the core UMTS network must be adapted, which requires changes to the transmitting equipment. The LTE specification provides download peak rates of at least 100 Mbit/s (up to 326 Mbit/s) and upload peak rates of at least 50 Mbit/s (up to 86.4 Mbit/s).

LTE is not designed to handle voice transmissions. When placing or receiving a voice call, LTE handsets typically fall back to the older 2G or 3G networks for the duration of the call. At the time of writing (2015), the Voice over LTE (VoLTE) protocol is being rolled out, which will allow the old 2G and 3G networks to be decommissioned in the future.
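To get a feel for the nominal rates mentioned above, the sketch below computes the best-case time to download a 10 MB file over each technology. It uses the peak rates quoted in this article and ignores protocol overhead, signal quality, retransmissions and sharing with other users, so real transfers over public wireless networks will take considerably longer. The file size is an arbitrary example.

```python
# Nominal peak downlink rates quoted in this article, in kbit/s.
PEAK_DOWNLINK_KBIT_S = {
    "GPRS": 114,
    "EDGE": 384,
    "HSDPA": 7_200,
    "HSPA+": 42_000,
    "LTE": 100_000,  # the "at least 100 Mbit/s" peak from the LTE specification
}


def transfer_seconds(size_mb: float, rate_kbit_s: float) -> float:
    """Best-case time to move size_mb megabytes (1 MB = 1,000 KB, 8 bits per byte)."""
    return size_mb * 1000 * 8 / rate_kbit_s


for tech, rate in PEAK_DOWNLINK_KBIT_S.items():
    print(f"{tech:6} {transfer_seconds(10, rate):7.1f} s for a 10 MB file")
```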

This entry was posted on Thursday 24 December 2015

Supercomputer architecture

A supercomputer is a computer architecture designed to maximize calculation speed. This is in contrast to a mainframe, which is optimized for high I/O throughput. Supercomputers are the fastest machines available at any given time. Since computing speed increases continuously, supercomputers are superseded by new supercomputers all the time.

Supercomputers are used for many tasks, from weather forecast calculations to the rendering of movies like Toy Story and Shrek.

Originally, supercomputers were produced primarily by a company named Cray Research. The Cray-1 was a major success when it was released in 1976. It was faster than all other computers at the time and it went on to become one of the best known and most successful supercomputers in history. The machine cost $8.9 million when introduced.

Cray supercomputers used specially designed vector processors for performing calculations on large sets of data. Together with dedicated hardware for certain instructions (like multiply and divide), this increased performance.

The entire chassis of the Cray-1 was bent into a large C-shape. Speed-dependent portions of the system were placed on the "inside edge" of the chassis, where the wire lengths were shorter, to decrease delays. The system could peak at 250 MFLOPS (million floating-point operations per second).


In 1985, the very advanced Cray-2 was released, capable of a peak performance of 1.9 billion floating-point operations per second (GFLOPS), almost eight times as much as the Cray-1. In comparison, in 2015 the Intel Core i7 5960X CPU has a peak performance of 354 GFLOPS, more than 185 times that of the Cray-2!

Supercomputers as single machines started to disappear in the 1990s. Their work was taken over by clustered computers: a large number of off-the-shelf x86-based servers, connected by fast networks to form one large computer array. Nowadays, high-performance computing is done mainly with large arrays of x86 systems. In 2015, the fastest computer array (Tianhe-2) was a cluster with 3,120,000 processor cores and a theoretical peak performance of 54,902,400 GFLOPS, running Linux.

In some cases, specialized hardware is used to achieve high performance. For example, graphics processors (GPUs) can be used for fast vector-based calculations, and Intel CPUs now contain special instructions to speed up AES encryption.

In 2015, NVIDIA's Tesla GPU PCIe card (basically a graphics card without a graphics connector) provides thousands of vector-based computing cores and more than 8,000 GFLOPS of computing power. Four of these cards can be combined in one system for extremely high-performance calculations, at just a fraction of the cost of a traditional supercomputer.
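The comparisons in this article are simple ratios of peak figures, which can be checked with a few lines of Python. Peak figures ignore differences in numeric precision and real-world efficiency, so these ratios are rough indications only; the Tesla entry uses 8,000 GFLOPS per card as a lower bound.

```python
# Peak performance figures quoted in this article, in GFLOPS.
PEAK_GFLOPS = {
    "Cray-1": 0.25,          # 250 MFLOPS
    "Cray-2": 1.9,
    "Core i7 5960X": 354,
    "4x Tesla": 4 * 8_000,   # lower bound: "more than 8,000 GFLOPS" per card
}


def speedup(new: str, old: str) -> float:
    """How many times faster `new` is than `old`, comparing peak figures only."""
    return PEAK_GFLOPS[new] / PEAK_GFLOPS[old]


print(f"Cray-2 vs Cray-1:        {speedup('Cray-2', 'Cray-1'):.1f}x")
print(f"Core i7 5960X vs Cray-2: {speedup('Core i7 5960X', 'Cray-2'):.1f}x")
print(f"4x Tesla vs Cray-2:      {speedup('4x Tesla', 'Cray-2'):,.0f}x")
```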

This entry was posted on Friday 11 December 2015


Sjaak Laan





The postings on this site are my opinions and do not necessarily represent CGI’s strategies, views or opinions.


Copyright Sjaak Laan