…use a proxy to speed up Docker image creation

Sometimes it is very convenient to use a proxy (Squid or any other) to speed up development. When creating an image you might download some packages and then change some steps, which forces you to download those packages again. To ease this a bit, you can run a proxy and instruct Docker to use it.
I started by configuring a proxy for the Docker daemon, but once I finally had it working I realized that it was proxying only the images I downloaded from the internet, which is not much of a benefit. Anyway, it is documented below.
I then tried to hook a proxy between the build process and the internet. After a few hours, I came across this nice post by Jerome Petazzoni. His work, published on GitHub, goes further than what the post describes, and its documentation is not very clear, so I summarize it here and describe a small issue I hit on Fedora 20 (Docker 1.3.0).

Proxy for images

Here is a description of the steps required to use a proxy with the daemon.

Install and configure the proxy

Installation is quite easy; on Fedora just use yum to install Squid:
$ yum -y install squid
On Fedora 20, the Squid configuration file is /etc/squid/squid.conf. We will adapt it to our usage.
The configuration depends on your preferences; this is just an example of mine.
  • Uncomment the cache_dir directive and set the maximum size of the cache directory. Example:
cache_dir ufs /var/spool/squid 20000 16 256
sets the maximum cache size to 20000 MB (about 20 GB), stored under 16 first-level and 256 second-level subdirectories.
  • Add the maximum_object_size directive and set it to the largest file size you want to cache. Example:
maximum_object_size 5 GB
allows caching of DVD-sized files.
  • Optional: disable caching for some domains. If some files or mirrors are already on your local network and you don't want to cache them (access is already fast enough), you can express that with the acl and cache directives. This example disables caching of all traffic to the .redhat.com domain:
acl redhat dstdomain .redhat.com
cache deny redhat
  • Start the Squid service:
$ service squid start
We do not enable Squid at boot, as we only want to use it for Docker image development.
  • Make sure neither iptables nor SELinux blocks Squid on port 3128 (the default).
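The cache_dir line packs several positional values: the storage type, the cache directory, the cache size in megabytes, and the number of first- and second-level subdirectories Squid creates under it. The Java sketch below splits the example line into those fields and does the arithmetic. The `CacheDir` record is a hypothetical helper for illustration only, not part of Squid, and it simply ignores any options that follow the two directory counts.

```java
import java.util.Locale;

// Hypothetical helper that explains a Squid "cache_dir" line for the ufs store:
//   cache_dir <type> <directory> <size-in-MB> <L1-dirs> <L2-dirs> [options...]
record CacheDir(String type, String directory, long sizeMb, int level1Dirs, int level2Dirs) {

    static CacheDir parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 6 || !f[0].equals("cache_dir")) {
            throw new IllegalArgumentException("not a cache_dir line: " + line);
        }
        // Any options after the L2 count are ignored.
        return new CacheDir(f[1], f[2], Long.parseLong(f[3]),
                Integer.parseInt(f[4]), Integer.parseInt(f[5]));
    }

    // Squid creates L1 first-level directories, each holding L2 subdirectories.
    int leafDirectories() {
        return level1Dirs * level2Dirs;
    }

    public static void main(String[] args) {
        CacheDir c = parse("cache_dir ufs /var/spool/squid 20000 16 256");
        System.out.printf(Locale.ROOT, "%s cache in %s: %d MB (~%.1f GiB), %d subdirectories%n",
                c.type(), c.directory(), c.sizeMb(), c.sizeMb() / 1024.0, c.leafDirectories());
    }
}
```

Run against the example line, it reports 20000 MB (about 19.5 GiB) spread over 16 × 256 = 4096 subdirectories.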

Configure Docker to use a proxy

By now, Squid is running on port 3128 (the default). We just need to instruct Docker to use it when the daemon fetches images from the internet.
To do so, set the http_proxy environment variable (and its variants) for the Docker daemon.
On Fedora 20, add the following to the /etc/sysconfig/docker configuration file:
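# Values are spelled out literally so this works whether the file is sourced by a shell or read as a systemd EnvironmentFile.
HTTP_PROXY=http://localhost:3128
http_proxy=http://localhost:3128
HTTPS_PROXY=http://localhost:3128
https_proxy=http://localhost:3128

export HTTP_PROXY HTTPS_PROXY http_proxy https_proxy

# This line already existed. Only the lines above it have been added.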
OPTIONS=--selinux-enabled
Now you need to restart the daemon:
$ systemctl daemon-reload
$ systemctl restart docker.service
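The daemon only learns about the proxy through these environment variables, whose value is simply the host and port of an HTTP proxy. As an illustration of how a client conventionally interprets such a value (upper-case name checked before lower-case, empty values skipped), here is a hypothetical Java helper that turns it into a `java.net.Proxy`. It is a sketch of the common convention, not Docker's own code, and it assumes port 80 when the URL has none.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.util.Map;

// Hypothetical helper: interpret HTTP_PROXY / http_proxy the conventional way.
// Illustration only; this is not how the Docker daemon is implemented.
final class EnvProxy {

    static Proxy fromEnv(Map<String, String> env) {
        // Upper-case name first, then lower-case; empty values are skipped.
        for (String name : new String[] {"HTTP_PROXY", "http_proxy"}) {
            String value = env.get(name);
            if (value != null && !value.isBlank()) {
                return toProxy(value);
            }
        }
        return Proxy.NO_PROXY;
    }

    static Proxy toProxy(String value) {
        URI uri = URI.create(value.trim());
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("expected a URL like http://host:port, got " + value);
        }
        int port = uri.getPort() != -1 ? uri.getPort() : 80; // assumption: default to port 80
        // createUnresolved: no DNS lookup is needed just to describe the proxy.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(uri.getHost(), port));
    }

    public static void main(String[] args) {
        Proxy p = fromEnv(Map.of("HTTP_PROXY", "http://localhost:3128"));
        InetSocketAddress addr = (InetSocketAddress) p.address();
        System.out.println(p.type() + " proxy at " + addr.getHostString() + ":" + addr.getPort());
    }
}
```

With the value from /etc/sysconfig/docker, it prints `HTTP proxy at localhost:3128`; passing `System.getenv()` instead of the literal map reads the real environment.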

Create images

Now, when you pull an image, the download goes through the proxy. If you delete the image locally and pull it again, it will be served from the proxy cache.
This might not seem like a big benefit, but on a local LAN you can use it as a shared proxy/cache for the Hub (or a registry).

Proxy for image build contents

As I said before, it is usually more interesting to proxy what goes into the images you are developing, so that when you invalidate a layer (by modifying the Dockerfile), the next build does not go to the internet again.
Following Jerome's blog post and his GitHub repository, this is what I did.
First, I cloned his repository locally:
$ git clone https://github.com/jpetazzo/squid-in-a-can.git squid-in-a-can.git
Then, from inside the cloned directory (fig reads the fig.yml found there), I ran:
$ cd squid-in-a-can.git
$ fig up -d squid && fig run tproxy
You need fig for this, but who does not have it?
Then you just run a normal docker build. The first time, every download goes into the "squid" container; later builds fetch it from there.
While doing this, I hit an issue, and I do not really know whether it was specific to my environment or affects any Fedora 20/Docker 1.3.0 setup: I was getting a "host unreachable" error. It turned out that my iptables configuration had rules rejecting everything with icmp-host-prohibited. I solved it by removing those rules.
First, I saved the current rules and made a copy to edit:
$ iptables-save > iptables-original.conf
$ cp iptables-original.conf iptables-new.conf
Then I commented out these lines in iptables-new.conf:
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
Finally, I loaded the new iptables configuration:
$ iptables-restore < iptables-new.conf
I also opened an issue on the squid-in-a-can GitHub repository to see whether Jerome has an answer for this.

Options

The container created this way stores the cached data inside it, so if you remove the container, you remove the cache. There are two options to deal with this:
  • The first option is to mount a local directory as a volume. To do so, edit fig.yml in the project's source directory.
  • The second option is to use your local Squid (if you already have one). Then you only need to run the second container (tproxy), or just add and remove the iptables rule yourself:
    • Start proxying (assuming Squid is running):
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3128
    • Stop proxying:
iptables -t nat -D PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3128

…proxying a SOAP service with 3 faults in SwitchYard

This document describes a problem that I faced with SwitchYard due to one of its known issues/features/limitations.

Problem

I needed to create a proxy service for a SOAP web service whose contract declares 3 faults:
public interface CustomerListServicePortType {
    public CustomerListResponse getCustomerList(CustomerListRequest parameters)
            throws GeneralError, CustomerNotFound, InvalidUserCredentials;
}
SwitchYard has one limitation: it only accepts contracts with a single exception type (just as it only accepts a single input type). When I created the initial service and deployed it, SwitchYard reported:
org.switchyard.SwitchYardException: SWITCHYARD010005: Service operations on a Java interface can only throw one type of exception.
One option would be to modify the contract, but since this service proxies a legacy service, I need to keep its contract unchanged. I looked into various options; here I describe the one that was easiest for me.

Solution

I created an internal contract for my service that declares only one exception:
public interface CustomerListServicePortType {
    public CustomerListResponse getCustomerList(CustomerListRequest parameters)
            throws CustomerListException;
}
Then I used transformers to map the original exceptions to and from my new single exception. Since what really gets marshalled/unmarshalled in SOAP fault handling is the FaultInfo, I decided to keep the original FaultInfo in my new exception:
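import org.w3c.dom.Element;

public class CustomerListException extends Exception {

    private static final long serialVersionUID = 1L;

    // The original SOAP fault detail (FaultInfo), kept untouched so it can be
    // handed back as-is when mapping to the original fault.
    private final Element faultInfo;

    public CustomerListException(Element cause) {
        faultInfo = cause;
    }

    public Element getFaultInfo() {
        return faultInfo;
    }
}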
My transformers were so simple that I was happy not to have to deal with DOM parsing, Element and all that stuff:
import org.switchyard.annotations.Transformer;
import org.w3c.dom.Element;

// Maps each of the 3 original SOAP faults to and from the single CustomerListException.
public final class ExceptionTransformers {

    // Original faults -> CustomerListException (the fault element is wrapped unchanged).
    @Transformer(from = "{http://common/errorcodes}invalidUserCredentials")
    public CustomerListException transformInvalidUserCredentialsToCustomerListEx(Element from) {
        return new CustomerListException(from);
    }

    @Transformer(from = "{http://common/errorcodes}generalError")
    public CustomerListException transformGeneralErrorToCustomerListEx(Element from) {
        return new CustomerListException(from);
    }

    @Transformer(from = "{http://common/errorcodes}customerNotFound")
    public CustomerListException transformCustomerNotFoundToCustomerListEx(Element from) {
        return new CustomerListException(from);
    }

    // CustomerListException -> original faults (the wrapped element is returned as-is).
    @Transformer(to = "{http://common/errorcodes}customerNotFound")
    public Element transformCustomerListExToCustomerNotFound(CustomerListException e) {
        return e.getFaultInfo();
    }

    @Transformer(to = "{http://common/errorcodes}generalError")
    public Element transformCustomerListExToGeneralError(CustomerListException e) {
        return e.getFaultInfo();
    }

    @Transformer(to = "{http://common/errorcodes}invalidUserCredentials")
    public Element transformCustomerListExToInvalidUserCredentials(CustomerListException e) {
        return e.getFaultInfo();
    }
}
These transformers get registered as Java transformers (thanks to the @Transformer annotation).
And everything works like a charm.
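The approach relies on the fault's DOM `Element` passing through the single exception untouched, so each "to" transformer returns exactly what the matching "from" transformer received. The following plain-JDK sketch reproduces that round trip without the SwitchYard runtime. `FaultRoundTrip`, `sampleFault` and the sample fault text are hypothetical, `wrap`/`unwrap` merely stand in for the annotated transformer methods, and `CustomerListException` is a trimmed copy of the class above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FaultRoundTrip {

    static final String ERRORCODES_NS = "http://common/errorcodes";

    // Builds a sample fault element such as {http://common/errorcodes}customerNotFound.
    static Element sampleFault(String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element fault = doc.createElementNS(ERRORCODES_NS, "err:" + localName);
            fault.setTextContent("sample fault detail");
            doc.appendChild(fault);
            return fault;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stands in for a "from" transformer: original fault -> single exception.
    static CustomerListException wrap(Element fault) {
        return new CustomerListException(fault);
    }

    // Stands in for a "to" transformer: single exception -> original fault.
    static Element unwrap(CustomerListException e) {
        return e.getFaultInfo();
    }

    public static void main(String[] args) {
        Element original = sampleFault("customerNotFound");
        Element back = unwrap(wrap(original));
        System.out.println(back == original);
        System.out.println("{" + back.getNamespaceURI() + "}" + back.getLocalName());
    }
}

// Trimmed copy of the single exception used in the internal contract.
class CustomerListException extends Exception {
    private static final long serialVersionUID = 1L;
    private final Element faultInfo;

    CustomerListException(Element cause) {
        this.faultInfo = cause;
    }

    Element getFaultInfo() {
        return faultInfo;
    }
}
```

It prints `true` followed by `{http://common/errorcodes}customerNotFound`: the same element object, with its namespace and name intact, comes back out, which is why no DOM parsing is needed in the transformers.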

…Docker layer size explained

When you create a Docker image, its final size matters a lot, because people will have to download it from somewhere (maybe the internet) at least the first time, and again every time the image changes (at least all the changed or new layers).
I was curious about how to optimize the size of a layer, because I had read that Docker internally uses a "copy-on-write filesystem", so every write you make while creating a layer stays there, even if you later remove the software.
I decided to validate this, explain how it works, and show how to optimize the size of a layer.
I ran 3 main tests to validate the concept, using the JBoss WildFly image (available on GitHub) as a base. Since this image is composed of 2 base images on top of Fedora plus the WildFly image itself, I merged everything into a single Dockerfile.

Test 1 – Every command in a separate line

This first test demonstrates that every command creates a layer, so if you split commands across separate instructions, you end up with many more layers and much more space used.
The code for these Dockerfiles is available on GitHub:
Image sizes
The conclusion is to avoid creating unnecessary layers, and to combine shell commands within a single Docker instruction, for example multiple yum install && yum clean all in one RUN.

Test 2 – Uncompressing while downloading vs remove downladed file after decompressing

In this test, I wanted to check whether "copy-on-write" meant that a removed file still occupies some disk space. For this purpose, I compared uncompressing a file while downloading it directly from the internet versus saving the file, decompressing it, and then removing it.
The code for these Dockerfiles is available on GitHub:
Image sizes
The conclusion is that both approaches produce the same size, as long as everything is done in a single Docker instruction.
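A toy model helps explain why Test 1 and Test 2 come out this way: each instruction produces a layer holding only what that step wrote, and deleting a file that a lower layer wrote only hides it. The Java sketch below is a simplification under that assumption (it ignores whiteout entries, metadata and compression), `LayerModel` is a hypothetical name, and the sizes are made-up numbers, not the measurements from these tests.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a layered, copy-on-write image: the image size is the sum of
// what each layer wrote, and a deletion only frees space in its own layer.
final class LayerModel {

    static final class Layer {
        private final Map<String, Long> written = new HashMap<>();

        Layer write(String path, long size) {
            written.put(path, size);
            return this;
        }

        // Removes the file only if this same layer wrote it; a file from a lower
        // layer would merely be hidden (whiteout), which frees nothing below.
        Layer delete(String path) {
            written.remove(path);
            return this;
        }

        long size() {
            return written.values().stream().mapToLong(Long::longValue).sum();
        }
    }

    static long imageSize(Layer... layers) {
        long total = 0;
        for (Layer layer : layers) {
            total += layer.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sizes in MB: archive 100, extracted tree 120.
        long separate = imageSize(
                new Layer().write("/opt/wildfly.tar.gz", 100),
                new Layer().write("/opt/wildfly", 120),
                new Layer().delete("/opt/wildfly.tar.gz"));
        long combined = imageSize(
                new Layer().write("/opt/wildfly.tar.gz", 100)
                        .write("/opt/wildfly", 120)
                        .delete("/opt/wildfly.tar.gz"));
        long streamed = imageSize(new Layer().write("/opt/wildfly", 120)); // download piped into tar
        System.out.println(separate + " " + combined + " " + streamed); // prints 220 120 120
    }
}
```

Deleting the archive in a later instruction saves nothing (220), while downloading, extracting and deleting in one instruction matches streaming the download straight into tar (120 each).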

Test 3 – One single RUN command for most of the stuff

In this test, I modified the Dockerfile to contain a single RUN command with everything in it.
The code for these Dockerfiles is available on GitHub:
Image size
The conclusion for this test is that the size benefit of having a single layer is not that big, while every change creates a whole new layer, so it is worse in the long run.

Overall conclusions

To summarize, these are my conclusions:
  • Layer your images for better reusability of the layers.
  • Combine all the yum install && yum clean all commands you need in an image into a single RUN command.
  • When installing software, downloading it (via curl) leaves a smaller footprint than ADD/COPY from the local filesystem, because you can combine the download, the installation and the removal of stale data in one RUN.
  • Don't combine commands in a single RUN more than needed: the size benefit may not be huge, but the loss in reusability can be.
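The reusability argument comes from the build cache: a cached layer is reused only while every instruction up to it is unchanged, so the first changed instruction rebuilds its own layer and every layer above it. The following Java sketch models that rule for RUN instructions only (ADD/COPY also compare file contents, which is not modelled). `CacheModel`, the placeholder commands and the layer sizes are hypothetical.

```java
import java.util.List;

// Toy model of the Docker build cache for RUN instructions: layers are reused
// while the instruction sequence matches; from the first difference on,
// every layer is rebuilt (and must be downloaded again by whoever pulls).
final class CacheModel {

    record Step(String instruction, long layerSizeMb) {}

    static long rebuiltMb(List<Step> before, List<Step> after) {
        int i = 0;
        while (i < before.size() && i < after.size()
                && before.get(i).instruction().equals(after.get(i).instruction())) {
            i++; // still a cache hit
        }
        long total = 0;
        for (int j = i; j < after.size(); j++) {
            total += after.get(j).layerSizeMb();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical layer sizes in MB; commands are placeholders.
        List<Step> layered = List.of(
                new Step("RUN install-jdk", 150),
                new Step("RUN install-wildfly", 120),
                new Step("RUN configure-v1", 1));
        List<Step> layeredChanged = List.of(
                layered.get(0), layered.get(1), new Step("RUN configure-v2", 1));

        List<Step> single = List.of(
                new Step("RUN install-jdk && install-wildfly && configure-v1", 271));
        List<Step> singleChanged = List.of(
                new Step("RUN install-jdk && install-wildfly && configure-v2", 271));

        System.out.println(rebuiltMb(layered, layeredChanged) + " "
                + rebuiltMb(single, singleChanged)); // prints 1 271
    }
}
```

With these numbers, changing only the configuration step costs 1 MB in the layered Dockerfile but all 271 MB when everything lives in one RUN.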