Gentle introduction (to data engineering)

  • PC, Server, Grid, Cloud, IoT
  • Desktop OS and Python.
  • Using Docker.
  • Using Jupyter and Python inside Docker.

Install Docker CE on Ubuntu

https://docs.docker.com/install/linux/docker-ce/ubuntu/

$ sudo apt-get update
# allow apt to use https repos
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
# add the repo key
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce

Next, I followed this forum thread to move the Docker image directory to my HDD: https://forums.docker.com/t/how-do-i-change-the-docker-image-installation-directory/1169

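sudo nano /etc/default/docker
# set this line in the file (-g points Docker's storage at the HDD)
DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4 -g /media/sergiu/lappie/docker"
# restart the daemon to apply the change
sudo service docker restart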
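
On newer Docker releases the -g flag is deprecated in favour of --data-root, and daemon settings are usually kept in a daemon.json file instead of DOCKER_OPTS. A minimal Python sketch of the equivalent configuration, assuming the daemon.json keys `data-root` and `dns` and reusing the path from above; the helper name `make_daemon_config` is hypothetical, and the result is only printed, not written anywhere.

```python
import json

def make_daemon_config(data_root, dns_servers):
    """Return daemon.json text equivalent to the DOCKER_OPTS line above."""
    config = {
        "data-root": data_root,    # assumed replacement for the -g flag
        "dns": list(dns_servers),  # assumed replacement for the --dns flags
    }
    return json.dumps(config, indent=2)

# Print the file content; copying it to /etc/docker/daemon.json (assumed
# location) and restarting Docker is left as a manual step.
print(make_daemon_config("/media/sergiu/lappie/docker", ["8.8.8.8", "8.8.4.4"]))
```

Printing rather than writing the file keeps the sketch safe to run from a notebook without root access.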

Set up a Docker container

The official GCC image is available in the Docker Store at: https://store.docker.com/images/gcc

FROM gcc:4.9
COPY . /usr/src/myapp
WORKDIR /usr/src/myapp
RUN gcc -o myapp main.c
CMD ["./myapp"]

A simpler approach is to use ubuntu as the parent image and install everything interactively in a running container rather than baking it into the image. Alternatively, a more comprehensive image could be built from the following (untested) Dockerfile:

FROM ubuntu
# update and install in one layer so the package index is never stale
# g++ is added because xerces-c is a C++ library
RUN apt-get update && apt-get -y install \
    autoconf automake libtool make gcc g++ wget git \
    zlib1g-dev \
    awscli
# try later: libxerces-c-dev

WORKDIR /opt/
RUN wget http://www-us.apache.org/dist//xerces/c/3/sources/xerces-c-3.2.1.tar.gz
RUN git clone https://github.com/ctSkennerton/crass.git
RUN tar xvf xerces-c-3.2.1.tar.gz

WORKDIR /opt/xerces-c-3.2.1
RUN ./configure && make && make install

WORKDIR /opt/crass
RUN ./autogen.sh && ./configure && make && make install
# crass will be in /usr/local/bin/crass

# test
RUN crass ./test/Ill100.fx.gz

Build the image then start a container with:

cd /media/sergiu/lappie/temp/andersson/docker
docker build -t crass .
docker image ls
docker run -ti imageid /bin/bash

To return to the exited container:

# get the container id 
docker ps -a
docker start containerid
docker attach containerid
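
Looking up the container ID by eye gets tedious after a few rounds. A small Python sketch that picks the most recently created exited container from the output of docker ps; it assumes the docker CLI is on the PATH and relies on the `--filter` and `--format` options, and the names `parse_ps` and `latest_exited` are hypothetical.

```python
import subprocess

# List exited containers only, one tab-separated line per container.
PS_CMD = ["docker", "ps", "-a", "--filter", "status=exited",
          "--format", "{{.ID}}\t{{.Image}}\t{{.Status}}"]

def parse_ps(output):
    """Turn tab-separated `docker ps` lines into (id, image, status) tuples."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            container_id, image, status = line.split("\t", 2)
            rows.append((container_id, image, status))
    return rows

def latest_exited():
    """Return the ID of the first listed exited container, or None."""
    result = subprocess.run(PS_CMD, capture_output=True, text=True, check=True)
    rows = parse_ps(result.stdout)
    return rows[0][0] if rows else None

# Usage (needs a running Docker daemon, so it is left commented out):
# cid = latest_exited()
# if cid:
#     subprocess.run(["docker", "start", "-ai", cid])  # start and attach
```

Keeping the parsing in its own function means it can be checked against a pasted sample of the output without Docker being installed.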

An alternative is to snapshot the container itself. Some custom settings (the awscli configuration, for example) are hard to encode in a Dockerfile, so it is easier to create a new image from the current container with docker commit. I then push that image to Docker Hub and pull it from there onto the EC2 instance.

sudo docker ps -a
sudo docker commit ef357fab136a grokkaine/crass:awsv1
sudo docker images -a

export DOCKER_ID_USER="grokkaine"
docker login
docker tag imageid $DOCKER_ID_USER/awscrass
docker push $DOCKER_ID_USER/awscrass
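
The commit and push steps can be collected into one helper, which is handy when repeating them from the notebook. A minimal Python sketch that only builds and prints the commands (a dry run by default); `publish_commands` and `run_all` are hypothetical names, the container ID, user, and tag are the ones used above, and docker login is assumed to have been done already.

```python
import subprocess

def publish_commands(container_id, user, repo, tag="latest"):
    """Build the docker commands that snapshot a container and push it."""
    image = f"{user}/{repo}:{tag}"
    return [
        ["docker", "commit", container_id, image],  # container -> image
        ["docker", "push", image],                  # image -> Docker Hub
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

# Dry run: shows what would be executed without touching Docker.
run_all(publish_commands("ef357fab136a", "grokkaine", "crass", "awsv1"))
```

Committing directly under the final user/repository name avoids the separate docker tag step; pass dry_run=False to actually run the commands.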
