Update CI data / database container image for data or database schema changes.

One of the projects I work on uses GitLab. We host our code on GitLab, review code and commits there, and use GitLab CI to run all the automated tests before we merge branches.

Normally, mocking data for TDD unit tests shouldn't be difficult. But once we need to do something more complicated, such as searching relations in a graph database or running integration tests, it becomes a mess, whether we put the mock data in the test code or in separate files that are loaded when needed:

  • We have to update hundreds of records in a mock dataset with thousands of rows.
  • Container state and changes are not kept across stages and jobs, so every job in every stage has to load the data again.
  • Loading data into a database takes time and resources, and we don't want to wait that long.
  • Wasting resources might cause global warming.

The Solution

What we did to resolve the problem was to extract a subset of real data, obfuscate the sensitive parts, load it into a data container or database container, and then build a customized image from that container. We upload the image to GitLab and use it as the database service in our CI and automated unit test process.

The HOWTO

If you're new to this and building your first data container/image, you can still follow the process below; just skip the steps you don't need.

If you're using a private container image registry, please log in first with docker login. For example: docker login registry.gitlab.com

Now we'll go through the whole process, step by step, of updating the current data/database image running on GitLab CI.

Prepare New Data

  1. Pull the image from the registry you use. Here we use Dgraph as an example: docker pull registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician
    If you're new and building your first data/database image for GitLab CI, just pull the base image you need (e.g. dgraph, neo4j, mysql, mariadb, postgres) from Docker Hub (the default source).

  2. Run a container from that image: docker run --name dgraph registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician
    If you prefer to mount a local folder containing the data files, so that you can import them from inside the container, use: docker run -it -v ${PWD}/testdata:/data/testdata --name dgraph registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician
    Alternatively, use -p or -P to publish the service ports on the host's network interfaces, so that you can import data from other computers. (Be careful: anyone else who can reach the host can connect as well.)

  3. Update the data you want.
    Once finished, quit and stop the container, but do not remove it.
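
The preparation steps above could be wrapped in a small script. This is only a sketch under a few assumptions: the function names are made up for illustration, the DOCKER variable is a convenience so that DOCKER=echo prints the commands instead of running them, and the container is started detached with -d (rather than -it as in step 2) so that the script returns immediately.

```shell
#!/bin/sh
# Sketch of the "Prepare New Data" steps.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
IMAGE="registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician"

# Pull the current test-data image and start a named container from it,
# mounting ./testdata at /data/testdata inside the container.
# -d (detached) is an assumption of this sketch, not part of the HOWTO.
start_data_container() {
  "$DOCKER" pull "$IMAGE" &&
  "$DOCKER" run -d -v "${PWD}/testdata:/data/testdata" --name dgraph "$IMAGE"
}

# Stop the container once the data is updated. It is deliberately not
# removed, because the commit step still needs it.
stop_data_container() {
  "$DOCKER" stop dgraph
}
```

Call start_data_container, update the data, then call stop_data_container before moving on to the commit step.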

Commit Result and Test

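  1. Find the container ID with docker ps -a | grep dgraph (the -a flag is needed because the container is stopped and plain docker ps lists only running containers).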

  2. Commit the changes into a new image (here we commit locally to a new tag associated with the original image name): docker commit -a "The Cutie <[email protected]>" -m "updated xxxxx testing data for xxxxx" CONTAINER_ID dgraph-testdata-politician:sprint38-evils
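
As a possible shortcut for the two steps above: docker commit accepts a container name as well as an ID, so when the container was started with --name dgraph the lookup can be skipped. The sketch below assumes that name; the function names are hypothetical, and DOCKER=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the commit step, assuming the container is named "dgraph".
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Print the ID of the container named "dgraph", including stopped ones (-a).
data_container_id() {
  "$DOCKER" ps -aq --filter name=dgraph
}

# Commit the container to a new local tag of the test-data image.
# $1 = new tag (for example sprint38-evils), $2 = commit message.
commit_data_container() {
  tag=$1
  message=$2
  "$DOCKER" commit -m "$message" dgraph "dgraph-testdata-politician:${tag}"
}
```

For example: commit_data_container sprint38-evils "updated testing data"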

Run tests locally to make sure dgraph-testdata-politician:sprint38-evils is really what you want.
If yes, continue to upload and test it on GitLab CI.

  1. Set the image as an alternative version of the original with docker tag: docker tag dgraph-testdata-politician:sprint38-evils registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician:sprint38-evils

  2. Upload the image with docker push: docker push registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician:sprint38-evils

  3. Create a new branch in your code project and change the settings in the .gitlab-ci.yml file to use the new image for testing.
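
The .gitlab-ci.yml change in step 3 is usually a one-line edit, so it can be done by hand. If you would rather script it, a minimal sketch could look like the following. It assumes the image reference appears literally in the file; set_ci_image_tag is a hypothetical helper, not a GitLab or Docker command.

```shell
#!/bin/sh
# Point every reference to an image in a CI file at a new tag.
# $1 = path to .gitlab-ci.yml, $2 = image name without tag, $3 = new tag.
set_ci_image_tag() {
  ci_file=$1
  image=$2
  new_tag=$3
  # Escape the dots so sed matches them literally; image names contain
  # no other regular-expression metacharacters.
  escaped=$(printf '%s' "$image" | sed 's|\.|\\.|g')
  # Replace "image" or "image:oldtag" with "image:newtag". The result is
  # written to a temporary file first, because in-place editing with
  # sed -i is not portable across sed implementations.
  sed "s|\(${escaped}\)\(:[^[:space:]\"']*\)\{0,1\}|\1:${new_tag}|" \
    "$ci_file" > "$ci_file.tmp" && mv "$ci_file.tmp" "$ci_file"
}
```

For example, on the new branch: set_ci_image_tag .gitlab-ci.yml registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician sprint38-evils. Note that this simple pattern would also match a different image whose name merely starts with the same string, so review the diff before committing.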

Update the Default Image (tag it to latest)

If everything works like a charm, we can tag it as the latest default image and push it to the remote registry.

  1. Tag the new image as the latest default one in the original source repository: docker tag dgraph-testdata-politician:sprint38-evils registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician:latest

  2. Push the update: docker push registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician:latest
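
Tagging and pushing follow the same pattern for the sprint tag and for latest, so both can share one helper. This is a sketch only: publish_image is a hypothetical name, and DOCKER=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the tag-and-push steps.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
REMOTE="registry.gitlab.com/bluet/service-relation/dgraph-testdata-politician"

# Tag a local image under the registry path and push it.
# $1 = local tag to publish, $2 = remote tag (defaults to the local tag).
publish_image() {
  local_tag=$1
  remote_tag=${2:-$1}
  "$DOCKER" tag "dgraph-testdata-politician:${local_tag}" "${REMOTE}:${remote_tag}" &&
  "$DOCKER" push "${REMOTE}:${remote_tag}"
}
```

With this helper, publish_image sprint38-evils uploads the image under its sprint tag, and publish_image sprint38-evils latest promotes it to the default.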

Testing

Make some changes to the code, then commit and push; your pipeline will fetch and run the new image.

I hope this HOWTO saves you, a tree, and a kitten.
