Service's hash differs between the downloaded tarball and the remote git

v0-9

#1

The service’s hash differs depending on whether it is deployed from the downloaded tarball or from the remote git.

But if I download the tarball manually and deploy it, the hash is the same as git.


Service's hash
Milestone v0.8
#2

local tarball

➜  mesg-foundation-service-ethereum-e76695f git:(dev) ✗ docker build -t local_tarball_eth .
Sending build context to Docker daemon  246.8kB
...
Successfully tagged local_tarball_eth:latest
➜  mesg-foundation-service-ethereum-e76695f git:(dev) ✗ docker images local_tarball_eth
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
local_tarball_eth   latest              24a0da1565ce        3 hours ago         846MB

local git

➜  service-ethereum docker build -t local_git_eth .                               
Sending build context to Docker daemon  246.8kB
...
Successfully tagged local_git_eth:latest
➜  service-ethereum docker images local_git_eth    
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
local_git_eth       latest              9de6e95f4278        3 hours ago         846MB

So the different hash is due to docker build (I don’t know exactly why): the image ID changes, and it is part of the service struct.

To be as compatible as possible with what we get from docker, we can use code from


#3

Here are the results of my different tests:

I added both .git and .DS_Store to the .dockerignore of this service, so it should not be polluted by automatically created files.

Deploy remote tarball

➜  core git:(dev) ✗ ./dev-cli service deploy https://api.github.com/repos/mesg-foundation/service-ethereum/tarball/master
✔ Service deployed with hash: d8f01d6f7a7193b8c7d7b6d943c97b195e9fc409

Deploy remote git

➜  core git:(dev) ✗ ./dev-cli service deploy https://github.com/mesg-foundation/service-ethereum/
✔ Service deployed with hash: 9c1b8b7efd90b72a0027e8a107cbaf1b4019e918

Deploy local dir

➜  core git:(dev) ✗ ./dev-cli service deploy /Users/nico/Developments/MESG/services/service-ethereum
✔ Service deployed with hash: 9c1b8b7efd90b72a0027e8a107cbaf1b4019e918

Manually download tarball, extract it and deploy:

➜  core git:(dev) ✗ ./dev-cli service deploy /Users/nico/service-ethereum/mesg-foundation-service-ethereum-8bf6d06
✔ Service deployed with hash: 9c1b8b7efd90b72a0027e8a107cbaf1b4019e918

As you can see, all deployments provide the same hash except the remote tarball!

I had to fight with automatically created files, so maybe Ubuntu automatically adds files when untarring the archive?


#4

@krhubert plan:

1) need constant hash

To do so, the core will download / clone the service into a temp folder, check the mesg.yml, and send it to docker from the local temp folder.

If the calculated hashes are still not the same, we need to check for:

  • automatically generated files (e.g. .DS_Store, .git)
  • creation / update datetime of files
  • modifications of file permissions
  • anything else…

If no solution is found, then we need to calculate the hash of the service ourselves. Let’s open this discussion later if needed.
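
To make the checklist above concrete, here is a minimal sketch of how two copies of a service (for example the extracted tarball and the git clone) could be compared. The helper names tree_snapshot and tree_diff are hypothetical and are not part of the core; the sketch only reports extra files, permission or size differences, and optionally modification times.

```python
import os
import stat


def tree_snapshot(root, include_mtime=False):
    """Map each file's relative path to its (permission bits, size[, mtime])."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)
            entry = [stat.S_IMODE(info.st_mode), info.st_size]
            if include_mtime:
                # Modification times usually differ between a clone and an untar.
                entry.append(int(info.st_mtime))
            snapshot[os.path.relpath(full, root)] = tuple(entry)
    return snapshot


def tree_diff(root_a, root_b, include_mtime=False):
    """Return (files only in a, files only in b, files whose metadata differs)."""
    a = tree_snapshot(root_a, include_mtime)
    b = tree_snapshot(root_b, include_mtime)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, changed
```

Running tree_diff on the two folders should quickly show whether the difference comes from generated files such as .DS_Store or from metadata such as permissions.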

2) use docker build logic

docker is compatible with git repos (with tag and branch) and multiple archive types.
Doc: https://docs.docker.com/v17.12/edge/engine/reference/commandline/build/#git-repositories
Source: https://github.com/docker/cli/blob/master/cli/command/image/build.go#L190

3) if there is only 1 directory, enter it

to be compatible with archives from github
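
A minimal sketch of that rule, assuming "only 1 directory" means the extracted archive contains exactly one entry and that entry is a directory (which is how GitHub tarballs are laid out). The function name service_root is hypothetical.

```python
import os


def service_root(extracted_dir):
    """Return the directory that should be treated as the service root.

    If the extracted archive holds a single directory and nothing else,
    descend into it; otherwise keep the extraction directory itself.
    """
    entries = os.listdir(extracted_dir)
    if len(entries) == 1:
        only = os.path.join(extracted_dir, entries[0])
        if os.path.isdir(only):
            return only
    return extracted_dir
```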

4) support git repo urls without the .git suffix only if it’s compatible with #tag and #branch

otherwise, no support for git repo urls without .git.
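
For reference, the docker documentation linked above describes build contexts of the form repo.git#ref:folder. The sketch below only shows how such a url could be split into its parts; split_git_url is a hypothetical helper, not docker's or the core's actual parsing code.

```python
from urllib.parse import urldefrag


def split_git_url(url):
    """Split 'repo#ref:folder' into (repo url, ref or None, folder or None)."""
    repo, fragment = urldefrag(url)
    # The part before ':' is the branch or tag, the part after it is a sub-folder.
    ref, _, folder = fragment.partition(":")
    return repo, ref or None, folder or None
```

With this split, the open question in point 4 becomes whether the repo part still resolves when it has no .git suffix.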


#5

Docker’s hash system is NOT DETERMINISTIC.
Same computer, same source, using the --no-cache flag:

$ docker build https://github.com/mesg-foundation/service-ethereum.git --no-cache
Successfully built a9279453897b
$ docker build https://github.com/mesg-foundation/service-ethereum.git --no-cache
Successfully built ab77935bbf48

To fix this issue, we are going to create a hash based on the service source, more specifically on the archive the core receives in the service.New function.

Because the source contains the mesg.yml, we can also remove the hash calculation from the Service struct to only rely on the source hash!
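
As a rough illustration of the idea (not the actual POC), hashing the archive could look like the sketch below. SHA-1 is an assumption, chosen only because the hashes shown in this thread are 40 hex characters long, and archive_hash is a hypothetical name.

```python
import hashlib


def archive_hash(stream, chunk_size=64 * 1024):
    """Hash the raw bytes of an archive read from a binary stream."""
    digest = hashlib.sha1()  # assumption: any stable digest would work here
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()
```

Unlike the docker image ID, the same bytes always give the same result, so two deployments of an identical archive get an identical hash.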

I will create a small POC of this solution.


#6

POC: https://github.com/mesg-foundation/core/pull/731


#7

From hubert’s comment

It won’t be deterministic here because tarball could be (un)compressed with different algorithms.
Also tarball may have .git directory which will be removed when deploying.

Do we all agree to continue with the implementation of this PR, hash calculated from the archive?
Or should it be calculated from the files?


#8

It should be calculated from the archive, but the archive should first be uncompressed so that we always get the same results.

So the PR is good, but we need to fix the compression; we can do this in a different PR.
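
A small sketch of why the compression matters: the same tar stream compressed with gzip, bzip2 or xz gives different bytes, but hashing after decompression gives the same result. The helper names are hypothetical, the format detection relies on the standard magic bytes of each format, and SHA-1 is again an assumption.

```python
import bz2
import gzip
import hashlib
import io
import lzma
import tarfile


def build_tar(files):
    """Build an uncompressed tar in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)  # mtime defaults to 0, so output is stable
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def decompress(blob):
    """Undo gzip, bzip2 or xz compression; return other input unchanged."""
    if blob[:2] == b"\x1f\x8b":
        return gzip.decompress(blob)
    if blob[:3] == b"BZh":
        return bz2.decompress(blob)
    if blob[:6] == b"\xfd7zXZ\x00":
        return lzma.decompress(blob)
    return blob


def uncompressed_hash(blob):
    """Hash the tar stream itself, whatever compression wrapped it."""
    return hashlib.sha1(decompress(blob)).hexdigest()
```

This only removes the effect of the compression algorithm; differences inside the tar itself (extra files, metadata) still change the hash.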


#9

Or should it use the same compression all the time? gzip, for instance?


#10

yes, but it’s harder because you have to uncompress all types (bz, xz) and then compress it back to gzip, but we can do this as well (for example before sending to docker)


#11

Why not just support only tar.gz for now? It’s also widely used.


#12

Just to summarize.
This has been fixed/implemented here https://github.com/mesg-foundation/core/pull/731

  • Remove all .git folders (as the files inside are not deterministic)
  • Compute the hash by hashing all the files in the service
  • Hash the file permissions and ignore all the other metadata (modtime is not supported by git)
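
The three points above can be sketched roughly as follows. This is not the code from the PR, only an illustration under assumptions: service_hash is a hypothetical name, SHA-1 is assumed, and files are visited in sorted order so that the result does not depend on the file system.

```python
import hashlib
import os
import stat


def file_digest(path):
    """Return the SHA-1 hex digest of a file's content."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def service_hash(root):
    """Hash relative path, permission bits and content of every file.

    .git folders are skipped and modification times are ignored.
    """
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git and fix the traversal order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            record = "%s\0%o\0%s\n" % (rel, mode, file_digest(full))
            digest.update(record.encode("utf-8"))
    return digest.hexdigest()
```

With this approach a git clone and an extracted tarball of the same commit should give the same hash, as long as the file permissions match.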