Skip to content

[Feature]: When all container startup attempts have failed, wait strategies should be automatically cancelled #6069

@perlun

Description

@perlun

Module

Core

Testcontainers version

1.17.5

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

amd64

Docker version

Client:
 Version:           20.10.19+dfsg1
 API version:       1.41
 Go version:        go1.19.2
 Git commit:        d85ef84
 Built:             Sat Oct 15 15:47:37 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.19+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.19.2
  Git commit:       c964641
  Built:            Sat Oct 15 15:47:37 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.8~ds1
  GitCommit:        1.6.8~ds1-1+b1
 runc:
  Version:          1.1.4+ds1
  GitCommit:        1.1.4+ds1-1
 docker-init:
  Version:          0.19.0
  GitCommit:

What happened?

The problem we are seeing is that when the container startup has failed the configured number of times (1 in our case), the container startup isn't immediately aborted; wait strategies are (unnecessarily) given time to succeed.

We are using a number of WaitStrategy instances for waiting on certain messages on startup, like this:

        WaitAllStrategy waitAllStrategy = new WaitAllStrategy()
                .withStrategy( new LogMessageContainsWaitStrategy( "org.apache.catalina.startup.Catalina.start Server startup in" ) )
                .withStrategy( new ClientConnectorWaitStrategy( IntegrationTestConstants.CLIENT_CONNECTOR_PORT, timeout ) )
                .withStartupTimeout( timeout );

        for ( String webappPath : webappsToPaths.values() ) {
            waitAllStrategy.withStrategy( new LogMessageContainsWaitStrategy( "Deployment of web application directory [" + webappPath + "] has finished" ) );
        }

        waitingFor( waitAllStrategy );

I thought the problem could be related to the above since the timeout was exactly timeout seconds in this case (configured to be 240 seconds). Changing this 240 setting to e.g. 10 also made it abort faster, after 10 seconds.

But then I removed the waitingFor call altogether, and this made the startup wait for 60 seconds - the default. This was on Testcontainers 1.15.1 (we are still on an older versions because of some bugs related to the new version). I tested upgrading to 1.17.5 but unfortunately dropping the waitingFor call there uses the default wait strategy which seems to error out in our case, so it's hard to verify if the original problem is still present in 1.17.5. Here's the log for that failure:

[null] WARN  2022-10-27 15:09:19.187 [o.t.c.w.i.InternalCommandPortListeningCheck] :: An exception while executing the internal check: Container.ExecResult(exitCode=137, stdout=/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:00001760 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:00001747 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:0000173A 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:0000172E 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:00001722 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:00001716 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:0000170B 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:00001700 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016F4 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016E9 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   3: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016DD 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:   4: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016D2 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:  11: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016C7 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:  12: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016BA 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:  12: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:000016A2 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:  12: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
/proc/net/tcp:   1: 00000000:1F90 00000000:0000 0A 00000000:00000001 00:00000000 00000000  1000        0 2734309 2 00000000ca212241 100 0 0 10 0                   
/proc/net/tcp:   2: 0100007F:B210 0100007F:1F90 05 00000000:00000000 03:0000167C 00000000     0        0 0 3 00000000793cfa31                                      
/proc/net/tcp:  12: 0100007F:1F90 0100007F:B210 08 00000000:00000001 00:00000000 00000000     0        0 0 1 0000000065057682 20 4 0 10 -1                         
, stderr=/bin/sh: 1: nc: not found
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
/bin/sh: 1: nc: not found
/bin/bash: connect: Cannot assign requested address
/bin/bash: line 1: /dev/tcp/localhost/9360: Cannot assign requested address
)

This makes the InternalCommandPortListeningCheck incorrectly assume that the container is listening to this port (9360), so it presumes the container is working correctly.

TL;DR

I get the feeling here that the logic is somewhat flawed (or should we say "limited") in Testcontainers currently. If the container fails to startup, this doesn't cancel any waiting WaitStrategy asynchronous operations. Instead, Testcontainers will happily wait until the WaitStrategies are done and only then will the original exception (java.lang.IllegalStateException: Container exited with code 1) be propagated to the caller. This causes unnecessary waiting time; it would be much better if the WaitStrategies could be immediately cancelled and the exception be re-thrown on the calling thread.

Relevant log output

[null] INFO  2022-10-27 15:00:18.022 [?.h.f.5.0.1-java17]] :: Container docker.example.com/example/example-tomcat-8.5:2208.0.1-java17 is starting: 47600b9ebd950dba0b26078b83e9fc1efcc3a2fc18761b48fc4a4b0fcc326d8d
[null] ERROR 2022-10-27 15:00:28.536 [?.h.f.5.0.1-java17]] :: Could not start container
java.lang.IllegalStateException: Container exited with code 1
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:514)
	at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:344)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
	at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:334)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
	at com.example.test.integration.shared.ExampleWebappsTomcatContainer.start(ExampleWebappsTomcatContainer.java:196)
	at com.example.test.integration.shared.ExampleWebapps.startTomcat(ExampleWebapps.java:383)
	at com.example.test.integration.shared.ExampleWebapps.lambda$start$3(ExampleWebapps.java:132)
	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Caused by: org.testcontainers.containers.ContainerLaunchException: Container startup failed
	at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:349)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
	at com.example.test.integration.shared.ExampleWebappsTomcatContainer.start(ExampleWebappsTomcatContainer.java:196)
	at com.example.test.integration.shared.ExampleWebapps.startTomcat(ExampleWebapps.java:383)
	at com.example.test.integration.shared.ExampleWebapps.lambda$start$3(ExampleWebapps.java:132)
	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
	at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:334)
	... 9 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:542)
	at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:344)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
	... 10 more
Caused by: java.lang.IllegalStateException: Container exited with code 1
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:514)
	... 12 more

Additional Information

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions