Suggestion: Fail-fast on REST API port conflict to prevent silent misconfigurations

Currently, when multiple nodes run on the same machine with identical REST API ports, only one can bind successfully. The others start but run without accessible REST APIs, causing:

  1. Heartbeat confusion: Both nodes report with different UniqueIDs, causing alternating producer records

  2. Silent failures: Nodes keep running without REST API access, so they cannot be monitored

  3. Wasted resources: Multiple nodes running unnecessarily in parallel

Discovery:

Through monitoring analysis, I’ve observed numerous misconfigurations where operators accidentally run multiple node instances (master + fallback, or even more) on the same port, leading to unpredictable behavior.

Proposed solution:

Add an explicit port-binding check at startup. If the port cannot be bound (typically because another instance already uses it), the node should fail immediately with a clear error message. This forces operators to configure distinct ports (e.g., 8080, 8081) and prevents silent misconfigurations.

Example:

import (
    "fmt"
    "log"
    "net"
)

// config and startNode stand for the node's existing configuration and startup code.
func main() {
    port := config.RestAPIPort

    // Bind the REST API port before starting anything else.
    listener, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
    if err != nil {
        // log.Fatalf logs the message and then exits with status 1,
        // so no separate os.Exit call is needed.
        log.Fatalf("FATAL: cannot bind REST API port %d (is another node instance already using it?): %v", port, err)
    }

    log.Printf("REST API listening on port %d", port)
    startNode(listener)
}