IceGrid blocked when updating application

Hi,

We have set up an IceGrid cluster composed of more than 10 IceGridNodes. Each IceGridNode runs on its own server and manages about 10 IceBox services. More than 100 other servers access this IceGrid. A single IceBox service handles up to 2,000 requests per second, so the whole IceGrid handles up to about 2000 × 10 × 10 = 200,000 requests per second. Under this load, when I use IceGridAdmin to move some services from one server to another online, the IceGridRegistry and the IceGridNodes involved (both source and destination) often hang, and the whole IceGrid becomes unavailable.

We looked into the Ice-3.3.1 source and found a global mutex on the database object of the IceGridRegistry. When an application is updated, this mutex is locked and the registry invokes the nodes to synchronize their configuration. This often takes a long time and ends up blocking the whole IceGrid.
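
To illustrate the pattern we suspect, here is a minimal Python sketch. It is not IceGrid code: the `Registry` class, the `lookup_latency` helper, and the sleep-based delays are hypothetical stand-ins. It only shows how a lookup stalls when a lock is held across slow node synchronization, compared with releasing the lock first.

```python
import threading
import time


class Registry:
    """Toy stand-in for a registry database (hypothetical, not IceGrid's code)."""

    def __init__(self, hold_lock_during_sync):
        self._lock = threading.Lock()
        self._apps = {}
        self._hold = hold_lock_during_sync

    def update_application(self, name, descriptor, sync_delay):
        if self._hold:
            # Suspected pattern: the lock stays held while nodes are contacted.
            with self._lock:
                self._apps[name] = descriptor
                time.sleep(sync_delay)  # simulates slow remote calls to nodes
        else:
            # Alternative: update under the lock, then sync after releasing it.
            with self._lock:
                self._apps[name] = descriptor
            time.sleep(sync_delay)

    def lookup(self, name):
        # Client lookups need the same lock as updates.
        with self._lock:
            return self._apps.get(name)


def lookup_latency(hold_lock_during_sync, sync_delay=0.5):
    """Return how long one lookup takes while an update is in progress."""
    registry = Registry(hold_lock_during_sync)
    updater = threading.Thread(
        target=registry.update_application,
        args=("app", "v2", sync_delay),
    )
    updater.start()
    time.sleep(0.1)  # give the updater time to reach its sync phase
    start = time.monotonic()
    registry.lookup("app")
    elapsed = time.monotonic() - start
    updater.join()
    return elapsed
```

In this sketch, `lookup_latency(True)` should take roughly the remaining synchronization time, while `lookup_latency(False)` should return almost immediately.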

Does this make sense? I think this is a critical weakness when Ice is used in a very large cluster under very heavy load. It would be very helpful and valuable to put more effort into studying this scenario, and we look forward to a resolution of this issue as soon as possible.

Thank you very much.

Comments

  • benoit (Rennes, France)
    Hi,

    Actually, the IceGrid registry is careful not to hold any lock while calling the IceGrid nodes to notify them that the application was updated, so it's not clear to me where you think this isn't the case. The scenario you're describing should definitely work.

    We would need a lot more information in order to figure out what the problem is:
    • operating system version
    • Ice version
    • configuration of your registry and nodes
    • the deployment descriptor of your application
    • your clients' proxy configuration (i.e., whether or not they use connection caching, the locator cache timeout, timeouts, etc.)
    • if possible, some stack traces of the registry and nodes once a hang occurs

    You could also try enabling some tracing on the registry and nodes to see if this can help to figure out where the application update process hangs:
    IceGrid.Registry.Trace.Application=1
    IceGrid.Registry.Trace.Server=2
    IceGrid.Node.Trace.Server=3
    

    Finally, for such a deployment and if you need a resolution as soon as possible, you should consider purchasing commercial support as the free support we can provide on these forums is limited. Please contact us at info@zeroc.com for more information on our commercial support.

    Cheers,
    Benoit.
  • Ok,
    As our IceGrid is very large, it may be difficult to reproduce the hang scenario. We'll try to set up a simulation.

    Thank you.