Load Balancing

Hello Ice Guys!

I was working on the server tonight, and I got the idea that I may be coupling my objects too much for what I want to achieve. I decided to pause and write this post.

Because of the dynamics of the App Store, I have no clue how many clients we will be required to support. One day it could be five clients and three hours later 1k. Obviously, we need to be prepared to deploy more instances transparently if/when the need arises. I'll lay out what we need to achieve:

Clients are presented with a list of game rooms (IceStorm topics) when they obtain a Glacier2 session. Now, I guess it's easy to load balance the game rooms since I can just query IceGrid for the least used IceStorm instance and create new rooms there.
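
For the room-creation part, the query could look roughly like this in Python (an untested sketch: it assumes the communicator has Ice.Default.Locator pointing at the IceGrid registry, that the deployed IceStorm topic managers are registered with their default type, and "room-42" is just a placeholder name):

    import sys
    import Ice
    import IceGrid
    import IceStorm

    communicator = Ice.initialize(sys.argv)
    try:
        # Well-known IceGrid query object; requires Ice.Default.Locator to be set.
        query = IceGrid.QueryPrx.checkedCast(
            communicator.stringToProxy("IceGrid/Query"))

        # Ask IceGrid for a topic manager on the least loaded node,
        # using the 1-minute load average sample.
        obj = query.findObjectByTypeOnLeastLoadedNode(
            "::IceStorm::TopicManager", IceGrid.LoadSample.LoadSample1)
        topicManager = IceStorm.TopicManagerPrx.checkedCast(obj)

        # Create the new room's topic there; fall back to retrieve() if
        # another server created it first.
        try:
            topic = topicManager.create("room-42")
        except IceStorm.TopicExists:
            topic = topicManager.retrieve("room-42")
    finally:
        communicator.destroy()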

Also, I'm sure IceGrid will allow me to deploy multiple instances of Glacier2 and choose a free one for the client. I haven't looked into that just yet though.

However, it is unclear how I load balance the RoomManager in the middle. The RoomManager maintains a list of rooms, subscribes and unsubscribes clients from its respective topic, hands out room IDs to clients, etc. It also acts as a facade (Benoit :) ) handing out RoomParticipantPrxs to clients upon joining a room.
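
To make the discussion concrete, a facade servant along those lines might look something like this in Python; the GameRooms Slice module, the RoomParticipant interface, and the operation names are invented for this sketch, and locking and error handling are omitted:

    import IceStorm
    import GameRooms  # hypothetical Slice-generated module for this sketch

    class RoomParticipantI(GameRooms.RoomParticipant):
        """Per-client facade handed out by the RoomManager when a client joins."""

        def __init__(self, topic, publisher, callback):
            self._topic = topic
            self._publisher = publisher   # the topic's publisher proxy
            self._callback = callback     # the client's subscriber proxy

        def say(self, message, current):
            # Publish through IceStorm; the topic fans the event out to all
            # subscribed clients in the room.
            GameRooms.RoomEventsPrx.uncheckedCast(self._publisher).message(message)

        def leave(self, current):
            self._topic.unsubscribe(self._callback)
            current.adapter.remove(current.id)

    class RoomManagerI(GameRooms.RoomManager):
        def __init__(self, topicManager):
            self._topicManager = topicManager  # collocated IceStorm topic manager
            self._rooms = {}                   # room id -> IceStorm topic proxy
            self._nextId = 0

        def createRoom(self, name, current):
            self._nextId += 1
            self._rooms[self._nextId] = self._topicManager.create(name)
            return self._nextId

        def join(self, roomId, callback, current):
            # Subscribe the client's callback proxy to the room's topic...
            topic = self._rooms[roomId]
            publisher = topic.subscribeAndGetPublisher({}, callback)
            # ...and hand back a per-client RoomParticipant facade.
            servant = RoomParticipantI(topic, publisher, callback)
            return GameRooms.RoomParticipantPrx.uncheckedCast(
                current.adapter.addWithUUID(servant))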

One solution is to simply create multiple instances of the RoomManager, and when a client connects, choose the least loaded RoomManager; however, this is unacceptable since it would partition clients. Clients would only see the rooms on their RoomManager instance. Do you see what I'm getting at?

As always, any advice is greatly appreciated! Also, let me know if something is unclear. After all, it's 2 AM!

Thanks a lot,
Pete

Comments

  • matthew (NL, Canada)
    sly596 wrote: »
    ...
    However, it is unclear how I load balance the RoomManager in the middle. The RoomManager maintains a list of rooms, subscribes and unsubscribes clients from its respective topic, hands out room IDs to clients, etc. It also acts as a facade (Benoit :) ) handing out RoomParticipantPrxs to clients upon joining a room.

    One solution is to simply create multiple instances of the RoomManager, and when a client connects, choose the least loaded RoomManager; however, this is unacceptable since it would partition clients. Clients would only see the rooms on their RoomManager instance. Do you see what I'm getting at?

    Firstly, you should determine whether it's really necessary to have multiple room managers for load balancing purposes. If a room manager is not a very "busy" object (that is, a client doesn't call on it very often), then it is highly likely that you can handle many thousands of clients with a single room manager. If you do need to split it, you will still need a central "room manager" repository to ensure the partitioning does not occur. You can see my chat server series of articles in Connections for some examples of this.
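
    To illustrate the central repository idea (all names invented, locking kept minimal, persistence omitted): each RoomManager registers its rooms with a single MasterRoomManager, so clients always see the full room list no matter which RoomManager instance they talk to.

        import threading
        import GameRooms  # hypothetical Slice-generated module

        class MasterRoomManagerI(GameRooms.MasterRoomManager):
            def __init__(self):
                self._lock = threading.Lock()
                self._rooms = {}  # room id -> (hosting RoomManager proxy, room name)

            def registerRoom(self, roomId, name, manager, current):
                # Called by a RoomManager whenever it creates a room.
                with self._lock:
                    self._rooms[roomId] = (manager, name)

            def listRooms(self, current):
                # Clients get the complete, un-partitioned room list from here.
                with self._lock:
                    return [name for _, name in self._rooms.values()]

            def findRoomManager(self, roomId, current):
                # Tells a client which RoomManager (and hence which location)
                # is responsible for a given room.
                with self._lock:
                    manager, _ = self._rooms.get(roomId, (None, None))
                    return manager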
  • Hi Matthew,

    Yes, I've learned a lot from your articles. Thank you very much for those.

    I've been doing a lot of thinking, and the attached diagram is what I've come up with. I included the MasterRoomManager that you suggested. Considering some really horrible software has sold > 200k copies in the App Store, I think it's extremely important to be ready and able to deploy multiple RoomManagers if need be. Additionally, the RoomManager is a busy object, seeing as the client must use it to publish to IceStorm.

    The diagram shows three IceGrid nodes across the US. With this setup, as grid load rises, we could deploy additional nodes identical to those in Dallas and Boston. If this were a local network, the diagram would show the RoomManagers connecting to both IceStorm instances. I limited each one to talking only to the IceStorm instance running locally to reduce latency. Is this setup a "smart" one?
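
    One simple way to keep each RoomManager pinned to its local IceStorm instance is to hand it the collocated topic manager's proxy through a per-node configuration property instead of querying the registry; "LocalTopicManager.Proxy" below is an invented property name that would be set per node in the deployment:

        import sys
        import Ice
        import IceStorm

        # In the RoomManager server's startup code. The property is configured
        # per node to point at the IceStorm instance on the same machine.
        communicator = Ice.initialize(sys.argv)
        obj = communicator.propertyToProxy("LocalTopicManager.Proxy")
        topicManager = IceStorm.TopicManagerPrx.checkedCast(obj)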

    Also, as you may expect from my diagram, I have a few questions about Glacier2. First of all, I suspect in this setup that the third IceGrid node in Chicago will also require a Glacier. Am I correct in saying that every IceGrid node will require a Glacier running on it? Next, how can a client possibly choose one of the Glaciers? The client would first need a session from a Glacier to talk to the registry! Clearly, I am missing something.

    I look forward to fixing my diagram with your input.

    Thanks,
    Pete
  • benoit (Rennes, France)
    Hi Pete,

    Yes, it's best to collocate services (IceStorm and your chat room manager) to avoid the unnecessary latency overhead of going over a slower network.

    The IceGrid nodes don't currently support connecting to the IceGrid registry through Glacier2. You will need to open ports on the firewall of each location for the IceGrid node endpoints and the IceGrid registry endpoints (at least server, internal and client endpoints). Of course, this will mean that anybody on the internet can connect to these ports so you need to make sure to use secure endpoints to restrict access to authorized components only. For more information on how to secure the IceGrid endpoints, see here in the Ice manual and the demo/IceGrid/secure demo from your Ice distribution.

    If I understand it correctly, your clients first need to figure out where the room is located (Boston or Dallas) to connect to one of them. It seems to me that they should first connect to the master room manager service in Chicago and query the room location (the Glacier2 instance they should connect to in order to access the room, and the proxy of the room). This means that you'll need a Glacier2 instance running in Chicago for your clients to access the room manager master, and that your clients will first establish a session with this Glacier2 instance and then with one of the Glacier2 instances located in Dallas or Boston.
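
    Roughly, the client-side flow could look like the sketch below (hostnames, credentials, the MasterRoomManager identity, and the findRoomLocation operation with its router/room fields are all placeholders, and IceSSL is assumed to be configured for the ssl endpoints):

        import sys
        import Ice
        import Glacier2
        import GameRooms  # hypothetical Slice-generated module

        communicator = Ice.initialize(sys.argv)
        try:
            # Step 1: session with the Chicago Glacier2 router in front of the master.
            chicago = Glacier2.RouterPrx.checkedCast(communicator.stringToProxy(
                "Glacier2/router:ssl -h chicago.example.com -p 4064"))
            chicago.createSession("user", "password")

            # Route calls to the master room manager through that router.
            master = GameRooms.MasterRoomManagerPrx.uncheckedCast(
                communicator.stringToProxy("MasterRoomManager").ice_router(chicago))

            # Step 2: ask where the room lives; assume the answer carries the
            # regional Glacier2 router proxy and the room's proxy.
            location = master.findRoomLocation("room-42")

            # Step 3: session with the Dallas or Boston router, then route all
            # calls on the room proxy through it.
            regional = Glacier2.RouterPrx.uncheckedCast(location.router)
            regional.createSession("user", "password")
            room = location.room.ice_router(regional)
        finally:
            communicator.destroy()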

    Cheers,
    Benoit.
  • Hi Benoit,

    Thanks! I updated my diagram using your feedback. Assume everything except the clients has a direct connection to the registry; I just left those lines out of the diagram since they would be redundant.

    This setup looks like it would work well. Is it optimal? Unfortunately, if the Chicago node goes down, the entire grid is still screwed.

    I think I have a good understanding of how the grid will interact now. There is still one detail that I'm unsure of, though. When a client establishes a session with the Glacier for the MasterRoomManager, the client will end up with a proxy to one of the RoomManagers. Now, to use this proxy, does the client have to destroy its session with the Glacier in Chicago and then establish a new one with the new Glacier? Or can the client use the session it has already established with the old Glacier to access the RoomManager behind the new Glacier?

    Thanks,
    Pete
  • benoit (Rennes, France)
    The client should establish a new session with the other Glacier2 router; otherwise all the client communications would have to go through the Chicago Glacier2 router, which is probably not what you want. It would also only work if the Chicago Glacier2 router had direct access to the servers located at the other locations.
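
    Concretely, the two sessions are independent because each proxy carries its own router via ice_router, so the client can keep the Chicago session around to keep browsing rooms, or drop it once it has the routed room proxy. Continuing the earlier sketch:

        # Drop the Chicago session once it is no longer needed. Glacier2
        # closes the connection as part of destroySession(), so the resulting
        # ConnectionLostException can be ignored.
        try:
            chicago.destroySession()
        except Ice.ConnectionLostException:
            pass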

    If you don't want the room manager master to be a single point of failure, you could replicate it. The master-slave replication article from Issue #23 should give you some ideas on how to do this. That being said, this is probably something I would do only once everything else works :).

    Cheers,
    Benoit.
  • Hi Benoit,

    Your suggestion about replication gave me an idea. Do you think it may be better to completely eliminate the MasterRoomManager and instead just replicate the RoomManagers? This seems like essentially the same thing since replication requires a master and slaves.

    Thanks,
    Pete
  • matthew (NL, Canada)
    I would really advise against using replication to avoid a single point of failure until your system really needs it. It is a very non-trivial exercise.
  • Never mind, you cannot remove the MasterRoomManager without client partitioning. I'm not sure what I was thinking.

    Benoit, great article on replication. I will probably attempt it after everything is up and running as you suggested!

    Thanks again to both of you!
    Pete
  • janos (Germany)
    Hi Pete,

    Although this thread is already a couple of years old, if it's still an issue, here's what we did.
    You should look into a shared Hazelcast cluster. That's exactly what we are using in our setup to avoid partitioning. It works just great.
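
    For the record, the idea is that every RoomManager instance joins the same Hazelcast cluster and keeps the room list in a distributed map, so no instance ends up with a private, partitioned view. A minimal sketch using the hazelcast-python-client package (member addresses and map contents are placeholders; in a Java server you would more likely embed Hazelcast members directly):

        import hazelcast

        # Connect to the shared cluster; every RoomManager instance does the same.
        client = hazelcast.HazelcastClient(
            cluster_members=["10.0.0.1:5701", "10.0.0.2:5701"])

        # One distributed map shared by all instances: the global room list.
        rooms = client.get_map("rooms").blocking()
        rooms.put("room-42", "RoomManager-dallas")   # room id -> hosting instance
        print(rooms.entry_set())                     # visible from every instance

        client.shutdown()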

    Best,
    Janos