Percona Operator for MongoDB / K8SPSMDB-599

Multi-thread transaction failure when using the default mongos ClusterIP service


    • Type: Bug
    • Status: Done
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.12.0


      When sharding is enabled (the default), the three mongos nodes are exposed through a single ClusterIP service (the cluster is reachable at cluster-name-mongos.default.svc.cluster.local). This means Kubernetes load-balances traffic between the mongos pods in a round-robin fashion.
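For illustration, the kind of Service described above looks roughly like this. This is a hedged sketch, not the operator's actual manifest; the selector labels and port are assumptions based on the description:

```yaml
# Sketch of a ClusterIP Service fronting all mongos pods (labels assumed)
apiVersion: v1
kind: Service
metadata:
  name: cluster-name-mongos
  namespace: default
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/component: mongos
  ports:
    - port: 27017
      targetPort: 27017
```

Because the selector matches all three mongos pods, new connections through the service are distributed across them.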


      When the cluster is accessed this way and multiple threads execute transactions using the same driver instance, statements of a single transaction may end up on different mongos nodes, which results in an error like the following (this example uses the official C# driver; reproduction steps are below):


      Command insert failed: cannot continue txnId 530 for session 8694022c-4d3f-4926-8e9d-9e6581831511 - uzGxMP6dwuKBbzh5yTEf0uqrg97E2XVlv3D6zMI0/QE= with txnId 536.
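The failure mode can be sketched with a simplified simulation (an assumption for illustration, not actual server code): each mongos tracks per-session transaction numbers independently, so a round-robin load balancer can deliver the continuation of a transaction to a mongos that never saw it start.

```python
# Simplified model of mongos transaction bookkeeping (illustrative only).
import itertools


class FakeMongos:
    """Tracks, per logical session, the txnNumber this mongos knows about."""

    def __init__(self, name):
        self.name = name
        self.active_txn = {}  # session id -> last known txnNumber

    def start_txn(self, session_id, txn_number):
        self.active_txn[session_id] = txn_number

    def continue_txn(self, session_id, txn_number):
        known = self.active_txn.get(session_id)
        if known != txn_number:
            raise RuntimeError(
                f"cannot continue txnId {known} for session {session_id} "
                f"with txnId {txn_number}"
            )


# Three mongos behind a round-robin "ClusterIP" service.
fleet = [FakeMongos(f"mongos-{i}") for i in range(3)]
round_robin = itertools.cycle(fleet)

# mongos-1 last handled an older transaction (txnId 530) for this session.
fleet[1].active_txn["session-A"] = 530

# A thread starts txnId 536 on mongos-0; the next statement for the same
# session is round-robined to mongos-1, which cannot continue it.
next(round_robin).start_txn("session-A", 536)
try:
    next(round_robin).continue_txn("session-A", 536)
except RuntimeError as exc:
    print(exc)  # mirrors the shape of the error reported above
```

With a real deployment the driver pins the session to one mongos, but the pinning only works if the connection actually reaches that same mongos on every statement.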


      The official driver documentation suggests reusing a single MongoClient instance (the singleton pattern) in the application, since it is designed to be thread-safe, and many existing applications follow that recommendation.


      This requirement is also described in the MongoDB transactions specification, in the "mongos pinning" section: https://github.com/mongodb/specifications/blob/master/source/transactions/transactions.rst#mongos-pinning
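The singleton pattern the driver documentation recommends can be sketched as follows. This is a hedged stand-in: `make_client` represents constructing a real MongoClient (pymongo or the C# driver), which is not imported here.

```python
# Thread-safe singleton holder for a single shared client instance.
import threading


class SharedClient:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, make_client):
        # Double-checked locking: only one thread ever builds the client;
        # subsequent calls from any thread get the same instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = make_client()
        return cls._instance
```

All application threads share the one client, and the driver multiplexes connections internally; per the mongos-pinning requirement, every statement of a transaction must then reach the same mongos.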


      1. Run the v1.10 Operator in any K8s cluster
      2. Deploy a cluster using cr.yaml (make sure there are multiple mongos nodes)
      3. Create a user for the application (read/write role on the "Test" database)
      4. Build, dockerize and deploy the following .NET 6.0 application: https://gist.github.com/mmnosek/02bd09b2a682baae9b93072be9cdff08 (I can help with that). Make sure the connection string is correct in advance, as it is hardcoded
      5. Deploy the application to K8s
      6. Check the application pod's logs: you'll see the errors
      7. The problem does not occur if there is only a single mongos node behind the service, or if a new MongoClient instance is created for each transaction (uncomment https://gist.github.com/mmnosek/02bd09b2a682baae9b93072be9cdff08#file-program-cs-L41)
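For steps 4-5, a minimal Deployment manifest along these lines would work. This is a sketch; the name, labels, and image reference are placeholders, not values from the gist:

```yaml
# Minimal Deployment for the test application (image reference is a placeholder)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: txn-test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: txn-test-app
  template:
    metadata:
      labels:
        app: txn-test-app
    spec:
      containers:
        - name: app
          image: registry.example.com/txn-test-app:latest
```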



      Given that this approach is what the driver documentation recommends, an Operator-deployed cluster should support it out of the box. Otherwise many existing applications risk hitting this issue, which is very hard to debug. Based on the above, I think mongos pods should not be exposed through a single service by default.
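For context, mongos exposure is configured in the cluster's cr.yaml. The fragment below is a hedged sketch of the relevant section; field names follow the operator's sharding configuration, but the exact schema may differ between versions:

```yaml
# Fragment of cr.yaml (schema may vary by operator version)
spec:
  sharding:
    enabled: true
    mongos:
      size: 3
      expose:
        exposeType: ClusterIP
```

The proposal above amounts to changing the default so that each mongos pod gets its own service instead of all of them sitting behind one ClusterIP.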






            Assignee: Andrii Dema (andrii.dema)
            Reporter: Michal Nosek (michal.nosek)