Akka 1.3.5
This morning, while making some provisioning changes, we ended up in a state where two single-node clusters were running that pointed at the same database. After fixing the mistake and starting only a single node, the underlying Akka code failed to recover: the shard coordinators kept throwing errors like the following while replaying their journals.
2018-04-22 15:28:45.148 ERROR PersistentShardCoordinator Exception in ReceiveRecover when replaying event type [Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated] with sequence number [40650] for persistenceId [/system/sharding/zCoordinator/singleton/coordinator]
System.ArgumentException: Region [akka://AkkaCluster/system/sharding/z#673731278] not registered
Parameter name: e
at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)
at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Persistence.Eventsourced.<>c__DisplayClass92_0.<Recovering>b__0(Receive receive, Object message)
2018-04-22 15:28:45.438 ERROR PersistentShardCoordinator Exception in ReceiveRecover when replaying event type [Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated] with sequence number [50633] for persistenceId [/system/sharding/oCoordinator/singleton/coordinator]
System.ArgumentException: Shard 78 is already allocated
Parameter name: e
at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)
at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Persistence.Eventsourced.<>c__DisplayClass92_0.<Recovering>b__0(Receive receive, Object message)
2018-04-22 15:29:07.714 ERROR PersistentShardCoordinator Exception in ReceiveRecover when replaying event type [Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated] with sequence number [11774] for persistenceId [/system/sharding/pCoordinator/singleton/coordinator]
System.ArgumentException: Region [akka://AkkaCluster/system/sharding/p#1560991559] not registered
Parameter name: e
at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)
at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Persistence.Eventsourced.<>c__DisplayClass92_0.<Recovering>b__0(Receive receive, Object message)
2018-04-22 15:29:10.192 ERROR PersistentShardCoordinator Exception in ReceiveRecover when replaying event type [Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated] with sequence number [3087] for persistenceId [/system/sharding/wCoordinator/singleton/coordinator]
System.ArgumentException: Region [akka://AkkaCluster/system/sharding/w#43619595] not registered
Parameter name: e
at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)
at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Persistence.Eventsourced.<>c__DisplayClass92_0.<Recovering>b__0(Receive receive, Object message)
2018-04-22 15:29:15.210 ERROR PersistentShardCoordinator Exception in ReceiveRecover when replaying event type [Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated] with sequence number [11109] for persistenceId [/system/sharding/eCoordinator/singleton/coordinator]
System.ArgumentException: Region [akka://AkkaCluster/system/sharding/e#1963167556] not registered
Parameter name: e
at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)
at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Persistence.Eventsourced.<>c__DisplayClass92_0.<Recovering>b__0(Receive receive, Object message)
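For context, here is a minimal sketch of the kind of setup we assume produced this state. The report above does not include our actual configuration, so the entity actor, message extractor, journal plugin and ports below are hypothetical; only the system name (AkkaCluster) and the "z" region name are taken from the log. Each node seeded itself into a single-node cluster while both journals resolved to the same database.

    using Akka.Actor;
    using Akka.Cluster.Sharding;
    using Akka.Configuration;

    public static class Program
    {
        public static void Main()
        {
            // Each node seeds itself (a single-node cluster) while the journal on
            // both nodes resolves to the same database, which is the root cause here.
            var config = ConfigurationFactory.ParseString(@"
                akka.actor.provider = cluster
                akka.remote.dot-netty.tcp.hostname = 127.0.0.1
                akka.remote.dot-netty.tcp.port = 5001
                akka.cluster.seed-nodes = [""akka.tcp://AkkaCluster@127.0.0.1:5001""]
                akka.cluster.sharding.state-store-mode = persistence
                # Journal plugin and connection details are hypothetical; the key
                # point is that both nodes pointed at the same store.
                akka.persistence.journal.plugin = ""akka.persistence.journal.sql-server""
            ");

            var system = ActorSystem.Create("AkkaCluster", config);

            // Region name "z" matches /system/sharding/zCoordinator in the log above.
            ClusterSharding.Get(system).Start(
                "z",
                Props.Create<ZEntity>(),                 // hypothetical entity actor
                ClusterShardingSettings.Create(system),
                new ZMessageExtractor());                // hypothetical extractor
        }
    }

    // Hypothetical placeholder entity.
    public sealed class ZEntity : ReceiveActor
    {
        public ZEntity() => ReceiveAny(_ => { });
    }

    // Hypothetical extractor; routes messages to shards by hash code.
    public sealed class ZMessageExtractor : HashCodeMessageExtractor
    {
        public ZMessageExtractor() : base(100) { }
        public override string EntityId(object message) => message.ToString();
    }

With state-store-mode = persistence, both coordinators then persisted conflicting ShardHomeAllocated events under the same persistence ids, which appears to be what the replay above trips over.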
My expectation is that, when two nodes attempt to own the same data, one of them would eventually hit a journal write error (because the journal sequence number would no longer be unique) and its ActorSystem would then shut itself down. On recovery, the coordinator should always be able to get back into a consistent state.
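To make that expectation concrete, here is a rough sketch using a hypothetical user-level persistent actor (not the coordinator's actual code), assuming a journal that enforces uniqueness of PersistenceId + SequenceNr. The override names are real Akka.Persistence hooks; escalating them to a full system shutdown is the behaviour we would want from the coordinator, not what 1.3.5 does.

    using System;
    using Akka.Persistence;

    // Hypothetical persistent actor, used only to show the hooks where a journal
    // write that violates (PersistenceId, SequenceNr) uniqueness gets reported.
    public sealed class CounterActor : ReceivePersistentActor
    {
        private int _count;

        public override string PersistenceId => "counter-1"; // hypothetical id

        public CounterActor()
        {
            Recover<int>(n => _count += n);
            Command<int>(n => Persist(n, evt => _count += evt));
        }

        // The journal rejected the write, e.g. a unique-constraint violation because
        // another node already persisted this sequence number. By default this is
        // only logged and the actor keeps running.
        protected override void OnPersistRejected(Exception cause, object @event, long sequenceNr)
        {
            Context.System.Terminate(); // the escalation we would expect for the coordinator
        }

        // The write failed outright; by default the actor is stopped after this hook,
        // but the ActorSystem as a whole stays up.
        protected override void OnPersistFailure(Exception cause, object @event, long sequenceNr)
        {
            Context.System.Terminate();
        }
    }

As far as I understand, the defaults only log a rejected persist and stop the individual actor on a failed one, so neither path brings the ActorSystem down, and the conflicting coordinator events are only discovered at the next recovery.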
In our case the duplication was caused by user error, but it could just as easily occur during a network partition in which two nodes each claim ownership of the same underlying dataset.