MSExchange Transport service stuck on “Starting”
Yesterday, while monitoring my email servers on Live MAP (we have configured live monitoring using SCOM), I saw the primary Hub server raise an alert on the number of emails stuck in the “Submission” queue. I quickly logged on to the server and found that the submission queue was growing rapidly. This server is one of the two servers that accept incoming email from the internet and also handle outgoing email, so the queue was filling up fast.
I checked the event logs, and the first event that caught my attention was “Database corruption”. Oh no… long, long ago this used to be a common issue, but I can’t remember the last time I worked on a corrupted database.
The logs also showed that the Transport service process was crashing repeatedly.
As users had already started complaining that emails were not reaching recipients, we had little time to fix the issue, and the business wanted service restored as soon as possible.
How we fixed it:
The toughest part was getting any change request approved, as the environment was in a change freeze period. Since the first priority was to restore mail flow, we drew up a plan to provide an alternate path for incoming email and to remove this server from the outgoing connectors. The second part was easy: we removed the server from the Send Connectors. For the first part, we borrowed an unused IP address and assigned it to the problematic server, then assigned the server’s original IP to another Hub server as a secondary address. Once DNS replication completed, incoming email started flowing through that other Hub server. Back to the original issue…
The submission queue was showing 1,700+ emails. We decided to follow the well-trodden path… recycle the service. Voila… the service got stuck on “Starting”. We waited a while, expecting it to start. Checked the events… logs were replaying… waiting… the process crashed. Phewww.
Team huddle… we decided to kill the transport process and then reboot the box… done… no luck…
Time for some real stuff. Thanks to my Exchange guru and god, Ashley…
Though the mail queue database uses the .que extension, underneath it is an ESE (Extensible Storage Engine) database, just like a normal Exchange mailbox database (.edb).
So I used our old friend eseutil and found that the database was in a “Dirty Shutdown” state.
Command: eseutil /mh "D:\Exchange Server\Queue\Database\mail.que"
Output (cropped for readability):
State: Dirty Shutdown
Log Required: 6525364-6525463 (0x6391b4-0x639217)
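If you need to check this header on more than one server, the two fields that matter here can be pulled out of the eseutil /mh text with a few lines of script. The Python sketch below is a hypothetical helper (parse_eseutil_header is not part of any Exchange tooling) and assumes the header lines look like the cropped output above. It also shows that the pair in brackets is simply the same log generation range written in hexadecimal.

```python
import re
from typing import NamedTuple, Optional, Tuple


class HeaderInfo(NamedTuple):
    state: str                               # e.g. "Dirty Shutdown" or "Clean Shutdown"
    log_required: Optional[Tuple[int, int]]  # inclusive (first, last) log generation


def parse_eseutil_header(output: str) -> HeaderInfo:
    """Extract State and Log Required from `eseutil /mh` text (hypothetical helper)."""
    state = re.search(r"^\s*State:\s*(.+?)\s*$", output, re.MULTILINE)
    if state is None:
        raise ValueError("no 'State:' line found in eseutil output")
    # The decimal range comes first; the bracketed hex range is the same numbers.
    logs = re.search(r"Log Required:\s*(\d+)-(\d+)", output)
    log_range = (int(logs.group(1)), int(logs.group(2))) if logs else None
    return HeaderInfo(state.group(1), log_range)


# Sample text copied from the cropped output above.
sample = """
            State: Dirty Shutdown
     Log Required: 6525364-6525463 (0x6391b4-0x639217)
"""

info = parse_eseutil_header(sample)
first, last = info.log_required
print(info.state)                     # Dirty Shutdown
print(hex(first), hex(last))          # 0x6391b4 0x639217
print(last - first + 1, "log files")  # 100 log files
```

In other words, recovery needed 100 consecutive log generations, which is what the next check was about.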
I checked the Logs folder and, fortunately, all the required log files were present. (The mail queue database uses circular logging by default, so only a limited set of recent logs is kept.)
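Checking 100 generations by eye is tedious, so here is a small Python sketch that lists any required log files that are missing. It assumes the ESE naming convention of the log base name (trn here) followed by the generation number as eight uppercase hex digits, e.g. trn006391B4.log. expected_log_names and missing_logs are hypothetical helpers, so confirm the pattern against the files actually in your Logs folder before trusting the result.

```python
import os
from typing import List


def expected_log_names(first: int, last: int, base: str = "trn", digits: int = 8) -> List[str]:
    """Build file names for log generations first..last inclusive (assumed naming scheme)."""
    return [f"{base}{gen:0{digits}X}.log" for gen in range(first, last + 1)]


def missing_logs(log_dir: str, first: int, last: int,
                 base: str = "trn", digits: int = 8) -> List[str]:
    """Return the expected log file names that are not present in log_dir."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in os.listdir(log_dir)}
    return [name for name in expected_log_names(first, last, base, digits)
            if name.lower() not in present]
```

On the server this would be called as missing_logs(r"D:\Exchange Server\Queue\Logs", 6525364, 6525463), using the range from the “Log Required” line. Bear in mind that ESE normally writes the newest generation to trn.log until it rolls over, so if only the last generation is reported missing, check for trn.log before assuming a log has been lost.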
A simple option would have been to kill the Transport service, rename the Queue folder and start the service again. This creates a new queue database and would have fixed the issue easily, but we would have lost those 1,700+ emails, so we rejected this option.
At this point, we decided to recover the database.
- Disabled the Transport service and restarted the server.
- Copied the complete Queue database to a different drive for safekeeping.
- Ran the following command:
eseutil /r trn /l "D:\Exchange Server\Queue\Logs" /d "D:\Exchange Server\Queue\Database"
The process took some time because the database was large, but it completed successfully. We checked the state of the database again and ah… “Clean Shutdown”.
- Started the Transport service, and this time it started successfully.
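- As it was a recovered database, we decided to flush all the queued emails and then create a new database. So we paused the service again and let the queue drain (pausing stops the server from accepting new messages while it keeps delivering the ones already queued).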
- Once the queue was clear, stopped the service, renamed the queue folder and started the service again. A new queue database was created.
- Checked the queue status: all healthy.
- Swapped back the IP addresses changed earlier to restore the original mail route, and added the server back to the Send Connectors.
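The file-level part of the recovery steps above can be scripted so it is repeatable in the next incident. The Python sketch below is an outline under stated assumptions, not a supported Exchange procedure: backup_queue, soft_recovery_command, run_recovery and retire_queue_folder are hypothetical helper names, the paths are the ones from this post, and the Transport service still has to be disabled, started, paused and stopped by hand (or with your own tooling) at the points noted in the docstrings. run_recovery defaults to a dry run that only prints the eseutil command.

```python
import os
import shutil
import subprocess
from datetime import datetime
from typing import List, Optional

QUEUE_DIR = r"D:\Exchange Server\Queue"  # path used in this post


def _stamp() -> str:
    """Timestamp suffix so repeated runs never overwrite each other."""
    return datetime.now().strftime("%Y%m%d-%H%M%S")


def backup_queue(queue_dir: str, backup_root: str) -> str:
    """Copy the whole Queue folder (database + logs) under backup_root; return the copy's path.

    Run only while the Transport service is stopped, so no files are in use.
    """
    name = os.path.basename(os.path.normpath(queue_dir))
    dest = os.path.join(backup_root, f"{name}-{_stamp()}")
    shutil.copytree(queue_dir, dest)  # also creates backup_root if it does not exist
    return dest


def soft_recovery_command(log_dir: str, db_dir: str, base: str = "trn") -> List[str]:
    """Argument list for: eseutil /r <base> /l <log_dir> /d <db_dir>."""
    return ["eseutil", "/r", base, "/l", log_dir, "/d", db_dir]


def run_recovery(cmd: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command in dry-run mode; otherwise run it and raise on a non-zero exit code."""
    if dry_run:
        print("Would run:", cmd)
        return None
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


def retire_queue_folder(queue_dir: str) -> str:
    """Rename the Queue folder (service stopped) so a fresh database is created on next start."""
    retired = f"{os.path.normpath(queue_dir)}.old-{_stamp()}"
    os.rename(queue_dir, retired)
    return retired
```

Following the order in this post, a run would be: stop and disable the service, backup_queue(QUEUE_DIR, r"E:\QueueBackup") (the E: path is only an example), run_recovery(soft_recovery_command(...), dry_run=False), confirm “Clean Shutdown” with eseutil /mh, start the service and pause it until the queue drains, stop it, then retire_queue_folder(QUEUE_DIR) and start it again. Passing the eseutil arguments as a list lets subprocess take care of quoting paths that contain spaces, such as "D:\Exchange Server\...".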
A good learning… Special thanks to my lead, Ashley… you are a genius… always… not sometimes :)