Lync 2013 – RTCSRV Frontend Service failing to start “showing as starting”

Good Morning

This post talks through a situation I ran into with a client recently. The client had a vanilla Lync 2013 Enterprise Edition implementation with three Front Ends and a back-end SQL Server. All of the servers were running Windows Server 2008 R2 Standard Edition.

The installation had gone to plan, with no issues in the earlier steps. On starting the services, though, I ran into an issue I'd personally never seen before: the RTCSRV service was stuck cycling on ‘Starting’ and never finished. (I left it for 2 days and it still didn’t finish.)

So what was causing the problem? This is what I did to track down and resolve the issue. (It's probably worth noting that there is a lot on TechNet and other blog sites about this issue, and some of the suggestions I found are crazy and would break your Lync environment if run.)

  1. First step was to check the event logs for information. > This proved fruitless, as nothing in the way of an error or warning was showing against the start of the services.
  2. Check the binding and trust of the certificate, including the intermediate chain. > This checked out OK and the certificate was good to use.
  3. Get Snooper running. Add SIPStack and S4 tracing (all flags), then stop and start the service on the Front Ends again while you have Snooper running. NOTE: you will need to kill the RTCSRV process from the command line. (First run sc queryex RTCSRV, which gives you the process ID, then run taskkill /f /pid <process number>.) > I ran this and again it checked out OK, with no errors to be seen.
  4. Run some PowerShell commands to check the status of the Lync 2013 implementation and make sure it did actually go OK.

These commands were:

  • Get-CsManagementStoreReplicationStatus > Check that UpToDate reads True for every replica

  • Get-CsPoolUpgradeReadinessState > This reported Ready
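As a rough sketch, steps 3 and 4 above could be run from the Lync Server Management Shell on one of the Front Ends along these lines. Two assumptions to flag: Get-WmiObject with Stop-Process stands in here for the sc queryex / taskkill pair from step 3, and the readiness check is assumed to be Get-CsPoolUpgradeReadinessState, which is the Lync 2013 cmdlet that reports a pool State of Ready.

```powershell
# Step 3: find the process behind the hung RTCSRV service and end it.
# Roughly equivalent to: sc queryex RTCSRV, then taskkill /f /pid <process number>
$svc = Get-WmiObject -Class Win32_Service -Filter "Name='RTCSRV'"
if ($svc -and $svc.ProcessId -gt 0) {
    # A ProcessId of 0 means the service has no running process to kill.
    Stop-Process -Id $svc.ProcessId -Force
}

# Step 4: every replica should report UpToDate = True.
Get-CsManagementStoreReplicationStatus |
    Select-Object ReplicaFqdn, UpToDate, LastStatusReport

# Step 4: the pool should report a State of Ready.
# (Assumed to be the cmdlet meant by the readiness check above.)
Get-CsPoolUpgradeReadinessState
```

Killing the process only clears the stuck "Starting" state so the service can be started again for another trace; it does nothing about the underlying cause.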

So what was my next step? After consulting other consultants internally on this (thanks to the Modality Systems guys), the next natural step was to patch the Lync 2013 environment even with the issue present. This is something I don’t usually do, as I don’t like to muddy the water with patching until I’m happy that the implementation is working as expected. HOWEVER, as Tom Arbuthnot mentioned, there had been changes to the way things work internally in Lync 2013 in CU4, so it was worth a shot to see if patching fixed this odd issue.

I patched all three Lync FEs and the back-end SQL up to the January 2014 CU, and still NO: the service was stuck cycling. As a consultant you follow the same well-trodden path of investigation, so again I set about looking in the event logs and Snooper. This time, though, the event logs held a lot more information, and one key line of relevance was the warning below.

<<<<

Server startup is being delayed because fabric pool manager has not finished initial placement of users.

 

Currently waiting for routing group: {63BB8586-A9D8-5AF2-83FF-B5CE680594C0}.

Number of groups potentially not yet placed: 1.

Total number of groups: 1.

Cause: This is normal during cold-start of a Pool and during server startup.

If you continue to see this message many times, it indicates that insufficient number of Front-Ends are available in the Pool.

Resolution:

During a cold-start of a large Pool it can take upto an hour for the placement process to finish as it needs to populate all the Front-End databases with data from the Backup Store. If the Pool is running and the Front-End is just started, this is normal for some time. If this repeats for a long time, ensure that all the Front-Ends configured for this Pool are up and running. If multiple Front-Ends have been recently decommissioned, run Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery to enable the Pool to recover from Quorum Loss and make progress

 >>>>
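A minimal sketch of how this warning could be pulled out of the event log from PowerShell rather than by scrolling through Event Viewer. It assumes the events sit in the 'Lync Server' log on the Front End and matches on the message text rather than on a specific event ID; the 500-event window is an arbitrary choice.

```powershell
# Search the most recent entries in the 'Lync Server' event log for the
# fabric pool manager placement warning quoted above.
Get-WinEvent -LogName 'Lync Server' -MaxEvents 500 |
    Where-Object { $_.Message -like '*fabric pool manager*' } |
    Select-Object TimeCreated, Id, LevelDisplayName, ProviderName |
    Format-Table -AutoSize
```

If the same warning keeps recurring with fresh timestamps, that matches the "continue to see this message many times" condition described in the event text.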

What's interesting about this is why quorum had got itself in a twist. Yes, the servers had been rebooted, but the issue was already showing before the reboots. No servers had been removed from the pool, so again this shouldn’t have affected the quorum state.

Anyhow, I ran the QuorumLossRecovery command.

Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery

AND BOOM: the Front-End services started as expected.
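For reference, a sketch of the recovery with the pool named explicitly and a quick check afterwards. The pool FQDN is a made-up placeholder; Reset-CsPoolRegistrarState needs a -PoolFqdn value and will prompt for one if it is left off, and it asks for confirmation before it runs.

```powershell
# Hypothetical pool name, for illustration only; use your own pool FQDN.
$poolFqdn = 'pool01.contoso.local'

# Tell the pool to recover from quorum loss, as the event text suggests.
Reset-CsPoolRegistrarState -PoolFqdn $poolFqdn -ResetType QuorumLossRecovery

# Afterwards, check on each Front End that RTCSRV has reached Running.
Get-CsWindowsService -Name RTCSRV
```

This is a pool-wide reset, so it is worth confirming first that the event log really does point at quorum, as it did here.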

 

KEY TAKEAWAYS 

  1. Always follow the same process in investigation work, even after you've patched your Lync environment.
  2. DON'T always follow what people write on the TechNet forums, or you will either end up chasing your tail or, more drastically, break your already non-working Lync environment.

That's it for this blog post.

Thanks

IainS

 

Premicell – The One Point – The Good, The Bad, The Ugly

Sometimes in life you hear something said which beggars belief. Well today was that day.

I’m in the process of planning a customer upgrade of an on-site, low-call-rate GSM gateway device, which routes calls out over SIM cards (i.e. a user rings an O2 mobile and, instead of the call going out over the PSTN, it routes to the GSM gateway and uses the installed O2 SIM and, by proxy, the SIM’s minutes for the call).

Historically the vendor of choice was http://www.premicell.com, who are now owned by http://www.theonepoint.co.uk. So today I called them to discuss upgrading the devices, only to be brushed off with the excuse that the person I needed was on a call and would ring me back in 2 minutes.

Three hours later I rang back, this time asking for ‘Nathan’, who I had originally been told was the person I needed. The guy who answered first quizzed me about why I was ringing. I told him I wanted information on the latest devices and costs, as I have a customer wanting to upgrade their Premicell devices. He was clearly the go-between for Nathan. Once the message had been passed on verbally (which I could hear, as Nathan must have been only 4 feet from the phone), the guy came back and said, quote, ‘Nathan “thinks” he’s too busy to speak with you today’. When I said ‘Really?’, the guy said ‘Yeah’.

The One Point must be making millions (£££) if it can afford to be so rude and turn customers away!

Well done, Nathan. You’ve just won yourself a complaint to The One Point's managing director and no future sales from me.

Customer service at its worst!