Channel: Data Protector Practitioners Forum topics

DP tip when using many Media Agents in parallel (load balanced)


We back up a large number of virtual servers (>300) over the LAN.

This is all in one DP 7.03 cell: one Cell Manager and two media servers, all running Windows 2008R2.

Backups are written to VLSs and later copied to LTO tape.

 

For ease of management we created backup specs based on client function, packing many servers together in one spec: production servers together, development servers together, etc.

This way, we ended up with a few backup specs, some of them backing up between 50 and 100 clients.

 

Backing up to a VLS and later copying that to LTO proved fastest when using the lowest drive concurrency (=1) and as many drives as possible in parallel (up to 30 for some specs).

What's more, some of these large backup specs run concurrently.

As a result, a media server can be driving some 60 VLS tape drives at any one time.

 

This is where problems with one of the backup specs began to arise.

The backup spec started to fail, with some client mountpoints hanging forever or running into the default 8400-second timeout.

Usually these things end up as a search for a needle in a haystack.

But just for once, Data Protector gave us some reference to what was going on.

 

In the DP debug.log on the media server concerned, we found an application exception message logged exactly 8400 seconds before the first timeout message appeared in the session log. The exception error code is 0xc0000142.
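The correlation described above comes down to subtracting the timeout from the time of the first timeout message and then reading debug.log around the result. A minimal sketch of that arithmetic, where the function name and the example timestamp are hypothetical and only the 8400-second value comes from this post:

```python
from datetime import datetime, timedelta

# Timeout value quoted in the post; adjust if your cell uses a different one.
DEFAULT_TIMEOUT_SECONDS = 8400


def likely_failure_time(first_timeout: datetime,
                        timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS) -> datetime:
    """Return the moment at which the hanging process most likely failed.

    first_timeout is the time of the first timeout message in the session log.
    """
    return first_timeout - timedelta(seconds=timeout_seconds)


if __name__ == "__main__":
    # Hypothetical session log timestamp, for illustration only.
    first_timeout = datetime(2014, 3, 10, 4, 20, 0)
    print("Look in debug.log around:", likely_failure_time(first_timeout))
```

Since 8400 seconds is 2 hours and 20 minutes, a timeout reported at 04:20:00 points to debug.log entries around 02:00:00.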

Googling this error code led us to Microsoft Knowledge Base article KB824422.

This article describes servers running into trouble when they run many processes concurrently. Although the article refers to Windows NT, it still applies to W2008R2!

 

Consider a Windows 2008R2 media server driving 60 VLS drives in parallel. That is 60 BMA.EXE processes. But it will also be running 60 UMA.EXE processes, and on top of that you will see 60 CONHOST.EXE processes. That is some 180 processes, in addition to all the other Windows processes running.
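To see how quickly this adds up for other drive counts, the arithmetic can be sketched as below. The three process names are the ones observed above; the function name is hypothetical, and the one-process-of-each-kind-per-drive ratio is simply what we saw on our media servers, not a documented rule.

```python
# Processes observed per active drive on the media server (from the post).
PROCESSES_PER_DRIVE = ("BMA.EXE", "UMA.EXE", "CONHOST.EXE")


def estimate_process_count(active_drives: int) -> dict:
    """Estimate how many Data Protector related processes run for a given
    number of drives, assuming one process of each kind per drive."""
    counts = {name: active_drives for name in PROCESSES_PER_DRIVE}
    counts["total"] = active_drives * len(PROCESSES_PER_DRIVE)
    return counts


if __name__ == "__main__":
    for drives in (30, 60):
        print(drives, "drives ->", estimate_process_count(drives)["total"], "processes")
```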

It seemed reasonable to think that this could be the culprit for the failing backup.

 

And indeed, after we followed the advice given in the Microsoft article, the backup spec gave us no more trouble. We raised the non-interactive value mentioned there from 768 to 1024, which proved to be enough in our case. In time, we may have to raise it again.
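For readers who want to see what such a change looks like: as far as I recall from the Microsoft article (the details are not spelled out in this post, so treat them as assumptions and check the article itself), the non-interactive value is the third number of the SharedSection setting inside the "Windows" string under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems, and a reboot is needed for it to take effect. The sketch below only rewrites a copy of that string; it does not touch the registry, the function name is hypothetical, and the sample value is abbreviated for illustration.

```python
import re

# Matches "SharedSection=<a>,<b>,<c>" and captures the third field separately.
SHARED_SECTION = re.compile(r"(SharedSection=\d+,\d+,)(\d+)")


def raise_noninteractive_heap(windows_value: str, new_kb: int) -> str:
    """Return windows_value with the third SharedSection field raised to new_kb.

    The string is returned unchanged if the field is already at least new_kb.
    """
    match = SHARED_SECTION.search(windows_value)
    if match is None:
        raise ValueError("no three-field SharedSection setting found")
    if int(match.group(2)) >= new_kb:
        return windows_value
    # A function is used as replacement so the digits cannot be misread
    # as part of a group reference.
    return SHARED_SECTION.sub(lambda m: m.group(1) + str(new_kb),
                              windows_value, count=1)


if __name__ == "__main__":
    # Abbreviated, illustrative value; the real string is longer.
    sample = (r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
              r"SharedSection=1024,20480,768 Windows=On SubSystemType=Windows")
    print(raise_noninteractive_heap(sample, 1024))
```

Export the registry key before changing anything, so the original string can be restored if needed.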

