-
Don't log in as Administrator on any of the network machines.
We've seen this happen too many times: a user wants to do something on a network
server machine and, because the user doesn't have a profile set up on that machine,
they end up using the Administrator password to log on as Administrator. This is not
a good thing because:
- We cannot tell who is currently logged in remotely, so if another developer wants
to change something on the server, we can't work out who is on it.
- This matters most where servers don't allow multiple concurrent
users, because we need to know who to disconnect to free up a remote connection
license.
- A lot of applications get installed as Administrator, and no one ends up remembering
what they installed, so the Administrator profile is loaded with applications
that most people don't use.
- If you check files in or out of Source Safe, it may end up using the Administrator
account, which means we can't work out who made a change in Source Safe.
So log on using your own domain account.
-
What is your server reboot/restart policy?
If your servers are down or have to go down during business hours, notify
the users at least 15 minutes beforehand so you will not get 101 people all asking
you if the computer is down.
For short outages, IM is the best method. If you use MSN Messenger, open a
new chat window, click the "Invite" button, select all the recipients,
and then hit OK to add them to the message window. IM works best for short outages
with a small number of users, or for out-of-hours maintenance when you expect users will not be using the affected systems. If they are not online on Live Messenger or Skype, they can't complain that they were not warned.
For extended or planned outages, or if you have a larger number of users (50+), email
is the suggested method.
If you send an email, it is a good idea to tell users how to monitor the network
themselves. We use WhatsUp Gold for this.
e.g.
-
Subject: Network Outage
To: SSWALL
| Planned/Unplanned: | Planned |
| Change Description: | MERMAID – install Windows Server 2008 SP2 at 9PM |
| Risk (see table below): | LOW RISK (LOW Probability and MEDIUM Impact) |
| Reason For Change: | Windows 2008 SP2 is a prerequisite for TFS SharePoint integration |
| Uptime over last month: | 94.059% |
| |
| Planned Outage (mins): | 150 |
| Planned Start Time: | 26/10/2009 9:00 PM |
| Planned Finish Time: | 26/10/2009 11:30 PM |
| |
| Affected Services: | |
| \\Mermaid http://sharepoint.ssw.com.au http://intranet.ssw.com.au http://projects.ssw.com.au |
| |
| Detailed Change Plan: | |
| 1) Lock out users via IIS 2) Backup server 3) Install Service Pack (Windows Server 2008 SP2) 4) Reboot server
5) Follow test plan 6) Based on result of test plan, follow backout plan if procedure failed 7) Procedure completed |
| |
| Test Plan: | |
| 1) Check Event log for errors 2) Check each affected service is running
3) Call test users to start “Test Please” on the affected services 4) Get result of user “Test Please” by email by 11:15 PM |
| |
| Backout Plan: | |
| 1) Restore server from backup |
| Note: | This is as per rule What is your server reboot/restart policy? |
Risk Lookup Table by Probability and Impact:
| Impact \ Probability | Low | Medium | High | Unknown |
| Low | Low Risk | Low Risk | Low Risk | Medium Risk |
| Medium | Low Risk | Medium Risk | Medium Risk | High Risk |
| High | Medium Risk | High Risk | High Risk | High Risk |
| Unknown | Medium Risk | High Risk | High Risk | High Risk |
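The risk lookup table can also be expressed as a small function, which is handy if the outage email is generated by a script. This is a minimal sketch in Python; the names `RISK_TABLE` and `lookup_risk` are hypothetical and only the table values come from this rule.

```python
# Column order of the Probability axis in the risk lookup table.
LEVELS = ("low", "medium", "high", "unknown")

# One row per Impact level; values follow the Probability column order above.
RISK_TABLE = {
    "low":     ("Low Risk", "Low Risk", "Low Risk", "Medium Risk"),
    "medium":  ("Low Risk", "Medium Risk", "Medium Risk", "High Risk"),
    "high":    ("Medium Risk", "High Risk", "High Risk", "High Risk"),
    "unknown": ("Medium Risk", "High Risk", "High Risk", "High Risk"),
}


def lookup_risk(probability, impact):
    """Return the risk rating for a probability/impact pair (case-insensitive)."""
    p = probability.strip().lower()
    i = impact.strip().lower()
    if p not in LEVELS or i not in LEVELS:
        raise ValueError("probability and impact must be one of: " + ", ".join(LEVELS))
    return RISK_TABLE[i][LEVELS.index(p)]
```

For the sample email above, `lookup_risk("Low", "Medium")` gives the "Low Risk" rating quoted in the Risk row.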
Note: The following servers will be affected (if this is a HyperV host)

http://owl/NmConsole/Reports/Full/Group/Performance/RptGroupPingAvailability/RptGroupPingAvailability.asp?_nDeviceGroupID=-1&_sStartDate=2/11/2008&_sEndDate=3/11/2008&_nStartTime=1205154000000&_nEndTime=1205154000000&RptGroupPingAvailability.oTablePingAvail=&RptGroupPingAvailability.oTablePingAvailSumamry=&_nDeviceID=71&DeviceStatus.nWorkspaceID=10012&_nReportID=145&_oComboDateRange=Custom&_sStartTime=12:00%20AM&_sEndTime=12:00%20AM

Immediately before the scheduled downtime, check for logged in users, file access
and database connections.
Users
Open ‘Windows Task Manager’ (Run > taskmgr) and select the ‘Users’
tab. Check with users if they have active connections, then have them log off.
-
-
Figure: Connected users can be viewed in Task Manager
Files
Open ‘Computer Management’ (Run > compmgmt.msc), then ‘System
Tools > Shared Folders’. Check ‘Session’ and ‘Open Files’
for user connections.
-
-
Figure: Computer Management 'Open Files' View
Database
Open SQL Server Management Studio on the server. Connect to the local SQL Server.
Expand ‘Management’ and double-click ‘Activity Monitor’.
-
-
Figure: SQL Management Studio 'Active Connections' View
Once these have been checked for active users, and users have logged off, maintenance
can be carried out.
Restarts should only be performed during the following time periods:
- Between 7am and 7:05am
- Between 1pm and 1:05pm
- Between 7pm and 7:05pm
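A maintenance script can check the restart windows before it reboots anything. The sketch below is an illustration only: the function name `in_restart_window` is hypothetical, and it assumes both ends of each five-minute window are inclusive and that the server clock is in local time.

```python
from datetime import datetime, time

# Allowed restart windows from the list above, as (start, end) pairs.
RESTART_WINDOWS = (
    (time(7, 0), time(7, 5)),
    (time(13, 0), time(13, 5)),
    (time(19, 0), time(19, 5)),
)


def in_restart_window(moment):
    """Return True if the given datetime falls inside an allowed restart window."""
    t = moment.time()
    return any(start <= t <= end for start, end in RESTART_WINDOWS)
```

A scheduled job could call `in_restart_window(datetime.now())` and refuse to restart the server when it returns False.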
If a scheduled shutdown is required, use the PsShutdown utility from
Microsoft's Sysinternals page.
Reply Done when you finish the task
-
Do you keep your file servers clean?
How often do you find files on your network file server that clearly shouldn't be
there? Developers are notorious for creating temporary files and littering your
file system with them. So how can you identify exactly who created or modified the
file, and when?
-
-
Figure: Who created this file?
-
-
Figure: Terminal into your file server using Terminal Services
-
-
Figure: It was Jatin!
The easiest way is to configure Windows file auditing.
Thankfully, Windows XP and Server come with built-in file auditing. Any changes,
creates and deletes can be logged to your system event log. Here's how to set it
up.
How to implement auditing on your file server
- Terminal Server into the file server
- In Windows Explorer, locate the directory you want to configure logging for (e.g.
C:\Inetpub\wwwroot for logging changes to your web site files)
- Select Security tab | Advanced
-
-
Figure: Select the folder you want to configure auditing for
- Click the Auditing tab
- Select the users whose usage you want to monitor (usually all users, so select
Everyone)
-
-
Figure: Select Everyone so that anyone who modifies any of the files will be logged
- Select what you want to monitor. For best performance, we only tick the options
shown in the figure below - there's no need to log when someone opens a file.
-
-
Figure: Select these 4 options (only audit the events you need to audit - there's
no need to log when someone opens a file)
- Click OK and OK again to apply the changes. The process may take some
time depending on the number of subfolders and files selected.
Now you need to configure the system event log.
- Open Control Panel | Administrative Tools | Event Viewer
- Right-click the Security node and select Properties
- Make sure Overwrite events as needed is checked.
-
-
Figure: Keep your log file to about 250MB - otherwise your system performance may
suffer
Checking who created the file
Now test to see if auditing is working.
- On the server, create a file called "test.aspx" somewhere in the path
that is being audited
- Open Control Panel->Administrative Tools->Event Viewer
- Select the Security node, and notice the entries that have been created.
They will have a similar format to the figure below.
-
-
Figure: Any creates, deletes and updates now get logged to the Event Log
That's all! It is also great for finding out who accidentally deleted files from
the file system.
Furthermore, we can dump the event log to an Access or SQL Server database to make
it easier to handle. Here is how to do it:
- Download the scripts: one for Access database
and the other for SQL Server.
- Find the strEventDBConn variable and change it to your connection string. Also modify
the strEventDB and tblEvents variables to your database name and table name.
- List the names of the servers to monitor in EventHosts.txt.
Done. Now you only need to double-click the script to start it.
-
-
Figure: Caught an action on remote server and logged it to database
This script is originally from
http://pubs.logicalexpressions.com/pub0009/LPMArticle.asp?ID=340
Do you know how to fix running out of disk space?
How do you free up more disk space on servers?
- Check SQL backups
- Check SQL logs
- Use TreeSizePRO to find disk space issues
- Use CCleaner to automatically clean any temporary or junk files on the server
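If you want a quick first look before reaching for a dedicated tool, a short script can report the free space on a drive and the largest top-level folders. This is a rough sketch, not a replacement for the tools above; the function names are hypothetical and it assumes the account running it can read the folders it walks.

```python
import os
import shutil


def free_space_percent(path):
    """Return the percentage of free space on the drive that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def largest_subfolders(root, top=5):
    """Return the `top` biggest direct subfolders of `root` as (path, bytes) pairs."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.path] = total
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```

Running `largest_subfolders` against the folder that holds SQL backups and logs usually shows quickly where the space went.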
-
Do you have your UPS send an email when it kicks in?
Of course all your servers are on a UPS. (If not, they should be!)
How do you know that all the money you paid for a UPS was worth it, though? How many times
has it saved our servers? How long do the batteries last before they go flat? Why
was a server off when you came in in the morning?
If you get your UPS to email you when an event occurs, you will have answers
to these questions.
The problem is that no uniform software works with every UPS, as
each vendor has its own format.
All UPSs come with management software that can perform these actions. You just need
to install it.
We use an MGE UPS, so we use Personal Solution Pac, which allows you to run script
files on events. We just call a script file that sends us an email.
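The script that the UPS software calls can be very small. The following Python sketch is one possible shape for it, not the script we actually run: the function names, the mail server `mail.example.com` and the addresses in the usage note are all placeholders, and it assumes an SMTP relay that accepts unauthenticated mail from the server.

```python
import smtplib
from email.message import EmailMessage


def build_ups_alert(event, host, sender, recipient):
    """Build the notification email for a UPS event on the given host."""
    msg = EmailMessage()
    msg["Subject"] = f"UPS event on {host}: {event}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The UPS attached to {host} reported: {event}")
    return msg


def send_ups_alert(msg, smtp_host="mail.example.com", smtp_port=25):
    """Send the alert through an SMTP relay (host and port are placeholders)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

The UPS software would then run something like `send_ups_alert(build_ups_alert("On battery", "mermaid", "ups@example.com", "sysadmins@example.com"))` for each event you care about.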

-
Is your wireless hardware reliable?
When purchasing new network hardware you should always choose the most reliable option.
At SSW we have discovered that:
- Linksys is the best.
Google Answers helped in our decision - Linksys is the safer choice based on user ratings. http://answers.google.com/answers/threadview?id=2588
- Netgear is OK. The hardware works, the drivers work, and the support is excellent.
However they tend to be “simple” devices. They generally lack advanced features and are aimed more toward the home user market.

- DLink is NOT recommended. We will never buy this brand again.
Their devices tend not to last longer than the warranty period.
-
What is your password security policy?
We recommend enforcing strict password policies.
Below is a capture of the settings we use:

When passwords have to be changed they must meet the following minimum requirements:
- Not contain all or part of the user's account name
- Be at least six characters in length
- Contain characters from three of the following four categories:
- English uppercase characters (A through Z)
- English lowercase characters (a through z)
- Base 10 digits (0 through 9)
- Non alphanumeric characters (e.g., !, $, #, %)
Complexity requirements are enforced when passwords are changed or created.
Every 180 days users are required to change their password. They can change
it when they:
- Log on to their computer
- Terminal Server into another computer
- Connect via VPN
This allows users to change their password by making a VPN connection to the office.
We also enforce a lockout policy, so if a user gets their password wrong 5 times,
their account is locked out for 15 minutes.
If you want to change your password
sooner, press [Ctrl] [Alt] [Delete], then click the "Change Password" button.
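The complexity requirements above are enforced by the domain, but it can be useful to check a candidate password in your own tools first, for example in a user-provisioning script. The sketch below is a simplified illustration with a hypothetical function name. In particular, it only rejects passwords that contain the whole account name (when the name is three or more characters), which is an assumption and not the exact matching Windows performs.

```python
import string


def meets_password_policy(password, account_name):
    """Check a password against the minimum requirements listed above (simplified)."""
    if len(password) < 6:
        return False
    # Simplification: only the full account name is checked, case-insensitively.
    name = account_name.lower()
    if len(name) >= 3 and name in password.lower():
        return False
    categories = [
        any(c in string.ascii_uppercase for c in password),  # A through Z
        any(c in string.ascii_lowercase for c in password),  # a through z
        any(c in string.digits for c in password),           # 0 through 9
        any(not c.isalnum() for c in password),               # e.g. !, $, #, %
    ]
    # At least three of the four character categories must be present.
    return sum(categories) >= 3
```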
-
Occasionally, one server and its drives will not have sufficient space to store
all related files in a network share. For example, you may have a "SetupFiles"
directory that stores all Setup executables on your network e.g. \\bee\SetupFiles.
There are problems with this approach.
- You will run out of space - which means you will have to copy or move old (but still
used) setup files around to other drives (\\bee\d$\SetupOld\ ) or other machines
e.g. \\tuna\SetupFiles. This fragmentation of your setup files can cause confusion
for your users.
- When you retire or rename the old server, links to the old server location will
not work
So how do you get around this problem? The answer is in the Distributed File System
(DFS). Instead of having several server-specific file share locations, you
can have a domain-wide setup location that offers a seamless experience to your
users. DFS will even track a history of when and where file locations were moved.

Figure: The Distributed File System consolidates many separate
file shares into one convenient location for your users
-
Do you have a consistent naming convention for each machine?
When we configure networks we give all computers in the company a naming theme like
Buildings, Cars, Countries, Colours, Fruits, Vegetables.
In our environment we have adopted the animal kingdom.
-
Do you secure your wireless connection?
Wireless networks are everywhere now. You can’t drive down the street without
finding a network that is unsecured.
At home this is fine. Who cares if your next-door neighbour uses a bit of bandwidth?
However, in an office environment there is a lot more to lose than a bit of bandwidth.
It is vital that wireless is kept secure.
WEP, disabling SSID broadcast, and MAC address allow-lists are all OK, but these are more home
security. For the office you need something a bit more robust.
I am all for using RADIUS to integrate with your Active Directory. This
is safe, secure, and requires little to no setup on the client computer.
http://www.hansenonline.net/Networking/wlanradius.html
Because the wireless is tied to Active Directory, you cannot give clients wireless
access.
-
Do you have servers around the world and use GEO based DNS IP’s?
Having a very popular website is great. The only problem is where to host it. If
you host it in your local country then it is very fast for your local market but
what about the market on the other side of the world?
The solution is to have 2 or more servers and direct users to the server that is
the closest to them. This is possible with the help of Bind DNS server and a list
of IP addresses and the country of origin.
The beauty of this solution is that it is not application specific. Anything like
VoIP or game servers can be directed to their local server.
Follow the directions in this article, http://peen.net/2006/03/08/geo-dns, to
set up your Bind config file. The only problem is that the PHP script supplied in
that article does not work correctly: it cannot convert the number-based IP to the
real IP and subnet. Because of this I had to create my own little app to make the
file for Bind to use. You can find it and get the
source code.
You can download a free IP-to-country list from
http://software77.net/cgi-bin/ip-country/geo-ip.pl
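To illustrate the conversion step that tripped up the PHP script, here is a minimal Python sketch that turns numeric start/end IP values into CIDR blocks and writes them as a Bind acl. It is not the app mentioned above. It assumes each row of the downloaded list has already been parsed into a pair of integers (range start, range end), and the function names are hypothetical.

```python
import ipaddress


def range_to_cidrs(start, end):
    """Convert a numeric IPv4 range (inclusive) into the CIDR blocks that cover it."""
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def build_acl(name, ranges):
    """Render a Bind acl statement from (start, end) numeric IP ranges."""
    lines = [f"acl {name} {{"]
    for start, end in ranges:
        for cidr in range_to_cidrs(start, end):
            lines.append(f"    {cidr};")
    lines.append("};")
    return "\n".join(lines)
```

For example, the numeric range 16777216 to 16777471 becomes the single block 1.0.0.0/24. Writing the output of `build_acl("europe_west", ranges)` to acl-europe_west.inc would give a file in the shape the include lines further down expect.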
How it works:
Once you have made your ACL files, you can use views in the Bind configuration
to specify which zone file to use for each group of IPs. Each zone file would
have the relevant IP information for that target segment of the world.
Imagine you have 3 zone files: one for Europe, one for the Americas and one
for the rest of the world. You simply edit named.conf.local to include the ACLs
for Europe and the Americas. E.g.:
-
include "/etc/bind/named.conf.options";
include "/etc/bind/acl-europe_east.inc";
include "/etc/bind/acl-europe_sout.inc";
include "/etc/bind/acl-europe_west.inc";
include "/etc/bind/acl-europe_nort.inc";
include "/etc/bind/acl-america_cari.inc";
include "/etc/bind/acl-america_cent.inc";
include "/etc/bind/acl-america_nort.inc";
include "/etc/bind/acl-america_sout.inc";
Next you create separate views: one for Europe, one for the Americas
and one for everyone else.
view "europe" {
    // Clients whose address matches any of the European ACLs
    match-clients {
        europe_east;
        europe_nort;
        europe_sout;
        europe_west;
    };
    zone "peen.net" {
        type master;
        file "/etc/bind/europe/db.peen.net";
    };
};
view "americas" {
    // Clients whose address matches any of the Americas ACLs
    match-clients {
        america_cari;
        america_nort;
        america_sout;
        america_cent;
    };
    zone "peen.net" {
        type master;
        file "/etc/bind/americas/db.peen.net";
    };
};
view "others" {
    // Views are checked in order, so this catch-all must come last
    match-clients { any; };
    zone "peen.net" {
        type master;
        file "/etc/bind/others/db.peen.net";
    };
};
-
Logon - Do you have a companywide Word template?
-

- Figure:
Bad Example - creating an email/document does not have the company templates
-

- Figure:
Good Example - creating an email/document with the company templates
A companywide template gives users automatic footers, which saves
time and gives better branding.
How to have a companywide Word template:
- Modify your Normal.dotm file to have the headings and formatting that you want for Word
documents
- Create standard employee email footer files e.g. JamesZhou.htm or JamesZhou.txt
- Put the files on a network location - this is the place that will hold the master
copies
e.g. \\ant\ssw\StandardsInternal\Templates\
- Have a logon script, set up through Group Policy, that copies the files
to the user's computer when they log on.
-

- Figure:
Good Example - company templates
@ECHO OFF
ECHO This is in the default group policy - user section
REM Copy template from network share
XCOPY /Y "\\ant\ssw\StandardsInternal\Templates" "%APPDATA%\Microsoft\Templates\"
ECHO. Templates Copied
REM Copy user outlook template from network share
COPY "\\ant\ssw\StandardsInternal\Templates\Outlook\SSW_%UserName%.htm" "%APPDATA%\Microsoft\Signatures\SSW.htm"
COPY "\\ant\ssw\StandardsInternal\Templates\Outlook\SSW_%UserName%.rtf" "%APPDATA%\Microsoft\Signatures\SSW.rtf"
COPY "\\ant\ssw\StandardsInternal\Templates\Outlook\SSW_%UserName%.txt" "%APPDATA%\Microsoft\Signatures\SSW.txt"
ECHO. Outlook Template Updated
ECHO. Write to log file
ECHO EXIT|%COMSPEC%/kPROMPT SSW Startup (\\cow\sysvol\sydney.ssw.com.au\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}\User\Scripts\Logon\sswlogon.cmd) Script Ran at $d $t >> C:\SSWLogin.log
:EXIT
Figure above: This is what our logon script looks like.
Note: We don't want people using .RTF emails, so we include this message in SSW.rtf.
Be aware that we don't want to use RTF because of
Remove RTF as an option or explain when it is a good choice
For more information on why we need to modify Normal.dotm, see the
website below.
http://office.microsoft.com/en-us/word/HA100307561033.aspx
-
Do you assume catastrophic failure before touching a server?
If you are going to install a service pack on a machine, move a virtual server
to another drive or make any critical system-level changes, make sure you back
up your machine first. For virtualized machines, make sure you back up all related
files, including vhd, avhd etc.
You should assume there could be a catastrophic failure after these kinds of
operations, and always be prepared by having a full backup somewhere.
This is especially important when you are working on your production or critical servers.
-
Do you monitor the uptimes of all your servers daily?
It is important that network administrators can easily find out how reliable their servers are.
This can be achieved using tools like WhatsUp Gold to monitor the uptime and
SQL Reporting Services to create a report showing server uptime.
Here is a report that we use to monitor our servers on a daily basis:
-
-
Figure: Good example - We can easily see the uptime of all our servers
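The uptime figure itself is simple arithmetic: the time the server was up divided by the length of the reporting period. A minimal sketch, assuming you already have a list of outage start and end times that do not overlap (the function name and input shape are hypothetical and not taken from any monitoring tool):

```python
from datetime import datetime


def uptime_percent(period_start, period_end, outages):
    """Return uptime as a percentage of the period, given (start, end) outages."""
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in outages:
        # Only count the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total
```

For example, a single 150-minute outage in a 30-day period works out to roughly 99.65% uptime.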
Do you know the right notification for backups?
You need to log a record on success so you can check for backups that have failed.
-
-
Figure: Bad example - an email is sent on completion
-
-
Figure: Good example - a record is logged on completion
Now you can spot missing backups. You can set up automatic notifications based on the table above, e.g.
with a SQL Reporting Services data-driven subscription.
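The reason to log successes rather than email them is that a missing record is easy to detect. The sketch below shows the idea in Python with a hypothetical function name. It assumes the success log has been read into (server, date) pairs and that you keep a list of the servers that should be backed up each day.

```python
from datetime import date


def missing_backups(expected_servers, success_log, day):
    """Return the servers that have no success record for the given day."""
    succeeded = {server for server, logged_on in success_log if logged_on == day}
    return sorted(set(expected_servers) - succeeded)
```

Any server returned by `missing_backups` either failed or never ran, which is exactly the case a "success" email would never tell you about.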
Do you check your DNS settings?
w3dt.net supplies a DNS report tool that helps administrators troubleshoot DNS issues with domains, name servers, SOA records, and other information.
We need to get all green ticks except for:
- Missing (stealth) nameservers
- Missing nameservers 2
-
Do you know when to scale out your servers and when to keep it as a standalone server?
At SSW, we recommend using virtualized standalone servers because:
- If one server goes down it does not affect other servers (e.g. a centralized SQL server fails and brings down: CRM, TFS, Reports, Web Server)
- You can just copy the VPC to another computer and it just works, no need to worry about reconfiguring the SQL connection string or web services
- You can just backup the VPC
However, you should scale out your servers if:
- You want the best performance (e.g. A different server for SQL backend and Web frontend)
-
Do you know not to delete expired domain users?
When an employee leaves or a domain account expires, disable the account, never delete it:
- Some LOB applications, such as CRM, maintain a reference to the AD domain user GUID
- During the migration or restoration of CRM, users stored in the database are verified against AD and problems may occur if they no longer exist
-
Do you send notification if you cannot access essential services?
Some network services, like TFS/Exchange/Database, are essential for our business, and people will not be able to work if any of these services is down or inaccessible.
When this happens, the first thing you need to do is send a notification to the SysAdmins so they can start investigating the problem. You should also cc your project manager, because these issues will stop you getting tasks done.