What Do I Do With My Data File?
Accessing Files on the Princeton Network
You need to use the data from the General Social Survey 1972-2000 Cumulative file. You learn from the DSS web page that these data are available through the Princeton network at \\dss1\DataLib\disk1\datalib\gss2000. The preceding string is called the “network address” or “path”. Here’s how to proceed (note that these instructions are also on the DSS web page at http://dss1.princeton.edu/~data/access.html).
There are two basic methods of accessing files on the Princeton network.
Mapping the network drive:
1. Right-click on the "Network Neighborhood" ("Princeton Network" for Windows 2000) icon (usually on the top left of your screen).
2. Select "Map Network Drive".
3. Choose any drive letter you wish.
4. In the "Path" ("Folder" for Windows 2000) box enter the network address (i.e., \\dss1\DataLib).
5. You do not need to enter anything in the "Connect As" box.
6. Make sure that the "Reconnect at logon" box is NOT checked!
7. Click "OK", and a window should pop up after a moment. If a window does not pop up, then you may need to double click on the "My Computer" icon.
Browsing the Network:
1. Click on the "Start" button on the bottom left corner of your screen.
2. Select "Run".
3. Enter the network address (i.e., \\dss1\DataLib).
4. Click "OK", and a window should pop up after a moment. If a window does not pop up, then you may need to double click on the "My Computer" icon.
Arizona Accounts and “H:” Drives
This seems like a good place to detour slightly and discuss Arizona accounts and “H:” drives. As you will see, these are essentially the same thing. Everyone with a regular Princeton id has an Arizona account, which allows him/her to use the resources on the Arizona Unix server, including a certain amount of storage space on an Arizona disk (for students the standard space allocation is 20 megabytes). OIT has set up a way for users to access their Arizona accounts (save files, run programs, etc.) using Windows, via the Princeton network. In their clusters, this takes the form of a network drive that maps automatically when a user logs on to the network. Since it maps to the letter “H”, OIT refers to it as your “H:” drive, but there is nothing special about “H”; they could have chosen any other letter as well. I have found that students frequently don’t understand this setup – they don’t realize that their “H:” drive and their Arizona account are the same thing, or they think there’s something special about “H:”, among other misconceptions.
On Princeton computers not maintained by OIT, the “H:” drive link is not automatic at logon, but can be easily established by following the instructions for mapping a network drive above, with a few changes. Since there is nothing special about “H:”, you can select any drive letter you wish.
The path to your “H:” drive is \\smbserve\USERID, where USERID is your email id. In the “Connect As” box enter your email id. You will be asked for a password. This can create confusion, because some people have two different passwords, one to log on to Arizona using telnet (and read email), and one to connect to the Princeton network/log on in an OIT cluster. If this is true for you (or the student you’re assisting), use the network/cluster password.
If you (or the student you’re assisting) connect to an “H:” drive on a public machine, be sure to disconnect before leaving; otherwise anyone who uses the machine afterward will have access to the personal files on that drive. To disconnect, just right-click on “Network Neighborhood” or “Princeton Network”, select “Disconnect Network Drive…”, click on the “H:” drive, and click “OK”.
It is possible for students to get additional storage space on their Arizona accounts/“H:” drives by requesting it from OIT. Students doing data analysis on large files for their JPs, senior theses, or graduate research frequently need additional space. Although there is a charge involved, academic departments will pay it for students who require the space for their research. Advise students to see their department’s undergraduate or graduate secretary about funding. They can do this before or after they contact OIT. The sooner they take care of this, the more quickly and smoothly their research can proceed.
Transferring Files Over the Network
Once you have access to the network drive, transferring files is simply a matter of copying and pasting in “My Computer” or Windows Explorer. I strongly suggest copying whatever files you need to a local drive (e.g., your hard drive) before trying to open or work with them. In the case of zipped files, this is normally essential, since you are not likely to have permission to write (the unzipped files) on the network drive. Which brings us to . . .
I suggest the following procedure for using WinZip to unzip files. Other approaches may work, but I’ve found this the most reliable:
1. Open “My Computer” or Windows Explorer.
2. Right-click on the file you want to unzip.
3. Click “Extract to…”.
4. Click “I Agree”.
5. Select the folder in which you wish to place the unzipped file.
6. Click “Extract”.
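If you would rather script the extraction than click through WinZip, here is a minimal Python sketch of the same idea. Python is my assumption here, not part of the procedure above; the function name extract_zip and any paths you pass to it are hypothetical. It uses only the standard zipfile module and writes the unzipped files to a folder you choose, which should be on a local drive where you have write permission.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Create the destination folder (on a local, writable drive) if needed.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, extract_zip("mydata.zip", "unzipped") would place the contents of a (hypothetical) mydata.zip in a folder called unzipped under the current directory.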
Downloading Files From the Web
There are several things you need to be careful of when downloading files from the web, if you want to make sure you receive your data in a usable form. Since there are more pitfalls in Netscape than in Internet Explorer, I will present a Netscape example, but the following applies at least to some extent to both.
Say you need to download a set of files from ICPSR: a pdf codebook, a text file containing an SPSS program, and a zipped data file. Use the following procedure for each file:
1. Right-click on the link (never left-click).
2. Click “Save Link As…”.
3. Change “Save as type” to “Plain text” or “All files” (you would hardly ever want to use
4. Enclose the name you want to give the file, including the extension, in double quotes.
5. Click “Save”.
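The point of these steps is to get the file onto your disk exactly as the server sends it, under exactly the name you choose. A script can do the same thing; the following is a rough Python sketch (Python and the function name download are my assumptions, not something ICPSR or DSS provides). It copies the raw bytes to a file opened in binary mode, so nothing is converted along the way.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Save the bytes at url to dest_path without altering them."""
    # Opening the output in "wb" (binary) mode prevents any line-ending
    # or character conversion, which matters for zipped data and pdf files.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

You would call it once per file, giving the link address and the full file name, extension included, that you want on disk.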
Using the /scratch Directory on Arizona
The /scratch directory on Arizona is a public directory for temporary storage of files. Anyone with an Arizona account, i.e., a regular Princeton id, can use it. The catch is that files are automatically deleted after they have been on /scratch for 24 hours. One of the options DSS gives people who extract files from our on-line databases is to put them on /scratch. Other than that, /scratch is useful as a “halfway house” for data that you want to transfer from one PC to another. It is an alternative to floppies, zip disks, writeable CDs, etc., and for really large amounts of data it may be the only way. (The /scratch directory holds several gigabytes of data. Once in a while it gets full, but normally space is not an issue.) The way to transfer data between /scratch and your PC is to use FTP (file transfer protocol), our next topic.
FTP Between Unix (Arizona) and Your PC
The standard software package for FTP between a PC and a remote server (such as Arizona) is called WS_FTP. Everyone at Princeton should have it on their computer. The following are instructions for using WS_FTP to upload a file from a PC to the /scratch directory on Arizona:
1. Open WS_FTP and log on to Arizona.
2. Use the point and click controls to change to the local (PC) directory where your file is, or click the “ChgDir” box on the local (left) panel and type the directory’s path.
3. Click the “ChgDir” box on the remote (right) panel and type “/scratch” (without the quotes).
4. If you want to create your own temporary directory on /scratch (a good idea), click the “MkDir” box on the right panel, and type the name you want to give your directory.
5. Highlight the file(s) you want to transfer in the left panel.
6. Click the “ASCII” or “Binary” circle, depending on the type of files you are transferring. Select “ASCII” only if you are sending plain text files; for all other files select “Binary”. In most cases, “Binary” will work fine for plain text files too, but the reverse is not true.
7. Click the right-pointing arrow in the middle of the window to send your files.
To download files from /scratch to your PC, follow the same procedure with the roles of the left and right panels switched: highlight the file(s) in the right (remote) panel, then click the left-pointing arrow to bring them to your PC.
You can use WS_FTP to transfer files between any remote server and your PC. You’re not just limited to /scratch (or Arizona).
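For anyone who prefers a script to the WS_FTP window, here is a minimal Python sketch of the upload steps, using the standard ftplib module. This is an illustration under assumptions, not a DSS procedure: the function names upload and upload_to_scratch are hypothetical, and you would have to supply the real host name, user id, and password yourself. The text flag plays the role of the “ASCII”/“Binary” circle.

```python
from ftplib import FTP
from pathlib import Path


def upload(ftp, local_path, remote_dir="/scratch", text=False):
    """Send one local file to remote_dir over an already logged-in connection."""
    name = Path(local_path).name
    ftp.cwd(remote_dir)  # same as typing the path in the remote "ChgDir" box
    with open(local_path, "rb") as handle:
        if text:
            ftp.storlines("STOR " + name, handle)   # "ASCII": plain text only
        else:
            ftp.storbinary("STOR " + name, handle)  # "Binary": safe for any file
    return name


def upload_to_scratch(host, user, password, local_path, text=False):
    """Log on to host (a placeholder you must fill in) and upload to /scratch."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload(ftp, local_path, "/scratch", text)
```

Leaving text=False mirrors the advice above: binary mode generally works for plain text files too, but text mode can damage anything else.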
Parsing Files of Text Data for Reading Into Excel
There are two ways of organizing data in a text file: “delimited” and “fixed width.” A delimited text file uses some special character, most commonly a comma or a tab, to divide elements of data, called “variables,” from each other. A quick example should make this clear. Here are the first four lines of a sample comma-delimited file. The first line contains the variable names, the next three lines contain the first three lines of data:
A fixed width file, in contrast, has no “delimiter” (divider) characters. Instead, variable values are placed in the same “columns” (places on the page) for each line of data. Here’s an example of how the same data would look in a fixed width file (for obscure technical reasons, fixed width files will not have a line for the variable names):
John Smith                    42250
Mary Johnson                  66120
Note that the data is all packed together. The blanks between the names and ages of the first two people are called “filler”. They are there for two reasons: one, to leave room for people with longer names (like our third person), and two, to force each value of each variable to occupy the same columns. In this example, we would say that the variable “Name” is in columns 1-30, “Age” is in columns 31-32, and “Weight” is in columns 33-35.
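To make the idea of column positions concrete, here is a small Python sketch that cuts a fixed width line into its variables. Python is my choice for illustration only. The column layout comes from the example above; the names LAYOUT and parse_fixed_width are hypothetical, and the two sample lines are rebuilt with the names padded out to 30 columns.

```python
# Column layout from the text: Name 1-30, Age 31-32, Weight 33-35.
# Python counts from 0 and excludes the end index, so columns 1-30 are [0:30].
LAYOUT = {
    "Name": (0, 30),
    "Age": (30, 32),
    "Weight": (32, 35),
}


def parse_fixed_width(line, layout=LAYOUT):
    """Cut one line into variables by column position and strip the filler."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


# The two sample people, with filler restored so each name fills 30 columns.
sample_lines = [
    "John Smith".ljust(30) + "42" + "250",
    "Mary Johnson".ljust(30) + "66" + "120",
]

records = [parse_fixed_width(line) for line in sample_lines]
for record in records:
    print(record)
```

Notice that nothing in the line itself says where “Age” ends and “Weight” begins; the program has to be told the columns, which is why a codebook is indispensable for fixed width files.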
The type you are more likely to encounter (and the easier to deal with) is delimited. To read a delimited text file into Excel, follow these steps:
1. Click “File”, then “Open”.
2. Change the “Files of type” box to “Text Files”.
3. Highlight the file you want to open, and click “Open”.
4. Click the “Delimited” circle.
5. Click “Next >”.
6. Click the box for the type of delimiter your file has, in this case comma.
7. Click “Finish”.
Note that if your file is comma-delimited and has the extension .csv, Excel will automatically recognize it as a comma-delimited file and read it directly without going through the above steps. But there’s no way around them for tabs or other types of delimiters.
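For comparison, here is how a program reads a delimited file; it is a minimal Python sketch using the standard csv module, not anything Excel does internally. The function name read_delimited is hypothetical, and the sample text is my own construction from the names, ages, and weights used in the fixed width example, with “Name”, “Age”, and “Weight” as the first-line variable names.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line holds the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Assumed sample data: variable names on the first line, then one line per person.
comma_text = "Name,Age,Weight\nJohn Smith,42,250\nMary Johnson,66,120\n"
# The same data with tabs as the delimiter instead of commas.
tab_text = comma_text.replace(",", "\t")

rows = read_delimited(comma_text)
tab_rows = read_delimited(tab_text, delimiter="\t")
```

The only thing that changes between the comma and tab versions is the delimiter argument, which corresponds to the box you check in step 6.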
To read a fixed width text file into Excel, steps 1-3 are the same as for delimited files. The next steps are:
4. Click the “Fixed width” circle.
5. Click “Next >”.
6. Use the arrowed lines to place your column breaks in the correct locations (between
7. Click “Finish”.
As you can see, with more than a few variables, this would get to be quite a pain. That is why most people use statistical packages such as SAS or Stata to read fixed width text files (they are actually designed to be read by stats packages).