Glossary of Selected Social Science Computing Terms
and Social Science Data Terms

This glossary includes terms which you may find useful in managing data collections and providing basic data services. It does not attempt to cover all social science research terms or all computer terms. The definitions used are meant to be helpful in a data library environment. Terms that are defined in the glossary are highlighted in boldface in the definitions. To supplement this glossary, you might want to use the Encyclopedia of Computer Science, Fourth Edition, edited by Anthony Ralston, Edwin D. Reilly and David Hemmendinger, which has excellent comprehensive definitions of many computer terms, or the chapter "Information Resources Management: A Glossary of Terms," by Aninyda Bose (pp. 92-161 in Encyclopedia of Library and Information Science Volume 41 - Supplement 6, N.Y.: Marcel Dekker. 1986, ed. Allen Kent), which brings together information technology terms from computing, telecommunications, networking, and related fields.

This page: http://odwin.ucsd.edu/glossary/

Aggregate
(noun) A total created from smaller units. For instance, the population of a county is an aggregate of the populations of the cities, rural areas, etc. that comprise the county.
(verb) To total data from smaller units into a large unit. Example: "The Census Bureau aggregates data to preserve the confidentiality of individuals."

Aggregate Data
Data that have been aggregated. Contrast with Microdata.

ASCII
A character encoding scheme used by many computers. The ASCII standard uses 7 of the 8 bits in a byte to define the codes for 128 characters. Example: in ASCII, the number seven is treated as a character and is encoded as: 00110111. Because a byte can have a total of 256 possible values, there are an additional 128 possible characters that can be encoded into a byte, but there is no formal ASCII standard for those additional 128 characters. Most IBM compatible personal computers do use an IBM "extended" character set that includes international characters, line and box drawing characters, Greek letters, and mathematical symbols. (ASCII stands for American Standard Code for Information Interchange.) See also EBCDIC.

Binary Format.
Any file format in which information is encoded in some format other than a standard character encoding scheme. A file written in binary format contains information which is not displayable as characters. Software capable of understanding the particular binary format method of encoding information must be used to interpret the information in a binary formatted file. Binary formats are often used to store more information in less space than possible in a character format file. They can also be searched and analyzed more quickly by appropriate software. A file written in binary format could store the number "7" as a binary number (instead of as a character) in as little as 3 bits (i.e., 111), but would more typically use 4 bits (i.e., 0111). Binary formats are not normally portable, however. Software program files are written in binary format. Examples of numeric data files distributed in binary format include the IBM-binary versions of the Center for Research in Security Prices files and the U.S. Department of Commerce's National Trade Data Bank on CD-ROM. The International Monetary Fund distributes International Financial Statistics in a mixed character-format and binary (packed-decimal) format. SAS and SPSS store their system files in binary format.
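The space saving described above can be illustrated with a minimal Python sketch. The sample value and the choice of a 4-byte big-endian integer are assumptions made for the example; real binary data files each define their own layout.

```python
import struct

value = 1234567  # illustrative number, not taken from any real data file

# Character format: one byte per digit.
as_characters = str(value).encode("ascii")

# Binary format: a 4-byte big-endian signed integer (an assumed layout).
as_binary = struct.pack(">i", value)

print(len(as_characters), "bytes in character format")
print(len(as_binary), "bytes in binary format")

# Reading the binary form back requires knowing exactly how it was written.
(decoded,) = struct.unpack(">i", as_binary)
```

The need to know the layout in advance (here, ">i") is one reason binary formats are less portable than character formats.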

Binary Number.
A number written using binary notation which only uses zeros and ones. Example: decimal number seven in binary notation is: 111.

Bit.
A bit is the smallest unit of information that a computer can work with. Each bit is either a one or a zero. Often computers work with chunks of bits rather than one bit at a time; the smallest chunk of bits a computer usually works with is a byte which is 8 bits.

Block.
On magnetic tape, a physical chunk of information separated from other blocks by an interblock gap of blank space. The computer reads and processes the information from a tape in blocks. The size of a block is typically a multiple of the size of a physical record.

Blocksize.
On magnetic tape, the size of a block. Note that when a smaller blocksize is used, a greater number of interblock gaps are necessary to record a given amount of information and thus a greater amount of tape is used.

BPI.
Bytes Per Inch. A measure of the storage density used on magnetic tape. A 2400 foot tape at 6250 BPI could theoretically store about 180 megabytes (million bytes) of data. In practice, however, one seldom finds tapes with more than about 100-150 megabytes of data. See also blocksize.
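The arithmetic behind the 180 megabyte figure, and the effect of blocksize on usable capacity, can be sketched as follows. The interblock gap length of 0.3 inch is an assumption for illustration (the glossary does not give one), and the function name is hypothetical.

```python
TAPE_FEET = 2400
DENSITY_BPI = 6250
GAP_INCHES = 0.3  # assumed interblock gap length; not stated in the glossary

tape_inches = TAPE_FEET * 12
raw_capacity = tape_inches * DENSITY_BPI  # bytes, ignoring interblock gaps

def usable_capacity(blocksize):
    """Bytes that fit when every block is followed by one interblock gap."""
    inches_per_block = blocksize / DENSITY_BPI + GAP_INCHES
    blocks = int(tape_inches / inches_per_block)
    return blocks * blocksize

print("theoretical capacity:", raw_capacity)
for blocksize in (80, 800, 8000, 32000):
    print(blocksize, usable_capacity(blocksize))
```

Under these assumptions, small blocks spend most of the tape on gaps, which is consistent with the observation that real tapes hold less than the theoretical maximum.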

Branching.
See Skip Pattern.

Byte.
Eight bits. A byte is simply a chunk of 8 ones and zeros. For example: 01000001 is a byte. A computer often works with chunks of bits rather than individual bits and the smallest chunk of bits that a computer usually works with is a byte. A byte is equal to one column in a file written in character format. Most data files distributed by ICPSR are in character format.

Card Image.
1. Eighty characters of data stored as a single physical record.
2. A file storage format of eighty characters or bytes per record. The card image format is a remnant of the time when data were literally input on punch cards which had a physical limit of 80 characters per card. Usually a case or all the variables of a single respondent are stored on several "cards" of eighty characters. Each "card" is numbered and stored in numerical sequence. Cards with the same sequence number (i.e., having a common format for the layout and contents of variables) are called a "deck;" thus cards are often referred to in documentation by their "deck number." Example: "The variable for age is stored in deck 01 in columns 10-11 and the variable for race is stored in deck 02 in column 10."
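As a rough illustration of the example above, the following Python sketch reads age from deck 01 (columns 10-11) and race from deck 02 (column 10). The placement of a case identifier in columns 1-4 and the deck number in columns 5-6 is a hypothetical layout chosen for the example, as is the helper `make_card`; a real codebook would specify these locations.

```python
def make_card(case_id, deck, fields):
    """Build one 80-column card; `fields` maps a 1-based start column to text."""
    card = list(f"{case_id:04d}{deck:02d}".ljust(80))
    for start, text in fields.items():
        card[start - 1:start - 1 + len(text)] = text
    return "".join(card)

cards = [
    make_card(1, 1, {10: "34"}),  # deck 01: age in columns 10-11
    make_card(1, 2, {10: "3"}),   # deck 02: race in column 10
    make_card(2, 1, {10: "58"}),
    make_card(2, 2, {10: "1"}),
]

cases = {}
for card in cards:
    case_id, deck = card[0:4], card[4:6]  # assumed ID and deck columns
    case = cases.setdefault(case_id, {})
    if deck == "01":
        case["age"] = int(card[9:11])     # columns 10-11
    elif deck == "02":
        case["race"] = int(card[9:10])    # column 10

print(cases)
```

Note that column numbers in documentation start at 1, while Python string positions start at 0, hence the offsets.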

Card.
See Card Image.

Cartridge Tape.
A tape enclosed in a case similar to an audio cassette. There are several "standard" cartridge tape formats; one of the most common is the IBM standard 3480/3490.

Case.
In survey research, an individual respondent. Contrast with unit of analysis.

CATI.
See Computer Assisted Telephone Interviewing.

CD-ROM.
Compact Disk Read Only Memory. A storage medium. Data are "stamped" onto the disk during the manufacturing process. The disk is read-only.

Character Encoding Scheme.
A method of encoding characters including alphabetic characters (A-Z, uppercase and lowercase), numbers 0-9, punctuation and other marks (e.g. comma, period, space, &, *), and various "control characters" (e.g., tab, carriage return, linefeed) using binary numbers. For a computer to, for instance, print a capital A or a number 7 on the computer screen, we must have a way of telling the computer that a particular group of bits represents an A or a 7. There are standards, commonly called "character sets," that establish that a particular byte stands for an A and a different byte stands for a 7. The two most common standards for representing characters in bytes are ASCII and EBCDIC.

Character Format.
Any file format in which information is encoded as characters using only a standard character encoding scheme. A file written in "character format" contains only those bytes that are prescribed in the encoding scheme as corresponding to the characters in the scheme (e.g., alphabetic and numeric characters, punctuation marks, and spaces). A file written in the ASCII character format, for instance, would store the number "7" in eight bits (i.e., one byte): 00110111. A file written in EBCDIC would store the number "7" in eight bits as: 11110111. Contrast with binary format.
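The bit patterns for a character under each scheme can be checked with a short Python sketch. It assumes Python's "cp037" codec as a representative EBCDIC code page (several EBCDIC variants exist); the helper name `bits` is hypothetical.

```python
def bits(char, codec):
    """Return the 8-bit pattern used to store one character in the given encoding."""
    return format(char.encode(codec)[0], "08b")

# "cp037" is one common EBCDIC code page, assumed here for illustration.
print("ASCII  7:", bits("7", "ascii"))   # 00110111
print("EBCDIC 7:", bits("7", "cp037"))   # 11110111
print("ASCII  A:", bits("A", "ascii"))   # 01000001
```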

Character Sets.
See Character Encoding Scheme.

Cleaning.
To "clean" a data file is to check for wild codes and inconsistent responses (see Consistency Check); to verify that the file has the correct and expected number of records, cases, and cards or records per case; and to correct errors found.

Code.
In most numeric data files, answers to questions are recorded with numbers rather than text and often even numeric answers are recorded with numbers other than the actual response. The numbers used in the data file are called "codes." Thus, for instance, when a respondent identifies herself as a member of a particular religion, a "code" of 1 might be used for Catholic, a 2 for Jewish, etc. Likewise, a person's age of 18 might be coded as a 2 indicating "18 or over." The codes that are used and their correspondence to the actual responses are listed in a codebook.

Codebook Listed to Tape.
A file format for writing machine readable codebooks. The format includes characters at the beginning of lines that are used to control line printers.

Codebook.
Generically, any information on the structure, contents, and layout of a data file. Typically, a codebook includes: column locations and widths for each variable; definitions of different record types; response codes for each variable; codes used to indicate non-response and missing data; exact questions and skip patterns used in a survey; and other indications of the content of each variable. Many codebooks also include frequencies of response. Codebooks vary widely in quality and amount of information included. They may be machine-readable or paper copy or microfiche.

Column Location.
The precise location in a data file of a variable expressed in column numbers, beginning with the first column in a physical record as column number 1.

Column.
In a data file, a single vertical column of characters, each one byte in length. Fixed format data files are traditionally described as being arranged in lines and columns. In a fixed format file, column locations describe the locations of variables.

Computer Assisted Telephone Interviewing (CATI).
A method of coding information from telephone interviews directly into a computer during the interview. CATI software usually has built in consistency checks, will not allow wild codes to be entered, and automatically prompts the interviewer for correct skip pattern questions.

Consistency Check.
A process of data cleaning which looks for inappropriate responses to branched questions. For instance, one question might ask if the respondent attended church last week; a response of "no" should skip the questions about church attendance and code the answers to those questions as "inapplicable." If those questions were coded any other way than "inapplicable," this would be inconsistent with the skip patterns of the survey instrument.
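A consistency check of this kind might be sketched in Python as follows. The variable names and the codes (1 = yes, 2 = no, 9 = inapplicable) are hypothetical; a real codebook would define them.

```python
NO = 2            # hypothetical code for "did not attend"
INAPPLICABLE = 9  # hypothetical code for "inapplicable"

records = [
    {"id": 1, "attended": 1, "services": 2},  # attended; follow-up answered
    {"id": 2, "attended": 2, "services": 9},  # did not attend; follow-up inapplicable
    {"id": 3, "attended": 2, "services": 1},  # did not attend, yet follow-up answered
]

def inconsistent(record):
    """True when a skipped follow-up question is not coded as inapplicable."""
    return record["attended"] == NO and record["services"] != INAPPLICABLE

problems = [r["id"] for r in records if inconsistent(r)]
print(problems)  # [3]
```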

Control Cards.
See SPSS Control Cards and SAS Control Cards.

Cross Sectional Study.
In survey research, a study in which data are obtained only once. Contrast with longitudinal studies in which a panel of individuals is interviewed repeatedly over a period of time. Note that a cross sectional study can ask questions about previous periods of time, though.

DAT.
Digital Audio Tape. A high density storage medium.

Data
Social science data are the raw material out of which social and economic statistics are produced. Social science data originate from social research methodologies or administrative records, while statistics are produced from data. Data are the information collected and stored at the level at which the unit of analysis was observed. Summaries of these data are usually statistics. Data must be processed to be of practical use. This compilation is accomplished with statistical software, which reads the raw data from a computer file.

Dataset.
The term dataset or "data set" is used in specific ways in different contexts.
  1. ICPSR defines a dataset as "a collection of data records" and uses this term to encompass a file or group of files associated with one part of a study. Files associated with a dataset might include a data file, a machine-readable codebook, SPSS control cards, and other files related to the data file. Examples: The files associated with California might be considered one dataset in the 1990 Census of Population and Housing STF 1A study; the files associated with the First Congress, House of Representatives, in the study "Congressional Roll Call Voting Records."
  2. SAS. In the SAS statistical software, a SAS "data set" is the internal representation of data. Raw data, when read by SAS command statements, are converted into a SAS Data Set before SAS can use the data. SAS Data Sets have specific filename extensions for different operating systems; e.g., a SAS 6.12 Data Set created in Unix has the filename extension ".ssd01" and in Windows, ".sd2".

Deck.
See Card Image.

Dictionary File.
A special form of machine-readable codebook that contains information about the structure of a data file and the locations and, often, the names of variables in the data file. Typically, you use a dictionary file and a data file together with statistical software; the statistical software uses the dictionary so that you may specify variables by name, rather than having to specify their locations in the file.

EBCDIC.
A character encoding scheme used by IBM mainframe computers and some other computers. Unlike ASCII, the EBCDIC standard specifies use of the entire 8 bits of each byte. Example: in EBCDIC the number seven is treated as a character and is encoded as: 11110111. (EBCDIC stands for Extended Binary Coded Decimal Interchange Code.)

Export File.
See System File.

Federated Membership.
An ICPSR membership category that allows a number of institutions to share data through a single linking institution and a single Official Representative.

File.
A physical unit of storage on a computer disk or tape. Example: the California dataset of STF 1A might contain a file of data, a file of codebook information, and a file of SPSS control cards for use with the statistical software SPSS.

Fixed Format.
A file structure consisting of physical records of a constant size within which the precise location of each variable is based on the column location and width of the variable. Most data from ICPSR are distributed in a fixed format and codebooks are used to specify the column location and width of each variable. Contrast with Free Format.
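Reading a fixed format file amounts to slicing each line by column location and width. The following Python sketch uses a hypothetical three-variable layout of the kind a codebook would supply; the variable names, columns, and sample lines are illustrative only.

```python
# Hypothetical layout: variable name -> (start column, width), columns numbered from 1.
LAYOUT = {"case_id": (1, 4), "age": (5, 2), "income": (7, 6)}

def parse_fixed(line):
    """Extract each variable from one fixed format record using the layout."""
    return {
        name: line[start - 1:start - 1 + width].strip()
        for name, (start, width) in LAYOUT.items()
    }

lines = ["000134 52000", "000258 48500"]
for line in lines:
    print(parse_fixed(line))
```

Because nothing separates the variables in the file itself, the layout is the only way to tell where one variable ends and the next begins.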

Flat File.
See Rectangular File.

Free Format.
A physical file structure that specifies the order of variables in a file and that they are delimited from each other by a special character or characters (usually a blank or other whitespace). Free format files may have variable physical record lengths; when they do, they are typically delimited by a newline character at the end of each line. Contrast with Fixed Format.

Frequencies.
(Also called "marginals.") In survey research, the number of respondents who responded to each of the possible answers to a question. Often codebooks list the frequency of response to each question. So, for instance, you might be able to tell from a codebook how many House Members voted in favor of a bill and how many voted against it.
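Frequencies can be produced by simply counting how often each code occurs. This Python sketch uses made-up vote codes (1 = in favor, 2 = against, 9 = not voting) as an assumption for illustration.

```python
from collections import Counter

# Hypothetical coded responses: 1 = in favor, 2 = against, 9 = not voting.
votes = [1, 1, 2, 1, 2, 1, 9]

frequencies = Counter(votes)
for code in sorted(frequencies):
    print(code, frequencies[code])
```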

Frequency File.
A file that contains the frequencies for each question in a survey.

FTP.
File Transfer Protocol. A reliable method of transferring files over the Internet.

GIS
Abbreviation for Geographic Information System.

Header Record.
A record that denotes the beginning of a series of records and describes the contents of the records that follow. For example, the International Financial Statistics has a header record which describes a time series and the header record is followed by a number of records which contain the actual time series. Header records are often used when the number of physical records needed to contain the data for a particular variable is not constant for all variables. For instance, in economic time series data files one variable might be a time series that contains 20 years of data and fills 20 physical records while the next variable might contain a time series with only 5 years of data and fill only 5 physical records. The use of a file format that includes header records enables one to determine where the series begin and end.
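The way header records mark where each series begins and ends can be sketched in Python. The record layout here ("H" lines naming a series, "D" lines holding a year and a value) is entirely hypothetical and is not the layout of International Financial Statistics or any other real file.

```python
# Hypothetical layout: "H <series name>" starts a series; "D <year> <value>" holds data.
lines = [
    "H GDP",
    "D 1990 5.9",
    "D 1991 6.1",
    "D 1992 6.5",
    "H CPI",
    "D 1991 136.2",
    "D 1992 140.3",
]

def read_series(lines):
    """Group data records under the most recent header record."""
    series = {}
    current = None
    for line in lines:
        parts = line.split()
        if parts[0] == "H":
            current = series.setdefault(parts[1], {})
        else:
            current[int(parts[1])] = float(parts[2])
    return series

series = read_series(lines)
print({name: len(values) for name, values in series.items()})
```

Each series can have a different number of data records, which is the situation the header record is meant to handle.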

Hierarchical File Structure.
A format for storing hierarchical files. Each unit of analysis has its own record structure or record type. Records for one unit of analysis do not necessarily have the same number of bytes or characters as the records for other units of analysis. In order to give such a file a common physical record length, short logical records are typically "padded" with blanks so that they will all be the same physical record length. A hierarchical file can also be stored in a rectangular file. For instance, the Survey of Income and Program Participation is distributed both ways; users can choose the format they prefer. Typically, the hierarchical file structure is more space-efficient but more difficult to use.

Hierarchical File.
A hierarchical file is one that contains information collected on multiple units of analysis where each unit of analysis is subordinate to another unit. For example, if the physical housing structure is one unit, and individual persons within the structure are another unit, the person records are subordinate to (i.e., related to) the housing unit. An example would be the Current Population Survey Annual Demographic File which has household, family, and person units of analysis. Studies that include data for different units of analysis often link those units to each other so that, for instance, one can analyze the persons as they group in a structure. Such studies are sometimes referred to as having a relational structure.

Import File.
See System File.

Interblock gap.
A small blank space on a tape between blocks.

Interface Format.
A format used by ICPSR to distribute data. It is available in EBCDIC only. Interface format is the same as OSIRIS format in that you receive an EBCDIC character format data file in logical record length (LRECL) structure and an OSIRIS dictionary file. The OSIRIS codebook file is not sent, however. If you have software that is capable of using OSIRIS dictionaries, but you do not have OSIRIS, this is a useful format to choose.

Labeled Tape.
A labeled tape has machine readable "labels" indicating the file name and other information about each file on the tape. Tapes can be "labeled" or "unlabeled." Labels can be "standard IBM" or "ANSI."

Line.
Often used synonymously with physical record. Thus it means the same as "card" in a card-image data file, and it means the same as logical record length in data files that have a logical record length format. In general, a "line" in data file terminology refers to a physical unit of data that the computer reads and processes, one at a time. In DOS and UNIX environments, most statistical software expects "lines" to end with a newline character, but most statistical software can be configured to read a specific number of bytes as a "line" regardless of the presence or absence of a newline character.

Logical Record Length.
Abbreviated LRECL.

Logical Record.
All the data for a given unit of analysis. It is distinguished from a physical record because it may take several physical records to store all the data for a given unit of analysis. For instance, in Card Image data, a "card" is a physical record and it usually takes several "cards" to store all the information for a single case or unit of analysis.

Longitudinal Study.
In survey research, a study in which the same group of individuals is interviewed at intervals over a period of time. See also: panel study. Note that some cross sectional studies are done regularly (for instance, the General Social Survey and the Current Population Survey (Annual Demographic File) are conducted once a year), but different individuals are surveyed each time. Such a study is not a true longitudinal study. An example of a longitudinal study is the National Longitudinal Survey of Labor Market Experience.

Margin of error.
A measurement of the accuracy of the results of a survey. Example: A margin of error of plus or minus 3.5% means that there is a 95% chance that the responses of the target population as a whole would fall somewhere between 3.5% less and 3.5% more than the responses of the sample (a 7% spread). However, for any specific question, the margin of error could be greater or less than plus or minus 3.5%.
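One common way to approximate a margin of error for a proportion is sketched below in Python. The formula is not given in this glossary; it is the standard approximation for a simple random sample at roughly 95% confidence (z = 1.96), and surveys with more complex sample designs would need adjustments. The function name and sample size are chosen for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(784)  # a sample of 784 gives about plus or minus 3.5%
print(round(moe * 100, 1), "percentage points")
```

The default p = 0.5 gives the widest margin; questions where responses are lopsided have a smaller margin, which is one reason the figure varies from question to question.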

Marginals
See Frequencies.

Microdata.
Microdata files are those that contain information on individuals rather than aggregate data. The U.S. Census Bureau's "Summary Tape Files" contain aggregate data and consist of totals of individuals with various specified attributes in a particular geographic area. They are, in a sense, tables of totals. The Bureau's PUMS (Public Use Microdata Sample) files, however, contain the data from the original census survey instrument with certain information removed to protect the confidentiality of the respondent.

Newline Character.
One or two bytes which denote the end of a line. In DOS a newline character is two bytes: a carriage return and a linefeed. In UNIX a newline character is one byte: a linefeed.
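The difference between the two conventions can be seen by inspecting the raw bytes, as in this small Python sketch. The sample data and the function name are illustrative.

```python
dos_text = b"line one\r\nline two\r\n"   # carriage return + linefeed
unix_text = b"line one\nline two\n"      # linefeed only

def newline_style(data):
    """Report which newline convention a chunk of bytes appears to use."""
    if b"\r\n" in data:
        return "DOS"
    if b"\n" in data:
        return "UNIX"
    return "none"

print(newline_style(dos_text), newline_style(unix_text))

# Converting DOS newlines to UNIX newlines removes one byte per line.
converted = dos_text.replace(b"\r\n", b"\n")
```

The extra byte per line matters when software is told to read a fixed number of bytes as a "line."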

NFS.
Network File System. A process for mounting magnetic disks on a network so that disks not physically attached to a computer appear as if they were physically attached.

Operating System.
The special software required to make a computer work. It provides the link between the user and the hardware. Popular operating systems include: DOS, MacOS, VMS, VM, MVS, UNIX, and OS/2. (Note that "Windows 3.x" is not an operating system as such, since it must have DOS to work.)

OSIRIS Codebook.
A machine readable codebook written in binary format for use by OSIRIS software.

OSIRIS Dictionary.
A machine-readable dictionary file usable by OSIRIS software. ICPSR distributes only "Type 1" OSIRIS Dictionaries which are in a binary format and must be written in EBCDIC. OSIRIS "Type 5" Dictionaries are character format files.

OSIRIS.
Statistical software similar to SPSS and SAS with strong data management features. ICPSR distributes many studies in OSIRIS format with special machine readable codebooks and dictionary files readable by the OSIRIS software.

Packed Decimal.
A method of encoding 2 pieces of information in a single byte. For instance, instead of storing a digit in one byte and a sign in another byte using a traditional character encoding scheme, a packed decimal format might use a binary number to indicate the value of the digit in 4 bits of the byte and a code indicating whether the digit is positive or negative in the other 4 bits. The International Monetary Fund distributes data in Packed Decimal format.
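A minimal Python sketch of decoding packed decimal follows. It assumes the common IBM convention, which goes slightly beyond the description above: each byte holds two 4-bit digits, except the last byte, whose final 4 bits hold the sign (hexadecimal C or F for positive, D for negative). Whether a particular data file follows this convention should be confirmed from its documentation.

```python
def unpack_decimal(data):
    """Decode packed decimal bytes (assumed IBM convention) into an integer."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high 4 bits
        nibbles.append(byte & 0x0F)  # low 4 bits
    sign = nibbles.pop()             # final 4 bits carry the sign
    value = int("".join(str(digit) for digit in nibbles))
    return -value if sign in (0x0B, 0x0D) else value

print(unpack_decimal(bytes([0x7C])))        # one digit and a sign in a single byte
print(unpack_decimal(bytes([0x12, 0x3C])))  # three digits and a sign in two bytes
print(unpack_decimal(bytes([0x12, 0x3D])))  # same digits, negative sign
```

The single-byte case corresponds to the description above: one digit in 4 bits and a sign code in the other 4 bits.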

Panel Study.
A longitudinal study in which a panel of individuals is interviewed at intervals over a period of time. In general usage, the definitions of longitudinal study and panel study overlap. At least one author says that the term "panel study" is sometimes used for studies that are restricted to a short period of time or are limited to two or three interviews and "longitudinal study" is used for studies that last longer or include more interviews; but there are significant examples where this distinction is not accurate. In general, longitudinal studies involve panels of respondents and panel studies are longitudinal studies. Examples of panel studies include the Survey of Income and Program Participation (SIPP) and the Panel Study of Income Dynamics (PSID).

Panel.
A group of individuals who are interviewed more than once over time in a longitudinal survey.

Physical Record Length.
The length, in bytes, of a physical record. In ICPSR Tape Information Forms and on CDNet, physical record length is referred to simply as "record length" (abbreviated "RecLen").

Physical Record.
A chunk of data that has a specified and constant size in bytes or that is clearly delimited from other records by a newline character or sector of a disk or other means identifiable to a computer program reading the file. For example, a card-image data file has physical records of 80 bytes each, by definition. In a file in logical record length structure, each physical record is the same number of bytes in length as the "logical record length." See also Line.

PI.
An abbreviation for Principal Investigator.

Portable.
In computer usage, a file or program is "portable" if it can be used by a variety of software on a variety of hardware platforms. Numeric data files written as plain character format files are fairly portable.

Principal Investigator.
The person or organization responsible for a study; equivalent to "author" in bibliographic citations.

RecLen.
An abbreviation for Physical Record Length.

Recode.
Often a data analyst or data producer will produce new data values from raw data and include these in a data file; this process is called "recoding." For instance, an age variable might contain a respondent's actual age in years, but this information might be "recoded" to produce a new variable, "eligible voter," with a code of "1" for all those 18 and over and a code of "2" for all those under 18.
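The recode described above can be expressed in a few lines of Python. The function name and the sample ages are illustrative.

```python
def eligible_voter(age):
    """Recode age in years: 1 for 18 and over, 2 for under 18."""
    return 1 if age >= 18 else 2

ages = [17, 18, 45, 12]
recoded = [eligible_voter(age) for age in ages]
print(recoded)  # [2, 1, 1, 2]
```

Note that a recode of this kind discards detail: the original ages cannot be recovered from the new variable.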

Record Length.
Depending on the context, the length in bytes (i.e., columns) of a physical record or a logical record. On ICPSR Tape Information Forms and on CDNet, the abbreviation "RecLen" is used for physical record length.

Record Type.
A record that has a consistent logical structure. In files that include different units of analysis, for instance, different record types are needed to hold the different variables. For example, one record type might have a variable for income in one column and another record type might have a variable for household size in that same column. The codebook will describe these different structures and how to determine which is which so that you can tell your statistical software how to interpret that particular column as income or household size.
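Interpreting the same columns differently by record type might look like the following Python sketch. The layout is hypothetical: column 1 holds a record type code ("H" for household, "P" for person) and columns 2-7 hold household size on household records and income on person records.

```python
def interpret(record):
    """Read columns 2-7 according to the record type found in column 1."""
    record_type = record[0]
    field = record[1:7]
    if record_type == "H":
        return {"type": "household", "household_size": int(field)}
    if record_type == "P":
        return {"type": "person", "income": int(field)}
    raise ValueError(f"unknown record type: {record_type!r}")

records = ["H     3", "P 41000", "P 38500", "P     0"]
parsed = [interpret(record) for record in records]
for item in parsed:
    print(item)
```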

Record.
Depending on the context, "record" may refer to a physical record or a logical record. See also Line.

Rectangular File.
A physical file structure. A rectangular file is one which contains the same number of card-images or the same physical record length for each respondent or unit of analysis. A hierarchical file can be stored in a rectangular file structure by storing all units of analysis in a single physical record. For instance, each record might contain one household unit, two family units, and four person units for each family unit. This method of storage of hierarchical files can be very inefficient in terms of storage space, but can make the file easier to describe and work with.

Reel Tape.
One-half inch magnetic tape stored on round reels.

Relational Structure.
A study that includes different units of analysis, particularly when those units are not arranged in a strict hierarchy as they are in a hierarchical file, has a relational structure. Note that the data could be arranged in several different physical structures to handle such a data structure. For instance, each unit of analysis might be stored in a separate rectangular file with identification numbers linking each case to the other units; or, the different units of analysis might be stored in one large file with a hierarchical file structure; or the different units could be stored in a special database structure used by a relational database management system such as INGRES. An example of a study with a relational structure is the Survey of Income and Program Participation which has eight or more record types; these record types are related to each other but are not all members of a hierarchy of membership. For instance, there are record types for household, family, person, wage and salary job, and general income amounts.

Respondent.
In survey research, the person responding to the survey questions.

Response codes.
Typically responses to questions are "coded" by assigning numeric codes to each possible response. Thus a "yes" might be coded "1" and a "no" "2"; female respondents might be indicated by a "1" and male respondents by a "2"; each state or county might be assigned a numeric code.

Round Tape.
A nickname for reel tape.

SAS Control Cards.
A character format file written in the SAS statistical software language describing a data file. Useful because it provides variable locations, names and labels. SAS code must be added to perform analysis.

Skip Pattern.
In survey research, the sequence of questions asked and skipped. For instance, persons who answer one question that indicates they did not vote in the last election would trigger a "skip" so that the interviewer would not ask those respondents questions about how they voted in the last election.

SPSS Control Cards.
A character format file written in the SPSS statistical software language describing a data file. Useful because it provides variable locations, names and labels. SPSS code must be added to perform analysis.

Statistics
Statistics are produced from data.

The dictionary definition of "statistics" refers to numeric indicators of nations. Popular usage of the term points to numeric summaries that condense information, or numbers that are used to make comparisons, or numbers that portray relationships or associations.

The term statistics also refers to a formal discipline of study. The field of statistics is the science of generalization. Built upon theories of probability and inference, statistics support the making of broad generalizations from a smaller number of specific observations.

Study.
All the information collected at a single time or for a single purpose or by a single principal investigator. A study may consist of one or more datasets and one or more files. Examples: the General Social Survey; a Gallup Poll; the 1990 Census of Population and Housing STF 1A.

System File.
A generic term for the native or internal storage format used by statistical software. When statistical software reads a "raw" character format data file consisting of ASCII or EBCDIC characters, it must read each byte in sequence. It can be more efficient in its storage, retrieval and calculations by storing a data file in a special binary format called a system file. Typically, a system file for one brand of software cannot be read by another brand of software or by the same brand on another hardware platform. Some software is capable of creating an "export" file which can then be read by other software or on other platforms. Also, some software can "import" files from other software.

Tape Density.
A measure of how much data, measured in BPI, can fit on a magnetic tape.

Text File.
1. In computer usage, any file written in pure character format. Sometimes called a "plain text file."
2. In a data library situation, "text file" may also refer to a file containing natural language text (e.g., a literary text such as the works of Shakespeare) as opposed to a numeric data file that contains mostly numbers. Such a file could be stored as a character format file but does not necessarily have to be. Also known as a character file.

Time Series.
Observations of a variable made over time. Many economic studies, such as International Financial Statistics and Citibase, are time series data files. Time series, of a sort, can also be constructed from a cross sectional study if the same questions are asked more than once over time. See also longitudinal study.

Undocumented Code.
See Wild Code.

Unit of analysis.
The basic observable entity being analyzed by a study and for which data are collected in the form of variables. Although a unit of analysis is sometimes referred to as the case or "observation," these are not always synonymous. For instance, in public opinion polls, the unit of analysis is usually a single person and the answers to the survey questions by one person constitute a "case." In a census, however, a "case" could be considered the household because all the data for one household are collected on one survey instrument; the household "case" may contain different variables for the different units of analysis: a physical housing structure, a family within the structure, a person within the family. Contrast with Unit of observation.

Unit of Observation
When social science methodology is used to collect data, the entity which is observed or about which information is collected is the unit of observation.

The unit of observation is the same as the unit of analysis when the generalizations being made from a statistical analysis are attributed to the unit of observation (i.e., the objects about which data were collected and organized for statistical analysis).

While the units of observation and analysis are often the same, the wealth of secondary data sources creates opportunities to conduct analyses with data from multiple units of observation. This is probably most recognizable in GIS research.

Example: A major national study uses a form that collects information about each person in a dwelling and information about the housing structure. Therefore, this study collects data for two units of observation: persons and housing structures. From these data, different units of analysis may be constructed. The household could be examined as a unit of analysis by combining data from people living in the same dwelling. The family could be treated as the unit of analysis by combining data from all members in a dwelling who share a familial relationship. This illustrates how a unit of analysis can be constructed from units of observation that are related by time, space, or social properties.
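
The construction of a household unit of analysis from person-level observations can be sketched in Python. This is a minimal illustration only: the records, field names (dwelling_id, person_id, income), and summary measures are hypothetical and do not come from any actual study.

```python
from collections import defaultdict

# Hypothetical person-level records (unit of observation: person).
persons = [
    {"dwelling_id": 1, "person_id": 1, "income": 30000},
    {"dwelling_id": 1, "person_id": 2, "income": 25000},
    {"dwelling_id": 2, "person_id": 1, "income": 12000},
]

# Build household-level records (unit of analysis: household) by combining
# data from all persons who live in the same dwelling.
households = defaultdict(lambda: {"size": 0, "total_income": 0})
for person in persons:
    household = households[person["dwelling_id"]]
    household["size"] += 1
    household["total_income"] += person["income"]

for dwelling_id, summary in sorted(households.items()):
    print(dwelling_id, summary)
```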

Variable.
In social science research, for each unit of analysis, each item of data (e.g., age of person, income of family, consumer price index) is called a variable.

Wave.
In a panel study, a wave is the interviewing period during which the entire panel is asked the same questions. Typically, a panel study consists of several waves. Waves are important because each wave typically covers a different time period and, often, different topics.

Weight.
In survey research, a number associated with a case or unit of analysis; the weight is used as a measure of the relative significance of the variables of that case when making estimates for the entire population. When a probability sample is used, there is often a chance that some elements of the population are under- or overrepresented in the sample. In order to allow more accurate estimates of a complete population, therefore, "weights" are assigned to each case and used to adjust the overall results to more closely conform to the total population.
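
The effect of weights on an estimate can be shown with a small Python sketch. The four cases and their weights are invented for illustration; in a real study the weights are supplied by the data producer and documented in the codebook.

```python
# Hypothetical survey cases, each with an answer and an assigned weight.
cases = [
    {"answer": "yes", "weight": 1.5},
    {"answer": "no", "weight": 0.5},
    {"answer": "yes", "weight": 1.0},
    {"answer": "no", "weight": 1.0},
]

# Unweighted estimate: every case counts equally.
yes_count = sum(1 for case in cases if case["answer"] == "yes")
unweighted_yes = yes_count / len(cases)

# Weighted estimate: each case counts in proportion to its weight.
total_weight = sum(case["weight"] for case in cases)
yes_weight = sum(case["weight"] for case in cases if case["answer"] == "yes")
weighted_yes = yes_weight / total_weight

print(unweighted_yes, weighted_yes)
```

In this invented sample the unweighted share of "yes" answers is one half, while the weighted share is higher because the "yes" cases carry more weight.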

Wild Code.
In survey research, "wild" codes are codes that are not authorized for a particular question. For instance, if a question that records the sex of the respondent has documented codes of "1" for female and "2" for male and "9" for "missing data," a code of "3" would be a "wild" code, sometimes called an "undocumented code."
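
A check for wild codes can be sketched in Python using the codes from the example above. The list of responses is hypothetical, and the variable names are chosen for this illustration only.

```python
# Documented codes for the question, as given in the example above.
documented_codes = {"1": "female", "2": "male", "9": "missing data"}

# Hypothetical coded responses for this question, one per case.
responses = ["1", "2", "2", "3", "9", "1", "7"]

# A wild code is any recorded value that is not among the documented codes.
# Keep the case position alongside the code so the case can be located.
wild_codes = [
    (position, code)
    for position, code in enumerate(responses)
    if code not in documented_codes
]

print(wild_codes)
```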

Workstation.
1) In a computing environment the term "workstation" has historically referred to a particular class of machine: specifically, a very powerful computer with a 32 bit microprocessor, usually with a graphics-oriented display and a mouse, intended for use on a local area network. Workstations in this sense usually run the UNIX operating system. Although many are powerful enough to have several users logged on and using the machine simultaneously, many computing environments assume a workstation will be used by only one person at a time. Well known manufacturers of workstations include Sun, Hewlett Packard, DEC, and, for a time, NeXT.
2) In a library environment the term workstation often is used to refer to any personal computer such as an IBM PC or an Apple Macintosh. Note, however, that personal computers that have 32 bit microprocessors and operating systems and software capable of utilizing the full power of those microprocessors are becoming available. Note also that traditional "workstations," such as Suns, are also becoming more powerful and are increasingly being used to support multiple simultaneous users. Thus, the meaning of the term workstation is becoming less precise. It is therefore increasingly important when using this term, particularly in an environment of users with a variety of computing backgrounds and experience, to refer to a particular type of computer and computing environment rather than assume that everyone means the same thing by the term "workstation."
