StatsProgs2DDI - Conversion tool from statistical package formats to DDI Version 3.0
StatsProgs2DDI is a tool that generates variable-level documentation in DDI 3.0 format from statistical package system files. Generating a core DDI 3.0 document from an existing system file (for example, an SPSS file) enables a quick start in building DDI 3.0 documents. Currently the tool includes an SPSS converter; converters for SAS and Stata are planned.
SPSS Converter SPSSOMS12toDDI30
The SPSS converter generates the following DDI information. For each variable: variable name, variable label, and common descriptive summary statistics. For each category: category value, category label, missing value indicator, and frequencies/percentages.
Two steps are necessary to get a valid DDI document from an SPSS system file:
- Execution of an SPSS command setup (essentially the well-known commands DISPLAY and FREQUENCIES) to create an intermediary XML file in the SPSS OMS XML format
- Transformation of this SPSS OMS XML file with the XSLT stylesheet (from the SPSS converter) into a DDI version 3.0 file
What is SPSS OMS? OMS is the Output Management System of SPSS. It provides syntax-driven control over SPSS output. Among other formats, OMS can create XML files, which are structured according to the SPSS XML output schema. The SPSS converter requires version 1.2 of SPSS OMS, which has been available since SPSS 14 (SPSS OMS 1.0 in SPSS 12 and SPSS OMS 1.1 in SPSS 13 can probably be used as well, but this has not been tested).
An XSLT processor like Xalan [http://xalan.apache.org/] or Saxon [http://saxon.sourceforge.net/] is necessary for the transformation process. The stylesheet uses version 1.0 of XSLT.
Example
SPSS command setup to create SPSS OMS XML file
get file = "ISSP.sav" .
oms
/destination
format = oxml
outfile = "sample_spssoms.xml"
viewer = no .
/* metadata */
display dictionary
/variables = v1 to v20 .
/* frequencies and statistics */
/* without id variables, weights etc. */
frequencies
/variables = v3 to v20
/ntiles = 4
/ntiles = 5
/statistics = all .
omsend .
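Before running the transformation, it can be useful to confirm that the intermediary file produced by the command setup above is well-formed XML and to see which element types it contains. The following Python sketch does this with the standard library only. The function names are hypothetical and not part of StatsProgs2DDI, and the sketch makes no assumption about the element names used in the SPSS OMS XML format.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a leading '{namespace}' part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def summarize_xml(source):
    """Parse an XML file and count its elements by local name.

    `source` is a file name or a binary file object. ET.parse raises
    xml.etree.ElementTree.ParseError if the file is not well-formed.
    Returns the local name of the root element and a Counter of all
    element names found in the document.
    """
    tree = ET.parse(source)
    counts = Counter(local_name(element.tag) for element in tree.iter())
    return local_name(tree.getroot().tag), counts


# Possible use with the file name from the example above:
#   root, counts = summarize_xml("sample_spssoms.xml")
#   print(root, counts.most_common(10))
```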
Transformation command to generate DDI 3.0 file
Xalan -in sample_spssoms.xml -out sample_ddi3.xml -xsl SPSSOMS12toDDI30.xslt
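For scripted or repeated conversions, the transformation call above could be wrapped in a small script. The following Python sketch assumes that an executable named Xalan, accepting the -in, -out, and -xsl options exactly as in the command above, is available on the search path. The function names are hypothetical and not part of StatsProgs2DDI.

```python
import subprocess


def build_xalan_command(oms_xml, ddi_xml,
                        stylesheet="SPSSOMS12toDDI30.xslt",
                        executable="Xalan"):
    """Return the argument list for the transformation command.

    Assumption: `executable` accepts -in, -out and -xsl as shown in the
    example command of this document.
    """
    return [executable, "-in", oms_xml, "-out", ddi_xml, "-xsl", stylesheet]


def run_transformation(oms_xml, ddi_xml, **options):
    """Run the XSLT processor; raises CalledProcessError on a non-zero exit."""
    command = build_xalan_command(oms_xml, ddi_xml, **options)
    subprocess.run(command, check=True)
    return command


# Possible use with the file names from the example above:
#   run_transformation("sample_spssoms.xml", "sample_ddi3.xml")
```

Passing the arguments as a list rather than as one shell string avoids quoting problems with file names that contain spaces.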
The process is configured entirely by the SPSS commands. The converter transforms all available information into the corresponding DDI 3.0 fields. One DISPLAY command is required for the desired variables (usually specified with the keyword ALL to cover all variables of the file).
An optional FREQUENCIES command can be defined to include frequencies in the DDI 3.0 file. In that case the absolute frequency, the percentage, and the cumulative percentage are included. The desired statistics can be specified on the FREQUENCIES command. Multiple FREQUENCIES commands with different STATISTICS subcommands can be defined to request specific statistics for different sets of variables. As always, not all statistics are appropriate for all variables. The following descriptive statistics can be stored in DDI 3.0: mean, quartiles, quintiles, mode, median, standard deviation, valid cases, invalid cases, minimum, and maximum.
Usually special variables like IDs and weights should not be included in the frequencies (in the example, v1 and v2 are not listed).
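A command setup with several FREQUENCIES commands, each requesting different statistics for a different set of variables, can be written by hand or generated. The following Python sketch builds such a setup as text, following the layout of the example above. The function names are hypothetical, and the split of v3 to v20 into two sets with different statistics is an assumption made purely for illustration.

```python
def frequencies_command(variables, statistics, ntiles=()):
    """Build one FREQUENCIES command as SPSS syntax text."""
    lines = ["frequencies", f"  /variables = {variables}"]
    # One /ntiles subcommand per requested division (4 = quartiles, 5 = quintiles).
    lines += [f"  /ntiles = {n}" for n in ntiles]
    lines.append(f"  /statistics = {' '.join(statistics)} .")
    return "\n".join(lines)


def command_setup(sav_file, oms_xml, display_variables, frequency_sets):
    """Build a complete setup: GET FILE, OMS, DISPLAY, FREQUENCIES, OMSEND.

    `frequency_sets` is a list of (variables, statistics, ntiles) tuples;
    each tuple becomes its own FREQUENCIES command.
    """
    parts = [
        f'get file = "{sav_file}" .',
        "oms",
        "  /destination",
        "    format = oxml",
        f'    outfile = "{oms_xml}"',
        "    viewer = no .",
        "display dictionary",
        f"  /variables = {display_variables} .",
    ]
    for variables, statistics, ntiles in frequency_sets:
        parts.append(frequencies_command(variables, statistics, ntiles))
    parts.append("omsend .")
    return "\n".join(parts)


# Hypothetical split: mode and median for v3 to v10,
# quartiles and moment-based statistics for v11 to v20.
setup = command_setup(
    "ISSP.sav",
    "sample_spssoms.xml",
    "all",
    [
        ("v3 to v10", ["mode", "median"], []),
        ("v11 to v20", ["mean", "stddev", "minimum", "maximum"], [4]),
    ],
)
print(setup)
```

The ID and weight variables v1 and v2 appear only in the DISPLAY command (through the keyword all), not in any FREQUENCIES command.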
Future plans
- Weighted summary statistics and frequencies
- Summary statistics and frequencies dependent on specific missing value definition
- Improvement of system missing handling
- Processing of string variables
- Addition of the variable format (problematic if not defined for the variables in the SPSS system file)
- Addition of the measurement level (problematic if not defined for the variables in the SPSS system file)
- Detection of common category schemes used by multiple variables
- Graphical user interface
- Integration into SPSS as a custom command written in Python
Important Note
SPSSOMS12toDDI30 generates DDI files according to the public review version of DDI 3.0. The tool will be adapted to the published version of DDI 3.0 in the future. The tool is in beta state and has not been heavily tested. Do not use it for production purposes at present.
StatsProgs2DDI is part of a DDI tool family including generation of multi-channel output (i.e., HTML, Microsoft Help, and PDF), and generation of data definition commands for statistical packages on the basis of DDI Version 2 documents. StatsProgs2DDI is available to the public under the terms of the GNU Lesser General Public License.
Download
Version 2007-05-09, revision 130: zip file
© GESIS Joachim Wackerow
2007-05-09