DDI on Rails documentation¶
Note
This is still a work in progress. Please visit the official documentation at http://www.ddionrails.org
Contents¶
User guide¶
Note
This is a draft version.
This guide explains how to use the data portal DDI on Rails, which forms the foundation for the new version of SOEPinfo.
DDI on Rails was created to help users explore survey data (e.g. the SOEP), compile personalized datasets, and publish their results in the publication database. Primarily, it guides users through the entire process of a research project using the SOEP data, from conception to publication and citation.
Search engine¶
The main page offers a search field, which takes you directly to the matching results if you have a specific variable, dataset, or topic in mind. It also gives a quick overview of whether any data is available for your research topic or question.
For instance, if you search for the keyword “age”, you get a total of 2614 items. This might be overwhelming, so the search lets you narrow the output using so-called facets. The classes shown above the total results are “Concepts”, “Variables”, “Questions” and “Publications”, each with facets optimized for that class.
The select box for studies is the only one that is available in all views. This makes it easy for users of one particular study to adjust all result views consistently. After a study is selected, its description can be found at the very top of the page under “Studies”; this is covered again in the next section.
The classes mentioned above, with their assigned symbols, are listed again underneath the “Study” option. Below that, another sorting option for the analysis unit can be selected.
Studies¶
DDI on Rails incorporates various SOEP-studies, including “SOEP Core study”, “SOEPlong”, “Families in Germany”, “SOEP Innovation Sample”, “SOEP Pretest”, “Base II”, and “SOEP Test Study”. A particular study can be accessed as shown in the picture below. After a study of interest is selected, a general overview will be displayed. Moreover, the total number of involved variables and datasets can be viewed.
In the variable browser, there is another search engine adapted to the chosen study. That way you can find your required variables in the particular study. The same holds for the dataset browser. Furthermore, the variables can be selected according to the desired analysis unit or period on the left-hand side.
Topics and concepts¶
A variety of topics can be selected on the very top of every page next to “Studies”. After clicking on a particular topic, several subtopics appear. The user may also use the search engine to look for specific concepts regarding the chosen topic.
Concepts are used to group variables, within one study or across multiple studies, that may vary slightly but still lend themselves to comparative analysis. They replace the so-called “item correspondence” from the former SOEPinfo.
Publications¶
Under “publications” you can search for any keyword and you will be directed to a list of papers that involve the searched word(s). Each result provides a link to the publication for direct access.
Workspace and basket¶
While the former SOEPinfo allowed the use of the basket and its script generator without any login information, the new system requires you to log in in order to create baskets for variables.
The login is necessary to enable some of the new features in DDI on Rails, among others the possibility to store multiple baskets at a time and access those baskets directly from most statistical packages.
After you have signed in, a new basket symbol appears at the very top. Now you can create new baskets and fill in the individual information for each basket. To enable comparison between studies and distributions (versions of one study) later in the process, it is necessary to bind the basket to a particular distribution.
Any variable of interest, found as explained in the previous sections, can now be added to your basket by clicking on the green “Add to basket” symbol underneath the variable in the variable browser. The number next to the basket indicates how many variables your basket already contains.
Data management and documentation¶
Note
Plans for this chapter:
- Conventions and concepts for data management
- Integrating metadata generation and data management in a metadata-driven process.
- Lifecycle model for data management, documentation, and re-use.
Imports and exports¶
Note
Attention: the import procedure is about to change in the next version of DDI on Rails.
Import formats¶
CSV formats¶
Note
The full list of CSV imports is currently documented on ddionrails.org/imports
Markdown
In most imports, there is a description field using Markdown. For more information about the Markdown markup language, see: Daring Fireball.
Conventions
- Some fields in the CSV exports are not part of the import. Those fields start with view_ variables and datasets.
- Columns with the internal_ prefix are intended for internal use only and will not be imported (e.g. internal_comment).
- Language codes, for all translation purposes: ISO 639-1
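The prefix conventions above can be sketched in a few lines of Python. This is only an illustration and not the importer's actual code: the function name importable_columns and the sample header are assumptions made for this example.

```python
import csv
import io

# Prefixes taken from the conventions above: view_ columns are export-only,
# internal_ columns are never imported.
SKIPPED_PREFIXES = ("view_", "internal_")


def importable_columns(header):
    """Return the column names that do not carry a skipped prefix."""
    return [name for name in header if not name.startswith(SKIPPED_PREFIXES)]


# Hypothetical export header, for illustration only.
sample = io.StringIO("study,question,view_sort_id,internal_comment,label\n")
header = next(csv.reader(sample))
print(importable_columns(header))
```

A check like this might be useful before re-importing a file that was previously exported from DDI on Rails.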
XML formats¶
- Endnote: Publications in Endnote’s XML export.
- r2ddi: DDI-Codebook-based XML, generated by r2ddi.
- QeDML: QeDML-XML from QLIB.
Other formats¶
- EndnoteKeys are a special import of two columns (accession number and keywords) in addition to the normal XML import. Endnote exports everything but the keywords to XML, which makes this import necessary.
Import structure¶
Top level¶
import/
| system/ # -> system-wide imports
| study-first/ # \
| study-second/ # }-> one folder per study
| study-third/ # /
All levels¶
import/
| system/
| | endnote.xml
| | endnote-keys.txt
| | ddiOnRails.png
| study-first/
| | studies.csv
| | variables.csv
| | ...all other csv files...
| | files/
| | | ...all files for public folder...
| | qedml/
| | | ...questionnaires in QeDML-XML-format...
| | r2ddi/
| | | version/
| | | | ...dataset descriptions in DDI-C-XML...
| study-second/
| | ...like study-first...
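As a rough illustration of the layout above, the following Python sketch lists the study folders below import/. It assumes that every directory except system/ is a study; the function name study_folders is hypothetical, and the folders are created in a temporary directory only to make the example runnable.

```python
import tempfile
from pathlib import Path


def study_folders(import_root):
    """Return the names of all study folders below import_root, sorted.

    Following the layout above, every directory except system/ is
    treated as one study (an assumption of this sketch).
    """
    return sorted(
        entry.name
        for entry in Path(import_root).iterdir()
        if entry.is_dir() and entry.name != "system"
    )


# Build a throwaway copy of the top-level layout to try the function out.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "import"
    for name in ("system", "study-first", "study-second", "study-third"):
        (root / name).mkdir(parents=True)
    found = study_folders(root)

print(found)
```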
CSV Imports¶
studies.csv¶
Columns¶
- organization
- Name of the organization (foreign key)
- study
- Name of the study (primary key).
- label
- Human-readable label.
- description
- Description (using Markdown).
- html_description
- HTML description (DEPRECATED).
- language_string
- Whitespace-separated list of the languages used in the study, as two-letter language codes (e.g. “de en”). These parameters are used to import and export the translations of questionnaires and datasets.
- import_url
- URL from where all import files are retrieved.
- files_url
- URL from where files are loaded interactively.
- import_config
- Additional import parameters, currently not used.
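To illustrate the language_string column, here is a minimal Python sketch that reads a studies.csv row and splits the whitespace-separated codes. The sample content (including the study name soep-core) and the function name study_languages are made up for this example.

```python
import csv
import io

# Hypothetical studies.csv content; only some of the columns are shown.
STUDIES_CSV = """study,label,language_string
soep-core,SOEP Core study,de en
"""


def study_languages(row):
    """Split the whitespace-separated language_string into a list of codes."""
    return row["language_string"].split()


rows = list(csv.DictReader(io.StringIO(STUDIES_CSV)))
languages = study_languages(rows[0])
print(languages)
```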
topics.csv¶
Columns¶
- topic
- Name of the topic (primary key).
- parent
- Name of the parent topic (foreign key). If empty, this topic becomes a root-level topic, requiring an icon.
- label
- Short label.
- description
- Description using Markdown.
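The parent column turns the flat topics.csv file into a tree. The following Python sketch shows one possible way to rebuild that tree; the sample topics and the function name topic_tree are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical topics.csv content, for illustration only.
TOPICS_CSV = """topic,parent,label
health,,Health
mental-health,health,Mental health
income,,Income
"""


def topic_tree(rows):
    """Return (root topics, mapping of parent topic -> child topics)."""
    roots = []
    children = defaultdict(list)
    for row in rows:
        parent = row["parent"].strip()
        if parent:
            children[parent].append(row["topic"])
        else:
            # An empty parent marks a root-level topic.
            roots.append(row["topic"])
    return roots, dict(children)


roots, children = topic_tree(csv.DictReader(io.StringIO(TOPICS_CSV)))
print(roots, children)
```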
concepts.csv¶
Columns¶
- concept
- Name of the concept (primary key).
- topic
- Name of the topic (primary key).
- label
- Short label.
- description
- Description using Markdown.
periods.csv¶
Columns¶
- period
- Name of the period (primary key).
- label
- Short label.
- description
- Description using Markdown.
analysis_units.csv¶
Columns¶
- analysis_unit
- Name of the analysis unit (primary key).
- label
- Short label.
- description
- Description using Markdown.
conceptual_datasets.csv¶
Columns¶
- conceptual_dataset
- Name of the conceptual dataset (primary key).
- label
- Short label.
- description
- Description using Markdown.
logical_datasets.csv¶
Columns¶
- study
- Name of the study (primary key).
- logical_dataset
- Name of the dataset (primary key).
- label
- Short label.
- description
- Description using Markdown.
- conceptual_dataset
- Name of the conceptual dataset (foreign key).
- analysis_unit
- Name of the analysis unit (foreign key).
- period
- Name of the time period (foreign key).
logical_variables.csv¶
List of Columns¶
- study
- Primary key, name of the study.
- logical_dataset
- Primary key, name of the dataset.
- logical_variable
- Primary key, name of the variable.
- label
- Human-readable label.
- concept
- Name of the underlying concept, foreign key to concepts.csv.
- questionnaire
- Name of the underlying questionnaire, foreign key to questionnaires.csv.
- question
- Name of the underlying question, foreign key to questions.csv.
- item
- Name of the underlying item, foreign key to questions.csv.
- is_primary_key
- Boolean indicator, if this variable is part of the dataset’s primary key.
- basket_key
- Name of a study-specific identifier in this dataset, which is used for the script generator.
- basket_is_default
- Boolean indicator, whether a script generator should include this variable by default, if its dataset is used.
Special Rules¶
- The link to a question (or question item) is only established if the question already exists. There are no new questions created by variables.csv.
distributions.csv¶
Columns¶
- study
- Name of the study (primary key).
- distribution
- Name of the distribution (primary key).
- label
- Short label.
- description
- Description using Markdown.
- active
- Boolean value (“true” or “false”), indicating whether this is currently the active distribution of the study.
datasets_distributions.csv¶
Columns¶
- study
- Name of the study (primary key).
- distribution
- Name of the distribution (primary key).
- dataset
- Name of the dataset (primary key).
- version
- Version of the dataset (primary key).
This table builds a has-and-belongs-to-many relationship between datasets and distributions. Thus, it only consists of key values without any attributes.
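Because the table holds nothing but keys, resolving the relationship amounts to grouping rows. A minimal Python sketch, with hypothetical sample rows and a hypothetical function name datasets_by_distribution:

```python
import csv
import io
from collections import defaultdict

# Hypothetical datasets_distributions.csv content, for illustration only.
LINKS_CSV = """study,distribution,dataset,version
demo,v1,persons,1
demo,v1,households,1
demo,v2,persons,2
"""


def datasets_by_distribution(rows):
    """Group (dataset, version) pairs by their (study, distribution) key."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["study"], row["distribution"])
        grouped[key].append((row["dataset"], row["version"]))
    return dict(grouped)


grouped = datasets_by_distribution(csv.DictReader(io.StringIO(LINKS_CSV)))
print(grouped)
```

Grouping by (dataset, version) instead would give the opposite direction of the same relationship.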
variables.csv¶
This format is export only.
List of Columns¶
- study
- Name of the study (primary key)
- dataset
- Name of the dataset (primary key)
- version
- Version of the dataset (primary key)
- variable
- Name of the variable (primary key)
- label
- Short label.
- categories
- List of categories in pseudo-JSON format.
- label_xx & categories_xx
- Translated labels for variables and categories.
variable_categories.csv¶
This format is export only.
List of Columns¶
- study
- Name of the study (primary key).
- dataset
- Name of the dataset (primary key).
- version
- Version of the dataset (primary key).
- variable
- Name of the variable (primary key).
- value
- Value of the category (primary key).
- label
- Category label.
- frequency
- Frequency.
- label_xx
- Translated labels.
generations.csv¶
Columns¶
- output_study
- Name of the output variable’s study (primary key).
- output_dataset
- Name of the output variable’s dataset (primary key).
- output_version
- Name of the output variable’s dataset version (primary key).
- output_variable
- Name of the output variable (primary key).
- input_study
- Name of the input variable’s study (primary key).
- input_dataset
- Name of the input variable’s dataset (primary key).
- input_version
- Name of the input variable’s dataset version (primary key).
- input_variable
- Name of the input variable (primary key).
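One way to read this table is as a list of links, each connecting one input variable to one output variable. This reading is an interpretation of the column names and is not stated explicitly above. Under that assumption, the following Python sketch collects all input variables per output variable; the sample rows and the function name inputs_by_output are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The four parts that identify a variable on either side of a link.
KEY_PARTS = ("study", "dataset", "version", "variable")

# Hypothetical generations.csv content, for illustration only.
GENERATIONS_CSV = """output_study,output_dataset,output_version,output_variable,input_study,input_dataset,input_version,input_variable
demo,long,v2,income,demo,wave1,v2,inc01
demo,long,v2,income,demo,wave2,v2,inc02
"""


def inputs_by_output(rows):
    """Map each output variable key to the list of its input variable keys."""
    result = defaultdict(list)
    for row in rows:
        output = tuple(row["output_" + part] for part in KEY_PARTS)
        source = tuple(row["input_" + part] for part in KEY_PARTS)
        result[output].append(source)
    return dict(result)


lineage = inputs_by_output(csv.DictReader(io.StringIO(GENERATIONS_CSV)))
print(lineage)
```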
questionnaires.csv¶
Columns¶
- study
- Name of the study (primary key).
- questionnaire
- Name of the questionnaire (primary key).
- label
- Human-readable label.
- description
- Description using Markdown.
- analysis_unit
- Name of the analysis unit (foreign key).
- period
- Name of the time period (foreign key).
- dataset
- Name of the dataset (foreign key).
questions.csv¶
List of columns¶
(1) Identifier: The first four columns identify a question. Please note that a question can consist of multiple items. In this case, the first item is considered to be the root element, and its item column is either empty or “root”.
- study
- Name of the study (primary key).
- questionnaire
- Name of the questionnaire (primary key).
- question
- Name of the question (primary key).
- item
- Number of the question item (primary key). If the item is empty, the question is considered to be a “root question”, which might have items.
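The identifier rules above can be sketched as follows. The Python example treats a row as the root element if its item is empty or “root” and collects the remaining items per question. The sample rows and the function names is_root and items_by_question are assumptions made for this illustration.

```python
import csv
import io

# Hypothetical content with the four identifier columns plus the text.
QUESTIONS_CSV = """study,questionnaire,question,item,text
demo,q2020,1,,How satisfied are you with
demo,q2020,1,1,your health?
demo,q2020,1,2,your income?
demo,q2020,2,root,Year of birth
"""


def is_root(row):
    """A row is the root element if its item is empty or "root"."""
    return row["item"].strip() in ("", "root")


def items_by_question(rows):
    """Map each (study, questionnaire, question) key to its item numbers."""
    items = {}
    for row in rows:
        key = (row["study"], row["questionnaire"], row["question"])
        # Register the question even if it turns out to have no items.
        entries = items.setdefault(key, [])
        if not is_root(row):
            entries.append(row["item"])
    return items


structure = items_by_question(csv.DictReader(io.StringIO(QUESTIONS_CSV)))
print(structure)
```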
(2) Content: The following columns represent the content of a question or item.
- number
- Question number (integer), as a reference to the position in the questionnaire.
- text
- Question text.
- instruction
- Interviewer instruction.
- answer_list
- Name of the list of answers (foreign key to answers.csv).
- scale
- Scale (see list of scales below) of the answers.
- filter
- Incoming filters (see definition below).
- goto
- Outgoing filters (see definition below).
- label
- Label (DEPRECATED).
- description
- Human readable description including additional unstructured information.
- concept
- Name of question’s concept (foreign key). In DDI on Rails the primary link from a question to one or multiple concepts is through the question’s logical variables. Nevertheless, it is possible to link a question or an item directly to a concept.
(3) Links to logical variables and concepts (import only): A question can be linked to multiple logical variables. Therefore, DDI on Rails stores this link with the logical variables. Nevertheless, the questions import allows you to link every question to one logical variable.
- logical_dataset
- Logical dataset name (foreign key).
- logical_variable
- Logical variable name (foreign key).
(4) Export only: A couple of columns are included in the export but will not be imported.
- view_sort_id
- Sort order of the questions. The view_sort_id is generated from the order of the questions in the import file.
- view_lft and view_rgt
- Export only.
- view_import_note
- Export only (DEPRECATED).
- view_first_concept
- Concept of the question, based on the first related variable.
- view_import_typ
- Export only (DEPRECATED).
- view_calculated_number and view_calculated_item
- Special information for imports following the SOEP-QLIB-conventions.
- logical_variable
- Name of the resulting variable (foreign key, import only).
- logical_dataset
- Name of the dataset of the resulting variable (foreign key, import only).
(5) Namespaces (neither imported nor exported): Every study can add an arbitrary number of columns to store additional information that is not intended to be imported into DDI on Rails. Those columns are prefixed with internal_.
Scales¶
- txt
- Only display the text, no variables are generated. All filters and instructions still apply.
- chr
- Result is a character string.
- int
- Result is an integer.
- dec
- Result is a number with decimals.
- bin
- Result is either true or false (equals “null”).
- cat
- Result is a pre-defined answer category. See answer_list for possible answers.
Rules for filter and goto¶
Filter and goto definitions consist of question names and symbols only, no keywords (e.g. “goto”) are used.
- Symbols: ( ) = < > @ | & : != <= >=
- Filter: (AGE > 20) & (SEX = 1) means: this question is asked if “age” is greater than 20 and “sex” is 1.
- Goto: (2 @ TARGET) means: if the answer to the current question is 2, then go to question “target”.
- Refer to items using the colon as a separator, e.g. (PSOR:2 = 3).
- Value lists and ranges: (x = 1:3) is equal to (x = 1,2,3) is equal to (x = 1) | (x = 2) | (x = 3).
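The equivalence of ranges, value lists, and explicit alternatives can be made concrete with a short Python sketch. It only handles the value side of a comparison (the part after the equals sign), not the full filter syntax, and it is not the parser used by DDI on Rails; the function names expand_values and as_disjunction are hypothetical.

```python
def expand_values(spec):
    """Expand a value specification such as "1:3" or "1,2,3" into integers."""
    values = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            # A range such as 1:3 includes both end points.
            start, end = (int(bound) for bound in part.split(":"))
            values.extend(range(start, end + 1))
        else:
            values.append(int(part))
    return values


def as_disjunction(name, spec):
    """Write (name = spec) as an explicit chain of | comparisons."""
    return " | ".join(f"({name} = {value})" for value in expand_values(spec))


print(as_disjunction("x", "1:3"))
```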
answers.csv¶
List of columns¶
(1) Identifiers: The first three columns identify an answer list. An answer list always refers to one questionnaire. It is not possible to refer from a question in questionnaire A to an answer list in questionnaire B.
The fourth column (value) identifies an item of an answer list.
- study
- Name of the study (primary key).
- questionnaire
- Name of the questionnaire (primary key).
- answer_list
- Name of the answer_list within the questionnaire (primary key).
- value
- Integer value of the answer (primary key).
(2) Content: The content of an item is a label. This label can be translated.
- label
- Answer label in the primary language (usually English).
- label_*
- Translations of the label. Please replace * by a two-letter language code, e.g. label_de for a German label.
Features¶
Answer labels are translatable. The language of the translation is set using a two-letter code, e.g. label_de for a German label. The default language for the column label is English.
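As an illustration of the label_* columns, the following Python sketch looks up a translated answer label and falls back to the default label column when no translation is present. The fallback behaviour, the sample row, and the function name answer_label are assumptions of this example and not documented behaviour of DDI on Rails.

```python
def answer_label(row, language=None):
    """Return the label for the requested language, else the default label."""
    if language:
        translated = row.get("label_" + language, "")
        if translated:
            return translated
    # Falling back to the default column is an assumption of this sketch.
    return row["label"]


# Hypothetical answers.csv row, for illustration only.
row = {
    "study": "demo",
    "questionnaire": "q2020",
    "answer_list": "yesno",
    "value": "1",
    "label": "Yes",
    "label_de": "Ja",
}
print(answer_label(row, "de"), answer_label(row, "fr"))
```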
translations.csv¶
Please keep in mind that translations.csv is only an export format. The import of translations is part of the respective translatable object.
The term “translatable” refers to an object that has one or more attributes that can be translated.
Columns¶
- class
- Class of the translatable.
- id
- ID of the translatable.
- attribute
- Translated attribute of the translatable.
- text
- Original version of the text.
- language
- Language of the translation.
- translation
- Translated version of the text.
API¶
Basket API¶
Every user can have multiple baskets, where variables can be stored.
Basket¶
Returns a basket instance. The owner of the basket must be logged in.
/baskets/:id
Resource Properties¶
A basket is represented by the following properties:
Property | Type |
---|---|
id | int |
basket_name | String |
variable_list | Collection<Variable> |
owner | User |
study | String |
Supported HTTP Methods: GET, PUT, DELETE
Optional Parameters: None.
Basket List¶
Returns a list of all baskets belonging to the currently logged in user.
/baskets
Supported HTTP Methods: GET, POST
Optional Parameters: None
Variables List of a Basket¶
Returns a list of Variables associated with the specified basket.
/baskets/:id/variables
Supported HTTP Methods: GET, POST
Optional Parameters: None
Remove Variable from Basket¶
Removes the specified variable from the specified basket.
/baskets/:id/variables/:id
Supported HTTP Methods: DELETE
Optional Parameters: None
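The basket endpoints above could be addressed from Python roughly as follows. This sketch only builds the requests and does not send them. The host https://example.org is a placeholder, the function names are hypothetical, and the login required for baskets is not shown because its mechanism is not described here.

```python
from urllib.request import Request

# Placeholder host (an assumption); replace it with the actual installation.
BASE_URL = "https://example.org"


def basket_request(basket_id, method="GET"):
    """Build (but do not send) a request for /baskets/:id."""
    if method not in ("GET", "PUT", "DELETE"):
        raise ValueError("unsupported method: " + method)
    return Request(f"{BASE_URL}/baskets/{basket_id}", method=method)


def remove_variable_request(basket_id, variable_id):
    """Build (but do not send) a DELETE for /baskets/:id/variables/:id."""
    url = f"{BASE_URL}/baskets/{basket_id}/variables/{variable_id}"
    return Request(url, method="DELETE")


request = remove_variable_request(7, 42)
print(request.get_method(), request.full_url)
```

A prepared request can then be passed to urllib.request.urlopen once the host and the authentication are known.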
Concept API¶
A concept represents a ???.
Concept¶
Represents a single concept instance.
/concepts/:id
Resource Properties¶
A concept is represented by the following properties:
Property | Type |
---|---|
id | int |
concept_name | String |
label | String |
? | ? |
Supported HTTP Methods: GET
Optional Parameters: None.
Concept List¶
Represents a list of all concepts.
/concepts
Supported HTTP methods: GET
Optional Parameters: None
Variables by concept¶
Get all variables with specified concept.
/concepts/:id/variables
Supported HTTP methods: GET
Optional Parameters: None
Dataset API¶
A dataset instance is a collection of variables. A dataset is always part of a study.
Dataset¶
Represents a single dataset instance.
/datasets/:id
Resource Properties¶
A dataset is represented by the following properties:
Property | Type |
---|---|
id | int |
dataset_name | String |
variables | Collection<Variable> |
? | ? |
Supported HTTP Methods: GET
Optional Parameters: None.
Dataset List¶
Represents a list of all datasets.
/datasets
Supported HTTP methods: GET
Optional Parameters: None
Study API¶
A study instance is a collection of datasets.
Study¶
Represents a single study instance.
/studies/:id
Resource Properties¶
A study is represented by the following properties:
Property | Type |
---|---|
id | String |
study_name | String |
datasets | Collection<Dataset> |
? | ? |
Supported HTTP Methods: GET
Optional Parameters: None.
Study List¶
Represents a list of all studies.
/studies
Supported HTTP methods: GET
Optional Parameters: None
Included Datasets¶
/studies/:id/datasets
Returns a list of all datasets associated with the specified study.
Supported HTTP methods: GET
Optional Parameters: None
Included Variables by Dataset¶
/studies/:id/datasets/:id/variables
Returns a list of all variables included in specified dataset and study.
Supported HTTP methods: GET
Optional Parameters: None
Included Variables¶
/studies/:id/datasets/variables
Returns a list of all variables included in specified study.
Supported HTTP methods: GET
Optional Parameters: None
User API¶
A user instance represents a person who has registered with DDI on Rails.
User¶
Represents a single user instance.
/users/:id
Resource Properties¶
A user is represented by the following properties:
Property | Type |
---|---|
id | int |
String | |
is_active | boolean |
date_joined | timestamp |
username | string |
Supported HTTP Methods: GET
Optional Parameters: None.
User List¶
Represents a list of all users.
/users
Supported HTTP methods: GET
Optional Parameters: None
Variable API¶
A variable instance represents a ???.
Variable¶
Represents a single variable instance.
/variables/:id
Resource Properties¶
A variable is represented by the following properties:
Property | Type |
---|---|
id | int |
variable_name | String |
dataset | Dataset |
study | Study |
analysis_unit | String |
boost | int |
label | String |
label_de | String |
period | int |
sub_type | String |
namespace | String |
… | … |
Supported HTTP Methods: GET
Optional Parameters: None.
Variable List¶
Represents a list of all variables.
/variables
Supported HTTP Methods: GET
Optional Parameters:
Parameter | Values | Description |
---|---|---|
dataset | dataset name | Variables included in specified dataset. |
basket | basket id | Variables included in specified basket. |
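The optional parameters of the variable list might be passed as a URL query string. This is an assumption, since the transport of the parameters is not specified above. Under that assumption, a URL could be assembled in Python as follows; the host https://example.org is a placeholder and the function name variables_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host (an assumption); replace it with the actual installation.
BASE_URL = "https://example.org"


def variables_url(dataset=None, basket=None):
    """Build the /variables URL, adding only the parameters that are set."""
    params = {}
    if dataset is not None:
        params["dataset"] = dataset
    if basket is not None:
        params["basket"] = basket
    url = BASE_URL + "/variables"
    if params:
        # Assumption: optional parameters travel in the query string.
        url += "?" + urlencode(params)
    return url


print(variables_url(dataset="persons"))
```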