SAS Interview Questions and Answers – Part 6
81). What do you understand about SDTM and its importance?
SDTM stands for Standard data
Tabulation Model, which defines a standard structure for study data
tabulations that are to be submitted as part of a product application to
a regulatory authority such as the United States Food and Drug
Administration (FDA) 2.In July 2004 the Clinical Data Interchange Standards Consortium (CDISC) published standards on the design and content of clinical trial tabulation data sets, known as the Study Data Tabulation Model (SDTM). According to the CDISC standard, there are four ways to represent a subject in a clinical study: tabulations, data listings, analysis datasets, and subject profiles6.
Before SDTM:
There are different names for each domain and domains don’t have a standard structure. There is no standard variables list for each and every domain.
Because of this FDA reviewers always had to take so much pain in understanding themselves with different data, domain names and name of the variable in each analysis dataset. Reviewers will have spent most of the valuable time in cleaning up the data into a standard format rather than reviewing the data for the accuracy. This process will delay the drug development process as such.
After SDTM:
There will be standard domain names and standard structure for each domain. There will be a list of standard variables and names for each and every dataset. Because of this, it will become easy to find and understand the data and reviewers will need less time to review the data than the data without SDTM standards. This process will improve the consistency in reviewing the data and it can be time efficient.
The purpose of creating SDTM domain data sets is to provide Case Report Tabulation (CRT) data FDA, in a standardized format. If we follow these standards it can greatly reduce the effort necessary for data mapping. Improper use of CDISC standards, such as using a valid domain or variable name incorrectly, can slow the metadata mapping process and should be avoided4.
82). PROC CDISC for SDTM 3.1 Format 2?
Syntax
The PROC CDISC syntax for CDISC SDTM is presented below. The DATA= parameter specifies the location of your SDTM conforming data source.PROC CDISC MODEL=SDTM;SDTM SDTMVersion = “3.1″;DOMAINDATA DATA = results. AE DOMAIN = AE CATEGORY = EVENT;RUN;
83). What are the capabilities of PROC CDISC 2?The PROC CDISC syntax for CDISC SDTM is presented below. The DATA= parameter specifies the location of your SDTM conforming data source.PROC CDISC MODEL=SDTM;SDTM SDTMVersion = “3.1″;DOMAINDATA DATA = results. AE DOMAIN = AE CATEGORY = EVENT;RUN;
PROC CDISC performs the following checks on domain content of the source:
Verifies that all required variables are present in the data set
Reports as an error any variables in the data set that are not defined in the domain
Reports a warning for any expected domain variables that are not in the data set
Notes any permitted domain variables that are not in the data set
Verifies that all domain variables are of the expected data type and proper length
Detects any domain variables that are assigned a controlled terminology specification by the domain and do not have a format assigned to them.
The procedure also performs the following checks on domain data content of the source on a per observation basis:
Verifies that all required variable fields do not contain missing values
Detects occurrences of expected variable fields that contain missing values
Detects the conformance of all ISO-8601 specification assigned values; including date, time, date time, duration, and interval types
Notes correctness of yes/no and yes/no/null responses,
84). What are the different approaches for creating the SDTM 3?
There are 3 general approaches to create the SDTM datasets:
a) Build the SDTM entirely in the CDMS,
b) Build the SDTM entirely on the “back-end” in SAS,
c) or take a hybrid approach and build the SDTM partially in the CDMS and partially in SAS.
BUILD THE SDTM ENTIRELY IN THE CDMS
It is possible to build the SDTM entirely within the CDMS. If the CDMS allows for broad structural control of the underlying database, then you could build your eCRF or CRF based clinical database to SDTM standards.
Advantages:
• Your “raw” database is equivalent to your SDTM which provides the most elegant solution.
• Your
clinical data management staff will be able to converse with
end-users/sponsors about the data easily since your clinical data
manager andthe und-user/sponsor will both be looking at SDTM datasets.
• As soon as the CDMS database is built, the SDTM datasets are available.
Disadvantages:• This approach may be cost prohibitive. Forcing the CDMS to create the SDTM structures may simply be too cumbersome to do efficiently.
• Forcing the CDMS to adapt to the SDTM may cause problems with the operation of the CDMS which could reduce data quality.
BUILD THE SDTM ENTIRELY ON THE “BACK-END” IN SAS
Assuming
that SAS is not your CDMS solution, another approach is to take the
clinical data from your CDMS and manipulate it into the SDTM with SAS
programming.
Advantages:
• The
great flexibility of SAS will let you transform any proprietary CDMS
structure into the SDTM. You do not have to work around the rigid
constraints of the CDMS.
• Changes could be made to the SDTM conversion without disturbing clinical data management processes.
• The CDMS is allowed to do what it does best which is to enter, manage, and clean data.
Disadvantages:
• There would be additional cost to transform the data from your typical CDMS structure into the SDTM.
Specifications, programming, and validation of the SAS programming transformation would be required.
Specifications, programming, and validation of the SAS programming transformation would be required.
• Once the CDMS database is up, there would then be a subsequent delay while the SDTM is created in SAS.
This delay would slow down the
production of analysis datasets and reporting. This assumes that you
follow the linear progression of CDMS -> SDTM -> analysis datasets
(ADaM).• Since the SDTM is a derivation of the “raw” data, there could be errors in translation from the “raw” CDMS data to the SDTM.
• Your
clinical data management staff may be at a disadvantage when speaking
with end-users/sponsors about the data since the data manager willlikely be looking at the CDMS data and the sponsor will see SDTM data.
BUILD THE SDTM USING A HYBRID APPROACH
Again,
assuming that SAS is not your CDMS solution, you could build some of the
SDTM within the confines of the CDMS and do the rest of the work in
SAS. There are things that could be done easily in the CDMS such as
naming data tables the same as SDTM domains, using SDTM variable names
in the CTMS, and performing simple derivations (such as age) in the
CDMS. More complex SDTM derivations and manipulations can then be
performed in SAS.
Advantages:
• The changes to the CDMS are easy to implement.
• The SDTM conversions to be done in SAS are manageable and much can be automated.
Disadvantages:
• There
would still be some additional cost needed to transform the data from
the SDTM-like CDMS structure into the SDTM. Specifications, programming,
and validation of the transformation would be required.
• There would be some delay while the SDTM-like CDMS data is converted to the SDTM.
• Your clinical data management staff may still have a slight disadvantage when speaking with endusers/
sponsors about the data since the clinical data manager will be looking
at the SDTM-like data and the sponsor will see the true SDTM data.
85). What do you know about SDTM domains?A basic understanding of the SDTM domains, their structure and their interrelations is vital to determining which domains you need to create and in assessing the level to which your existing data is compliant. The SDTM consists of a set of clinical data file specifications and underlying guidelines. These different file structures are referred to as domains. Each domain is designed to contain a particular type of data associated with clinical trials, such as demographics, vital signs or adverse events.
The CDISC SDTM Implementation Guide provides specifications for 30 domains. The SDTM domains are divided into six classes.
The 21 clinical data domains are contained in three of these classes:
Interventions,
Events and
The trial design class contains seven domains and the special-purpose class contains two domains (Demographics and Comments).
The trial design domains provide the reviewer with information on the criteria, structure and scheduled events of a clinical trail. The only required domain is demographics.
There are two other special purpose relationship data sets, the Supplemental Qualifiers (SUPPQUAL) data set and the Relate Records (RELREC) data set. SUPPQUAL is a highly normalized data set that allows you to store virtually any type of information related to one of the domain data sets. SUPPQUAL domain also accommodates variables longer than 200, the Ist 200 characters should be stored in the domain variable and the remaining should be stored in it5.
86). What are the general guidelines to SDTM variables?
Each of the SDTM domains has a collection of variables associated with it.
There are five roles that a variable can have:
Identifier,
Topic,
Timing,
Qualifier,
and for trial design domains,
Rule. Using lab data as an example, the subject ID, domain ID and sequence (e.g. visit) are identifiers.
The name of the lab parameter is the topic,
the date and time of sample collection are timing variables,
the result is a result qualifier and the variable containing the units is a variable qualifier.
Variables that are common across domains include the basic identifiers study ID (STUDYID), a two-character domain ID (DOMAIN) and unique subject ID (USUBJID).
In studies with multiple sites that are allowed to assign their own subject identifiers, the site ID and the subject ID must be combined to form USUBJID.
Prefixing a standard variable name fragment with the two-character domain ID generally forms all other variable names.
The SDTM specifications do not require all of the variables associated with a domain to be included in a submission. In regard to complying with the SDTM standards, the implementation guide specifies each variable as being included in one of three categories:
Required, Expected, and Permitted4.
REQUIRED – These variables are necessary for the proper functioning of standard software tools used by reviewers. They must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set structure; however it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
87). Can you tell me more About SDTM Domains5?
SDTM Domains are grouped by classes, which is useful for producing more meaningful relational schemas. Consider the following domain classes and their respective domains.
• Special Purpose Class – Pertains to unique domains concerning detailed information about the subjects in a study.
Demography (DM), Comments (CM)
• Findings Class – Collected information resulting from a planned evaluation to address specific questions about the subject, such as whether a subject is suitable to participate or continue in a study.
Electrocardiogram (EG)
Inclusion / Exclusion (IE)
Lab Results (LB)
Physical Examination (PE)
Questionnaire (QS)
Subject Characteristics (SC)
Vital Signs (VS)
• Events Class – Incidents independent of the study that happen to the subject during the lifetime of the study.
Adverse Events (AE)
Patient Disposition (DS)
Medical History (MH)
• Interventions Class – Treatments and procedures that are intentionally administered to the subject, such as treatment coincident with the study period, per protocol, or self-administered (e.g., alcohol and tobacco use).
Concomitant Medications (CM)
Exposure to Treatment Drug (EX)
Substance Usage (SU)
• Trial Design Class – Information about the design of the clinical trial (e.g., crossover trial, treatment arms) including information about the subjects with respect to treatment and visits.
Subject Elements (SE)
Subject Visits (SV)
Trial Arms (TA)
Trial Elements (TE)
Trial Inclusion / Exclusion Criteria (TI)
Trial Visits (TV)
88). Can you tell me how to do the Mapping for existing Domains?
First step
is the comparison of metadata with the SDTM domain metadata. If the
data getting from the data management is in somewhat compliance to SDTM
metadata, use automated mapping as the Ist step.
If the data management metadata is not in compliance with SDTM then avoid auto mapping. So do manual mapping the datasets to SDTM datasets and the mapping each variable to appropriate domain.The whole process of mapping include:
*Read in the corporate data standards into a database table.
• Assign a CDISC domain prefix to each database module.
• Attach a combo box containing the SDTM variable for the selected domain to a new mapping variable field.
• Search each module, and within each module select the most appropriate CDISC variable.
•Then search for variables mapped to the wrong type Character not equal to Character; Numeric not equal to Numeric.
• Review the mapping to see if any conflicts are resolvable by mapping to a more appropriate variable.
• We need to verify that the mapped variable is appropriate for each role.
• Then finally we have to ensure all ‘required’ variables are present in the domain6.
89). What do you know about SDTM Implementation Guide, Have you used it, if you have can you tell me which version you have used so far?• Assign a CDISC domain prefix to each database module.
• Attach a combo box containing the SDTM variable for the selected domain to a new mapping variable field.
• Search each module, and within each module select the most appropriate CDISC variable.
•Then search for variables mapped to the wrong type Character not equal to Character; Numeric not equal to Numeric.
• Review the mapping to see if any conflicts are resolvable by mapping to a more appropriate variable.
• We need to verify that the mapped variable is appropriate for each role.
• Then finally we have to ensure all ‘required’ variables are present in the domain6.
SDTM Implementation guide provides documentation on metadata (data of
data) for the domain datasets that includes filename, variable names,
type of variables and its labels etc. I have used SDTM implementation
guide version 3.1.1.
90). Can you identify which variables should we have to include in each domain?
A) SDTM implementation guide V 3.1.1 specifies each variable is being included in one of the 3 types.
REQUIRED –They must be included in the data set structure and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set; however it is permissible to have missing values.REQUIRED –They must be included in the data set structure and should not have a missing value for any observation.
PERMISSIBLE – These variables are not a required part of the domain and they should not be included in the data set structure if the information they were designed to contain was not collected.
91). Can you give some examples for MAPPING 6?
Here are some examples for SDTM mapping:
• Character variables defined as Numeric
• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping. So must go into SUPPQUAL
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Core SDTM is a subset of the existing corporate standards
• Vertical versus Horizontal structure, (e.g. Vitals)
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed to laboratory data standardization.
92). Explain the Process of SDTM Mapping?• Numeric Variables defined as Character
• Variables collected without an obvious corresponding domain in the CDISC SDTM mapping. So must go into SUPPQUAL
• Several corporate modules that map to one corresponding domain in CDISC SDTM.
• Core SDTM is a subset of the existing corporate standards
• Vertical versus Horizontal structure, (e.g. Vitals)
• Dates – combining date and times; partial dates.
• Data collapsing issues e.g. Adverse Events and Concomitant Medications.
• Adverse Events maximum intensity
• Metadata needed to laboratory data standardization.
A list of basic variable mappings is given below4.
DIRECT: a CDM variable is copied directly to a domain variable without any changes other than assigning the CDISC standard label.
RENAME: only the variable name and label may change but the contents remain the same.
STANDARDIZE: mapping reported values to standard units or standard terminology
REFORMAT: the actual value being represented does not change, only the format in which is stored changes, such as converting a SAS date to an ISO8601 format character string.
COMBINING: directly combining two or more CDM variables to form a single SDTM variable.
SPLITTING: a CDM variable is divided into two or more SDTM variables.
DERIVATION: creating a domain variable based on a computation, algorithm, series of logic rules or decoding using one or more CDM variables.
93). Can you explain AdaM or AdaM datasets7?
The Analysis Data Model describes the general structure, metadata, and content typically found in Analysis Datasets and accompanying documentation. The three types of metadata associated with analysis datasets (analysis dataset metadata, analysis variable metadata, and analysis results metadata) are described and examples provided. (source:CDISC Analysis Data Model: Version 2.0)
Analysis datasets (AD) are typically developed from the collected clinical trial data and used to create statistical summaries of efficacy and safety data. These AD’s are characterized by the creation of derived analysis variables and/or records. These derived data may represent a statistical calculation of an important outcome measure, such as change from baseline, or may represent the last observation for a subject while under therapy. As such, these datasets are one of the types of data sent to the regulatory agency such as FDA.
The CDISC Analysis Data Model (ADaM) defines a standard for Analysis Dataset’s to be submitted to the regulatory agency. This provides a clear content, source, and quality of the datasets submitted in support of the statistical analysis performed by the sponsor.
In ADaM, the descriptions of the AD’s build on the nomenclature of the SDTM with the addition of attributes, variables and data structures needed for statistical analyses. To achieve the principle of clear and unambiguous communication relies on clear AD documentation. This documentation provides the link between the general description of the analysis found in the protocol or statistical analysis plan and the source data.