Research Tools and Process of Development

Data collection is an important part of research. In order to collect the requisite data for any research, you have to devise appropriate tools and use suitable measuring techniques, and decide on the relevant attributes of the samples drawn. There are several research tools, varying in design, operation, complexity of features, and interpretation. In certain situations you may select from a list of available tools. In other situations you may find that existing research tools do not suit your purpose or objective of research and, therefore, you may like to modify them or
develop your own. Each tool is appropriate for collecting a particular type of data or information which lends itself to a particular type of analysis and interpretation for drawing meaningful conclusions and generalisations. For this, you need to familiarise yourself with the nature, merits and limitations of various research tools. In this section we will focus on the characteristics, types, uses and limitations of some commonly used research tools – questionnaires, rating scales, attitude scales and tests.
OBJECTIVES
On completion of this section, you should be able to:
1 Describe the characteristics of a good research tool;
2 Define a questionnaire and describe its various types;
3 Describe the characteristics, uses and limitations of a questionnaire;
4 Define a rating scale and describe its types, uses and limitations;
5 Define an attitude scale and describe its types, uses and limitations;
6 Define a test and describe the types, uses and limitations of tests; and
7 Choose appropriate techniques and use them efficiently in your research.

SCALING IN EDUCATIONAL RESEARCH
Research tools are measuring devices. Every measuring device has some kind of graduation depending upon the system of measurement. For example, the FPS and CGS systems measure length in feet and centimetres respectively. Similarly, weight is measured in pounds and grams. The foot rule that measures length is graduated in inches.

Such a scale has two major attributes:
(1) each inch is of equal length wherever it appears on the foot rule, and
(2) two different objects each measured as two inches by the same foot rule are of the same length.
Just as the FPS or CGS system provides the basis for scaling in physical measurement, it is necessary to provide some form of scaling for mental measurement, that is, the measurement of variables like intelligence, achievement, demographic attributes, etc.
Four types of scales are used in measurement. These are:
1 Nominal,
2 Ordinal,
3 Interval and
4 Ratio.

1 Nominal Scale: It is the most elementary form of scale. As indicated by the name itself, it is only nominal. This form of scale is largely used to classify people or objects into certain categories like male-female, rural-urban, dark-light, tall-short, etc. In other words, it labels the objects of measurement. In the context of research it is concerned with the frequency of occurrence in the various categories.
For example, in a class or in a counselling session, we may count how many students are male and how many are female, or how many have read the learning material and how many have not.

2 Ordinal Scale: This is the second level of scale, more sophisticated than the nominal scale, though it remains one of the cruder forms. Whenever the sample of the research is arranged in ascending or descending order on the basis of data on a variable, we are using the ordinal scale. For example, when students are ranked in a class on the basis of their achievement we are using an ordinal scale: the 10th rank in a class of 50 students is better than the 11th but lower than the 9th. However, the ranking does not indicate that the difference between the 9th and 10th ranks is equal to the difference between the 11th and 12th. In other words, the differences between the ranks are either unknown or unequal. The only information that can be derived in this case is the relative position of a subject within the sampled population on a variable.

3 Interval Scale: As the name indicates, a scale that has equal intervals at different points of graduation is called an interval scale. It is also called an equal-appearing interval scale. The most common use of the interval scale is the achievement test: when the test carries 100 as full marks, it implies 0 as the beginning, hence a 0 to 100, or 101-point, scale graduated one score point at a time. This form of scale is extensively used for a large majority of psychological variables like interests, attitudes, aptitude, etc. As mentioned earlier, it is also called an equal-appearing interval scale, and there is a significant implication in the word "appearing". Apparently the difference between 88 and 90 is the same as the difference between 28 and 30, since both have a scale difference of two points. Practical experience, however, will indicate that moving from 28 to 30 score points is far easier than moving from 88 to 90. In other words, despite the apparent difference of two, the actual difference between 88 and 90 is much larger than the difference between 28 and 30. Although it is by far the most sophisticated scale used in social research, it has the limitation of being inexact compared to the ratio scale used in physical measurement.
4 Ratio Scale: The ratio scale, as mentioned above, is primarily used in physical measurement. It is exact and accurate. It is very similar to the interval scale except that it has an absolute zero. For example, if a length is indicated by 0 cm it means non-existence, whereas a score of 0 in mathematics does not indicate the absence of mathematical knowledge. The other major feature of the ratio scale is the ratio itself. It implies that in a 100 cm (one metre) long rod the distance between the 28th and 30th cm is exactly equal to the distance between the 88th and 90th cm of the rod. In this case the intervals in the scale are not merely apparent but real. However, the ratio scale has very little, if any, application in social research in general and educational research in particular.
However, the basic philosophy of the ratio scale underlies the interval scale that is extensively used in educational research. As mentioned above, there are four types of scales: Nominal, Ordinal, Interval and Ratio.
The nominal scale and the ratio scale represent the crude end and the sophisticated end of the continuum respectively. The most extensively used scaling technique in educational research is the interval scale. However, the choice of scaling technique depends upon the nature of the variable.
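To make the distinction concrete, here is a minimal Python sketch (with made-up illustrative data, not drawn from the text above) showing the kind of summary statistic each level of measurement permits.

import statistics

# Nominal: labels only -- frequencies and the mode are meaningful
gender = ["male", "female", "female", "male", "female"]
print("Nominal  -> mode:", statistics.mode(gender))

# Ordinal: rank order only -- the median is meaningful, rank differences are not
class_rank = [1, 2, 3, 4, 5]              # 1st, 2nd, ... rank in a class
print("Ordinal  -> median rank:", statistics.median(class_rank))

# Interval: equal-appearing units, no true zero -- the mean is meaningful
test_scores = [28, 30, 65, 88, 90]        # achievement-test scores out of 100
print("Interval -> mean score:", statistics.mean(test_scores))

# Ratio: equal units and a true zero -- statements like "twice as long" hold
print("Ratio    -> 60 cm is", 60 / 30, "times as long as 30 cm")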

CHARACTERISTICS OF A GOOD RESEARCH TOOL
There are mainly three characteristics of a good research tool. These include validity, reliability, and usability. In selecting tools for collecting data a researcher should evaluate them in terms of these characteristics. Let us
discuss these one by one.
1 Validity
A tool used for collecting data must provide information that is not only relevant but free from systematic errors. In other words, it must produce only valid information and measure what it claims to measure.
For example, an achievement test in Physics must measure knowledge of students in Physics alone. It should not turn out to be a language test. If a question on frictional force is asked, and a certain student well versed in the
English language writes a good 'essay' on it, the researcher should not end up measuring the language ability of the student. A tool, however, does not possess universal validity. It may be valid in one situation but not in another. A tool useful for decision-making in one research situation may have no use at all in a different situation. So, instead of asking, "Is this research tool valid?", it is important to ask the more pertinent question,
"How valid is this particular tool for collecting the information which the researcher needs to gather?" Or, more generally, "For what decision is this tool valid?"

There are three types of validity:
(i) content validity;
(ii) criterion-related validity;
(iii) construct validity.

(1) Content validity : It relates to the relevance of the content of a research tool to the objective and nature of the research problem. For example, in the case of tests of achievement, content validity is estimated by evaluating the relevance of the test items to the instructional objectives, the actual subject matter studied, and the knowledge acquired, both item by item and as a whole. Taken collectively, the items should constitute a representative sample of the variable tested. Content validity of a research tool is based on the judgement of several experts in the field concerned and a careful analysis of the objectives of the research and the hypotheses, if any, to be tested. Content validity is also known as rational, logical or face validity.

(2) Criterion-related validity : In decision-making situations, selection or classification is based on an individual's expected performance as predicted by a research tool. For example, a psychological test or rating scale which predicts the kind of behaviour it was intended to predict is said to possess 'predictive validity'. The prediction may be regarding success in a job or a course. This validity refers to the association between the present result, as indicated by a particular research tool, and future behaviour. In order to determine the predictive
validity of a tool, the results from it must be compared with the actual performance or outcome in the future. For example, if a test is designed to select students for a certain medical course, scores on the test must indicate
a significant positive relationship with their ultimate success in the medical profession. A researcher studies predictive validity if his or her primary interest is in the outcome which he or she wants to improve by some
professional decisions. In some research situations, a researcher may wish to develop a new tool as a substitute for an already existing cumbersome tool (technique or method). If the existing tool is considered useful for decision making and we want to test the validity of the new one, the key question to ask is whether the new tool agrees with the information sought through the existing cumbersome technique. If they disagree, the new one cannot be substituted for the original tool.
The agreement between the newly developed tool and the already existing cumbersome technique for which the tool has been developed is estimated by an empirical comparison. Both the newly developed tool and the original one are applied to the same sample groups, and the results are compared. This type of empirical check on agreement is
called concurrent validation, as the information obtained through the two tools ought to give nearly the same results.
The validity of the new tool thus established is called its ‘concurrent validity’. Let us suppose that a
researcher has developed an achievement test in mathematics. The scores on this test may be compared with the scores given by the mathematics teacher to the sample students. If the two show nearly the same result, the
concurrent validity of the researcher's newly developed tool can be established. In the case of predictive validity, the measure of the outcome is termed the 'criterion'. While estimating concurrent validity, the newly developed tool is
proposed as a substitute for the existing technique or method, and the information obtained through the existing technique acts as the criterion. Since in both cases the information sought through the newly developed
tool is related to a criterion, the two types of validation are also termed ‘criterion-related validity’.
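In practice, both predictive and concurrent validity are usually reported as a correlation (a validity coefficient) between scores on the new tool and the criterion measure. The sketch below is a minimal Python illustration with hypothetical scores, not data from the source.

import numpy as np

selection_test = np.array([52, 61, 70, 45, 80, 66, 58, 73])   # scores on the new tool
criterion      = np.array([55, 64, 75, 40, 85, 60, 62, 78])   # later course outcome (predictive)
                                                               # or existing measure (concurrent)

validity_coefficient = np.corrcoef(selection_test, criterion)[0, 1]
print(f"Criterion-related validity coefficient: {validity_coefficient:.2f}")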
(3) Construct validity : It is concerned with the extent to which a test measures a specific trait or construct. This type of validity is essential for those tests which are used to assess individuals on certain psychological traits and abilities. Examples of common constructs are anxiety, intelligence, motivation, attitude, critical thinking, etc. Construct validity is established by relating a presumed measure of a construct with some behaviour that it is hypothesized to underlie.

2 Reliability
A tool used for data collection must be reliable, that is, it must have the ability to consistently yield the same results when it is repeatedly administered to the same individuals under the same conditions.
For example, if an individual records his or her responses on the various items of a questionnaire and thus provides a certain type of information, he/she should provide approximately the same responses when the questionnaire is administered to him/her on a second occasion. If an achievement test is administered to learners and then re-administered after a gap of fifteen days, without any special coaching in that subject during these fifteen days, the learners should show a similar range of scores on re-administration of the test.
Repeated measurement of an attribute, characteristic or trait by a tool may provide different results. The differences may be due either to a real change in the individual's behaviour or to the unreliability or inconsistency of the tool itself. If the variation in the results is due to a real change in behaviour, the reliability of the tool is not to be doubted. However, if the variation is due to the tool itself, then the tool is to be discarded.

There are various procedures to assess the reliability of a tool. These include
(i) the test-retest method,
(ii) the alternate or parallel-form method,
(iii) the split half method, and
(iv) the rational equivalence method.

The test-retest method : In this method the same tool is re-administered to the same sample of the population shortly after its first administration. The relationship or agreement between the information or data obtained through the two administrations provides the measure of reliability of the tool.
The chief disadvantage of this method is that if the time between two administrations of the tool is short, the immediate memory effects, practice and the confidence induced by familiarity with the tool may give a wrong measure of its reliability. On the other hand, if the interval is too long, the real changes in behaviour in terms of growth may under-estimate the reliability of the tool. Owing to these limitations, the test-retest method is generally less useful than the other methods. However, this type of measurement is commonly used with questionnaires, observations, and interviews.
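As an illustration, the test-retest coefficient is simply the correlation between the two administrations. Below is a minimal Python sketch with hypothetical scores (not data from the source).

import numpy as np

first_administration  = np.array([34, 42, 55, 61, 47, 70, 39, 58])
second_administration = np.array([36, 40, 57, 60, 50, 68, 41, 55])  # same learners, about two weeks later

test_retest_reliability = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Test-retest reliability: {test_retest_reliability:.2f}")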

The equivalent or parallel-forms method : This method requires that two equivalent or parallel forms of a tool be prepared and administered to the same group of subjects. The items in these forms are parallel. The two sets of measures obtained by the use of the tool are then correlated to estimate the level of its reliability. In developing the parallel forms of a tool, care has to be taken to match the material of the forms in content, difficulty level and format. The parallel-forms method is widely used for determining the reliability of a research tool. The reliability of psychological tests and attitude scales is usually estimated by this method.

The split-half method :
In this method, the tool is first divided into two equivalent 'halves'. If there are 50 items in a test, two equivalent halves of 25 items each are made, for example by taking alternate items. The measure of the first half of the tool is then correlated with the measure of the other half. This procedure is commonly used to find the reliability of tests and attitude scales. The main limitation of this method is that a tool can be divided into two halves in a
number of ways and, thus, the estimate of the reliability may not have a unique value.
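The half-test correlation is conventionally stepped up with the Spearman-Brown formula to estimate the reliability of the full-length tool. Here is a minimal Python sketch using a small, hypothetical matrix of dichotomously scored responses.

import numpy as np

# rows = 6 respondents, columns = 10 items scored 1 (correct) or 0 (incorrect)
responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
])

odd_half  = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
full_length_reliability = 2 * r_half / (1 + r_half)   # Spearman-Brown correction
print(f"Half-test correlation: {r_half:.2f}")
print(f"Estimated full-length reliability: {full_length_reliability:.2f}")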

The rational equivalent method :

This method of measuring reliability is considered to be free from the limitations of the other methods discussed so far. Two forms of a tool are defined as equivalent when their corresponding contents are interchangeable. This method is most commonly used in estimating the reliability of psychological tests.
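In practice, the rational equivalence approach is usually computed with the Kuder-Richardson formula 20 (KR-20) for dichotomously scored items (Cronbach's alpha is the analogous coefficient for multi-point items). The following minimal Python sketch uses a hypothetical response matrix, not data from the source.

import numpy as np

responses = np.array([          # rows = respondents, columns = items (1 = correct, 0 = incorrect)
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
])

k = responses.shape[1]                    # number of items
p = responses.mean(axis=0)                # proportion of correct answers per item
q = 1 - p
total_scores   = responses.sum(axis=1)
total_variance = total_scores.var()       # population form, to match the p*q item variances

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_variance)
print(f"KR-20 reliability estimate: {kr20:.2f}")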

3 Usability

The usability of a tool depends on its objectivity, cost effectiveness, the time and effort required to administer it, and how easy it is to analyse and draw conclusions through its use. A tool should yield objective information and results. In other words, the results should be independent of the personal judgement of the researcher. If it cannot yield objective data, we say that it is not usable. If the tool can be administered in a short period of time, it is likely to gain cooperation of the subjects and save time of all those involved in its administration. The cost of construction, printing and administration of the tool should be reasonable.
The simplicity and ease of administration, scoring and interpretation are also important factors to be considered while selecting a tool, particularly when expert advice is not easily available. The tool should interest and fascinate the subjects so that it may gain their cooperation.

STEPS OF RESEARCH TOOL DEVELOPMENT

Step 1: Define all constructs fully before you attempt to create an instrument, including those that are not the focus of your research. Define all of the constructs, not just those you will use, and identify all of the dimensions of each construct. You need to define all of them to make sure that there is no overlap or ambiguity in the definitions. Go to more than one source of information about the theory because there may be some differences
in definitions even within the same theoretical perspective. Write down the definition you use. Include a one or two sentence definition of the construct and a description of the major differences in how the construct is defined and the decisions you had to make.
For example, you might find that one theorist argues that there are three dimensions and another four. You
have to make a decision about which to use. This is critical when you have to write your dissertation or thesis and when you publish. You need a written record of the decisions you made that could affect the outcomes of the study.

Step 2: Identify variables that will represent the constructs. You may find useful information in the literature that will help with this, but remember that the variables are specific to your study topic and the context (place, time, population) in which you will conduct the study. Make sure the variables represent all dimensions in the constructs of interest for your research. It is better to have too many variables than too few early in the process. You may want to generate measures for two or three variables representing the same construct because you can often merge the scores later if you decide to reduce the number of variables. You can also use this redundancy in the assessment of the validity of your measurements because multiple variables representing the same construct should have similar patterns of response. If you have multiple variables per construct, you can assess this convergence in scores. Multiple variables per construct are recommended.

Step 3: Develop the specific items you will use. It is always valuable to look for existing instruments; using an existing instrument, if it will work for you and is appropriate to the context of your study, is preferable to creating your own. You must purchase an existing instrument or get the consent of its creator to use it in your research.
Failure to do so constitutes plagiarism. You can also sometimes "borrow" individual items from instruments others have developed. However, you need to cite the original work when you do this. If you are going to use more than one or two items, you need to get permission from the original creator to use the items. Even when whole instruments or items are available that seem like they will work for you, you need to test them thoroughly because context is critical to wording, content, and even response formats for items. In this class, you practice developing your own instruments and your own items. Therefore, I want you to rely primarily on items that you develop yourself. Do not use existing instruments. No more than 25% of the items in any of the instruments you develop should be from the literature; at least 75% should be your own creations.
Start with many redundant items. You will run multiple tests on your item banks to determine which items yield reliable, valid, and discriminatory results. Redundant means that you start with many items for each variable, items that differ significantly so that they capture the full meaning of the construct. You will eliminate many (maybe most) of them. You may also have two, three or more versions of an item that differ in wording or construction (for example, a reverse-scored version of an item) where such differences could influence how the respondent understands what you are asking. However, focus on redundancy in content, not wording. Taking an
example of an instrument using closed responses (check the box), if you start with 30 items, you may find that only 8 or 10 of them remain after all this testing. Hence the rule of thumb is to start with 30-40 items per variable for instruments like an index, scale or test. For other instruments, like an interview protocol with an open, narrative response, the key is to start with many different kinds of questions for each topic. For example, a leading question might start the conversation about a topic, followed by some probing questions (confirming and disconfirming), ending with some summary questions. There is a wide range of types of questions you can use to develop an interview protocol. You would have several such trails of questions for each variable, just as you would have several items for a "check the box" set of responses. The principle is the same. Start with many items in all assignments.
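One common way to decide which redundant items survive is to inspect each item's corrected item-total correlation in the pilot data and flag items that discriminate poorly. The sketch below is a minimal Python illustration with hypothetical pilot responses; the 0.30 cut-off is only a common rule of thumb, not a requirement from the source.

import numpy as np

pilot_responses = np.array([      # rows = respondents, columns = candidate items (1-5 ratings)
    [4, 5, 2, 4, 3],
    [2, 3, 4, 2, 2],
    [5, 5, 1, 4, 5],
    [1, 2, 5, 1, 1],
    [3, 4, 3, 3, 4],
    [4, 4, 2, 5, 4],
])

for item in range(pilot_responses.shape[1]):
    item_scores = pilot_responses[:, item]
    rest_scores = np.delete(pilot_responses, item, axis=1).sum(axis=1)  # total of the other items
    r = np.corrcoef(item_scores, rest_scores)[0, 1]
    verdict = "keep" if r >= 0.30 else "review or drop"
    print(f"Item {item + 1}: corrected item-total r = {r:+.2f} -> {verdict}")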

Step 4: Revise to correct problems in wording, response structure, etc. (technical issues). Do not waste other people's time looking at your instrument until you have done your best to correct technical problems. Do this before anyone else sees your instrument.

Step 5: Expert Panel Review. After you have completed your review and correction, you need to get an expert panel to review your work. An expert panel consists of people who have expertise either in methods of data collection or in the topic of your research and/or your theoretical approach. This is not a check of wording, etc. Experts may find some of those problems, but it is your responsibility to fix those problems. The expert panel normally consists
of your colleagues and, for students, the chair and members of your supervisory committee. Expert panel review is what your committee does when you defend your research proposal. The work of the expert panel is largely conceptual. Typically, you would ask them to do three things for you.
(1) The panel should provide an assessment of whether your instrument(s) capture the full meaning of the construct. (2) They should identify any aspects of your instrument that are not relevant to the theoretical construct or that are not appropriate for the topic/context of your study.
(3) The panel should assess the adequacy of the format and structure of the items.

The common use of the Likert-type statement (a sentence or phrase) with several categories of agreement or disagreement (strongly agree to strongly disagree in most cases) is often an example of an inappropriate item-response format. Consider a Likert item-response format used with a statement like "My department provides training for teaching assistants." This is a simple "yes/no" item: either it is a correct statement of fact or it is not. In any case, the Likert format of agree to disagree creates a great deal of mental work for the respondent, who must complete at least four mental steps:
(1) read a statement, (2) decipher/interpret the meaning of the statement, (3) determine one’s basic agreement (yes/no), and (4) decide which specific category of agreement or disagreement best reflects one’s assessment.
This mental work is valuable when topics are emotionally charged because it gives the respondent a way to create emotional distance from potentially disturbing ideas expressed in the statements. If the topic is not one that is apt to create emotional turmoil for the respondent, do not use the Likert-type statement and agree/disagree response format.

Be cautious: you CANNOT USE the Likert-type response format for this class unless there are
sound reasons why some other approach will not work.

General Procedures – Questions are generally preferable to statements because they elicit a more straightforward thinking process on the part of the respondent. They are easier for the respondent to process. Here is an example.
Question: How often do you host dinner parties in your home? Select one. Responses:
1-3 times per year,
4-6 times per year,
7-9 times per year,
10-12 times per year,
more than once a month.
Statements: I rarely host dinner parties in my home; I occasionally host dinner parties in my home; etc. Just ask the question.
Provide the panel members (again, that may be just one or two people) with the research questions, a brief description of your theoretical approach, and a full definition of each construct you are trying to operationalize. They do NOT need to see the standard items for demographics and such. It is also useful to draw their attention to areas where you are uncertain and feel you need their input. Do everything you can to make this as easy as possible
for them to help you. Otherwise, always conduct a cognitive review with members of the class, faculty members, or other graduate students with the requisite expertise.

Step 6: Cognitive Testing. Cognitive testing is nearly equal in importance to expert panel review in the process of instrument development. Like expert panel review, you can reap major improvements from the process at relatively little expenditure of time and effort. You can and should use this technique with all types of data collection. Also like expert panel review, you do not ask people to answer the questions in most instances; you ask them to tell you how they go about deriving an answer to the question. That is, you are asking people to
explain the cognitive processes they would use to answer the questions. Cognitive testing
must be done with members of the target population for the study or with individuals who are “very much like” the members of the target population with regard to characteristics or traits that can affect how well they can respond to your questions.

Step 7: Pilot Testing. Pilot testing is the step in which you ask members of the target population to respond to your items. These are your first data points. I recommend that you treat the pilot test as the first phase of data collection. If you do that, and your instrument performs well, you may be able to use the data as part of your final database. The document "Procedures for Operationalization" discusses various techniques for pilot testing instruments.

Sources:
1 Steps in Instrument Development for the Course, University of Florida
2 Research Tools, IGNTU Amarkantak