www.camh.net

Carrying Out the Data Collection

by Darryl Upfold and Nigel Turner 

Program staff need to carefully consider how much of the evaluation (including the development of the logic model, evaluation framework, evaluation workplan, data collection and analysis) can be done by the staff, and whether other human resources (including students, volunteers and content experts) are needed. However, regardless of how much external assistance is enlisted, it is likely that a good deal of the actual data collection will be done by the staff, because the data collection generally involves face-to-face contact at several points with the clients while they are involved in the program. Because of this, decisions need to be made about who in the agency will be involved in the data collection. If questionnaires or standard tests are being used, one option might be to ask the client to complete the forms in the waiting room. In other cases, for example, when clinical information is coded to be used in the evaluation, the counselling staff would probably collect and code the information during counselling sessions.

Volunteers or students can conduct follow-up interviews, which are often done by telephone. Follow-up workers should be trained to ensure reliability and consistency. Volunteers are generally excellent and highly motivated workers. However, because volunteers tend to turn over frequently, repeatedly training new interviewers can be costly, and the change in interviewers can reduce consistency.

One of the problems in conducting program evaluation is that staff sometimes perceive the evaluation activities (particularly data collection) as interfering with counselling. It is not unusual for front-line staff and those conducting the evaluation to have different perspectives on the value of collecting data. Counsellors are often not used to thinking in terms of statistical evidence. Evaluators often do not fully appreciate the individuality of problems that might not be captured on forms, or how data collection might affect treatment. Regular meetings can help the counsellor-evaluator relationship: counsellors and evaluators can share information and work together toward the common goal of improving the service. Data collection should not be viewed as a barrier to treatment, but as an ongoing process that helps improve the service and ultimately benefits its clients.

Accuracy in Data Collection

Generally, self-reports are valid and reliable (see Graham et al., 1993); however, they are dependent on the participant’s honesty and memory. As such, there are a number of sources of error in self-report measures that need to be considered. First, some people lie or distort the truth. Respondents might feel guilty or embarrassed about their gambling. Some people give answers that are more socially acceptable. Other people might lie to avoid legal or marital problems. Second, some people pay very little attention to the questionnaire and respond in a random manner. Third, a person might not be fully aware of his or her behaviour and thus underestimate it. Fourth, there is a limit to what people can remember. The more complex or detailed the information you are asking for, the less accurate the information is likely to be. When designing a follow-up questionnaire, be aware of your client’s ability to recall information. Fifth, the wording of the question can affect the accuracy of the data.

Improving the Accuracy of Responses to Questionnaires
The majority of people who answer questionnaires give an honest report of their behaviour. Unless respondents are motivated to lie, they are probably telling the truth. However, the accuracy of the results can be improved by motivating the person to be honest. For example, you can explain to them that your reasons for conducting the study are to find out what the counsellors are doing right and wrong so that you can improve the service or determine what additional services are needed. In this context, the client might be less motivated to underestimate his or her gambling. Furthermore, by emphasizing the confidentiality and anonymity of the evaluation, clients might be less inclined to lie.

Detecting Random Responses
Random responses to items can be detected by using “fake questions.” For example, an Ontario drug use survey uses non-existent drugs to identify individuals who are either responding without paying attention to the items or pretending to use drugs. Similarly, some personality tests use highly improbable items such as “I spent my summer vacation in Somalia” to detect invalid responses. Other items are used to detect people who endorse items because they are socially desirable. In addition, filler items are sometimes used to make the purpose of the test less obvious. If you suspect a validity problem, consider including items that measure the validity of responses.
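Screening for fake-item endorsements can be automated once the data are entered. The sketch below assumes each respondent's answers are stored as a set of endorsed item codes. The item codes (including the made-up drug "zorbital") and the function name `flag_invalid` are hypothetical.

```python
# Hypothetical validity items: a non-existent drug and a highly improbable statement.
# No attentive, honest respondent should endorse either one.
VALIDITY_ITEMS = {"used_drug_zorbital", "vacation_somalia"}

def flag_invalid(responses, validity_items=VALIDITY_ITEMS, max_endorsed=0):
    """Return the IDs of respondents who endorsed more than `max_endorsed` validity items.

    `responses` maps a respondent ID to the set of item codes that person endorsed.
    """
    flagged = []
    for respondent_id, endorsed in responses.items():
        if len(endorsed & validity_items) > max_endorsed:
            flagged.append(respondent_id)
    return sorted(flagged)

# Hypothetical data for three respondents.
responses = {
    "R01": {"casino", "lottery"},
    "R02": {"casino", "used_drug_zorbital"},
    "R03": {"bingo", "vacation_somalia", "used_drug_zorbital"},
}
print(flag_invalid(responses))  # ['R02', 'R03']
```

Flagged cases are candidates for closer review rather than automatic deletion. Raising `max_endorsed` makes the screen more lenient.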

Using Multiple Indicators
Another factor that can improve the accuracy of the information is the use of multiple indicators. Information is remembered in context and is recalled better in the same context. Consequently, if you ask questions that cover a variety of contexts (“How often did you gamble in a casino last month?” “At a charity casino?” “At a ‘friendly’ poker game?”) the net result is likely to be more accurate than asking “How often did you gamble last month?” In addition, people are more likely to minimize their answers if asked a single question about how much money they have lost on gambling, because they might become more aware of how excessive their answer is. Asking clients how much they spend on several different gambling activities (e.g., lotteries, bingo, casino card games, etc.) will produce a more accurate estimate of their spending than a single question about their spending.
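In practice, the multiple-indicator approach means the spending total is built from several context-specific answers instead of being asked for directly. A minimal sketch follows. The activity names and dollar amounts are hypothetical follow-up answers from one client.

```python
# Hypothetical answers from one client: dollars spent last month on each
# gambling activity, each asked as a separate question.
activity_spending = {
    "lotteries": 40.0,
    "bingo": 25.0,
    "casino card games": 150.0,
    "charity casino": 60.0,
    "friendly poker game": 20.0,
}

def total_spending(per_activity):
    """Combine the activity-specific answers into one monthly spending estimate."""
    return sum(per_activity.values())

print(total_spending(activity_spending))  # 295.0
```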

A potential problem in evaluation is that too many questionnaires might dissuade some clients from attending. This has to be balanced against the fact that more questions produce greater accuracy. Therefore, it is important to be clear about what questions need to be asked, and to be sure that the data that are collected will be useful (which should be the case if the process of developing a logic model, evaluation framework and evaluation questions is followed). Questions should not be asked if there is no clear purpose to the data that would result.

Statistical Analysis

In some agencies, staff may be trained in statistics; in others, this skill may need to be acquired externally. This section is not intended to teach staff how to conduct statistical analysis. Rather, it gives the reader enough background to communicate effectively with the individual(s) who might carry out the analysis. Statistics serve two main functions:

- to summarize or describe information to make it more usable (“descriptive statistics”);
- to draw inferences about the larger population from the sample studied in the evaluation (“inferential statistics”).

This section covers these two types of statistics, statistical significance, other measures of impact, and sampling. It then briefly discusses how statistical analysis can be applied to each of the five types of program evaluation.

Descriptive Statistics

Descriptive statistics help researchers and evaluators make large amounts of data more manageable. This is done by computing descriptive measures like averages (mean, median, and mode), standard deviations, percentages and correlations (the relationship of one variable to another).

Inferential Statistics

Inferential statistics have their basis in probability theory. They are used to “infer” the properties of a population from a sample. These are the statistics used to determine whether one group differs statistically from another on one or more outcome indicators. For example, a group of individuals who completed treatment might be compared with a group of individuals placed on a wait list as a control group. Some descriptive statistics, such as the mean and the standard deviation, are used to compute inferential statistics, and these computations are more advanced than those for descriptive statistics. The type of inferential statistic to use depends on the type of data collected:

- Chi-square (χ²) is used to analyze the relationship between two categorical variables (e.g., gender [male vs. female] and treatment status [completers vs. non-completers]).
- The t-test, analysis of variance (ANOVA, which uses the F-test) and multivariate analysis of variance (MANOVA) are used to see whether different groups of people differ on an outcome measure. The groups are defined by an “independent” variable, such as treatment completers vs. non-completers. The outcome is a “dependent” variable, such as life satisfaction.

Consult a statistics textbook for details on the appropriate analysis to use with different data.
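The chi-square statistic compares the observed counts in each cell with the counts expected if the two variables were unrelated. The minimal sketch below uses hypothetical counts (gender by treatment completion). The result is compared with 3.84, the .05 critical value for 1 degree of freedom. This is the uncorrected Pearson statistic. SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its value would differ.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts. Rows: male, female. Columns: completers, non-completers.
table = [[30, 20],
         [20, 30]]
stat, df = chi_square(table)
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 4.00, df = 1
print("significant at .05" if stat > 3.84 else "not significant at .05")
```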

Many published papers present their findings in terms of statistical significance. For example, a paper might report that life satisfaction six months after treatment was significantly higher among clients who completed the program than among those who dropped out, t = 2.8, p < .05. Here, t = 2.8 is the value of the t-test, the inferential statistic used to measure the effect. The p < .05 indicates that if there were really no difference between the groups, a difference at least this large would be observed by chance less than 5% of the time. The null hypothesis is the hypothesis that there is no difference between the group means. In this case, we can reject the null hypothesis, because the groups appear to be different. Inferential statistics are a powerful tool: without a test of significance, it is impossible to judge whether an effect represents a real change or merely chance error.
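A t statistic such as the one in this example is the difference between two group means divided by its standard error. The sketch below computes Student's independent-samples t (equal variances assumed) from hypothetical life-satisfaction scores. It does not compute a p-value. For that, compare |t| with a critical value from a t table, or use statistical software. For example, SciPy's `scipy.stats.ttest_ind(a, b)` returns both the statistic and the p-value, assuming SciPy is available.

```python
import math
import statistics

def pooled_t_test(group_a, group_b):
    """Student's independent-samples t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, n_a + n_b - 2

# Hypothetical life-satisfaction scores (1-10) six months after treatment.
completers = [7, 8, 6, 9, 7, 8]
dropouts = [5, 6, 6, 7, 5, 6]
t, df = pooled_t_test(completers, dropouts)
print(f"t = {t:.2f}, df = {df}")
```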

It is important to keep in mind that no matter how big the effect is, the observed results could still be due to chance. Finding an effect that is not really there is called a type 1 error. Statisticians declare an effect significant only when a result that large would occur by chance less than 5% of the time. As a result, when no real effect exists, a type 1 error will occur roughly once in every 20 tests. At other times, you will fail to find a difference when two groups actually do differ. This is called a type 2 error. A larger sample, more reliable measures, or multiple dependent variables analyzed with multivariate analysis of variance (MANOVA) will reduce the chance of making a type 2 error.

It is important to understand that type 1 and type 2 errors do occur. If you conduct a lot of research or use many different outcome variables, chances are that some of the results will be type 1 errors. This does not mean that you should use fewer measures or conduct fewer studies, since that would increase your chance of a type 2 error. Rather, when you hire consulting staff, discuss the possible use of multivariate analyses, data aggregation or repeated studies to control for type 1 and type 2 errors. Even if a large sample is used, on average 1 in 20 comparisons where there is no real effect will produce a type 1 error. However, with a large sample, type 1 errors will tend to be trivial in terms of effect size and clinical significance, while real effects will tend to be larger and more meaningful (see effect size and clinical significance).
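A short simulation can make the 1-in-20 figure concrete. The sketch below repeatedly draws two groups from the same distribution, so no real effect exists, and counts how often a t-test at the .05 level still reports a difference. It also computes the chance of at least one type 1 error across several independent tests. The sample size, number of repetitions and seed are arbitrary assumptions. The critical value 1.984 is the approximate two-tailed .05 value for 98 degrees of freedom.

```python
import math
import random

def _mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def simulated_type1_rate(n_per_group=50, reps=2000, critical_t=1.984, seed=1):
    """Fraction of simulated studies that wrongly find a 'significant' difference."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(reps):
        # Both groups come from the same distribution: there is no real effect.
        mean_a, var_a = _mean_and_variance([rng.gauss(0, 1) for _ in range(n_per_group)])
        mean_b, var_b = _mean_and_variance([rng.gauss(0, 1) for _ in range(n_per_group)])
        pooled_var = (var_a + var_b) / 2  # equal group sizes
        t = (mean_a - mean_b) / math.sqrt(pooled_var * 2 / n_per_group)
        if abs(t) > critical_t:
            errors += 1  # a type 1 error
    return errors / reps

def chance_of_at_least_one_type1(k_tests, alpha=0.05):
    """Chance of one or more type 1 errors across k independent tests with no real effects."""
    return 1 - (1 - alpha) ** k_tests

print(f"Simulated type 1 rate: {simulated_type1_rate():.3f}")  # close to .05
for k in (1, 5, 10, 20):
    print(k, round(chance_of_at_least_one_type1(k), 2))
```

With 10 independent outcome measures, the chance that at least one is a type 1 error is about 40%. This is why the text recommends discussing multivariate analyses, aggregation or replication with consultants.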

Use of Computer Software in Data Collection

There are two choices in developing a system to record and maintain the data that are collected: a paper-and-pencil system or a computerized system. As mentioned earlier, addiction agencies in Ontario use software recommended by the funding body to collect some statistical information. Other software programs have been developed for collecting and analyzing data, among them Epi Info, Lotus 1-2-3 and Microsoft Access. Other programs, such as SPSS, can be used to conduct more sophisticated statistical analysis. Qualitative analysis usually involves a spreadsheet or database program, or a program such as NUD*IST, to help organize the information into specific and more general themes. (Please note that these programs are listed as examples of software that agencies can use, and do not serve as an endorsement of any one program.) The advantage of software programs is that they can be used to conduct basic statistical analysis (descriptive statistics). Note, however, that some of this software costs as much as $1,000 per unit, so be sure to budget for software.

Other Evaluation Statistics: Effect Size, Clinical Significance and Cost Effectiveness
A statistically significant test result should be seen as a necessary, but not a sufficient, condition for concluding that a treatment is effective. Three other means of evaluating the success of a program are effect size, clinical significance and cost effectiveness. Again, these are introduced only to give you a basic understanding of their role in evaluation. Effect size, d, is the effect itself (the difference between groups) divided by the standard deviation (roughly, the average difference between people). For example, suppose that subjects' spending on the pre-test and the post-test differed by $5, on average. If the standard deviation was $100, the effect size would be deemed trivial (d = 5/100 = .05). If the standard deviation was $10, however, the effect size would be considered relatively large (d = 5/10 = .5).

Even so, effect size and significance results do not necessarily represent a meaningful change in the person’s life. Jacobson & Truax (1991) describe a statistically based method of determining the clinical significance of an outcome. Clinical significance is based on the extent to which a program shifts clients from the distribution of people considered to have a problem to the distribution of people considered not to have a problem. For example, a problem might be defined as a score of 5 or more on the SOGS, and no problem as a SOGS score of 2 or lower (see Jacobson & Truax, 1991, or Stinchfield & Winters, 1996, for more information).
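Both ideas reduce to simple calculations. The sketch below reproduces the two effect-size examples above. It then counts how many clients who started in the problem range (SOGS 5 or more) ended in the non-problem range (SOGS 2 or lower), using the illustrative cut-offs from the text. The pre/post scores are hypothetical, and the function names are our own. This simple count is only an approximation. Jacobson & Truax's full method also checks whether each change is larger than measurement error (their reliable change index), which is not implemented here.

```python
def effect_size_d(mean_difference, standard_deviation):
    """Effect size d: difference between means divided by the standard deviation."""
    return mean_difference / standard_deviation

print(effect_size_d(5, 100))  # 0.05, trivial
print(effect_size_d(5, 10))   # 0.5, relatively large

# Illustrative SOGS cut-offs from the text.
PROBLEM_CUTOFF = 5     # 5 or more: problem range
NO_PROBLEM_CUTOFF = 2  # 2 or lower: non-problem range

def moved_out_of_problem_range(pre, post):
    return pre >= PROBLEM_CUTOFF and post <= NO_PROBLEM_CUTOFF

# Hypothetical (pre-treatment, post-treatment) SOGS scores for five clients.
scores = [(9, 1), (7, 4), (12, 2), (6, 6), (3, 0)]
started_in_problem_range = [(pre, post) for pre, post in scores if pre >= PROBLEM_CUTOFF]
shifted = sum(moved_out_of_problem_range(pre, post) for pre, post in started_in_problem_range)
print(f"{shifted} of {len(started_in_problem_range)} clients moved into the non-problem range")
```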

Cost effectiveness is another consideration that should be taken into account. One program may be more effective, but when balanced against cost, a cheaper program may be better since it might reach a larger number of people (see Sloan, 1997).
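One simple way to show this trade-off is to compare cost per successful outcome. The figures below are entirely hypothetical: Program A helps a larger share of its clients, but Program B is cheaper and reaches more people. The ratio used here is only an illustration of the idea, not the method described by Sloan.

```python
# Hypothetical figures for two programs (not from the text).
programs = {
    "Program A": {"annual_cost": 60_000, "clients": 100, "success_rate": 0.40},
    "Program B": {"annual_cost": 45_000, "clients": 150, "success_rate": 0.30},
}

def successes(program):
    """Expected number of clients with a successful outcome."""
    return program["clients"] * program["success_rate"]

def cost_per_success(program):
    return program["annual_cost"] / successes(program)

for name, program in programs.items():
    print(f"{name}: {successes(program):.0f} successes, "
          f"${cost_per_success(program):,.0f} per success")
```

Under these assumed numbers, Program B produces more successes in total (45 vs. 40) at a lower cost per success ($1,000 vs. $1,500), despite its lower success rate.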

Sampling

Sampling refers to the selection of a representative number of individuals from the overall “population” that is being considered in the evaluation (i.e., usually the clients of the program). Sampling is used to reduce costs, time and effort, while still obtaining information from a sample that is representative of the entire target population. In other words, sampling is used when it is not possible to include all individuals in the evaluation.

Six common sampling techniques are described below. Which technique to use depends on the type of evaluation being conducted (e.g., the sampling technique for a needs assessment evaluation would be different from the sampling technique for a process or outcome evaluation).

  • Stratified: used to ensure that certain subgroups are included in the evaluation (e.g., if all age groups between the ages of 15 and 65 need to be included).
  • Cluster: used in face-to-face interviewing when the population in question is spread out, in order to keep costs down (e.g., randomly select two of five communities, rather than all five).
  • Quota: used if you have limited resources (e.g., if the population you are interested in is 70% male and 30% female, and you can afford to survey 50 individuals, you would sample from the population until you reached your “quota” of 35 males and 15 females).
  • Random: everyone in the population has an equal chance of being included (e.g., drawing client numbers at random from the client register, or picking every third client from the register, starting at a randomly chosen point).
  • Accidental/convenience: taking what is convenient, often called “person in the street” technique (e.g., taking the next 20 clients who call for an appointment).
  • Reputational: the selection of the individual depends on someone’s judgment of who a “typical” representative of the population is (e.g., key informant interviews with family members of gamblers).
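Several of these techniques can be carried out with Python's standard `random` module. The sketch below assumes a hypothetical register of 30 client IDs and a hypothetical split into two age strata. It shows simple random, systematic (every third client from a random start), stratified and quota sampling, using the 70/30 quota example above. Cluster, convenience and reputational sampling depend on field decisions rather than computation, so they are not shown.

```python
import random

# Hypothetical client register of 30 clients.
register = [f"C{i:03d}" for i in range(1, 31)]
rng = random.Random(2024)  # fixed seed so the draw can be reproduced

def simple_random_sample(population, n, rng):
    """Random: every client has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Every k-th client, starting at a random position among the first k."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: a separate random sample from each subgroup."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def quota_targets(proportions, total):
    """Quota: how many people to recruit from each group (rounded).

    The quotas are then filled with whoever is available, not at random.
    Rounding can make the targets sum to slightly more or less than `total`.
    """
    return {group: round(share * total) for group, share in proportions.items()}

print(simple_random_sample(register, 5, rng))
print(systematic_sample(register, 3, rng))
print(stratified_sample({"15-24": register[:10], "25-64": register[10:]}, 2, rng))
print(quota_targets({"male": 0.70, "female": 0.30}, 50))  # {'male': 35, 'female': 15}
```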

Each technique has its strengths and limitations, and a textbook can be consulted to learn more about sampling. In the next section we will cover some of the basic issues involved in statistical analysis for each of the five types of program evaluations.

Needs Assessment Evaluation
A Needs Assessment evaluation typically involves collecting both quantitative and qualitative data. The qualitative data would include information and opinions from individuals (sometimes referred to as “key informants”) and groups as to whether a service is needed and how it should be designed and marketed.

The quantitative data include information like population descriptors, estimates of the number of individuals who might need the service being planned and the number of individuals the service would be capable of seeing in a month or year. These statistics can be easily expressed as averages and percentages (descriptive statistics). For example: “Based on the literature that reports that approximately 2% of the general population over age 18 have a problem with gambling, we would expect that in a community of 100,000, there would be approximately 2,000 individuals with a gambling problem. Given that somewhere between 1 and 3% of the individuals with a problem are likely to seek assistance in a given year, we would anticipate a service demand of between 20 and 60 individuals a year.” (Note that the statistics presented here are for illustrative purposes and are not intended to represent the actual prevalence of problems or the likelihood that a person will seek treatment in a given year.)
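The arithmetic in this example can be written as a small reusable calculation. The sketch below uses the illustrative rates from the example (2% prevalence, 1 to 3% seeking help each year). Like the example itself, these rates are not actual prevalence figures. The function name is hypothetical.

```python
def expected_demand(population, prevalence, seek_rate_low, seek_rate_high):
    """Estimate the number of people with a problem and the yearly range likely to seek help."""
    with_problem = population * prevalence
    return with_problem, with_problem * seek_rate_low, with_problem * seek_rate_high

# Illustrative rates from the example: 2% prevalence, 1-3% seek help per year.
cases, low, high = expected_demand(100_000, 0.02, 0.01, 0.03)
print(f"About {cases:,.0f} people with a problem; {low:.0f} to {high:.0f} clients a year")
```

Running it prints "About 2,000 people with a problem; 20 to 60 clients a year", matching the example.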

Process Evaluation
A Process Evaluation is conducted for three purposes. It describes the services being provided (e.g., number of sessions), which are set out as Implementation Objectives in the logic model and evaluation framework. It describes client characteristics (e.g., age, gender, presenting problem). It also determines whether the service is operating as planned and attracting the clientele for which it was designed. Descriptive statistics are used in Process Evaluation to express the results as frequencies, percentages and averages.
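Frequencies, percentages and averages for a process evaluation can be produced directly from intake records. The sketch below assumes a hypothetical list of client records with `age`, `gender`, `problem` and `sessions` fields. It uses `collections.Counter` for frequencies and `statistics.mean` for averages.

```python
from collections import Counter
import statistics

# Hypothetical intake records (not from the text).
clients = [
    {"age": 34, "gender": "M", "problem": "casino", "sessions": 6},
    {"age": 52, "gender": "F", "problem": "bingo", "sessions": 8},
    {"age": 41, "gender": "M", "problem": "lottery", "sessions": 3},
    {"age": 29, "gender": "M", "problem": "casino", "sessions": 8},
    {"age": 60, "gender": "F", "problem": "casino", "sessions": 5},
]

def frequency_table(records, field):
    """Count and percentage of clients for each value of `field`, most common first."""
    counts = Counter(record[field] for record in records)
    total = len(records)
    return {value: (n, 100 * n / total) for value, n in counts.most_common()}

print(frequency_table(clients, "gender"))  # {'M': (3, 60.0), 'F': (2, 40.0)}
print(frequency_table(clients, "problem"))
print("Mean age:", statistics.mean(c["age"] for c in clients))
print("Mean sessions:", statistics.mean(c["sessions"] for c in clients))
```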

Outcome Evaluation
Outcome Evaluation is conducted to determine the extent to which the program is helping clients change their behaviour, as described in the Outcome Objectives in the logic model and evaluation framework. To measure this, data are collected that can be analyzed using inferential statistics. The specific tests are known as “tests of significance” (e.g., the t-test; see above). These tests determine whether differences between groups are large enough to be considered “statistically significant.” For example, the difference might be in the amount of money spent on gambling, and the groups might be clients who complete eight sessions of counselling vs. clients who drop out. In other words, in Outcome Evaluation, inferential statistics are used to determine program effectiveness.

The Gambling Treatment Outcome Study includes an Executive Summary of a process and outcome evaluation that provides an example of how data can be analyzed and discussed.

Client Satisfaction Evaluation
Most client satisfaction surveys contain eight to 10 scaled questions (e.g., a five-point scale from “strongly disagree” to “strongly agree”). The most common statistic used to analyze client satisfaction data is a “mean” (commonly known as the “average”). For example, the mean of the clients’ responses on each item (question) can be calculated (e.g., “The average client response to the question ‘Would you refer a friend to this agency’ was 4.1 on the five point scale”).
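Item means like the one in this example take only a few lines to compute. The sketch below uses hypothetical five-point responses from ten clients to two items. The first item's responses are chosen so that its mean matches the 4.1 in the example.

```python
import statistics

# Hypothetical responses on a five-point scale (1 = strongly disagree, 5 = strongly agree).
responses = {
    "Would you refer a friend to this agency?": [5, 4, 4, 5, 3, 4, 5, 4, 3, 4],
    "My counsellor understood my concerns.": [5, 5, 4, 4, 4, 5, 3, 5, 4, 5],
}

# Mean response for each item (question).
item_means = {item: statistics.mean(scores) for item, scores in responses.items()}
for item, mean in item_means.items():
    print(f"{mean:.1f}  {item}")
```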

There are a variety of timing strategies for collecting Client Satisfaction Evaluation data. The survey can be administered as each client completes counselling. However, this poses problems: dissatisfied clients might drop out earlier and never receive the survey, which skews the results in a favourable direction. Administering a survey to all clients on a specific day is a useful method of obtaining a cross-sectional sample at a single point in time. A telephone survey of randomly selected clients that includes both completers and non-completers would be the best method of obtaining responses from the full range of clients.

Economic Evaluation
Economic evaluation is done to identify the treatment options that yield the best value for the resources expended. Meticulous records are often needed of staff time and client time devoted to a particular stream of treatment. This information can be used to determine per-unit cost. In a broader context, these methods also take into account the costs of not treating people. Ideally, an economic evaluation would compare two treatments in one of two ways: by holding effectiveness constant and determining which service costs the least, or by holding cost constant and determining which service is the most effective. In practice, the economic evaluation must often compare treatments that vary in terms of both effectiveness and cost. For example, an evaluation might compare a mass-marketing information campaign on smoking cessation with outpatient cognitive behavioural therapy. The therapy might be more effective, but cost more and reach fewer people. The economic evaluator has to find a mathematical means of measuring the two on a common scale in order to judge whether the added effectiveness of the therapy offsets its higher cost. Economic evaluations are often completed by external consultants. A review of the statistical methods involved in doing an economic analysis is beyond the scope of this section (see Sloan, 1996).
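As a rough illustration, the sketch below derives a per-unit cost from staff-time records. It then compares two options with an incremental cost-effectiveness ratio: the extra cost per extra successful outcome. The ratio is a common health-economics measure that we have swapped in for illustration; it is not necessarily the method Sloan describes. All figures, including the success rates for therapy and campaign, are hypothetical.

```python
def per_unit_cost(staff_hours, hourly_cost, other_costs, clients_served):
    """Cost per client of a treatment stream, from staff-time records plus other costs."""
    return (staff_hours * hourly_cost + other_costs) / clients_served

def incremental_cost_effectiveness(cost_a, effect_a, cost_b, effect_b):
    """Extra cost per extra unit of effect when choosing option A over option B."""
    if effect_a == effect_b:
        raise ValueError("Effectiveness is equal; simply choose the cheaper option.")
    return (cost_a - cost_b) / (effect_a - effect_b)

# Hypothetical figures (not from the text).
therapy_cost = per_unit_cost(staff_hours=400, hourly_cost=50, other_costs=4_000, clients_served=30)
campaign_cost = 20.0  # hypothetical cost per person reached by the campaign
# Hypothetical success (quit) rates: 30% for therapy, 2% for the campaign.
ratio = incremental_cost_effectiveness(therapy_cost, 0.30, campaign_cost, 0.02)
print(f"Therapy: ${therapy_cost:,.0f} per client")
print(f"Extra cost per additional quitter with therapy: ${ratio:,.0f}")
```

Whether that extra cost is worthwhile is a policy judgment that the ratio informs but does not settle.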

Ethics in Program Evaluation
The primary ethical considerations in most research studies are confidentiality, informed consent, access to treatment services (specifically in a controlled design), time required to participate in the study and quality of service (e.g., are you treating the clients as guinea pigs for a method that is untested). Generally you must take careful steps to avoid harming the client in any way.

The first step in any data collection is to have the client read and sign a consent form that asks whether he or she will participate in the follow-up. The consent form should explain the procedure and purpose of the follow-up and assure the client of confidentiality and/or anonymity. It should also ask the client for contact numbers, including alternative contact numbers (work, parents, other family), to reduce the chance of losing track of participants. You should also include a question on whether you may identify your agency when you call or leave a message.

Information about a client’s identity should not be included in the data that are analyzed. It may be necessary to check the accumulated data against the client’s file; however, only a limited number of people should know which research ID number belongs to which client. When writing a report about your service, you must be very careful never to provide any information that could identify a client.
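One way to keep identities out of the analysis file is to replace identifying fields with a random research ID. The link between ID and identity goes in a separate key. The sketch below assumes records are dictionaries and that `name` and `phone` are the only identifying fields; both field names are hypothetical. Real data may contain other identifiers that must also be listed. It uses Python's `secrets` module to generate unpredictable IDs.

```python
import secrets

IDENTIFYING_FIELDS = ("name", "phone")  # hypothetical field names

def pseudonymize(records, identifying_fields=IDENTIFYING_FIELDS):
    """Split client records into an analysis file and a separate linking key.

    Each record gets a random research ID. Identifying fields are moved into
    `key`, which should be stored apart from the data and seen by few staff.
    """
    analysis_data, key = [], {}
    for record in records:
        research_id = "R-" + secrets.token_hex(4)
        while research_id in key:  # regenerate in the unlikely event of a clash
            research_id = "R-" + secrets.token_hex(4)
        key[research_id] = {f: record[f] for f in identifying_fields if f in record}
        row = {f: v for f, v in record.items() if f not in identifying_fields}
        row["research_id"] = research_id
        analysis_data.append(row)
    return analysis_data, key

# Hypothetical records.
records = [
    {"name": "Client A", "phone": "555-0101", "sogs_pre": 9, "sogs_post": 1},
    {"name": "Client B", "phone": "555-0102", "sogs_pre": 7, "sogs_post": 4},
]
data, key = pseudonymize(records)
print(data)  # scores and research IDs only, no names or phone numbers
```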


