From Data Collection to Data Interpretation


1. Introductory remarks

As I remarked in Unit 2, data-driven research is the preferred mode of research for our B.A. students. Data collection and interpretation therefore play a crucial role in their projects, hence this lecture.

According to Webster's Unabridged Dictionary, the word data is a plural of datum, which is originally a Latin noun meaning "something given." Today, it is used in English both as a plural noun meaning "facts or pieces of information" (e.g. These data are described more fully elsewhere), and as a singular mass noun meaning "information": Not much data is available on flood control in Brazil. It is almost always treated as a plural in scientific and academic writing. In other types of writing it is either singular or plural. The singular datum meaning "a piece of information" is now rare in all types of writing. In surveying and civil engineering, where datum has specialized senses, the plural form is datums.

For our purpose, there is an extra dimension to the definition of data, namely facts or pieces of information employed to support, demonstrate, or instantiate a point, an argument, an abstract category or concept, etc. Below is an extract from a research paper. Read it and comment on the use of data. (Note that it does not matter if you do not understand what is being dealt with, as long as you can tell the data from the analytic text.)

Time, space, workplace facilities, functions and goals of the workplace, routinized workplace practice, are the main non-personal constraining/enabling factors on the staff members' performance of workplace discourse. There are, on the other hand, personal constraining/enabling factors: members vary, e.g. in their ideological orientation towards work, towards workplace practice, and in their ability in making use of the discourse resources, and in their sociability and communicative ability. Thanks to these diversified constraining/enabling factors, the actual talks recorded from the actual workplaces are quite mosaic, resisting the classification into hard and fast ready-made categories of discourse (see also 6.7 below). Consider this dialogue:

Thursday 28.wav
B: 某某回来啦 (the so-and-so is back)
A: 嗯 (pardon)
B: 某某哇 (the so-and-so)
A: 啊回来了(ah back)
B: 回来啦 (back)
A: 某某很年轻的 (the so-and-so's very young)
B: 对 (yeah)
A: 可能比你小吧 (younger than you perhaps)
B: 什么样子 (what does she look like)
A: 好象以前是他的学生 (seems to be his former student)
B: 好可惜呀 (what a pity) 应该要早点过来念书 (should have come to study here earlier) <start giggling>这样就有机会 (that will give an opportunity) 嘿嘿… (giggling)

I am sure that you can easily identify which part of the text is data. The function of the data is to demonstrate the author's point that workplace discourse is mosaic, resisting the classification into hard and fast ready-made categories of discourse. With this intuitive sense of what data is, let us examine the ensuing issues:
 1) types of data
 2) data collection and storage
 3) data processing
 4) data interpretation and utilization

2. Types of data

Data can be classified in a variety of ways, according to: (1) source of origin; (2) manner of collection; (3) manipulation; (4) processing; (5) utility.

2.1 Source of origin

If the data come from recall or reflection, they are called retrospective data. For instance, consider the following dialogue:

Wang Ling: John, is it idiomatic in English to say "make bed and lie on it"?
John: Erm … not quite. Er… we say "make your bed and lie in it".

John's reply "make your bed and lie in it" is an instance of retrospective data (in this case it is also called intuitive data).

Data which are not based on memory or reflection are non-retrospective data. For instance,

 (on the spot)
Police: Sir, did you see the two cars crashing into each other?
Witness: Yes, I saw it with my own eyes. The two cars hit each other head-on.

The witness's reply "The two cars hit each other head-on" is non-retrospective data, and can be called eye-witness data.

2.2 Manner of collection

Data can be classified on the basis of the ways they are collected. These include (a) note-making, (b) interviewing, (c) journal-keeping, (d) questionnaires, (e) audio-taping, and (f) video-taping.

Suppose that you sit in on one of your students' classes. While observing the class, you note down what you observe. The notes you make are known as field notes.

Suppose that you want to find out why some students keep making the same mistakes. You can do so by interviewing them. The details of the interviews, for example the minutes you take while interviewing, become interview data.

The questionnaire as a way of collecting data is familiar to most of us. Its advantage lies in its scale: questionnaires can be sent to a large population which would otherwise be impossible to reach.

Suppose that you want to find out how students feel about your newly designed activities. You can ask them to keep learning journals (i.e. keep diaries in Chinese) for you. You can also keep a teaching journal yourself. These are journal data (diary data in Chinese).

Suppose that you use an audio recorder and a video recorder to record the activities. You then collect audio and video data.

The differences among these kinds of data lie in their degrees of detail, authenticity and reliability. Video data surpass all the other types in detail, authenticity, and reliability. They lose to audio data with regard to naturalness, for participants may start acting when facing the camera.

Field notes, interview minutes and journals only capture the highlights or key points, leaving out a lot of detailed information. They are also subject to observation errors, and biases. So their reliability can be called into serious question if they are not used with care. However, they have an obvious advantage over audio and video data, namely they are easier and less expensive to collect.

2.3 Manipulation

The manipulation of data refers to the way data collectors deliberately influence how participants perform the activity. For example, a policeman was asked to collect eye-witness data on a hit-and-run accident. It just so happened that the hit-and-run driver was the policeman's brother, and that the policeman knew this. He bribed the eye-witness before taking his report. The data thus collected are not only no longer eye-witness data, but also manipulated data.

In terms of manipulation, there are
(a) naturally occurring data, that is, no interference;
(b) experimental data, that is, participants are put in a special circumstance for their performance;
(c) solicited data, e.g. participants' response is framed by the questions put to them.

Data collected through questionnaires and interviews are, as a matter of fact, solicited data. The questions can be unwittingly or deliberately phrased in such a way that the participants' response is conditioned. This potential for manipulation renders data collected through questionnaires and interviews less reliable. It also explains why the questionnaire itself has to be included in the appendix for future inspection if necessary.

2.4 Data-processing

The data you collect by using any or all of the techniques mentioned above can be called "raw data", that is, they have not yet been properly processed. Data processing can be done in various ways. Take audio- and video-recorded data for example. You will probably have to transcribe them orthographically or phonetically, otherwise they will be less accessible. Questionnaire data you will probably have to put into a database (e.g. by using Access) in order to retrieve the specific information you want. Data that have been properly processed become processed data.
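As a minimal sketch of what putting questionnaire data into a database can look like, the Python example below uses the standard-library sqlite3 module in place of Access. This substitution is an assumption made only to keep the example self-contained. The table layout, the question labels and the sample answers are all hypothetical.

```python
import sqlite3

# Hypothetical raw questionnaire returns: (respondent id, question, answer).
RAW_RESPONSES = [
    (1, "Q1", "agree"),
    (1, "Q2", "disagree"),
    (2, "Q1", "agree"),
    (2, "Q2", "agree"),
    (3, "Q1", "neutral"),
    (3, "Q2", "agree"),
]


def build_database(rows):
    """Load raw responses into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses (respondent INTEGER, question TEXT, answer TEXT)"
    )
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def count_answers(conn, question):
    """Return {answer: number of respondents} for one question."""
    cursor = conn.execute(
        "SELECT answer, COUNT(*) FROM responses "
        "WHERE question = ? GROUP BY answer",
        (question,),
    )
    return dict(cursor.fetchall())


conn = build_database(RAW_RESPONSES)
q1_counts = count_answers(conn, "Q1")
print(q1_counts)
```

Once the raw returns are in a table, retrieving a specific piece of information (here, how the answers to one question are distributed) becomes a single query rather than a manual count.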

2.5 Utility

Finally, data can be classified in terms of the use they are put to. For example, teaching journals can be used as reflective data, that is, teachers use them to monitor and improve their own performance. Similarly, students' learning journals can be used as feedback data for teachers to monitor the effectiveness of their teaching methods or to find out students' problems or difficulties.

3. Data collection and storage

In section 2 above, data collection was touched upon already. In this section I would like to outline some general principles concerning data collection. They are (1) the authenticity principle; (2) the naturalness principle; (3) the maximum background information principle; and (4) the minimum manipulation principle.

The authenticity principle
This principle requires that the data, by whatever means they are collected, must be authentic. In other words, if the data are claimed to be audio-tape transcripts, they must be genuine transcripts with the original audio-tapes readily available for cross-checking. If the data are claimed to be students' learning journals, they must not actually have been written by the teacher himself or herself.

The naturalness principle
This is a principle concerning preference. If circumstances allow for collecting naturally occurring data, efforts should be made to do so. In other words, if you can afford to collect the data in the most natural way, you should do so.

The maximum background information principle
This principle requires that you do your best to keep as much background information about your data as possible, e.g. information concerning

Who is involved? What are their roles?
What is going on?
When does the activity take place?
Where does the activity take place?
How does the activity take place?

Also required is the background information concerning the way data are being collected, e.g. by whom, with what facilities, etc.
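One way to make sure this background information is actually kept is to record it in a fixed form alongside each data file. The Python sketch below is only an illustration: the class name, the field names and the sample values are hypothetical and can be adapted to the project at hand.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class BackgroundInfo:
    """Background information kept alongside one piece of data."""
    file_name: str      # the data file this record describes
    participants: list  # who is involved, and in what role
    activity: str       # what is going on
    date: str           # when the activity takes place
    place: str          # where the activity takes place
    manner: str         # how the activity takes place
    collected_by: str   # who collected the data
    equipment: str      # with what facilities

    def missing_fields(self):
        """Names of fields left empty, so that gaps are noticed early."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialize the record so it can be stored next to the data file."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Hypothetical record for a classroom recording; the date is not yet filled in.
info = BackgroundInfo(
    file_name="class_observation_01.wav",
    participants=["teacher", "Group 1 students"],
    activity="vocabulary game",
    date="",
    place="Classroom 3",
    manner="small-group work",
    collected_by="the class teacher",
    equipment="audio recorder",
)

print(info.missing_fields())
print(info.to_json())
```

Checking for empty fields at the time of collection is easier than trying to reconstruct the circumstances months later when the dissertation is being written.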

The minimum manipulation principle
This principle is related to the naturalness principle. If you cannot observe the naturalness principle, you should maintain the minimum manipulation principle. In other words, you must make every effort to reduce the manipulation of data as far as you can.

So much for the principles of data collection. Now let us turn to data storage. It is absolutely unacceptable to say that the original data are lost or were not preserved when they are required for inspection, unless they were destroyed by forces beyond human control. Any unjustified claim that the original data have been lost automatically results in the failure of the dissertation.

Data can be stored in various media, e.g. on paper or on audio and video cassettes. Ideally they are stored in electronic media with sufficient backup copies.
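For data held in electronic form, making backup copies and checking that each copy is identical to the original can be automated. The following Python sketch uses only the standard library. The file and folder names are hypothetical, and a temporary folder stands in for real storage locations, which should of course be on separate devices.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, backup_dirs):
    """Copy `original` into each backup directory and verify every copy."""
    original = Path(original)
    expected = sha256_of(original)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(original, directory))
        if sha256_of(copy_path) != expected:
            raise IOError(f"Backup {copy_path} does not match the original")
        copies.append(copy_path)
    return copies


# Demonstration with a temporary folder standing in for real storage.
with tempfile.TemporaryDirectory() as workspace:
    workspace = Path(workspace)
    source = workspace / "interview_01.txt"
    source.write_text("A: hello\nB: hello\n", encoding="utf-8")
    copies = back_up(source, [workspace / "backup_a", workspace / "backup_b"])
    backup_names = [copy.parent.name for copy in copies]
    all_match = all(sha256_of(copy) == sha256_of(source) for copy in copies)

print(backup_names, all_match)
```

Comparing checksums guards against a copy that was silently truncated or corrupted, which would otherwise only be discovered when the original is needed for inspection.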

4. Data processing

Data processing can be a long, painstaking process. It may involve database skills and sophisticated tagging systems. Surely we cannot expect our B.A. students to have reached this level of competence. However, we must help them develop some sense of data processing. For instance, once the questionnaires have been returned by the informants, it becomes obvious that they have to be properly processed before any information can be retrieved from them. Nowadays there are several popular database software packages commercially available. For instance, Access, which is part of Microsoft Office, is easy to learn and use.

5. Data interpretation and utilization

Data interpretation refers to the way data are being understood by researchers. For instance, here is a student's learning journal.

Top Girls 8/8

It is about two months since the term began. In this new term we have a new lesson, that is Active English. I like this course very much not only it's all new for me, but also it really makes me more active than before.

As a group leader of Top Girls. I must work hard and to be a good leader. Sometimes I must do something first, for instance. When we were having a discussion. I should speak first. When we were having performance. I should play first. When we were having a vocabulary game, I should say first and so on. It also improve my ability of organization, and my courage. In another way of saying, it is very good for me to be a group leader. Fortunately, my group members are all listen to me, and we are united, too. I named our group Top Girls, and all of us went to be the top, so we work hard and everyone did a good job.

The books are new for me and the way of learning is new too. I like this form, but I think it is so much to preview especially the Comprehensive English. There are so much new words. I always put all the afternoon only preview one unit. It made me disappointed that I think my English is poor. The other two books one not difficult. The articles are close to our life. It is very good. I will try my best to learn English well. I am full of confidence, because I believe that "Where there is a will, there is a way." Please believe me and believe our group Top Girls.


How are you going to make sense of this journal? Well, you cannot answer this question meaningfully without stating the purpose for which the data are to be used. What do you want to use the data for? Do you want to find out:

1) how good this student is with her grammar?
2) how good this student is with her writing?
3) how your students respond to the new textbook?
4) how your students respond to your new teaching method?
5) how students actually feel about their studies?
6) what problems your students experience at the beginning of the term?

For the first purpose, the journal is quite informative and revealing. You can easily pinpoint the grammatical errors and even calculate the number of errors per hundred words.
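The error rate is simply the number of errors divided by the number of words, multiplied by one hundred. A minimal Python sketch follows. The sample sentence and its error count are made up for illustration, and identifying the errors still has to be done by the reader, not by the program.

```python
def errors_per_hundred_words(text, error_count):
    """Return the number of errors per hundred words of running text."""
    words = text.split()  # words are counted as whitespace-separated tokens
    if not words:
        raise ValueError("The text contains no words")
    return 100 * error_count / len(words)


# Hypothetical 20-word sample with three errors marked by the reader:
# "go" (tense), "book" (number), "wants" (agreement).
sample = (
    "Yesterday I go to the library and borrowed two book about English "
    "grammar because I wants to improve my writing"
)
rate = errors_per_hundred_words(sample, error_count=3)
print(rate)  # 15.0
```

Expressing errors per hundred words makes journals of different lengths comparable, which a raw error count does not.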

For the second purpose, the journal is again quite informative and revealing. But you will not make a hasty judgment about her writing ability on the basis of only one journal of one kind.

For the third purpose, the journal presents only one possible response. To what extent it is representative we do not know, if we are only given one journal to evaluate. So as it is, the data are far from being adequate for the purpose.

For the fourth purpose, as for the third, the journal by itself is far from adequate. More data are required.

For the fifth purpose, the journal is quite revealing, because people's feelings are changing, and one report is informative enough for that particular moment.

For the sixth and last purpose, the journal only provides one case, and you need more case reports before you can say anything positive.

What I hope to demonstrate with the student's journal is that data interpretation and utilization are not a straightforward business. It may be true that once the data are collected, the work is half done. But we have to be very careful about what we do with the data. We must be on guard against the danger of claiming more than the data can actually support.

Questions for you to reflect upon

1. In what ways do the types of data affect the quality of research?
2. What advantages and weaknesses do audio-taped data have compared with interview minutes and questionnaires?
3. Why do you think that the author first talks about the authenticity principle of data collection?
4. Can one type of data be used for various purposes? Does the force of the data remain the same?