1. Introductory remarks
As I remarked in Unit 2, data-driven research is our preferred
mode of research for our B.A. students. Data collection and
interpretation therefore play a crucial role in their projects;
hence this lecture.
According to Webster's Unabridged Dictionary, the
word data is the plural of datum, which was originally
a Latin noun meaning "something given." Today, it
is used in English both as a plural noun meaning "facts
or pieces of information" (e.g. These data are described
more fully elsewhere) and as a singular mass noun meaning
"information" (e.g. Not much data is available on
flood control in Brazil). It is almost always treated as
a plural in scientific and academic writing. In other types
of writing it is either singular or plural. The singular datum
meaning "a piece of information" is now rare in
all types of writing. In surveying and civil engineering,
where datum has specialized senses, the plural form is datums.
For our purposes, there is an extra dimension to the definition
of data, namely facts or pieces of information employed to
support, demonstrate, or instantiate a point, an argument,
an abstract category or concept, etc. Below is an extract
from a research paper. Read it and comment on the use of data.
(Note that it does not matter if you do not understand what
is being dealt with, as long as you can tell the data from
the analytic text.)
|Time, space, workplace facilities, functions and goals
of the workplace, routinized workplace practice, are the
main non-personal constraining/enabling factors on the
staff members' performance of workplace discourse. There
are, on the other hand, personal constraining/enabling
factors: members vary, e.g. in their ideological orientation
towards work, towards workplace practice, and in their
ability in making use of the discourse resources, and
in their sociability and communicative ability. Thanks
to these diversified constraining/enabling factors, the
actual talks recorded from the actual workplaces are quite
mosaic, resisting the classification into hard and fast
ready-made categories of discourse (see also 6.7 below).
Consider this dialogue:
||某某回来啦 (the so-and-so is back)
||某某哇 (the so-and-so)
||某某很年轻的 (the so-and-so's very young)
||可能比你小吧 (younger than you perhaps)
||什么样子 (what does she look like)
||好象以前是他的学生 (seems to be his former student)
||好可惜呀 (what a pity) 应该要早点过来念书 (should have come to study here earlier) <start giggling>这样就有机会 (that will give an opportunity) 嘿嘿… (giggling)
I am sure that you can easily identify which part of the
text is data. The function of the data is to demonstrate the
author's point that workplace discourse is mosaic, resisting
classification into hard-and-fast, ready-made categories
of discourse. With this intuitive sense of what data are, let
us examine the following issues:
1) types of data
2) data collection and storage
3) data processing
4) data interpretation and utilization
2. Types of data
Data can be classified in a variety of ways: (1) source of origin;
(2) manner of collection; (3) manipulation; (4) processing;
and (5) utility.
2.1 Source of origin
If the data come from memory or reflection, they are
called retrospective data. For instance, consider
the following dialogue:
||John, is it idiomatic in English to say
"make bed and lie on it"?
||Erm … not quite. Er… we say "make your
bed and lie in it".
John's reply "make your bed and lie in it" is an
instance of retrospective data (in this case it is also called …).
Data which are not based on memory or reflection are non-retrospective
data. For instance (on the spot):
||Sir, did you see the two cars crashing into each other?
||Yes, I saw it with my own eyes. The two
cars hit each other headlong.
The witness's reply "The two cars hit each other headlong"
is non-retrospective data; it can also be called eye-witness data.
2.2 Manner of collection
Data can also be classified according to how they are collected:
(a) note-making, (b) interviewing, (c) journal-keeping,
(d) questionnaires, (e) audio-taping, and (f) video-taping.
Suppose that you sit in on one of your students' classes. While
observing the class, you note down what you observe. The notes
you make are known as field notes.
Suppose that you want to find out why some students keep
making the same mistakes. You can do so by interviewing
them. The details of the interviews, for example, the minutes
you take while interviewing, become interview data.
The questionnaire as a method of data collection is familiar to
most of us. Its advantage lies in its scale: questionnaires
can be sent to a large population that would otherwise be
impossible to reach.
Suppose that you want to find out how students feel about
your newly designed activities. You can ask them to keep
learning journals (known as diaries in Chinese) for you.
You can also keep teaching journals for yourself. Both are
journal data (diary data in Chinese).
Suppose that you use an audio recorder and a video recorder to
record the activities. You then collect audio and video data.
These kinds of data differ in their degree of detail,
authenticity, reliability and naturalness. Video data surpass
all the other types in detail, authenticity and reliability,
but fall behind audio data in naturalness, because participants
may start acting when facing the camera.
Field notes, interview minutes and journals only capture
the highlights or key points, leaving out a lot of detailed
information. They are also subject to observation errors
and biases, so their reliability can be called into serious
question if they are not used with care. However, they have
an obvious advantage over audio and video data, namely that
they are easier and less expensive to collect.
2.3 Manipulation
Manipulation of data refers to the ways in which data collectors
deliberately influence how participants perform the activity.
For example, suppose a policeman is asked to collect eye-witness
data about a hit-and-run accident. It so happens that the
hit-and-run driver is the policeman's brother, and the policeman
knows this. He bribes the eye-witness before taking down his
report. The data thus collected are not only no longer
eye-witness data but also manipulated data.
In terms of manipulation, there are
(a) naturally occurring data, collected without any interference;
(b) experimental data, for which participants are placed in
special circumstances to perform;
(c) solicited data, in which participants' responses are framed
by the questions put to them.
Data collected through questionnaires and interviews are
in fact solicited data. The questions can be unwittingly
or deliberately phrased in such a way that the participants'
responses are conditioned. This potential for manipulation
makes data collected through questionnaires and interviews
less reliable. This explains why the questionnaire itself
has to be included in the appendix for future inspection.
2.4 Processing
The data you collect by using any or all of the techniques
mentioned above can be called "raw data", that is,
they have not yet been processed. Data can be processed
in various ways. Take audio- and video-recorded data, for example.
You will probably have to transcribe them, orthographically or
phonetically; otherwise they are much less accessible. Questionnaire
data will probably have to be entered into a database
(e.g. by using Access) so that you can retrieve the specific
information you want. Data that have been properly processed
become processed data.
2.5 Utility
Finally, data can be classified according to the use they
are put to. For example, teaching journals can be used as
reflective data, that is, data that teachers use to monitor
and improve their own performance. Similarly, students'
learning journals can be used as feedback data, which
teachers use to monitor the effectiveness of their teaching
methods or to find out students' problems or difficulties.
3. Data collection and storage
Data collection was already touched upon in section 2 above.
In this section I would like to outline some general principles
of data collection. They are (1) the authenticity
principle; (2) the naturalness principle; (3) the maximum
background information principle; and (4) the minimum manipulation
principle.
The authenticity principle
This principle requires that data, by whatever means they are
collected, must be authentic. In other words, if the
data are claimed to be audio-tape transcripts, they must be
genuine transcripts, with the original audio tapes readily
available for cross-checking. If the data are claimed to be
students' learning journals, they must not actually have been
written by the teacher himself or herself.
The naturalness principle
This is a principle of preference. If circumstances
allow naturally occurring data to be collected, efforts should
be made to do so. In other words, if you can afford
to collect the data in the most natural way, you should do so.
The maximum background information principle
This principle requires that you do your best to keep
as much background information about your data as possible,
e.g. information concerning:
| Who is involved? What are they?
What is going on?
When does the activity take place?
Where does the activity take place?
How does the activity take place?
Also required is background information about the
way the data are collected, e.g. by whom, with what facilities, etc.
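If your data are kept on a computer, one way to apply this principle is to store a small background record next to each recording or set of notes. The following is only a sketch in Python under stated assumptions: the class BackgroundRecord and all its field names are hypothetical, simply mirroring the questions listed above, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class BackgroundRecord:
    """Hypothetical background record kept alongside one piece of data."""
    data_file: str            # the recording, transcript or notes this record describes
    participants: list[str]   # who is involved, and what they are (role, status)
    activity: str             # what is going on
    when: str                 # when the activity takes place
    where: str                # where the activity takes place
    how: str                  # how the activity takes place
    collected_by: str         # who collected the data
    facilities: list[str] = field(default_factory=list)  # equipment used

    def missing_fields(self) -> list[str]:
        """Names of fields left empty, as a reminder to complete the record."""
        return [name for name, value in asdict(self).items() if not value]


# Invented example values, for illustration only
record = BackgroundRecord(
    data_file="class_observation_01.mp3",
    participants=["teacher", "students"],
    activity="group discussion in an English class",
    when="second week of term",
    where="",
    how="small groups of four",
    collected_by="student researcher",
    facilities=["audio recorder"],
)
print(record.missing_fields())  # ['where']
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```

Saving the JSON output as a text file next to the recording keeps the background information and the data together, so both can be produced when the data are inspected.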
The minimum manipulation principle
This principle is related to the naturalness principle. If
you cannot follow the naturalness principle, you should at least
observe the minimum manipulation principle. In other words, you
must do your best to keep manipulation of the data to a minimum.
So much for the principles of data collection. Now let us
turn to data storage. It is absolutely unacceptable to
claim that the original data have been lost or were not kept when
they are required for inspection, unless they were destroyed
by forces beyond human control. Any unjustified claim that the
original data have been lost automatically results in the failure
of the dissertation.
Data can be stored in various media, e.g. on paper or on audio
and video cassettes. Ideally, they should be stored on electronic
media with sufficient backup copies.
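As an illustration of keeping sufficient backup copies, the Python sketch below copies an original data file into several backup folders and checks each copy against the original with a SHA-256 checksum, so that a corrupted copy is noticed at once. The function names and folder layout are hypothetical; only standard-library modules (hashlib, shutil, pathlib) are used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, read in chunks so large recordings fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(original: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy `original` into each backup folder and verify every copy."""
    expected = sha256_of(original)
    copies = []
    for folder in backup_dirs:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps, which helps document when data were recorded
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256_of(copy) != expected:
            raise OSError(f"backup of {original} in {folder} does not match the original")
        copies.append(copy)
    return copies
```

For example, back_up(Path("interview_01.mp3"), [Path("D:/backup"), Path("E:/backup")]) would place verified copies on two drives (the file and folder names are hypothetical). Keeping the copies on different physical media guards against the loss of a single disk.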
4. Data processing
Data processing can be a long, painstaking task. It may
involve database skills and sophisticated tagging systems.
Surely we cannot expect our B.A. students to have reached this
level of competence. However, we must help them develop some
sense of data processing. For instance, once the questionnaires
have been returned by the informants, they obviously have
to be properly processed before information can be retrieved
from them. Nowadays several popular database programs are
commercially available. For instance, Access, which is part of
Microsoft Office, is easy to learn and use.
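Access is the tool mentioned above. As a freely available stand-in, the sketch below uses Python's built-in sqlite3 module instead (a substitution for illustration, not the procedure this lecture prescribes). It assumes a hypothetical questionnaire with one multiple-choice question ("How do you feel about the new activities?"); the responses and the table layout are invented. It shows how returned answers, once entered into a table, can be queried to retrieve specific information.

```python
import sqlite3

# Invented questionnaire returns: (respondent id, class, answer)
responses = [
    (1, "A", "like"),
    (2, "A", "dislike"),
    (3, "B", "like"),
    (4, "B", "like"),
    (5, "B", "not sure"),
]

conn = sqlite3.connect(":memory:")  # in-memory database; pass a file name to keep it
conn.execute(
    "CREATE TABLE responses (id INTEGER PRIMARY KEY, class_name TEXT, answer TEXT)"
)
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", responses)
conn.commit()

# Retrieve specific information: how many respondents gave each answer, per class
rows = conn.execute(
    "SELECT class_name, answer, COUNT(*) FROM responses "
    "GROUP BY class_name, answer ORDER BY class_name, answer"
).fetchall()
for class_name, answer, count in rows:
    print(class_name, answer, count)
conn.close()
```

With these invented responses the query prints one line per class and answer (A dislike 1, A like 1, B like 2, B not sure 1). The same table could answer other questions simply by changing the SELECT statement.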
5. Data interpretation and utilization
Data interpretation refers to the way researchers understand
data. For instance, here is a student's learning journal:
| Top Girls 8/8
It is about two months since the term began. In this new
term we have a new lesson, that is Active English. I like
this course very much not only it's all new for me, but
also it really makes me more active than before.
As a group leader of Top Girls. I must work hard and to
be a good leader. Sometimes I must do something first,
for instance. When we were having a discussion. I should
speak first. When we were having performance. I should
play first. When we were having a vocabulary game, I should
say first and so on. It also improve my ability of organization,
and my courage. In another way of saying, it is very good
for me to be a group leader. Fortunately, my group members
are all listen to me, and we are united, too. I named
our group Top Girls, and all of us went to be the top,
so we work hard and everyone did a good job.
The books are new for me and the way of learning is new
too. I like this form, but I think it is so much to preview
especially the Comprehensive English. There are so much
new words. I always put all the afternoon only preview
one unit. It made me disappointed that I think my English
is poor. The other two books one not difficult. The articles
are close to our life. It is very good. I will try my
best to learn English well. I am full of confidence, because
I believe that "Where there is a will, there is a
way." Please believe me and believe our group Top Girls.
How are you going to make sense of this journal? Well,
you cannot answer this question meaningfully without considering
the purpose for which you are using these data. What do you want
to use these data for? Do you want to find out:
1) how good this student's grammar is?
2) how good this student's writing is?
3) how your students respond to the new textbook?
4) how your students respond to your new teaching method?
5) how students actually feel about their studies?
6) what problems your students experience at the beginning
of the term?
For the first purpose, the journal is quite informative and
revealing. You can easily pinpoint the grammatical errors
and even calculate the number of errors per hundred words.
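The error rate mentioned above is simple arithmetic: errors per hundred words = number of errors × 100 / number of words. Normalising to 100 words makes texts of different lengths comparable. The Python sketch below is one way to compute it; the function names are hypothetical, the word count is only a rough whitespace-based count (punctuation stays attached to words), and the figures in the example are invented rather than taken from the journal above.

```python
def count_words(text: str) -> int:
    """Rough word count: whitespace-separated tokens."""
    return len(text.split())


def errors_per_hundred_words(error_count: int, word_count: int) -> float:
    """Error rate normalised to 100 words of text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return error_count * 100 / word_count


# Invented figures: a 250-word journal entry in which 15 errors were marked
print(errors_per_hundred_words(15, 250))  # 6.0
```

Deciding what counts as an error remains the researcher's judgment; the calculation only makes that judgment comparable across texts.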
For the second purpose, the journal is again quite informative
and revealing. But you should not make a hasty judgment about
her writing ability on the basis of a single journal entry.
For the third purpose, the journal presents only one possible
response. If we are given only one journal to evaluate, we
cannot know how representative it is. As it stands, then,
the data are far from adequate for this purpose.
For the fourth purpose, as for the third, the journal
by itself is far from adequate. More data are required.
For the fifth purpose, the journal is quite revealing, because
feelings change over time, and a single report is informative
enough about that particular moment.
For the sixth and last purpose, the journal provides only
one case, and you need more case reports before you can generalize.
What I hope to demonstrate with the student's journal data
is that data interpretation and utilization are not a straightforward
business. It is sometimes said that once the data are collected,
the work is half done. This may be true. But we have to be very
careful with what we do with the data. We must be on guard against
the danger of claiming more than the data can actually support.
Questions for you to reflect upon
1. In what ways do the types of data affect the qualities
2. What advantages and weaknesses do audio-taped data have
compared with interview minutes and questionnaires?
3. Why do you think the author discusses the authenticity
principle of data collection first?
4. Can one type of data be used for various purposes? Does
the force of the data remain the same?