Wednesday, January 23, 2019
In most online learning, instructors face challenges in achieving levels of effectiveness and retention similar to their on-campus offerings. With so many students to account for and the inability to meet in person, it's important to find ways to supplement the interaction between teacher and student. As an instructor for a course in Georgia Tech's Online Master of Science in Computer Science program, School of Interactive Computing Professor Ashok Goel introduced the world to Jill Watson, a virtual teaching assistant who was so good in her first semester on the job that even students thought she was human. Can AIs like Jill really improve course effectiveness and satisfaction? Will they be used to augment the work of human assistants rather than replace them? And can this method, which has proven successful in an academic setting, serve as a foundation upon which other sectors of the workforce can build?
Transcript:
Ayanna Howard: As an instructor for a course in Georgia Tech's Online Master of Science in Computer Science program, Professor Ashok Goel in the School of Interactive Computing came across a problem. In online learning, instructors face challenges in achieving similar levels of effectiveness to their on-campus offerings and as a result see lower levels of student engagement. With so many students to account for and the inability to meet in person on a regular basis, it was important for Ashok to find a way to enhance the interaction between a teacher and their students. Enter Jill Watson, a virtual teaching assistant who was so good in her first semester on the job in 2016 that even the students thought she was human. Today, we'll chat with Dr. Goel and learn about Jill and whether she offers a glimpse into the future of higher learning.
Can AIs like a virtual teaching assistant really improve the effectiveness of an instructor and the satisfaction and engagement of students? Will these virtual agents actually be used to augment us as human assistants and not replace us? And can this method, which has proven successful in an academic setting such as Georgia Tech, be used as a foundation upon which other sectors of the workforce can build?
(Instrumental)
I'm Ayanna Howard, Chair of Georgia Tech's School of Interactive Computing, and this is the Interaction Hour.
(Instrument ends)
Ashok has served as a professor at Georgia Tech since 1989, where he pursues research in human-centered computing, artificial intelligence and cognitive science. He's the editor in chief of AAAI's AI Magazine, co-chair of the 41st annual meeting of the Cognitive Science Society, which will be held later this year, and serves on a number of boards and steering committees dedicated to building the future of computing research and leadership. Thanks for joining us Ashok.
Ashok Goel: Thanks for having me, Ayanna.
Ayanna: So, let's start first on some background on this Jill Watson. I remember in the media, it was quite a shock and it was interesting. Why did you do this? What triggered this idea of having this virtual teaching assistant, and how were you able to achieve such success?
Ashok: It's an interesting story, Ayanna. I was teaching this class with about 400-500 online students, and they were raising something like 10,000 or more messages every semester. That's about 100 emails each day for 100 days, and you know what that means. Hours and hours of work. And we did not have the teaching staff to be able to do all of that. And if we were to do that, we would just be answering questions and not doing anything more creative, anything more interesting. So, that's why we thought it might be useful to build an AI assistant that would automatically answer some fraction of the questions. Now, about 80 percent of the questions that students ask have to do with projects and exams, or assignments and assessments, and Jill Watson can answer about one-third of those questions automatically, which offloads a lot of teaching responsibility to Jill.
Ayanna: This is great. I mean, first of all, 10,000 messages -- just typing "N-O" 10,000 times seems like an enormous amount of work. So, given that, it manages the workload, and you can respond intelligently to the questions the students really need help on. But what do you consider the biggest achievement of this?
Ashok: So, the biggest question we have now -- one we did not begin with, frankly; initially, we were just trying to reduce the teaching load -- is whether it helps improve student performance and engagement. You were talking about engagement earlier. And what we are finding out is -- and right now this is at the level of correlations, not causation, so I cannot claim that Jill Watson alone is causing all of this -- but with the entire set of technologies we have introduced, there is a correlation: student engagement in the class is very high. Students' self-regulated learning is very high. We have empirically measured these things. Student performance in the online class is comparable on all assessments to the residential class. So, there seems to be some benefit, and that's really the interesting part of Jill Watson.
The other part I think is very interesting is that there was a human-interest element to it, which is, I think, why the story got attention at the beginning. The fact you mentioned -- that the students could not tell this was an AI at the beginning -- was interesting in its own right and a lot of fun.
Ayanna: So, I wonder, does that say something about us as humans?
Ashok: Yeah. One thing it says, I think, is that it is relatively easy to build AI agents that give the sense that they are human-like without doing a lot of work to build them. Usually, we imagine you need to do a lot of work to build a powerful AI agent with human-level behavior. It turned out to be relatively simple to do this. From a technological perspective, the system took only about 1,500 person-hours. It's not something that required tens of thousands of person-hours or a large team. We were able to do it relatively quickly. But what was interesting was the way we introduced it and the student reaction to it, which then led to a new stream of research.
Ayanna: So, is that primarily what you think is different than, say, what's out there now or even back then?
Ashok: There is one other thing that's different. What is out there right now, like chatbots for example, can answer a very broad set of questions. But what they know about is often inaccurate. Jill Watson is almost always accurate. In fact, she has not made a mistake over the past year. Not a single mistake. So, that's what is different. On the other hand, her knowledge is very specialized. She can answer questions only about this particular class at the current time. So, very specialized knowledge but very high accuracy and precision.
Ayanna: Ok, so you had said not a single mistake. So, there have been no failures?
Ashok: Oh, no, there have been big failures and big challenges.
Ayanna: So, what was the biggest one? And I won't tell anyone.
Ashok: (Laughing) Oh, no, you're welcome to share. One of the things we found -- and we were appalled by it when we first saw it -- was that Jill exhibited a bit of gender bias, and the gender bias came about like this. I should mention that Jill can now not only answer questions, she can also respond to student introductions. We ask students to introduce themselves, and Jill automatically responds to those introductions. She also automatically posts announcements. So, one male student said, 'You know, my wife is expecting a baby, and I may need to take a couple of weeks off during the semester, and my performance may diminish.' And Jill's answer was something like, 'Welcome to the class, we think you're going to enjoy it, and we look forward to the new addition to your family.' The next semester, a female student said, 'I'm pregnant and expecting a baby, and I might miss 4-6 weeks of classes.' And Jill said, 'Welcome to the class,' and didn't say anything about welcoming the addition to her family. So, we talked about why Jill didn't say that to the female student. The reason is that the demographic distribution in the class is historically skewed. There are about 85 percent men and only about 15 percent women, so she had never come across that kind of question about pregnancy before and didn't know how to respond to it.
Ayanna: Interesting.
Ashok: But there had been earlier incidents of some other male student saying his wife was expecting and some other human TA answering in a particular way, and she was able to replicate that answer. So, then, of course, if there is gender-related bias there is the possibility of some other kinds of biases because the demographic distribution is skewed in many different dimensions. So, that was an example where there was clearly something that failed.
Now, of course, we have cleaned it up, so that kind of problem doesn't occur. But it's something to keep in mind as we build AI agents. Even if we want to build them to be completely unbiased, fair and ethical -- as I'm sure AI researchers always will -- those algorithms, based on the data they are processing, may nevertheless exhibit behavior that we find, from a social perspective, undesirable or unfair.
Ayanna: Because they're learning from skewed data.
Ashok: Yes.
Ayanna: This is a challenge, I see, then.
Ashok: It is a real challenge. It is also a challenge in a different sense. So, we have been trying to take Jill Watson to other classes. In fact, we have introduced a new version of Jill Watson, called Noelle King, into Introduction to Programming, a class which has about 60,000 students. Noelle King is just doing student introduction responses right now. She has just begun to answer questions. We don't have a lot of data, but she has been answering student introductions for a while now. The challenge is that each class turns out to be a little different. The kinds of questions students ask in Introduction to Programming are not the same kinds of questions students ask in a graduate-level Introduction to AI class. So, you know, we have actually been very fortunate. We have been inundated -- I think since Jill Watson came into the media, I have received at least 200-250 requests for building a Jill Watson for this class or that class across the world. The reason we have not yet delivered on them is that we really need a taxonomy of questions. That taxonomy varies so much from middle school to high school, from biology to history to algebra to whatever else the topic might be.
Ayanna: So this means -- so, one of the questions might be, well, Jill Watson was such a success, why not apply it to every single class anywhere? And you're basically saying it's the data that's available.
Ashok: It's the data that's available. Another problem is that if you go to middle school -- and we have been talking to middle school teachers -- they do not have a record of all the questions somebody asked and all the answers that were given. So, the data is missing.
So, now we are trying to build a new version of Jill Watson that works on data that we think is available for most any class, including yours, I expect. I'm sure that in your class you give your students a syllabus. Almost every instructor of every class in the world prepares some kind of initial description or syllabus and gives it to the students. And so now a new version of Jill Watson is answering questions based on your syllabus.
Ayanna: Interesting. 'When is the midterm exam?'
Ashok: Right, when is the midterm exam. Of course, we know students don't always read the syllabus, so I think of this as an interactive syllabus, where some AI agent has enough understanding of the syllabus that it can answer questions automatically. We think that is now probably going to be transferable to other classes more easily, because the issue of data is addressed at least to some degree.
Ayanna: So, how much data do you think you'd need? Say I were to teach a new class and I had it for a semester. How much interaction do you think I would need in terms of questions and answers?
Ashok: Yeah, what we have found is that in the AI class where we initially built Jill Watson, as I mentioned, there were about 10,000 messages. There were 120 different types of questions being asked about performance. We found that for every question type, if the number of question instances is in double digits -- 10 or more -- then Jill's performance comes to a point where it becomes acceptable. So if it's a middle school, it will depend upon how many different questions the students ask, and for each question type you need at least 10 or more instances.
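[Editor's note: the rule of thumb Dr. Goel describes -- a question type is usable once it has 10 or more recorded instances -- can be sketched in a few lines of Python. This is a hypothetical illustration, not the Jill Watson implementation; the function and type labels are invented for the example.]

```python
from collections import Counter

# Rule of thumb from the interview: a question type needs at least
# 10 recorded instances before the assistant can answer it acceptably.
MIN_INSTANCES = 10

def trainable_question_types(labeled_questions):
    """labeled_questions: list of (question_text, question_type) pairs
    taken from a past semester's Q&A log. Returns the set of question
    types that have enough data to train on."""
    counts = Counter(qtype for _, qtype in labeled_questions)
    return {qtype for qtype, n in counts.items() if n >= MIN_INSTANCES}

# Example: 12 deadline questions qualify; 3 grading questions do not.
log = [("When is HW1 due?", "deadline")] * 12 + [("Why this grade?", "grading")] * 3
print(trainable_question_types(log))  # → {'deadline'}
```

The point of the threshold is the one made above: a class with few repeated question types, or no recorded Q&A history at all (as with the middle schools mentioned next), leaves nothing to train on.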
Ayanna: And so do you think there's enough of a common theme across courses that you can at least come up with a repository of basic syllabus and things like that that we can use?
Ashok: I think so. From the perspective of the syllabus, the research we have been doing for the past year and a half on this new version of Jill Watson seems to suggest there's enough commonality. The reason is that when you write a syllabus and I write a syllabus, we -- at least I -- often write it off-the-cuff. I don't put days of work into building a syllabus. But now that we know how to build a Jill Watson that can answer questions based on a syllabus, we can give you guidelines for how to write your syllabus so that Jill can understand it.
So, it works in the other direction also. In some sense you can go from data to AI technology to behavior, but you can also go from AI technology to data and say, I need this kind of data. So, if we can take these guidelines for syllabus building to everyone, then for whoever wants to follow those guidelines, we will provide a Jill Watson that can answer questions based on that syllabus.
Ayanna: And you actually make instructors themselves better.
Ashok: (Laughing) Yes, hopefully.
Ayanna: So, speaking of making instructors better. One of the fears that comes up related to this is about automation. In our last episode of the Interaction Hour, we spoke to Mark Riedl about the future of automation and how it relates to human workers. What we discovered was that many individuals hold this fear that AI is coming for their jobs and that we as people might be obsolete in a few years, depending on the job. So, there's this fear that there will be no place for humans in the future workforce. Now, we have Jill Watson dealing with TAs, teaching assistants, and maybe even possibly faculty. Is there a potential that Jill Watson might make us as teachers obsolete?
Ashok: That's a great question. A journalist once asked me, 'Professor, are you committing professional suicide?' That is the same question you're asking -- you asked it in a much nicer manner; the journalist was more crude. I've thought a little bit about it. There's no doubt that when there is a new technology, some jobs are lost and some jobs are created. It's inevitable that as AI and automation come along, some jobs will be lost even as others are created. But I'm not as concerned about it as I think some other people are, and here's why.
Consider Jill Watson -- the difficulty is not that there are too many teachers and Jill Watson will take somebody's job away. The difficulty is that we don't have enough teachers and the teachers that we do have, they're often doing such mundane things that they don't get a chance to do the kinds of things that they really want to do. Now, there might be some areas in which there is an overabundance of humans available. Teaching and education is not one of them. Right now, across the world there are a large number of people who do not have access to any education at all, or very poor education. So, I think it's not so much a question about Jill Watson and things like this taking jobs away as much as a question of making human jobs better and more effective.
Ayanna: So, in this case AI basically makes education accessible to a larger audience.
Ashok: I hope so, yes. Accessible and affordable and achievable. There is another aspect of it. Accessibility, affordability, achievability, in the sense that even when there are humans who have access to affordable education, they don't necessarily do very well at it, because doing well -- achieving good proficiency -- requires some help, and they don't have access to that help.
Ayanna: So, we have three As: affordability, achievability and accessibility. I like that -- the three As of Jill Watson. So, what's next? You mentioned looking at other courses and the taxonomy of information. What is the future of Jill Watson and other TA-type technologies?
Ashok: In our lab, we are now building a suite of five technologies, all connected to AI in education, and we have entered this suite into the XPrize AI competition. Very briefly, if you can have AI agents that can answer questions, then you can also have AI agents that ask questions. So, now we have an AI agent we call Errol, which asks questions. If, for example, the assignment is to build a model of something and you build that model, then Errol will help the teacher do some formative assessment by asking questions of the students. Another one we have is a virtual librarian. Imagine an AI agent that, on your request, finds the five best books or articles you should read on a topic of your choice. This is different from Google search in the sense that Google will give you 5 million hits and then you have to figure out which one you really want. This will give you five, or some small number of, really selective, really relevant hits. The fourth is a virtual lab. In physics, biology, chemistry and other sciences, you do experiments, but it's hard to get access to labs. So, we have built a virtual lab in which you can generate hypotheses, run experiments in simulated worlds and then revise the hypotheses. And the fifth one is a good old-fashioned intelligent tutoring system.
Each of these five AI technologies is now operational -- we are just in a beta release of the virtual lab -- and put together, I think that kind of suite of technologies is where the field is gradually moving.
Ayanna: And each one will have a personal agent, I guess, helping?
Ashok: It will have a personal agent.
Ayanna: So, here I'm very curious. You mentioned three agents by name. Noelle and Jill, which are both -- you mentioned 'she,' so I'm assuming female.
Ashok: Yeah.
Ayanna: And you mentioned Errol, which I assume is a 'he.'
Ashok: Yeah, it's a male name.
Ayanna: So, was there any preference -- you know, the sample right now is three -- is there a preference for 'she' versus 'he' in virtual assistants?
Ashok: Yeah, that's a great question. We have done experiments with a very large variety of names -- both male and female names, as well as names that sound Anglo versus names that sound ethnic, in various combinations. When we introduce Jill Watson, we don't call her Jill Watson; we give her some other name. In fact, the entire teaching staff includes some automated AI assistants, and at the end of the class we do a poll to find out whether students can figure out which ones are AIs and which are human beings. And I can tell you that they can't always figure it out, even now.
So, we have done experiments with a variety of names, and we were also concerned about the kind of question you're asking. We have not yet found any significant difference. That might be because our sample size is too small or our instruments are too blunt. I don't know that yet. But we have not yet found any difference there.
Ayanna: So, with these five new technologies coming out and focusing on education, where does the human fit in? I know you talked about the virtual librarian and the virtual lab, and teaching -- I mean, that's an occupation that will probably not be eradicated by these agents.
Ashok: Not at all.
Ayanna: But does our role then change as educators or teachers?
Ashok: I think so. It changes in very fundamental ways. As I see it, the kinds of things that you and I will be doing five or 10 years from now -- we will probably become much more creative. The demand will be for very creative kinds of teaching, which is what you and I will do, and many of the mundane, routine things on which you and I spend so much of our time -- in fact, waste our time -- will probably be handed off to AI agents. You know, there is a book called The Diamond Age by Neal Stephenson in which the protagonist is, I think, a 9- or 10-year-old girl called Nell, who is born on the wrong side of the tracks. But she has access to this Young Lady's Illustrated Primer, which is really an AI storytelling book, and she learns from that book on her own. She was born on the wrong side of the tracks, but she's smart and sharp and so on. So, by the age of 15, she emerges as a sophisticated young lady who joins the elite of society.
So, I think we are heading toward that era, where the role that you and I play as teachers will become much more like creating that young lady's illustrated primer, rather than doing all the mundane things -- grading, for example -- that you and I spend so much time on.
Ayanna: Oh, grading. I remember that.
Ashok: (Laughing)
Ayanna: So, where else -- Jill Watson is a teaching assistant, and I'm sure there are other domains besides education where this type of technology could be useful. Any thoughts on that?
Ashok: Yes, we have been talking with CHOA, the Children's Healthcare of Atlanta hospital. They are very interested. The kinds of situations we are thinking about here -- not just with CHOA but in healthcare in general -- are where a parent with two young children wants to talk to a nurse hotline at 2 a.m. There's a shortage of nurses in the country. Where do you get assistance? If there's an automated assistant who can give you reliable information -- and that's the key, because when you are calling a nurse hotline, you do not want unreliable information -- that's where Jill has promise. As I mentioned, Jill is highly selective in the kinds of questions she can answer, but when she does answer questions, her accuracy rate is very, very high. ... We're also talking to some people in the finance industry. They are very interested in a similar technology for answering clients' questions. Some banks are very interested, including the Federal Reserve Bank.
Ayanna: So, what happens when Jill doesn't know?
Ashok: She doesn't answer. Here's another interesting part of Jill Watson: she calculates her own confidence in her own answers. She answers only if her confidence is at least 97 percent. That's why her accuracy rate is so high. Otherwise, she keeps quiet and some human TA has to answer it.
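[Editor's note: the confidence gating Dr. Goel describes -- answer only above a 97 percent confidence estimate, otherwise defer to a human TA -- can be sketched as follows. This is a hypothetical illustration with invented names, not the actual Jill Watson code.]

```python
# Threshold from the interview: answer only at 97% confidence or higher.
CONFIDENCE_THRESHOLD = 0.97

def respond(question, ranked_answers):
    """ranked_answers: list of (answer_text, confidence) pairs, e.g.
    produced by some retrieval or classification model. Returns the
    best answer, or None to signal that a human TA should step in."""
    if not ranked_answers:
        return None  # nothing matched; stay quiet
    best_answer, confidence = max(ranked_answers, key=lambda a: a[1])
    return best_answer if confidence >= CONFIDENCE_THRESHOLD else None

# A confident match is answered; an uncertain one is left to a human.
print(respond("When is the midterm?", [("It is in week 8.", 0.98)]))  # → It is in week 8.
print(respond("Can I get an extension?", [("Maybe.", 0.60)]))         # → None
```

The design choice is a precision/coverage trade-off: setting the bar this high is why the accuracy rate stays near perfect even though Jill answers only a fraction of questions.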
Ayanna: Can we teach that to humans?
Ashok: (Laughing) Ninety-seven percent, I love that.
Ayanna: I would. So, finance, banking, healthcare and education -- these are the areas and domains where we can potentially bring AI assistants and this virtual knowledge-based information, which I think is wonderful. I want to ask you one last question, then: Where does the responsibility fall? That is, are there areas we need to be really concerned about? You mentioned one, which is bias. Are there other areas we need to be concerned about as we push this technology forward?
Ashok: There are a number of areas we need to be very concerned about, in addition to the question of bias and unfairness. Another is that as we introduce Jill Watson-like agents into micro-societies like a class, does the interaction of humans with each other change in fundamental ways? If there were an AI sitting next to you and me right now, would our interaction change? We really don't have an answer to that question. I don't know that we've ever seriously asked it. I would expect it changes human interaction in some subtle ways. But one of the nice things about the Jill Watson project is that it allows us to do those kinds of experiments. Now we can make observations, because we can look at the discourse that happened before we introduced Jill Watson and after we introduced her, and see if there are any contrasting factors.
I would expect that there are. I don't know what they are, but I think it's concerning. I want to know exactly what the effect of AI will be on human society before we introduce it into human society, and we don't know that yet.
Ayanna: This is great. So, the concluding thought is that it's a great technology, but we really need to do better at understanding its effect on human-to-human interaction. Even with that, there are positive benefits, I see, especially in education, which you've shown, but also in these other domains of healthcare and finance.
(Instrumental)
I really appreciate this conversation, and I thank Ashok for joining the podcast today to discuss, among other things, Jill Watson, AI and virtual agents, and even a little bit about bias with respect to data. I appreciate the direction you're taking in this domain and the expertise you've shared with our audience.
As always, be sure to check out ic.gatech.edu for updates and feature content on our school, and follow us @ICatGT. Thanks for listening.
(Instrumental fades out)