Anne: Welcome to the Voices in Bioethics Podcast. Today I’m interviewing Michael Scroggins, a lecturer at UCLA. He’s at the Institute for Society and Genetics, and he lectures on data science, pandemics, and, more generally, disruptive innovation. Welcome, Michael.
Michael: It’s nice to be here. Thank you for having me.
Anne: So I was going to start by talking about knowledge today. Nicholas Carr describes the change: “Now knowledge is conceived as something we swim through and consume, but it used to be an ideal.” Many see knowledge as information that is contextualized. But in what ways do you see knowledge changing? And will chatGPT affect knowledge and what it means to know?
Michael: Yeah, that’s a really complicated question, right? I mean, knowledge is this philosophical concept with a long history from the very, very start of philosophy—which is the seeking of truth as an ideal. I think one thing to keep in mind is that there’s a difference between knowledge—in my estimation, knowledge is something humans arrive at. Now whether a machine can arrive at knowledge is an interesting question, but I’ll leave that for a proper philosopher and I won’t wade in there. But I think one thing we need to understand about chatGPT is that it’s really a calculating machine that makes statistical predictions from large data sets, and it traffics in information and data. Data is an interesting term that comes about in the late 19th and early 20th century, and it really means making numerical data fit together, putting it in order so it can be calculated. Information is a term that comes into being with broadcast media, and it means something like transmission across a channel, right? So these are very technical, machine-centric terms, as opposed to knowledge. I think Carr is right: it is something we swim through, but it used to be an ideal. But to push back maybe a little bit on the premise of this question, I do think it’s worth holding knowledge apart, still, as an ideal, or something that humans achieve with other humans. They may use machines as tools in this process, but it is very much a human achievement. One thing I think we really need to clarify when we’re talking about these, um, what are really calculating information systems, is that data is one term—now it’s kind of a catch-all garbage truck of a term—but it originally meant to put numerical data in order for calculation. There is a term that has fallen into disuse that I think actually gives a better sense of what these systems do, and that is capta, which means that which is captured or taken for calculation. And when we’re in the world of, sort of, data-emitting devices, as we all are—this Zoom call, our phones, you know, the whole, everybody knows the rigmarole of digital devices we carry around—we are really in that world of what is taken from us and captured from us, rather than what is arranged for calculation, in that sort of passive, objective sense.
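To make the "calculating machine" point concrete, here is a deliberately toy Python sketch of what statistical prediction from a corpus means. It is not how chatGPT is actually built (modern models use neural networks at enormous scale), and the corpus here is invented, but it shows a machine producing plausible next words purely by counting, with no knowledge involved.

```python
# A drastically simplified, hypothetical sketch of "statistical prediction from
# a data set": a bigram model that predicts the next word purely by counting.
# Real large language models are far more sophisticated, but the basic move is
# the same: calculate likely continuations from data, not from understanding.

from collections import Counter, defaultdict

corpus = "the bridge stands up the bridge holds the load the bridge stands firm".split()

# Count which word follows which in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict(word):
    """Return the statistically most likely next word, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict("bridge"))  # 'stands' -- the most frequent continuation
print(predict("the"))     # 'bridge'
```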
Anne: Then along those lines, if we don’t see the computer itself as having knowledge, if knowledge is particularly human and technology really just organizes data and kind of delivers this information, is the view that technology is smart inaccurate? And is there any issue with people referring to it as smart technology, smartphones? Is there something harmful behind that?
Michael: I mean, on one hand, I understand it as a sort of practical expression, right? Technology can be smart when compared to other technology. But I think we should really guard against imputing sort of human ideals and emotions to technology. It’s not smart like a clever dog might be smart, or like a human being can be smart, or a crow might be smart, right? It’s not the same thing. It’s still a calculating machine. I really think we should try to push digital technology back into the box a little bit by not falling victim to the hype. I mean, if there’s anything I would say about chatGPT and most digital technology—you know, the promises that are being made and the hype around AI today—it is, to quote the great Chuck D, you know, “Don’t believe the hype.” The claims are overstated, right. And I think this becomes really obvious when—let me back up here. The most intense hype around AI right now is about economic productivity and efficiency: how it’s going to increase efficiency in healthcare, how it’s going to increase efficiency in manufacturing, and all of these other places where the efficiencies will touch. But actually, if you take a step back and look at the broad scope of economic productivity since the late 70s, what you see in the global north is a steady decline in the rate at which productivity increases. It has been steadily declining now for 50 years. You know, the computer revolution came and went, and it didn’t really change that. The Internet came and went, and it didn’t really change that. And I don’t really think there’s much reason to think that AI is going to reverse that trend either. So I think, really, when you’re in the world of these new technologies, you really have to guard against believing the hype too much, because it does have the effect of bringing out our most utopian and most dystopian thoughts about technology, when, as a practical matter, probably what will happen is you’ll run into AI every time you engage in some customer service, or maybe a minor medical diagnosis or something like that. It will definitely be used, but it’s not going to revolutionize efficiency across the range of industries like the claims say.
Anne: Do you think, in a way particular to science and medicine, it might change research significantly, or even change researchers, because there is so much information at their fingertips?
Michael: Yes, actually, I do, in some ways that will be important. I think one of those ways—I was a postdoc at UCLA and I worked with astronomers for several years, and when I started, machine learning and AI in astronomy weren’t really very important. But then in the two or three years I was there, AI and machine learning became very important in a handful of new sort of subfields. One of them was the discovery of exoplanets and so forth, mainly because the new class of telescopes is producing so much data that it’s not really human-sized data—it’s like a firehose, you really can’t wrap your head around all the data that’s coming through. Something about big data: when we talk about data and big data, what we’re really talking about is volume—that’s the size, the big—but also velocity, right. So volume and velocity. Veracity is not usually one of the things we’re worried about so much, but volume and velocity are the two key components here. And so machine learning has become important in something like astronomy to look for exoplanets, but also, you know, in health and medicine. I think one of the earliest places you’ll feel it is drug discovery. It will probably change drug discovery a lot. And will it change how research is done? In some ways, yes. I mean, it’s already something that’s been felt in the sciences for two or three decades now, ever since large databases came into existence, which is that you do need specialized coding and statistical expertise in any science that relies on large amounts of data. So I think there’ll be more call for data scientists and coding skills, and other techniques will be developed to kind of trawl through these big troves of data.
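A minimal sketch of why a data firehose gets screened by an algorithm rather than a person: the stream below is simulated, the names and threshold are invented, and real exoplanet pipelines are far more sophisticated than this, but it illustrates the division of labor Michael describes, where the machine flags candidate events and only those go to humans.

```python
# Hypothetical illustration: screening a simulated stream of telescope
# brightness readings for transit-like dips. The data are synthetic and the
# detector is deliberately crude; the point is that candidates are flagged
# automatically in a stream far too large to inspect by eye.

import random

random.seed(0)

def brightness_stream(n=10_000, dip_every=2_000):
    """Yield noisy brightness readings with an occasional transit-like dip."""
    for i in range(n):
        dip = 0.05 if i % dip_every == dip_every // 2 else 0.0
        yield i, 1.0 - dip + random.gauss(0, 0.005)

def flag_dips(stream, threshold=0.97):
    """Flag readings far enough below baseline to look like a transit."""
    return [(i, value) for i, value in stream if value < threshold]

candidates = flag_dips(brightness_stream())
print(f"{len(candidates)} candidate events flagged for human follow-up")
for i, value in candidates[:5]:
    print(f"  reading {i}: brightness {value:.3f}")
```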
Anne: And do you think, in that medical arena, that there’s a little less concern with cause now, because there is just this huge, huge body of data, so you can look at correlations and you might be able to predict and find that new drug discovery? Should we still be concerned with why the drug works, or should we just be doing this matching?
Michael: Well, I mean, this is an interesting conflict between sort of science as it’s traditionally been practiced and engineering as it’s traditionally been practiced, right. I mean, one of the great hallmarks of good engineering is that the bridge stands up, right—that sort of empirical evidence that the bridge stands is the hallmark of good engineering. So this is an ongoing debate: whether causation is important, or how much fundamental science is important and how it should be funded. I think, ideally, you would have it working from both directions, right? You would have both this sort of trawling through these big data sets, looking for correlations that are interesting, and then you would try to find out the fundamental reasons why they are correlated or not. But probably what’s going to happen—and the thing about science or engineering or technology in general is that there is no such thing as unlimited time; time’s arrow points one way and time is limited, and the demand for speed is an accompaniment of technological intensification, and the demand for speed often precludes the kind of slow, slow work that uncovers fundamental things—so I do think what will probably happen is you’ll see some of the more fundamental work fall by the wayside as these new techniques come to exercise their power and produce results, which I think is potentially very dangerous, right? I mean, if you think about prediction: prediction is interesting, because this was really the promise of the Human Genome Project, right—that it was going to disrupt medicine by changing from the “diagnose and treat” paradigm to a new “predict and prevent” paradigm. That has yet to happen. Will it happen because of machine learning and AI and these large language models? I don’t really think so, right? And, you know, if you look back at the history of these genome-wide association studies, most of the early ones don’t stand up. It’s been a problematic field. They’re better now. But you know, there have been a lot of mistakes and miscues there. And if those mistakes and miscues start to enter into sort of clinical judgment, I think there’s a real possibility for serious problems.
Anne: It seems like part of what you’re saying, or the process you’re describing, is kind of backwards. It’s like you’re backfilling cause. We’re observing these things that might be predictive, and then we’re looking at why after the fact. And I just wonder, if we’re observing those correlations and doing it in that direction, at what point in that process should patients be affected? I mean, do you think it is fine to just observe this large swath of correlations, and whether you’re in that sort of genomic prediction or you have a new drug discovery but you really don’t know why it works, at what point should that be released to patients?
Michael: That’s a really hard question. But let me say this: I think one of the great safeguards of medical science is that clinical practice still has a very strong professional identity, right? Clinical practitioners still have very strong professional identities, and they control a lot about clinical practice. It can also be very traditional and very slow to change, and I think this is a real safeguard against some of that. I mean, I think it’s been a dream to push these sorts of data-generated insights into clinical practice for several decades now, but it has been very slow to happen, or non-existent. I think that’s for very good reasons: medical professionals operate, you know, on a case-by-case basis, right? They operate with a patient in front of them. And so they operate by a very different logic than research scientists, in many ways, and translating this kind of research into clinical practice is definitely in the future. I don’t know if it’ll happen. But I do think that doctors are a kind of nice, traditional, kind of conservative bulwark against that kind of widespread adoption.
Anne: I think there is some evidence that doctors want to stick with sort of the status quo, at least as far as how they address patients and discuss things, and so they might be able to slow down some of these changes a little bit.
Michael: I also think, I mean, the other great change here is that when you’re looking at a statistical data set, you are talking about statistical correlations and averages and medians and means and so forth. But when you’re doing clinical medicine, you’re talking about an individual with very particular, idiosyncratic, you know, priors, who is right in front of you. So the logic is very different, especially the logic of treatment. I don’t know quite how those are going to be bridged here. This is just a conflict, an epistemological conflict, that will play out.
Anne: Yeah, it will be interesting to see. I think, from the patient’s perspective, a lot of patients really want to trust the patient-doctor relationship in the traditional way, as it has been in the past. So moving on a little bit to confidentiality and all of this data that is out there: what do large language models really mean for confidentiality and secret keeping?
Michael: Nothing good, I can tell you that. I mean, the way these things are trained is just on publicly available data sets, right? Or even not publicly available data sets. When you’re talking about confidentiality, we’re really talking about this shadowy world of third-party data brokers. When Facebook and these social media companies collect your data, you know, they sell it to aggregators, third-party data aggregators, who then resell it for various things. So there have been studies that show that it’s very, very easy to destroy confidentiality or privacy by combining certain data sets. I mean, I read a good example the other day: let’s say you want to identify a single white female in a major metropolitan area, say, Los Angeles, with no dependents. Okay, that’s a hard problem. Now, let’s say you want to identify a single white female in Los Angeles with five dependents, two of whom are twins. That turns out to be a very easy problem to solve. So privacy has its differentials in that sense as well. This is going to be a very hard problem, especially in the US, where there aren’t any strong rules against this. What could happen is, you know, there are new draft EU rules about what data sets these large language models can be trained on, and those involve copyright and trademarks. So if something happens there, that would probably slow the spread. But other than that, I think it’s very easy to trample over confidentiality and privacy these days. That is, in the absence of strong data protection laws in the US, right? I mean, there’s no control over what is taken or captured from you in the US, except by reading the end-user agreements on these social media platforms, which, I mean, 99% of people do not do, so.
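To make the re-identification example concrete, here is a minimal, entirely hypothetical Python sketch: the records, field names, and counts are invented for illustration, but it shows how combining a few ordinary attributes from merged data sets can narrow a supposedly anonymous profile down to a single person.

```python
# Hypothetical illustration of re-identification by combining quasi-identifiers.
# The records and attributes below are invented; real attacks join far larger,
# commercially brokered data sets in exactly this way.

population = [
    {"name": "A", "sex": "F", "city": "Los Angeles", "dependents": 0, "has_twins": False},
    {"name": "B", "sex": "F", "city": "Los Angeles", "dependents": 0, "has_twins": False},
    {"name": "C", "sex": "F", "city": "Los Angeles", "dependents": 5, "has_twins": True},
    {"name": "D", "sex": "M", "city": "Los Angeles", "dependents": 5, "has_twins": True},
]

def match(records, **attributes):
    """Return every record that matches all of the given attribute values."""
    return [r for r in records if all(r.get(k) == v for k, v in attributes.items())]

# A common profile matches many people: effectively anonymous.
common = match(population, sex="F", city="Los Angeles", dependents=0)
print(len(common), "matches for the common profile")

# A rarer combination of the same kinds of attributes singles out one person.
rare = match(population, sex="F", city="Los Angeles", dependents=5, has_twins=True)
print(len(rare), "match for the rare profile:", rare[0]["name"])
```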
Anne: I think sacrificing confidentiality is sort of the give-and-take for improved access and improved connectivity, and we notice that people are more connected and divulging more and more. And even during the pandemic, people began using the internet even more for things that they could have done in person with perhaps a little bit less data collection: for example, online shopping, online learning, Zoom meetings. So there was just more and more and more data. And it seems like that coincided with the evolution of the large language models, and the two phenomena feed off each other. Is there a way that privacy tools can do more to keep the data and misinformation and private information all out of the purview of large language models?
Michael: You know, a lot of this data privacy conversation in the US has revolved around tools for end users, right? What can I do as a consumer to prevent this from happening to me? I don’t really think there’s a consumer—this isn’t really a consumer issue. I think it’s a more fundamental political issue that can really only be addressed through legislation at the federal level.
Anne: Yeah, it seems like even when the strictest states adopt legislation, if it’s not federal, it’s really difficult, because we are at the point where the apps collecting the data and the online shopping vendors are relatively global.
Michael: So it’s very easy for a large multinational corporation to evade a set of regulations it doesn’t like by simply changing locations. That’s one of the great advantages of being a multinational corporation: you can take advantage of differential regulations from country to country. I mean, you know the Ireland loophole for taxes that all the technology companies take advantage of?
Anne: Yeah, there are lots of examples of shopping for the most lax law. So to change course here, I have a little bit of a lightning round. These are all really quick yes/no questions. Should author use of chatGPT be considered plagiarism?
Michael: I think that really depends on how you use it, right? I mean, if you’re using it to compose something from whole cloth, certainly that’s plagiarism. But if you’re using it as kind of an idea generator—there is this old rhetorical term, a sort of commonplace book, let’s call it—I think that’s a qualitatively different kind of use. And in fact, if using chatGPT did just de facto constitute plagiarism, I think about 90% of university students would be leaving.
Anne: Yeah, some people have had academic ideas, like citing it a certain way, or even looking at how it is deficient. So a student could kind of use it for a paragraph and then maybe explain what it missed, or what it didn’t get, or what it could do differently. But I agree with you, I think its use is already widespread.
Michael: I think the genie’s out of the bottle there. I would say that it is dangerous in one very particular way for naive users, and that is, it will make up citations or just, you know—it is like Mad Libs, that’s what chatGPT is—and it will make things up about 20% of the time. And if you don’t know enough about a topic to realize that, you can really make some just outlandishly stupid mistakes, something that a knowledgeable user would never have made, right?
Anne: Yeah, it does seem like, you know, recently there was a lawyer in trouble for that, for citing fake citations. So it is happening, that people are relying on it and trusting it. So that brings me to my next question, which is a little more about truth. Is it possible to create programs to verify the truth of the information that these large language models produce?
Michael: No.
Anne: And do you think it should be illegal to embed chatGPT in Windows 11?
Michael: No. I think it’ll probably be embedded in the Office suite initially, where it’ll probably have a role in developing macros for Excel and probably make its biggest impact in Excel use, right? Because if chatGPT is good at anything, it is computer programming and those kinds of formal, rules-based languages.
Anne: Should it be embedded in social media apps that children use?
Michael: In an ideal world? No. But that, I think, is probably an impossible kind of policing to ask for.
Anne: Yeah, it seems like there’s a bit of an arms race there, and that there is already word that it will be in that sort of suite of social media apps. Do you think there should be government approval prior to public deployment of new types of technology?
Michael: No, not to new types of technology. I think what government—the role of government regulation here should be to protect the rights of citizens. If there’s anything we can learn about new technology, particularly digital technologies that rely on data, is that we need to think of ourselves as citizens not as consumers, and demand rights as politically active citizens who are concerned about our democracy and the world we live in, not as consumers who are kind of flitting from new sensation to new sensation. I think that from consumers to citizens is the switch that needs to be flipped in regulation.
Anne: And do you think the benefits of access to so much information make up for the harm that can be caused by misinformation?
Michael: No. No way. No way. I mean… I have a real problem with these terms, misinformation and disinformation. For one, I think they add a kind of human element to information systems that they haven’t earned or don’t really deserve. It is people who do things with information systems. I mean, certainly information systems can reshuffle information in new ways and deliver it to new places, but it is humans who do the work. And I also think it covers up something older: you know, a lot of what goes by as misinformation and disinformation, 50 years ago or 100 years ago, we would probably have called folklore. I mean, it is sort of rumor, innuendo, traditional knowledge, you know, obscure reasons given—it is sort of a meaning-making process for people, I think, to engage in misinformation, unfortunately. And I think these terms kind of cover that up; they cover the human element up, I think, too often.
Anne: I’m gonna ask you if you have anything to add.
Michael: I don’t have anything to add. I will be very short and sweet here. Yeah, I think it’s time we rehabilitate human judgment and stop selling ourselves short and comparing ourselves negatively to machines, you know. We should govern ourselves and pass regulation in terms of our being citizens of this country and of the world, interested in democratic governance. And we shouldn’t think of ourselves so much as consumers. The stakes are political, not so much consumer. So that would be my final comment.
Anne: Thank you. I think that’s a great point and very important in deliberative democracy for us to have a good understanding of what the role of citizenship entails. So thank you. This was Michael Scroggins from UCLA, and this has been the Voices in Bioethics Podcast. I’m Anne Zimmerman, and thank you. Thank you, Michael, for joining us.
Michael: Thank you for having me. It’s a pleasure to be here.