The Effective Statistician - in association with PSI

Transcript

00:00:00: You are listening to The Effective Statistician podcast.

00:00:04: The weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science and serve patients while having a great work-life balance.

00:00:22: In addition to our premium courses on The Effective Statistician Academy,

00:00:28: we also have lots of free resources for you across all different topics within that academy.

00:00:37: Head over to theeffectivestatistician.com and find the Academy and much more

00:00:44: for you

00:00:45: to become an effective statistician.

00:00:49: I'm producing this podcast in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients.

00:01:00: Join PSI today to further develop your statistical capabilities with access to the ever-growing video-on-demand content library, free registration for all PSI webinars and much more.

00:01:13: Head over to the PSI website at psiweb.org to learn more about PSI activities

00:01:19: and become a PSI member today.

00:01:31: Welcome to another episode of The Effective Statistician.

00:01:34: Today, I am super happy to talk to Andy York about programming.

00:01:39: I'm not a programmer, but of course statisticians program quite a lot, and if you have ever worked in the pharmaceutical industry, you will soon learn about quality of programming and validation of programming.

00:01:57: When I first started in this industry, I had no clue about it. But since then, a couple of decades later, there are actually some interesting things moving forward.

00:02:08: Well-known assumptions are getting challenged and there are new ways of doing things.

00:02:15: And so this is something that we will talk about.

00:02:19: Andy, maybe you can introduce yourself briefly: where you're coming from and what led to it?

00:02:25: Hi

00:02:27: Alexander, yeah, no, great to be talking today.

00:02:31: I totally agree.

00:02:31: I'm a statistician also by training, but I'm a statistical programmer by specialism, if you like.

00:02:38: So I fell into that quite quickly as a placement student when I worked for a company called Smith Kline & French, which gives an idea of how far back that is. It's now obviously GlaxoSmithKline.

00:02:47: At Smith Kline I was probably one of the first programmers in the industry, because they really didn't exist then; it was mostly the statisticians doing the programming. And obviously they had mainframe SAS, and had just got PC SAS.

00:03:00: It piqued my interest, if you like, in terms of that.

00:03:03: I knew Fortran, I knew BASIC, all these old languages.

00:03:07: So I fell quite naturally into doing that.

00:03:08: But it's grown ever since, and I think it's growing to the point where the work we have to do exceeds the number of programmers.

00:03:15: Actually, that's been a real problem over the years.

00:03:23: We've introduced standards like CDISC SDTM and CDISC ADaM.

00:03:27: There are various other things that have come along, and now we're starting to look at AI solutions to help us with what we're doing.

00:03:34: My career went from being hands-on up to management levels for various companies, including CROs as well as pharma companies.

00:03:43: So I worked for different sized companies throughout my career. Until I joined Verisian, I was most recently the Vice President of Clinical Data Science at Novo Nordisk, which was a great role.

00:03:55: It was like an SME-type role for the business because, interestingly, their programmers were fully integrated within their statistics department rather than being separate as they are in other companies.

00:04:05: So there was no head of programming, but I was obviously the senior programmer, if you like, at that company.

00:04:12: And when I left them I decided, well, to slow down a little bit and go part time.

00:04:17: I set myself up as a consultant and I'm currently working as a specialist advisor for Verisian.

00:04:23: It's an AI technology company in the pharmaceutical area. It has been going for about three years and has set up

00:04:30: a number of really interesting tools to do with validating statistical programming and the output data that goes into that, and creating a traceability matrix of everything you do.

00:04:41: So the first question I want to start with: when we talk about quality, and quality of programming,

00:04:48: what is that for

00:04:50: you?

00:04:51: So, to me it's creating your end product,

00:04:54: the table, the figure, the listing, the statistical analysis if you like, with absolute confidence that what you've created is correct to the best of your knowledge.

00:05:04: It's less so about the steps in between, but by inference they also have to be of a very high quality, a very high standard,

00:05:10: as you go through. And it's about having the right knowledge to craft that from a very disparate set of clinical data you may have from study to study, because obviously there are a lot of indications we work on.

00:05:27: New indications come along all the time, and there are different ways we collect and record data for our clinical trials.

00:05:34: So understanding how to manage it is extremely important,

00:05:38: I think.

00:05:38: When I speak about quality, it basically means meeting the expectations of the customer.

00:05:47: And as customers, you can have many different ones.

00:05:50: The customer could be your future self,

00:05:53: so that the next time you look into a program, and usually a program always gets changed, you can easily change it.

00:06:01: It's fast and easy.

00:06:02: It meets the expectations of the statistician and the specifications.

00:06:14: It also meets the demands of regulators, and the regulators have all kinds of different demands in terms of how they ensure that what they get is actually correct and of good quality,

00:06:40: in terms of having a lot of assurance that this program does what it's supposed to do.

00:06:46: So, what are typical approaches to reach this assurance for the regulator?

00:06:53: The regulator is obviously looking at all of these, in terms of who is the ultimate customer.

00:06:58: And I think it's just so important to have control and traceability from end to end of what you've done, so that when you take a data point from raw, maybe map it into your SDTM, and create your analysis variable within ADaM, for example,

00:07:15: you can trace all the way back with full confidence, including all the assumptions around subgroups, demographics and stratification, for example,

00:07:22: that were made in that analysis, to know that this is correct.

00:07:26: Your point about your future self is really important too, because how often do regulators come back and ask questions five, ten years down the road

00:07:34: and say, oh, let's have a look at this particular adverse event.

00:07:36: Can you give us some tables based around that so we understand what has gone on

00:07:41: there?

00:07:41: So when you have to revisit code from maybe five or ten years ago, the ability to understand those processes is critical. That is where traceability comes in, because chances are that the programmer who wrote them originally may well have moved on in that time period.

00:07:58: How the program was written needs to be transferable and, in a way, recorded somewhere.

00:08:04: So it's incredibly important that we can prove what we've done, and obviously not just by doing the coding.

00:08:08: We could talk about validated systems as well, conforming to all of the regulations, but that's a given.

00:08:13: The piece I have talked about is the big must-have.

00:08:16: Let us focus on the programming, not so much the system part.

00:08:20: That's yet another completely different level, which I also don't know a lot about.

00:08:26: Thinking back to my time,

00:08:28: when I did programming, the key for traceability would be around things like footnotes.

00:08:35: So I have a table and then look into, okay, is there a footnote that gives me a link to some folder system somewhere?

00:08:43: The first problem very often will be: I don't know how to get there.

00:08:48: Maybe I don't have access, or maybe it sits at a CRO, but theoretically it's at least linked.

00:08:55: But if I have access, then I get to a program.

00:08:59: And so now you need to open the program, and in the program there are hopefully also links to further programs, yeah?

00:09:07: So okay, this dataset was pulled from this one and that dataset from that one, and there are three different datasets pulled in from other programs, and then you go from there.

00:09:17: That can create quite a big tree in terms of where things are coming from, and it's a pretty manual process.

00:09:28: It's not something that you do in

00:09:29: five minutes after lunch. It can take quite a lot of time.

00:09:32: So is this what we think about traceability, how it should be?

00:09:38: Yeah absolutely not!

00:09:39: We need far more than that.

00:09:41: I think the core elements of what you're looking for are absolutely correct.

00:09:44: This is how you do manual traceability.

00:09:47: But one of the reasons why I love the Verisian software, and one of the reasons I joined them, is because their software actually completely automates that traceability.

00:09:57: So it builds a huge matrix of all the related components of your clinical programming, from raw data through to table, figure or listing at the end. And you can pick any point in the graph and trace backwards or forwards to see exactly how your data was derived.

00:10:14: It will show you the source code that was actually used to derive it.

00:10:17: It doesn't matter which program it was in; it will run across all of your programs and understand the connectivity between different elements in the programs. And it will display this pictorially, so you can literally step back piece by piece to see the full traceability matrix of how the data got from raw to table.
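
The backward and forward tracing described here can be pictured as walking a directed graph of derivations. The following is only a minimal sketch of that idea, not a description of how the Verisian tool works internally (the episode does not go into that). The class `LineageGraph`, its methods and the dataset names such as `adam.ADAE` and `table.14-3-1` are all hypothetical, chosen for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage graph: an edge means 'target is derived from source'."""

    def __init__(self):
        self._children = defaultdict(set)  # source -> things derived from it
        self._parents = defaultdict(set)   # target -> things it was derived from

    def add_derivation(self, source, target):
        self._children[source].add(target)
        self._parents[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Breadth-first walk that collects every node reachable from `start`.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_back(self, node):
        """Everything `node` ultimately depends on (backward traceability)."""
        return self._walk(node, self._parents)

    def trace_forward(self, node):
        """Everything that ultimately depends on `node` (forward traceability)."""
        return self._walk(node, self._children)


# Made-up study: raw data -> SDTM -> ADaM -> tables.
graph = LineageGraph()
graph.add_derivation("raw.ae", "sdtm.AE")
graph.add_derivation("raw.dm", "sdtm.DM")
graph.add_derivation("sdtm.AE", "adam.ADAE")
graph.add_derivation("sdtm.DM", "adam.ADSL")
graph.add_derivation("adam.ADSL", "adam.ADAE")
graph.add_derivation("adam.ADAE", "table.14-3-1")
graph.add_derivation("adam.ADSL", "table.14-1-1")

print(sorted(graph.trace_back("table.14-3-1")))
print(sorted(graph.trace_forward("raw.dm")))
```

Starting from a table and calling `trace_back` corresponds to the manual footnote-and-folder search, while `trace_forward` from a raw dataset answers which downstream datasets and tables a change could affect. A real system would additionally attach the deriving source code to each edge.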

00:10:35: It really is, to me, the future of understanding the full traceability of code.

00:10:41: Sorry, just to interrupt, because you've made a distinction.

00:10:45: I just talked about backwards traceability, yeah?

00:10:48: You have a table and you go back to something.

00:10:52: But now you also talk about forward traceability,

00:10:56: so from the data point, what is actually happening with that?

00:11:01: And I don't know if I've ever seen anything like this.

00:11:08: That is absolutely correct, you can go forwards or backwards.

00:11:11: You can start in the middle if you really want to and go in either direction.

00:11:14: It's just so flexible for understanding.

00:11:18: It works for data managers as well as statistical programmers, and even medical writers could dig into how a particular piece of data arrived at a table, if there was a derivation involved.

00:11:32: It does require knowledge of the programming language to understand the code that is there, but it's actually embedded really deeply within the tool.

00:11:39: So the next level is understanding:

00:11:42: did this particular program or this piece of data conform to your study-level specifications, as well as to the standards, as it flows through the process?

00:11:53: And the system will automate that and say, yeah, it follows the rules, with a tick mark next to it, or else it needs investigation by somebody who understands what's going on,

00:12:02: to say, yeah, actually that is correct,

00:12:04: or, well, maybe there's a problem with the specifications, or there's a problem with the coding that was done to get there.
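
To make the "tick mark or needs investigation" idea concrete, here is a small, hedged sketch of an automated conformance check. It is purely illustrative: the rules in `SPEC`, the function `check_conformance` and the example records are invented for this example and do not come from any CDISC standard text or from the tool discussed in the episode.

```python
# Hypothetical study-level specification: one rule per variable.
SPEC = {
    "SEX": lambda v: v in {"M", "F", "U"},
    "AGE": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "TRT01P": lambda v: v in {"Placebo", "Active"},
}


def check_conformance(records, spec):
    """Return, per variable, the indices of the records that break its rule."""
    report = {}
    for variable, rule in spec.items():
        report[variable] = [
            index for index, record in enumerate(records)
            if not rule(record.get(variable))
        ]
    return report


# Made-up analysis records; the third one has an unexpected SEX value.
records = [
    {"SEX": "M", "AGE": 54, "TRT01P": "Active"},
    {"SEX": "F", "AGE": 61, "TRT01P": "Placebo"},
    {"SEX": "X", "AGE": 47, "TRT01P": "Active"},
]

report = check_conformance(records, SPEC)
for variable, failures in report.items():
    status = "OK" if not failures else f"needs investigation, records {failures}"
    print(f"{variable}: {status}")
```

A flagged variable does not say where the fault lies. As noted in the conversation, a human still has to decide whether the code or the specification is wrong.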

00:12:09: So that's going a little bit beyond traceability,

00:12:11: more into validation, of course. But yeah, it's something we've never had.

00:12:15: I totally agree, and it is something that has been a very manual process.

00:12:19: There are quite nice

00:12:20: use cases for that.

00:12:21: Yes, just thinking about it: there's some change in the data and you want to know which tables are affected.

00:12:27: And for certain types of data, maybe it's pretty clear which tables are affected.

00:12:31: But for others it may not be clear at first glance, and that would make things so much easier, and you can directly show it, prove it and document it.

00:12:43: Yeah!

00:12:44: That is absolutely right.

00:12:45: The system will do that. It'll also generate what you need for submissions.

00:12:49: So it'll generate the define.xml automatically now, and it will also generate things like reviewer's notes as well, all based upon the traceability of your code.

00:13:01: So you have these things:

00:13:02: not only can you track it back yourself manually, but it's actually now starting to create the submission-level documents.

00:13:07: Those are always a big effort.

00:13:09: They take a lot of time to get done.

00:13:11: You could go from days to minutes, I would say. The goal was to be able to do this in five minutes. At the moment, on the test,

00:13:18: well, not testing,

00:13:19: the pilot study that we ran, it took two days rather than several weeks.

00:13:24: So

00:13:24: very good.

00:13:25: That helps us a lot on the traceability part, which I think is a really important thing.

00:13:32: In the future it would be awesome to extend traceability beyond clinical trial reports and tables, also into the medical affairs area, because as soon as you have some work product that goes beyond the clinical space,

00:13:52: into, at the moment, the marketing and medical affairs space,

00:13:56: very often all of the traceability is completely lost. And specifically forward traceability:

00:14:02: there was never anything like that here. But even backwards traceability is really difficult in these big organisations.

00:14:09: Yeah, I totally agree.

00:14:11: That's not an application that Verisian has so far considered, but I think it's a good point.

00:14:16: Once your information reaches the marketing guys, you just don't know what they're going to do with it.

00:14:21: And having that stamp of quality as it goes forward to marketing

00:14:25: is really critical, I think, for companies, because it can be quite costly to make incorrect claims about your drug and what it can do,

00:14:32: as we've seen in the industry. That is a whole other thing.

00:14:34: A super suggestion for the future.

00:14:37: Yeah, there are lots of opportunities for cost savings, for quality improvements, for speed improvements.

00:14:43: So let's talk a little bit more about the validation step.

00:14:47: I learned there are a couple of different validation approaches.

00:14:50: You can have, basically, you review your own code, or someone else reviews your code,

00:14:56: or double programming.

00:14:57: I've seen people doing triple and quadruple programming,

00:15:01: where even the CRO did double programming but we didn't trust it and therefore we programmed it as well, and then for some things the senior statistician still wanted to program it himself.

00:15:14: I've experienced that directly myself at a company I worked for in my past life.

00:15:18: Yeah, quadruple programming, it's just: why?

00:15:22: So what do you think about all of that work?

00:15:25: Is it really necessary?

00:15:28: Yeah, we're going full circle these days, or we're forced to go full circle, because when I first started out as a programmer there was no double programming. I actually had the table of contents that I programmed everything from.

00:15:39: It was up to me how I designed the tables. There were no mock tables, or maybe a few, but not many mock tables that we'd go with.

00:15:46: And so I programmed it and gave it to the statistician, who said, yeah, that's good, although this needs to change.

00:15:51: And that was it.

00:15:52: Then we started working in a more structured fashion, where we had a lot of specifications that needed to be produced in order to produce our outputs.

00:16:01: You essentially had to work from those specifications to produce your output, which was checked by another independent programmer who would go through all your SAS logs and check each data step, and the PROC PRINTs from every dataset, to make sure that the data were being moved through the process correctly.

00:16:18: And of course it went on from there: double independent programming came along.

00:16:22: We started getting these different options,

00:16:24: as you say: do you have an independent programmer, do you check it yourself,

00:16:27: do you double-program things? So it's all adding layers and layers of inefficiency,

00:16:32: if you like, to what we do.
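
For readers less familiar with double programming: the final step is usually an automated comparison of the production output against the independently programmed QC output. The sketch below shows that comparison in plain Python under simplifying assumptions. The function `compare_outputs`, the key `USUBJID` and the values are illustrative only, and a real study would typically use a dedicated comparison procedure in the validated environment.

```python
def compare_outputs(production, qc, key):
    """List every difference between two independently derived datasets.

    Each dataset is a list of dicts; `key` names the variable that
    identifies a row. Returns tuples of (key value, variable, production
    value, QC value).
    """
    prod_by_key = {row[key]: row for row in production}
    qc_by_key = {row[key]: row for row in qc}
    differences = []
    for k in sorted(prod_by_key.keys() | qc_by_key.keys()):
        p, q = prod_by_key.get(k), qc_by_key.get(k)
        if p is None or q is None:
            # The row exists on one side only.
            differences.append((k, "<row>", p, q))
            continue
        for var in sorted(p.keys() | q.keys()):
            if p.get(var) != q.get(var):
                differences.append((k, var, p.get(var), q.get(var)))
    return differences


# Made-up example: the two programmers disagree on AVAL for subject 002.
production = [
    {"USUBJID": "001", "AVAL": 5.0},
    {"USUBJID": "002", "AVAL": 7.5},
]
qc = [
    {"USUBJID": "001", "AVAL": 5.0},
    {"USUBJID": "002", "AVAL": 7.0},
]

for difference in compare_outputs(production, qc, key="USUBJID"):
    print(difference)
```

An empty list only shows that the two programs agree with each other. It does not show that either of them implements the specification correctly.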

00:16:33: It has been essential.

00:16:34: Quality is everything, because the last thing you want is a client coming back and complaining

00:16:42: that you've done something wrong. And it was particularly important as the industry moved towards low-cost centers in India, where you had to spend a lot of time training people who didn't know clinical trials very well to get the quality you needed out of those people.

00:16:54: Of course the industry has moved a long way in that direction since, but it's extremely inefficient.

00:16:59: I think we're the only industry in the world that would do something like this.

00:17:03: So if you go to finance, or any bank, you can't tell me that they double-program all of the financial reports, using SAS or whatever language they're using.

00:17:14: Even if you're developing software, you're not going to double-program Microsoft Word to make sure it runs correctly.

00:17:22: We are really a bit of an anomaly in that sense.

00:17:24: Coming back to validation: what is built into what Verisian has done takes away this need for two people to program a piece of output, because of the validation reports that it throws up.

00:17:36: You can go down, you could shrink maybe from two FTEs, one to do the original programming and one to do the QC programming,

00:17:43: to one and a bit FTEs,

00:17:45: so one to do the programming and a bit to check the outcome that has been pushed through the Verisian AI, to tell whether things look correct or not.

00:17:55: And that's just a huge saving in resource when looking at where the industry is now. And it is full circle, so we're going back to a more efficient way of validating what you've done from the very inefficient process that's grown up over time, if you like.

00:18:11: This is where AI, I think, truly starts coming into the equation and can really help us.

00:18:16: It removes the mundane work, or busy work, that we're forced to do to guarantee quality. But it's doing it in an efficient and sensible way that's still controlled by humans,

00:18:29: ultimately. And I think that's so important, that we have a degree of control over what the outcome is.

00:18:35: And regulators are basically fine with that approach?

00:18:39: So yeah, I don't think regulators mandate you to do double programming; all they're mandating is that it has to be quality. So I don't see any reason why there should be an issue, because there's still a human in the loop doing the checks. And I think even now, if you were to do a submission, the FDA might require you to say:

00:19:01: if anything's been generated by AI, then you have to declare that.

00:19:06: But with this tool you're not actually generating anything with AI, you're just using it to check what you've done.

00:19:11: There still is that assumption that an original programmer has done the initial work to get the output in the first place.

00:19:17: But of course that's another step along the AI path, and it may be taken by some companies.

00:19:23: Yeah, I can see that saves a lot of time, a lot of budget and, for the programmers, a lot of mundane work.

00:19:32: I don't know, I think checking someone else's work is probably not the most fancy thing.

00:19:36: I think essentially it's going to change our roles somewhat as programmers.

00:19:41: So I think the days where you can just sit in the corner, follow your specification and turn out programs to produce tables are probably over. I think we tend to become now more data scientists, people who understand the data and understand the quality of what you're producing.

00:19:59: And really, we are not interested in just robotically following the specifications to get it out,

00:20:04: because that may or may not be correct. There's no guarantee with double programming, of course,

00:20:08: that it is correct; you may both have followed the specification incorrectly.

00:20:13: So yeah, I think this does change the landscape.

00:20:16: It takes a lot of pressure off statistical programmers, who spend an awful lot of time doing other things as well.

00:20:22: They aren't just programming.

00:20:23: They're in meetings, discussing the data and how to meet the timelines and things like that these days, as well

00:20:29: as considering how to produce things like DSURs, PSURs, you name it.

00:20:34: There are so many different things we do today, and this starts to alleviate some of the resource issues that have been in the industry.

00:20:41: The other thing is that it's not about removing jobs or replacing them with AI; it is actually using what you have better, in a more efficient way, as you go forward.

00:20:48: Awesome!

00:20:49: Thanks so much.

00:20:50: That was a very, very insightful episode about where we have come from in terms of programming,

00:20:57: how our demand for quality has evolved into a lot of specifications and additional work, and what we can now do with technology to make things much more transparent, traceable and higher quality while removing all the mundane work, and what implications that has for the future of programming.

00:21:23: Any last idea or thought you would like to leave the listener with?

00:21:28: The

00:21:30: main thing, having been doing this for almost forty years, is looking into the crystal ball and thinking about where we're going in terms of how we present our results to the regulators.

00:21:40: And, you know, I would just love, instead of presenting them with this ream of static information, that we start thinking more about giving them more dynamic tools, so that we can say: well, here's our interpretation of the data, here is a tool with everything in it,

00:21:53: see if you agree, and move to a very different style of approval.

00:21:57: That's what I hope comes in the next five to ten years,

00:22:01: with the technology advancements, I think.

00:22:04: Basically, our biggest transformation was that we don't ship piles of paper folders but do the same in a PDF,

00:22:12: but with hyperlinks. That's the biggest transformation so far, is my feeling. Of course there are some other things going on behind it. There are more interactive things and so much opportunity.

00:22:25: Exactly!

00:22:26: Thanks so much Andy, great to have you.

00:22:29: I'm pretty sure that this will not be the last time we talk, and if you want to talk to Andy, check him out on LinkedIn.

00:22:37: And there you can also find all the details about the company Verisian that we talked about.

00:22:43: Thank you, Alexander!

00:22:44: It's been a pleasure talking to you as ever.

00:22:47: Bye!

00:22:52: This show was created in association with PSI.

00:22:56: Thanks to Reine and her team at VVS,

00:22:59: who help with the show in the background, and thank you for listening.

00:23:02: Reach your potential, lead great science and serve patients.

00:23:06: Just be an effective statistician.

About this podcast

The podcast from statisticians for statisticians to have a bigger impact at work. This podcast is set up in association with PSI - Promoting Statistical Insight. This podcast helps you to grow your leadership skills, learn about ongoing discussions in the scientific community, build your knowledge about the health sector and be more efficient at work. This podcast helps statisticians at all levels, with and without management experience. It is targeted towards the health sector, but lots of topics will be important for the wider data science community.

by Alexander Schacht and Benjamin Piske, biometricians, statisticians and leaders in the pharma industry
