BEBPA Blog
Reference Material Conference Day 1 Panel Discussion
A transcript of the Day 1 “Question and Answer Session” from the BEBPA Virtual conference focused on Reference Material for late-stage clinical and commercial pharmaceutical material is now available. The conference was held on February 25 to 26, 2025 (View conference brochure here). The conference was well attended by many small and large biopharmaceutical companies, all trying to find the best path to developing and maintaining an in-house reference program. To get the best practices and ideas of future approaches we reached out to regulators, individuals working at standard organizations, statisticians, managers from industry running reference programs supporting global commercial products and consultants who have been involved in solving existing problems in the reference world.
On the first day, the panelists included Jaana Vesterinen from Finnish Medicines Agency, Sian Estdale from BEBPA, Raffaella Rosi formerly from Bavarian Nordic, Laureen Little from BEBPA, Seth Foltz from Eli Lilly and Company, and Jane Robinson retired from NIBSC.
After presentations on the first day an hour was dedicated to answering questions from the attendees. The questions came fast and furious. Key topics of discussion included:
- How to bridge old and new reference material.
- Types of characterization tests to qualify reference material.
- Replacement of Primary Reference material during commercial manufacturing.
- Calibration of in-house reference material.
- Value assignment of reference material and whether to assign 100% or actual value within the CI for 100%.
- Lots in late clinical development that should become the primary reference standard and the first working standard.
- Number of labs involved in the value assignment of in-house reference standards.
Read the full Q&A transcript of day 1 below.
Question 1: If the same in-house RS is used for content and potency measurements of the product and a well defined correlation between content and potency is justified would it then be sufficient to evaluate shift in content (and not potency) at change of in-house RS?
Jaana: I wouldn’t, I wouldn’t recommend it. I think that’s not the proper way to do it. You need to do both. You really need to do both. And focus that it is important to be done.
Laureen: I’ll just say, Yes, I absolutely agree with Jaana, you need to assess potency. Oftentimes we don’t know all the degradation pathways of a molecule. Sometimes potency is the only assay that will pick up new and novel degradation pathways we don’t know about. It will also pick up the accumulation of the mix of “hits”. You know, 2% less here, 3% less here, 4% less here, and although it passes all the individual specifications, it all adds up to a whopping 20% drop in potency.
Seth: Yes, I would agree with that. The BEBPA white paper on Reference Material covers this a little bit. We call it a multi-faceted approach, where you’re looking at all the physical chemical testing along with the potency, in a more accurate manner. If you are shifting or drifting, all those data sets are needed to guide you on exactly what’s going on. This is probably the more appropriate approach to take.
Question 2: Should in-house RS be calibrated against all existing pharmacopeia RS?
Jaana: According to my understanding, if you have a product licensed in Europe, you need to relate it either to the European Pharmacopoeia standard or to the international WHO standard, but not to the other areas’ compendial standards. What the other compendia or authorities request, I’m not sure of.
Jane: Well, all I can say is it would depend on what the other authorities were asking if you were trying to license in their area. I think the WHO standards, provided they’re suitable for your product, are generally regarded as international, but each authority has the perfect right to ask for whatever they want.
Laureen: Do the International Authorities ever provide any kind of linkage between their references? I’m not sure it would help you out in this case. If you’ve got a Pharmacopeia Standard in Europe, and then perhaps a WHO or some other location, like Japan. Do these standard organizations ever get together and look at them?
Jaana: Usually the European standards are aligned with WHO standards. If a standard exists for both regions. But I wouldn’t know about Japanese authorities, or any of the Asian authorities, how they work around this.
Jane: There was a wonderful example of the USP, I think the low molecular weight heparin standard. One of the heparin standards diverged from the international standard, and a big effort was made, after quite a long time of divergence, to bring them back together again. That was quite an example of where things diverged, and there was a concerted effort to get things back together so that the USP and the WHO standards were, in fact, the same.
Laureen: I assume those kinds of studies are in the public domain? A company could quote those studies. I would think you would still have to demonstrate how your product and your assays look with these references, and verify that these references still look the same in your hands. But it may not be quite the burden of assays that you would have to have to link it to the two that have been bridged by the authorities at some point.
Jane: Some of the standards, in fact, have been prepared with the same material for more than one standard. A batch will be split between different authorities.
Laureen: In that case, it’s just a waste of resources, both of the standard itself and the company’s internal resources. Seth, have you ever run into this issue where you have to use multiple international standards?
Seth: Yeah, not so much for large molecule, but small molecule for sure, where we’ll have to do a comparison and show equivalence, at least for multiple.
Sian: I’ve definitely experienced having to perform analysis to different pharmacopeia standards, but not for potency. Definitely for pH and appearance, you would have to do separate assays, whether you were releasing in Europe or the US, but I don’t know, it’s slightly different for potency.
Question 3: I saw that [Sian] described in one of your test cases that charge distribution, low molecular weight, high molecular weight and other tests which are not potency or content were also part of the qualification using several vials, so there are multiple repeats performed, also for physical, chemical and glycosylation testing. Is this standard practice? We usually do extended testing only for content and potency, and of course we do characterization and physicochemical testing, but not multiple tests.
Laureen: So the question is: is it a requirement to perform several tests, such as tests for charge distribution, low molecular weight, high molecular weight content, etc.? Sian, this gets back into your talk, and Jaana also into yours, about not only for potency, but for content also.
Jaana: Do you mean monitoring, repeated testing, let’s say yearly, based on your stability of the reference material? Yes, you need to do that. Or is the question related to the number of tests, or number of assays, per reportable result? That would be a different thing.
Sian: Yes, there were three vials. I wasn’t involved in the testing for the charge distribution and honestly, I don’t really know the details. This is where my brain goes: you completely understand why you would need several vials for potency testing. For charge distribution, there isn’t inherently a massive change whether you’re doing it by capillary electrophoresis or ion exchange HPLC, so I’m assuming that the reason for it is that you are testing each vial by each different method. Therefore, it’s indicating a test for charge distribution for each of those vials that you were using as independent samples within the potency test. Cumulatively, it gives you that confidence in the data that you are generating as a standard. You wouldn’t do multiple testing for charge distribution, because it’s a reliable test.
Jaana: Maybe in a sequence you just inject, typically twice from the same sample to be on the clear side if there is something wrong with one of the injections.
Sian: Yes, but these were separate vials being used as well, but you’re absolutely right.
Nancy: I had a question to Seth in the chat, which he answered, about applying quality by design for reference materials. I think all of these questions that we’ve just been talking about really can be addressed nicely in a quality by design approach, where you start with the intended use of the standard across all the different assays, you know how it’s going to be used, and you get at some of these details. Seth, could you maybe expand a little bit more on the QBD answer that you gave in the chat?
Seth: Yes, you’re absolutely right. You do that initial characterization, you’ll have a large n. But then when you put it all together, you can use QBD to kind of assess those types of strategies, and put together a control strategy. For instance, we have a monoclonal antibody strategy where we’ve holistically looked at a bunch of different compounds, and put together suggestions. Charge, for instance, you do in all three for initial; for requalification, you would just have to do it in one. And you kind of holistically apply that to all of your monoclonal antibodies. With the caveat being, if there is an assay that has a large variability, and there is something inherent about the assay that may require a larger n during requalification, then you’ll have to address that on a yearly basis. But overall and overarching, you use that quality by design to be able to put the infrastructure in place for basically the basics of what you need to do now. If you need to do additional work because of scientific issues or anything else that comes up, you can reapply that. But that’s kind of the quality by design approach that we build for each different modality, for all these different types of drugs.
Nancy: Seth, did you also include the positive controls for the assay within the reference material as part of that quality by design?
Seth: Yes, absolutely. You can look in the BEBPA Reference white paper too. Because that’s built in there too. We suggest having not only the two tier, but also having that assay control that you would build into your strategy. These lots should be different lots, which is sometimes challenging.
Nancy: I think, in the vaccine space, if I’m not mistaken, people have used their positive controls as a way to reject the international standards when they switch and give a very big negative impact on the readouts, right? I really like the quality by design idea, and really hope that we can start promoting that as just a general concept to be used for, again, reference material and control design.
Question 4: To Sian: For those first two case studies, were there physical characterisation methods used as well as potency? Or was it purely potency methods used for reference qualification?
Sian: Honestly, the potency assay is the one that we always spend so much time on. Everything else sort of goes through super quick, and then you come to the ELISA, and then you come to the cell-based assays. These are the ones where everybody starts tearing their hair out and saying, How many times do we have to do it? So yes, those are the ones that I focused on, because that was the data that we got when we were all sitting down, discussing it and talking about different approaches. It’s all about that N number, how many do you have to do to be able to have confidence in your results? Thanks for that question!
Question 5: What do we do when we are in a commercial phase and we are running out of primary reference standard? Are there special considerations when we qualify a new primary standard within the commercial phase?
Laureen: This was a question I’d actually jotted down too. Okay, we have a protocol for qualifying working standards. But how is that protocol going to change when suddenly, 10 years on, we have to replace the primary? Seth, perhaps you could jump in on this. I bet you had to do this, I bet you’ve had to replace your primary, because, as Jaana mentioned, we picked the original primary to have the link to clinical material. But clearly, if you’re 10, 15, 20 years on, you may not have that anymore. So what do we do? Do we do something different, or does the primary now just become a very well behaved working reference?
Seth: I mean, I would jump in there. The new primary. You pretty much want to treat it almost as a working, where you’re going to chain that original primary to the new primary. So you are going to do a large N of testing for the potency. And it’s basically what you would do for a working reference standard, if you want to think of it that way. You have that correlation between those two and then be able to justify, if there is a shift, why there’s a shift, and everything else for that. You’re going to have to do the full package. It won’t be just one run. It’s gonna be a big comparator to show that they’re either equivalent, and then if they’re not, have to justify why there is a change and the scientific rationale around that change. But like you said, hopefully it doesn’t happen often.
Nancy: Do you use 90 to 110 as your equivalence bounds, typically? Or do you make them tighter?
Seth: We use the 95 to 105 with the confidence interval containing 100%. The 90 to 110 is also a valid way of going about it, also, long term. And again, I’m not a stats person, but stats wise, they’re almost equivalent when you look at those two, but us personally, we go with the 95 to 105.
Laureen: But I do want to make a plug. I know of a case study and actually multiple cases, that I kind of roll into a single case study, where you get new information about “new” MoAs since you laid down that first primary reference. Anytime you replace that primary I think you need to do your due diligence. You need to go back into the literature. You need to find out if your subject matter experts now say, You know what there are some new mechanisms of action that we now know about, that we didn’t know before. This is going to become especially important as we get to complex product types that might have 100 mechanisms of action. For arthritis, I have arthritis in my knees, and I’m keenly interested in this, but you know that right now they have hundreds of mechanisms of action, they pick what they think are the top three, but you know that the top three could change. I think when you’re replacing a primary and you can’t link it to your clinical trials, I think that’s the time you really have to do your due diligence and determine if there are new proposed mechanisms of action out there. Make sure that you are characterizing that new primary in that space. If you don’t do this and keep up with current knowledge, you’re going to find yourself in a lot of trouble.
Nancy: Actually, Laureen, I think when you make that point, it really, to me, reinforces the whole idea of quality by design for reference material. Because if you have a target reference profile, what you’re saying is we need to update that. So in that framework, you need to update your target reference profile, because we now have a new mechanism. We might have new information. And so it’s the perfect place to go back to what you had, update that, and perhaps not be able to make a direct link, but really kind of start afresh, right, in some way, with some justification based on, you know, sum total of,
Laureen: Yes, I’m not saying I think this will be easy. I really don’t, because now we’re not talking just about analytical results. We’re talking about biological impact. And so I do think, yes, quality by design is a nice approach, but I do think that when we say that we need people to understand that it’s going to be a very biological story, if you will, as well as the design.
Jaana: In such situations, I think it would be very valuable to have the old, preserved, retained samples, or whatever things you have working standards that have been sort of preserved, that you can retain and compare to.
Jane: I was just going to add also product. I mean, it depends on stability and so on. But if you find that you’ve got something that’s appeared, can you show that it was exactly the same in the product? It might be the product retention samples that went into the clinic. It depends on the stability of those. But that is absolutely essential. If you find something a nasty surprise when you’re looking at the new mode of action, can you show that, in fact, your product always had this, you just didn’t know about it.
Laureen: Raffaella, you went and compared your two methods that you put together, and you made the comment to me, “I guess we were lucky, because they were really the same.” Did you have any experience where you were looking at different MoAs also for your product?
Raffaella: In a different company, we had to bridge an in vivo to an in vitro method, but they had two very different mechanisms. That is way more difficult, as you know, when you go to compare the in vivo method with in vitro. That creates another problem, right? In this case, we were very lucky, because we tried to keep the same critical reagents, and there were two binding assays.
Laureen: And part of it, circling back to Jane’s point, is that you were doing this in a relatively short period of time. You had access to all of the retention samples. But to Jane’s point, if you still have those retention samples 10 or 15 years on, and you can demonstrate stability of those samples, it would make your life so much easier and make it more similar to the experience that you had.
Question 6: Question for Jaana: Your talk is interesting to me. I was holistically thinking about it: if you had a potency at 94%, but we assigned 100%, what impact would that change have? For instance, if this is the second working reference standard, and the first one was at 100, what impact would it have when you release products? And do you need to look at those final release potency specs, because now the variability is wider? I’ll be honest, I haven’t run into this case. But pondering the question: what if you did have a 4% difference, or even worse, right? If your initial standard was at 98 and this one’s at 104, that’s a pretty large variance that you’re not accounting for in your final potency release. So that just made me question, and I kind of wanted to get your thoughts on that aspect of it.
Jaana: Yes, you were talking about the possibility of widening the specifications because of the variability of the reference standard. I see your point, and I think these are very interesting sorts of things to think about. I do not think it would be the best solution to widen the specifications, because usually the specifications are set relatively wide. They may be two SD already, and so that might not be the correct way around it, but I see your point. And this, I think, goes more to the point that there is a danger of drifting. What is the correct way of setting the value for the reference standards? I think it’s an interesting thing, and I do know that there are different practices that companies have for that.
Seth: I was curious, looking at Laureen’s chart that she showed earlier, with a little bit of that drift that you had in your demonstration. If you picture an 8% shift, for instance, and with the variability of that assay, you really could kind of push those limits. I agree with you. The release specification limits are much wider than what we use for reference standards, for instance, but I could see that causing some issues later down the road.
Laureen: I remember, Seth, when we were working on the white paper, that you and Matt (Matt works with Seth at Lilly) were always talking about shift and drift. When you think you’ve got something that’s really 104 and you assign it 100, depending on your control of your product, you may actually see a shift. I would think that should be part of our trending, and that’s why you’ve always got to monitor and trend your manufacturing batch releases. And that’s something we as manufacturers have that the reference standard organizations don’t. They are not manufacturing the product on a routine basis. We are. We, the manufacturers, can see, and we can sometimes differentiate, depending on how often we manufacture, between drift and shift.
Seth: Yeah, and when I look at final release potency data in honesty, it generally gets tighter, just because the more you manufacture it, the more you do it, the more you run those assays, you do tend to get better at what you do. So it isn’t necessarily a huge issue. It was just something I was thinking of.
Question 7: Question for Seth: When we were talking about it’s 100 or 104, but we called it 100. You had a comment in the chat about this. Several people have mentioned correction factors. Could you discuss that here? I thought your chat made some good points.
Seth: Correction factors is definitely something I stay away from. When you’re defining that potency versus that primary, that’s a definition, right? You’re comparing that working reference standard to that primary reference standard. Your working reference standard, let’s say for instance, is 96% versus that primary. It’s not a correction, it’s just that you have defined that as 96% of its potency. If I equate it back to a small molecule, if I’m doing purity testing, and I get 99%, and then for the second one, the new standard, I get 95%, you use that new purity value to assess your content. It’s the same type of thing with this potency. So basically, if you have that 96, you would multiply 0.96 times your content value, and that’s what you’re going to use to do your potency testing. It’s not necessarily a correction of any sort. It’s just that you’re using material that’s 96% pure compared to your primary. That’s the way I tend to think about it.
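Seth’s arithmetic can be written out as a minimal sketch. The function name and the 2.0 mg/mL content value are hypothetical, chosen only for illustration; the 96% figure is the one from his example.

```python
def potency_adjusted_content(content_value, assigned_potency_pct):
    """Scale a standard's content value by its assigned relative potency (in %)."""
    if assigned_potency_pct <= 0:
        raise ValueError("assigned potency must be positive")
    return content_value * assigned_potency_pct / 100.0


# Hypothetical working standard: 2.0 mg/mL content, assigned 96% of the primary.
adjusted = potency_adjusted_content(2.0, 96.0)
print(f"Value carried into the potency assay: {adjusted:.2f} mg/mL")
```

The point of writing it this way is that the 96% is part of the standard’s definition, used every time the standard is used, and not an after-the-fact adjustment to sample results.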
Sian: The only times that I’ve come across this where it’s been a challenge is when showing correlation with time. The correction factor has a meaning when you think about the results that you’re generating compared to what you previously had. Therefore I’m not quite understanding you, because that’s how I appreciate the need for a correction factor. But am I thinking about it differently from you?
Seth: Let’s say, for instance, your first working reference standard was at 101, or 100%. Now you have a 4% difference in your second working reference standard. To keep that shift from occurring, you would have to use the appropriate potency value of that material. So the potency may be 4% lower. You will see that 4% in your final release, because that’s what the actual value is. Now, where I could see it getting problematic is if, say, both of them were 96% and you didn’t correct on the first one but you do on the second one. Then I think you can get issues with that.
Nancy: I’m going to talk a bit more about this tomorrow in my talk, obviously, but I think the idea, on the two different sides we’ve been talking about, is: what’s the impact on the results? I like to think of it this way. Imagine our positive control is stable and homogeneous over time. So let’s just say it starts out as 100%. It’s not the primary reference, but it happens to be 100%, right? So we start there, and then, when you assign the value to the working reference, you can take an equivalence approach. If you’re telling me you’re using equivalence bounds of 95 to 105, and the material comes in between 95 and 105 with its confidence interval, you’re going to assign it 100, because you’re saying any difference between 95 and 105 we’re considering not significant. What will happen in reality, though, if it is 96%, is that you will see a bump up in the control on average over time, simply because you’re calling it 100 when it’s actually lower. So you’ll overestimate, or it goes the other way around, right? So then your question, Seth, was: how do we manage that in the spec? And that’s exactly what you have to do. You have to budget for that in your spec, which I’ll talk about tomorrow in more detail, mathematically, how to do that. It doesn’t change the width of your equivalence interval if you use an equivalence model for the release potency, because what you do is you just bump out the release values, but then you have to have a smaller confidence interval as you move those out. So you don’t change the actual concept of what you’re doing at the level of release. You just change the budgeting of the materials, right? You budget a certain amount for bias and a certain amount for variability. It’s a kind of total error, done in a slightly deconstructed, different way, which I’ll show schematically tomorrow.
It works out very nicely, though, and that’s the reason I asked you about the 90 to 110 versus the 95 to 105, because whatever way you set it, you don’t necessarily have to set a correction factor, but you do have to budget in your release spec. Or you don’t budget for it, and you do a correction factor, simply because it’s a relative potency assay. So you’ve established the number very well. There’s no reason to call it 100; you call it whatever it is, and you measure that. The trouble is, how well do you know that mean? I’ll get into it tomorrow, about the concept of trying to show equivalence and not different at the same time. Think about it. One requires a small confidence interval for equivalence, and then to include not different requires a large confidence interval. So who powers to go large and small at the same time? Nobody.
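The “bump up in the control” Nancy describes can be illustrated with a small sketch. It assumes an idealized relative potency assay with no measurement noise, and the function and variable names are hypothetical; only the 96% and 100% figures come from the discussion.

```python
def reported_potency(true_sample_pct, true_standard_pct, assigned_standard_pct):
    """Potency reported for a sample measured against a working standard.

    All three values are percentages relative to the primary reference.
    """
    relative_to_standard = true_sample_pct / true_standard_pct
    return relative_to_standard * assigned_standard_pct


control = 100.0           # stable control, truly 100% of the primary
working_standard = 96.0   # new working standard, truly 96% of the primary

as_nominal = reported_potency(control, working_standard, 100.0)
as_measured = reported_potency(control, working_standard, 96.0)
print(f"Standard labelled 100%: control reads {as_nominal:.1f}%")
print(f"Standard labelled 96%:  control reads {as_measured:.1f}%")
```

Under these assumptions, labelling the standard 100% makes the control read about 4% high on average, which is the bias that either a specification budget or an assigned value of 96% has to absorb.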
Question 8: During the first presentation, it was indicated that if the potency tested within 95-105% we would be able to assign the new RS’s potency as 100%. Could you explain a little bit more about the rationale for doing so? Is it based on accuracy and precision of the assay or something else to decide the interval of 95-105%?
Attendee: When people talk about this, we have a potency result that comes out at, say, 96, with the 90% or 95% confidence interval as 93 to 97, for example, so 100% is out of the range. Then you would say 96% is within 95 to 105%, so I’m still going to assign it as 100%. I don’t quite know how people usually decide that the 95 to 105% range is acceptable. Is that based on your method qualification, accuracy and precision? Most cell-based potency assays have about 15% accuracy/precision, or something similar. So I was wondering how we can say a 4% shift is meaningful or not meaningful. What if the method itself is 50% variable? I am confused: how do we balance, how do we consider, the precision or accuracy compared with the line we draw there? Should we say the range should be more than the accuracy to make it meaningful? I feel that’s probably true, but I don’t know, so that’s my question. It always confuses me. If you could explain more and give suggestions, thank you.
Seth: If you got a result of 96 and your 95% confidence interval did not include 100, you’re going to assign a potency of 96%, because you didn’t meet that requirement of including 100%. All you would do is take your 0.96 times your content value, and that’s what you would use to run your bioassay. I think that was the first part of your question. The second part of it is, what is the appropriate interval to set? And I would be the wrong person to answer that, because I am terrible at statistics, but generally, again, that’s where you want to have, like Nancy was talking about, your quality by design: you have those parameters set before you do your testing, saying this is what you’re going to need. For us, the 95% confidence interval has to include 100 and the result has to fall within that 95 to 105 range, and then you would consider it to be 100%. If you fall outside of that, then you need to have a path for exactly what you’re going to do and how you will assign that new potency. As long as you have that mapped out beforehand in your protocol, then I think you should be fine. And like Nancy was saying earlier, 90 to 110 is another valid range to go with.
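The assignment rule Seth describes can be sketched as a small decision function. This is an illustration under stated assumptions, not anyone’s actual protocol: the function name, the return format, and the choice to return no value when the estimate is outside the bounds are all hypothetical, since the panel only says that path must be predefined.

```python
def assign_potency(estimate_pct, ci_low_pct, ci_high_pct,
                   bounds=(95.0, 105.0), nominal=100.0):
    """Return (assigned_value, reason) for a candidate reference standard.

    assigned_value is None when the estimate is outside the bounds, because
    the protocol has to define that path in advance.
    """
    low, high = bounds
    if not (low <= estimate_pct <= high):
        return None, "outside equivalence bounds: follow the predefined protocol path"
    if ci_low_pct <= nominal <= ci_high_pct:
        return nominal, "within bounds and CI includes nominal: assign nominal"
    return estimate_pct, "within bounds but CI excludes nominal: assign measured value"


# The attendee's example: 96% with a confidence interval of 93 to 97.
print(assign_potency(96.0, 93.0, 97.0))
# A hypothetical result whose interval includes 100.
print(assign_potency(98.0, 95.5, 100.5))
```

Passing `bounds=(90.0, 110.0)` gives the wider alternative mentioned in the discussion; whichever is used, the rule is meant to be fixed in the protocol before testing.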
Laureen: And what range do you go with? I think that’s part of the question. It does get back to what the methods are. I mean, if you’ve got a very precise method versus a less precise method such as an LD50 assay, it’s going to be different. Work with the statistician, and figure out what these values are. They will differ from method to method. We have to do the characterizations, utilize our development tools and understand the variability. This is based on the data and on knowing how many assays we can physically do. Let’s be honest. If you’re running an animal assay, you know, you are going to be limited in the number of assays you can do. There’s no magic number, 95 to 105 or 90 to 115 or whatever you pick. There’s no magic number. You have to have the data to support that.
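To make the link between assay variability, number of runs and achievable interval width concrete, here is a rough sketch. It assumes log-normally distributed relative potencies, independent runs, and a normal quantile in place of a t quantile, so it is optimistic for small numbers of runs; the function names and the 1.05-fold target are hypothetical, and a statistician would choose the real model.

```python
import math
from statistics import NormalDist


def ci_half_width_factor(cv_pct, n_runs, confidence=0.95):
    """Fold-width of the CI around a geometric mean relative potency.

    Assumes log-normal results and uses a normal (not t) quantile, so small
    n_runs will look more optimistic than they really are.
    """
    sigma_log = math.sqrt(math.log(1.0 + (cv_pct / 100.0) ** 2))
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.exp(z * sigma_log / math.sqrt(n_runs))


def runs_needed(cv_pct, target_factor, confidence=0.95, max_runs=10_000):
    """Smallest number of runs whose CI half-width factor is within target."""
    for n in range(1, max_runs + 1):
        if ci_half_width_factor(cv_pct, n, confidence) <= target_factor:
            return n
    raise ValueError("target not reachable within max_runs")


# A 15% CV assay, as mentioned in the question, at several run counts.
for n in (3, 6, 12, 24):
    print(n, round(ci_half_width_factor(15.0, n), 3))
print("runs for a 1.05-fold half-width at 15% CV:", runs_needed(15.0, 1.05))
```

Running the same calculation with a larger CV shows why a highly variable or animal-based method cannot support the same bounds with a practical number of assays.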
Nancy: Well, I mean, I think there are a couple of things here. Just by setting equivalence, if we do equivalence bounds and no correction factor, then whether you choose 95 to 105, or 90 to 110, or something wider, 80 to 125, whatever numbers you choose, what you are saying is you are allowing for that much bias to be introduced into your future lot release, and that needs to be accounted for in your specification if you’re going to do that. If you’re not going to do that, and you’re still going to put in those equivalence bounds, but for a different reason, what you’re saying is we want to make sure the manufacturing process is operating within a certain range. We want our products to be in there. However, we’re going to assign the value based on what we measure. Now, Seth, I’ll get into it with you. I won’t do it now, because I want to have something to fight about tomorrow out of my own talk. But basically, think about it for a moment, when we’re talking about a value and its confidence interval. Okay? So here’s the equivalence bounds, and here’s your center, and now you have a value and you have some confidence interval. If that confidence interval is really wide, meaning you don’t have much confidence in the result, it includes the center point. You’re calling it 100% because you don’t know it very well. But if you have a really good measure, you’re calling it something else, which you then can either apply a correction factor to or accept within some bounds, and you will allow that much bias in your product measurements. So you have to decide, and you can’t make the decision in isolation. Those decisions all link to each other, right, in making the right choice. The problem, like I’ll say, is where we got off, I think, to a bad start, just historically, is we did “not different”.
We try to prove not different, which you don’t prove, yeah. And when you now say, well, we wanted to be equivalent and not different, that doesn’t make any sense. You always get higher statistical power, proving differences and equivalence, with increased sample size. We want to encourage better measurements, right? Always. So we have to rethink, which is what we need to do. And your questions, by the way, take us exactly to that point, and that’s what I’ll be talking about again. I’ll say no more on this, just because otherwise it’ll wear out the topic.
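The tension Nancy points to can be seen in a toy example. It is a simplification that assumes a symmetric confidence interval on the percentage scale and a hypothetical estimate of 97%; the function name is also hypothetical.

```python
def ci_checks(estimate_pct, half_width_pct, bounds=(95.0, 105.0), nominal=100.0):
    """Return (equivalent, not_different) for a symmetric confidence interval."""
    ci_low = estimate_pct - half_width_pct
    ci_high = estimate_pct + half_width_pct
    equivalent = bounds[0] <= ci_low and ci_high <= bounds[1]
    not_different = ci_low <= nominal <= ci_high
    return equivalent, not_different


# Same hypothetical estimate of 97%, measured with more and more runs,
# so the interval shrinks.
for half_width in (5.0, 2.5, 1.5):
    equivalent, not_different = ci_checks(97.0, half_width)
    print(f"+/-{half_width}: equivalent={equivalent}, not different={not_different}")
```

With a real offset from 100%, a wide interval passes “not different” and fails equivalence, while a narrow one does the reverse, so better measurement is rewarded by one criterion and penalized by the other.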
Laureen: This isn’t a new topic we’re talking about, how much do you allow and how much do you budget. I think this is just new terms for what Tim Schofield and Phil Krause discussed about 15 years ago. It’s more to the point of setting specifications, is how they couched it, but they had cushions on either end. And I think that’s what you’re talking about when you’re budgeting. You budget this. You budget that. How much of a cushion did you put in, and whether you can clinically support that. For vaccine products, you often can, but for other interleukin products, you oftentimes clinically cannot, and so a lot comes back to that. That’s not statistics, that’s the clinical characteristics.
Question 9: Would it be acceptable to qualify material from an engineering run (new commercial manufacturer) as a PRS if material/manufacturing comparability has been established? There is no current 2-tiered RS system. If so, could the WRS then be established/qualified from a PPQ run? Or would it be better to establish the 2-tiered system from 2 PPQ runs? The engineering run is non-GMP if that is a consideration. This is a current commercial product.
Attendee: My question is related to batch selection. We have a commercial product that was approved 15-plus years ago, and it currently does not have a primary reference and there is no two-tiered reference system. We’re in the midst of a manufacturing process transfer to a new manufacturing site and looking to establish a two-tiered system going forward. My question was whether it would be appropriate to use the engineering batch at the new manufacturing site for establishment of the primary, or if we needed to select a PPQ batch.
Seth: This is the question we get every day. I let the scientists and the ones doing those engineering runs help me understand the potential variability risks of using that engineering batch versus going to one of the PV batches, which is what we would call the process validation batches. Engineering batches are generally smaller, 10% scale, right? And then you go up to a process validation batch, which is generally at least full scale, or at least half of what you would do as a normal batch. I generally lean on those scientists to help understand the impurity profiles and everything else that could potentially increase or decrease by going up to that pivotal batch. This helps me determine which material I want to use. Again, you want to use material that is representative of the batches being produced. So I tend to push back on those process validation scientists more and make them tell me which batch I should really use.
Attendee: Okay, in this case, it is the full commercial scale. Our engineering batch is manufactured at that full commercial scale. We did a pre-comparability evaluation for pre-change versus post-change. It’s not the formal comparability, but at this point in time, the new batch is consistent with the pre-change batch. Part of my question also revolves around supply. You know, the process validation batches will be the commercial batch size as well. Thinking about a 20-year supply of a primary bank, we obviously have more material to use from the engineering batch than we would from a potential commercial batch. So I’m just wondering if there are extra considerations to be taken into account if we did select an engineering batch.
Seth: Looking at those risks, I would say the risk seems low that there would be much of a change. As you mentioned, you could probably get more material from that engineering batch. And again, that’s fine as long as you outline those risks beforehand and know that you may have to implement a new or different standard at a later stage if, for some reason, it turned out to be different. But it sounds to me like the risk is fairly low, so I would probably move forward with the engineering batch.
Laureen: During the white paper writing process we discussed a similar question, but it was a little bit before this stage. It wasn’t trying to address this issue at this stage. I remember the discussion, and Jane, you referenced this earlier when you stated, “I had never realized that you shouldn’t split a batch and put some of it down for the references, some of it down for the working.” Seth, you and Matt were very strong on that while we were writing the white paper. I can remember being absolutely shocked by this because my experience was predominantly that we were told not to have two different batches. We were told to split that pivotal batch as opposed to having two batches. I’ve seen both approaches being used. However, I remember it being specifically recommended, during negotiations of our dossier, that we should be splitting that first reference into the PRS and WRS. We were requested not to use this first WRS for more than a year, but they definitely wanted both of those because they wanted to get a better sense of how well the bioassay was working.
Seth: The batch was acceptable until it wasn’t. I think scientifically, it makes sense that you want those different batches, because you want to see the differences between those. But again, a learning experience, right?
Nancy: The whole concept of representativeness: what makes something representative? If your spec is 80 to 125, is having something at 85 representative?
Laureen: There is also the concept of clinically validated. We’re saying representative, but for that primary reference the real key is what Jane said: it needs to be clinically validated. Clinically validated means you took that primary material from a pivotal batch which went into patients to demonstrate efficacy. That’s what we really want for that primary.
Nancy: So that goes back to the question previously asked. If you haven’t put that engineering run into the clinic, that doesn’t meet that requirement for a primary. It’s representative by other clear mechanisms of comparison. So I think that becomes really, really important.
Laureen: At some point we have to do that. Jaana was mentioning that at some point, you can’t get back to those clinical materials. You just can’t. It can happen in two months or two years or 20 years, but at some point you won’t be able to get back. That’s where you have to say it’s representative of what we put into the clinic. But that first primary reference material should be clinically validated if at all possible. I have run into some cases where we couldn’t do that, and then we had to demonstrate that it’s representative. But to begin with, clinical validation is always our first preference. Sian, have you seen the same sort of thing?
Sian: I honestly wouldn’t want to comment. We’ve been at the other end of it, which is the testing side, looking at the results and looking at what comparability means. And it’s always the bioassay which is the pain in the neck. You know, that’s always the one that poses the most challenges. But as I understood from what Melissa was saying, it was already a commercial product. They had already demonstrated what that reference standard ought to look like. Therefore, as Seth was saying, it’s a low risk, because you’ve got so much history, right? You know, you’re making a change, but you’ve done all that comparability. Therefore, you know, it’s carrying on that representativeness.
Laureen: What you’re saying is that basically when we lay down that first primary reference, it is clinically validated, but when we move on, the working standards are not clinically validated, and at some point we run out of the primary, so you have your second primary reference, which is no longer clinically validated. It is what you’d said earlier, right? The replacement primary reference is essentially a working reference that may have more biological testing, especially if we’ve updated our characterization protocols.
Question 10: If an in-house reference standard (RS) is to be used at several testing labs in potency assay(s), should all these labs be involved in the qualification of this reference standard? If yes, could you provide a recommendation for a better design for the RS qualification study (more specifically: how potency of RS is assigned, based on the combined results of all labs or potency of this RS is assigned for each lab based on the RS qualification results of each lab separately).
Jaana: I’m less experienced than Jane in this respect, but I think the more the merrier. Of course, it’s a benefit. If you can do it in several laboratories, it gives you better confidence in the actual value of the reference material. It wouldn’t be the way to go to define a different value for each participating laboratory. The whole idea is to take all the data together and draw conclusions from that combined data.
Jane: I think we’re talking here about a commercial setup where presumably you’re going to be using nominally the same assay in the different labs. Is that the case? Can we ask: are you nominally using the same assay?
Laureen: Let’s assume they are using the same assay.
Jane: That’s somewhat different from the collaborative studies for most of the WHO work. But yes, the input from each laboratory would be good, because you might find an anomaly in either the type of bioassay that you are using, or the way a particular bioassay protocol is handled in each lab might result in differences in how they are actually doing the assay. You can also have variation in things like the water supply and so on, which might show up as an anomaly. I would say it’s definitely worth having each lab contribute to the study, depending on the results you get and what you want to achieve. You will assign a certain unitage that everybody is going to be using, but the study will actually have highlighted some potential difficulties. Does that make sense?
Laureen: I have to say that this was a late-stage clinical development case study, but one where the company had three different facilities. They were moving really fast, and so they just sent reference material out and each lab basically set its own reference value. They were pretty close, but not identical. We noticed, as they continued to use those values and got towards commercialization, that there was a bias. If you ran the potency in Lab A and you ran those same samples in Lab B, you got a consistent bias, similar to what Raffaella was showing where she had that consistent 10% or 5% bias between the assays. We saw that site to site, and it really did create problems when one lab decided they were going to become the stability lab. So everyone sent their stability samples to them, and they had a reference that was assigned a slightly different value, even though it was the same reference. And all of a sudden, right in the middle of our stability studies, things went crazy. This was not a commercial product. I’m hopeful it would never have happened in a commercial environment.
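The site-to-site bias described above can be illustrated with simple arithmetic. This is only a sketch with hypothetical numbers: it assumes the reported potency of a sample is its potency relative to the reference multiplied by the value assigned to that reference, and the function name `reported_potency` is illustrative.

```python
def reported_potency(relative_potency, assigned_reference_value_pct):
    """Reported potency (%) = potency relative to the reference x value assigned to the reference."""
    return relative_potency * assigned_reference_value_pct


# Hypothetical case: the same sample, measured at 0.98 relative to the same reference vial,
# reported by two sites that assigned different values to that reference.
lab_a = reported_potency(0.98, 100.0)  # Lab A assigned the reference 100%
lab_b = reported_potency(0.98, 105.0)  # Lab B assigned the same reference 105%

# The difference between sites is a constant proportional bias, independent of the sample.
bias_pct = (lab_b / lab_a - 1.0) * 100.0
print(round(lab_a, 1), round(lab_b, 1), round(bias_pct, 1))
```

Under these assumptions, every sample tested at Lab B reads about 5% higher than at Lab A, which is why moving stability samples from one site to another midway through a study can look like a sudden step change.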
Jane: But the thing is, if you are going to introduce another site at a later stage, that’s the same as not having had them in the study. One of the things you need to do, if you introduce a new site, is see what their results look like on the protocol, because a nominally identical protocol is not going to have exactly the same assay conditions. You have to see whether it actually works. Oh, yeah, we’ve had...
Nancy: Whenever you’re combining results to assign a value and calculate a confidence interval, in this situation you’ll have lab-to-lab variability. If you only use one lab, you’re missing that variability, which truly exists. You’re just ignoring it. You really should be factoring that in, which will require a larger sample size. Then, for some of the things you were talking about, Laureen, eventually those biases will converge, averaging out, and everybody will be closer, with less bias between them. The average will have taken into account all of the bias. Do you see what I’m saying? But you’ll have to have a large enough sample size to get the independent N that you need for the confidence level to be meaningful.
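A minimal sketch of the point about lab-to-lab variability, under stated assumptions: the results are hypothetical, the design is balanced (the same number of runs per lab), the analysis is on the log scale, and each lab’s geometric mean is treated as one independent unit. The function names are illustrative, not an established procedure.

```python
import math
import statistics

# 0.975 quantiles of Student's t (two-sided 95% interval) for small degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def ci_from_logs(log_values):
    """Geometric mean and 95% CI, treating each log value as one independent unit."""
    n = len(log_values)
    mean = statistics.mean(log_values)
    half_width = T95[n - 1] * statistics.stdev(log_values) / math.sqrt(n)
    return math.exp(mean), math.exp(mean - half_width), math.exp(mean + half_width)


def single_lab_ci(runs_pct):
    """CI from one lab only: it reflects run-to-run variability within that lab."""
    return ci_from_logs([math.log(r) for r in runs_pct])


def multi_lab_ci(results_by_lab):
    """CI from lab means: the independent N is the number of labs, not the number of runs."""
    lab_means = [statistics.mean([math.log(r) for r in runs])
                 for runs in results_by_lab.values()]
    return ci_from_logs(lab_means)


# Hypothetical potency results (% of the current reference) from three labs.
results_by_lab = {
    "Lab A": [101.0, 99.0, 100.0, 102.0],
    "Lab B": [106.0, 104.0, 105.0, 107.0],
    "Lab C": [95.0, 97.0, 96.0, 94.0],
}

print("Lab A only:", single_lab_ci(results_by_lab["Lab A"]))
print("All labs:  ", multi_lab_ci(results_by_lab))
```

With these made-up numbers, the single-lab interval is narrow because it ignores the differences between labs, while the multi-lab interval is much wider with only three labs contributing. More independent labs would be needed to narrow it, which is the larger sample size being described.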
Sian: I’ve never done that, though. I’ve never qualified a reference standard from multiple labs. Seth, have you?
Seth: No. A lot of the time we’ll use the drug substance site because they’re the ones doing 95% of the testing. Sometimes we’ll have a drug product site that does 5% of the testing, right? If you have a site that you just qualified, would I want to include them in that testing? I probably would not for the reference standard. The risk might be a little bit too high for that. You know, throwing out your reference standard because you have a brand new site that’s trying to familiarize itself with the method? Maybe not. It might depend on what you’re trying to do with that reference standard. Reevaluation is to help determine: is it shifting, is it degrading? What’s your purpose there? You choose the site that’s most familiar with the bioassay, not the site that has the least familiarity.
Jane: You can weight the results. You can assess your results when they come in. I think you need to know what each site is doing. Basically, if you have one site that can’t tell the difference between the black, sticky stuff and the nice, transparent stuff you’ve sent out, then are you going to actually believe their results later? It’s going both ways, the qualification of the material.
Sian: But you use different mechanisms to answer those questions. You certainly wouldn’t use a reference qualification, because it’s too high risk. You would definitely do it with an assay transfer or a qualification where you’ve got multiple sites, but I certainly wouldn’t do it around a reference qualification. It’s too expensive.
Jane: You’ve got the experience. In the ideal condition, you would get data back from each lab, and then you would weight it. Based on what you saw, you’re going to let that be the determining qualification. If somebody’s results show an enormous spread, then you’re not going to use them. Instead you will say, ‘Okay, this is different, or isn’t different.’ Ideally, you would actually have input from several labs. Otherwise, I think you’re not going to know whether one lab is actually able to detect something.
Seth: Yeah, I mean, another thing to think about, right? You’re talking about a large N. If you have one small site, you may be quadrupling their testing for the year because of that one study, and they may not have the manpower.
Jane: Ideal situation. Yes, you come across practicality. But in an ideal situation, you would have input from multiple sites. It may not be practical. You may know that one site doesn’t have the capability, either the resources or the actual ability to do it at that point, if it’s a new lab, and...
Laureen: I was pointing out a different problem. I don’t think we should ever let each of these individual sites take the same reference and give it a different value assignment according to their site. That’s what I was talking about earlier, and it’s a different issue than what this conversation has evolved into. I do agree that we use the assay transfer and look for differences there, not with our reference. We, as a pharma company, are in a slightly different situation than a standards organization because we have more control. We can go in and say, “What are you using? Why did you freeze it that way? That’s not what we meant in this protocol.” It’s a joke in the industry that if you’re starting to test the water in Lab A and Lab B, then you have no idea what’s going on.
Jaana: Yeah, if you think about the WHO manual, the different options are all there. And if you think about the establishment of a WHO international standard, there may be 100 laboratories using their in-house methods to assign the value, and then a smaller cluster within that exercise may be, let’s say, the European Pharmacopeia group that has, say, 10 laboratories. They then usually use the Pharmacopeia method, and that is treated as a little cluster, but included in the international standard establishment. If you think about manufacturers with their QC laboratories, it may all be happening inside, and I think that would be the usual way to do it, in one laboratory, as you have just described.
Question 11: Would this be where the assay controls from the potency assay would be considered?
Attendee: So is the recommendation that you should centralize any bioassay testing for any entity? Not just the reference, but wider.
Sian: It depends on how successful your product is, how many different countries you’re going to go into, and what labs you’ve got. We’ve definitely made the case for centralized testing, as you have greater control over your assay and over your results. But sometimes you just can’t do that. Sometimes, depending on where your product is going, you have to have testing in that region, like Russia, China, or Europe.
Laureen: Excuse me, if we test it in the US, it gets stopped at the European borders because they will not accept our bioassay results. They have to be tested somewhere in the EU, right?
Nancy: The other thing we have to include is distributed manufacturing for other types of products: products where you have multiple sites, where you are going to be testing at point of care. That automatically requires that you have separate labs.
Laureen: I agree with you. We put in a lot of time and resources into transferring these methods, and some of them don’t transfer very well.
Attendee: How well do you control that? And at what point do you say...? It goes back to Sian’s point: what’s the demand on the testing? How many territories do you have to factor in?
Laureen: We used to call them centers of excellence. And maybe you have three centers of excellence for this assay, or particular types of assays, in different geographical regions. And you work really hard to get those three sites to work the same and your references come from a different part of your organization, but they will work closely with these bioassay centers.
There were many more questions from the audience, many of them about how many runs to do for a reference qualification, how to assign values, and more. We will address many of these tomorrow.