BEBPA Blog
Virtual Reference Material Conference: Your Questions Answered
Thank you to those who attended BEBPA’s Virtual Reference Material Conference in June 2024. Due to time limitations, some questions were left unanswered at the end of Conference Days 2 and 3. Nobody likes unanswered questions! So we asked our speakers to provide answers, which you can find below.
Q for all: If you have extra Interim Reference Standard left over (e.g., due to changing the Interim Reference Standard after a process change), do you archive it or destroy it?
Kim Dancheck: Archive it so it is available for historical comparisons and studies.
Ken Miller: My recommendation would be to retain the remaining Interim Reference Standard and to have this proceduralized in an SOP.
Q for all: In the 2013 CMC Forum (source of the 2014 paper) it was FDA OBP who noted the option of keeping the first Interim Reference Standard lot for potency assays for clinical continuity IF (1) process changes passed comparability studies and (2) the Interim Reference Standard was shown to be stable. Comments from the panel?
Kim Dancheck: Yes. Not all process changes would require implementation of a new Reference Standard batch.
Q for all: Do ANY of you use an Assay Control, in addition to a Reference Standard, on every plate?
Kim Dancheck: Yes.
Ken Miller: Yes. The Assay Control can then be used to monitor method performance in terms of % relative potency.
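As an illustration of monitoring method performance with an assay control, here is a minimal Python sketch of Shewhart-style control limits on the assay control’s % relative potency. It assumes %RP is roughly log-normal (so the limits are set on the log scale and back-transformed) and uses mean ± 3 SD of a baseline set of runs. The function names, baseline values, and 3-SD rule are illustrative choices, not a procedure described by the speakers.

```python
import math
import statistics


def control_limits(baseline_rp, k=3.0):
    """Return (lower, centre, upper) limits for assay-control %RP.

    Limits are computed on the log scale and back-transformed (an assumption).
    """
    logs = [math.log(rp) for rp in baseline_rp]
    centre = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return math.exp(centre - k * sd), math.exp(centre), math.exp(centre + k * sd)


def flag_runs(new_rp, limits):
    """Return the indices of runs whose assay-control %RP falls outside the limits."""
    lower, _, upper = limits
    return [i for i, rp in enumerate(new_rp) if not lower <= rp <= upper]


# Hypothetical baseline of assay-control %RP results
baseline = [98.0, 102.0, 100.0, 101.0, 99.0, 100.0]
print(flag_runs([100.5, 130.0, 97.0], control_limits(baseline)))  # [1]
```

In practice the baseline would need to span enough runs, analysts, and reagent lots to represent routine variability.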
Q for all: For the Primary Reference Standard, how is stability assessed, and against which Reference Standard? The Secondary Reference Standard? Do you only run curves and monitor the curve parameters? What are the acceptance criteria?
Kim Dancheck: Long term, moving away from bioassay curve parameter monitoring and relying on a holistic approach using orthogonal information is best. Curve parameters can change with different equipment, critical reagents, labs, analyst technique, manual vs. automated execution, etc. The likelihood of keeping all of these stable and constant long enough to assess the continued stability of the Primary Reference Standard over decades is very low. If the Primary Reference Standard can be used for routine testing for some period of time, a robust baseline of curve parameters, as well as relative potencies of test samples or the assay control sample against the Primary Reference Standard, could be built for comparison in the distant future. Additionally, when the working reference standard is qualified or requalified against the Primary Reference Standard, the data provide another check for any trends that could indicate degradation of the primary standard. The requalification also includes tests known to detect modifications that have an impact on potency. Modifications known to affect potency, such as clipping, other modifications, oxidation, and aggregation, should be assessed by physicochemical testing to confirm the continued suitability of the Primary Reference Standard.
Q for all: Do you always use an absolute test method for potency testing of the reference standard? If not, were you asked by authorities to have an orthogonal absolute method (e.g., mouse LD50) to verify the potency?
Kim Dancheck: Absolute quantitative acceptance criteria (e.g., on EC50) have been requested by regulatory authorities. However, because the curve parameter values generated in a potency assay can change (even significantly) while the reference standard remains stable, this has not been seen as best technical practice. Such changes, which are unrelated to any change in the suitability of the reference standard, can arise from changes in consumables, critical reagents, assay controls, instrumentation, sites, and/or the analysts performing the run.
Q for all: Finney (1978) says a few things that don’t seem to be getting enough attention: ‘The fundamental assumption in bioassay is that the reference and test samples contain the same analyte.’ Bioassay is extraordinarily sensitive to some (perhaps unpredictable) differences in active compounds; this is part of why similarity is so important. Given this, why not exploit it as part of identity testing?
Kim Dancheck: Depending on the asset and the markets in which it is to be registered, many medicines do have an identity specification to confirm the compound of interest, in addition to a relative potency specification.
Q for all: I’ll agree with David (a rare thing, 😁) that we cannot make assumptions, including about the stability of BRPs for emerging Biologics. How can we (the field) get access to data on the validation of the various bioassays used in study labs to establish BRPs, as well as the bioassays used at the standards agency, to confirm they are appropriately stability-indicating?
David Lansky: Bioassay, process, and reference standard validation/qualification reports (and data) are usually considered confidential proprietary information that are disclosed to regulators when and as required. Multi-site or round-robin (ideally with coded samples) studies (e.g., those published by NIBSC) are particularly valuable in showing how different versions of assays and reference standards compare to one another in different labs. Publishing validation studies helps us all. Some ‘white papers’ that describe best practices are helpful. The most useful include example data and analyses (even if artificial) and use evidence or principles to justify claims of ‘best practice.’
Q for all: Question to the audience: Dean mentioned using Kjeldahl to determine total protein. What other methods are you using to measure total protein of biologic reference standards?
Jon Valgeirsson: It seems to me that amino acid analysis (AAA) is currently the most widely used method.
Q for all: If you attempt to qualify a Reference Standard with a lyophilized formulation, do you have to include a homogeneity study (shelf/position) in the qualification?
David Lansky: No, but including homogeneity (against shelf position and other potentially important location, sequence, or batch effects) will strengthen claims of suitability for use. Separate studies (e.g., robustness) can address these concerns. Note that it can take a rather large study to deliver narrow confidence intervals on estimates of potentially important robustness factors such as location or sequence; that fact can make it more effective and efficient to include homogeneity testing as part of ongoing monitoring rather than as part of a (relatively small) validation or qualification.
Q for all: Is there a benefit to using relative potency (%) rather than absolute values with defined units? Could the use of absolute values reduce the likelihood of drift when bridging Reference Standards, by making it possible to relate data back to Reference Standard lots that are no longer in inventory and cannot be tested directly?
David Lansky: Nearly all biological assays have appreciable variation in (log) EC50 from assay to assay. While this variation may appear small in a well-managed lab over relatively short time periods (e.g., a few years), large variation in (log) EC50 is probably the most important common characteristic of biological assays. This large variation is why bioassay design and analysis focus on specific within-assay (and in some cases within-block) comparisons between reference standards and test samples (where the comparisons of interest are estimates of non-similarity and log relative potency). There are good ways to combine either log relative potencies or low-level (e.g., well) observations to simultaneously model the degradation of different reference standards and production lots, along with shifts associated with batches of critical reagents and variation associated with assays, analysts, etc., as illustrated by the ‘comprehensive stability analysis’ in my presentation. This is a much better way to combine information across many assays, many reference standards, and many other lots, to effectively estimate the (at-release) relative potency of any new lot compared to the initial potency assigned to (say) an early reference standard, even when one or more earlier reference standards are no longer available.
Q for Ken: What is the difference between a development and an Interim Reference Standard?
Ken Miller: There is no official definition for Development Reference Material (DRM) and Interim Reference Standard (IRS); however, in the case study that I presented, the DRM would be the initial reference material and be derived from the GLP tox material, while the IRS would be derived from the Phase 1 clinical trial material. In this situation, the assignment of relative potency for the DRM would be set to 100% as it is the initial material, and the assignment of relative potency for the IRS would be based on testing IRS (test sample) against the DRM (reference standard).
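The arithmetic behind this kind of assignment is simple but worth making explicit. Below is a minimal Python sketch, assuming each new standard is value-assigned by multiplying its parent’s assigned potency by its mean %RP measured against that parent. The function name and the 96% and 103% results are hypothetical, and the second step (a later standard tested against the IRS) goes beyond the case study, which only describes the DRM and the IRS.

```python
def assigned_potency(parent_potency_pct: float, rp_vs_parent_pct: float) -> float:
    """Potency assigned to a new standard tested as a sample against its parent.

    The parent's own assigned potency carries through multiplicatively.
    """
    return parent_potency_pct * rp_vs_parent_pct / 100.0


# The initial material (DRM) is assigned 100% by definition.
drm_pct = 100.0
# Hypothetical mean %RP of the IRS tested against the DRM.
irs_pct = assigned_potency(drm_pct, 96.0)
# Hypothetical later standard tested against the IRS inherits the IRS assignment.
next_pct = assigned_potency(irs_pct, 103.0)
print(irs_pct, round(next_pct, 2))  # 96.0 98.88
```

Some programs instead aim to re-assign 100% at each step by choosing a closely matched batch, as discussed in Isabelle’s answers below.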
Q for Ken: Any advice about primary reference standard selection for cell therapy DP with relatively short shelf-life?
Ken Miller: You might want to consider not implementing a two-tier approach (PRS and WRS) in this situation, since the Primary Reference Standard would have to be replaced more frequently. You would then need to take that into account when preparing the Primary Reference Standard, since it would be needed both for qualification testing of the next Primary Reference Standard and for routine QC testing.
Q for Ken: For a non-cell-based binding assay, do you force parallelism to determine EC50?
Ken Miller: Testing and confirming parallelism between the concentration-response curves of the reference standard and the test sample is essential for determining relative potency in any bioassay, regardless of whether it is a cell-based or non-cell-based binding assay.
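For readers less familiar with how parallelism and relative potency fit together, here is a minimal Python sketch of a linear parallel-line analysis. It assumes responses are linear in log10 dose over the doses used (many real bioassays need a four-parameter logistic model instead), estimates log relative potency as the horizontal distance between the two lines under a common slope, and reports the ratio of separately fitted slopes only as a crude parallelism indicator, not as a formal equivalence test. Function names and data are illustrative.

```python
import math
import statistics


def _centred_sums(x, y):
    """Return (Sxx, Sxy, mean x, mean y) for one preparation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, sxy, mx, my


def parallel_line_rp(log_dose_ref, resp_ref, log_dose_test, resp_test):
    """Relative potency (test vs. reference) from a common-slope linear model.

    Returns (relative_potency, slope_ratio); slope_ratio is the separately
    fitted test slope divided by the separately fitted reference slope.
    """
    sxx_r, sxy_r, mx_r, my_r = _centred_sums(log_dose_ref, resp_ref)
    sxx_t, sxy_t, mx_t, my_t = _centred_sums(log_dose_test, resp_test)
    # Pooled (common) slope across both preparations
    slope = (sxy_r + sxy_t) / (sxx_r + sxx_t)
    intercept_r = my_r - slope * mx_r
    intercept_t = my_t - slope * mx_t
    # Horizontal shift between the parallel lines, in log10 dose units
    log_rp = (intercept_t - intercept_r) / slope
    slope_ratio = (sxy_t / sxx_t) / (sxy_r / sxx_r)
    return 10 ** log_rp, slope_ratio


doses = [0.0, 0.5, 1.0, 1.5]  # log10 dose
ref = [1 + 2 * x for x in doses]
test = [1 + 2 * (x + math.log10(1.25)) for x in doses]  # behaves like 1.25x the dose
rp, ratio = parallel_line_rp(doses, ref, doses, test)
print(round(rp, 3), round(ratio, 3))  # 1.25 1.0
```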
Q for Isabelle: You had process changes after 2nd Interim Reference Standard. Should the Primary Reference Standard be selected for similarity to current product rather than 2nd Interim Reference Standard?
Isabelle Meira Silva: This is very dependent on the process change that took place. It may be that the process change has no impact on potency. In our case, although the first batch we tried did not work (3rd Interim Reference Standard), we are actively attempting to find a batch from the current process that matches the 2nd Interim Reference Standard potency. We attributed the mismatched potencies to the normal batch-to-batch variation we see from the process.
Q for Isabelle: After process development, the 2nd Interim Reference Standard is much more potent (e.g., 20% more potent) than the 1st Interim Reference Standard. Can we use the 2nd Interim Reference Standard to replace the 1st Interim Reference Standard?
Isabelle Meira Silva: You can implement the new higher potency Interim Reference Standard if you think it is appropriate, but you must bridge it back to your previous one to understand the differences and impact to the data. You also have to be careful with ongoing stability and the implementation of a new Interim Reference Standard with different potency. At the end of the day, you just need to explain and de-risk your decision to implement the new Interim Reference Standard and tell a full story tying it back to the initial Reference Standard you used.
Q for Isabelle: If you have a correction factor between interim reference standards, is the potency of the Primary Reference Standard assigned as a fixed value based on a definition, or relative to the Interim Reference Standard (potentially with a correction factor)?
Isabelle Meira Silva: In our case, we are trying to match the potency of the primary as closely as possible to the current Interim Reference Standard so that we can assign 100% again. Ideally, we are looking for a batch that is within 5% of the current Interim Reference Standard. If that proves impossible, we will have to resort to applying a correction factor to tie it back to the Interim Reference Standard (but that is not desirable).
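A minimal Python sketch of the decision described here, assuming "within 5%" means the candidate’s mean %RP against the current Interim Reference Standard lies within 5 percentage points of 100%. The function names are hypothetical, and the correction-factor direction shown (multiplying results obtained against the new standard by its potency relative to the old one) is an assumption about how such a factor would be applied.

```python
def assign_candidate(rp_vs_current_pct: float, tolerance_pct: float = 5.0):
    """Return (assigned_potency_pct, correction_factor) for a candidate standard.

    rp_vs_current_pct is the candidate's mean %RP measured against the current
    standard, which is itself assigned 100%.
    """
    if abs(rp_vs_current_pct - 100.0) <= tolerance_pct:
        # Close enough: assign 100% and carry on with no correction.
        return 100.0, None
    # Otherwise keep the measured value; results reported against the new
    # standard are multiplied by this factor to put them on the old scale.
    return rp_vs_current_pct, rp_vs_current_pct / 100.0


def to_old_scale(rp_vs_new_pct: float, factor: float) -> float:
    """Express a %RP measured against the new standard on the old standard's scale."""
    return rp_vs_new_pct * factor


print(assign_candidate(97.0))  # (100.0, None)
pct, factor = assign_candidate(92.0)
print(pct, round(to_old_scale(100.0, factor), 2))  # 92.0 92.0
```

The second case shows why a correction factor complicates trending: every result against the new standard has to be rescaled before it can be compared with historical data.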
Q for Isabelle: If you value assign your 2nd Interim Reference Standard, now you qualify the 3rd Interim Reference Standard – do you determine you are within 5% of 100% and then give it that value? (You would be doing this even though there is no 100% Reference Standard in your assay.)
Isabelle Meira Silva: Based on our internal assessment (with support from our statistician), yes. We saw that a difference of less than 5% has no practical impact.
Q for Isabelle: Do you always perform the same number of potency assays for annual requalification as during initial qualification? And is there a defined reference for this re-testing? Thank you!
Isabelle Meira Silva: During initial qualification we perform at least 12 assays. At requalification we perform only one assay and ensure the result is within the confidence interval of the initial data set. For bridging we perform more, and we base the number on the degree of variability we expect the assay to see in real-world use. For example, if more than one lab performs the assay, we include all of them; if 5 analysts will perform the assay, ideally we include data from each, etc. The last bridging study we did included 32 data points.
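One way to make the single-assay requalification check concrete is sketched below in Python. It assumes the "confidence interval of the initial data set" is implemented as a prediction interval for a single new result on the log scale, since a confidence interval for the mean of 12 assays would be much narrower than the spread of individual assays. The interval type, function names, example data, and the t value supplied by the caller are all assumptions.

```python
import math
import statistics


def requal_range(initial_rp_pct, t_crit):
    """Range expected to contain one future %RP result (log-scale prediction interval)."""
    logs = [math.log(v) for v in initial_rp_pct]
    n = len(logs)
    centre = statistics.fmean(logs)
    # Prediction interval: include both the new result's and the mean's uncertainty
    half_width = t_crit * statistics.stdev(logs) * math.sqrt(1.0 + 1.0 / n)
    return math.exp(centre - half_width), math.exp(centre + half_width)


def requal_passes(new_rp_pct, initial_rp_pct, t_crit):
    """True if a single requalification result falls inside the expected range."""
    low, high = requal_range(initial_rp_pct, t_crit)
    return low <= new_rp_pct <= high


# Hypothetical initial qualification results (n = 12)
initial = [96, 104, 99, 101, 97, 103, 100, 98, 102, 100, 95, 105]
# 2.201 is the two-sided 95% t value for 11 degrees of freedom.
print(requal_passes(101.0, initial, 2.201), requal_passes(120.0, initial, 2.201))  # True False
```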
Q for Isabelle: For basing the potency for the Primary Reference Standard and Secondary Reference Standard on that for the 2nd Interim Reference Standard, if there was a process change between the 2nd Interim Reference Standard and 3rd Interim Reference Standard, is it acceptable to base the potency for the Primary Reference Standard on an Interim Reference Standard lot (probably) not representative of the current manufacturing process?
Isabelle Meira Silva: Yes, in our case. Although there was a process change, we confirmed that the process/product was comparable. Here, the potency value has more to do with continuity of data than with representing the process. If, at the end of the day, we cannot find a batch from the new process that is within 5% of the 2nd Interim Reference Standard potency, we will use a correction factor to translate between data generated with the old reference and the new one. But this is not preferred, and it gets complex, especially when trending ongoing stability studies that include assays run against both reference standards.
Q for Kim: For the first Interim Ref Std, how is stability assessed, and against which RS or batch? Do you only run curves and monitor the curve parameters? What are the acceptance criteria?
Kim Dancheck: Before the standard is filled, a batch on which representative stability studies are being performed is chosen as the starting point, and the Reference Standard batch is stored at conservative conditions to enhance stability. In this way, we do what we can to support the assumption that the batch is stable. For the pre-primary reference standards, curve parameter monitoring is one set of data examined to assess stability. Part of asset development, or quality by design, as well as broader platform knowledge built up over time, is knowing which attributes measured by non-potency methods could have an impact on potency (making the material sub-potent or super-potent). So in addition to monitoring potency curve parameters, physicochemical properties are assessed for trends. Lot release and stability (nominal temperature) results can be assessed as well (e.g., if the RS is losing potency, the sample values should start to trend higher), but in early development there may not be many batches manufactured. If relying heavily on plate control trending, there needs to be readily accessible information about each run, i.e., which consumables, critical reagents, instrumentation, sites, and analysts were involved, since curve parameters can shift for reasons unrelated to reference standard stability/suitability. In the long term, for the Primary Reference Standard, moving away from bioassay curve parameter monitoring for the purposes of RS stability is desirable. Curve assessments are part of method maintenance and control, but once the independent WRS is in place, relying on Primary Reference Standard physicochemical testing and trending, plate control vs. Primary Reference Standard trending, WRS vs. Primary Reference Standard testing, and any studies using the Primary Reference Standard is preferable, giving a holistic, long-term picture of Primary Reference Standard stability.
Q for Kim: If you have multiple potency release assays, do you use a single Reference Standard for all assays? Would you need to value-assign for each method? And most importantly, is it okay in this system to have very different values? (For example, you have 3 methods and 2 are close to 100% but the 3rd is much lower.) If you allow this, is there anything special needed to qualify that Reference Standard?
Kim Dancheck: Ideally, yes. If, when replacing an Interim Reference Standard or after a process change, it becomes desirable to make a new Reference Standard specific to one potency method (e.g., because of differences in ADCC results vs. CB results), that can be done. All Reference Standard batches need to be included in the dossier and explained, which is another reason to limit the number of Reference Standard batches, or to define the intended use of a Reference Standard batch so that it covers the majority of testing. When establishing the initial Primary Reference Standard, it would be desirable to start the baseline over with one batch for all test methods; bridging testing will be important. By the time the independent WRS is being made, if certain properties have a large effect on one potency method vs. the others, then picking a batch with properties that should result in the same potency for all tests (e.g., all 3 methods) should be pursued rather than correction factor processes. It might not always work out, but testing some in-date DS batches with higher replication could inform the choice of the best source batch for a new Reference Standard batch.
Q for Kim: Have you had any regulatory pushback on choosing the 1st Interim Reference Standard from a mixed pool of clones? Assuming your clone material is from a single clone.
Kim Dancheck: Generally we have a single clone for the tox batch, so the first Interim Reference Standard isn’t from a mixed pool of clones. There were instances during the pandemic when the Reference Standard batch was a mixed pool of clones, and this was acceptable for the particular situation and timing needed for emergency authorization situations. I’m not sure what pushback we might have for routine submission situations.
Q for Kim: Does the retest interval increase as the Reference Standard ages? I.e., if you have 2 years of real-time data on the Reference Standard lot, do you extend the next retest of that same lot by 2 years?
Kim Dancheck: Yes, it can, but... Generally, we want at least 3 evaluations (for trending purposes) before extending to a period longer than one year. But if we have 5+ years (maybe decades) of experience on a previous batch, then after 2 years of testing a newer RS batch we might extend its retest date by 2 years, based on extensive overall product/process knowledge. With 2 years of data on the new batch, we can extend another two years, but we can’t extend longer than the timeframe of data that exists on that batch. Therefore, the time period between requalifications is based on the demonstrated stability of the reference standard, supported by previous requalification data. If any stability-indicating property shows evidence of change beyond analytical variability over time, a technical assessment should be made to evaluate whether there is an impact on the suitability of the reference standard for its intended use. Results that pass acceptance criteria but show potential degradation will be considered when justifying the next requalification period. In the event of a suspected change, the time period until the next requalification is not extended, or it can be reduced to as little as six months.
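The rules described in this answer can be written down as a small decision function. This Python sketch is hypothetical: the parameter names, the 12-month default, the 24-month cap, and the choice to shorten to 6 months (rather than simply not extending) on a suspected change are illustrative assumptions layered on the practice described above.

```python
def next_retest_interval_months(
    months_of_data: int,
    n_evaluations: int,
    suspected_change: bool,
    extensive_prior_knowledge: bool = False,
    default_months: int = 12,
    max_months: int = 24,
    reduced_months: int = 6,
) -> int:
    """Hypothetical rule set for the next requalification interval of a reference standard."""
    if suspected_change:
        # Do not extend; here we take the most conservative option and shorten.
        return reduced_months
    if n_evaluations < 3 and not extensive_prior_knowledge:
        # Too few evaluations to trend: stay at the default interval.
        return default_months
    # Extend, but never beyond the stability experience on this batch.
    return max(default_months, min(months_of_data, max_months))


# New batch with 2 years of data, backed by decades of experience on earlier batches
print(next_retest_interval_months(24, 2, False, extensive_prior_knowledge=True))  # 24
# Same data without that prior knowledge
print(next_retest_interval_months(24, 2, False))  # 12
```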
Q for Paul: Is it true that Arrhenius modeling is only useful for lyophilized formulations? For solution standards, would the phase change from frozen to liquid invalidate the Arrhenius relationship? We have also seen non-Arrhenius behavior in the transition from solution to frozen (e.g., aggregation).
Paul Matejtschuk: Certainly, we use Arrhenius modelling primarily to fit our lyophilised materials. As you say, a major phase change such as freeze/thaw would make it unsuitable for studying degradation in, say, a frozen liquid, though presumably it would still apply in the liquid phase going upwards from the phase change (say 4 to 56 °C?). We find that some of our lyophilised materials show insufficient degradation to allow an Arrhenius fit.
Q for Paul: Do you confirm the bioassays used to monitor BRPs are stability indicating for degradation pathways other than thermal degradation?
Paul Matejtschuk: No, because the concentration of active material in many of our bioactivity standards is often too low to allow analysis by physicochemical methods. This is because such standards are often still diluted quite considerably before the activity assay can begin, so a more concentrated product would mean the dilution factor becomes the main source of error. Accelerated degradation is often a worst-case scenario, and sometimes isothermal real-time degradation never reaches such high levels of deterioration. Our standards are meant to measure bioactivity, so stability indication against that parameter is the most relevant for us.
Q for Paul: How critical is a vacuum effect for particle aggregation?
Paul Matejtschuk: Vacuum is of course one of only three parameters you can adjust in freeze drying (time, vacuum, and temperature). I have certainly seen cases where modifying the vacuum has influenced product cake appearance, and this may well trap more moisture and so result in a more degraded product, but I cannot say we have evidence that vacuum would influence aggregation per se. I think aggregation is most likely influenced at the freezing stage, and therefore the rate of freezing, the use of techniques such as thermal tempering (annealing), and settling of material before a completely immobilised state is achieved are the most likely sources of increased aggregation.
Q for Vanessa: Is the Potency Acceptance Criteria of 90-110% with a requirement of 95% CI to contain 100% applied to interim RS being qualified against a different RS as the comparator or itself? This seems tight during early phase development given method variability prior to validation.
Vanessa Aquier: For the first Interim Reference Standard we do only one test vs. itself, and we apply only the 90-110% acceptance criterion. For the following one we apply the same criterion but with more tests (at least 6 occasions), and for the primary and secondary RS we add the CI acceptance criterion. For early phases, our SOP also allows us to justify wider acceptance criteria (AC) if the method is quite variable.
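A minimal Python sketch of the two criteria described in this answer: the mean %RP within 90-110% and, for later standards, the 95% CI containing 100%. It assumes results are summarized on the log scale (geometric mean) and that the caller supplies the t value for n − 1 degrees of freedom; the function name, defaults, and data are illustrative.

```python
import math
import statistics


def passes_qualification(rp_pct, t_crit=None, low=90.0, high=110.0, require_ci=True):
    """Check a candidate standard's %RP results against hypothetical criteria.

    - Geometric mean %RP must lie within [low, high].
    - If require_ci, the two-sided CI of the mean (log scale, t_crit supplied
      by the caller) must also contain 100%.
    """
    logs = [math.log(v) for v in rp_pct]
    centre = statistics.fmean(logs)
    if not low <= math.exp(centre) <= high:
        return False
    if not require_ci:
        return True
    half_width = t_crit * statistics.stdev(logs) / math.sqrt(len(logs))
    return math.exp(centre - half_width) <= 100.0 <= math.exp(centre + half_width)


# First Interim Reference Standard: one test, range criterion only.
print(passes_qualification([104.0], require_ci=False))  # True
# Later standards: 6 occasions; 2.571 is the two-sided 95% t value for 5 df.
print(passes_qualification([98, 102, 99, 101, 100, 100], t_crit=2.571))  # True
print(passes_qualification([106, 108, 107, 109, 106, 108], t_crit=2.571))  # False
```

The last case shows how the CI criterion can fail a standard whose mean sits comfortably inside 90-110%: with tight, consistently high results, the interval excludes 100%.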
Q for Vanessa: So you don’t characterize the initial Interim Reference Standard used to test the next Interim Reference Standard, but wait and fully characterize them together? What do you do if they show substantial differences in the characterization data?
Vanessa Aquier: Yes. Characterization of both comes quite soon, as the first GMP batch is manufactured. So far we have had no issues, but if we saw significant differences, we would investigate and determine whether we need to retest with the GMP RS.
Q for Vanessa: What level of confidence are you targeting with n = 6 individual occasions?
Vanessa Aquier: We do not define the number of tests based on the CI; rather, we aim for a reasonable level of method variability, as shown in my example, with a number of occasions that is reasonable for the lab.
Q for Vanessa: You mention that control chart for Secondary Reference Standard is in place in each laboratory running the potency assay. How do you compare the control charts across different laboratories? Do you have a centralized process for control charting?
Vanessa Aquier: There is no specific process, but we look at the mean value and standard deviation (SD) of each lab, also in light of the initial TT and method variability, to see whether labs are diverging.
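A minimal Python sketch of this kind of cross-lab comparison, assuming each lab’s mean is compared with the average of all lab means and each lab’s SD with the method SD established initially. The 2-SD and 1.5× thresholds, lab names, and data are hypothetical choices, not part of the process described.

```python
import statistics


def lab_divergence(lab_results, method_sd_pct, k=2.0, sd_ratio_limit=1.5):
    """Flag labs whose control-chart data diverge from the others.

    lab_results: {lab name: list of results for the same control material}.
    method_sd_pct: method SD established initially.
    """
    means = {lab: statistics.fmean(v) for lab, v in lab_results.items()}
    grand_mean = statistics.fmean(means.values())
    flagged = {}
    for lab, values in lab_results.items():
        reasons = []
        # Lab mean far from the average of all lab means
        if abs(means[lab] - grand_mean) > k * method_sd_pct:
            reasons.append("mean shift")
        # Lab scatter noticeably larger than the method variability
        if statistics.stdev(values) > sd_ratio_limit * method_sd_pct:
            reasons.append("inflated SD")
        if reasons:
            flagged[lab] = reasons
    return flagged


results = {
    "labA": [99.0, 101.0, 100.0, 100.0],
    "labB": [100.0, 102.0, 98.0, 100.0],
    "labC": [110.0, 112.0, 108.0, 110.0],
}
print(lab_divergence(results, method_sd_pct=3.0))  # {'labC': ['mean shift']}
```

With only a few labs, the divergent lab pulls the grand mean toward itself, so a more careful version might compare each lab against the others excluding itself.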
Q for Vanessa: Is the same number of runs used for periodic retest of Primary Reference Standard as for its initial qualification?
Vanessa Aquier: Yes, 6 runs were assessed as sufficient, as shown in my example (it includes 24 to 36 data points), so we keep the same design for initial qualification and periodic retest.
Q for Vanessa: Can you elaborate on “not performing the characterization of Interim Reference Standard at its release”. Is that only when you release/qualify it as an Interim Reference Standard during preclinical phase? I assume that if you enter clinical phase you have to do extended characterization but when in the lifecycle timeline do you introduce what you call a GMP Interim Reference Standard?
Vanessa Aquier: Because the first GMP Reference Standard comes quickly, from the 1st GMP batch, we qualify them together (with the Interim Reference Standard), so it will be available at the time of the first clinical submission.
Q for Vanessa: What if the control drifts in much the same way as the Reference Standard? Relative to each other, there might not be much difference in potency in this case.
Vanessa Aquier: Indeed, if both materials drift the same way, we will not see it in the potency control chart. But the control sample is usually also used in other methods where it is not tested against the Reference Standard, so we would be able to detect that the control sample is degrading and investigate. Most of the time, degradation is first noticed in physicochemical methods rather than in potency.
Q for Vanessa: If you trend potency for an Interim Reference Standard using a control chart and have no criteria for the EC50 (ED50), are you projecting how the trend extends over the months ahead, or simply basing the retest extension on performance to date?
Vanessa Aquier: We extend the SL by 1 year based on the periodic retest, with confidence that there was no trend during the past year.
Q for Vanessa: Just curious when you bridged your bioassays did you verify equivalent performance between assays using different types of degraded samples? An FDA talk for bioassay bridging showed the two bioassays performed the same with normal samples but very different with different degraded samples.
Vanessa Aquier: Among the samples tested we had different stability samples, but all were around 100%, so we diluted them to obtain the 50% and 200% levels to cover the range. We also assessed forced degradation samples, which were not around 100%.
Q for Vanessa: The periodic re-test is monitored on the control chart created from data of the Secondary Reference Standard? After sufficient data points are obtained would a degradation analysis be beneficial to justify a longer extension period?
Vanessa Aquier: We tried to propose 2 years, but this was challenged by the Health Authorities (HA), so we keep a periodicity of 1 year (even though we had stability data on the previous Reference Standard).
Q for Vanessa: What would be your decision if all sequential reference standards pass the acceptance criteria (CI and potency range), but there is always a slight shift towards the same direction?
Vanessa Aquier: We limit the number of Reference Standards before introducing the 2-tier system, so we are not concerned by a slight shift (i.e., it should be very limited relative to method variability).
David Lansky: How big is ‘slight’ compared to the product specification range? If the total shift in potency is a few percent of the specification range it may be a non-issue; if the shift in potency is 30% of the specification range, it would be important to understand what is causing the trend.
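To put "slight" in numbers: if each new standard measures, say, 98% against its predecessor but is still assigned 100%, the shifts compound. Here is a minimal Python sketch, assuming this multiplicative chaining; the function name, the 98% results, and the 80-125% specification are hypothetical.

```python
def cumulative_shift(successive_rp_pct, spec_low_pct, spec_high_pct):
    """Total potency shift across a chain of standards each re-assigned 100%.

    successive_rp_pct: measured %RP of each new standard vs. its predecessor.
    Returns (total shift in %, shift as a fraction of the specification width).
    """
    factor = 1.0
    for rp in successive_rp_pct:
        factor *= rp / 100.0  # each step's bias carries into the next standard
    shift_pct = (factor - 1.0) * 100.0
    return shift_pct, abs(shift_pct) / (spec_high_pct - spec_low_pct)


shift, fraction = cumulative_shift([98.0, 98.0, 98.0], 80.0, 125.0)
print(round(shift, 2), round(fraction, 3))  # -5.88 0.131
```

Here three individually "slight" 2% shifts add up to roughly 13% of the specification width, between the "few percent" and "30%" cases mentioned above.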
Q for Jon: Has anyone considered the eventual compendial compliance problems associated with allowing a biosimilar to be put on the market based on a significantly different extinction coefficient? Although you are correct that most differences will “wash out in the clinical data”, it is still a bad idea to have two products on the market that are different doses. See Replies for more detail.
Jon Valgeirsson: I am not sure, but I fully share your concern that if two or more products with different extinction coefficients are already on the market, the pharmacopeia will have a hard time publishing only one of them. So perhaps it will have to be silent on the extinction coefficient as such, and only supply a standard material for companies to use as an external standard in OD280 assays.
Q for Jon: Your data showed how much variability exists in potency among lots of RMPs in different bioassays. You chose 1 lot for your first in-house Reference Standard knowing the risk of bias (there is no WHO BRP for your Mabs). What are your thoughts on WHO choosing 1 lot to serve as an international Reference Standard for potency in multiple bioassays vs. pooling lots to make a composite Reference Standard for each Mab product?
David Lansky: As others have said, it is really important that the reference standard be representative of the manufacturing process. You can’t assess that with a single lot, particularly for a product that contains (as many do) several different active forms (e.g., differently glycosylated forms) of the product. It is important to characterize the production process, e.g., the ranges over which the active material varies (e.g., % glycosylated), and to choose as ‘the’ reference standard a lot that is typical (near the middle of the distribution). It may be important to use a panel of reference standards to represent the range of the production process.
Q for David: Do you have any concerns about pooling potency data sets from different labs and disparate bioassays (including different cell lines) using different assay preparation procedures to assign a single unit of “activity” for a single Mab batch?
David Lansky: Yes, there are some critical issues to consider when combining data across different assay systems, even for a single (common) test sample. Combining relative potency estimates is only sensible under certain conditions. If the different assay systems use different reference standards, it is important that there be solid evidence that each reference standard contains the same active components, and it is important that the relative potencies of the different reference standards compared to one another be well known. When combining (log) relative potency estimates for a sample across different assay systems, it is almost certainly important to use weights that are (most likely) inversely proportional to the estimated variances of log potency from each assay system. If the assays use qualitatively different references, it would almost certainly be more appropriate to report the collection of (log) relative potency estimates rather than try to combine them into a single estimate.
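Inverse-variance weighting can be sketched in a few lines of Python. This is a fixed-effect combination that assumes every estimate is expressed against the same (or well-bridged) reference and ignores any between-system variance component; the function name and inputs are illustrative.

```python
import math


def combine_log_rp(estimates):
    """Inverse-variance weighted (fixed-effect) combination of log relative potencies.

    estimates: (log_rp, variance_of_log_rp) pairs, one per assay system.
    Returns (combined relative potency, standard error of the combined log RP).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled_log_rp = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / total
    return math.exp(pooled_log_rp), math.sqrt(1.0 / total)


# A precise system (variance 0.01) and a noisier one (variance 0.04)
rp, se = combine_log_rp([(math.log(1.00), 0.01), (math.log(2.00), 0.04)])
print(round(rp, 4), round(se, 4))  # 1.1487 0.0894
```

The noisier system gets a quarter of the weight, so the combined estimate stays much closer to the precise system’s result than a simple average would.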