Cell Counting and Viability Assessment of 2D and 3D Cell Cultures: Expected Reliability of the Trypan Blue Assay

Background: Whatever the aim of an experiment in cell biology, cell counting and viability assessment are routinely performed. The Trypan Blue (TB) assay was proposed about a century ago and is still the most widely used method for cell viability analysis. Furthermore, the combined use of TB with a haemocytometer is considered the standard approach for estimating cell population density. Numerous research articles report the use of TB assays to compute cell number and viability of 2D and 3D cultures. However, the literature still lacks studies assessing the repeatability and reproducibility of the TB assay.
Methods: We compared the TB assay measurements obtained by two biologists who analysed 105 different samples in a double-blind manner, for a total of 210 counts. We measured: (a) the repeatability of the count performed by the same operator; (b) the reproducibility of counts performed by the two operators.
Results: There were no significant differences between the results obtained with 2D and 3D cell cultures: we estimated an approximate variability of 5% when the TB assay was used to assess the viability of the culture, and a variability of around 20% when it was used to determine the cell population density.
Conclusions: The main aim of this study was to make researchers aware of potential measurement errors when TB is used with a haemocytometer for counting and viability measurements in 2D and 3D cultures. We believe that these results can help researchers to determine whether the expected reliability of the TB assay is compliant with their applications.


Background
The evaluation of cell population density (i.e. the total number of living cells in the culture) and cell viability (i.e. the percentage of living cells in the sample) is fundamental during biology studies [1]. The majority of laboratories engaged in cell biology routinely perform cell viability and counting analysis for different purposes, ranging from ecosystem investigation [2] to proliferation studies [3], in both 2D (two-dimensional) [4] and 3D (three-dimensional) cell cultures [5].
Among the various types of 3D cell cultures, multicellular tumour spheroids are those typically used for testing drugs and radiation treatments [6]. The viability of the cancer culture and the reduction of its population are fundamental parameters for evaluating the efficacy of the treatments under investigation [7]. Accordingly, the reliability of the method used to estimate these parameters plays a key role in this analysis [8]. In addition, cell counting and viability assessment often need to be performed for other 3D cell cultures, such as stem cell spheroids generated for regenerative medicine purposes [9] and organoids used to study some organ characteristics [10].
Many different methods (e.g. AlamarBlue® and MTT assay) and systems (e.g. Bio-Rad TC20™ Automated Cell Counter, ChemoMetec NucleoCounter®, Beckman Coulter Vi-CELL™ XR Cell Viability Analyzer [11]) can be used to analyse cell viability [12]. Most of these share the same approach: the cells are stained with a coloured (or a fluorescent) dye to highlight dead cells (or living cells), and a detection system counts the number of highlighted cells in addition to the total number of cells. Finally, cell viability is computed as the percentage of healthy cells in the sample [13]. However, the Trypan Blue (TB) dye exclusion assay [14], the first method proposed in the literature, is considered the standard cell viability measurement method [15] and is still the most widely used approach [16]. Furthermore, TB paired with a haemocytometer grid (Fig. 1) is regarded as the standard approach for estimating the cell population density [17], i.e. the total number of living cells in the culture [18].
TB was synthesised for the first time in 1904 by Paul Ehrlich (Nobel prize in medicine, 1908) and was first used for clinical analysis before becoming a standard probe in biology. Today it is still widely used for several medical purposes, such as the visualization of the lymph-associated primo vascular system [19] and of the anterior capsule during cataract surgery [20]. Chemically, TB is defined as a toluidine-derived dye characterized by a molecular weight of 960 Da [15]. Its chemical formula is C34H28N6O14S4. Azidine Blue, Benzamine Blue, Chlorazol Blue, Diamine Blue, and Niagara Blue are synonyms for TB. TB is a cell membrane-impermeable molecule and therefore only enters cells with a compromised membrane. From a practical point of view, with TB the cell viability is determined indirectly by detecting cell membrane integrity [21]. Upon entry into the cell, TB binds to intracellular proteins, and in brightfield the dead cells appear blue (apoptotic and necrotic cells are not distinguished [1]), whereas the colour of living cells remains unchanged (Fig. 1c).
Over the past two decades a number of studies comparing TB with other assays have been published [15], and several methods have proven more efficient than TB [22], especially those using fluorescent dyes [23]. The use of TB has, in fact, several drawbacks [24]: (a) TB exerts a toxic effect on cells after a short exposure period, thus limiting cell counting to only a brief period after staining [25]; (b) as TB binds to cellular proteins, there is a potential for binding to non-specific cellular artifacts, especially in primary cells from clinical samples; (c) there is a large number of false positives, i.e. "dead cells" resulting from irreversible damage to their membrane, and false negatives from cells that have already initiated the apoptotic pathway but still have intact membranes; (d) there is no standardized TB concentration for the measurement of cell viability; (e) manual counting using a haemocytometer and a light microscope is time-consuming and operator-dependent. Although they require the use of a fluorescence microscope, several fluorescent dyes have long been known to be more reliable indicators of cell viability than the more traditional coloured dyes [26]. For example, Acridine Orange (AO) and Propidium Iodide (PI) stainings have been shown to be more accurate in detecting live and dead cells than TB [27]. AO is a membrane-permeable cationic dye that binds to nucleic acids of viable cells. At low concentrations it emits green fluorescence. PI is impermeable to intact membranes but readily penetrates the membranes of nonviable cells and binds to DNA or RNA, emitting orange fluorescence. When AO and PI are used simultaneously, viable cells fluoresce green and nonviable cells fluoresce orange under fluorescence microscopy.
Fig. 1 Haemocytometer grid containing cells stained with TB. a Picture of a Kova glasstic slide with grids (Hycor Biomedical Inc.). Each slide contains 10 counting chambers. b Schematic representation of the grid of a counting chamber. c Cells in brightfield are characterized by very low contrast. This magnified real-world detail shows some living and dead cells. In particular: a and b show the typical appearance of a living and a dead cell (stained with TB), respectively

Nevertheless, TB is still the most commonly used dye for cell viability analysis because it is inexpensive, easy to use, reacts quickly, and can be visualized with a standard brightfield microscope available in all biological laboratories [2]. TB is also used in several automatic counters [28] and as the reference method for comparing customized cell-counting algorithms [29]. However, in-depth validation studies of the TB assay used in combination with a haemocytometer in viability and counting measurements are lacking. Several articles have provided statistical analyses on its reliability. In 1964, Tennant [30] and Hathaway et al. [31] performed preliminary studies comparing TB, eosin Y and AO for the determination of the viability of in vitro and in vivo cultures. Twenty years later, Jones and Senft [26] also considered fluorescein diacetate (FDA) and PI. In 1999, Leite et al. [32] extended the research in this area, comparing the reliability of TB, AO and six other methods (i.e. Giemsa staining, ethidium bromide, PI, Annexin V, TUNEL assay and DNA ladder). In 2000, Mascotti et al. [27] published an in-depth comparison between AO/PI and TB assays in which the viability of 7 aliquots of hematopoietic progenitor cells (HPC) was assessed and the percentage of viable cells was calculated as the average of 5 viability measurements performed by two operators.
However, as the raw counting data were not reported, it was not possible to quantitatively infer the repeatability (intra-rater reliability) and reproducibility (inter-rater reliability) of the counts. The first study on the repeatability and reproducibility of the TB assay appeared in 2011, when Sanfilippo et al. [33] assessed the reliability of TB and calcein AM/ethidium homodimer-1 (CaAM/EthD-1) staining in fresh and thawed human ovarian follicles. Measurements were performed by two independent operators. Reliability was evaluated by the intraclass correlation coefficient (ICC), and the differences between paired measurements were tested by the Wilcoxon signed-rank test. TB proved to be the more reliable staining method for evaluating follicle viability. However, the operators only evaluated 10 samples simultaneously. Finally, in 2015 Cadena-Herrera et al. [34] validated manual, semi-automated, and fully automated TB exclusion-based methods. A single operator counted several samples in triplicate, and the results did not reveal a significant difference between the automated methods and the manual assay. However, 3D cell cultures were not taken into account and measurement errors between different operators were not considered.
In this work we studied repeatability and reproducibility with the specific aim of assessing the measurement errors that occur when TB is used in counting and viability applications in 2D and 3D cell cultures. Repeatability is the closeness of agreement among successive measurements of the same object carried out under the same measurement conditions. Reproducibility is the closeness of agreement among measurements of the same object carried out under different measurement conditions [35]. In particular, the viability and the total number of living cells of the culture were the "objects" being measured in our experiments, and the operators performing the measurements represented the changing "condition" when assessing reproducibility. In practical terms, each operator generated and analysed 5 different samples from each of the same 13 2D cell cultures and 8 3D cell cultures (i.e. multicellular spheroids), making a total of 10 samples for each culture. Repeatability for each culture was evaluated by calculating the variability of the measurements obtained by a single operator. Conversely, reproducibility for each culture was estimated by comparing the measurements obtained by the two operators. Overall, 210 samples were analysed (Table 1).
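The sample bookkeeping of this design can be checked with a few lines of arithmetic. The sketch below is only illustrative: the split of the 13 2D cultures into 8 flasks A i and 5 flasks P k is inferred from the flask labels used in the Methods, and all variable names are hypothetical.

```python
# Study design as described in the text (the 8 + 5 split of the 2D cultures
# is inferred from the flask labels A 1..A 8 and P 1..P 5).
cultures = {"A": 8, "P": 5, "SP": 8}   # 2D thermal shock, 2D gemcitabine, 3D spheroids
SUSPENSIONS_PER_OPERATOR = 5           # S k, k = 1..5, prepared by each operator
OPERATORS = 2                          # O 1 and O 2

n_2d = cultures["A"] + cultures["P"]
n_3d = cultures["SP"]
samples_per_culture = SUSPENSIONS_PER_OPERATOR * OPERATORS
total_samples = (n_2d + n_3d) * samples_per_culture
samples_per_operator = total_samples // OPERATORS

print(f"2D cultures: {n_2d}, 3D cultures: {n_3d}")
print(f"samples per culture: {samples_per_culture}")
print(f"total samples: {total_samples} ({samples_per_operator} per operator)")
```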
The main aim of this work was to make researchers aware of the measurement errors that can occur when the TB assay is used to evaluate the population and viability of 2D and 3D cell cultures. Given that this is a preliminary study, overall accuracy values for the reliability of the assay when used in different contexts and with different cell lines cannot be provided. However, we believe that our findings can help researchers to evaluate whether the expected repeatability and reproducibility of the TB assay are compliant with those required by their own application.
All flasks A i were prepared simultaneously in the morning and kept in the incubator for 24 h. Then, as previously done by Cadena-Herrera et al. [34], each flask A i was subjected to a different thermal shock to differentiate the cell viability between flasks. A 1 and A 2 were simply moved from the incubator to a sterile laminar flow hood at room temperature. A 3 and A 4 underwent a freeze-thaw cycle (incubator at 37°C, freezer at −80°C, then returned once to the incubator at 37°C). [...] and A 8 for 30 min. Of note, the thermal shocks were carried out sequentially in the morning and the counting measurements were performed for all the flasks in the afternoon of the same day.
We used gemcitabine, a well-known chemotherapeutic agent used to treat several tumours, including pancreatic cancer [36], to modulate the viability of the cells contained in the different P k . All P k were prepared simultaneously on the same morning, and gemcitabine was tested at increasing concentrations of 5 μM (flask P 2 ), 50 μM (P 3 ), 500 μM (P 4 ), and 1000 μM (P 5 ). P 1 contained untreated cells. An exposure time of 1 h followed by a 72-h washout was chosen on the basis of peak plasma levels defined in recent pharmacokinetic studies [37].

3D Cell Cultures
The A549 cells described in Section 2.1 were also used to produce the multicellular spheroids. Several systems and methods are available to generate in vitro multicellular spheroids of different dimensions [38]. We used a rotatory cell culture system, the RCCS-8DQ bioreactor (Synthecon Inc., Houston, TX, USA), which is capable of controlling up to 4 rotating chambers, even at different speeds. The rotator bases were placed inside a humidified, 37°C, 5% CO2 incubator and connected to power supplies on the external side of the incubator. All activities were performed in sterile conditions under a laminar flow hood, as previously described [7]. Briefly, a single-cell suspension of about 1 × 10^6 cells/ml was placed in a single 50-ml rotating chamber at an initial speed of 12 revolutions per minute (rpm), which was increased as the size of the spheroids increased to avoid aggregate sedimentation within the culture vessels. The culture medium was changed every 4 days. After 15 days the spheroids had reached a diameter of 0.5-1 mm and were transferred (one spheroid/well) under a sterile laminar flow hood to 96-well low-attachment culture plates (Corning Inc., Corning, NY, USA), each well previously filled with 100 μl of fresh culture medium. After the spheroidization time (i.e. 1 week [7]), each spheroid was imaged in brightfield using an inverted Olympus IX51 widefield microscope equipped with an Olympus UPlanFl 4×/0.13 NA standard objective lens and a Nikon Digital Sight DS-Vi1 camera (CCD vision sensor, square pixels of 4.4 μm side length, 1600 × 1200 pixel resolution, 3-channel images, 8-bit grey level). For spheroids with partially out-of-focus borders, we acquired a z-stack of brightfield images and reconstructed a single, fully in-focus 2D image using the open-source tool previously described [39]. We then corrected the images for vignetting with CIDRE [40], segmented the spheroids using AnaSP [41], and computed their volume with ReViSP [42,43].
To assess TB reliability, eight compact spheroids with regular shape but different volumes (called SP i , i = 1, …, 8, Fig. 2) were transferred to a different plate and digested into single cells using a Trypsin/EDTA 1× solution (Euroclone, Milan, Italy) [44].

Sample Preparation
We used a haemocytometer (Kova glasstic slide with grids, Hycor Biomedical Inc., Fig. 1b) and a commercially available TB preparation (TB solution 0.4%, SIGMA-ALDRICH, Buchs, Switzerland) to perform the counts. A detailed description of the protocol adopted with TB is reported in [11,21] and [45]. In brief, for each Ai we:

1) detached the cells from the flask by trypsinization;
2) centrifuged the cell suspension for 5 min at 1200 rpm;
3) resuspended the pellet in 1 ml of culture media using a pipette to obtain a single-cell suspension;
4) removed an aliquot of 100 μl;
5) added 100 μl of TB solution 0.4% to obtain a final 1:2 dilution;
6) waited for 5 min to allow the TB to stain the dead cells;
7) counted the cells using a haemocytometer and a light microscope;
8) calculated the percentage of viability and number of cells in the culture by considering the final dilution factor.
We followed the same protocol for the different P k but used a 1:6 dilution. For the different SPi we used the same protocol as that used for Ai but with the pellet resuspended in 200 μl of culture media (not 1 ml, as described in point 3).
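The calculation in the last step of the protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' worksheet: the function names are hypothetical, and the volume of suspension actually counted in the chamber (`counted_volume_ml`) is not given in the text, so it is left as a parameter and set to a purely hypothetical value in the example. The dilution factor is 2 for the 1:2 dilution used for A i and SP i , and 6 for the 1:6 dilution used for P k .

```python
def viability_percent(live: int, dead: int) -> float:
    """Percentage of unstained (living) cells among all counted cells."""
    total = live + dead
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * live / total


def live_cells_in_culture(live: int, counted_volume_ml: float,
                          dilution_factor: float,
                          suspension_volume_ml: float) -> float:
    """Total number of living cells in the resuspended culture.

    live                 -- unstained cells counted on the grid
    counted_volume_ml    -- volume of liquid over the counted grid area
                            (depends on the slide; assumption, not from the text)
    dilution_factor      -- 2 for a 1:2 dilution with TB, 6 for 1:6
    suspension_volume_ml -- volume the pellet was resuspended in (1 ml or 0.2 ml)
    """
    if counted_volume_ml <= 0:
        raise ValueError("counted volume must be positive")
    concentration = live / counted_volume_ml * dilution_factor  # live cells per ml
    return concentration * suspension_volume_ml


# Hypothetical example: 90 unstained and 10 blue cells counted over 1e-4 ml
# (the 1e-4 ml value is illustrative only).
COUNTED_VOLUME_ML = 1e-4
print(f"viability: {viability_percent(90, 10):.1f}%")
print(f"living cells: {live_cells_in_culture(90, COUNTED_VOLUME_ML, 2, 1.0):.3g}")
```

Because the count is multiplied by the dilution factor and divided by a very small counted volume, a difference of a few cells on the grid translates into a large difference in the estimated population, whereas the viability ratio is unaffected by these factors.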
Two expert operators (hereafter O 1 and O 2 ) performed a double-blind evaluation of the viability and population of a set of 5 single-cell suspensions (S k , k = 1, …, 5) for each A i , P k and SP i , making a total of 210 samples analysed. Of note, both O 1 and O 2 prepared their own suspensions for each A i /P k /SP i . Using a Falcon 2 ml serological pipette, for each S k they gently pipetted up and down 30 times in about 15 s to disaggregate any cell clumps before loading a drop into a counting chamber. Differences in viability due to different cultivation/waiting times were avoided by simultaneously counting the samples of the same flask/spheroid in a double-blind manner. In particular, the operators used two widefield microscopes with similar optics, located in the same room and used daily for counting applications. The first was an inverted Olympus IX51 widefield microscope equipped with an infinity-corrected Olympus UPlanFl 10×/0.30 NA Ph1 objective, while the second was an inverted Zeiss Axiovert 200 widefield microscope equipped with an infinity-corrected Zeiss Achroplan 10×/0.25 NA Ph1 objective. Both microscopes were used in brightfield, and the Köhler illumination alignment [46] was performed in advance.

Sources of Error for Counting Measurements
Several sources of error contributed to the variability in the counts performed with the TB assay and can be summarized as follows (https://chemometec.com/manualcell-counting/):
1) Subjective definition of a "cell": There are guidelines but no well-defined rules to help an operator define a cell. From a practical point of view, distinguishing a cell from cell debris or other particles is often challenging, even for an expert biologist.
2) Subjective perception of a "dead cell": With TB there is no official colour threshold for discriminating between a dead cell and a living one. Each operator performing the manual count applies his or her own criteria to define the threshold of stain brightness at which a cell is counted as viable or not. Such interpersonal differences in the manual identification of dead cells are crucial for defining the percentage of viability of the cell culture.

Statistical Analysis
The reproducibility and repeatability of the TB assay were measured by analysing the 210 counts performed by O 1 and O 2 . In particular, for cell viability we computed the mean and standard deviation (i.e. μ and σ values of the different S k ) of the percentage of living cells estimated by O 1 and O 2 for each A i (results reported in Table 2), P k (Table 5) and SP i (Table 8). As for the cell population density assessment, we estimated the mean and coefficient of variation (i.e. μ and CV of the different S k ) of the total number of living cells for each A i (Table 3), P k (Table 6) and SP i (Table 9). Specifically, we first computed μ and σ of the 5 S k analysed by each operator for each A i /P k /SP i , and then computed the CV values.
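As a minimal sketch of these per-operator summaries, the snippet below computes μ, σ and CV for one set of five suspensions. It assumes the sample standard deviation (n − 1 denominator) and expresses CV as a percentage of the mean; neither convention is stated explicitly in the text. The function name and the example counts are hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mu, sigma, cv_percent) for one operator's five suspensions S k.

    sigma is the sample standard deviation (n - 1 denominator), which is an
    assumption; cv_percent is sigma expressed as a percentage of mu.
    """
    mu = mean(values)
    sigma = stdev(values)
    cv_percent = 100.0 * sigma / mu
    return mu, sigma, cv_percent


# Hypothetical counts of living cells (arbitrary units) for S 1..S 5.
counts = [100, 120, 80, 110, 90]
mu, sigma, cv = summarize(counts)
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, CV = {cv:.2f}%")
```

For viability, which is already a percentage, σ can be read directly as the error; for absolute cell numbers, dividing by μ makes cultures of different sizes comparable.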
Finally, we calculated the absolute percentage error (E%) of the values obtained by the two operators, defined according to Eq. 1:

$$E\% = \frac{\lvert v_1 - v_2 \rvert}{v_{12}} \times 100 \qquad (1)$$

For cell viability and total number of living cells, v 1 and v 2 are the mean values estimated by O 1 and O 2 , respectively, while v 12 is the mean value estimated considering all 10 samples for each A i /P k /SP i analysed by the two operators. Finally, a two-sided Wilcoxon rank-sum test was used to compare the values obtained by the different operators for both cell viability and total number of living cells. MATLAB (©, The MathWorks, Inc., Natick, Massachusetts, USA) was used for statistical analysis. p-values < 0.05 were considered significant.

The results obtained from the A i analysis are reported in Tables 2, 3, and 4. Tables 5, 6, and 7 report the results for P k , and Tables 8, 9, and 10 show the results for SP i .
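The inter-operator comparison can be sketched in a few lines. This is an illustration under stated assumptions rather than the MATLAB code used in the study: E% is taken to be the absolute difference between the two operators' means relative to the pooled mean of all 10 samples, and the rank-sum p-value is obtained by exhaustively enumerating rank assignments and doubling the smaller tail, which is one common convention for small samples. All function names and example values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def absolute_percentage_error(x1, x2):
    """E% between two operators (assumed form: |v1 - v2| / v12 * 100)."""
    v1, v2 = mean(x1), mean(x2)
    v12 = mean(list(x1) + list(x2))   # pooled mean of all samples
    return 100.0 * abs(v1 - v2) / v12


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def exact_ranksum_p(x1, x2):
    """Two-sided exact Wilcoxon rank-sum p-value by full enumeration.

    Feasible only for small samples (here 5 vs 5 gives 252 combinations).
    """
    ranks = midranks(list(x1) + list(x2))
    n1 = len(x1)
    w = sum(ranks[:n1])                                  # rank sum of operator 1
    sums = [sum(c) for c in combinations(ranks, n1)]     # null distribution
    lower = sum(s <= w for s in sums) / len(sums)
    upper = sum(s >= w for s in sums) / len(sums)
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical counts from two operators for the same culture.
o1 = [100, 120, 80, 110, 90]
o2 = [80, 90, 70, 85, 75]
print(f"E% = {absolute_percentage_error(o1, o2):.2f}")
print(f"p  = {exact_ranksum_p(o1, o2):.4f}")
```

With only five suspensions per operator, the smallest attainable two-sided p-value is 2/252 (about 0.008), so a non-significant result should be read with the low power of the test in mind.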

Analysis of the 2D Cell Cultures
We used the σ values obtained for A i and P k to estimate the intra-rater reliability of cell viability (Tables 2 and 5, respectively). Given that cell viability is computed as a percentage, the standard deviation can be considered a direct estimate of the error that may occur when TB is used to estimate cell viability. All σ values were lower than 15% for both O 1 and O 2 . Furthermore, the average σ values were approximately 5% for A i and 3% for P k (last row of Tables 2 and 5, respectively), indicating the high reliability of the TB assay when used for this purpose. With regard to the inter-rater reliability of cell viability, we considered the E% values reported in Tables 4 and 7. It is worth noting that the mean cell viability values estimated by O 1 and O 2 for each A i /P k were fairly similar (second and fourth columns from the left of Tables 2 and 5). Accordingly, the E% values reported in Tables 4 and 7 were very low, i.e. <10%, and their average was <5% (last row, second column of Tables 4 and 7).
Conversely, both the intra- and inter-rater variability values obtained for the total number of living cells were particularly high. As the total number of cells is an absolute value rather than a percentage, we estimated the intra-rater variability by analysing the CV values for all A i /P k , considering the different S k counted by the operators. The majority of CVs reported in Tables 3 and 6 were >15%, which is fairly surprising. In particular, O 1 obtained a CV <10% only twice (i.e. for A 3 and P 2 ) and O 2 only once (i.e. for A 4 ). Furthermore, the average CV values (bottom row of Tables 3 and 6) were particularly high (around 20%) for both operators. Similarly, as the number of living cells estimated by O 1 and O 2 for each A i /P k differed substantially (second and fourth columns of Tables 3 and 6), the majority of E% values reported in the third column of Tables 4 and 7 were especially high. In particular, the average E% (bottom row, right-hand column of Tables 4 and 7) was >15% for both A i and P k . These results, paired with the previously described high intra-rater variability, unexpectedly revealed a poor ability of the TB assay to estimate cell population density.
However, most of the p-values computed for both viability and total number of living cells were >0.05, indicating that the sets of counts obtained by O 1 and O 2 for the same A i /P k did not differ significantly from each other. In fact, they differed in only one case for A i (Table 3, row A 4 ) and in three cases for P k (Table 5, row P 1 , and Table 6, rows P 2 and P 5 ). The differences obtained by the two operators in these cases were probably caused by a pipetting/resuspending error. For example, the data in Table 1 clearly show that the number of cells counted by O 1 for A 4 was significantly lower and more variable than that counted by O 2 . However, a p-value <0.05 in only 4 out of 26 cases means that, despite the high intra-rater variability of the TB assay, especially when used for cell population density assessment, the sets of counts performed by different operators did not, in general, differ statistically.

Analysis of the 3D Cell Cultures
The results obtained from the analysis of the 3D cell cultures were similar to those obtained for the 2D cultures. All σ values reported in Table 8 were <15%, and the average σ values were 4.84% and 4.23% for O 1 and O 2 , respectively, once more confirming the high repeatability of the TB assay when used to estimate the viability of 2D and 3D cell cultures. The E% values reported in the second column of Table 10 were slightly higher than those of Tables 4 and 7, suggesting slightly poorer reproducibility of cell viability values for 3D cultures (but still around 5%).
With regard to the analysis of cell population density, both intra- and inter-rater variability were once again exceptionally high. The majority of CVs reported in Table 9 were >20%: O 2 never obtained a CV <20%, and O 1 obtained a value <10% only twice (i.e. for SP 2 and SP 6 ). As for the 2D A549 cell cultures, the number of living cells estimated by O 1 for SP i differed substantially from that obtained by O 2 (second column vs fourth column, Table 9). Consequently, most of the E% values reported in the third column of Table 10 were >15%, with an average E% of 17.23%. Notably, the CV value obtained by O 2 for SP 2 , SP 5 , SP 6 and SP 7 was triple that obtained by O 1 because the total number of living cells counted by O 2 for these SP i was much more variable than that counted by O 1 . Specifically, the σ of the counts performed by O 2 was more than twice that of the counts performed by O 1 . Furthermore, O 2 counted a lower number of cells than O 1 for all but SP 4 , probably because the samples prepared by O 2 contained more cell clusters, which must not be counted with a haemocytometer (we recall that each operator prepared her/his own 5 S k ). This resulted in a lower μ of the number of living cells counted by O 2 , which further increased the CV values. Although both operators are biologists with more than 10 years' experience in counting cells, the results suggest that O 1 was better able to resuspend the samples generated from 3D spheroids, effectively disaggregating the cell clusters. This is indicative of the high subjectivity of the TB assay and of its poor reliability when used to estimate the total number of cells in a culture.
However, as for the 2D cell cultures, almost all p-values computed for viability and total number of living cells were >0.05, once more indicating that the sets of counts obtained by the different operators did not differ significantly from each other.
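How a larger σ and a lower μ combine into a much larger CV can be made explicit with a short worked relation. The factors used below (σ doubled, μ reduced to two thirds) are hypothetical values chosen only to illustrate the mechanism; they are not taken from Table 9.

```latex
\[
\mathrm{CV} = 100 \cdot \frac{\sigma}{\mu},
\qquad
\frac{\mathrm{CV}_{O_2}}{\mathrm{CV}_{O_1}}
  = \frac{\sigma_{O_2}}{\sigma_{O_1}} \cdot \frac{\mu_{O_1}}{\mu_{O_2}}
\]
\[
\sigma_{O_2} = 2\,\sigma_{O_1},\quad
\mu_{O_2} = \tfrac{2}{3}\,\mu_{O_1}
\;\Longrightarrow\;
\frac{\mathrm{CV}_{O_2}}{\mathrm{CV}_{O_1}} = 2 \cdot \tfrac{3}{2} = 3
\]
```

In other words, a doubling of the spread together with a moderately lower mean count is already enough to triple the coefficient of variation.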

Discussion
In this work we studied the repeatability and reproducibility of cell population and viability measurements obtained with the TB assay. We asked two experienced biologists to count the live and dead cells of 105 different samples of 2D and 3D cell cultures in a double-blind manner (total 210 counts). Our aim was to measure: (a) the repeatability of the count performed by the same operator; (b) the reproducibility of counts performed by the two operators. For both 2D and 3D cell cultures, we estimated an approximate variability of 5% when the TB assay was used to assess the viability of the culture, and a variability of around 20% when it was used to determine the cell population density, i.e. the total number of living cells in the culture. Our results show that, whilst the method is quite precise when used to assess viability, it is fairly unreliable at estimating the population of a cell culture, whether 2D or 3D. In practice, our findings serve to alert researchers evaluating cell culture populations that they should expect an appreciable difference (up to 20%) between measurements performed by different operators.

Conclusions
The TB assay was introduced about a century ago and is still the most widely used method for viability and population assessments of cell cultures. However, no in-depth validation of the TB assay has been published so far, especially for viability and counting measurements of 3D cell cultures.
The main aim of the statistical analyses performed in this work was to provide researchers with novel information on TB reliability and to make them aware of the measurement errors to be expected when the assay is used to evaluate the population and viability of 2D and 3D cell cultures. The results obtained indicate that (a) there is no significant difference between 2D and 3D cell cultures as far as TB reliability is concerned; (b) the TB method is precise when used for viability assessments of a cell culture; (c) the method is fairly inaccurate at estimating cell population density, despite being routinely used for this purpose in numerous laboratories.
For the sake of clarity, we repeat that the purpose of our work was not to provide overall accuracy values for the reliability of the assay when used in different contexts and with different cell lines. Nevertheless, once this performance is known and acknowledged, it will be up to researchers to determine when the TB assay can be used and whether the expected reliability of its measurements is compliant with their own experiments.