Decoding the Virtual 2D Map of the Chloroplast Proteomes

Background The chloroplast is a semi-autonomous organelle with its own genome and corresponding proteome. Although chloroplast genomes have been reported, no reports exist on their corresponding proteomes. Therefore, a proteome-wide analysis of the chloroplast proteomes of 2893 species was conducted, and a virtual 2D map was constructed. Results The molecular mass of chloroplast proteins ranged from 0.448 to 616.334 kDa, and the isoelectric point (pI) ranged from 2.854 to 12.954. Chloroplast proteomes were dominated by basic pI proteins, with an average pI of 7.852. Both the molecular weight and the pI exhibited a bimodal distribution on the resulting virtual 2D map. Leu was the most abundant and Cys the least abundant amino acid in the chloroplast proteome. Notably, Trp was absent from the chloroplast protein sequences of Pilostyles aethiopica. In addition, selenocysteine (Sec) and pyrrolysine (Pyl) were not found in any chloroplast proteome. Conclusion The virtual 2D map and amino acid composition of the chloroplast proteome will enable researchers to understand the biochemistry of chloroplast proteins in detail. Further, the amino acid composition of the chloroplast proteome will help to clarify codon usage bias, and comparing codon usage bias with amino acid usage bias in the chloroplast will be crucial to understanding their relationship. Supplementary Information The online version contains supplementary material available at 10.1186/s12575-022-00186-8.


Background
The chloroplast is a semi-autonomous organelle in plant cells. It is responsible for photosynthesis and the biosynthesis of several other vital molecules, including amino acids, fatty acids, and terpenoids. The chloroplast was derived from an independent, prokaryotic endosymbiotic ancestor with a small genome. Chloroplast genomes possess three to 273 protein-coding DNA sequences (CDS) [1], and the organelle is fundamental to plant productivity and survival. A large number of chloroplast proteins are associated with photosynthesis and fatty acid biosynthesis. Several chloroplast proteins increase or decrease in abundance as part of different stress and signaling responses. Therefore, understanding the expression of functional chloroplast proteins is important. Nuclear-encoded proteins are also present in chloroplasts and function in diverse cellular processes. This indicates that the chloroplast proteome is determined by two genomes and is bidirectionally regulated by both the chloroplast and the nucleus. The functional characterization of a protein depends on knowing its sub-cellular localization, co- and post-translational modifications, and enzymatic activity. The field of proteomics focuses on characterizing all the proteins expressed by an organism or tissue. To enable the global identification of proteins, extracted proteins must first be separated by methods such as 2D electrophoresis before they are identified by mass spectrometry. Although the genomes of thousands of species have been sequenced, the number of proteins identified and characterized by 2D electrophoresis is very low because of the high complexity of proteomes. Less than 10% of the proteins in the SWISS-PROT database have been identified in 2D gels. This suggests that 2D protein gel electrophoresis alone cannot provide a comprehensive picture of the proteome.
*Correspondence: nostoc.tapan@gmail.com; tapan.mohanta@unizwa.edu.om; aharrasi@unizwa.edu.om
Proteins commonly interact with other proteins, lipids, and nucleic acids. These complex interactions make many proteins challenging to solubilize in an extraction buffer and subsequently separate. Therefore, it is often necessary to separate a protein from its non-protein components so that it can be resolved by isoelectric focusing (IEF) over a wide pH gradient. Mass spectrometry analysis of the entire cellular proteome remains a daunting task due to the compartmentalization of proteins in eukaryotic cells and their complex interactions with other molecules. However, the continuing increase in sequenced genomes dramatically increases our ability to identify predicted translated protein sequences and understand protein function. Several parameters can be used to characterize a protein, including its isoelectric point (pI), molecular mass, and charge, all of which determine its separation in a 2D gel. The pI and molecular weight of a protein can be used sequentially to separate proteins by 2D electrophoresis. In 2D gel-based electrophoresis, proteins are first separated according to pI on immobilized pH gradient (IPG) strips, and then separated in a second dimension according to molecular mass using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). In the current study, the complete annotated chloroplast genomes of more than 2500 species were used to calculate the molecular weight and pI of the encoded proteins, and these values were used to construct a virtual 2D proteome map of the chloroplast proteome of plants.
In this study, we delineated the proteomic details of the chloroplast proteomes of 2893 species, comprising 256,387 protein sequences, and constructed a virtual 2D map of the chloroplast proteome. The virtual 2D map showed a bimodal distribution. The average pI of the chloroplast proteome was 7.852, and the molecular weight of chloroplast proteins ranged from 0.448 to 616.334 kDa. Analysis of amino acid composition revealed that Leu was the most abundant and Cys the least abundant amino acid in the chloroplast proteome, while Sec and Pyl were absent.

Results
The Molecular Mass of the Chloroplast Protein Ranged from 0.448 to 616.334 kDa
An extensive analysis of the chloroplast proteome, based on the fully annotated protein sequences of 2893 species and comprising a total of 256,387 protein sequences, revealed that the molecular mass of chloroplast proteins ranged from 0.448 to 616.334 kDa (Supplementary File 1). The ribosomal protein L16 (accession: AWK02406.1) of Cercidiphyllum japonicum (accession: MG605672.1) was the smallest protein (0.448 kDa). In comparison, the cell division protein (accession: AID67672.1) of Nephroselmis astigmatica (accession: KJ746600.1) was the largest protein (616.334 kDa) present in the chloroplast proteome. Additional low-molecular-mass proteins found in the chloroplast proteome included the ribosomal protein S12. A principal component analysis (PCA) of the low-molecular-mass proteins of the chloroplast proteome revealed that monocots, magnoliids, gymnosperms, and bryophytes share similar low-molecular-mass chloroplast proteins, while the low-molecular-mass proteins of eudicots, nymphaeales, pteridophytes, and algae cluster separately, indicating distinct differences in the low-molecular-mass proteins present within these two groups (Fig. 1). A Pearson correlation analysis (p < 0.05) indicated that the low-molecular-mass proteins of eudicots and nymphaeales are negatively correlated (−0.289), while the low-molecular-mass proteins of bryophytes and algae (0.299), pteridophytes and bryophytes (0.389), bryophytes and eudicots (0.24), and nymphaeales and magnoliids (0.303) were all positively correlated (Fig. 1).
The largest identified chloroplast protein (cell division protein) has a molecular mass of 616.334 kDa and comprises 5242 amino acids (Supplementary File 1). Other high-molecular-mass chloroplast proteins included hypothetical chloroplast RF21 (575.771 kDa, accession: AWH11312.1), cell division protein (487.534 kDa, accession: ALO62775.1), hypothetical chloroplast RF1 (485.475 kDa, accession: AHZ11038.1), and Ycf1a (482.348 kDa, accession: GAQ93691.1) (Supplementary File 1). The high-molecular-mass cell division protein was found only in algal species and was absent in other species. Principal component analysis of the high-molecular-mass chloroplast proteins revealed that the high-molecular-mass proteins of gymnosperms, bryophytes, magnoliids, protists, and pteridophytes clustered together, while the high-molecular-mass proteins of algae, monocots, nymphaeales, and eudicots clustered independently (Fig. 2). These data suggest commonality in the high-molecular-mass proteins in the lower eukaryotic plant taxa (gymnosperms, bryophytes, magnoliids, protists, and pteridophytes). In comparison, no commonality is present in the higher eukaryotic plant taxa (monocots, nymphaeales, and eudicots).
Fig. 2 High-molecular-weight proteins in the chloroplast proteome of different taxonomic groups. A Principal component analysis of high-molecular-weight proteins, indicating that monocots, eudicots, and algae are independent, suggesting a lack of commonality in the high-molecular-mass chloroplast proteins in these taxonomic groups. B Pearson's correlation analysis (p < 0.05) values for high-molecular-mass proteins in the chloroplast proteome of different taxonomic groups. C Heat map of the Pearson's coefficients of high-molecular-mass proteins. A high correlation between nymphaeales and bryophytes is evident, while several others are negatively correlated
A Pearson's correlation (p < 0.05) analysis revealed that the high-molecular-mass proteins in the bryophytes and nymphaeales were positively correlated (0.476) with each other, while several other groups were negatively correlated (Fig. 2).
Chloroplast genomes were found to encode from 3 to 370 proteins. Pilostyles aethiopica (eudicot) contained the lowest number of chloroplast-encoded proteins (3), while Pinus koraiensis encoded the highest number (370). The chloroplast proteome contained an average of 88.749 chloroplast-encoded proteins with an average mass of 32.483 kDa (Fig. 3, Supplementary file 1). Species with a low number of chloroplast-encoded proteins included Monoraphidium neglectum (4), Pilostyles hamiltonii (4), Asarum minus (7), and Cytinus hypocistis (15). Similarly, species encoding a high number of chloroplast proteins included Grateloupia taiwanensis (233), Grateloupia filicina

The Molecular Weight and pI of the Chloroplast Proteome Exhibit a Bimodal Distribution
The isoelectric point and molecular mass values vary greatly among different chloroplast proteomes and appear to exhibit a bimodal distribution (Fig. 6). The calculated mean pI of the overall chloroplast proteome was 7.852, and the mean molecular mass was 32.483 kDa. The variance in pI was 5.613, which is lower than the mean, while the variance in the molecular mass was 1966.947, which is much higher than the mean (Supplementary Table 1). The 75th percentile of the calculated pI was 9.736, while the 25th percentile was 5.715 (Supplementary Table 1). The 75th percentile of the calculated molecular mass of chloroplast proteins was 38.95 kDa, while the 25th percentile was 9.18 kDa (Supplementary Table 1). The skewness of the pI and molecular mass of chloroplast proteomes was 0.108 and 3.569, respectively, while the kurtosis for pI and molecular mass was −1.246 and 15.282, respectively (Supplementary Table 1). The pI therefore exhibited a platykurtic distribution (flatter than a normal distribution), while the molecular mass of chloroplast proteins exhibited a leptokurtic distribution (more peaked and heavier-tailed than a normal distribution). Because the kurtosis value for pI is negative, these values are excess kurtosis, for which a normal distribution scores 0 (equivalently, 3 on the non-excess scale). Under a normal model, the probabilities P(X > 12.954), P(X < 2.854), P(X > 7.951), and P(X < 7.951) for pI were 0.0158, 0.0174, 0.484, and 0.516, respectively (Supplementary Table 1). The corresponding probabilities P(X > 616.334), P(X < 0.448), P(X > 17.669), and P(X < 17.669) for molecular mass were 0, 0.235, 0.629, and 0.370, respectively (Supplementary Table 1). These data indicate that the probability of an encoded chloroplast protein having a pI above 12.954 is very low (0.0158), as is the probability of a pI below 2.854 (0.0174). In contrast, the probability of an encoded protein having a pI > 7.951 is considerably higher (0.484). Similarly, the probability of an encoded protein having a molecular mass greater than 616.334 kDa is effectively zero (Supplementary Table 1).
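The tail probabilities quoted above can be approximately reproduced from the reported means and variances alone. The following is a minimal sketch, assuming the online tool evaluated a plain normal model with those two parameters; the function name `normal_cdf` and the constant names are illustrative, and small differences in the third decimal place relative to Supplementary Table 1 are to be expected.

```python
import math


def normal_cdf(x, mean, variance):
    """Cumulative probability P(X < x) for a normal distribution."""
    z = (x - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Means and variances as reported in the text (Supplementary Table 1).
PI_MEAN, PI_VAR = 7.852, 5.613
MASS_MEAN, MASS_VAR = 32.483, 1966.947  # kDa

# Tail probabilities for pI.
p_pi_above_max = 1.0 - normal_cdf(12.954, PI_MEAN, PI_VAR)
p_pi_below_min = normal_cdf(2.854, PI_MEAN, PI_VAR)
p_pi_above_7951 = 1.0 - normal_cdf(7.951, PI_MEAN, PI_VAR)

# Tail probabilities for molecular mass.
p_mass_above_max = 1.0 - normal_cdf(616.334, MASS_MEAN, MASS_VAR)
p_mass_below_min = normal_cdf(0.448, MASS_MEAN, MASS_VAR)
p_mass_above_17669 = 1.0 - normal_cdf(17.669, MASS_MEAN, MASS_VAR)

if __name__ == "__main__":
    for name, value in [
        ("P(pI > 12.954)", p_pi_above_max),
        ("P(pI < 2.854)", p_pi_below_min),
        ("P(pI > 7.951)", p_pi_above_7951),
        ("P(mass > 616.334)", p_mass_above_max),
        ("P(mass < 0.448)", p_mass_below_min),
        ("P(mass > 17.669)", p_mass_above_17669),
    ]:
        print(f"{name} = {value:.4f}")
```

A normal model is only a rough description here, since the molecular mass distribution is strongly skewed and both variables are described as bimodal; the sketch is meant to show where the quoted probabilities come from, not to endorse the model.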
Only 126 (4.35%) of the examined species were found to encode neutral pI proteins (Supplementary file 5). Coeloseira compressa, Lobelia anceps, and Megaleranthis saniculifolia each encoded two neutral pI proteins, while the remaining species contained only one neutral pI protein within their chloroplast proteome.

The Chloroplast Proteome Lacks Sec and Pyl, and Leu Was the Most Abundant and Cys the Least Abundant Amino Acid
Plastome-wide proteome analysis of amino acid composition revealed that Leu (10.59%) was the most abundant amino acid, while Cys (1.125%) was the least abundant amino acid in the chloroplast proteome (Table 1, Fig. 7, Supplementary file 6). Other highly abundant amino acids in the chloroplast proteome were Ile (8.503%), Ser (7.536%), and Gly (6.807%). Other low-abundance amino acids in the chloroplast proteome were Trp (1.683%), His (2.298%), and Met (2.305%) (Table 1, Supplementary file 6). The chloroplast proteome was found to contain 50.785% non-polar and 49.197% polar amino acids. Notably, Cys accounted for only 0.955% of the amino acids in protist chloroplast proteins and only 0.988% in algal chloroplast proteins. Arg accounted for 4.8% of the amino acids in algal chloroplast proteins and 4.97% in protists, which was considerably lower than in other taxonomic groups (Table, Fig. 7). The highest and lowest abundance of various amino acids in different taxonomic groups are indicated by an asterisk (*) and a dagger (†), respectively, in Fig. 7. None of the analyzed chloroplast protein sequences were found to contain selenocysteine (Sec) or pyrrolysine (Pyl), whereas a few contained the ambiguity codes Xaa (unknown), B (Asx, Asn or Asp), and J (Xle, Leu or Ile) (Supplementary file 1). At least 108 species contained Xaa, six contained Asx, and eight contained Xle. The most and least abundant amino acids in many individual species were also determined (Table 2).
Fig. 6 Virtual 2D map of chloroplast proteomes. The X-axis represents the pI, and the Y-axis represents the molecular mass of different chloroplast proteomes. The overall chloroplast proteome exhibits a bimodal distribution. Basic pI proteins are more abundant in chloroplast proteomes than in nuclear proteomes; hence the modality shifts towards the basic pI range
Most of the species listed in Table 2 were algae or protists and exhibited significant variation in amino acid composition. For example, although the average percentage of Leu in the chloroplast proteome was 10.590% (Table 1), the percentage of Leu was 12.385% in the chloroplast proteome of Codonopsis lanceolata (Table 2). Similarly, the percentage of Ile in the chloroplast proteome was 8.503% (Table 1), while the percentage of Ile in Choreocolax polysiphoniae was 14.555% (Table 2). The chloroplast proteome of Pilostyles aethiopica does not contain Trp, and this species may have lost the genes whose products contain this amino acid. A PCA revealed that Leu, Ile, Lys, Asn, and Ser are independent of each other, while Cys, Met, His, and Trp cluster together (Fig. 8). Similarly, Tyr, Gln, Thr, Glu, Asp, Phe, Val, and Gly also cluster together, reflecting their similar percentage of abundance in the proteome. A Pearson's correlation analysis (p < 0.05) of amino acid composition was conducted to better understand amino acid abundance in the chloroplast proteome. The results indicated that most of the amino acids in chloroplast-encoded proteins were positively correlated with each other, with a few exceptions (Fig. 8). The abundances of Cys, Met, His, Tyr, Gln, Thr, Glu, Asp, Phe, Val, Gly, and Trp were found to be correlated (Fig. 8). A few amino acid combinations exhibited a negative correlation, including Lys and His (−0.083) (Fig. 8).
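The composition percentages reported in this section amount to counting residues across all protein sequences of a proteome and dividing by the total residue count. A minimal sketch of that calculation follows; the function name `amino_acid_percentages` and the toy input sequences are hypothetical and are not taken from the study's data or scripts. Non-standard symbols such as X, B, J, U (Sec), and O (Pyl) are counted like any other letter, so their presence or absence is visible in the output.

```python
from collections import Counter


def amino_acid_percentages(sequences):
    """Return {residue: percentage of all residues} for a set of sequences."""
    counts = Counter()
    for seq in sequences:
        # Count every letter, including ambiguity codes such as X, B, and J.
        counts.update(seq.strip().upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical toy proteome of two short peptides, for illustration only.
    toy_proteome = ["MSLV", "MLLK"]
    for aa, pct in amino_acid_percentages(toy_proteome).items():
        print(f"{aa}: {pct:.3f}%")
```

A residue that never occurs, as reported for Trp in Pilostyles aethiopica, is simply missing from the returned dictionary, so `result.get("W", 0.0)` can be used to test for absence.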

Discussion
Plant cells and protists contain a semi-autonomous chloroplast organelle that encodes a small proteome, consisting of a dynamic range of proteins that vary in molecular mass and isoelectric point. The largest protein (616.334 kDa) identified in the chloroplast proteome was a cell division protein, which is considerably smaller than the largest nuclear-encoded protein in plant cells. Presently, the largest protein encoded in plant cells is a putative polyketide synthase type-I protein with a molecular mass of 2236.8 kDa [2]. Chloroplast genomes were found to encode from 3 to 370 proteins, while the nuclear genome encodes from 6033 (Helicosporidium sp.) to 248,180 (Hordeum vulgare) protein sequences [2]. The largest nuclear-encoded proteome in the plant kingdom is 9,857,470.162 kDa (Hordeum vulgare), which is 1683.657 times larger than the chloroplast proteome of 5854.794 kDa in Grateloupia filicina. The average molecular mass of nuclear-encoded proteomes in the plant kingdom is 1,918,027.187 kDa, which is 666.552 times larger than the average molecular mass of the chloroplast proteome (2877.533 kDa). Chloroplast proteomes encode an average of 88.749 proteins per chloroplast (Fig. 3), while the nucleus encodes an average of 40,469.47 proteins, which is 455.999 times greater than the chloroplast proteome.
Fig. 8 A Principal component analysis of the amino acid composition in chloroplast proteomes. The analysis indicated that Leu, Ile, Asn, Lys, Pro, Gly, Ser, and Arg locate independently of each other, while other amino acids cluster in groups, suggesting the differential composition of Leu, Ile, Asn, Lys, Pro, Gly, Ser, and Arg. B Heat map of the Pearson's correlation analysis values of the amino acid composition in chloroplast proteomes. All of the amino acids, except for Lys and His, were positively correlated
In algae, the chloroplast genome encodes larger proteins relative to other taxonomic groups and also encodes a higher number of proteins. Chloroplasts are reported to have originated approximately 1.2 billion years ago as cyanobacterial endosymbionts within a eukaryotic host cell [3]. Later, the endosymbiont genome underwent an enormous reduction in size, decreasing the number of encoded proteins to a range of 3-370 [1]. In contrast, the cyanobacterial genome encodes several thousand proteins [4]. Although it is commonly assumed that the chloroplast maintained its genetic autonomy, this does not seem to be the case. Chloroplasts have frequently lost genes and genetic content and transferred genes to the nucleus [1]. During evolution, genes have been transferred from an ancestral chloroplast to the nucleus; their products are translated in the cytosol and targeted for import into the chloroplast with the aid of a transit peptide. Our studies have established that almost all chloroplast protein-encoding genes can be found as nuclear genes in one or more species [1]. Approximately 18% of the nuclear genes in Arabidopsis thaliana have been reported to be inherited from cyanobacteria [5]. This observation is explained by the common phenomenon of an exchange of genetic material between the endosymbiont chloroplast and the nucleus. However, the question arises: why have protein-encoding genes from the chloroplast been transferred to and merged with the nuclear genome? Is the organization of the chloroplast genome unsuitable for the proper expression and processing of chloroplast-encoded genes inside the eukaryotic cell? The nucleus regulates the chloroplast, so in line with this regulation, it may have been more efficient for chloroplast genes to be transferred to and expressed by the nucleus.
The chloroplast proteome includes small peptides, the smallest of which consists of the amino acids M-S-L-V. This tetrapeptide has a molecular mass of 0.448 kDa; in comparison, the smallest nuclear-encoded peptide is also a tetrapeptide (M-I-M-F), with a molecular mass of 0.54 kDa [2]. The low-molecular-mass tetrapeptide identified in the chloroplast proteome of Cercidiphyllum japonicum was not found in other species, and its cellular and molecular functions are unknown. One of the small-molecular-mass peptides identified in the nuclear-encoded proteome of plant cells is the cytochrome b6/f complex subunit VIII [2], which is also encoded in the chloroplast proteome (Supplementary file 5). Glutathione (GSH), a tripeptide of Glu, Cys, and Gly, is the smallest reported peptide [6]. Although nuclear-encoded small peptides in the plant kingdom contain glutathione, chloroplast-encoded small peptides contain Ser (S). Polypeptides with fewer than 100 amino acids are categorized as small peptides, and 33.22% of the proteins in the chloroplast proteome are composed of ≤100 amino acids. Small peptides play a role in cell signaling, cell growth, and the DNA damage response [7][8][9][10]. Tri-, tetra-, and pentapeptides are involved in diverse signaling processes [11,12]. The tetrapeptide G-E-K-G is associated with the formation of the extracellular matrix [13], the pentapeptide E-R-G-M-T induces the expression of the srfA-lacZ gene in Bacillus subtilis [14], and A-R-N-Q-T plays a role in sporulation [14]. A previous study reported that the average size of plant proteins is smaller than that of animal proteins [2]. In the plant kingdom, the average length of nuclear-encoded proteins is 424.34 amino acids, while the average length of chloroplast-encoded proteins is 288.9613 amino acids.
The average length of eukaryotic proteins has been reported to be 472 amino acids [15], which is 183.038 amino acids greater than the average length of chloroplast-encoded proteins. Although the average size of chloroplast-encoded proteins is very low relative to nuclear-encoded plant and animal proteins, the chloroplast genome of Monoraphidium neglectum encodes an average of 1743 amino acids per protein and was found to only encode a total of four protein sequences.
The chloroplast proteome was found to contain a higher percentage of basic pI proteins (56.334%) than the nuclear-encoded proteome, which has been reported to contain a higher percentage (56.44%) of acidic pI proteins. The average pI of nuclear-encoded acidic proteins is 5.62 [2], slightly higher than the average pI of acidic chloroplast proteins (5.506). The average pI of basic proteins in the chloroplast proteome is 9.669, higher than the average pI (8.37) of basic, nuclear-encoded proteins in the plant kingdom. The pH of chloroplasts ranges from 7.8 to 8.2 [16], and the stromal pH of illuminated chloroplasts is approximately 8.0 [17]. These data indicate that the chloroplast stroma is an alkaline environment and suggest that chloroplasts may encode a higher percentage of basic pI proteins to maintain homeostasis. The pH gradient between the thylakoid lumen and stroma under illuminated conditions has been reported to drive ATP synthesis, and stromal pH is partially dependent on the external pH and proton uptake by thylakoids under illuminated conditions [17,18]. Light-induced stromal alkalization is quickly reversed under dark conditions as protons diffuse across the membrane from the thylakoid lumen. The light-induced alkaline pH of the stroma is crucial for the activity of photosynthetic enzymes in the carbon reduction cycle and facilitates optimal photosynthesis [19,20]. Therefore, it is important to understand how an alkaline pH is maintained in the stroma of the chloroplast, which is surrounded by the comparatively acidic cytosol. It can be hypothesized that a complex regulatory system exists, comprising cation/proton antiporters, cation channels, and efflux carriers that transport H+ across the chloroplast envelope, the components of which remain to be identified.
Chloroplasts also have the potential to generate a stromal Ca2+ signal in response to diverse stimuli and contribute to the fine-tuning and maintenance of stromal pH [21][22][23][24][25].
The highest percentage of basic pI proteins was found in protists, and the lowest percentage was found in gymnosperms. The genus Prototheca, which lacks chlorophyll, encodes 96.428% basic pI proteins, while the chloroplast proteome of the parasitic plant Asarum minus possesses the highest percentage (71.428%) of acidic pI proteins. Due to the higher percentage of basic pI proteins in the chloroplast proteome, the bimodal distribution of pI on the proteome map shifts towards the basic pI range (Fig. 6). Although the chloroplast proteome exhibits a bimodal distribution, the nuclear-encoded proteome in the plant kingdom exhibits a trimodal distribution [2]. Schwartz et al. (2001) reported a trimodal distribution of pI for eukaryotic proteins [26]. Kiraga et al. (2007) reported a bimodal distribution of the pI of proteins from all organisms and indicated that taxonomy, ecological niche, proteome size, and sub-cellular localization are correlated with the presence of acidic and basic pI proteins [27]. Although these attributes do not show any correlation for nuclear-encoded proteins [2], the bimodal distribution of the pI of proteins in the chloroplast proteome is strongly correlated with the taxonomy and ecological niche of an organism (Figs. 4 and 5). The chloroplast proteomes of protists and algae have a higher percentage of basic pI proteins, and those of gymnosperms have a lower percentage of basic pI proteins. Notably, the marine seaweed Prototheca stagnorum encodes 96.428% of its chloroplast-encoded proteins as basic pI proteins, reflecting the association of an ecological niche with a higher percentage of basic pI proteins (Supplementary file 4). In contrast, gymnosperm species were found to encode only 48.680% of their chloroplast-encoded proteins as basic pI proteins, reflecting the association of taxonomic rank with a higher percentage of acidic pI proteins.
The present study revealed that Leu was the most abundant (10.59%) amino acid in the chloroplast proteome, while Cys (1.125%) was the least abundant. The highest and lowest abundance of amino acids in the chloroplast proteome was partially associated with taxonomic rank (Table 1). The chloroplast proteome of protists contained only 0.955% Cys, and that of algae only 0.988%, indicating a lower abundance of Cys in lower eukaryotic plants. Leu, a non-polar amino acid, is abundant in both chloroplast- and nuclear-encoded proteins, which suggests a preference for non-polar over polar amino acids. Pilostyles aethiopica contains only three proteins [28] in its chloroplast proteome, none of which include Trp (Supplementary file). The amino acid selenocysteine (Sec), which has been reported to be present in the nuclear proteome of algae and absent from higher plants, was not found in any of the chloroplast proteomes [2]. The selenium-containing amino acid Sec is frequently found in the proteomes of animals and bacteria [29][30][31][32], where it is usually present in the active sites of proteins involved in redox reactions [31]. Pilostyles aethiopica, a parasitic flowering plant, has lost almost its entire chloroplast proteome. The endoparasitic flowering plant Rafflesia lagascae appears to lack a plastome [28]. The abundance of the aromatic amino acids Trp and Tyr is relatively low in both nuclear and chloroplast proteomes, and the complete absence of Trp in the chloroplast proteome of Pilostyles aethiopica suggests that this amino acid has undergone stringent selection pressure.

Conclusion
Analysis of the chloroplast proteomes of 2893 species of the plant kingdom revealed a diverse range of molecular mass and pI in chloroplast proteins. Basic pI proteins were dominant over acidic pI proteins in the chloroplast proteome, while only 0.054% of proteins had a neutral pI, suggesting that proteins with a neutral pI are rarely needed. The pI of chloroplast proteins covers almost the entire pH range (2.854-12.954). Understanding the function of the chloroplast proteins at the high and low extremes of pI will be of interest. The relative abundance of acidic and basic pI proteins in a chloroplast proteome is related to an organism's taxonomic rank and ecological niche. The high and low abundance of different amino acids in the chloroplast proteomes of different species may help to clarify the functional role of these amino acids in the proteome. The rate of mutation and selection pressure may be the main factors underlying amino acid composition in the chloroplast proteomes of different plant species. The presence of the ambiguous amino acid codes Xaa, B, and J in the chloroplast proteome is intriguing and requires further investigation to understand its significance. In addition, the absence of Trp in the chloroplast proteome of the parasitic plant Pilostyles aethiopica is notable and warrants further investigation.

Sequence Retrieval and Determination of Molecular Weight and Isoelectric Point of Chloroplast Proteins
All the protein sequences of the chloroplast proteomes were downloaded from the National Center for Biotechnology Information (NCBI). After collecting all the protein sequences, the isoelectric point and molecular weight of the proteins were calculated using the Linux-based version of the Isoelectric Point Calculator (http://isoelectric.org/) [33]. The resulting isoelectric point and molecular weight files for each species were further processed to remove the amino acid sequences and retain only the molecular weight and isoelectric point values. The amino acid count and sequence length of individual protein sequences were determined for each species using Linux command-line tools.
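To make the two calculated quantities concrete, the following is a minimal sketch of how a molecular weight and an isoelectric point can be estimated from a sequence. It is not the Isoelectric Point Calculator used in the study: the pKa values below are a generic illustrative set (an assumption; IPC uses its own optimized scales), the masses are standard average residue masses, and all function and constant names are hypothetical. The pI is found by bisection on the Henderson-Hasselbalch net charge, which decreases monotonically with pH.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free termini

# Illustrative pKa values (assumed for this sketch; NOT the IPC scales).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def molecular_weight(seq):
    """Average molecular weight in Da for a sequence of standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH at which the net charge is zero, by bisection."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MSLV"  # the smallest chloroplast peptide described in the text
    print(f"MW = {molecular_weight(peptide) / 1000:.3f} kDa")
    print(f"pI = {isoelectric_point(peptide):.2f} (illustrative pKa set)")
```

For the tetrapeptide M-S-L-V, the mass sum is about 448.58 Da, in line with the 0.448 kDa reported for the smallest protein. The pI from this sketch depends entirely on the assumed pKa set and would not be expected to match IPC output exactly.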

Statistical Analysis of the Chloroplast Proteomes
All the isoelectric point and molecular weight files of the individual species were subjected to further statistical analysis. The average number of protein sequences per proteome, pI, molecular weight, amino acid composition, number of amino acids per sequence, and other values were calculated using Microsoft Excel 2016. The probability distributions of molecular weight and isoelectric point were analyzed using the online statistical tool Mathportal (https://www.mathportal.org/). The scatter plot of the molecular weight versus isoelectric point of the chloroplast proteins was drawn using the Scatterplot online server (https://scatterplot.online/). The principal component analysis of the chloroplast proteomes was conducted using the statistical tool Unscrambler v 3 (https://www.camo.com/unscrambler/). Pearson's correlation (p < 0.05) of the chloroplast proteins was analyzed using the statistical tool JASP 0.14.0.0.
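For readers unfamiliar with the statistic, the Pearson coefficient reported throughout the Results is the covariance of two variables divided by the product of their standard deviations. The sketch below computes only the coefficient r and does not reproduce the significance test performed in JASP; the function name `pearson_r` and the example vectors are hypothetical and are not drawn from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-species percentages of two amino acids.
    leu = [10.2, 10.9, 11.4, 9.8, 10.6]
    ile = [8.1, 8.9, 9.3, 7.7, 8.5]
    print(f"r = {pearson_r(leu, ile):.3f}")
```

Values near +1 or −1 indicate a strong linear relationship, while values such as the −0.083 reported for Lys and His indicate a very weak negative association.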