A Review of Sampling and Modeling Techniques for Forest Biomass Inventory

Abstract: Forest biomass is the energy base and material source of the forest ecosystem cycle and is expressed as the dry matter weight or energy accumulated per unit area and time. It is also an important index for studying the structure and function of forest ecosystems, and it is the premise and basis for their scientific management. This paper reviews the concept, development history, and research status of forest biomass, and analyzes the sampling methods and model construction methods used in forest biomass surveys. Finally, research prospects and a summary of key technologies for forest biomass inventory and monitoring are put forward.

estimation by remote sensing. The initial research on forest biomass grew out of studies of forest ecosystems and forest succession, and a large number of studies on forest biomass and carbon density were carried out from the 1990s to 2000. In the first decade of the new century, many scholars carried out research on remote sensing and carbon sequestration, which are closely related to the continuous inventory and management of forest resources, as shown in Figure 1.

Methodologies
Traditional methods of biomass inventory and estimation are still dominant. They can be divided into the carbon dioxide balance method, the micrometeorological field method, the direct harvest method, and the expansion factor method (Wu et al., 2023). The direct harvest method is the most accurate survey method and the most practical one for terrestrial forests; it can be further divided into the mean tree method, the clear-cutting method, and the relative growth (allometric) method. However, traditional methods involve a heavy workload and a complicated process, are poorly representative, and have not formed a system of measurement techniques. They therefore cannot reflect rapid changes in macroscopic ecosystems and ecological conditions in a timely manner, and so cannot meet practical requirements. With the development of "3S" (GIS, RS, and GPS) technology, remote sensing studies of vegetation productivity and biomass have moved from traditional small-scale, two-dimensional ground measurement to large-scale, multidimensional space-time estimation, so that forest biomass at spatial scales from stand to region can be estimated quickly and accurately (Hill et al., 2018).

Sampling Techniques
Sampling is a method based on probability theory. Random selection of samples ensures their representativeness, avoids human interference and bias, and allows the sampling error to be estimated (H. Wu et al., 2023). Different sampling methods should be used for different purposes (Hou et al., 2021). In practice, a specific sampling scheme usually combines several basic sampling methods (Bagaram & Tóth, 2020). Systematic sampling, stratified sampling, and random sampling are used in biomass inventory at various levels. Equal probability sampling is a classical method that has been widely used (Hawbaker et al., 2009), whereas unequal probability sampling, remote sensing based sampling, and sampling for sparsely distributed populations (Lei & Tang, 2007; Sterba, 2009) are more targeted in biomass inventory. Research on forest biomass sampling mainly focuses on two keywords, measurement and estimation (Perez-Cruzado et al., 2021), which are closely related to forest carbon storage, and it is gradually extending to survey design and estimation models, as shown in Figure 2.

Random Sampling
The sample units obtained by random sampling are scattered, which makes actual forest resource surveys difficult to carry out (P. Yu, 1974), and the accuracy depends on the number of sample units (Jin & Zhao, 2001; Meng et al., 1995). When the standard deviation of biomass among the sample units of a systematic sample is large, the estimate is close to that of random sampling even though the sample is organized systematically. To ensure the accuracy of the inventory results, the number of sample units must then be increased. At present, the most common approach to estimating provincial forest biomass uses plot data from the continuous forest resource inventory (Qin et al., 2017). In this approach, biomass is aggregated from the individual-tree level to the plot level and then expanded to the total biomass of the region. The process involves many errors and uncertainties from different sources (McRoberts et al., 2013), such as measurement errors, model uncertainties, and young trees that are not measured, which can lead to underestimation of forest biomass (Poudel et al., 2015). Because the error sources are diverse and interact and propagate, scientific sampling design is particularly important (Montesano et al., 2015).
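The dependence of accuracy on the number of sample units can be illustrated with the textbook simple random sampling estimator. The sketch below is only an illustration under stated assumptions: the plot values are hypothetical, the function names are invented for this example, and the region is assumed to consist of a known number of plot-sized units.

```python
import math
import statistics


def srs_estimate(plot_biomass, n_population_units, z=1.96):
    """Estimate mean and total biomass from a simple random sample of plots.

    plot_biomass: biomass measured on each sampled plot (e.g. t per plot).
    n_population_units: number of plot-sized units in the region (assumed known).
    """
    n = len(plot_biomass)
    mean = statistics.mean(plot_biomass)
    s2 = statistics.variance(plot_biomass)   # sample variance (n - 1 denominator)
    fpc = 1.0 - n / n_population_units       # finite population correction
    se = math.sqrt(fpc * s2 / n)             # standard error of the mean
    return {
        "mean": mean,
        "se": se,
        "ci": (mean - z * se, mean + z * se),
        "total": mean * n_population_units,
    }


def required_sample_size(cv, allowable_relative_error, z=1.96):
    """Plots needed for a target relative error, ignoring the finite population correction."""
    return math.ceil((z * cv / allowable_relative_error) ** 2)


# Hypothetical plot values, for illustration only
result = srs_estimate([100, 120, 80, 110, 90], n_population_units=1000)
print(result["mean"], round(result["se"], 3))
print(required_sample_size(cv=0.5, allowable_relative_error=0.10))
```

Because the required sample size grows with the square of the coefficient of variation, a highly variable biomass population quickly demands many more plots for the same allowable error.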

Stratified Sampling
Compared with random sampling, stratified sampling is often considerably more efficient for estimation. Its effectiveness depends on the accuracy of the prior information used for stratification. The key point is that strata should be defined by attribute characteristics so that the variance among sample units within a stratum is as small as possible, while the variance among strata is as large as possible. When the standard deviation of total biomass is large and the standard deviation within subpopulations is small, stratified sampling gives more precise estimates than random sampling (W. Zeng et al., 1995). Stratified sampling is often used to survey forest volume in forest resource planning and design inventories (Yang, 1993). Cluster sampling divides the survey population into disjoint groups and then surveys each selected group as a whole. By increasing the number of adjacent units surveyed, it improves accuracy and sampling efficiency, and it has been widely used in surveys of forest resources that are distributed in clusters (Y. Li et al., 2019; Shi, 2012).
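The gain from stratification when within-stratum variance is small can be seen in a short numerical sketch. The two strata, their equal area weights, and the plot values are hypothetical, and the finite population correction is ignored for simplicity.

```python
import math
import statistics


def stratified_estimate(strata):
    """Stratified mean and its standard error.

    strata: list of (weight, values) pairs, where weight is the stratum's
    share of the population (weights sum to 1) and values are plot biomasses.
    """
    mean = sum(w * statistics.mean(v) for w, v in strata)
    var = sum(w ** 2 * statistics.variance(v) / len(v) for w, v in strata)
    return mean, math.sqrt(var)


def srs_standard_error(values):
    """Standard error of the mean if the same plots are treated as one simple random sample."""
    return math.sqrt(statistics.variance(values) / len(values))


# Hypothetical plots (t/ha): a young stratum and a mature stratum
young = [20, 22, 18, 20]
mature = [100, 104, 96, 100]

mean_st, se_st = stratified_estimate([(0.5, young), (0.5, mature)])
se_srs = srs_standard_error(young + mature)
print(mean_st, round(se_st, 3), round(se_srs, 3))
```

In this contrived case the between-stratum difference dominates the total variance, so the stratified standard error is far smaller than the unstratified one computed from the same eight plots.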

Cluster Sampling
Since adaptive sampling design was proposed in the 1990s (Thompson, 1991), its sample organization, estimation methods, and practical applications have all been improved. Because traditional sampling methods do not consider the unequal contributions of different clusters, methods such as systematic sampling may reduce sampling efficiency and estimation accuracy (Y. Gao & Gao, 2018; Hua et al., 2014; Huang, 2018; W. Zeng et al., 2018). For sparsely distributed populations, adaptive sampling designs that exploit the correlation between sample units can effectively reduce the number of sampling units (Hero et al., 2013; Lin et al., 2009), whereas traditional cluster sampling of aggregated populations may lead to estimation bias (Xiao, 2004). Unequal probability sampling is the means by which the results of biomass spatial distribution pattern analysis are applied to adaptive sampling (Holmberg & Lundevaller, 2015): different distribution patterns should have different sampling probabilities. Adaptive sampling combined with unequal probability sampling can therefore take full account of spatial differences in biomass distribution and of the differing status of clusters.

Unequal Probability Sampling
The advantage of unequal probability sampling is that it improves estimation accuracy and reduces sampling error; its drawback is that preparing the sampling frame is more complicated. PPES (sampling with probability proportional to an estimate of size) and PPP (sampling with probability proportional to prediction) are practical applications of unequal probability sampling theory in forest resource inventory. Forest inventories began to use unequal probability sampling designs in the 1970s (Shi et al., 2009). Angle-gauge tree measurement is a typical application of unequal probability sampling in forest resource surveys. Different forms of sample organization combine with unequal probability sampling to form different sampling designs (Good et al., 2001; Zhou & Sun, 2004). Clusters in cluster sampling often differ in size. When sampling probabilities are specified for clusters of different sizes, unequal probability sampling can ensure the accuracy of the estimates with fewer sampling units (Peng, 1998). Unequal probability sampling is more complicated than equal probability sampling in sampling frame design and in determining sampling probabilities, but it can effectively improve sampling efficiency (J. Li, 2000).
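A minimal sketch of sampling with probability proportional to size may help make the idea concrete. It assumes sampling with replacement and uses the standard Hansen-Hurwitz estimator of the total; this is one textbook design, not necessarily the one used in the studies cited above. The sizes and biomass values are hypothetical.

```python
import random


def pps_draw(sizes, n, rng):
    """Draw n unit indices with replacement, with probability proportional to size."""
    total = sum(sizes)
    probs = [s / total for s in sizes]
    draws = rng.choices(range(len(sizes)), weights=probs, k=n)
    return draws, probs


def hansen_hurwitz_total(values, probs, draws):
    """Estimate the population total as the mean of y_i / p_i over the draws."""
    return sum(values[i] / probs[i] for i in draws) / len(draws)


# Hypothetical units: 'sizes' is auxiliary information (e.g. predicted volume),
# 'biomass' is the variable of interest, here exactly proportional to size.
sizes = [1.0, 2.0, 3.0, 4.0]
biomass = [10.0, 20.0, 30.0, 40.0]

draws, probs = pps_draw(sizes, n=3, rng=random.Random(0))
estimate = hansen_hurwitz_total(biomass, probs, draws)
print(draws, estimate)
```

When the auxiliary size is exactly proportional to biomass, every draw yields the same ratio y_i / p_i and the estimator recovers the true total from any sample; the closer real auxiliary information comes to this ideal, the smaller the sampling error.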

Other Sampling Methods
Unequal probability sampling is applicable where sampling units differ in status within the population or where the units surveyed differ from those of the sampled population. Its prerequisite is that the sampling probability of each unit can be determined from auxiliary information about that unit. Based on prior information about the distribution pattern of forest biomass, an initial sampling proportion is set, and the sample is then adapted according to the sampling probabilities of neighboring sample units so as to improve sampling efficiency. Ripley's K(d) analysis, aggregation index analysis, nearest neighbor analysis, and spatial autocorrelation analysis are the main methods of spatial pattern analysis (Jiang et al., 2009). Nearest neighbor analysis measures, for each base unit in turn, the distance to its nearest base unit, then calculates the mean nearest neighbor distance over all base units in the region and compares it with the mean expected under a random distribution. The aggregation index is calculated from the distances between adjacent sample units to describe the spatial distribution and degree of aggregation of biomass. Spatial autocorrelation analysis among sample units can effectively reduce sample redundancy and survey cost (Zhao et al., 2022).
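The nearest neighbor comparison described above can be sketched as a ratio of the observed mean nearest neighbor distance to the mean expected under complete spatial randomness. The sketch assumes the Clark-Evans form of the expected distance, 0.5 / sqrt(n / A), and applies no edge correction; the coordinates are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    Values near 1 suggest a random pattern, below 1 an aggregated pattern,
    and above 1 a regular pattern. No edge correction is applied.
    """
    n = len(points)
    nn_distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nn_distances.append(nearest)
    observed = sum(nn_distances) / n
    expected = 0.5 / math.sqrt(n / area)  # mean under a random (Poisson) pattern
    return observed / expected


# Hypothetical patterns: a regular 4 x 4 grid and a tight clump
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
clump = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

print(nearest_neighbor_index(grid, area=16.0))
print(nearest_neighbor_index(clump, area=100.0))
```

An index well below 1, as for the clumped points, would be the kind of prior information that argues for adaptive or unequal probability designs rather than equal probability sampling.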

Modeling Techniques
A stand growth model is a mathematical function, or a group of functions, describing the relationship between stand growth and site conditions, and is used to estimate the development of a stand under given conditions (H. Gao et al., 2017). The main uses of such models include updating forest biomass data, evaluating the benefits of different forest management measures, evaluating the impact of disturbance on forest ecosystems, and predicting yield under sustainable forest management. Models can be classified into various types according to their purpose, structure, and the object they describe. The correlation between forest carbon storage and aboveground biomass remains the main topic of forest biomass modeling. Random forest and support vector machines are widely used in forest biomass model research, in combination with forest measurement, as shown in Figure 3.

Tree Level
An individual-tree biomass model simulates the dry matter weight of each component (trunk, branch, leaf, bark, root, etc.) of each tree in the stand. More than 3000 biomass models have been established globally, covering more than 100 tree species (Basuki et al., 2009; Case & Hall, 2008; Chojnacky, 2002; Jenkins et al., 2003; Muukkonen, 2006). Modeling requires the biomass of a certain number of sample trees to be measured as basic data. Once the model is established, continuous forest inventory data can be used to estimate, with a certain precision, the biomass of whole stands of the same type. The individual-tree biomass model has therefore always been a research hot spot (Kleinn et al., 2020).
According to the number of independent variables, tree-level biomass models can be divided into univariate, bivariate, and multivariate models (W. Zeng & Tang, 2010). According to their functional form, they can be divided into linear, nonlinear, and polynomial models. Parameter estimation methods for tree-level biomass models mainly include traditional regression (Basuki et al., 2009; Vallet et al., 2006), nonlinear seemingly unrelated regression (Bi et al., 2004), linear or nonlinear joint estimation (Tang et al., 2000), the dummy variable method (Fu et al., 2012), the mixed effects model method (Fehrmann et al., 2008), the measurement error model method (W. , the spatial regression method (Ou et al., 2014), and so forth.
In the research and application of biomass models, component models are often incompatible with the total model: the component estimates do not sum to the estimated total (Parresol, 1999; H. Xu & Zhang, 2002). Tang et al. (2000) proposed a compatible tree-level biomass model and an estimation method suited to the current forest volume inventory method, namely the nonlinear joint estimation method. Compatible biomass models are mainly of two types: models compatible with volume, and models in which total and component biomass are compatible.
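As an illustration of the simplest univariate tree-level model, the sketch below fits the power-law form W = a * D^b by ordinary least squares on the log-transformed variables. This is only one of the traditional regression options listed above, not the compatible joint estimation method; the sample-tree data are synthetic, and no correction for log-transformation bias is applied.

```python
import math


def fit_allometric(dbh, biomass):
    """Fit W = a * D**b via least squares on ln W = ln a + b * ln D."""
    x = [math.log(d) for d in dbh]
    y = [math.log(w) for w in biomass]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_biomass(dbh, a, b):
    """Predicted tree biomass for a given diameter at breast height."""
    return a * dbh ** b


# Synthetic sample trees generated from a = 0.1, b = 2.5 (illustrative values only)
diameters = [5.0, 10.0, 20.0, 30.0]
weights = [0.1 * d ** 2.5 for d in diameters]

a_hat, b_hat = fit_allometric(diameters, weights)
print(round(a_hat, 4), round(b_hat, 4))
```

With real sample trees the fitted model would then be applied tree by tree to inventory data and summed to the plot or stand level.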

Stand Level
With the launch of the International Geosphere-Biosphere Programme (IGBP), the results of earlier ecological research and stand-level biomass data have been extended to landscape, regional, and even global scales in the study of global climate change (X. Xu & Zhang, 2002). According to their independent variables, stand-level biomass models can be divided into models based on stand factors and models based on volume. Volume-based models can be further subdivided into the biomass expansion factor (BEF) method and continuous biomass conversion functions (CBCF).
The dependent variable of a stand biomass model is usually the biomass per unit area of each organ of the stand, and the independent variables are stand attributes such as total basal area, dominant height, and average height (Luo et al., 2009). Because clear-cutting is rarely used to obtain measured biomass at the stand level, stand biomass data are mostly derived from individual-tree biomass models or from measurements of standard trees (L. Dong & Li, 2016). Uncertainty analysis of the conversion from the individual-tree level to the stand level is therefore also an important aspect of stand biomass research (Qin et al., 2017).
To improve the accuracy of forest biomass estimation, scholars have proposed a series of methods (Fang, Chen, et al., 2001). Among them, the BEF method obtains the total biomass of a forest type by multiplying its total stand volume by an average biomass-to-volume conversion factor (Kauppi et al., 1992). Its main shortcoming is that the conversion factors, such as wood density and the ratio of total to aboveground biomass, are treated as constants. Fang, Wang et al. (1998) pointed out that stand biomass and volume are related to forest type, age, site conditions, stand density, and other factors, so a constant biomass conversion factor cannot estimate forest biomass accurately. The continuous function method replaces the constant average conversion factor with an age-graded conversion factor to achieve more accurate estimates of forest biomass at the national or regional scale (Edgar et al., 2019; Kauppi et al., 1992; Turner et al., 1995).
The conversion model between biomass and volume has been a hot topic in recent years. It still needs to be verified across regions and tree species, and the model relationships should be established comprehensively and systematically.
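The difference between a constant and a continuous conversion factor can be sketched as follows. The continuous form assumed here, BEF = a + b / V (equivalently, biomass = a * V + b), is one commonly used formulation and is an assumption of this example; the parameter values are hypothetical and are not taken from any cited study.

```python
def biomass_constant_bef(volume, bef):
    """Stand biomass (t/ha) from stand volume (m3/ha) with a constant expansion factor."""
    return bef * volume


def continuous_bef(volume, a, b):
    """Volume-dependent expansion factor, assumed here to follow BEF = a + b / V."""
    return a + b / volume


def biomass_continuous_bef(volume, a, b):
    """Stand biomass with the volume-dependent factor; algebraically equal to a * V + b."""
    return continuous_bef(volume, a, b) * volume


# Hypothetical parameters for one forest type
A, B = 0.5, 20.0
for v in (50.0, 100.0, 400.0):
    print(v, round(continuous_bef(v, A, B), 3), round(biomass_continuous_bef(v, A, B), 1))
```

Under this form the expansion factor is high in low-volume (typically young) stands and approaches the constant a as volume increases, which is why a single average factor tends to misestimate biomass at both ends of the volume range.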

Nonparametric Model
Traditional statistical regression methods cannot always describe the complex nonlinear relationship between forest biomass and measured data, nor handle practical problems such as high dimensionality, and the relationships they derive are usually applicable only to the region where they were fitted. To improve the nonlinear predictive ability of biomass models, data mining and machine learning methods have been applied to forest biomass estimation, including decision trees, the K-NN method, support vector machines, and artificial neural networks. Although these learning methods can improve accuracy, they operate as a "black box": they reproduce complex relationships by fitting training data sets, which makes it difficult to reveal the mechanism linking biomass and remote sensing parameters.

Artificial Neural Network
An artificial neural network (ANN) takes spectral information, vegetation indices, and texture features as input variables and the forest aboveground biomass measured in sample plots as the output variable. Part of the sample data is used to train the network, and the trained model is then used to estimate forest aboveground biomass (Foody et al., 2003; X. Xu et al., 2011).

Decision Tree
A decision tree (DT) is a method for approximating discrete-valued functions and can be regarded as a tree-structured prediction model. Algorithms built on it include random forest and gradient boosting decision trees. Decision tree ensemble methods handle noise well, have low training complexity, predict accurately, and are easy to present, but they may overfit the training data (Carreiras et al., 2012).

K-NN Method
The K-nearest neighbor (K-NN) algorithm is also known as the reference plot method. The forest aboveground biomass of a pixel in a remote sensing image is obtained as a weighted value of the K measured sample plots closest to the pixel in feature space (Chirici et al., 2008; Tuominen et al., 2010), so that forest aboveground biomass can be monitored from sample plot data. The K-NN method can estimate forest biomass while preserving the heterogeneity and similarity of the spatial distribution of carbon density, but its estimates are often higher than those obtained from the sample plot data alone (Labrecque et al., 2006).
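The weighting step of the K-NN method can be sketched as follows. The sketch assumes Euclidean distance in feature space and inverse-distance weights, which is one common choice rather than the only one; the two-band feature vectors and plot biomass values are hypothetical.

```python
import math


def knn_biomass(pixel, plots, k=3, eps=1e-9):
    """Inverse-distance-weighted mean biomass of the k plots nearest to a pixel.

    pixel: feature vector of the target pixel (e.g. band reflectances).
    plots: list of (feature_vector, measured_biomass) pairs from field plots.
    eps: small constant that avoids division by zero for identical features.
    """
    nearest = sorted((math.dist(pixel, feats), agb) for feats, agb in plots)[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * agb for w, (_, agb) in zip(weights, nearest)) / sum(weights)


# Hypothetical reference plots: (features, aboveground biomass in t/ha)
plots = [((0.0, 0.0), 50.0), ((1.0, 0.0), 100.0), ((10.0, 10.0), 300.0)]

# A pixel midway between the first two plots in feature space
print(knn_biomass((0.5, 0.0), plots, k=2))
```

Because each estimate is a weighted average of observed plot values, predictions can never fall outside the range of the reference plots, which is one reason the choice and coverage of reference plots matters so much for this method.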

Support Vector Machine
The principle of the support vector machine (SVM) can be summarized as follows: a nonlinear transformation defined by an inner product (kernel) function maps the input space into a high-dimensional space, and the optimal classification surface is then found in that space. Each intermediate node corresponds to a support vector, and the output is a combination of these nodes (Zhang, 2000). Support vector regression is a special form of SVM that applies kernel theory to regression analysis and function approximation (Englhart et al., 2011). It overcomes the problems of insufficient data and over-learning in traditional prediction methods and has unique advantages for small-sample and high-dimensional problems. However, an improperly chosen kernel function causes errors in the estimates.

Based on Tree Structure Parameters
Among the structural parameters of trees, diameter at breast height (DBH) is difficult to obtain by satellite remote sensing because the trunk is heavily shielded by the crown. Height, the vertical structural parameter, can be measured accurately by active lidar remote sensing: it is calculated from the time interval between the echo returned from the canopy and the echo returned from the ground. Biomass is then calculated from tree height or stand average height. Many space-borne full-waveform tree height inversion models have been developed (Lefsky et al., 2005).
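The time-interval calculation can be written out directly: the pulse travels the extra canopy-to-ground distance twice, so height equals the speed of light times the interval, divided by two. The sketch below assumes a nadir-pointing sensor over flat terrain and ignores pulse broadening caused by slope; the return times are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def canopy_height(t_canopy_ns, t_ground_ns):
    """Canopy height (m) from the canopy-top and ground return times (ns).

    The factor of 2 accounts for the two-way travel of the laser pulse.
    """
    interval_s = (t_ground_ns - t_canopy_ns) * 1e-9
    return SPEED_OF_LIGHT * interval_s / 2.0


# Hypothetical waveform: ground return arrives 200 ns after the canopy-top return
print(round(canopy_height(0.0, 200.0), 2))
```

A 1 ns timing error corresponds to roughly 0.15 m of height, which gives a sense of the timing precision that waveform processing has to achieve.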

Based on Vegetation Feature Index
Spectral analysis of optical remote sensing images of forests can reflect their biophysical characteristics, and a variety of vegetation indices derived from such data can be used for modeling. A vegetation index is usually a linear or nonlinear combination of the spectral reflectance of two or more bands (W. Zeng et al., 2022), which compresses multidimensional spectral information into a single index channel. Statistical regression on vegetation indices can be used to estimate structural parameters such as canopy density and leaf area index. In the same way, a regression relationship between the spectral reflectance in optical remote sensing images and vegetation biomass can be established and then used to estimate regional vegetation biomass.
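As a concrete instance of a two-band index followed by statistical regression, the sketch below computes the normalized difference vegetation index (NDVI) and fits a simple linear relationship between NDVI and plot biomass. NDVI is chosen only as a familiar example, the linear form is an assumption, and the reflectance and biomass values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return slope, mean_y - slope * mean_x


# Hypothetical plots: (NIR reflectance, red reflectance, biomass in t/ha)
samples = [(0.30, 0.20, 30.0), (0.35, 0.15, 70.0), (0.40, 0.10, 110.0)]

index_values = [ndvi(nir, red) for nir, red, _ in samples]
biomass = [agb for _, _, agb in samples]
slope, intercept = fit_line(index_values, biomass)

# Apply the fitted relationship to a new pixel
print(round(slope * ndvi(0.38, 0.12) + intercept, 1))
```

Such a fitted relationship is empirical and local: it would need to be refitted for other regions, sensors, or forest types.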

Based on Physical Mechanism
A mechanism model (or process model) describes vegetation growth processes at different spatial and temporal scales, such as photosynthesis, respiration, decomposition, and the oxygen cycle. Following the principles of plant physiology, it simulates the conversion of solar energy into chemical energy, and the loss of water from plants and soil through canopy evapotranspiration and photosynthesis, in order to estimate forest productivity. Mechanism models are incorporated into models of global change and nutrient cycling, with biomass being only one of the output variables; examples include CENTURY, CARAIB, and TEM. Their disadvantage is that they are often complicated and require many input variables, so their application depends heavily on data quality. On the other hand, mechanism models emphasize the description of the processes acting within the ecosystem, and their estimates are generally more reliable (X. Xu & Cao, 2006).

Research Prospects
Biomass inventory sampling should make full use of existing spatial data for distribution pattern analysis and base the sampling design on the results (H. Wu & Xu, 2021). Spatial sampling can use unequal probability sampling based on the spatial distribution pattern of forest biomass, which effectively improves survey efficiency (L. Li et al., 2009; Q. Wu et al., 2004). At the same time, to monitor large-scale changes and trends in forest biomass, remote sensing data can be used as basic sampling data and auxiliary survey data, with sample plots laid out on remote sensing maps by aerial sampling to estimate forest biomass (H. Liu, 2001; L. Liu, 2016), so as to meet the practical requirements of biomass estimation at different regional scales and with different spatial distribution characteristics (Hetzer et al., 2020; Zhu et al., 2020).
In recent years, scholars have presented biomass models for various tree species and have studied forest biomass at multiple scales, including the individual, population, community, ecosystem, region, and biosphere (McRoberts, 2001). The biomass of individual trees needs to be studied in more depth across geographical provenances, development stages, and natural zones, so that a weight index model of biomass can be established for more accurate estimation in different stand types. Research on multi-level stand models is still lacking. Existing models estimate only the biomass of standing trees above a certain diameter and ignore smaller trees, understory shrubs, and herbaceous plants. The relationship between total stand biomass and the biomass of living trees should therefore be clarified, and biomass models of understory vegetation should be established to address this omission. Future model development should comprehensively consider the effects of biotic and abiotic factors on forest biomass, especially the effects of stand volume, age, and climate on biomass estimation.
Remote sensing is macroscopic, comprehensive, dynamic, rapid, and repeatable, and its band information is correlated to some degree with forest biomass, so it has become the main method for estimating regional forest biomass (Wirasatriya et al., 2022). Every type of remote sensing data has limitations in spatial, spectral, and radiometric resolution, which affect its ability to estimate forest biomass and make the accuracy of estimates from different data sources unstable (Y. Yu et al., 2022). Building a multi-source estimation model that combines ecological, topographic, and environmental factors with remote sensing data can reduce the influence of these limitations. When optical remote sensing data are used in areas of high forest biomass, the remote sensing signal saturates. Changes in biomass then cannot be reflected accurately, which is the bottleneck of remote sensing based forest biomass estimation. Steininger (2000) found that Landsat TM images saturated when estimating biomass, with a saturation threshold of 15 kg/m². Lu (2005) found the same problem when estimating biomass in the Amazon basin in Brazil. Similar problems exist when estimating forest biomass with radar data (Wang et al., 2006). Multi-sensor data integration and non-traditional estimation methods can solve this problem to some extent, and estimating forest biomass from multi-source, multi-sensor remote sensing images has become a developing trend for addressing the saturation threshold issue (Zhu et al., 2020).
Combining remote sensing data from different sensors, times, spatial resolutions, and spectral resolutions, and selecting the optimal information for estimating forest biomass, is a problem to be studied in data assimilation and one faced by contemporary remote sensing more broadly (W. Zeng et al., 2022). The gap between field sampling data and remote sensing image data is a further problem for remote sensing data assimilation.

Summary
Changes in the ecological environment caused by climate warming are becoming more and more obvious. They affect the pattern of ecosystems and the sustainable development of human society, and have become a global environmental problem recognized by the international community. Forests are one of the main terrestrial ecosystems and have an important carbon sink function. Forest carbon reserves account for about 80% of the aboveground and about 40% of the belowground terrestrial carbon, and the carbon fixed annually by forests accounts for more than two thirds of that fixed by the whole terrestrial ecosystem. Forests in China will become a large carbon sink, which will play a positive role in mitigating the rise of global atmospheric carbon dioxide concentration.
Efficient and accurate monitoring of the annual dynamic change of forest biomass is a research hotspot and a technical difficulty in the field of natural resource survey and monitoring. The national comprehensive ecological monitoring of forests and grasslands in 2021 and 2022 optimized the survey organization. However, under the specified accuracy and reliability, the number of sample units that must be surveyed with equal probability sampling is still quite large. Forest biomass below the provincial level is estimated by taking the regional total carbon storage as the control total and dividing it among smaller populations according to the principle of hierarchical control. Because these estimates lack precision and cannot sensitively reflect the dynamic changes of forest biomass at different scales, the application of the monitoring results in forest carbon sink accounting and related work is limited.
Based on a multi-stage sampling framework, researchers have adopted a three-phase sampling method that takes into account the regional distribution pattern of forest biomass, and have carried out multi-scale, spatially stratified complex sampling designs with unequal probabilities. Model-based sampling inference, assisted by randomization inference, has been used to address limited sample size, small area estimation, missing data, and measurement error under the specified precision, and uncertainty analysis and reliability evaluation of the complex sampling design have been carried out. This work is expected to yield a set of complex sampling design and data inference techniques for the annual monitoring of forest biomass under hierarchical control, meeting the practical demand for low-cost, rapid, and accurate annual accounting of forest biomass.

Conflicts of Interest:
The authors declare no conflict of interest.