

# *The Image Processor based on Signal Ranking, Determination of Rank Differences with their subsequent Selection and Weighted Addition*

<https://doi.org/10.31713/MCIT.2025.071>

Krasilenko Vladimir  
 Vinnytsia National Agrarian University  
 VNAU  
 Vinnytsia, Ukraine  
[krasvg@i.ua](mailto:krasvg@i.ua)

Nikitovich Diana  
 Vinnytsia National Technical University  
 VNTU  
 Vinnytsia, Ukraine  
[diananikitovich@gmail.com](mailto:diananikitovich@gmail.com)

Lazarev Alexander  
 Vinnytsia National Technical University  
 VNTU  
 Vinnytsia, Ukraine  
[krasvg@i.ua](mailto:krasvg@i.ua)

**Abstract**— Expanding the functionality of image processing hardware, by increasing the number and complexity of the nonlinear multi-input functions and transformations it performs, creates an urgent need for high-performance hardware image processors (IPs). We design such IPs on the basis of a conceptual approach whose essence is to rank the processed signals (pixels), perform a fast command-controlled selection of rank differences, and then add them with weights. Since these procedures are basic for all advanced models of convolutional neural networks (CNNs), such IPs can serve not only as high-speed pre-filtering devices but also as self-learning reconfigurable accelerators for CNNs, associative memory models, clustering and pattern recognition. First, we briefly review related work to show the advantages of the proposed concept and of equivalence models (EMs). The capacity and recognition properties of NNs based on modified EMs exceed those of traditional networks by orders of magnitude. The EM neuroparadigm is therefore promising for processing and recognizing large images, including highly correlated and very noisy ones. Since the main nodes of EMs are filtering nodes and procedures with continuous-logic operations, we consider approaches to designing IPs with extended functionality. We propose a processor structure based on our concept and implemented on an Altera EP3C16F484 FPGA chip of the Cyclone III family. The design results and calculations show that an IP for an image size of 64\*64 and a window of 3\*3 can be implemented in a single chip. At 2.5 V and a clock frequency of 200 MHz, the power consumption is about 200 mW and the time to compute one filtered pixel is 25 ns. The results confirm the correctness of the concept.

**Keywords**—image processor, signal ranking, selection, weighted addition; equivalent model, rank differences, neuron-equivalent, convolutional neural network.

## I. INTRODUCTION, OVERVIEW, ANALYSIS OF PUBLICATIONS AND FORMULATION OF WORK GOAL

Fast parallel image processing that uses optoelectronic interconnects, non-conventional MIMO systems, the corresponding matrix logics (MLs) (continuous, neural-fuzzy and others) and the corresponding mathematical apparatus is becoming an advanced research direction [1-5]. Photodetectors can be monolithically integrated with digital electronics in silicon, which in principle allows stacked 3-D chip architectures and significantly simplifies the design of OE-VLSI circuits [2]. Smart image sensors with ADCs [4, 5] have a wide field of application and great potential. Our approach favors a smart-pixel architecture that combines parallel signal detection with parallel in-circuit processing, which gives the fastest processing. Self-learning neural networks (NNs) based on equivalence models (EMs) [6, 7] require ML elements. Describing and modeling each continual subject domain and its class of tasks requires its own logical-algebraic apparatus (LAA). A formal LAA is based on clear rules that allow an exact description of a certain class of problems and can even suggest an algorithm for solving them. The basis of information technologies in the analog field is precisely the continual LAAs: infinite-valued logic [8], continuous logic with all its variants and generalizations, additive-multiplicative (AM) algebra, predicate selection algebra and equivalence algebra [6, 7]. They defined the biologically inspired stage in the development of LAAs and a new, more energy-efficient direction for models and hardware implementations of artificial intelligence. Many logics are based on the multi-input operations **min** ( $x_1, x_2, \dots, x_n$ ) and **max** ( $x_1, x_2, \dots, x_n$ ). Image processing algorithms, the basic composition-decomposition procedures and fuzzy inference in artificial neural-fuzzy systems are also based on multi-input min-max operations. Therefore, there is an urgent need to improve the nodes that perform these and similar operations.

The efficiency of high-speed image processing systems can be increased by using special mathematical support. A special place among such methods is occupied by the class of nonlinear algorithms that perform a transformation of the form  $B = \{b_{k,l}\} = F(A) = \{\Phi_{k,l}(A_{k,l})\}$ , where  $\Phi_{k,l}(A_{k,l})$  is a nonlinear function determined by a subset of the rank and (or) order statistics of the sample. For this reason, this subclass is known as rank algorithms.

Extreme filtering algorithms, which use the **min** and **max** values over samples of a spatial neighborhood, are special cases of rank algorithms. Any  $r$ -th order statistic  $v_s(r)$  of the element  $(k, l)$ , whose neighborhood is formed by the other  $(N_s-1)$  elements of the sample, can be related to the local histogram of the values of the neighboring elements and to the corresponding ordered-choice function  $F_n^m(X)$ , where  $X = (x_1, x_2, \dots, x_n)$ . For any values of its variables, such a function selects the value that occupies the  $m$ -th place when the values are arranged in non-decreasing order. These functions can be written as the logical formula  $F^{(r)}(x_1, x_2, \dots, x_n) = x^{(r)}$ ,  $r=1 \div n$ , where  $r$  is the rank and  $F^{(r)}$  are the base operations of continuous logic (CL). For  $r = n$  this operation becomes the  $n$ -ary disjunction (**max**), and for  $r=1$  the  $n$ -ary conjunction (**min**). The algebra formed on the set  $C = [0, 1]$  with the base operations  $F^{(r)}$  and the complement operation  $(-)$  is called an ordering Boolean algebra. Rank algorithms are locally adaptive by their nature. Their advantages are the simplicity of local adaptation, invariance to spatial links and to signal dimension, and an algorithmic complexity that is almost independent of the neighborhood size. Sorting algorithms have been widely researched because sorting is needed in many applications. Approaches to building programmable relational processors for sorting were shown in [9]. However, in such a relational processor, which works with analog signals, the sorting structure that orders the signals is complex.
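The rank function  $F^{(r)}$  described above can be illustrated with a minimal Python sketch. The function name `rank_select` and the sample window are hypothetical and serve only as an illustration; the sketch simply sorts the inputs in software and says nothing about how the hardware sorting node is built.

```python
from typing import Sequence


def rank_select(x: Sequence[float], r: int) -> float:
    """Return x^(r): the r-th smallest element of x, with r counted from 1 to n."""
    n = len(x)
    if not 1 <= r <= n:
        raise ValueError(f"rank r must be in 1..{n}, got {r}")
    # Arrange the values in non-decreasing order and take the r-th place.
    return sorted(x)[r - 1]


# Hypothetical sample of n = 5 signals with values in C = [0, 1].
window = [0.7, 0.1, 0.4, 0.9, 0.4]
n = len(window)

lowest = rank_select(window, 1)             # r = 1: n-ary conjunction (min)
highest = rank_select(window, n)            # r = n: n-ary disjunction (max)
median = rank_select(window, (n + 1) // 2)  # middle rank: median filter value
```

Under these assumptions the two extreme ranks coincide with the built-in `min` and `max`, and the middle rank gives the median used in rank filtering.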

Therefore, the aim is to simplify and digitize the sorting node and to build on its basis relational processors for nonlinear image processing. The latter can be used as nodes of ordinal logic, as data ordering and sorting nodes, as rank filters, as fragment classifiers and recognizers, and as tools for morphological operations such as dilation, erosion, opening and closing. The **min-max** operations mentioned above are also needed on the sets of signals that represent structuring windows or selected fragments of the processed images. Many morphological operations must be repeated many times and for all fragments of the processed image, so there is an urgent need to reduce the execution time of **min-max** and ranking operations. The **goal of our work** is therefore to find new options for implementing signal sorting nodes, including digital ones, that provide increased accuracy and speed, and to build on them a relational nonlinear image processor with advanced functionality. In addition, given the recent emergence of a new element base, our task is to prove that an image preprocessor (IP) with enhanced technical characteristics and a wide range of commands can be built on an FPGA, practically in one chip, by using a new method of processing pre-ranked signals and (or) their differences. To achieve this goal, it is necessary to simulate the algorithms and methods themselves and then, based on them, to design and simulate technical options for implementing nonlinear IPs and their main nodes.
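The link between ranking and the morphological operations named above can be sketched as follows. This is a software illustration only, assuming grayscale images stored as lists of rows, a flat 3\*3 structuring window, and border handling by replicating the nearest edge pixel. The names `rank_filter`, `opening` and `closing` are hypothetical.

```python
from typing import List

Image = List[List[int]]


def rank_filter(img: Image, r: int) -> Image:
    """3x3 rank filter: r=1 gives erosion (min), r=5 the median, r=9 dilation (max)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for k in range(h):
        for l in range(w):
            # Collect the 9 window signals; indices are clamped at the border
            # (edge replication is an assumption of this sketch).
            window = [
                img[min(max(k + dk, 0), h - 1)][min(max(l + dl, 0), w - 1)]
                for dk in (-1, 0, 1)
                for dl in (-1, 0, 1)
            ]
            out[k][l] = sorted(window)[r - 1]
    return out


def opening(img: Image) -> Image:
    """Erosion followed by dilation."""
    return rank_filter(rank_filter(img, 1), 9)


def closing(img: Image) -> Image:
    """Dilation followed by erosion."""
    return rank_filter(rank_filter(img, 9), 1)


# Hypothetical 5x5 test image: a single bright impulse on a dark background.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 255

eroded = rank_filter(impulse, 1)
dilated = rank_filter(impulse, 9)
filtered = rank_filter(impulse, 5)
```

In this sketch every morphological operation is a repeated application of one ranking step per window, which is why the execution time of the ranking node dominates the cost.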

The main computational procedure of most known methods and algorithms used in biometric machine-vision systems and in recognizing objects in images is the comparison of fragments of the current image with a template image. The two-dimensional cross-correlation function, or convolution with a filter, is the most common discriminant measure of how well the reference fragment matches the current image fragment. At the same time, it was shown in [10, 11, 12] that, to improve the discriminant properties and the quality of recognition and comparison, especially for strongly correlated and noise-damaged images, it is desirable to use recognition methods based on mutual two-dimensional spatial equivalence functions, nonlinear transformations and adaptive-correlation weighting. Therefore, equivalence models (EMs) of autoassociative memory (AAM) and heteroassociative memory (HAM) were proposed [11-15] to solve image recognition and clustering problems using new spatially invariant neural network models. Studies of these EMs have shown that they recognize images better at increased dimensions (vectors with 1024  $\div$  4096 components) and with a significant percentage of damage (up to 25-30%), while the capacity of the AAM can be an order of magnitude higher than the number of neurons representing the vector arrays whose set is memorized [12, 14, 15]. In addition, it is known that scene analysis and recognition first require solving not only the problem of preliminary filtering but also the problem of object clustering. Such preliminary clustering makes it possible to organize proper automated data grouping, conduct cluster analysis, evaluate each cluster on many features, assign a class label, and improve the subsequent training and classification procedures.

Since significant advantages of EMs were demonstrated when advanced multiport AAM and HAM and neural networks (NNs) for parallel cluster analysis of images were built on their basis [7, 15, 16], the current task was to study a more general, spatially invariant (SI) equivalence model (SI EM). It is precisely such models, invariant to spatial displacements, that require a large number of comparisons of image fragments [7, 16, 17]. Comparisons are also basic operations in the most promising paradigms of convolutional neural networks (CNNs) with deep learning [7, 17]. In our previous article [7], we considered new possible ways of self-learning in such extended models, explained some important fundamental concepts of various kinds of associative recognition and the principles by which biologically inspired NN structures function, simulated the processing in them [1], trained such models and extracted patterns in them, and also proposed their implementation. In article [17], the authors showed that the concept of self-learning works directly with multi-level images. In the SI EM, we calculate spatially dependent normalized equivalence functions (SD\_NEF), whose elements correspond to the value of the normalized equivalence of a fragment of the input image  $X$  and one of the selected fragments from the training matrix. To implement ESLCNS [1, 17], we need new or modified known devices capable of calculating normalized spatial equivalence functions (NSEqF) with the required speed and performance. For all known convolutional neural networks, as well as for our equivalence models, it is necessary to calculate in each layer the convolution of the current image fragment with a large number of templates that are used, selected or formed during training. However, studies show that large images require a large number of filters, and the filters themselves can also be large. Therefore, there is an acute problem of increasing the computational performance of hardware implementations of such convolutional neural networks.
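The sliding comparison of fragments can be sketched in Python. The text above does not give a formula, so this sketch assumes the form commonly used in equivalence models, where the normalized equivalence of two fragments with values in [0, 1] is one minus their mean absolute difference. This definition, the function names and the test data are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def normalized_equivalence(a: Matrix, b: Matrix) -> float:
    """Assumed definition: 1 - mean(|a - b|) for elements scaled to [0, 1]."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)  # element-wise nonequivalence
            count += 1
    return 1.0 - total / count


def sd_nef(image: Matrix, template: Matrix) -> Matrix:
    """Compare the template with every fully overlapping fragment of the image."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    result = []
    for k in range(ih - th + 1):
        row = []
        for l in range(iw - tw + 1):
            fragment = [r[l:l + tw] for r in image[k:k + th]]
            row.append(normalized_equivalence(fragment, template))
        result.append(row)
    return result


# Hypothetical binary image and 2x2 template.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
template = [[1.0, 1.0], [1.0, 1.0]]
response = sd_nef(image, template)
```

Under this assumed definition the response equals 1 where the fragment matches the template exactly and falls toward 0 as the fragment differs. The number of fragment comparisons grows with both the image size and the number of templates, which is the performance problem discussed above.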

## II. PRESENTATION OF THE MAIN RESEARCH RESULTS

#### *A. Design of the main unit of Image Processor*

The operations of the proposed processor, which ranks the signals and forms the output signals, are shown in Fig. 1. The structure of the FPGA-based digital multifunctional image processor (IP) DMIP\_2 is shown in Fig. 2. It has a serial input, register memory that forms the vector of signals to be sorted, and 1 output, and it uses a sorting unit (SU) based on a modified conveyor homogeneous wave structure (MCHWS) consisting of layers of digital comparison-switching circuits. The variant DMIP\_1, with 10 inputs and 1 output and with all input signals supplied in parallel, is not shown here. For convenient data input, we developed and modeled the DMIP\_2 circuit with register memory for fast sequential image input and automatic sequential scanning of the processed windows. Simulation results for DMIP\_2 are shown in Figs. 3 and 4.
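A layered structure of comparison-switching circuits can be illustrated with a generic odd-even transposition network. The internal wiring of the MCHWS is not described here, so this sketch is an assumption that only shows how layers of neighbor compare-exchange cells can order a window of signals. The function name is hypothetical.

```python
from typing import List


def compare_exchange_layers(signals: List[int]) -> List[int]:
    """Sort by n layers of neighbour compare-exchange cells (odd-even transposition)."""
    x = list(signals)
    n = len(x)
    for layer in range(n):
        # Even layers compare pairs (0,1), (2,3), ...; odd layers (1,2), (3,4), ...
        start = layer % 2
        for i in range(start, n - 1, 2):
            # One comparison-switching cell: pass the smaller signal to the left.
            if x[i] > x[i + 1]:
                x[i], x[i + 1] = x[i + 1], x[i]
    return x


# Hypothetical 3x3 window of 8-bit pixels, flattened to 9 signals.
window = [12, 200, 45, 45, 90, 7, 130, 255, 60]
ranked = compare_exchange_layers(window)
```

All cells of one layer are independent of each other, so in hardware each layer can operate in parallel and the layers can be pipelined. For 9 signals this particular sketch uses 9 layers with 4 cells each.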



Figure 1. Graphical representation of the processor operations of ranking.



Figure 2. FPGA Structure of DMIP\_2 with serial input and registers memory to form a vector of signals to be sorted and 1 output.

Since the output signals of such a processor are ranked by value and not by the difference of values, a modification [9, 18, 19, 20] makes it possible to additionally calculate the differences of signals with neighboring ranks. The difference of signal values is also needed for the nonequivalence function. A whole set of other complex continuous-logic operations and functions is built on the operations of bounded difference and nonequivalence. For example, previously we could select one of  $n$  signals by rank using a multiplexer, so there was only 1 output (ranks). Now we can also form the difference between the maximum signal and the next one in order. In the same way, we can obtain a signal proportional to the difference of any two signals from the ordered set. This approach also allows complement signals to be formed at the output: if one of the reference levels is  $D=1$  (255), then the difference between the reference and any signal is the complement of that signal. We therefore develop this idea further, taking into account that selection, amplification, weighting and addition of signals are simple operations.
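The idea of forming rank differences and then selecting and adding them with weights can be sketched as follows. The encoding of the control vector as a list of integer weights (0 deselects a term) is an assumption of this sketch, as are the function names and the sample window.

```python
from typing import List, Sequence


def rank_differences(window: Sequence[int]) -> List[int]:
    """Differences between signals with neighbouring ranks in the ordered set."""
    ranked = sorted(window)
    return [ranked[i + 1] - ranked[i] for i in range(len(ranked) - 1)]


def weighted_selection(values: Sequence[int], control: Sequence[int]) -> int:
    """Weighted addition of the selected values; a zero weight deselects a value."""
    if len(values) != len(control):
        raise ValueError("control vector must have one weight per value")
    return sum(w * v for w, v in zip(control, values))


def complement(x: int, reference: int = 255) -> int:
    """Complement of a signal with respect to the reference level D = 255."""
    return reference - x


window = [12, 200, 45, 45, 90, 7, 130, 255, 60]  # hypothetical 3x3 window
ranked = sorted(window)
diffs = rank_differences(window)

# Select only the last difference: max signal minus the next one in order.
top_gap = weighted_selection(diffs, [0, 0, 0, 0, 0, 0, 0, 1])
# Add all differences with unit weights: max minus min (range of the window).
full_range = weighted_selection(diffs, [1] * 8)
# Select the fifth rank of the ordered set (median) and form its complement.
median = weighted_selection(ranked, [0, 0, 0, 0, 1, 0, 0, 0, 0])
median_complement = complement(median)
```

Because neighboring rank differences telescope, the difference of any two signals of the ordered set is a sum of consecutive differences, so a single select-and-add stage is enough to form it in this sketch.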

Quartus Prime screenshot (image not reproduced). The visible panels are:

- **Project Navigator:** files processor.v, multiplication.v, connect2.v, connect.v, Block1.bdf, add.v, Waveform.vwf, Main.bdf, min.v.
- **Tasks:** Compile Design, Analysis & Synthesis, Fitter (Place & Route), Assembler (Generate program), TimeQuest Timing Analysis.
- **Code Editor:** Verilog code of the main module with logic for calculating F\_w9 based on Y values from 1 to 9.
- **Flow Summary:** project details listed in the table below.

| Flow Summary field    | Value                                       |
|-----------------------|---------------------------------------------|
| Flow Status           | Successful - Wed Apr 17 15:44:29 2019       |
| Quartus Prime Version | 17.0.0 Build 595 04/25/2017 SJ Lite Edition |
| Revision Name         | processor                                   |
| Top-level Entity Name | Main                                        |
| Family                | MAX II                                      |
| Device                | EP2210F324C5                                |
| Timing Models         | Final                                       |
| Total logic elements  | 1,951 / 2,210 (88 %)                        |
| Total pins            | 181 / 272 (67 %)                            |
| Total virtual pins    | 0                                           |
| UFM blocks            | 0 / 1 (0 %)                                 |

Figure 3. Simulation of DMIP\_2 based on FPGA (window fragments).



Figure 4. Simulation results of DMIP\_2 based on FPGA with serial input and registers memory and 1 output (issuing ranks, one switch)

#### *B. Modeling of digital multifunctional image processor*

The simulation results for DMIP\_3 (Fig. 5), which has a serial input, register memory and 2 outputs for weighing-selection processing of rank and rank-difference signals, are shown in Figs. 6-9. As can be seen from Figs. 3, 4, 6 and 7, the resources of the Altera EP3C16F484 FPGA chip of the Cyclone III family are not fully used in the first case, whereas in the second case, for the processor with register memory and two outputs, they are used almost completely (a small margin remains).



Figure 5. Structure of DMIP\_3 based on FPGA with serial input and registers memory to form a vector of signals to be sorted, 2 outputs for rank and rank differences signals weighing-selection processing.



Figure 6. Simulation of DMIP\_3 based on FPGA with serial input and 2 output (window fragments).

The processing cycle in the pipelined structure of the DMIP and SU did not exceed 25 ns, which allows an input/output pixel rate for the source and processed images of 40 MHz. During one processing cycle, DMIP\_1 essentially performs the sorting operations (about  $9\ln 9$  by the estimate for the best algorithms) and generates all the ranks and their differences. Taking into account the wide variety of output functions, this gives a performance estimate of at least  $10^9$  operations per second.
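The figures in this paragraph can be checked with a short calculation. It assumes that one sorted window is produced in every 25 ns cycle and that each step of the sorting estimate counts as one operation. Both are simplifying assumptions for an order-of-magnitude check, not measured values.

$$f_{\text{pixel}} = \frac{1}{T_{\text{cycle}}} = \frac{1}{25 \cdot 10^{-9}\ \text{s}} = 40\ \text{MHz}$$

$$9 \ln 9 \approx 9 \cdot 2.197 \approx 19.8$$

$$19.8 \cdot 40 \cdot 10^{6}\ \text{s}^{-1} \approx 7.9 \cdot 10^{8}\ \text{s}^{-1}$$

Sorting alone therefore already accounts for close to  $10^9$  operations per second. The rank differences and the weighted addition performed in the same cycle add further operations, which is consistent with the stated estimate.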



Figure 7. Simulation results of DMIP\_3 with 2 outputs (issuing ranks, two switches) in case of formation of a difference of ranks  $r_2-r_3$ .



Figure 8. An example of image line processing using the proposed DMIP: the original line (red) and the resulting rank and other output functions.



Figure 9. The results of the Amo image transformations using DMIP for different rank values and different functions defined by the control vector  $Y$ .

## III. DISCUSSION

The results confirm the correctness of the chosen concept and the possibility of creating digital multifunctional image processors with an expanded set of nonlinear signal processing operations, including rank operations and their modifications. In addition, such processors can be configured easily, quickly and efficiently for the desired type and function of processing and transformation. At the same time, as the filter size, i.e. the number of signals to be ranked, increases, the cost of sorting and ranking in digital formats and the duration of the operation can increase significantly. One option for solving this problem is to implement this concept of an image processor on analog devices, for example on current mirrors, i.e. to process in analog or hybrid formats. Similar conceptual approaches were proposed in [21, 22], which describe the developed and modeled so-called neuron-equivalentors. These have a processing-conversion time of  $0.1-1\mu\text{s}$ , low supply voltages of  $1.8-3.3\text{V}$ , small relative computational errors (1-5%) and low consumption of no more than  $1-2\text{mW}$ , and they can operate in low-power (less than  $100\mu\text{W}$ ) and high-speed (10-20 MHz) modes. A comparative analysis of these two formats (digital and analog) for implementing the proposed concept of multifunctional image processors is therefore needed. The comparison is especially important in terms of energy efficiency, since the energy efficiency of neuron-equivalentors is estimated at no less than  $10^7$  an. op./s per W and can be increased by an order of magnitude [21, 22].

The developed processors can become the basis for implementing convolutional NNs and self-learning biologically inspired devices with a code-controlled choice of the type of nonlinear, as well as linear, functional transformation. Integrating an array of neural-like processors into a chip provides a significant number of processing channels for the parallel computation of equivalent convolutions.

## IV. CONCLUSIONS

We have presented the design results for new FPGA-based DMIPs with digital accuracy. Calculations show that with the Altera EP3C16F484 FPGA chip of the Cyclone III family, a DMIP for an image size of 64\*64 and a window of 3\*3 can be implemented in one chip. At 2.5 V and a clock frequency of 200 MHz, the power consumption is about 200 mW and the calculation time per filtered pixel is 25 ns.
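These figures can be combined into a rough energy and frame-time estimate. The estimate assumes that one filtered pixel is produced every 25 ns and that the 200 mW consumption is constant over the frame. Both are simplifying assumptions rather than measured results.

$$N_{\text{clk}} = 25\ \text{ns} \cdot 200\ \text{MHz} = 5\ \text{clock periods per pixel}$$

$$E_{\text{pixel}} = P \cdot T = 0.2\ \text{W} \cdot 25 \cdot 10^{-9}\ \text{s} = 5\ \text{nJ}$$

$$T_{\text{frame}} = 64 \cdot 64 \cdot 25\ \text{ns} = 4096 \cdot 25\ \text{ns} = 102.4\ \mu\text{s}$$

Under these assumptions, filtering one 64\*64 frame takes about 0.1 ms and about 20 µJ of energy.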

## REFERENCES

- [1] Krasilenko, V. G., Nikolskyy, A. I., Lazarev A.A., "Designing and simulation smart multifunctional continuous logic device as a basic cell of advanced high-performance sensor systems with MIMO-structure," in Proceedings of SPIE, 94500N (2015).
- [2] Lei Yi, Guangbao Shan, Song Liu, Chengmin Xie, "High-performance processor design based on 3D on-chip cache," Microprocessors and Microsystems, 47, 486-490 (2016). ISSN 0141-9331, <http://dx.doi.org/10.1016/j.micpro.2016.07.009>
- [3] Krasilenko, V., Ogorodnik, K., Nikolskyy, A., Dubchak, V., "Family of optoelectronic photocurrent reconfigurable universal (or multifunctional) logical elements (OPR ULE) on the basis of continuous logic operations (CLO) and current mirrors (CM)," Proc. SPIE, 8001, (2011).
- [4] Krasilenko, V. G., Nikolskyy, A. I., Lazarev, A. A., "Multichannel serial-parallel analog-to-digital converters based on current mirrors for multi-sensor systems", Proc. SPIE 8550, Optical Systems Design 2012, 855022 (2013); doi:10.1117/12.2001703.
- [5] Krasilenko, V. G., Lazarev, A. A., Nikitovich, D. V., "Simulation of continuously logical base cells (CL BC) with advanced functions for analog-to-digital converters and image processors," Proc. SPIE 10438, 104380K (2017).
- [6] Krasilenko, V., Nikolskyy, A., Zaitsev A., Voloshin V., "Optical pattern recognition algorithms on neural-logic equivalent models and demonstration of their prospects and possible implementations," Proc. SPIE 4387, 247 –260 (2001).
- [7] Krasilenko V.G., Lazarev A.A., Nikitovich D.V., "Modeling and possible implementation of self-learning equivalence-convolutional neural structures for auto-encoding-decoding and clusterization of images", Proceedings of SPIE Vol. 10453, 104532N (2017).
- [8] Volgin, L.I., Mishin, V.A., "Is the future digital or analog?," Information technologies in electric power industry: Cheboksary: RESCNIT, 86-89, (1998).
- [9] Krasilenko, V.G., Lazarev A.A., Nikitovich D.V., "Design and simulation of image nonlinear processing relational preprocessor based on iterative sorting node," Proc. SPIE 11028, Optical Sensors 2019, 110282X (11 April 2019); doi: 10.1117/12.2524114.
- [10] Krasilenko, V. G., Saletsky, F. M., Yatskovsky, V. I., Konate, K., "Continuous logic equivalence models of Hamming neural network architectures with adaptive-correlated weighting," Proceedings of SPIE Vol. 3402, pp. 398-408 (1998).
- [11] Krasilenko, V. G., Magas, A. T., "Multiport optical associative memory based on matrix-matrix equivalentors," Proceedings of SPIE Vol. 3055, pp. 137 – 146.
- [12] Krasilenko V.G., Nikitovich D.V., "Experimental studies of spatially invariant equivalence models of associative and hetero-associative memory 2D images," Systemy obrobky informaciji, 4 (120), 113 –120 (2014).
- [13] Krasilenko V.G., Nikolskyy, A. I., "The associative 2D-memories based on matrix-tensor equivalent models," Radioelektronika Inform. Communication, 2 (8), 45 –54 (2002).
- [14] Krasilenko, V. G., Lazarev, A., Grabovlyak, S., "Design and simulation of a multiport neural network heteroassociative memory for optical pattern recognitions," Proceedings of SPIE Vol. 8398, 83980N-1 (2012).
- [15] Krasilenko V. G., Lazarev, A., Grabovlyak, S., Nikitovich D.V., "Using a multi-port architecture of neural-net associative memory based on the equivalency paradigm for parallel cluster image analysis and self-learning," Proceedings of SPIE Vol. 8662, 86620S (2013).
- [16] Krasilenko V.G., Nikitovich D.V., "Simulation of self-learning clustering methods for selecting and grouping similar patches, using two-dimensional nonlinear space-invariant models and functions of normalized 'equivalence'," Electronics and information technologies: collected scientific papers, Lviv: Ivan Franko National University of Lviv, Issue 6, pp. 98-110 (2015).
- [17] Krasilenko V.G., Lazarev A.A., Nikitovich D.V., "Modeling of biologically motivated self-learning equivalent-convolutional recurrent-multilayer neural structures (BLM\_SL\_EC\_RMNS) for image fragments clustering and recognition", Proceedings of SPIE Vol. 10609, MIPPR 2017: Pattern Recognition and Computer Vision, 106091D (8 March 2018); doi: 10.1117/12.2285797.
- [18] Krasilenko V.G., Lazarev A.A., Nikitovich D.V. "Design of neuron-calculators for the normalized equivalence of two matrix arrays based on FPGA for self-learning equivalently convolutional neural networks (SLE\_CNNs)". Proc. SPIE 10996, Real-Time Image Processing and Deep Learning, 2019. 109960P (14 May 2019); doi: 10.1117/12.2518206; <https://doi.org/10.1117/12.2518206>
- [19] Krasilenko V.G., Lazarev A.A., Nikitovich D.V. "Rank differences of signals by weighing-selection processing method for implementation of multifunctional image processing processor". Emerging Imaging and Sensing Technologies for Security and Defence IV, Richard C. Hollins; Gerald S. Buller; Robert A. Lamb; Martin Laurenzis, Editors. Proceedings of SPIE 11163. 2019. (SPIE, Bellingham, WA 2019), 111630J.
- [20] Krasilenko V.G., Lazarev A.A., Nikitovich D.V. "Modeling nonlinear image processing algorithms using a processor based on the sorting node". Emerging Imaging and Sensing Technologies for Security and Defence IV, Richard C. Hollins; Gerald S. Buller; Robert A. Lamb; Martin Laurenzis, Editors. Proceedings of SPIE 11163. 2019. (SPIE, Bellingham, WA 2019), 111630I.
- [21] Krasilenko V.G., Lazarev A.A., Nikitovich D.V. Design and Simulation of Array Cells of Mixed Sensor Processors for Intensity Transformation and Analog-Digital Coding in Machine Vision. Machine Vision and Navigation. monograph. Springer. 2020. P. 87-132. ISBN 978-3-030-22586-5 ISBN 978-3-030-22587-2 (eBook) <https://doi.org/10.1007/978-3-030-22587-2>
- [22] Krasilenko, V., Nikitovich, D., Lazarev, A. Modeling Nodes and Cells of Neuron-Equivalentors as Accelerators of Equivalental-Convolutional Self-Learning Neural Structures. Modeling, Control and Information Technologies: Proceedings of International SP Conference, (7), 247–250. (2025). <https://doi.org/10.31713/MCIT.2024.077>