Sažetak (engleski) | The doctoral dissertation is the result of the research in the field of design space exploration as the keypart in the development of heterogeneous multiprocessors systems. Due to the ever increasing complexity of embedded systems applications, which pervade the modern world, it is required that each new generation provides more and more advanced functionality. Designing heterogeneous multiprocessor systems is a major challenge due to the number and variety of components, which significantly increase the size of the design space, i.e. the number of possible solutions. This creates a gap between the speed of obtaining potential solutions and the accuracy of the evaluation of their performance. In order to reduce the gap, it is necessary to model the entire system using high-level abstraction models which can be then used to obtain a preliminary performance assessment of possible solutions at a very early stage and prone the design space quickly. In this dissertation a new design space exploration method is proposed, specially targeting heterogeneous multiprocessor systems. It includes an early estimation of runtime performance and heuristics to solve an optimization mapping and scheduling optimization problem for heterogeneous MPSoC systems. Early estimation of application execution time is based on the novel concept of elementary operations, which enables estimation of the duration of execution of individual operations on different platform configurations without creating a processor data path and cache models. Using elementary operations, abstract models of application and multiprocessor platforms are built. These models are used in a heuristic design space exploration method based on the NSGA-II evolutionary algorithm with adaptation to the specifics of heterogeneous MPSoC systems. When mapping parts of an application to platform elements, computation and communication are optimized simultaneously according to two criteria: runtime and occupancy. Furthermore, a special case of platforms - the so-called "sparsely connected platforms", where processing elements are not connected to every memory element, is also considered and included in the proposed design space exploration method. Overall, the proposed method enables a modular and scalable approach to design space exploration which is based on high abstraction level and high reusability of intermediate results, thereby reducing the gap between the speed of obtaining potential solutions and the accuracy of their performance evaluation. Chapter 1 introduces the basic concepts like embedded platforms, heterogeneity etc. and presents the motivation behind the research conducted. The contributions of the doctoral research and an overview of the work are also outlined. Chapter 2 presents the basic theory of modern approaches to embedded systems design and development, with focus on abstraction levels and existing design methodologies. Three major embedded system development methodologies are presented, with a special emphasis on System-level design methodology (SLD), which is particularly suited for the development of heterogeneous multiprocessor systems. At the core of all these methodologies is the possibility of early performance estimation and model evaluation to enable early decision making and quick design space pruning. Chapter 3 addresses the issue of estimating application execution duration solely based on the source code. First, the state of art in the field of source-level timing estimation is presented. Then, the proposed method for estimating the duration of the application execution - called ELementary OPerationS based Estimation Method (ELOPS-EM) is presented. Central to the proposed method is introduction of the novel concept - elementary operations, defined as syntax units that are viewed as a whole in the source code of applications written in C, and whose execution time on the platform can be measured completely independently of other parts of the code and the application itself in which they appear. Different code statements are classified as various types of elementary operations, and in this way, profiles of all applications and platforms are created. Based on these models, the duration of the entire application can be estimated without the need for simulation at the basic block level and without the development of a mathematical model of the processor data path and cache, which is required by most current approaches. Application and platform profiles are completely independent of one another and it is therefore possible to reuse previously acquired profiles for different combinations of applications and platforms. The method has been evaluated in an environment that represents a typical real-world heterogeneous multiprocessor system. The JPEG compression algorithm and the AES encryption algorithm were selected as test applications, and the target test platform was Xilinx Zynq ZC706 in several different configurations. Overall, timing estimation accuracy is nearly at the same level for all test cases: average error is around 5\%, and maximum error remains below 17\%, which is comparable to other state of art source-level timing estimation methods with having additional profile reusability to minimize future time and effort. The amount of underestimated timing values is around the same as the amount of overestimated values, leaving it to future research to investigate further in which particular cases does the ELOPS-EM method give under or over estimated values. From the aspect of compiler code optimization, there seems to be no difference in the estimation accuracy for different optimization levels. The topic of Chapter 4 is design space exploration of heterogeneous embedded systems. At the beginning, an overview of the state of art is given. This is followed by a presentation of the proposed system model and the proposed design space exploration method, as well as the evaluation of the results on artificial and real test cases. The proposed method for design space exploration is based on a high-level system model, following the principles of system-level design (SLD), which assumes having separate application and platform models and separating computation from communication in those models. The system model includes application and platform models which are built based on the concept of elementary operations and the application and platform profiles described in the previous chapter. These models are used in a heuristic design space exploration method that aims to find the optimal scheme for mapping parts of an application to platform elements and to determine the schedule of their execution. Considering the great impact of memory configuration on the performance of the entire system, both computation and communication are mapped: computation to processing elements and communication channels to memory elements. The method is based on the evolutionary algorithm for multi-criteria optimization - NSGA-II, and two criteria are optimized: runtime and total number of platform elements used. Two variants of design space exploration method are considered: simultaneous mapping (SDSE) and two-stage mapping (2SDSE). In the simultaneous mapping approach - SDSE, procedures (i.e. computation) are mapped to processors and communication channels between procedures are mapped to memory at the same time. Alternatively, the two-stage mapping - 2SDSE is performed in two iterations: first, the procedures are mapped to the processors and then the communication channels are mapped to the memory elements. The evaluation has been done on both artificially created models and models of real applications and platforms. The selected real applications have been JPEG compression and Ray Tracing, and the platforms have been Xilinx ZC706 and Adapteva Parallella. Three evaluation measures are used: hypervolume, inverted generational distance and correlation with reference front. Overall, SDSE algorithm is undoubtedly more successful than the 2SDSE algorithm according to all three measures. In certain cases, where the communication to computation ratio is very low and there are many procedures which cannot be executed on all processors, both algorithms give equally good results which is expected given the fact that communication plays a much less significant role and the number of feasible solutions in relatively low. At the end of the chapter, a special consideration is given to the case when not all processors are connected to all memory elements, i.e. the platform is sparsely connected. In such case, the number of infeasible solutions can significantly exceed the number of feasible solutions, which makes the design space exploration even more complex. In such cases, it has been shown that the SDSE algorithm cannot always find feasible solutions, and thus the SDSE algorithm has been varied into four additional versions specifically tailored to such platforms. The performance of each version has been evaluated on specially crafted artificial models and the results show that the initial version of SDSE algorithm is overall the most successful in finding solutions, especially in cases when not all procedures can be executed on all processors. But, other two versions of the algorithm: C-SDSE and MCN-SDSE, which are designed to target clusters of processors on such platforms, outperform SDSE when the problem is slightly relaxed so that most procedures can execute on all processors. Also, there is a certain level of complementarity between SDSE and MCN-SDSE - in cases where SDSE cannot find any feasible solutions, MCN-SDSE has a great chance to find a solution and vice versa. This demonstrates that SDSE algorithm is most versatile, but introducing certain modifications tailored to suit specific platform configurations, like targeting clusters of processors, can yield better results in such situations. Chapter 5 concludes the dissertation and provides final considerations. In it, the realization of expected contributions of the dissertation is briefly analysed, and a guidance for future research is provided, which will be mainly in the direction of extending the method for source level timing estimation on other platform architectures like DSP and CISC, and further improvement of the design space exploration method to suit better the different types of sparsely connected platforms. To summarize, the three major contributions of this doctoral thesis are: • A new method for application source code classification and platform profiling based on the concept of elementary operations. • High level application and hardware platform models built from elementary operations profiles and used for analytical estimation of application execution duration. • A new multiobjective design space exploration method for heterogenenous systems which employs the high level application and platform models. |