Object code implemented as hardware components does not always guarantee the best performance. We know that we need a runtime. It can be realised as a standalone solution that provides an optimized implementation for the specific problem, but this requires a large implementation effort, since the whole application must be rewritten each time the underlying hardware changes. Alternatively, it can be implemented via an operating system, which is instead oriented towards increasing the level of abstraction and flexibility of the system and towards simplifying module requests by providing a set of high-level system calls. In this context, developing an operating system for reconfigurable devices makes the whole system more flexible and raises the level of abstraction for the final user, which is why in the following example I'm going to assume a GNU/Linux-based runtime.
Now, let us suppose we have a scenario involving two cryptography algorithms: the Advanced Encryption Standard (AES) and the Data Encryption Standard (DES). AES is a cryptography algorithm adopted as a United States Government standard; it works on 128-bit data blocks and supports different key lengths. In the implementation considered here, only encryption with a 128-bit key is supported. DES, on the other hand, is the predecessor of AES and employs a shorter, 56-bit key to encrypt or decrypt 64-bit blocks of input data. Both a hardware and a software implementation are available for each algorithm. The system is implemented on a System-on-Chip, where the software implementations run on a processor implemented in the FPGA logic and clocked at 100 MHz. Furthermore, they are handled by the OS as high-priority processes, so their execution times are not affected by most of the other tasks that are always active in the system.
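Since both ciphers process fixed-size blocks, the cost of a request is driven by how many blocks the input splits into. The following Python sketch captures the parameters given above in a small descriptor. The `CipherSpec` class and its `blocks_for` helper are hypothetical names, and padding the last block up to a full block is an assumption, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CipherSpec:
    """Static parameters of one cryptographic functionality (hypothetical descriptor)."""
    name: str
    block_bits: int  # size of the data block processed by one execution
    key_bits: int    # key length used by this implementation

    @property
    def block_bytes(self) -> int:
        return self.block_bits // 8

    def blocks_for(self, input_bytes: int) -> int:
        """Executions needed for input_bytes, assuming the last block is padded."""
        # Integer ceiling division: a partial final block still costs one execution.
        return (input_bytes + self.block_bytes - 1) // self.block_bytes


# Values from the text: AES with a 128-bit key (encryption only) and DES with a 56-bit key.
AES = CipherSpec("AES", block_bits=128, key_bits=128)
DES = CipherSpec("DES", block_bits=64, key_bits=56)

if __name__ == "__main__":
    for spec in (AES, DES):
        print(spec.name, spec.block_bytes, "bytes/block,",
              spec.blocks_for(1000), "blocks for 1000 bytes")
```

With these parameters, a 1000-byte input needs 63 AES executions but 125 DES executions, because a DES block is half the size of an AES block.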
The hardware implementations of AES and DES are realised as reconfigurable functional units, which can be placed on the FPGA on demand. The two cryptographic functionalities are registered with a centralized manager, which makes them available to userspace applications. Software applications may request either algorithm and must specify the size of the input text to be encrypted. A large text must be split into fixed-size blocks that are encrypted separately, so each block of data requires a separate execution of the algorithm. When the centralized manager receives an AES or DES request, the operating system evaluates whether the hardware or the software implementation should be chosen. The choice is based on the input size and on the number of slots available to allocate the IP-Core.
The figures show the execution times of the hardware and software implementations of AES and DES. As expected, the latency of the software executions increases linearly with the input size, with only small variations due to the time-sharing policy imposed by the operating system on the processor. The hardware implementation exploits the inherent parallelism of hardware circuits and hence performs much better than its software counterpart. However, executing the IP-Core version of the algorithm requires the dynamic reconfiguration of the hardware module itself, which introduces a nearly constant delay. Given the large size of the IP-Cores, whose partial bitstreams reach 500 kilobytes, this delay is on the order of 150 milliseconds, which makes the hardware implementation inefficient for small inputs. The IP-Core outperforms the software implementation only when the input size exceeds a threshold that depends on the particular algorithm.
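The trade-off described above can be captured by a simple linear model: for N blocks, software latency is N * t_sw, while hardware latency is T_reconf + N * t_hw, so the IP-Core pays off only when N > T_reconf / (t_sw - t_hw). The Python sketch below assumes this model. `LatencyModel` is a hypothetical name, the 150 ms reconfiguration cost comes from the text, and the per-block times are made-up placeholders, not measurements. The same model also yields throughput, i.e. bytes processed divided by latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyModel:
    """Linear cost model for one algorithm; all times are in microseconds.

    reconf_us:   one-off cost of loading the partial bitstream (about 150 ms in the text).
    sw_block_us: hypothetical time to encrypt one block in software.
    hw_block_us: hypothetical time to encrypt one block on the IP-Core.
    block_bytes: bytes per block (16 for AES, 8 for DES).
    """
    reconf_us: int
    sw_block_us: int
    hw_block_us: int
    block_bytes: int

    def software_latency(self, blocks: int) -> int:
        # Every block is processed by the CPU, so latency grows linearly.
        return blocks * self.sw_block_us

    def hardware_latency(self, blocks: int, configured: bool = False) -> int:
        # Reconfiguration is paid once, and only if the IP-Core is not loaded yet.
        setup = 0 if configured else self.reconf_us
        return setup + blocks * self.hw_block_us

    def break_even_blocks(self) -> int:
        """Smallest block count for which reconfiguring and using the IP-Core is faster."""
        gain = self.sw_block_us - self.hw_block_us
        if gain <= 0:
            raise ValueError("the IP-Core is never faster per block")
        # reconf + n * hw < n * sw  <=>  n > reconf / gain
        return self.reconf_us // gain + 1

    def software_throughput(self, blocks: int) -> float:
        """Bytes per second; constant because latency is proportional to blocks."""
        if blocks <= 0:
            return 0.0
        return blocks * self.block_bytes * 1_000_000 / self.software_latency(blocks)

    def hardware_throughput(self, blocks: int, configured: bool = False) -> float:
        """Bytes per second; grows with blocks while reconfiguration dominates."""
        latency = self.hardware_latency(blocks, configured)
        return blocks * self.block_bytes * 1_000_000 / latency


if __name__ == "__main__":
    # Hypothetical per-block times, chosen only to illustrate the shape of the curves.
    aes = LatencyModel(reconf_us=150_000, sw_block_us=2_000, hw_block_us=50, block_bytes=16)
    print("break-even:", aes.break_even_blocks(), "blocks")
    for blocks in (10, 100, 1_000):
        print(blocks, aes.software_throughput(blocks), round(aes.hardware_throughput(blocks)))
```

With the placeholder values the break-even point is 77 blocks; real per-block times would shift it, which is why the thresholds used in this document come from measurements rather than from this model.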
For example, the results show that a software execution of AES is more efficient when fewer than 40 blocks of data are processed. Similarly, reconfiguring and executing the DES IP-Core does not provide a significant improvement when fewer than 80 blocks are encrypted. The OS can use this information, in addition to area availability, to select the proper implementation at runtime. Furthermore, the table shows how the throughput of the software and hardware implementations of AES and DES depends on the input data size, assuming that the IP-Core must be configured on the device. The throughput of the two software versions is nearly constant, whereas for the hardware implementations it increases almost linearly with the input size, because the fixed reconfiguration delay dominates the total time for small inputs; this explains the poor performance of dynamic reconfiguration for small input data.
Module caching can vastly improve performance, since it removes the reconfiguration overhead. The idea behind module caching is simple: once a module has been configured on the FPGA, it is not removed, so that it can be reused in the future. In this case study, caching pays off whenever a requested functionality is already configured in an FPGA slot and is currently unused. If the corresponding IP-Core is immediately available, the throughput of AES is constant and equal to 436 kBytes/s, while DES shows a constant throughput of 246 kBytes/s. This is just one example: many applications have a performance that varies with several factors, of which the input size is only one. This is exactly why we need not only a hardware architecture that can be adapted at runtime, but also a runtime that monitors system performance and takes decisions based on it.
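The selection policy and module caching can be combined in a small decision routine. The following Python sketch is hypothetical: the `Manager` and `Slot` classes, the return labels, and the eviction rule (prefer an empty slot, otherwise overwrite an idle cached module) are assumptions for illustration; only the 40- and 80-block thresholds come from the results reported above. It also assumes that an already configured IP-Core beats software for any input size, since no reconfiguration is needed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Break-even points reported in the text (in blocks) when the IP-Core must
# first be reconfigured: below them, the software version is preferable.
THRESHOLD_BLOCKS = {"AES": 40, "DES": 80}


@dataclass
class Slot:
    """One reconfigurable region of the FPGA (hypothetical)."""
    module: Optional[str] = None  # IP-Core currently configured, if any
    busy: bool = False


@dataclass
class Manager:
    """Hypothetical centralized manager choosing HW or SW per request."""
    slots: List[Slot] = field(default_factory=list)

    def select(self, algo: str, blocks: int) -> str:
        # 1. Cache hit: the IP-Core is configured and idle, so no reconfiguration
        #    is needed; assume hardware then wins regardless of input size.
        for slot in self.slots:
            if slot.module == algo and not slot.busy:
                slot.busy = True
                return "hw-cached"
        # 2. Small input: the reconfiguration delay would not pay off.
        if blocks < THRESHOLD_BLOCKS[algo]:
            return "sw"
        # 3. Large input: prefer an empty slot, otherwise evict an idle cached module.
        idle = [s for s in self.slots if not s.busy]
        idle.sort(key=lambda s: s.module is not None)  # empty slots sort first
        if idle:
            slot = idle[0]
            slot.module = algo
            slot.busy = True
            return "hw-reconfigure"
        # 4. No area available: fall back to software.
        return "sw"

    def release(self, algo: str) -> None:
        """Mark the IP-Core idle but keep it configured (module caching)."""
        for slot in self.slots:
            if slot.module == algo and slot.busy:
                slot.busy = False
                return


if __name__ == "__main__":
    mgr = Manager(slots=[Slot(), Slot()])
    print(mgr.select("AES", 10))   # small input, nothing cached -> sw
    print(mgr.select("AES", 500))  # large input -> hw-reconfigure
    mgr.release("AES")
    print(mgr.select("AES", 10))   # AES still configured and idle -> hw-cached
```

Releasing a module only marks it idle, so a later request for the same algorithm skips reconfiguration entirely; a real runtime would also need a smarter eviction policy (for example, least recently used) when several cached modules compete for the same slots.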