Proof preservation and program compilation

César Kunz

Résumé

Software applications have gained a notable role in our everyday activities, mobile code applications being a significant portion of these software agents. The mobile code paradigm entails the distribution of applications from the code producer to heterogeneous client environments in which they are executed. An extended practice of this paradigm consists in the development of third party components, that are transferred across an untrusted network such as Internet and finally integrated in a host execution environment such as a PC or a cellular phone. Naturally, this computational environment opens the door to the deployment of malicious code in client workstations. In some cases a potential misbehavior of the mobile code does not constitute a serious risk, for example when the integrity of the data affected by the execution is irrelevant, or when the execution architecture imposes strong constraints on the computational capabilities of the foreign application. There still are, however, situations in which it is essential to verify the functional correctness of the code before executing it, for instance when the confidentiality of critical data such as credit card information could be compromised, or when the execution environment does not have a special mechanism to monitor excessive resource consumption. George Necula proposed a technique to provide trust to code consumers about the program correctness without trusting the code producer side. The technique, Proof Carrying Code (PCC), consists in deploying code together with a formal proof of its correctness. Correctness is an inherent property of the received code that cannot be inferred from the identity of the code producer. That naturally puts PCC in advantage with respect to methods based on trusting a third party authority. Indeed, a signature from a trusted authority it is not sufficient to provide absolute trust on the execution of the received code. From its origins, the typical mechanism of PCC to generate certificates relies on a certifying compiler, an extension of a standard compiler that automatically produces certificates for decidable safety policies. Extending the set of enforceable properties is challenging and in the most general case it requires human interaction. One possibility, is to verify the final executable code from scratch. However, the lack of structure makes low level code verification a less natural, and hence more daunting, task. This, together with the fact that verification environments target mostly high level code, makes more appropriate the idea of moving the generation of certificates at the source code level. The main drawback of producing certificates ensuring source code correctness is that they do not entail correctness of the final compiled code. Several techniques may be proposed to transfer evidence of program correctness from the source code to the executable counterpart. That includes, for instance, deploying the source program and original certificates in addition to the executable code and certifying the correctness of the compilation process. However, this approach is not satisfactory, since in addition to require availability of the source code, a certificate ensuring compiler correctness can be prohibitively large. A more viable alternative is to provide a mechanism to generate certificates for compiled code from certificates of the original source program. Compilers are complex procedures composed of several steps, in which the original program is progressively transformed into intermediate lower level representations. Barthe et al. and Pavlova have shown that original certificates are preserved, up to minor differences, along with the first compiler phase: a nonoptimizing compilation from source level to an unstructured intermediate representation. However, compiler optimizations applied to the intermediate representations represent a challenge since a priori certificates cannot be reused. In this thesis, we analyze how optimizations affect the validity of certificates and propose a mechanism, Certificate Translation, to transform certificates in order to overcome the effects of program transformations, rendering feasible the generation of certificates for executable mobile code at the source level. Chapter 2 introduces certificate translation for common compiler optimizations applied in the context of an intermediate RTL language. The main difficulties are presented, and a classification of program transformations is given in terms of the effort required to transform the original certificates. A general scheme is provided to cover transformations that performs arithmetic simplifications such as constant propagation and common subexpression elimination. In addition, ad-hoc techniques are proposed for other standard optimizations that do not correspond to this classification. Chapter 3 studies the existence of certificate translators in a mild extension of an abstract interpretation framework that includes a notion of certificates. This abstract setting provides considerable advantages with respect to the approach of Chapter 2 since it allows us to extend certificate translation to diverse programming languages and several analysis environments. Certificate transformation is studied in this abstract setting for simple program transformations which can be composed to represent a wide variety of program optimizations, for instance those considered in Chapter 2. In a second part of the thesis, we extend certificate translation to less typical settings, that can be of interest in PCC scenarios. More precisely, the chapters in the second part consider the aspect oriented paradigm, hybrid verification methods and parallel programs executing in memory hierarchies. For each setting, appropriate analysis and verification settings are provided. Then, the effect of common transformations over verification results are studied.

Le paradigme du code mobile implique la distribution des applications par les producteurs de code à environnements hétérogènes dans lesquels elles sont exécutées. Une pratique étendue de ce paradigme est constituée par le développement d'applications telles que les applets ou les scripts Web, transferés à travers un réseau non sécurisé comme Internet à des systèmes distants, par exemple un ordinateur, un téléphone mobile ou un PDA (Assistant personnel). Naturellement, cet environnement peux ouvrir la porte au déploiement de programmes malveillants dans des plateformes distantes. Dans certains cas, la mauvaise conduite du code mobile ne constitue pas un risque grave, par exemple lorsque l'intégrité des données affectées par l'exécution n'est pas critique ou lorsque l'architecture d'exécution impose de fortes contraintes sur les capacités d'exécution du code non sécurisé. Il y a toujours, toutefois, des situations dans lesquelles il est indispensable de vérifier la correction fonctionnelle du code avant son exécution, par exemple lorsque la confidentialité de données critiques comme l'information des cartes de crédit pourrait être en danger, ou lorsque l'environnement d'exécution ne possède pas un mécanisme spécial pour surveiller la consommation excessive des ressources. George Necula a proposé une technique pour apporter de la confiance aux consommateurs sur la correction du code sans faire confiance aux producteurs. Cette technique, Proof Carrying Code (PCC), consiste à déploier le code avec une preuve formelle de sa correction. La correction est une propriété inhérente du code reçuu qui ne peut pas être directement déduite du producteur du code. Naturellement, cela donne un avantage à PCC quant-aux méthodes basées sur la confiance à l'autorité d'un tiers. En effet, une signature d'une autorité ne suffit pas à fournir une confiance absolue sur l'exécution du code reçu. Depuis les origines du PCC, le type de mécanisme utilisé pour générer des certificats repose sur des analyses statiques qui font partie du compilateur. Par conséquent, en restant automatique, il est intrinsèquement limité à des propriétés très simples. L'augmentation de l'ensemble des propriétés à considerer est difficile et, dans la plupart des cas, cela exige l'interaction humaine. Une possibilité consiste à vérifier directement le code exécutable. Toutefois, l'absence de structure rend la vérification du code de bas niveau moins naturelle, et donc plus laborieuse. Ceci, combiné avec le fait que la plupart des outils de vérification ciblent le code de haut niveau, rend plus appropriée l'idée de transferer la production de certificats au niveau du code source. Le principal inconvénient de produire des certificats pour assurer l'exactitude du code source est que les preuves ne comportent pas la correction du code compilé. Plusieurs techniques peuvent etre proposées pour transférer la preuve de correction d'un programme à sa version exécutable. Cela implique, par exemple, de déployer le programme source et ses certificats originaux (en plus du code exécutable) et de certifier la correction du processus de compilation. Toutefois, cette approche n'est pas satisfaisante, car en plus d'exiger la disponibilité du code source, la longueur du certificat garantissant la correction du compilation peut être prohibitive. Une alternative plus viable consiste à proposer un mécanisme permettant de générer des certificats de code compilé à partir des certificats du programme source. Les compilateurs sont des procédures complexes composées de plusieurs étapes, parmi lesquelles le programme original est progressivement transformé en représentations intermédiaires. Barthe et al. et Pavlova ont montré que les certificats originaux sont conservés, à quelques différences près non significatives, par la première phase de compilation: la compilation non optimale du code source vers une représentation non structurée de niveau intermédiaire. Toutefois, les optimisations des compilateurs sur les représentations intermédiaires représentent un défi, car a priori les certificats ne peuvent pas être réutilisés. Dans cette thèse, nous analysons comment les optimisations affectent la validité des certificats et nous proposons un mécanisme, Certificate Translation, qui rend possible la génération des certificats pour le code mobile exécutable à partir des certificats au niveau du code source. Cela implique transformer les certificats pour surmonter les effets des optimisations de programme.

Proof preservation and program compilation

Préservation des preuves et transformation de programmes

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager