Data Compression/Executable Compression

< Data Compression

executable software compression

In this chapter, we are careful to distinguish between 3 kinds of software:

Executable files are, in some ways, similar to English text -- and source code is even more similar -- and so data compression software that was designed and intended to be used on English text files often works fairly well with executable files and source code.

However, some techniques and concepts that apply to executable software compression that don't really make sense with any other kind of file, such as:

compress then compile vs compile then compress

Which stage of compilation gives the best compression:

Some very preliminary early experiments[23] give the surprising result that compressed high-level source code is about the same size as compressed executable machine code, but compressing a partially-compiled intermediate representation gives a larger file than either one.

filtering

Many data compression algorithms use "filter" or "preprocess" or "decorrelate" raw data before feeding it into an entropy coder. Filters for images and video typically have a geometric interpretation. Filters specialized for executable software include:

Several programmers believe that a hastily written program will be at least 10 times as large as it "needs" to be. [26] [27]

A few programmers believe that 64 lines of source code is more than adequate for many useful tools.[28]

The aim of the STEPS project is "to reduce the amount of code needed to make systems by a factor of 100, 1000, 10,000, or more." [29]


Most other applications of compression -- and even most of these executable compression techniques -- are intended to give results that appear the same to human users, while improving things in the "back end" that most users don't notice. However, some of these program compression ideas (refactoring, shared libraries, using higher-level languages, using domain-specific languages, etc.) reduce the amount of source that a human must read to understand a program, resulting in a significantly different experience for some people (programmers). That time savings can lead to significant cost reduction.[30] Such "compressed" source code is arguably better than the original; in contrast to image compression and other fields where compression gives, at best, something identical to the original, and often much worse.

References

  1. Embedded Linux wiki: "Fast Kernel Decompression"
  2. Smallcode: "Self-extracting executables " by Peter Kankowski, with a wiki discussion at strchr: "creating_self-extracting_executables"
  3. UPX is a portable executable packer for several different executable formats including linux/elf386, vmlinuz/386 and win32/pe.
  4. ficl "Ficl ... LZ77 compressed ... with the runtime decompressor, the resulting Ficl executable is over 13k smaller"
  5. "Managing code size"
  6. 1 2 3 "Enhanced Code Compression for Embedded RISC Processors" by Keith D. Cooper and Nathaniel McIntosh
  7. "Java: Trees Versus Bytes" thesis by Kade Hansson. "A Compact Representation For Trees ... is the most dense executable form that he knows of. ... The creation of compact tree representations ... achieving better compression and developing faster codecs will no doubt be a growth area of research in coming years. One such project in this area may lead to an eventual replacement for the Java class file format."
  8. Javascript Compression tools
  9. w: minification (programming)
  10. "Code compression under the microscope" by Jim Turley 2004
  11. Charles Lefurgy, Eva Piccininni, and Trevor Mudge. "Reducing Code Size with Run-time Decompression".
  12. "Brian Writes about His BOEL (Part 1)" (a Linux distribution that fits on a single floppy)
  13. "BOEL, Part 2: Kernel Configuration and Booting"
  14. "2-Disk Xwindow embedded Linux": "a single floppy X Window System thin client."
  15. embedded Linux wiki: "Technologies for decreasing system size"
  16. Microprocessor Design/Code Density
  17. 1 2 "Optimizing for Space : Measurements and Possibilities for Improvement" by Árpád Beszédes, Tamás Gergely, Tibor Gyimóthy, Gábor Lóki, and László Vidács 2003 from the GCC ARM Improvement Project at the University of Szeged
  18. uClibc FAQ
  19. "Embedding with GNU: Newlib" by Bill Gatliff 2001; "Embedding GNU: Newlib, Part 2" by Bill Gatliff 2002
  20. "COMPASS - A tool for evaluation of compression strategies for embedded processors" by Menon and Shankar, which cites "A Class of Code Compression Schemes for Reducing Power Consumption in Embedded Microprocessor Systems" by Benini, Menichelli, and Olivieri
  21. "A new visualization for packed and self-modifying programs".
  22. "Slim Binaries"
  23. "Code Size When Compiling to JavaScript" http://mozakai.blogspot.com/2011/11/code-size-when-compiling-to-javascript.html
  24. 1 2 Colin Percival, Naive differences of executable code, http://www.daemonology.net/bsdiff/, 2003. [http://www.daemonology.net/bsdiff/
  25. "How Courgette works" by Stephen Adams 2009 at The Chromium Projects.
  26. "Thoughtful Programming and Forth" by Jeff Fox
  27. "1x Forth" by Charles Moore 1999
  28. "useful source code in 64 lines or less"
  29. "STEPS Toward The Reinvention of Programming" 2007 apparently (?) written by Alan Kay, Ted Kaehler, Ian Piumarta, Yoshiki Ohshima, Alex Warth, Takashi Yamamiya, Dan Amelang, Scott Wallace, Dan Ingalls, Andreas Raab, Jim Clune, John Maloney, Dave Smith, Kim Rose, Chuck Thacker, Vishal Sikka, and David Reed.
  30. "Code Reduction is Job #1"; "Should Code Reduction be Job #1?"
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.