Talk:Virtual memory compression


Changes to the article

Dsimic: I'm relatively new to Wikipedia editing, but it seems to me that if you don't understand something, you should ask before making wholesale changes. This is likely to result in less work all around. I guess I'll address your questions when I'm able; in the meantime maybe you should undo your substantive changes. For example, an improvement in I/O channel speed would reduce the need for compression, not the amount of paging. Improvements to on-CPU caching reduce memory accesses and improve RAM-to-I/O-channel performance for the *most active* pages. Adding a compressed memory cache into the mix can reduce the efficacy of the on-CPU cache by spreading memory accesses to the LRU list used for paging and away from active program memory. — Preceding unsigned comment added by ScholarWarrior (talkcontribs) 13:46, 6 January 2015 (UTC)[reply]

In the first instance, during paging to auxiliary storage, very few CPU cycles are required because the I/O is typically done via DMA without any CPU involvement. The CPU can therefore be dedicated to dealing with the highest priority data. If you add compression which uses CPU cycles (as opposed to a dedicated compression processor), you end up with two negative consequences: first, as I mention above, you reduce the efficacy of the CPU cache; second, you divert processing power from performing a useful task on high priority data, say computation on a spreadsheet, to performing compression on data that is non-critical, since it is being paged out.

In the second instance, read-only code is never paged out; it is discarded. If you spend time compressing this code, you are spending time doing something which may be completely unnecessary (the code may never be re-used), and in any event the only I/O that would occur to restore the code is a read, versus a write and then a read for other paging operations. So the bar is higher.

Fortunately, in most implementations, discard operations by the OS will not trigger the compression code. On the other hand, this means that compression fails to address the disk accesses caused by the VMM's handling of program code. So a common misconception about this technology is that it shrinks the size of programs. This is not so because: (a) code is not highly compressible, and (b) code usually doesn't contribute to thrashing in a way that is ever addressed by most compression implementations. If one did attempt to compress discarded code, the compression and decompression would collectively need to be faster than a read operation, whereas other compress-decompress operations only need to be faster than a write plus a read. — Preceding unsigned comment added by ScholarWarrior (talkcontribs) 16:05, 6 January 2015 (UTC)[reply]
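To spell out the break-even point being described (the per-page times below are just illustrative shorthand, not figures from any reference):
<math display="block">t_\text{comp} + t_\text{decomp} < t_\text{read} \quad \text{(discarded read-only code)}</math>
<math display="block">t_\text{comp} + t_\text{decomp} < t_\text{write} + t_\text{read} \quad \text{(ordinary paged-out data)}</math>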

Hello there!
Well, I wouldn't say that I didn't understand something, and you can always revert whatever you find to be in need of reverting. Please feel free to revert anything, I won't cry. :) As you've pointed out, I've also thought a lot about discussing the whole thing before doing any edits, but there were just too many things that were incorrect to some degree. At the same time, please don't get me wrong, I highly respect anyone's contributions to Wikipedia.
If you agree, let's start by discussing the paragraph you've restored as the Virtual memory compression § Prioritization subsection:
In a typical virtual memory implementation, paging happens on a least recently used basis, potentially causing the compression algorithm to use up CPU cycles dealing with the lowest priority data. Furthermore, program code is usually read-only, and is therefore never paged-out. Instead code is simply discarded, and re-loaded from the program’s auxiliary storage file if needed. In this case the bar for compression is higher, since the I/O cycle it is attempting to eliminate is much shorter, particularly on flash memory devices.
Ok, LRU is most commonly used, that's fine, but what is "dealing with the lowest priority data" supposed to mean? It surely means that the lowest priority data becomes paged out, but that data has to be dealt with, as that's the whole point. Next, why does the program code affect prioritization? If unmodified, it will never be paged out, so why would that raise the "bar for compression"? Sorry, but that section makes no sense whatsoever, simply because only some very unintelligent virtual memory compression implementations would compress and page out something that doesn't get paged out in the first place.
Next, let's discuss this change you've reverted; the revert changed "accessible" to "inaccessible" in the following (emphasis added):
In a virtual memory compression system, paging requests are compressed and stored in primary storage (usually RAM) or sent as compressed to auxiliary storage. In both cases the original memory is marked inaccessible.
Your edit summary described it this way, which is correct as a statement:
Paged out memory is marked inaccessible. Any access causes a memory fault which causes the VMM to page in the region and then mark it accessible.
As we know, the whole point of paging something out is to free some RAM. Thus, you should've described it better if the intention was to describe the paging-in process associated with subsequent page faults. When "the original memory" is mentioned that way, it implies the memory locations/pages initially occupied by what's now paged out, and those locations become accessible again as part of the operating system's RAM pool. In summary, the original and now restored version is highly confusing.
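Purely for illustration, here is a minimal toy sketch (hypothetical Python, not any real VMM's code or API) of the sequence being discussed; it also shows the distinction between the virtual page being marked inaccessible and the physical frame being freed for reuse:
<syntaxhighlight lang="python">
# Hypothetical toy model of page-out / page-fault handling; not any real VMM.
page_table = {0x1000: {"present": True, "frame": 7}}   # virtual page -> mapping
free_frames = []                                        # physical frames available for reuse
backing_store = {}                                      # paged-out (or compressed) contents

def page_out(page, data):
    """Evict a page: save its contents, mark the *virtual* page inaccessible,
    and return its *physical* frame to the pool for reuse."""
    entry = page_table[page]
    backing_store[page] = data
    free_frames.append(entry["frame"])
    entry["present"] = False      # any further access triggers a page fault
    entry["frame"] = None

def handle_page_fault(page):
    """On access to a non-present page: restore the data into some free frame
    (not necessarily the original one) and mark the page accessible again."""
    entry = page_table[page]
    entry["frame"] = free_frames.pop()    # assumes a frame is available
    data = backing_store.pop(page)        # would be a decompress or disk read
    entry["present"] = True
    return data
</syntaxhighlight>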
Also, could you please explain which graph shows the exponential relationship "between paging activity and available memory"? Sorry, but in the provided reference I see no charts that would depict a clear exponential relationship, and especially not the exact relationship you're referring to. At the same time, saying that something is exponentially related to another thing makes little sense if not described further; in this example, it is perfectly fine to parse it as the paging activity going up radically when there's more RAM, which would make no sense except under specific circumstances.
Lastly, having more than a few short sections doesn't improve clarity, it just makes the whole thing "chopped" and harder to read. Of course, those are all just my points of view, and I'm more than happy to discuss the whole thing further. — Dsimic (talk | contribs) 19:44, 6 January 2015 (UTC)[reply]
I won't have time to explain better than I have until the weekend. Please take my word that I'm an expert on this topic and read what I wrote more carefully. When I have time I'll try to answer your questions and concerns. — Preceding unsigned comment added by ScholarWarrior (talkcontribs) 22:20, 6 January 2015 (UTC)[reply]
Just take your time, there's no hurry. Looking forward to discussing it further. — Dsimic (talk | contribs) 22:30, 6 January 2015 (UTC)[reply]
Ok. A couple of quick notes. The graph is the S shape at the top of page 19. It's technically a sigmoid. The memory marked inaccessible is the memory region associated with the data that has been paged out. The underlying RAM is then used elsewhere. — Preceding unsigned comment added by ScholarWarrior (talkcontribs) 22:36, 6 January 2015 (UTC)[reply]
Well, there's no page 19 in the provided reference. :) Do you actually refer to figure 3 on page 918 (page 4 in the PDF excerpt), with "missing-page probability" as its caption? By the way, could you please sign your posts by adding ~~~~ at the end of each post? Thanks. — Dsimic (talk | contribs) 22:43, 6 January 2015 (UTC)[reply]
Yes. That's the one. — Preceding unsigned comment added by ScholarWarrior (talkcontribs) 01:27, 7 January 2015 (UTC)[reply]
Well, that simply isn't an exponential relationship, and as we know a section of a chart simply can't be used to determine the overall nature of a relationship; I'd easily find a small exponential-like portion in almost all charts. :)
Also, as described on page 3 of the paper, the chart in figure 3 depicts a relationship between the number of running programs (x-axis) and missing-page probability (y-axis); the amount of main memory (M) stays unchanged during the analysis. Translating that into a relationship "between paging activity and available memory", as you've described it in the article, takes out a lot of what the analysis is actually based on and turns it into a misleading statement. It was actually my mistake not to correct it in a better way so that it speaks of a direct relation between the paging activity, the number of processes, and the amount of allocated main memory (instead of the amount of available memory, which implies how much RAM is installed in a computer).
Again, please sign your posts and consider indenting them properly as well. Thank you. — Dsimic (talk | contribs) 06:02, 7 January 2015 (UTC)[reply]
A sigmoid function is a logistic function in which "The initial stage of growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, growth stops." I didn't think the distinction needed to be called out specifically in the body. In this specific case, once the exponential growth stage is hit, the fact that it levels off as "miss probability" approaches 100% is not really relevant, since the system is spending nearly 100% of its time on paging activity.
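For reference, the textbook logistic curve being alluded to (nothing specific to the paper's model) is
<math display="block">f(x) = \frac{L}{1 + e^{-k(x - x_0)}},</math>
which grows approximately like <math>e^{kx}</math> well below the midpoint <math>x_0</math> and saturates at <math>L</math> afterwards.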
I also think the distinction between increasing the number of processes and increasing memory use is not meaningful in this context, but if you find a different reference which shows the relationship between paging activity and memory use as somehow distinctly non-exponential, let's discuss.
The issue here is that as long as there is a significant difference in access time between pages which are present in RAM and those which are not, there will be a large jump in lag time as the probability of a not-present page increases. By using RAM to store compressed data one approaches that threshold faster. This issue affects performance of the system more drastically when it is the CPU itself which is doing the compressing and decompressing.
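To make the jump in lag time concrete, here's the standard effective-access-time expression (illustrative symbols, not something taken from the article's references): if <math>p</math> is the probability that an accessed page is not present, then
<math display="block">t_\text{effective} = (1 - p)\, t_\text{RAM} + p\, t_\text{fault},</math>
and because <math>t_\text{fault} \gg t_\text{RAM}</math>, even a small rise in <math>p</math> dominates the average access time.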
As for the rest, why don't you break your issues into sections below, as the above thread is too long and confusing for me to follow at this point. ScholarWarrior (talk) 16:44, 9 January 2015 (UTC)[reply]
I would also note that the reference you provided shows an exponential relationship between miss rate and installed RAM on page 13 of the pdf. So perhaps we can add that reference in-line to the text. ScholarWarrior (talk) 18:30, 9 January 2015 (UTC)[reply]
The way you've described the sigmoid relationship and its exponential portion above is the way it should've been described in the article. :) Calling it an exponential relationship "between paging activity and available memory", without describing its nature further, simply turns the whole thing into an unparsable and misleading statement. At the same time, we need to closely follow the references, and the reference speaks about a relation between the paging activity, the number of processes, and the amount of allocated main memory.
Regarding the usage of CPU time and effects on overall performance when using compression, we would also need to take contemporary multi-core CPUs into account. Also, as we know, crypto accelerators pretty much died off as they simply couldn't compete with general-purpose CPUs; transferring data over a bus eliminates nearly all performance advantages that dedicated silicon can offer, and that translates to what off-die compression accelerators could offer.
Sorry, but I'm not sure which reference you're referring to. The only reference I've used above is this one, and that's actually the reference you've added into the article. It has only eight pages, while its page numbering starts at 915; thus, there's no page 13. :)
Perhaps it would be best to discuss this relationship further, and then we could move on to the remaining issues. Hope you agree. — Dsimic (talk | contribs) 21:45, 9 January 2015 (UTC)[reply]
I'm referring to this: The Compression Cache; page 13 of the PDF shows an exponential relation between paging and total memory. However, I don't believe the relationship can be nailed down or clarified more other than to say it's exponential, because the specifics and the rate can vary somewhat. Even if you assume the compression and decompression happen in near-zero time with near-zero load on the CPU, and there are sufficient threads and tasks running to allow a task to continue while another is having its memory decompressed (which perhaps you can in some multi-core environments with multiple processes running), there is still overhead involved in dealing with the access faults for compressed pages and with switching to an inactive task. Since there's an exponential relationship (we agree on that now, yes?), using x memory for storage of compressed pages causes an e^x increase in this paging overhead. Since by all accounts the compression of program data is on the order of 2:1 or at best 2.5:1, you will invariably hit diminishing returns fairly quickly. To me this is self-evident. If you feel this needs clarifying in the main article, please go ahead. ScholarWarrior (talk) 23:27, 9 January 2015 (UTC)[reply]
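As a purely illustrative toy model of that diminishing-returns argument (the constants r and k below are assumptions chosen for illustration, not values from The Compression Cache or any other reference):
<syntaxhighlight lang="python">
import math

def effective_capacity(c, r=2.0):
    """Uncompressed-equivalent RAM when a fraction c of physical RAM is
    reserved for a compressed page cache with an assumed compression ratio r."""
    return (1.0 - c) + r * c

def relative_paging_overhead(c, k=5.0):
    """Assumed overhead growing roughly like e^(k*c) as RAM is taken away
    from uncompressed working sets (k is an arbitrary illustrative constant)."""
    return math.exp(k * c)

for c in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"cache fraction {c:.1f}: capacity x{effective_capacity(c):.2f}, "
          f"overhead x{relative_paging_overhead(c):.1f}")
</syntaxhighlight>
The capacity gain grows only linearly with the cache fraction, while the assumed overhead grows exponentially, which is the sense in which returns diminish quickly at 2:1 or 2.5:1 ratios.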
Ah, that one, the paper I've added into the "External links" section; sorry, forgot about it for a moment. As you've described it, various overheads are inevitable simply because the level of parallelism offered by hardware is limited for a particular computer, if not by anything else. After increasing the resource usage sufficiently, those overheads will inevitably start to rise. However, I still find that the description in the article should be clarified further, and I'll do that later today.
By the way, I hope that you find our discussion at least partially productive. :) As I've already mentioned, I highly respect your contributions, and my only goal is to make the article better. — Dsimic (talk | contribs) 09:58, 10 January 2015 (UTC)[reply]
I tried to improve things a little without going into any great depth on the issue. I also added other references. I'm fully in favor of making the article better. ScholarWarrior (talk) 16:48, 10 January 2015 (UTC)[reply]

Sorry for my delayed response. Your improvements to the Virtual memory compression § Increased thrashing section made it much clearer; I've clarified the section further while also making it slightly more readable. Looking good? — Dsimic (talk | contribs) 12:15, 11 January 2015 (UTC)[reply]

I think this section looks good now. I reduced the first paragraph to one sentence, since I felt that both sentences said the same thing now. So I think we should consider merging this sentence into the second paragraph and perhaps breaking the paragraph in two later on. Perhaps at "In circumstances..." I also modified "two extremes" to "two states" since I don't feel these states are extremes. In fact the point between the two states is really an inflection point (where the sigmoid turns from roughly horizontal to roughly vertical) with very low likelihood of being achieved.
I also think that the article should use a single term for main memory, either "main memory" or "primary storage". I used "primary storage", since that's the title of the section linked to when you use "main memory"; however, I think "main memory" is the vernacular, so we should probably go with that. ScholarWarrior (talk) 16:22, 11 January 2015 (UTC)[reply]
Thinking about this some more, it might be good to go with "primary storage" because that avoids the confusion we had earlier with the difference between memory regions and RAM. "primary storage" clearly implies a physical quantity, and elsewhere we should use "memory region" to denote a virtual space. What do you think? ScholarWarrior (talk) 16:26, 11 January 2015 (UTC)[reply]
You're right, the section's opening paragraph needed some trimming, and here's take two, which I find leaves a better description, together with a more logical division into paragraphs. Also, your edits to "two extremes" are fine, and I've tweaked that a bit further. As a small note, using "running process" might be a bit redundant in this context, as a process is pretty much a running program, and even stopped processes consume memory while almost surely going back to a running state later. Also, I'd say that "below a certain point" should stay, as it supports the "roughly exponential" nature of the relationship.
Speaking about "main memory" vs. "primary storage", I had similar thoughts about which one to use consistently. IMHO, it's better to use "main memory" simply because it can be always unambiguously shortened to "memory", while using "storage" as a shorthand for "primary storage" would seem more like it's referring to secondary storage. Hope you agree. — Dsimic (talk | contribs) 23:02, 11 January 2015 (UTC)[reply]
OK. The section looks good to me. "Memory" on its own is ambiguous in this context, as it could be "virtual memory" or "compressed memory"; that's why I think "primary storage" might be better. ScholarWarrior (talk) 00:18, 12 January 2015 (UTC)[reply]
Hm, you're right; it might be left to the context, but could still be ambiguous. However, "main memory" is perhaps more commonly used, and it goes along with "virtual memory" and "compressed memory". Shall we move on to other issues mentioned early in this thread? — Dsimic (talk | contribs) 00:32, 12 January 2015 (UTC)[reply]
OK, we can move on; pick the next topic you want to discuss... but perhaps we should settle on the terminology and make that change first. I'm not really comfortable with "main memory" because, realistically, virtual memory addresses are part of main memory. How about "physical memory", or just use "RAM", since realistically there's no other type of physical memory in use today? ScholarWarrior (talk) 01:10, 12 January 2015 (UTC)[reply]
Hm, makes sense... Perhaps we could opt for a combination of "physical memory" and "RAM", using them interchangeably depending on the sentence (sometimes a longer form would fit better, and sometimes a shorter one)? Of course, the first time "physical memory" is mentioned, we'd briefly explain that it's synonymous with "RAM"; a simple "physical memory (RAM)" should be good enough for that purpose. Thoughts? — Dsimic (talk | contribs) 02:33, 12 January 2015 (UTC)[reply]
OK, I've changed all relevant references to either "physical memory" or "RAM". The article notes that "physical memory" is usually RAM early on. I'm not sure which of your other questions are still open? ScholarWarrior (talk) 16:10, 12 January 2015 (UTC)[reply]
Looking good and much more consistent! I've cleaned it up a bit further, and I hope you're Ok with the take two on further description of the exponential relationship – that's what the reference says, and in most cases there will be no paging out anyway if all working sets fit into the physical memory (or better said in our case, into what's left after the size of the caching portion is subtracted).
Regarding the remaining issues, what's left is to clarify the "In both cases the original memory is marked inaccessible." sentence (emphasis added). As-is, it's technically correct but quite confusing – if you agree. — Dsimic (talk | contribs) 19:55, 12 January 2015 (UTC)[reply]
Look, virtual memory is either on or off. When it is on, paging is occurring even if allocated virtual memory is less than available physical memory. This is both because the system prepares for future allocation by paging out idle memory, and because the OS generally balances performance by allocating more memory to activities like disk caching. So the exponential relationship is simply there and holds at all times. There is no magical point. In all exponential relationships there is a point at which the growth seems higher, but in reality the first derivative of e^x is e^x, and so the growth can be considered "increasing sharply" at any arbitrary point, say when e^x > 1. But since this is simply always true of exponential relationships, spelling it out makes it seem like there's some magical point, where there is no such point that one can define. The miss-probability transition is smooth and simply reflects the exponential nature of that part of a sigmoid. What you wrote implies there's a change in the first derivative that occurs at a known point, specifically when you start allocating beyond physical memory. There is no such point; the amount of paging increases exponentially before you've allocated all available physical RAM to processes, and the point at which you can say this is a "sharp" increase is undetermined and varies with circumstances, and this is part of the problem with RAM compression.
As for "inaccessible" I've taken a stab at adding more info to clarify. ScholarWarrior (talk) 22:10, 12 January 2015 (UTC)[reply]
You're right that paging out occurs even if the sum of working sets is way below the amount of installed RAM, and I've seen it happening countless times. However, that involves a very low amount of paging activity, and that's why I wrote "in most cases". To me, characterizing such a low amount of paging under an exponential relationship simply doesn't make sense, and that's also what figure 3 on page 918 in the reference clearly shows, with an additional description on page 917. Here's an excerpt from page 917 that confirms it isn't me inventing that critical/magical point:
... there will be no significant interaction among programs and the missing-page probability is small. But when n exceeds some critical number n0, the totality of working sets exceeds M, the expansion of one program displaces working set pages of another, and so the missing-page probability increases sharply with n.
At the same time, we're obviously not dealing with a clean e^x relationship, so involving the first derivative in the analysis is pretty much irrelevant – if you agree. Thus, I still find that "below the sum of their working sets" (or some alternative wording) needs to be restored within the Virtual memory compression § Increased thrashing section.
Regarding your "inaccessible"-related edits in the article's lead section, they look good to me; I've just deduplicated some content and cleaned it up a bit further. — Dsimic (talk | contribs) 09:08, 14 January 2015 (UTC)[reply]
I think putting in the concept of any inflection point is misleading. I've worked with this technology and I don't believe such a point exists. By putting in the concept, you are implying that the technology will provide a clear benefit from that point up to some other point where the exponential relationship is steeper. I don't believe that is an accurate representation. In any event, this is a shortcomings section, and since the size of the, let's call it "useful middle ground", is unknown and varies with OS, implementation and CPU activity, this section would be the wrong place to try to quantify it. This section is painting the picture of a limitation of the technology, and I think it should be left at that. ScholarWarrior (talk) 16:30, 15 January 2015 (UTC)[reply]
Well, there must be some point at which the paging activity becomes higher and turns into an exponential relationship, and the reference clearly confirms that. Perhaps some operating systems, notably Windows, keep paging out a lot for some reason even if there's no visible need for that; Linux doesn't do something like that. However, if you insist, I can live without that additional note. :) — Dsimic (talk | contribs) 10:32, 16 January 2015 (UTC)[reply]

Quantization not relevant here

Quantization is a data compression technique, whether or not it is hardware-assisted, and virtual memory compression is independent of the particular data compression algorithm used. Thus it is not relevant here. The section on quantization should be moved elsewhere... ScholarWarrior (talk) 20:46, 19 February 2023 (UTC)[reply]