Wikipedia:Reference desk/Archives/Computing/2011 June 28

Computing desk
< June 27 << May | June | Jul >> June 29 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


June 28

Computing clouds

It appears that all of the commercially available cloud computers like Amazon's and Microsoft's, etc. are devoted mostly to files and data movement rather than to processing. Is there such a thing as a processor cloud that offers enormous numbers of processors as opposed to file space and data manipulation? I'm referring to processors capable of performing distributed functions and/or subroutines which, for example, might count duplicates in a generated list of numbers and then return a sum of squares, such that the amount of stored data is too small to be an issue whereas the generated numbers might grow exponentially and take a long, long time to process. --DeeperQA (talk) 00:24, 28 June 2011 (UTC)

Probably not, as storage is cheap. Also, for most problems that need parallel computing, a lot of storage space is needed. The example you give doesn't need a lot of space, but it doesn't need a lot of time, either: counting duplicates and summing squares should be doable in around O(n log n) time (depending on implementation). Algorithm sketch: walk the list, putting each member into a search tree (a hash table would usually work fine, too), noticing collisions. You'd probably need to use bignums, but the overhead in this case should be logarithmic, which is cheap. It should be possible to process a list with a few billion elements in not much more time than it takes to read it off the disk in the first place. I suspect that most algorithms that parallelize well need to have large data sets, so you can break them up in the first place. (The DPLL algorithm, for example, takes exponential time, and doesn't use much space, but I don't think it's very fruitful to try to parallelize it.) Paul (Stansifer) 01:03, 28 June 2011 (UTC)
See distributed computing; for lots of existing projects, see Category:Distributed computing projects. Two famous examples are SETI@Home and Folding@Home. --Mr.98 (talk) 02:39, 28 June 2011 (UTC)
It sounds like you are saying that a cloud acts just like a single computer, only with a much larger capacity that can handle enormous values very fast. But the point of splitting an algorithm into pieces is to relieve a single computer of the need for such great capacity, at the cost of disassembling the algorithm in the first place and reassembling the results from each piece; if one computer can do it all, using more than one computer is unnecessary.
In fact the algorithm I'm working with is designed to only begin splitting tasks when the values exceed single computer memory and speed limitations which necessitates processing in pieces distributed among the servers.
If the cloud did begin to slow down, is there a way to assign pieces of the task to another cloud in the same manner as a Beowulf cluster client sending pieces to the servers? --DeeperQA (talk) 09:39, 28 June 2011 (UTC)
A collection of computers is not just one big computer. A cloud environment is not a way to get a larger address space or word size or (usually) more memory. Sometimes (in fact, most of the time), getting "the cloud" involved is not the fastest solution, since a single computer will have come up with the answer by the time that you've finished farming out the initial input data to the cloud that you're using...
...provided that you are using a fast algorithm! If the input is large, the difference between an algorithm with a good big-O and an algorithm with a bad big-O is the difference between getting the answer immediately, and dying of old age before you get the answer. Example: If you come up with an O(2^n) (i.e., exponential) algorithm to solve a problem, there is no supercomputer in existence that can get you the answer for an input of size 300 before the heat death of the universe. But if you change the algorithm to one that takes O(n^2) time (and sometimes this is a small change!), your netbook will spit out an answer for an input of size 300 before you can blink. Change the algorithm to take O(n log n) time, and your netbook will be able to process an input of size 6,000 in less time. Tweaking algorithms for big-O is far more powerful than just throwing more computer resources at the problem.
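To put rough numbers on that, here is a quick back-of-the-envelope check (a sketch in Python; the sizes are the ones from the paragraph above):

import math

n = 300
print 2 ** n                   # exponential: roughly 2 x 10^90 operations
print n ** 2                   # quadratic: 90,000 operations
print int(n * math.log(n, 2))  # n log n: about 2,468 operations
# Even at a billion operations per second, 2**300 operations would take
# on the order of 10^73 years, while 90,000 operations finish in well
# under a millisecond.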
If you don't understand big-O notation, just ask. It's absolutely essential knowledge for any computer scientist, and you can't write fast programs without thinking about it. Paul (Stansifer) 11:46, 28 June 2011 (UTC)
Yes, I ran into big-O when I was writing the Check Sort routine but have not gotten that deeply back into logic or mathematical programming since.
I posted the server side code here but withdrew the question. The algorithm is in fact based on the Check Sort in that it counts duplicates the same way, by using the values in the list as an array index whose content value is incremented each time an index value is read from the list. If it is the first time for a particular value then the value in the array at the indexed location becomes one. If it is the 11th time then the stored value becomes 11. Basically: n(x) = n(x) + 1, where n is the indexed array and x is the value for which duplicates are being counted.
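For illustration, here is a minimal Python sketch of that counting scheme (the input list and the variable names are made up, not DeeperQA's actual code):

values = [3, 7, 3, 1, 7, 7]         # hypothetical input list
counts = [0] * (max(values) + 1)    # the indexed array "n"; it must span
                                    # 0..max(values), however sparse the data
for x in values:
    counts[x] += 1                  # n(x) = n(x) + 1
# Sum of squares of the values that occur more than once: 3*3 + 7*7 = 58
print sum(x * x for x, c in enumerate(counts) if c > 1)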
Doing this in assembler or binary is what I would expect to be the next step to improve speed, rather than using a cloud, but still staying with a Beowulf when single-computer limits would require digging up your bones and reburying them with a note that the job had finally completed. About big-O, I'm all ears, go for it. --DeeperQA (talk) 12:48, 28 June 2011 (UTC)
I can't seem to find a good tutorial on the Web, so I'll try to give a thumbnail sketch of how to think about big-O. Big-O is a formal approach to the amount of time a program takes to execute (it can be used for other things; it was discovered by mathematicians first, but it's more important to computer scientists and programmers). It abstracts over the little things, like how fast a computer is, or how long an individual atomic operation takes. So, making your program twice as fast has no effect on big-O at all; a 2x speedup is too small to measure! Rather, big-O measures how quickly things get out of hand on large inputs. Where an "operation" is defined to be something that always takes the same amount of time (like adding two numbers, or indexing into an array), you need to pay attention to the way the input size affects the number of operations. For a slightly more practical, concrete introduction, see Wikiversity's page.
In this case (and it's a bad example for learning about big-O, because most of the time, algorithms just depend on input size), your algorithm is slow because it depends on something huge: the maximum number you allow in your input array. In particular, you need to zero-out a huge temporary array to check for duplicates. This is slow, and requires a vast amount of memory. Also, finding a large chunk of contiguous memory may be a problem for the allocator. Instead, you want to store the counts in some kind of associative data structure, like a search tree or a hash table. This way, it doesn't matter what the keys are at all; performance depends only on the size of the input, which will usually be much smaller.
Don't bother with assembly at all (conventional wisdom holds that even an expert rarely writes better assembly than a compiler), and you shouldn't bother trying to muck around with distributed computing ("the cloud", Beowulf clusters, etc.) until you have a good intuition for how to make things run fast on a single processor on a single machine. Paul (Stansifer) 04:37, 29 June 2011 (UTC)

Big-O efficiency works on the concept of trying to find a way to send more data by parsing a single cycle of light. While big-O is doing this research, it is already known that, by converting the data to parallel, all of it can be sent in one cycle using enough parallel channels. It would be a question of whether the glass was half empty or half full, except that sending or processing a virtually unlimited amount of data can still be accomplished in a single cycle and need not wait for big-O to find a way to do so in half a cycle.

A binary search requires sorted data, for example. The Check Sort accomplishes this by using the data values as an address, which it marks (the addressed location that is represented by the data contains a mark) and then printing only marked addresses in sequence. Doing this with hardware is so fast the hardware version is named the “instant sort”. --DeeperQA (talk) 09:48, 29 June 2011 (UTC)
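In software terms that marking scheme is essentially what is usually called a counting sort. A rough Python sketch, with illustrative names (the value range must be known in advance):

def check_sort(values, max_value):
    marked = [0] * (max_value + 1)   # one "address" per possible value
    for v in values:
        marked[v] += 1               # mark (and count) the addressed location
    result = []
    for address in range(max_value + 1):
        result.extend([address] * marked[address])  # emit marked addresses in sequence
    return result

print check_sort([5, 2, 9, 2], 9)    # prints [2, 2, 5, 9]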

I'm not sure what you're saying here. Are you trying to claim that big-O notation is somehow irrelevant? It's not an implementation tool; it's the fundamental way to measure the complexity of an algorithm. Algorithmic complexity can't be obsoleted; you have to understand it if you want to write programs that operate on any large quantity of data efficiently. Nothing about having more than one processor operating on data at a time changes this.
In any event, "instant sort" isn't fast. For normal inputs, a ten-line Python program using a hash table should be much faster. Paul (Stansifer) 20:54, 29 June 2011 (UTC)
... here's a Python program that prints the sum of squares of numbers that appear more than once in the input file. It follows the algorithm that I sketched in small text above:
import sys

# Map each number seen on stdin to how many times it has occurred.
occ = dict()
accum = 0
for l in sys.stdin.readlines():
    n = int(l)
    occ[n] = occ.get(n, 0) + 1
    # Add the square exactly once, the second time a number appears.
    if occ[n] == 2: accum += n*n

print accum
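(A usage sketch: assuming the program is saved under a hypothetical name such as sumsq.py, it would be run as python sumsq.py < numbers.txt, with one integer per line in the input file.)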
It takes half a second to process a 50,000 line input file on my netbook; you can see that there's no reason to bother parallelizing it. There's no upper limit to the size of the numbers it can handle, either. There isn't any magic going on here, all you need to do is use the right data structure for the problem. Paul (Stansifer) 03:08, 30 June 2011 (UTC)
Absolutely not. What I am saying is that if you have a single perfectly tuned 24-inch-wide lawn mower with perfectly balanced, razor-sharp ceramic blades running at 10,000 RPM, it cannot compete with 50 lawn mowers which sputter and run around 1,800 RPM with rusted, bent and unbalanced 18-inch metal blades only sharp enough to cut grass.
The only way the perfectly tuned and honed mower mentioned above can improve its speed and capacity is perhaps to use a wider blade or to duplicate itself, but big-O stops at the notion of a wider blade or duplication.
BTW, have you ever built an instant sort circuit and tested it? --DeeperQA (talk) 10:01, 30 June 2011 (UTC)
I can't fully understand what you're trying to say. You seem to think that parallelism makes everything faster. Sure, some problems are like lawnmowing; those are called the embarrassingly parallel problems, and you can just cut up the input data and hand it off to all the nodes. For everything else, the conventional wisdom in computer science is that mucking around with parallelism is difficult, and should only be attempted when it's really needed. (Which is not to say that it's even possible in all cases. I don't see a good way to parallelize this problem to make it fast, unless the keys are kept extremely small.)
All I can do is ask: what's wrong with the program that I wrote? It's fast to implement, it runs fast (even on my dinky machine), and it's nice and general (Python uses bignums natively, so the count will never overflow, which I notice is not true of your code). Even if the problem were embarrassingly parallel, it wouldn't be worth the effort of parallelizing to speed it up. Simplicity is always the programmer's goal. Paul (Stansifer) 11:51, 30 June 2011 (UTC)
You are talking about the benefits of using a particular programming language to overcome number-size limitations. The same benefit of using a particular natural language may be greater when discussing a particular topic. But eliminating number-size limits does not eliminate the amount of time it takes for an index to increment from zero to the highest value in the list while comparing values and counting duplicates at each incrementation.
The Check Sort, and its hardware implementation called the Instant Sort, is the result of an effort to simplify the Shell-Metzner Sort. Write them in any language and (if properly written) they will be faster in that language than any other sort, except perhaps depending on how much data is being sorted (which is taken care of by using many lawn mowers).
The binary search routine requires sorted data, but it can also be performed with distributed software, and performed directly with hardware that can be duplicated, so the hardware version can also run in parallel.
What I am saying is that there is nothing wrong with simplification, but once you achieve ultimate simplification it is time to move on, and that sometimes means going parallel.
Think of it this way… water may seep through a crack in the Earth until it reaches a layer of clay. From there it will simply spread out until it reaches the boundaries of the layer of clay. When you reach the layer of clay, start spreading out. --DeeperQA (talk) 12:19, 30 June 2011 (UTC)
(You can do bignums in any language. And for this task, they're likely to be necessary. They make it slower, but they ensure correctness.)
I don't think that I can explain why one algorithm is faster than another without using big-O notation. But you don't seem to be interested in the possibility that a non-parallel algorithm could be faster than a parallel one in this case, so I guess it doesn't really matter. Paul (Stansifer) 17:04, 30 June 2011 (UTC)
No, not true. I was planning on a client-side multiset duplicate counter in C++ or assembler and, depending on the speed and capacity improvement, using it server-side on my Beowulf, but it is true that I'm looking for an online service that will provide multiple processors in the form of internally addressable servers so I can do this as-is to the max online. I am open as to how to do bignums in VB v6 SP6 under Windows XP. --DeeperQA (talk) 18:10, 30 June 2011 (UTC)

Ubuntu keyboard restart

Is there a way to restart my keyboard functionality, without restarting Ubuntu? — Preceding unsigned comment added by 88.9.106.0 (talk) 12:57, 28 June 2011 (UTC)

Not quite sure what you mean. Has your OS stopped responding to your keyboard inputs? --Aspro (talk) 13:15, 28 June 2011 (UTC)
Yes, everything is working except the keyboard. 88.9.106.0 (talk)
Which version of Ubuntu are you using? --Aspro (talk) 13:32, 28 June 2011 (UTC)
8.04, and the problem happens every now and then. :( — Preceding unsigned comment added by 88.9.106.0 (talk) 13:43, 28 June 2011 (UTC)
Try ctrl+alt+f3 then type some stuff, then ctrl+alt+f7. That might fix it. -- Finlay McWalterTalk 13:33, 28 June 2011 (UTC)
How can I type that, if the keyboard is not working? — Preceding unsigned comment added by 88.9.106.0 (talk) 13:35, 28 June 2011 (UTC)
Just do it. -- Finlay McWalterTalk 13:51, 28 June 2011 (UTC)
Finlay McWalter's suggestion is based on the (pretty good) hunch that something is broken with your X server and/or your desktop-manager. A common problem might be a bug in your X server's HID drivers. By pressing those key sequences, you'll swap out and back in (without restarting X), and hopefully "kick" the server back into normal operation. This doesn't really "fix," as much as "work around," the bug. Let us know if that actually worked; if not, we can try to diagnose other possible causes. Hardware or driver failure in your USB system is the next most probable culprit after your X server, in my opinion. Nimur (talk) 15:53, 28 June 2011 (UTC)
Time to update to the current LTS version. Although you can't see the keyboard doing anything, it doesn't mean the keyboard can't open a terminal. Hence the suggestion Ctrl + Alt + f3. Also, it helps to check that you still have sufficient free disc space. Empty trash etc. I think a bad bit of RAM might also cause intermittent faults like this. --Aspro (talk) 14:06, 28 June 2011 (UTC)
OK, I'll try next time the keyboard gets blocked. And update to the new version of Ubuntu (although I suppose it's a hardware problem). 88.9.106.0 (talk) 14:20, 28 June 2011 (UTC)
Also, note that if you're completely stuck, you can hold down Alt and Sys Rq (the same key as Print Screen), and then hit S, U, and then B (while still holding Alt-Sys Rq down). This will tell the computer to reboot in a slightly gentler fashion than simply hitting the power button does. (see Magic SysRq key, which also includes a slightly longer and even better sequence of commands) Paul (Stansifer) 16:18, 28 June 2011 (UTC)
Worth trying Alt+SysRq+k first. ¦ Reisio (talk) 17:48, 28 June 2011 (UTC)
In case it's relevant, I have a similar problem where the keyboard seems to get disconnected sometimes. I find by switching to a different desktop workspace and back again, the keyboard will magically reconnect. Astronaut (talk) 11:09, 29 June 2011 (UTC)

Full access to an iPod's file system

Is there any way that I can gain full file-system access to a regular iPod, be it via software on the iPod itself or from my desktop? --Melab±1 14:50, 28 June 2011 (UTC)

Sure. The music is in a hidden folder. On a Windows machine it's a matter of going to "Show hidden folders"; on a Mac it's a little trickier but it can be done. (You can easily access the hidden folders through the Terminal, for example. Getting the Mac OS to show hidden folders involves some Terminal trickery but it can be done.) --Mr.98 (talk) 15:00, 28 June 2011 (UTC)
You have full access to the file system. However, you might not be familiar with the type of file-system on your iPod: see How to determine your iPod's disk format from Apple's support page. If your iPod is formatted with HFS+, you may need to use the extended file attributes commands and utilities. You can read about these tools, in the public developer documentation, at listxattr on the Mac OS X Developer library. The xattr utility program is very helpful. Bear in mind that even though the iPod can register itself as a USB mass-storage device class when connected over USB, your iPod is a full-featured small computer, not a hard-disk-on-a-stick; so if you're trying to access its file-system, you should be aware that its operating system is on and in control of the storage-medium at all times, even when it is mounted as a USB "disk." Nimur (talk) 15:41, 28 June 2011 (UTC)
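As a sketch of how those attributes might be inspected programmatically (the path below is hypothetical, and the snippet simply shells out to the xattr utility mentioned above):

import subprocess

# List the extended attributes of one file on the mounted iPod.
path = "/Volumes/IPOD/iPod_Control/iTunes/iTunesDB"  # hypothetical mount point and file
print subprocess.check_output(["xattr", "-l", path])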
If I had full filesystem access I would think that I could see the kernel, plists, etc. --Melab±1 20:32, 28 June 2011 (UTC)
You want access to the iPod firmware? That is kept in a read-only area. You can hack it (Google "hack iPod firmware" — this howstuffworks page gives a nice overview) but it's not a straightforward thing, because the iPod is not a straightforward computer. --Mr.98 (talk) 21:07, 28 June 2011 (UTC)
If it is updated, then it is not read-only. --Melab±1 22:49, 28 June 2011 (UTC)
I meant read-only in the sense that it is stored in ROM. It's a technical term. It doesn't mean it cannot be written to, just that it is harder to write to. The point is that it's not stored in a straightforward place (not on the hard drive, for example), and not a simple matter of getting "full access" or anything like that. You have to flash the firmware with something else (e.g. a Linux replacement) if you want to modify it, and it means that the firmware probably cannot modify itself. --Mr.98 (talk) 13:24, 29 June 2011 (UTC)

'Run As Admin' Option Not Appearing

I've been grinning and bearing this for a couple of weeks, but now it's just beginning to annoy me. I am finding that with some programs, the 'Run As Admin' option is not even present in the context menu; it used to be there for practically every executable, as far as I can remember. Sometimes, if I make a shortcut for one of these programs where the option has disappeared, the option actually is present in the shortcut, but not in the original. Anyway, this is not normal. I should have the 'Run As Admin' option present for all executables (apart from one or two exceptions, which are beyond the scope of this question). These same executables where this option is not present also lack the 'Properties' option, meaning I can't do anything from that dialogue box. Can anyone guess what has happened and how I can fix this? - EDIT - My UAC is turned on. Win Vista, Home Premium, 32 Bit. --KägeTorä - (影虎) (TALK) 15:43, 28 June 2011 (UTC)

This may be too elementary to even bring up, but personally I've had trouble with context items that I've fixed by first left-clicking to select the executable, and then right-clicking to display the context menu. I think of this as letting Explorer "catch up" with me. Comet Tuttle (talk) 23:41, 28 June 2011 (UTC)
No, it happens in the start menu too. You can't just highlight something in the start menu by clicking on it, as that makes the program run. Thanks anyway, but I think it's more complicated than this. As a bit of extra info, when I have been lucky enough to get a 'Properties' option in the context menu, I sometimes now find the 'Run As Admin' option and checkbox in that dialogue box greyed out. --KägeTorä - (影虎) (TALK) 12:32, 29 June 2011 (UTC)

Like a doggy with a stick: before you throw it, you have to wave it in front of their face so they even know you have a fucking stick. --188.28.242.234 (talk) 00:40, 29 June 2011 (UTC)

what format are rackspace cloud images in?

I took a server image (maybe "snapshot"); it's just listed as (name I gave it)cloudserver(numbers).tar.gz.0 at about 2 GB, plus another 171-byte file called (name I gave it)cloudserver(same numbers as above).yml. What format would these two files be in? Could I download them and run them in a VM on my desktop, and if so, which one? Thanks. --188.28.242.234 (talk) 21:00, 28 June 2011 (UTC)

answering own question... apparently http://communities.vmware.com/thread/312288?tstart=0 says: " Rackspace solution seems to be based on XEN servers. So you should be able to extract VM's files from the tar archive and use VMware Converter to import it to your ESXi." --188.28.242.234 (talk) 21:26, 28 June 2011 (UTC)

How do you get a PDF to work on Word 2010?

Please help, every time I try to open a PDF with Word 2010 it says it is not a valid Win32 application. — Preceding unsigned comment added by 98.71.62.95 (talk) 21:36, 28 June 2011 (UTC)

MS Office applications like Word can't open .pdf files, even though they are able to create them. You should use a .pdf reader, like Adobe Acrobat Reader or Foxit or suchlike. --KägeTorä - (影虎) (TALK) 22:42, 28 June 2011 (UTC)
See also List of PDF software, which isn't a mere list but instead says which is which. Tama1988 (talk) 08:07, 29 June 2011 (UTC)

PHP MyAdmin help

When I try to export a database, I only get part of it, no matter whether it's zipped/gzipped or a single file; it's always partial, and always exactly 25%. It seems the export stops before the procedure has really ended, and it always stops at the same point. It's been like that for more than a week, so annoying!.. Thanks, Beni. — Preceding unsigned comment added by 79.179.8.59 (talk) 23:09, 28 June 2011 (UTC)

Is PHP timing out? In your php.ini file, there is a setting for max execution time. It is usually around 30 seconds. A complete MySQL dump of a large database can take much longer (mine takes about 30 minutes to dump). You can set the maximum time to 0 to disable it and see if that helps. -- kainaw 12:38, 29 June 2011 (UTC)
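For example, the relevant php.ini line might look like this (a sketch; as noted above, 0 disables the limit):

; php.ini
max_execution_time = 0    ; no time limit for PHP scripts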
Kainaw probably has the correct cause. If you do not have the ability to alter the timeout (or simply don't want to), you can split your export into several pieces, either by exporting each table individually or, if it is one large table, by exporting the first 25% of the rows in one dump, the second 25% in a second, etc., as sketched below. 72.159.91.131 (talk) 18:51, 29 June 2011 (UTC)
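A sketch of that row-chunking approach in SQL (the table, column, and file names are hypothetical; note that INTO OUTFILE writes files on the database server):

-- First quarter of a table of about a million rows.
SELECT * INTO OUTFILE '/tmp/mytable_part1.csv'
FROM mytable ORDER BY id LIMIT 0, 250000;

-- Second quarter; repeat with increasing offsets for the rest.
SELECT * INTO OUTFILE '/tmp/mytable_part2.csv'
FROM mytable ORDER BY id LIMIT 250000, 250000;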