Talk:Strlen

Untitled[edit]

There appear to be no html markup tags available that allow inclusion of literally any character. Repeated ' characters accomplish some of this, but this article is still technically incorrect.

jim mcnamara Aug 17 16:18 MDT

Point to ponder: Should each programming language function have its own article? Deryck C. 18:17, 27 January 2006 (UTC)[reply]

strlen( NULL )[edit]

This is invalid, correct? In standard C it will cause undefined behavior, instead of returning 0?

Yes. "4.1.6 Use of library functions. Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow. If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined."

Note that "undefined" may return 0 in your particular C implementation. Or it may return 42. Or it may terminate the program. The only thing "undefined" is prevented from doing is anything worse than program termination (such as physically damaging the computer) unless the manufacturer documents that behaviour. 150.101.30.44 (talk) 03:43, 31 July 2008 (UTC)[reply]

"Undefined behaviour" exists not so that you can draw conclusions, but so that you can't and shouldn't attempt to draw conclusions. It is not necessary for an implementation to define undefined behaviour. It would seem you're confusing the concept with "implementation defined behaviour", for which the definition is required to be documented by compliant implementations. The C99 standard draft (n1256.pdf) states my point perfectly at section 3.4.3 where it defines "undefined behavior" as "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements". Plebbeh (talk) 03:07, 4 July 2011 (UTC)[reply]

Skip "Multi-Byte approach"[edit]

Can we please delete the "Multi-Byte approach" section? Instead we can simply note that it is possible to implement strlen() by working with the architecture's natural word size. The explanation is confused. The interesting problems (i.e. taking care of unalignment) are left as TODO. It's not clear that this is faster than a naive char-by-char strlen. The article claims the code or algorithm is from HAKMEM, but doesn't give any specific entry, and I was unable to find it there. I've seen the general technique well implemented and correct, in (I think) OpenBSD's memcpy(). JöG (talk) 07:07, 4 January 2009 (UTC)[reply]

You're right. Converting a pointer to char to a pointer to short, int, long, etc may be undefined behaviour as those types may have alignment requirements that can't be guaranteed when working with strings. The only pointers that you can guarantee have suitable alignment are those returned by malloc and friends, or those implicitly converted from arrays. It should be noted that, whether copying word-by-word is faster or not, "This approach is not required to work on every C implementation^[1], and there are implementations in existence which will crash if the memory is not suitably aligned^[2]." Plebbeh (talk) 02:18, 4 July 2011 (UTC)[reply]

strlen vs O(1)[edit]

The current description about strlen() always being O(n) is wrong, unless one assumes that strings are all stored in bare C string format. Other string storage formats SHOULD be mentioned and the performance impact should be kept as well. I believe that my version which gets reverted have provided better vision in this topic and should be put back. —Preceding unsigned comment added by Delphij (talk • contribs) 21:49, 21 April 2010 (UTC)[reply]

Indeed, right you are. Perhaps an alternative might be "strlen() is academically implemented as an O(n) (constant time) operation, though there are no requirements or restrictions preventing the operation from being faster or slower.^[1]"

As it is, the statement that reads "No optimization can make strlen fast for significantly large inputs, so storing the length of strings is generally the recommended fix." contradicts itself; If no optimisation can make strlen faster then storing the length of strings won't make a difference either. It is entirely possible for an implementation to automatically store the length of strings (eg. simply storing the offset of a '\0' byte as it is assigned into an array), thus turning strlen into an O(1) operation, providing memory access is O(1) (which also isn't required by the C standard; Consider tape memory).

Plebbeh (talk) 00:27, 28 June 2011 (UTC)[reply]

Such an idea would make strlen(p+10) (where p is a char* pointer) not work, since the passed pointer is not pointing at the length. You would have to implement a memory model where p+10 produces an object from which the original p and the offset of 10 can be recovered in O(1) time. Certainly not impossible, but I suspect most implementations of such changes to standard C behavior also ditch nul-terminated strings. It may be that some C interpreters can do this, however.Spitzak (talk) 01:02, 28 June 2011 (UTC)[reply]

The point I was trying to make is that saying strlen is O(n) constant time is invalid, because it is possible for it to be incorrect. Consider an implementation that sleeps for SIZE_MAX - n before returning from strlen; There's another example of O(1) constant time. It would be silly, but it's perfectly compliant by the C standard. I think an implementation that is optimal enough to make the formerly suggested optimisations would also be smart enough to keep track of pointer arithmetic on strings. You may be right; They may ditch the '\0', but the implementation would still be required to evaluate "hello"[strlen("hello")] to '\0'.Plebbeh (talk) 05:38, 1 July 2011 (UTC)[reply]

Actually we should not mention the complexity of the algorithm at all, since it is not defined by the standard. As we do mention the complexity, we are talking about implementations and not the Standard already. Please point out any implementation or algorithm that can compute strlen faster than O(n), given that C string can be any array or subarray of characters ending with a NUL character. Saying that one exists without a reference is WP:OR. Alleged slower implementations should not be mentioned also, except if you can prove that one exists, because of WP:N.1exec1 (talk) 12:09, 7 July 2011 (UTC)[reply]

This discussion is getting really stupid. We are talking about the function with the six-letter name "STRLEN". It takes a C pointer to a char and finds the distance to the next NUL. There are NO instances where the system can do anything else, so unless you implement some memory design where the location of the next NUL after an arbitrary point is not an O(n) operation, it is O(n). Case closed.

Yes there are other ways to store strings. NONE of them have a function called "strlen" that takes a C pointer to a char. They are DIFFERENT functions and the whole point of the sentence "storing the length of the strings is the recommended fix" is to say that these DIFFERENT systems should be used. It is *not* an "optimization" because it is a DIFFERENT function that takes something other than a C pointer to char as input.Spitzak (talk) 19:11, 5 July 2011 (UTC)[reply]

strlcpy vs snprintf in Implementation section[edit]

My proposal is to use snprintf in this article, as opposed to strlcpy. strlcpy is non-standard, both in terms of POSIX and the C standard. As a result, it isn't required to exist, nor is it required to do the same thing as the WP article described.

snprintf is standardised, and can do everything strlcpy can do and more. For example:

char foo[4];

size_t bar = strlcpy(foo, "hello", sizeof foo);

can be rewritten as:

char foo[4];

size_t bar = snprintf(foo, sizeof foo, "%s", "hello");

Thus, I tend to encourage use of snprintf instead. Discuss, please? Plebbeh (talk) 01:12, 5 July 2011 (UTC)[reply]

You mean as an example where the O(1) is not a problem because the function does something else that is also O(n) and hides it? If so even sprintf is a good example (assuming the likely implementation of it doing %s by coyping the characters to the output at the same time it is looking for the NUL).Spitzak (talk) 19:06, 5 July 2011 (UTC)[reply]

If snprintf is C-compliant, then this shall describe it's return value: "Upon successful completion, the snprintf() function shall return the number of bytes that would be written to s had n been sufficiently large excluding the terminating null byte." If snprintf isn't C-compliant, then we're not talking about C. Plebbeh (talk) 02:09, 26 July 2011 (UTC)[reply]

There is no need to return a value for a function to demonstrate this. Therefore printf is as good as an example. To print a string, printf can do this to write a string, in effect "hiding" the O(n) strlen needed both to figure out how many characters to print and to figure out how to increase the return value, into the O(n) loop that is needed to copy characters to the output device:

 while (*string) { write(*string++); length++; }

The point being that because printf() is O(n), you can "hide" the fact that strlen is O(n) by doing as a side-effect of the required O(n) operation. I still feel that strlcpy is the best example because it is much simpler than printf, and much more likely to be optimized in this way. strlcpy *could* be done like this:

 strlcpy(char* dest, const char* src, size_t n) {
   size_t i = strlen(src);
   if (n) {
     size_t j = (i >= n) ? i : n-1;
     memcpy(dest, src, j);
     dest[j] = 0;
   }
   return i;
 }

However all implementations I have seen have been written to iterate over the string, thus merging the O(n) copy into the O(n) strlen so that the cost is mitigated. I don't think this is true of printf implementations, since the speed of I/O likely swamps any desire to add such efficiency to the code.Spitzak (talk) 02:44, 27 July 2011 (UTC)[reply]

Pages for each function and WP:NOTMANUAL[edit]

Hello. You might find this discussion relevant. Thanks!1exec1 (talk) 09:06, 4 October 2011 (UTC)[reply]

^ ^a ^b "The C99 standard draft + TC3" (PDF). Section 6.3.2.3p7. Retrieved 04 July 2011. {{cite web}}: Check date values in: |accessdate= (help)CS1 maint: location (link) Cite error: The named reference "n1256.pdf" was defined multiple times with different content (see the help page).
^ "Segmentation fault - Wikipedia". Bus error. Retrieved 04 July 2011. {{cite web}}: Check date values in: |accessdate= (help)

[n1256.pdf-1] "The C99 standard draft + TC3" (PDF). Section 6.3.2.3p7. Retrieved 04 July 2011. {{cite web}}: Check date values in: |accessdate= (help)CS1 maint: location (link) Cite error: The named reference "n1256.pdf" was defined multiple times with different content (see the help page).

[Wikipedia-2] "Segmentation fault - Wikipedia". Bus error. Retrieved 04 July 2011. {{cite web}}: Check date values in: |accessdate= (help)

[1]

[2]