No way to parse integers in C (2022)

2022-10-27, Categories: programming, unix

There are a few ways to attempt to parse a string into a number in the C standard library. They are ALL broken.

Update at the bottom: Actually, C++’s std::from_chars() looks useful.

Leaving aside the wide-character versions, and staying with long (skipping int, long long and intmax_t, since these variants all have the same problem), there are three ways I can think of: atol(), strtol()/strtoul(), and sscanf(). They are all broken.

What is the correct behavior? I’ll start by claiming a common-sense “I know it when I see it” standard. The number that I see in the string with my eyeballs must be the numerical value stored in the appropriate data type. “123” must be turned into the number 123.

Another criterion is that the WHOLE number must be parsed. It is not OK to stop at the first sign of trouble and return whatever maybe is right. “123timmy” is not a number, nor is the empty string.

Failing to provide the above must be an error. Or at least, as the user of the parser, I must have the option to know if it happened.

Start with atol(). It returns 123 for “123timmy”, 0 for the empty string, and, on Linux, LONG_MAX for a number too large to fit in a long. No. All wrong. And there is no way for the caller to know anything happened.

For the LONG_MAX overflow case the manpage is unclear whether it’s supposed to do that or return as many nines as it can, but empirically on Linux this is what it does. POSIX and C both say “if the value cannot be represented, the behavior is undefined”.

What the hell? This makes atol() impossible to use on untrusted input. Now, in practice I don’t see compilers and libc ever triggering the scary parts of UB on bad input, but on paper atol() is allowed to wipe your hard drive if it gets bad input. Great. How am I supposed to know whether the value can be represented if there is no way to check for errors?

So if you pass a string to atol(), you’re basically getting a random value, with a bias towards being right most of the time.

I can kinda forgive atol(). It’s from a simpler time, a time when gets() seemed like a good idea. gets() famously cannot be used correctly. Neither can atol().

I’ll now contradict the title of this post.
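Before that, a small sketch of the atol() problem described above. It only uses inputs whose results are well defined (out-of-range input is deliberately left out, since that is undefined behavior for atol()); show_atol() and the input list are illustrative assumptions, not code from the manpage.

```c
#include <stdio.h>
#include <stdlib.h>

/* Inputs for which atol() has well-defined behavior. Out-of-range
 * strings are deliberately omitted: for atol() they are undefined
 * behavior per C and POSIX. */
static const char *const inputs[] = { "123", "0", "", "timmy", "123timmy", " 42 " };

/* Print what atol() returns for each input. There is no error
 * channel: "0", "" and "timmy" all come back as a plain 0. */
void show_atol(void)
{
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
        printf("atol(\"%s\") = %ld\n", inputs[i], atol(inputs[i]));
}
```

Called from main(), this prints 123, 0, 0, 0, 123 and 42: the caller cannot tell a real zero from an empty or non-numeric string, and trailing junk is silently dropped.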
strtol() can actually be used correctly. strtoul() cannot, but if you’re fine with signed types only, then this’ll actually work. But only carefully. The manpage has example code. In function form, the checks are: set errno to 0 before the call, then treat it as an error if no digits were consumed (endptr equals the input pointer), if anything is left after the number (*endptr is not '\0'), or if errno is ERANGE.

It’s a matter of API design whether it’s OK to clobber *out in the error case, but that’s a minor detail. Yay, signed numbers are parsable!

Now strtoul(). Unlike its sibling, this function cannot be used correctly. On amd64 Linux, for example, it turns “-1” into 18446744073709551615 and “-18446744073709551615” into 1, in both cases without reporting any error. Only “-18446744073709551616” is finally treated as out of range.

Phew, finally an error is reported. This is in no way useful. Or I should say: maybe there are use cases where this is useful, but it’s absolutely not a function that returns the number I asked for. The title in the Linux manpage is “convert a string to an unsigned long integer”. It does that. Technically it converts it into an unsigned long integer. Not the obviously correct one, but it indeed returns an unsigned long.

Interesting note: a non-empty input of just spaces is detectable as an error. It’s obviously the right thing to do, but it’s not clear that this is intentional. So check your implementation: if passed an input of all isspace() characters, is this correctly detected as an error? If not, then strtol() is probably broken too.

With sscanf(), a bit less code is needed, which is nice. But the results are of course nonsense (except the first one). Extra fun, that last one. You’d expect from the two before it that it would be 0, or at least an even number. But no. That last number is simply “out of range”, and that’s reported as ULONG_MAX. But you cannot know this. Getting ULONG_MAX as your value could mean that the input really was ULONG_MAX, that it was -1, or that it was out of range. There is no way to detect the difference between these. So sscanf() is out, too.

Garbage in, garbage out, right? Why does it matter that someone might give you -18446744073709551615 knowing you’ll parse it as 1? Maybe it’s a funny little trick, like ping 0.

First of all, it matters because it’s wrong. That is not, in fact, the number provided. Maybe you’re parsing a bunch of data from a file.
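The careful strtol() usage mentioned above, and the reason the same checks fail for strtoul(), can be sketched as follows. parse_long() and parse_ulong_naive() are hypothetical names, and the return convention (0 on success, -1 on error) is an assumption; this is not the manpage’s code. For data read from a file, the naive unsigned version would accept a “-1” without complaint.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse ALL of s as a base-10 long.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 on error and leaves *out untouched.
 * Like strtol() itself, it accepts leading whitespace and a sign. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                  /* strtol() does not clear errno on success */
    long v = strtol(s, &end, 10);
    if (end == s)               /* no digits: "", "timmy", "   " */
        return -1;
    if (*end != '\0')           /* trailing junk: "123timmy" */
        return -1;
    if (errno == ERANGE)        /* clamped to LONG_MIN or LONG_MAX */
        return -1;
    *out = v;
    return 0;
}

/* Hypothetical helper: the very same checks around strtoul().
 * They are not enough: a leading '-' is accepted and the value is
 * negated in unsigned arithmetic, so "-1" succeeds as ULONG_MAX. */
int parse_ulong_naive(const char *s, unsigned long *out)
{
    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    *out = v;
    return 0;
}
```

With these, parse_long("123timmy", &v), parse_long("", &v) and a too-large number all fail as they should, while parse_ulong_naive("-1", &u) succeeds and stores ULONG_MAX.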
You really should stop on errors, or at least skip bad data. But incorrect parsing here will make you proceed with processing as if the data is correct.

Maybe some ACL only allows you to provide negative numbers, and you use this trick to make a value parse as negative in some contexts (e.g. Python) but positive in others (strtoul()).

I even saw a comment saying “when you have requirements as specific as this”. As specific as “parse the number, correctly”? It should matter that programs do the right thing for any given input. It should matter that APIs can be used correctly. Knives should have handles. It’s fine if the knives are sharp, but no knife should be void of safe places to hold it. It should be possible to check for errors.

You cannot even assemble the pieces here into a working parser for unsigned long. Maybe you think you can filter out the incorrect cases and parse the rest. But no. You can detect negative numbers with strtol(), range checked and all, and discard all of these. But you can’t tell the difference between being off-scale low, between -2^64 and -2^63, and the perfectly valid upper half of unsigned long, 2^63 to 2^64-1.

It’s not a solution to go one integer size bigger, either: on my system long, long long and intmax_t are all the same size.

Do you need to be able to parse the upper half of unsigned long? If not, then parse with strtol() as above, reject negative values, and cast the result to unsigned long. If all you need is unsigned int, then maybe on your system sizeof(int) < sizeof(long), and this can work. Just check the value against UINT_MAX and cast to unsigned int in the last step.

Do you need the upper half? Sorry, you’re screwed. Write your own parser. These numbers are very high, yes, and maybe you’ll be fine without them. But one day you’ll be asked to parse a 64-bit flag field, and you can’t. 0xff02030405060708 cannot be unambiguously parsed by standard parsers, even though there’s ostensibly a perfectly cromulent strtoul() that handles hex numbers and unsigned longs. Code is much shorter, again, which is nice.

Why is everything broken?
I don’t think it’s too much to ask to turn a string into a number. In my day job I deal with complex systems with complex tradeoffs. There’s no tradeoff, and nothing complex, about parsing a number. In Python it’s just int("123"), and it does the obvious thing. But only signed.

Maybe Google is right in saying basically never use unsigned. I knew the reasons listed there, but I was not previously aware that the C and C++ standard library string-to-int parsers were also fundamentally broken for unsigned types. But even if you follow that advice, sometimes you need to parse a bit field in integer form. And you’re screwed.

Update: C++’s std::from_chars() rejects a leading minus sign when parsing into an unsigned type. That sounds right. No “minus allowed even for unsigned types” BS.

After reading this post, Alejandro came up with what seems obvious to me in retrospect: just check for those negative values! So now there is a C version that does exactly that. He also added it as a candidate answer to the stackexchange question.
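This is not Alejandro’s code, only a minimal sketch of the same idea under stated assumptions: parse_ulong() is a hypothetical helper, it takes the base as a parameter so it can also handle hex flag fields, and it assumes the default “C” locale, where isspace() matches the whitespace that strtoul() skips.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse ALL of s as an unsigned long in the given
 * base, rejecting negative input instead of letting it wrap around.
 * Returns 0 and stores the value in *out on success, -1 on error. */
int parse_ulong(const char *s, int base, unsigned long *out)
{
    const char *p = s;

    /* strtoul() skips leading whitespace itself; do the same here to
     * find the first character its sign handling would look at. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')              /* "-1" would otherwise become ULONG_MAX */
        return -1;

    char *end;
    errno = 0;
    unsigned long v = strtoul(p, &end, base);
    if (end == p)               /* no digits at all */
        return -1;
    if (*end != '\0')           /* trailing junk */
        return -1;
    if (errno == ERANGE)        /* larger than ULONG_MAX */
        return -1;
    *out = v;
    return 0;
}
```

On a system with a 64-bit unsigned long, parse_ulong("0xff02030405060708", 16, &u) then succeeds, while “-1” and “ -18446744073709551615” are rejected instead of silently wrapping.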
