sacc

sacc (saccomys): simple gopher client.

commit edab539b23594219bbfc83729822da917a18a243
parent c416c8c73d0a33eb8c428b1a9b9eaaffc098ee5b
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Tue,  5 Jan 2021 21:21:03 +0100

mbsprint: improve output when the string contains invalid UTF-8 data

Reset the decode state when mbtowc returns -1. The OpenBSD mbtowc(3)
man page says: "If a call to mbtowc() resulted in an undefined internal
state, mbtowc() must be called with s set to NULL to reset the internal
state before it can safely be used again."

Print the Unicode replacement character (U+FFFD) for an invalid
codepoint or incomplete sequence and continue printing the line
instead of stopping.
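Both changes can be tried outside sacc with a small string-to-string helper. This is a minimal sketch, assuming a UTF-8 locale has already been selected (for example with setlocale(LC_CTYPE, "")); the name replace_invalid() and its output-buffer convention are hypothetical and not part of sacc. It follows the patched loop: when mbtowc() returns -1 it resets the conversion state, emits U+FFFD, skips one byte and keeps going.

```c
#include <locale.h> /* setlocale(): callers must select a UTF-8 locale first */
#include <stdlib.h> /* mbtowc() */
#include <string.h> /* memcpy(), strlen() */
#include <wchar.h>  /* wchar_t */

/*
 * Hypothetical helper (not in sacc): copy s into out, replacing every byte
 * that starts an invalid or incomplete multibyte sequence with U+FFFD
 * (UTF-8: EF BF BD), then continuing with the next byte.
 * out must have room for 3 * strlen(s) + 1 bytes. Returns the output length.
 */
size_t
replace_invalid(const char *s, char *out)
{
	size_t slen = strlen(s), i, o = 0;
	wchar_t wc;
	int rl;

	mbtowc(NULL, NULL, 0); /* start from the initial conversion state */
	for (i = 0; i < slen; i += rl) {
		/* never read past the string; 4 bytes is the longest UTF-8 sequence */
		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
		if (rl == -1) {
			mbtowc(NULL, NULL, 0); /* state is undefined after -1: reset it */
			memcpy(out + o, "\xef\xbf\xbd", 3); /* replacement character */
			o += 3;
			rl = 1; /* skip one byte and resynchronize */
			continue;
		}
		/* rl > 0 here: s[i] is never NUL because i < slen */
		memcpy(out + o, s + i, (size_t)rl);
		o += (size_t)rl;
	}
	out[o] = '\0';
	return o;
}
```

For example, with a UTF-8 locale active, replace_invalid("a\xff" "b", buf) stores "a\xef\xbf\xbd" "b" in buf and returns 5. The literal is split in two because "\xffb" would be parsed as a single hex escape.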

Remove the check for a 0 return value: mbtowc() only returns 0 for a
NUL byte, which cannot occur because the loop already stops at the
string length.

Diffstat:
M sacc.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sacc.c b/sacc.c
@@ -110,12 +110,18 @@ mbsprint(const char *s, size_t len)
 
 	slen = strlen(s);
 	for (i = 0; i < slen; i += rl) {
-		if ((rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4)) <= 0)
-			break;
+		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
+		if (rl == -1) {
+			mbtowc(NULL, NULL, 0); /* reset state */
+			fputs("\xef\xbf\xbd", stdout); /* replacement character */
+			col++;
+			rl = 1;
+			continue;
+		}
 		if ((w = wcwidth(wc)) == -1)
 			continue;
 		if (col + w > len || (col + w == len && s[i + rl])) {
-			fputs("\xe2\x80\xa6", stdout);
+			fputs("\xe2\x80\xa6", stdout); /* ellipsis */
 			col++;
 			break;
 		}
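The hunk can be exercised in isolation with a version of the patched loop that writes to a caller-supplied FILE * instead of stdout. This is a sketch: mbsfprint() is a hypothetical name, and the code outside the hunk (the declarations, and writing each printable character with fwrite() before advancing col by its width) is assumed rather than taken from the diff. It needs a UTF-8 locale, and _XOPEN_SOURCE so that wcwidth() is declared.

```c
#define _XOPEN_SOURCE 700 /* wcwidth() is an X/Open extension */
#include <locale.h> /* setlocale(): callers must select a UTF-8 locale first */
#include <stdio.h>
#include <stdlib.h> /* mbtowc() */
#include <string.h> /* strlen() */
#include <wchar.h>  /* wcwidth() */

/*
 * Hypothetical variant of the patched mbsprint() loop that prints at most
 * len columns of s to fp. Lines outside the diff hunk are assumptions.
 */
void
mbsfprint(FILE *fp, const char *s, size_t len)
{
	wchar_t wc;
	size_t col = 0, i, slen;
	int rl, w;

	slen = strlen(s);
	for (i = 0; i < slen; i += rl) {
		rl = mbtowc(&wc, s + i, slen - i < 4 ? slen - i : 4);
		if (rl == -1) {
			mbtowc(NULL, NULL, 0); /* reset state */
			fputs("\xef\xbf\xbd", fp); /* replacement character */
			col++;
			rl = 1;
			continue;
		}
		if ((w = wcwidth(wc)) == -1)
			continue; /* skip non-printable characters */
		/* no room left, or exactly full with more input to come: truncate */
		if (col + w > len || (col + w == len && s[i + rl])) {
			fputs("\xe2\x80\xa6", fp); /* ellipsis */
			col++;
			break;
		}
		fwrite(s + i, 1, (size_t)rl, fp); /* assumption: print as-is */
		col += w;
	}
}
```

With a UTF-8 locale, mbsfprint(stdout, "abcdef", 4) prints "abc" followed by U+2026: at "d" the column would reach the limit while input remains, so the ellipsis takes the last column. As in the patch, the replacement branch increments col without comparing it against len, so a long run of invalid bytes can still extend past the width.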