Xml file UTF-8 BOM
Posted: Mon Apr 19, 2021 6:02 pm
by stecosta66
Hi All,
I'm having trouble with some electronic invoices in XML format that are encoded in UTF-8 with a BOM (Byte Order Mark).
I use FOpen() to open the XML file and FReadLine() to read it line by line.
With some of these files I get strange characters at the beginning of the first line, and I discovered that these files are encoded with a BOM:
"ï»¿<?xml version"
Is there any way to remove the BOM with VO?
Thanks
Posted: Mon Apr 19, 2021 6:22 pm
by Sherlock
https://www.w3.org/International/questi ... order-mark
What I do is: if that string [ï»¿] is found, reduce it to [].
I have XML that does not have it, but my editor/hex editor adds it.
My XML code reader could not detect the <?xml version in the file.
You could accept either as valid: "<?xml version" or "ï»¿<?xml version".
In a hex view of the file, the UTF-8 signature shows as the bytes EF BB BF.
Posted: Mon Apr 19, 2021 6:39 pm
by robert
Stefano,
What you probably should do is:
- skip the BOM when it exists
- when the file has a BOM, use the function Utf82Ansi() to translate the strings that you read from the file from UTF-8 to ANSI. (This function is in the Util module inside the System Library.)
Robert
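The two steps above could look roughly like this in VO. This is only a sketch: StripUtf8Bom is a made-up name, and it assumes that Utf82Ansi() takes a string and returns the converted string, which is worth checking in the System Library source before relying on it.

```
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) and,
// only when a BOM was found, converts the rest from UTF-8 to ANSI.
FUNCTION StripUtf8Bom(cText AS STRING) AS STRING PASCAL
    LOCAL cBom AS STRING
    cBom := Chr(239) + Chr(187) + Chr(191)   // 0xEF 0xBB 0xBF
    IF Left(cText, 3) == cBom
        cText := SubStr(cText, 4)            // skip the three BOM bytes
        cText := Utf82Ansi(cText)            // assumed signature: STRING in, STRING out
    ENDIF
    RETURN cText
```

Since only the very first bytes of a file can be a BOM but every line may contain UTF-8 characters, this is probably best applied to the whole file content read into one string, not to individual lines. A UTF-8 file without a BOM is left untouched here, so it would still need its own handling.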
Posted: Mon Apr 19, 2021 7:22 pm
by Chris
I would just use .Net methods for file access, since those are a lot more powerful and can automatically handle BOM markers, encodings etc:
Code:
USING System.IO
...
LOCAL oStream AS StreamReader
LOCAL cLine AS STRING
oStream := StreamReader{"c:testtestutf.txt", TRUE} // automatically detect encoding
DO WHILE oStream:Peek() != -1
    cLine := oStream:ReadLine()
    ? cLine
END DO
Or even simpler:
Code:
System.IO.File.ReadAllLines() // returns an array of strings
Edit: Oops, sorry, did not realize this is about VO!
Posted: Mon Apr 19, 2021 8:09 pm
by ic2
Hello Stefano,
Are you using VO or X#?
We read (and create) UBL files in VO and it has worked fine so far. We read the XML string using the function below, which could probably help you as well.
Dick
FUNCTION StringReadZeroNoAnsi(cPath AS STRING) AS STRING PASCAL
    //#s KB 24-1-2011
    //#s Alternative for MemoRead that is not SetAnsi dependent
    LOCAL cText AS STRING
    LOCAL ptrHandle AS PTR
    LOCAL dwFileSize AS DWORD
    LOCAL dwError AS DWORD
    cText := ""
    dwError := 0
    // Look the file up first so that FSize() can report its size
    IF FFirst(String2Psz(cPath), FC_NORMAL)
        dwFileSize := FSize()
        IF dwFileSize > 0
            // Pre-allocate a buffer large enough for the whole file
            cText := Buffer(dwFileSize)
            ptrHandle := FOpen2(cPath, FO_READ + FO_SHARED)
            dwError := FError()
            IF dwError == 0
                // Read the whole file into the buffer in one call
                IF FRead(ptrHandle, @cText, dwFileSize) <> dwFileSize
                    dwError := FError()
                ENDIF
                // Close the handle even when the read came up short
                FClose(ptrHandle)
            ENDIF
        ENDIF
    ELSE
        dwError := FError()
    ENDIF
    IF dwError <> 0
        // Error handling
    ENDIF
    RETURN cText
Posted: Tue Apr 20, 2021 4:06 am
by wriedmann
Ciao Stefano,
since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using them through a COM module in VO.
That also helps with removing the signature that may be present in the case of a p7m file.
If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
Wolfgang
Posted: Tue Apr 20, 2021 4:16 am
by stecosta66
Thanks all for the suggestions,
I'll try that.
With FReadLine() I'm also getting a string length of 256 bytes with no CRLF.
I opened the XML file in Notepad++ and the status bar shows Unix (LF) and UTF-8 BOM.
This file is generated by a web-based application for electronic invoicing, in this case Aruba fatturazione elettronica.
For other XML files it shows Windows (CRLF) and UTF-8. Those files give me no problems.
How can I work around this with VO?
Posted: Tue Apr 20, 2021 4:24 am
by stecosta66
wriedmann wrote:Ciao Stefano,
since the .NET XML functions are much more powerful, I have implemented the reading of the FPA/FPR invoices in X# and I'm using them through a COM module in VO.
That also helps with removing the signature that may be present in the case of a p7m file.
If you are interested, I can give you a part of the code (my complete code includes also the sending and receiving code to the web service of my provider).
Wolfgang
Hi Wolfgang,
thanks for the support.
I would be interested in trying what you have done.
Currently I'm unsigning the .p7m files with the openssl command, run through ShellExecute(), and it is working fine.
Are you using a scraping technique to send/receive files through the web service?
Thanks
Stefano
Posted: Tue Apr 20, 2021 4:25 am
by wriedmann
Ciao Stefano,
you need to read the entire file and then use MemoLine() to split the lines, possibly after normalizing the line endings with StrTran() by replacing all LF with CRLF.
But please be aware that received files may come in several different formats, maybe even the entire data without any line break. I have seen a lot of different things by now. Your read function should not depend on any newline.
Wolfgang
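A minimal sketch of the StrTran() normalization, in VO. NormalizeToCrLf is a made-up name. The first two replacements are an addition to what is described above: without them, a file that already uses CRLF would end up with CR CR LF after a plain LF to CRLF replacement.

```
// Hypothetical helper: brings all line endings to CRLF, whatever the input uses.
FUNCTION NormalizeToCrLf(cText AS STRING) AS STRING PASCAL
    LOCAL cCr AS STRING
    LOCAL cLf AS STRING
    cCr := Chr(13)
    cLf := Chr(10)
    cText := StrTran(cText, cCr + cLf, cLf)   // CRLF -> LF first, so CR is not doubled later
    cText := StrTran(cText, cCr, cLf)         // any remaining lone CR -> LF
    cText := StrTran(cText, cLf, cCr + cLf)   // every LF -> CRLF
    RETURN cText
```

This only makes line-based reading more predictable. For a file that arrives as one single line it changes nothing, so the XML parsing itself should still work on the whole string.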
Posted: Tue Apr 20, 2021 4:29 am
by wriedmann
Ciao Stefano,
(for others: in Italy all invoices need to be sent in a specific XML format through a system maintained by the Ministry of Finance):
to remove the signature I'm using a simple .NET call.
For sending and receiving the invoices I'm using an API that my provider offers. AFAIK Aruba also has some sort of API, and it is much, much simpler to do that in .NET than in plain VO.
Therefore I have all that functionality in an X# module that is used through COM in my VO applications.
Wolfgang