html_to_csv
Differences
This shows you the differences between two versions of the page.
| html_to_csv [2023/06/23 16:58] – oso | html_to_csv [2024/10/17 21:42] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== The problem ====== | ||
| + | I need to import into my Mercadopago account in [[https:// | ||
| + | |||
| + | Firefly III has a utility, the [[https:// | ||
| + | |||
| + | Mercadopago has a reconciliation feature, | ||
| + | |||
| + | What if I save the HTML, look up the name of each ' | ||
| + | |||
| + | So I need to save each page of the activity as HTML (right click > save as > HTML only) | ||
| + | |||
| + | |||
| + | and now concatenate the HTML files into a single big file that I am going to process: | ||
| + | |||
| + | <code bash>cat *.html > bigHtmlFile.html</ | ||
| + | |||
| + | Here *.html is passed as a list of parameters to '' | ||
| + | |||
| + | I would like ChatGPT to do the boring part, so I give it the prompt... | ||
| + | |||
| ====== This is the prompt: ====== | ====== This is the prompt: ====== | ||
| Line 5: | Line 25: | ||
| An excerpt of that html could be the following: | An excerpt of that html could be the following: | ||
| - | < | + | < |
| - | {" | + | |
| - | </ | + | |
| The CSV that I want, needs to have the following columns: | The CSV that I want, needs to have the following columns: | ||
| Line 22: | Line 40: | ||
| So, for example, processing that excerpt I shared before, the desired output would be: | So, for example, processing that excerpt I shared before, the desired output would be: | ||
| Compra de 7 productos Mercado Libre; Mercado Libre; 0; 61046; 07/06/2023 | Compra de 7 productos Mercado Libre; Mercado Libre; 0; 61046; 07/06/2023 | ||
| - | |||
| - | ---- | ||
| - | |||
| - | On the other hand, I need to save each page of the activity as HTML (right click > save as > HTML only) | ||
| - | / | ||
| - | and concatenate the HTML files into a single big file that I am going to process with the script: | ||
| - | |||
| - | <code bash>cat *.html > bigHtmlFile.html</ | ||
| ====== This is the result ====== | ====== This is the result ====== | ||
| Line 61: | Line 71: | ||
| dates = pattern_date.findall(html_dump) | dates = pattern_date.findall(html_dump) | ||
| - | # Process the counterparts to remove "a " or "de " | + | # Process the counterparts to remove "a " or "de " |
| counterparts = [re.sub(r" | counterparts = [re.sub(r" | ||
| Line 67: | Line 77: | ||
| combined_columns = [title + " " + counterpart for title, counterpart in zip(titles, counterparts)] | combined_columns = [title + " " + counterpart for title, counterpart in zip(titles, counterparts)] | ||
| - | # Prepare the data for CSV writing | + | # Prepare the data for CSV writing |
| data = zip(combined_columns, | data = zip(combined_columns, | ||
| Line 91: | Line 101: | ||
| This code snippet extracts the required fields using regular expressions and performs the necessary conversions and calculations to determine the credit and debit values based on the ' | This code snippet extracts the required fields using regular expressions and performs the necessary conversions and calculations to determine the credit and debit values based on the ' | ||
| - | + | There was a tag '' | |
| - | If you want to ignore strings that begin with a curly brace {, you can modify the regular expressions in the Python code to exclude such strings. One way to achieve this is by using a negative lookahead assertion in the regular expressions. | + | If you want to ignore strings that begin with a curly brace {, you can modify the regular expressions in the Python code to exclude such strings. One way to achieve this is by using a negative lookahead assertion in the regular expressions. |
| - | the regular expressions regex_title, | + | |
| - | + | ||
| - | + | ||
| - | With this modification, | + | |
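
The negative lookahead mentioned in the last rows can be sketched as follows. This is only an illustration under assumptions: the `<span class="title">` markup and the sample strings are hypothetical stand-ins, since the real Mercadopago HTML and the original patterns are truncated in this diff. Only the name `regex_title` comes from the page itself.

```python
import re

# Hypothetical excerpt: the class name "title" and the JSON-like blob are
# assumptions for illustration, not the real Mercadopago markup.
html_dump = (
    '<span class="title">Compra de 7 productos</span>'
    '<span class="title">{"id": "not-a-real-title"}</span>'
    '<span class="title">Pago de servicios</span>'
)

# (?!\{) is a negative lookahead: the match is rejected when the text
# right after the opening tag starts with a curly brace.
regex_title = re.compile(r'<span class="title">(?!\{)([^<]+)</span>')

titles = regex_title.findall(html_dump)
print(titles)  # ['Compra de 7 productos', 'Pago de servicios']
```

The lookahead consumes no characters, so the capturing group still returns the whole title. The same `(?!\{)` prefix could presumably be placed in front of the capture group of the other patterns as well.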
html_to_csv.1687539522.txt.gz · Last modified: 2024/10/17 21:42 (external edit)
