erlang Elixir - How to improve code and style

q3qa4bjr  posted on 2022-12-08 in Erlang

The goal was a script that reads a file line by line, where each line contains a file path (Windows or Linux). It strips the path, leaving only the file name with its extension. It then replaces any special characters in the file name with an underscore ("_") and finally collapses consecutive underscores into one, so st__a___ck becomes st_a_ck. I got it working, but I believe there may be a better, nicer-looking way of doing this. I'm a complete beginner and still learning to think the Elixir/functional way. I'd like to see different ways of doing this, and ways to improve the code and make it more idiomatic Elixir.
The test sample:

    c:\program files\mydir\mydir2\my&@Doc.doc
    c:\program files\mydir\mydir2\myD$oc2.doc\
    c:\\program files\\mydir\\mydir2\\myD;'oc2.doc
    c:\\program files\\mydir\mydir2\\my[Doc2.doc\\
    /home/python/projects/files.py
    /home/python/projects/files.py/
    //home//python//projects//files.py
    //home//python//projects//files.py//
    c:\program files\mydir\mydir2\my!D#oc.doc
    c:\program files\mydir\mydir2\myDoc2.doc\
    c:\\program files\\mydir\\mydir2\\my';Doc2.doc
    c:\\program files\\mydir\mydir2\\myD&$%oc2.doc\\
    /home/python/projects/f_)*iles.py
    /home/python/projects/files.py/
    //home//python//projects//fi=-les.py
    //home//python//projects//fil !%es.py//
    /home/python/projects/f_)* iles.py
    /home/python/projects/fi les.py/
    //home//python//projects//fii___kiii=- les.py
    //home//python//projects//ff###f!%#illfffl! %es.py//

The code:

    defmodule Paths do
      def read_file(filename) do
        File.stream!(filename)
        |> Enum.map( &(String.replace(&1,"\\","/")) )
        |> Enum.map( &(String.trim(&1,"\n")) )
        |> Enum.map( &(String.trim(&1,"/")) )
        |> Enum.map( &(String.split(&1,"/")) )
        |> Enum.map( &(List.last(&1)) )
        |> Enum.map( &(String.split(&1,".")) )
        |> Enum.map( &(remove_special)/1 )
        |> Enum.map( &(print_name_and_suffix)/1 )
      end

      defp print_name_and_suffix(str) do
        [h|t] = str
        IO.puts "Name: #{h}\t suffix: #{t}\t: #{h}.#{t}"
      end

      defp remove_special(str) do
        [h|t] = str
        h = String.replace(h, ~r/[\W]/, "_")
        h = String.replace(h, ~r/_+/, "_")
        [h]++t
      end
    end

    Paths.read_file("test.txt")

Any insights are much appreciated.
EDIT: I refactored the code a little. Which version is closer to idiomatic Elixir style?

    defmodule Paths do
      def read_file(filename) do
        File.stream!(filename)
        |> Enum.map( &(format_path)/1 )
        |> Enum.map( &(remove_special)/1 )
        |> Enum.map( &(print_name_and_suffix)/1 )
      end

      defp format_path(path) do
        path
        |> String.replace("\\","/")
        |> String.trim("\n")
        |> String.trim("/")
        |> String.trim("\\")
      end

      defp print_name_and_suffix(str) do
        [h|t] = str
        IO.puts "Name: #{h}\t suffix: #{t}\t: #{h}#{t}"
      end

      defp remove_special(str) do
        ext = Path.extname(str)
        filename = Path.basename(str)
          |> String.trim(ext)
          |> String.replace(~r/[\W]/, "_")
          |> String.replace( ~r/_+/, "_")
        [filename]++ext
      end
    end

    Paths.read_file("test.txt")
Answer #1 (balp4ylt)

I would first point out some general problems with the code.

  • File.stream!/3 produces a Stream explicitly designed to be processed lazily (so we don't keep the whole content of the file in memory). Passing it to Enum.map/2 makes zero sense, because Enum.map/2 is eager and builds the complete list at every step. Use Stream.map/2 to keep processing the file lazily, or use Flow.map/2 (from the separate Flow library) to parallelize the mapping operations across all available cores (you keep the laziness too!).
  • Formatting matters. We use 2 spaces for the indent. Use the Elixir formatter (the mix format task) to format your code.
  • Destructure directly in the function head wherever possible: instead of defp print_name_and_suffix(str), do: [h|t] = str ..., write defp print_name_and_suffix([h|t]) directly.
  • Minimize the number of replacement calls on strings, since each call makes a separate pass over the string to substitute characters.
  • Use different function clauses with pattern matching to simplify life.
  • Try to use binary pattern matching and recursion wherever applicable.
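
A minimal sketch of several of the points above: lazy Stream processing, destructuring in the function head, separate function clauses, and a single regex pass. The module name StyleSketch and its function names are hypothetical, chosen for illustration only, and the sketch assumes each input line is already a bare file name.

```elixir
defmodule StyleSketch do
  # Hypothetical module, for illustration only.

  # One pass: each run of underscores and/or non-word characters
  # collapses into a single "_", so no second replace call is needed.
  def sanitize(name), do: String.replace(name, ~r/[_\W]+/, "_")

  # Separate clauses, destructured in the function head:
  # a name without an extension, and a name with one or more parts after it.
  def label([name]), do: name
  def label([name | ext]), do: "#{name}.#{Enum.join(ext, ".")}"

  # Stream.map/2 only describes the work; nothing runs until the
  # stream is consumed, here by Enum.to_list/1.
  def process(lines) do
    lines
    |> Stream.map(&String.trim/1)
    |> Stream.map(&String.split(&1, "."))
    |> Stream.map(fn [name | ext] -> [sanitize(name) | ext] end)
    |> Stream.map(&label/1)
    |> Enum.to_list()
  end
end
```

With these definitions, StyleSketch.sanitize("st__a___ck") returns "st_a_ck". Passing a File.stream!/1 result to process/1 instead of a list would keep the file read lazy as well.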

That said, the most [opinionated] Elixirish approach would be:

    defmodule Paths do
      def read_file(filename) do
        filename
        |> File.stream!()
        # Uncomment the next line and replace all Stream calls with Flow
        # to embrace multi-core parallelism
        # |> Flow.from_enumerable()
        |> Stream.map(&right_trim/1)
        |> Stream.map(&strip_path/1)
        |> Stream.map(&split_and_cleanup/1)
        |> Stream.map(&name_and_suffix/1)
        |> Enum.to_list()
      end

      # Drop trailing non-word characters (the newline and any slashes)
      defp right_trim(str), do: Regex.replace(~r/\W+\z/, str, "")

      # Walk the binary and reset the accumulator at every separator, so
      # only the part after the last slash or backslash remains
      defp strip_path(input, acc \\ "")
      defp strip_path("", acc), do: acc
      defp strip_path(<<"\\", rest::binary>>, _acc), do: strip_path(rest, "")
      defp strip_path(<<"/", rest::binary>>, _acc), do: strip_path(rest, "")

      defp strip_path(<<chr::binary-size(1), rest::binary>>, acc),
        do: strip_path(rest, acc <> chr)

      # Replace each run of underscores and non-word characters with one "_"
      defp split_and_cleanup(str) do
        str
        |> String.split(".")
        |> Enum.map(&String.replace(&1, ~r/[_\W]+/, "_"))
      end

      # Matches only names with exactly one dot, as in the test sample
      defp name_and_suffix([file, ext]) do
        IO.puts("Name: #{file}\t suffix: .#{ext}\t: #{file}.#{ext}")
      end
    end

    Paths.read_file("/tmp/test.txt")

Please pay the most attention to the strip_path/2 function: it recursively parses the input string and returns the part after the last slash, forward or backward. I could have used String.split/2 or another function from the String module, but I deliberately implemented it in the most functional way.
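
For comparison, the same "part after the last separator" result could be obtained with String.split/2, which accepts a list of patterns. This is a minimal sketch, assuming trailing separators were already trimmed as right_trim/1 does in the code above; the module name PathTail and the function last_segment/1 are hypothetical.

```elixir
defmodule PathTail do
  # Hypothetical helper, not part of the answer's code.
  # Splits on either separator and keeps the final segment.
  def last_segment(path) do
    path
    |> String.split(["/", "\\"])
    |> List.last()
  end
end

# prints "myDoc2.doc"
IO.puts(PathTail.last_segment("c:\\program files\\mydir\\myDoc2.doc"))
# prints "files.py"
IO.puts(PathTail.last_segment("//home//python//projects//files.py"))
```

The trade-off is that String.split/2 builds the whole list of segments before discarding all but the last one, while the recursive version keeps only a single accumulator.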
