erlang Elixir -如何改进代码和风格

q3qa4bjr  于 2022-12-08  发布在  Erlang
关注(0)|答案(1)|浏览(138)

The goal was a script which reads the file, line by line, containing file paths (Windows and Linux). It strips the path leaving only a file name with an extension. Then replaces any special characters in the file name with an "_" - underscore and at the end reduces the consecutive underscores with just one. Like st__a___ck becomes st_a_ck. I got it working but I believe there may be a better/nicer looking way of doing this. I'm a very beginner and still learning to think the Elixir/functional way. What I want is to see different ways of doing this, ways of improving and elixifying a bit.
The test sample:

c:\program files\mydir\mydir2\my&@Doc.doc 
c:\program files\mydir\mydir2\myD$oc2.doc\ 
c:\\program files\\mydir\\mydir2\\myD;'oc2.doc
c:\\program files\\mydir\mydir2\\my[Doc2.doc\\
/home/python/projects/files.py
/home/python/projects/files.py/
//home//python//projects//files.py
//home//python//projects//files.py//
c:\program files\mydir\mydir2\my!D#oc.doc 
c:\program files\mydir\mydir2\myDoc2.doc\ 
c:\\program files\\mydir\\mydir2\\my';Doc2.doc
c:\\program files\\mydir\mydir2\\myD&$%oc2.doc\\
/home/python/projects/f_)*iles.py
/home/python/projects/files.py/
//home//python//projects//fi=-les.py
//home//python//projects//fil !%es.py//
/home/python/projects/f_)* iles.py
/home/python/projects/fi les.py/
//home//python//projects//fii___kiii=- les.py 
//home//python//projects//ff###f!%#illfffl! %es.py//

The code:

defmodule Paths do

     def read_file(filename) do
         File.stream!(filename)
         |> Enum.map( &(String.replace(&1,"\\","/")) )
         |> Enum.map( &(String.trim(&1,"\n")) )
         |> Enum.map( &(String.trim(&1,"/")) )
         |> Enum.map( &(String.split(&1,"/")) )
         |> Enum.map( &(List.last(&1)) )
         |> Enum.map( &(String.split(&1,".")) )
         |> Enum.map( &(remove_special)/1 )
         |> Enum.map( &(print_name_and_suffix)/1 )

     end
     defp print_name_and_suffix(str) do
         [h|t] = str
         IO.puts "Name: #{h}\t suffix: #{t}\t: #{h}.#{t}"
     end
     defp remove_special(str) do
         [h|t] = str
         h = String.replace(h, ~r/[\W]/, "_")
         h = String.replace(h, ~r/_+/, "_")
         [h]++t
     end

end

Paths.read_file("test.txt")

Any insights much appreciated.
EDIT: I refactored the code a little. Which version is more Elixir style like?

defmodule Paths do

     def read_file(filename) do
         File.stream!(filename)
         |> Enum.map( &(format_path)/1 )
         |> Enum.map( &(remove_special)/1 )
         |> Enum.map( &(print_name_and_suffix)/1 )

     end

     defp format_path(path) do
             path
             |> String.replace("\\","/")
             |> String.trim("\n")
             |> String.trim("/")
             |> String.trim("\\")
     end

     defp print_name_and_suffix(str) do
         [h|t] = str
         IO.puts "Name: #{h}\t suffix: #{t}\t: #{h}#{t}"
     end

     defp remove_special(str) do
         ext = Path.extname(str)
         filename = Path.basename(str)
             |> String.trim(ext)
             |> String.replace(~r/[\W]/, "_")
             |> String.replace( ~r/_+/, "_")

         [filename]++ext
     end

end

Paths.read_file("test.txt")
balp4ylt

balp4ylt1#

I would point to the generic problems with the code in the first place.

  • File.stream!/3 produces a Stream explicitly designed to be processed lazily (so we don't keep the whole content of the file in memory). Passing it to Enum.map/2 makes zero sense. Use Stream.map/2 to keep processing the file lazily or use Flow.map/2 to parallelize the mapping operations and use all available cores (you keep the laziness too!).
  • Formatting matters. We use 2 spaces for the indent. Use Elixir Formatter (or mix task formatter ) to format your code.
  • Decompose directly in function head wherever possible (instead of defp print_name_and_suffix(str), do: [h|t] = str ... do directly defp print_name_and_suffix([h|t]) .
  • Minimize the number of calls to replacement in strings since each requires the separate string pass to substitute characters.
  • Use different function clauses with pattern matching to simplify life.
  • Try to use binary pattern matching and recursion wherever applicable.

That said, the most [opinionated] Elixirish approach would be:

defmodule Paths do
  def read_file(filename) do
    filename
    |> File.stream!()
    # Uncomment next line and replace all Steam calls with Flow 
    # to embrace multi core parallelism
    # |> Flow.from_enumerable()  
    |> Stream.map(&right_trim/1)
    |> Stream.map(&strip_path/1)
    |> Stream.map(&split_and_cleanup/1)
    |> Stream.map(&name_and_suffix/1)
    |> Enum.to_list()
  end

  defp right_trim(str), do: Regex.replace(~r/\W+\z/, str, "")

  defp strip_path(input, acc \\ "")
  defp strip_path("", acc), do: acc
  defp strip_path(<<"\\", rest :: binary>>, acc), do: strip_path(rest, "")
  defp strip_path(<<"/", rest :: binary>>, acc), do: strip_path(rest, "")
  defp strip_path(<<chr :: binary-size(1), rest :: binary>>, acc),
    do: strip_path(rest, acc <> chr)

  defp split_and_cleanup(str) do
    str
    |> String.split(".")
    |> Enum.map(&String.replace(&1, ~r/[_\W]+/, "_"))
  end

  defp name_and_suffix([file, ext]) do
    IO.puts "Name: #{file}\t suffix: .#{ext}\t: #{file}.#{ext}"
  end
end

Paths.read_file("/tmp/test.txt")

Please pay attention mostly to strip_path/2 function, it does recursively parse the input string, returning the part after the last slash, forward or backward. I could use String.split/2 or any internal function from String module but I explicitly had it implemented with a most functional approach.

相关问题