
With Pandas, the data types that usually matter are the dtypes of the Series that make up the columns of a DataFrame. Pandas (not Python itself) infers these when you read in the data or construct it manually. They are mostly data types provided by the NumPy library, and you can convert columns manually as needed with Pandas functions.
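As a minimal sketch of that inference and conversion, assuming pandas is installed (the column names and values here are made up for illustration):

```python
import pandas as pd

# Made-up data: "price" arrives as strings, as it often does from a CSV.
df = pd.DataFrame({"id": [1, 2, 3], "price": ["1.5", "2.0", "3.25"]})

# Pandas infers a dtype per column when the DataFrame is constructed.
id_is_int = pd.api.types.is_integer_dtype(df["id"])
price_is_numeric_before = pd.api.types.is_numeric_dtype(df["price"])

# Manual conversion with Pandas functions.
df["price"] = pd.to_numeric(df["price"])
df["id"] = df["id"].astype("float64")

price_is_float_after = pd.api.types.is_float_dtype(df["price"])
```

The string column only becomes numeric after the explicit `pd.to_numeric` call; nothing here is checked before the code runs.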

If you mean the object itself, then Python is strongly typed and will throw an error if you try to run the wrong method or access the wrong property on an object (although it will let you assign new ones, and it will let you add new columns to DataFrames of any data type, because that's very useful).
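A small sketch of what that looks like at runtime, using a hypothetical `Report` class rather than anything from Pandas:

```python
class Report:
    """Hypothetical class, used only to illustrate the behaviour."""

    def __init__(self, title):
        self.title = title


report = Report("sales")

# Calling a method that does not exist raises AttributeError at runtime.
try:
    report.no_such_method()
except AttributeError as exc:
    method_error = type(exc).__name__

# Mixing incompatible types raises TypeError instead of coercing silently.
try:
    "1" + 1
except TypeError as exc:
    mix_error = type(exc).__name__

# Assigning a brand-new attribute is allowed.
report.author = "me"
```

Both errors only show up when the offending line actually executes.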

If you mean in the method or function calls, then often a parameter will allow different types because you might want to pass in the column name, a list of columns, a function or something else that makes sense for that particular parameter depending on what the method/function does. Pandas has a lot of overloaded parameters because it's a big library that covers a lot of use cases for tabular data.
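To illustrate what an overloaded parameter looks like, here is a sketch with a hypothetical `select_columns` function over a plain dict of lists. It is not Pandas code, just the same pattern in miniature:

```python
from typing import Callable, Dict, List, Union

# Hypothetical stand-ins: a "table" is a dict of column name -> values.
Table = Dict[str, List[int]]
ColumnSpec = Union[str, List[str], Callable[[str], bool]]


def select_columns(table: Table, columns: ColumnSpec) -> Table:
    """Accept a single column name, a list of names, or a predicate on names."""
    if isinstance(columns, str):
        wanted = [columns]
    elif callable(columns):
        wanted = [name for name in table if columns(name)]
    else:
        wanted = list(columns)
    return {name: table[name] for name in wanted}


table = {"a": [1, 2], "b": [3, 4], "ab": [5, 6]}
by_name = select_columns(table, "a")
by_list = select_columns(table, ["a", "b"])
by_func = select_columns(table, lambda name: name.startswith("a"))
```

The `Union` alias is one way a checker could describe such a parameter; whether that approach scales to the real Pandas API is a separate question.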



Sorry, I meant static types, like mypy


Right, my point was that static typing doesn't seem like a good fit for a library like Pandas, which needs to be flexible enough to handle a wide range of use cases for tabular data where you're cleaning, reshaping, etc. At least not without a total rewrite.

With Flow in JS, how would a JSON library for similar purposes work where you don't know what kind of data it will be and you will be doing a lot of transformations on it?


I think designing for static types changes the design of a library's API. So, in this case, Pandas would likely have to evolve (handwave) in some fashion if it were to support mypy types.

One way Mypy could help do this is by implementing a feature like Type Providers https://docs.microsoft.com/en-us/dotnet/fsharp/tutorials/typ...


If you _really_ don't know what type of data you have, you type it as `mixed`. But if you _do_ know the shape of your data, you write your own types and tell the library about those types (e.g. with generics). Think of something like `my_df: DataFrame<MyDataShape> = pandas.DataFrame(my_data)`, where you manually define the shape of `MyDataShape`.
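In Python's own typing syntax, the idea might look roughly like the sketch below. `TypedFrame` is a hypothetical wrapper written for this example, not something Pandas provides, and `MyDataShape` is a made-up row shape:

```python
from typing import Generic, List, Mapping, TypedDict, TypeVar


class MyDataShape(TypedDict):
    """Made-up row shape, defined by hand."""
    name: str
    age: int


Row = TypeVar("Row", bound=Mapping[str, object])


class TypedFrame(Generic[Row]):
    """Hypothetical container that carries the row type as a type parameter."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)

    def column(self, key: str) -> List[object]:
        # Collect one field from every row.
        return [row[key] for row in self.rows]


my_data: List[MyDataShape] = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": 41},
]
my_df: TypedFrame[MyDataShape] = TypedFrame(my_data)
ages = my_df.column("age")
```

A checker like mypy can then flag a row with a missing or mistyped field before the code runs, which is the part plain Pandas doesn't give you.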

Scala is a language that is often used for data processing, and is statically typed.

Depending on the situation, codegen can also be useful, e.g. https://github.com/typeorm/typeorm

EDIT: in any case, this certainly answers my question :-) sounds like they're not there yet, and perhaps not even moving in that direction yet.



